Philipp Reisner [Mon, 20 Feb 2012 20:53:28 +0000 (21:53 +0100)]
drbd: New disk option al-updates
By disabling al-updates one might increase performace. The price for
that is that in case a crashed primary (that had al-updates disabled)
is reintegraded, it will receive a full-resync instead of a bitmap
based resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 30 Apr 2012 10:53:52 +0000 (12:53 +0200)]
drbd: use bitmap_parse instead of __bitmap_parse
The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used
instead of __bitmap_parse the extra parameter that indicates a kernel
buffer is not needed.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
If bm_page_async_io is advised to use a new page for I/O
(BM_AIO_COPY_PAGES is set), it will get it from a mempool.
Once the mempool has to dip into its reserves the page is
not reinitialized, i.e. page->private contains garbage, which
will lead to various problems once the I/O completes (dereferences
of NULL pointers, the submitting thread getting stuck in D-state,
...).
Signed-off-by: Arne Redlich <arne.redlich@googlemail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 7 May 2012 10:07:18 +0000 (12:07 +0200)]
drbd: allow bitmap to change during writeout from resync_finished
Symptom: messages similar to
"FIXME asender in bm_change_bits_to,
bitmap locked for 'write from resync_finished' by worker"
If a resync or verify is finished (or aborted), a full bitmap writeout
is triggered. If we have ongoing local IO, the bitmap may still change
during that writeout, pending and not yet processed acks may cause bits
to be cleared, while new writes may cause bits to be to be set.
To fix this, introduce the drbd_bm_write_copy_pages() variant.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 7 May 2012 10:00:56 +0000 (12:00 +0200)]
drbd: fix race between drbdadm invalidate/verify and finishing resync
When a resync or online verify is finished or aborted,
drbd does a bulk write-out of changed bitmap pages.
If *in that very moment* a new verify or resync is triggered,
this can race:
ASSERT( !test_bit(BITMAP_IO, &mdev->flags) ) in drbd_main.c
FIXME going to queue 'set_n_write from StartingSync' but 'write from resync_finished' still pending?
and similar.
This can be observed with e.g. tight invalidate loops in test scripts,
and probably has no real-life implication.
Still, that race can be solved by first quiescen the device,
before starting a new resync or verify.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 6 Apr 2012 10:08:51 +0000 (12:08 +0200)]
drbd: Ensure that data_size is not 0 before using data_size-1 as index
This could be exploited by a peer which runs modified code.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 30 Mar 2012 12:12:15 +0000 (14:12 +0200)]
drbd: Fixed processing of disk-barrier, disk-flushes and disk-drain
Since drbd_bump_write_ordering() is called in the attaching
process while the disk state is D_ATTACHING, it was not
considering these three flags during attach.
A call to this function was missing form drbd_adm_disk_opts().
Fixed both issues.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 26 Mar 2012 18:12:24 +0000 (20:12 +0200)]
drbd: complete_conflicting_writes() should not care about connections
complete_conflicting_writes() should not cause -EIO.
It should not timeout either, or care for connection states.
Connection timeout is detected elsewhere, and it's cleanup path is
supposed to remove any pending requests or peer_requests from the
write_requests tree.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 26 Mar 2012 15:29:30 +0000 (17:29 +0200)]
drbd: simplify retry path of failed READ requests
If a local or remote READ request fails, just push it back to the retry
workqueue. It will re-enter __drbd_make_request, and be re-assigned to
a suitable local or remote path, or failed, if we do not have access to
good data anymore.
This obsoletes w_read_retry_remote(),
and eliminates two goto...retry blocks in __req_mod()
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 26 Mar 2012 15:02:45 +0000 (17:02 +0200)]
drbd: factor out master_bio completion and drbd_request destruction paths
In preparation for multiple connections and reference counting,
separate the code paths for completion of the master bio
and destruction of the request object.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 24 Nov 2011 09:36:25 +0000 (10:36 +0100)]
drbd: fix potential deadlock during "restart" of conflicting writes
w_restart_write(), run from worker context, calls __drbd_make_request()
and further drbd_al_begin_io(, delegate=true), which then
potentially deadlocks. The previous patch moved a BUG_ON to expose
such call paths, which would now be triggered.
Also, if we call __drbd_make_request() from resource worker context,
like w_restart_write() did, and that should block for whatever reason
(!drbd_state_is_stable(), resource suspended, ...),
we potentially deadlock the whole resource, as the worker
is needed for state changes and other things.
Create a dedicated retry workqueue for this instead.
Also make sure that inc_ap_bio()/dec_ap_bio() are properly paired,
even if do_retry() needs to retry itself,
in case __drbd_make_request() returns != 0.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 23 Feb 2012 11:52:31 +0000 (12:52 +0100)]
drbd: Fix module refcount leak in drbd_accept()
drbd_accept was modelled after kernel_accept
with drbd commit 53eb779 in July 2008.
Only, kernel_accept was then broken, and only fixed later
with kernel commit 1b08534e in Dec 2008:
net: Fix module refcount leak in kernel_accept()
Impact: protocol families provided as modules, e.g. ipv6 or ib_sdp,
would soon have their reference count become negative, preventing
them from being unloaded (likely), or worse, hit zero without actually
being unused, allowing them to be unloaded while still in use (unlikely,
but if triggered, causing a kernel crash).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 22 Feb 2012 10:51:57 +0000 (11:51 +0100)]
drbd: Consider the disk-timeout also for meta-data IO operations
If the backing device is already frozen during attach, we failed
to recognize that. The current disk-timeout code works on top
of the drbd_request objects. During attach we do not allow IO
and therefore never generate a drbd_request object but block
before that in drbd_make_request().
This patch adds the timeout to all drbd_md_sync_page_io().
Before this patch we used to go from D_ATTACHING directly
to D_DISKLESS if IO failed during attach. We can no longer
do this since we have to stay in D_FAILED until all IO
ops issued to the backing device returned.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 14 Feb 2012 11:12:35 +0000 (12:12 +0100)]
drbd: Reinstate disabling AL updates with invalidate-remote
Commit d0ef827e (drbd: switch configuration interface from connector to
genetlink) introduced a regression by removing the ability to set all
bits in the out of sync bitmap and to suspend updates to the activity log
of a disconnected device via the invalidate-remote management call.
Credits for reporting the issue are going to Arne Redlich.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 20 Jan 2012 12:52:27 +0000 (13:52 +0100)]
drbd: Fixed compat issue with disconnecting 8.4 from a primary 8.3
For compatibility reasons 8.4 has to send P_STATE_CHG_REQ (instead
of P_CONN_ST_CHG_REQ) when disconnecting.
In the receiving code path we missed to convert the old
answer (P_STATE_CHG_REPLY) back to 8.4 logic. Therefore
the CL_ST_CHG_SUCCESS or CL_ST_CHG_FAIL bit in the flags word
of mdev got set, while the state code was waiting for
the CONN_WD_ST_CHG_OKAY or CONN_WD_ST_CHG_FAIL bits in tconn.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 20 Dec 2011 10:49:58 +0000 (11:49 +0100)]
drbd: restart loop in drbd_make_request() [prepare for Linux-3.2]
With Linux-3.2 generic_make_request() will no longer loop over
the request function until it finally returns 0. Move this
loop into our drbd_make_request() function.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 19 Dec 2011 21:42:56 +0000 (22:42 +0100)]
drbd: Restore late assigning of tconn->data.sock and meta.sock
With commit from Mon Mar 28 16:33:12 2011 +0200
"drbd: drbd_connect(): Initialize struct drbd_socket before sending anything"
tconn->data.sock and tconn->meta.sock get assigned early, in
conn_connect.
The early assigning can trigger an OOPS, because it may released the socket
without acquiring the mutex protecting the socket. An other thread (worker)
might use setsockopt() on the socket while it gets free()ed.
Restored the (proven) 8.3 behavior of assigning these sockets after the two
connections are established.
Credits for reporting the issue are going to Arne Redlich.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 13 Dec 2011 17:32:18 +0000 (18:32 +0100)]
drbd: Do not send state packets while lower than C_CONNECTED cstate
I.e. in C_WF_REPORT_PARAMS or in C_WF_CONNECTION.
Sending may already work in these cstates, but the peer still expects
the HandShake / ConnectionFeatures packet.
Actually triggered by the Testuite on kugel.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 13 Dec 2011 10:09:16 +0000 (11:09 +0100)]
drbd: fix race between disconnect and receive_state
If the asender thread, or request_timer_fn(), or some other part of
the code, decided to drop the connection (because of timeout or other),
but the receiver just now was processing a P_STATE packet, there was a
chance that receive_state() would do a hard state change
"re-establishing" an already failed connection without additional handshake.
Impact:
while the connection state is erroneously "Connected",
requests may be queued and even sent,
which would never be acknowledged,
and may have been missed by the cleanup.
These requests would never be completed.
The next drbd_suspend_io() will then lock up,
waiting forever for these requests to complete.
Fixed in several code paths:
Make sure the connection state is NetworkFailure or worse
before starting the cleanup in drbd_disconnect().
This should make sure the cleanup won't miss any requests.
Disallow receive_state() to "upgrade" the connection state
from an error state. This will make sure the "illegal" state
transition won't happen.
For all connection failure states,
relax the safe-guard in sanitize_state() again
to silently mask out those state changes
(e.g. Timeout -> Connected becomes Timeout -> Timeout).
Note by Philipp Reisner:
The 3rd chunk described as "relax the safe-guard..."
is not there in 8.4 as it is relaxed to the maximum in
8.4 already
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Thu, 17 Nov 2011 13:32:12 +0000 (14:32 +0100)]
drbd: add missing rcu locks around recently introduced idr_for_each
Recent commit
drbd: Move write_ordering from mdev to tconn
introduced a new idr_for_each loop over all volumes,
but did not take necessary rcu locks or krefs.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 17 Nov 2011 09:11:47 +0000 (10:11 +0100)]
drbd: fix potential spinlock deadlock
drbd_try_clear_on_disk_bm() has a sanity check for the number of blocks
left to be resynced (rs_left) in the current resync extent.
If it detects a mismatch, it complains, and forces a disconnect using
drbd_force_state(mdev, NS(conn, C_DISCONNECTING));
Unfortunately, this may be called while holding the req_lock,
and drbd_force_state() want's to aquire that lock itself. Deadlock.
Don't force a disconnect, but fix up rs_left by recounting and
reassigning the number of dirty blocks in that extent.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 10 Nov 2011 13:56:07 +0000 (14:56 +0100)]
drbd: Prepare epochs per connection
An epoch object needs a pointer to the mdev it was received for.
This is necessary to be able to send the barrier ack packet for
the same volume as the original barrier packet was assigned to.
This prepares the next step, in which the (receiver side)
epoch list is moved from the device (mdev) to the connection (tconn)
object.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 10 Nov 2011 17:45:36 +0000 (18:45 +0100)]
drbd: Move the CREATE_BARRIER flag from connection to device
That is necessary since the whole transfer log is per connection(tconn)
and not per device(mdev).
This bug caused list corruption on the worker list. When a barrier is queued
for sending in the context of one device, another device did not see the
CREATE_BARRIER bit, and queued the same object again -> list corruption.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 10 Nov 2011 11:12:52 +0000 (12:12 +0100)]
drbd: Fixes from the drbd-8.3 branch
* drbd-8.3:
drbd: fix spurious meta data IO "error"
drbd: Fixed a race condition between detach and start of resync
drbd: fix harmless race to not trigger an ASSERT
drbd: Derive sync-UUIDs only from the bitmap-uuid if it is non-zero
drbd: Fixed current UUID generation (regression introduced recently, after 8.3.11)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 29 Sep 2011 11:00:14 +0000 (13:00 +0200)]
drbd: fix "stalled" empty resync
With sync-after dependencies, given "lucky" timing of pause/unpause
events, and the end of an empty (0 bits set) resync was sometimes not
detected on the SyncTarget, leading to a "stalled" SyncSource state.
Fixed this by expecting not only "Inconsistent -> UpToDate" but also
"Consistent -> UpToDate" transitions for the peer disk state
to end a resync.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 19 Aug 2011 08:39:00 +0000 (10:39 +0200)]
drbd: fix connect failure with all default net-options
If no net-options are configured (all on their default),
no DRBD_NLA_NET_CONF will be passed to the kernel.
The kernel must not require its presence,
there is no required option in there.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 18 Jul 2011 09:09:17 +0000 (11:09 +0200)]
drbd: The minor_count module parameter is only a hint nowadays
* The max of minor_count is 255
* In drbdadm count the number of minors, instead of finding
the highest minor number
* No longer us the magic in the init script
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 8 Nov 2012 14:04:36 +0000 (15:04 +0100)]
drbd: Bugfix for the connection behavior
If we get into the C_BROKEN_PIPE cstate once, the state engine set the
thi->t_state of the receiver thread to restarting. But with the while loop
in drbdd_init() a new connection gets established. After the call into
drbdd() returns immediately since the thi->t_state is not RUNNING. The
restart of drbd_init() then resets thi->t_state to RUNNING.
I.e. after entering C_BROKEN_PIPE once, the next successful established
connection gets wasted.
The two parts of the fix:
* Do not cause the thread to restart if we detect the issue
with the sockets while we are in C_WF_CONNECTION.
* Make sure that all actions that would have set us to C_BROKEN_PIPE
happen before the state change to C_WF_REPORT_PARAMS.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The last data-integrity-alg fix made data integrity checking work when the
algorithm was changed for an established connection, but the common case of
configuring the algorithm before connecting was still broken. Fix that.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 15 Jul 2011 11:53:06 +0000 (13:53 +0200)]
drbd: Do not mod_timer() with a past time
In case we can not find out why the request takes too long
(happens e.g. when IO got suspended on DRBD level). rearm
the timer with a reasonable value.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
drbd: Allow to create devices with a minor number > minor_count
The minor_count module/kernel parameter serves to scale the size of drbd's
internal memory pool, but it is no longer a limit for the number of minors or
the minor number. (Minor numbers can be arbitrarily high within the allowed
limit of 2^20.)
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 13 Jul 2011 08:24:51 +0000 (10:24 +0200)]
drbd: Changed some defaults
* Enabled the resync controller, with a fill target of 50Kib. That gives
reasonable resync speeds without tuning. A much better default than
the 250KiB/s fixed.
* Enable bitmap compression. It is save to use, and most people have
more CPU power than network bandwidth.
* ko-count of 7: Abort a connection if the peer fails to process a
write request within 42 seconds.
* al-extents of 1237: ~5 GiB seems to be a much more sane default
these days.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Mon, 11 Jul 2011 21:49:55 +0000 (23:49 +0200)]
drbd: report net config even for resources without a single volume
Currently it is legal (though unusual) to create and connect a resource,
before adding in all necessary volumes. We should include the network
configuration details, even if we don't have a single volume (yet).
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 6 Jul 2011 21:04:44 +0000 (23:04 +0200)]
drbd: Fixed removal of volumes/devices from connected resources
When removing a volume/device we need to switch the connection
status of the peer back into WFReportParams.
Before this fix it was left in Connected state. That means that
the peer device continued to inform us about state changes, etc...
But we deleted that minor -> protocol error.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>