John Sheu [Tue, 18 Mar 2014 06:13:56 +0000 (23:13 -0700)]
bcache: remove nested function usage
Uninlined nested functions can cause crashes when using ftrace, as they don't
follow the normal calling convention and confuse the ftrace function graph
tracer as it examines the stack.
Also, nested functions are supported as a gcc extension, but may fail on other
compilers (e.g. llvm).
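As a rough user-space illustration (not the bcache code itself), a common way to remove a gcc nested function is to hoist it to file scope and pass the variables it used to capture through an explicit context pointer. The names sum_ctx, add_key, for_each_key and sum_keys are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* State the nested function used to capture from its enclosing scope. */
struct sum_ctx {
	long total;
};

/* Formerly a nested function; now an ordinary static function that
 * receives its "captured" state through an explicit pointer. */
static void add_key(int key, void *data)
{
	struct sum_ctx *ctx = data;

	ctx->total += key;
}

/* Generic iterator taking a callback plus an opaque context pointer. */
static void for_each_key(const int *keys, size_t n,
			 void (*fn)(int key, void *data), void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++)
		fn(keys[i], data);
}

static long sum_keys(const int *keys, size_t n)
{
	struct sum_ctx ctx = { .total = 0 };

	for_each_key(keys, n, add_key, &ctx);
	return ctx.total;
}
```

Because add_key() is a plain function, it follows the normal calling convention and builds with any C compiler.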
Kent Overstreet [Fri, 28 Feb 2014 01:51:12 +0000 (17:51 -0800)]
bcache: Kill bucket->gc_gen
gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Mon, 17 Mar 2014 23:55:55 +0000 (16:55 -0700)]
bcache: Kill unused freelist
This was originally added as an optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Tue, 18 Mar 2014 00:15:53 +0000 (17:15 -0700)]
bcache: Rework btree cache reserve handling
This changes the bucket allocation reserves to use _real_ reserves - separate
freelists - instead of watermarks, which if nothing else makes the current code
saner to reason about and is going to be important in the future when we add
support for multiple btrees.
It also adds btree_check_reserve(), which checks (and locks) the reserves for
both bucket allocation and memory allocation for btree nodes; the old code just
kinda sorta assumed that since (e.g. for btree node splits) it had the root
locked and that meant no other threads could try to make use of the same
reserve; this technically should have been ok for memory allocation (we should
always have a reserve for memory allocation (the btree node cache is used as a
reserve and we preallocate it)), but multiple btrees will mean that locking the
root won't be sufficient anymore, and for the bucket allocation reserve it was
technically possible for the old code to deadlock.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Thu, 23 Jan 2014 09:44:55 +0000 (01:44 -0800)]
bcache: Kill btree_io_wq
With the locking rework in the last patch, this shouldn't be needed anymore -
btree_node_write_work() only takes b->write_lock which is never held for very
long.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Wed, 5 Mar 2014 00:42:42 +0000 (16:42 -0800)]
bcache: btree locking rework
Add a new lock, b->write_lock, which is required to actually modify - or write -
a btree node; this lock is only held for short durations.
This means we can write out a btree node without taking b->lock, which _is_ held
for long durations - solving a deadlock when btree_flush_write() (from the
journalling code) is called with a btree node locked.
Right now this just occurs in bch_btree_set_root(), but with an upcoming
journalling rework it is going to happen a lot more.
This also turns b->lock into more of a read/intent lock instead of a
read/write lock - but not completely, since it still blocks readers. May turn it
into a real intent lock at some point in the future.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Tue, 18 Mar 2014 01:22:34 +0000 (18:22 -0700)]
bcache: Fix a race when freeing btree nodes
This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
bucket on the unused freelist, where it can be reused right away without any
ordering requirements. It would be better to wait on at least a journal write to
go down before reusing the bucket. bch_btree_set_root() does this, and inserting
into non leaf nodes is completely synchronous so we should be ok, but future
patches are just going to get rid of the unused freelist - it was needed in the
past for various reasons but shouldn't be anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Nicholas Swenson [Fri, 10 Jan 2014 00:03:04 +0000 (16:03 -0800)]
bcache: Fix moving_gc deadlocking with a foreground write
Deadlock happened because a foreground write slept, waiting for a bucket
to be allocated. Normally the gc would mark buckets available for invalidation.
But the moving_gc was stuck waiting for outstanding writes to complete.
These writes used the bcache_wq, the same queue foreground writes used.
This fix gives moving_gc its own work queue, so it can still finish moving
even if foreground writes are stuck waiting for allocation. It also makes
the work queue a parameter to the data_insert path, so moving_gc can use its
workqueue for writes.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Kent Overstreet [Mon, 17 Mar 2014 22:13:26 +0000 (15:13 -0700)]
bcache: Fix another bug recovering from unclean shutdown
The on disk bucket gens are allowed to be out of date, when we reuse buckets
that didn't have any live data in them. To deal with this, the initial gc has to
update the bucket gen when we find a pointer gen newer than the bucket's gen.
Unfortunately we weren't doing this for pointers in the journal that we're about
to replay.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
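The gen update described above can be sketched in user space as follows. This assumes 8-bit generation numbers that wrap around, compared with a helper modelled on bcache's gen_after(); struct bucket_sketch and mark_ptr_gen are hypothetical names, and the real fix applies this kind of check to pointers found in the journal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrapping comparison: nonzero if gen a is newer than gen b
 * (assumed to mirror bcache's gen_after() helper). */
static uint8_t gen_after(uint8_t a, uint8_t b)
{
	uint8_t r = a - b;

	return r > 128U ? 0 : r;
}

struct bucket_sketch {
	uint8_t gen;
};

/* Bring a stale on-disk bucket gen up to date with a newer pointer gen. */
static void mark_ptr_gen(struct bucket_sketch *b, uint8_t ptr_gen)
{
	assert(b != NULL);
	if (gen_after(ptr_gen, b->gen))
		b->gen = ptr_gen;
}
```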
Kent Overstreet [Thu, 13 Mar 2014 20:44:21 +0000 (13:44 -0700)]
bcache: Fix a journalling reclaim after recovery bug
On recovery we weren't correctly keeping track of what journal buckets had open
journal entries, thus it was possible for them to be overwritten until we'd
written all new journal entries.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Sam Bradshaw [Thu, 13 Mar 2014 21:33:30 +0000 (14:33 -0700)]
mtip32xx: mtip_async_complete() bug fixes
This patch fixes 2 issues in the fast completion path:
1) Possible double completions / double dma_unmap_sg() calls due to lack
of atomicity in the check and subsequent dereference of the upper layer
callback function. Fixed with cmpxchg before unmap and callback.
2) Regression in unaligned IO constraining workaround for p420m devices.
Fixed by checking if IO is unaligned and using proper semaphore if so.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
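The idea behind fix 1 can be pictured with C11 atomics in user space. This is only a sketch: struct cmd, count_done and complete_once are hypothetical, and atomic_exchange() stands in for the kernel's cmpxchg()-based claim of the callback.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*done_fn)(void *data);

struct cmd {
	_Atomic(done_fn) done;	/* upper-layer callback; NULL once claimed */
	void *data;
	int unmapped;		/* counts simulated dma_unmap_sg() calls */
};

/* Sample callback used for illustration: bumps an int counter. */
static void count_done(void *data)
{
	(*(int *)data)++;
}

/* Returns 1 if this caller completed the command, 0 if already completed. */
static int complete_once(struct cmd *c)
{
	done_fn fn;

	assert(c != NULL);
	/* Atomically take the callback and leave NULL behind, so the check
	 * and the claim cannot be separated by a racing completion. */
	fn = atomic_exchange(&c->done, (done_fn)NULL);
	if (fn == NULL)
		return 0;

	c->unmapped++;		/* stand-in for dma_unmap_sg() */
	fn(c->data);
	return 1;
}
```

Only the caller that wins the swap unmaps and invokes the callback; a second completion attempt sees NULL and backs off.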
nvme: Use pci_enable_msi_range() and pci_enable_msix_range()
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-nvme@lists.infradead.org
Cc: linux-pci@vger.kernel.org
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
cciss: Fallback to MSI rather than to INTx if MSI-X failed
Currently the driver falls back to INTx mode when MSI-X
initialization failed. This is a suboptimal behaviour
for chips that also support MSI. This update changes that
behaviour and falls back to MSI mode in case MSI-X mode
initialization failed.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: iss_storagedev@hp.com
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
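The fallback order described above (MSI-X, then MSI, then INTx) can be sketched in user space like this. The irq_ops structure, pick_irq_mode() and the ok()/fail() stubs are hypothetical stand-ins, not cciss or PCI core functions.

```c
#include <assert.h>
#include <stddef.h>

enum irq_mode { IRQ_INTX, IRQ_MSI, IRQ_MSIX };

/* Hypothetical enable attempts: each returns 0 on success, negative on
 * failure. A NULL pointer means the mode is not supported at all. */
struct irq_ops {
	int (*enable_msix)(void);
	int (*enable_msi)(void);
};

static enum irq_mode pick_irq_mode(const struct irq_ops *ops)
{
	assert(ops != NULL);
	if (ops->enable_msix && ops->enable_msix() == 0)
		return IRQ_MSIX;
	/* MSI-X failed or is unavailable: try MSI before giving up. */
	if (ops->enable_msi && ops->enable_msi() == 0)
		return IRQ_MSI;
	/* Last resort: legacy INTx. */
	return IRQ_INTX;
}

/* Stubs simulating an enable call that succeeds or fails. */
static int ok(void) { return 0; }
static int fail(void) { return -1; }
```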
Arnd Bergmann [Wed, 26 Feb 2014 11:01:44 +0000 (12:01 +0100)]
swim3: fix interruptible_sleep_on race
interruptible_sleep_on is racy and going away. This replaces the one
caller in the swim3 driver with the equivalent race-free
wait_event_interruptible call. Since we're here already, this
also fixes the case where we get interrupted from atomic context,
which used to just spin in the loop.
Arnd Bergmann [Wed, 26 Feb 2014 11:01:41 +0000 (12:01 +0100)]
ataflop: fix sleep_on races
sleep_on() is inherently racy, and has been deprecated for a long time.
This fixes two instances in the atari floppy driver:
* fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the
regular mutex since it gets released in interrupt context. The
open-coded version using wait_event() and cmpxchg() is equivalent
to the existing code but does the checks atomically, and we can
now safely check the condition with irqs enabled.
* format_wait becomes a completion, which is the natural structure
here. The format ioctl waits for the background task to either
complete or abort.
This does not attempt to fix the preexisting bug of calling schedule
with local interrupts disabled.
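A user-space sketch of the atomic check-and-set at the heart of such an open-coded mutex, using C11 atomics in place of the kernel's cmpxchg(). The helper names fdc_trylock and fdc_unlock are hypothetical, and the sleeping side (wait_event() in the driver) is left out.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int fdc_busy;	/* 0 = free, 1 = held */

/* Take the lock only if the 0 -> 1 transition succeeds atomically.
 * Returns 1 on success, 0 if someone else already holds it. */
static int fdc_trylock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&fdc_busy, &expected, 1);
}

/* Release the lock; in the driver this may happen from interrupt
 * context, which is why a regular mutex cannot be used. */
static void fdc_unlock(void)
{
	assert(atomic_load(&fdc_busy) == 1);
	atomic_store(&fdc_busy, 0);
}
```

In the kernel, a waiter would presumably sleep in wait_event() with the compare-and-swap as its condition, and the releasing side would wake the wait queue after clearing the flag.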
Arnd Bergmann [Wed, 26 Feb 2014 11:01:43 +0000 (12:01 +0100)]
DAC960: remove sleep_on usage
sleep_on and its variants are going away. The use of sleep_on() in
DAC960_V2_ExecuteUserCommand seems to be bogus because by the time
we get there, the command has completed already and
we just enter the timeout. Based on this interpretation, I concluded
that we can replace it with a simple msleep(1000) and rearrange the
code around it slightly.
The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent
to the race-free version using wait_event_interruptible_timeout.
I left the driver to return -EINTR rather than -ERESTARTSYS to preserve
the timeout behavior.
mtip32xx: Use pci_enable_msi() instead of pci_enable_msi_range()
Commit "mtip32xx: Use pci_enable_msix_range() instead of
pci_enable_msix()" was unnecessary, since the pci_enable_msi()
function is not deprecated and is still preferable for
enabling the single MSI mode. This update reverts to using
the pci_enable_msi() function.
Besides, the changelog for that commit was bogus, since
the mtip32xx driver uses the MSI interrupt, not MSI-X.
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Kent Overstreet [Thu, 16 Jan 2014 23:04:18 +0000 (15:04 -0800)]
bcache: Fix flash_dev_cache_miss() for real this time
The code was using sectors to count the number of sectors it was zeroing... but
then it passed it to bio_advance()... after it had been set to 0. Amusing...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
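The class of bug described here (a loop counter reused after the loop has run it down to zero) can be shown with a generic user-space sketch. zero_and_advance() is a hypothetical helper, not the bcache function; it keeps the original count separate from the counter the loop consumes.

```c
#include <assert.h>
#include <stddef.h>

/* Zero `count` bytes at *pos, then advance *pos past them. */
static void zero_and_advance(unsigned char **pos, size_t count)
{
	unsigned char *p;
	size_t remaining = count;	/* consumed by the loop below */

	assert(pos != NULL && *pos != NULL);
	p = *pos;
	while (remaining) {
		*p++ = 0;
		remaining--;
	}
	/* Advance by the saved count. Advancing by `remaining` would be
	 * the bug: it is always 0 at this point. */
	*pos += count;
}
```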
skd: Use pci_enable_msix_range() instead of pci_enable_msix()
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
When enabling MSI-X interrupts fails due to lack of memory
the call to pci_disable_msix() is missed and the device is
left with MSI-X interrupts enabled while the driver assumes
otherwise. This update fixes the described misbehaviour and
cleans up the code of skd_release_msix() function.
When enabling MSI-X, interrupts are requested for SKD_MAX_MSIX_COUNT
entries in skdev->msix_entries array, while the number of actually
allocated entries is skdev->msix_count. This might lead to an out of
boundary access in case number of allocated entries is less than
SKD_MAX_MSIX_COUNT. This update fixes the described misbehaviour.
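The out-of-bounds fix amounts to bounding the loop by the number of vectors actually allocated. A minimal user-space sketch, in which struct msix_slot, request_allocated_vectors() and the value given to SKD_MAX_MSIX_COUNT are assumptions for illustration:

```c
#include <assert.h>

#define SKD_MAX_MSIX_COUNT 13	/* illustrative value, an assumption */

struct msix_slot {
	int requested;
};

/* Request an interrupt for each vector that was actually allocated.
 * Returns the number of slots touched. */
static int request_allocated_vectors(struct msix_slot *slots, int msix_count)
{
	int i;

	assert(msix_count >= 0 && msix_count <= SKD_MAX_MSIX_COUNT);
	/* Bound the loop by msix_count; using SKD_MAX_MSIX_COUNT here
	 * would walk past the end of a shorter array. */
	for (i = 0; i < msix_count; i++)
		slots[i].requested = 1;
	return i;
}
```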
mtip32xx: Use pci_enable_msix_range() instead of pci_enable_msix()
As a result of the deprecation of the MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block(), all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() and pci_enable_msix_range()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: linux-pci@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
drbd: Fix future possible NULL pointer dereference
Right now every resource has exactly one connection. But we are preparing
for dynamic connections, i.e. in the future there can be resources without
connections.
However smatch points this out as 'variable dereferenced before check',
which is correct.
This issue was introduced in
drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
drbd: Add drbd_thread->resource and make drbd_thread->connection optional
In the drbd_thread "infrastructure" functions, only use the resource instead of
the connection. Make the connection field of drbd_thread optional. This will
allow introducing threads which are not associated with a connection.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
drbd_device_work is a work item that has a reference to a device,
while drbd_work is a more generic work item that does not carry
a reference to a device.
All callbacks get a pointer to a drbd_work instance, those callbacks
that expect a drbd_device_work use the container_of macro to get it.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
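The container_of pattern mentioned above looks roughly like this in user space. The struct layouts are simplified assumptions, the macro is a plain offsetof-based version rather than the kernel's type-checked one, and w_report_minor is a hypothetical callback.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: recover the enclosing struct from a pointer
 * to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct drbd_work {
	int (*cb)(struct drbd_work *w);
};

struct drbd_device {
	int minor;
};

/* A work item that additionally carries a device reference. */
struct drbd_device_work {
	struct drbd_work w;
	struct drbd_device *device;
};

/* The callback receives the generic drbd_work and recovers the
 * drbd_device_work that embeds it. */
static int w_report_minor(struct drbd_work *w)
{
	struct drbd_device_work *dw;

	assert(w != NULL);
	dw = container_of(w, struct drbd_device_work, w);
	return dw->device->minor;
}
```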
This allows drbd_alert(), drbd_err(), drbd_warn(), and drbd_info() to work for
a resource, device, or connection so that we don't have to introduce three
separate sets of macros for that.
The drbd_printk() macro itself is pretty ugly, but that problem is limited to
one place in the code. Using drbd_printk() on an object type which it doesn't
understand results in an undefined drbd_printk_with_wrong_object_type symbol.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
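One way to picture a single macro that accepts several object types is C11 _Generic, shown below. This is a swapped-in technique for illustration, not how drbd_printk() is written: the commit describes an undefined symbol (a link-time failure) for unsupported types, whereas _Generic without a matching association fails at compile time. The struct bodies, drbd_obj_kind() and device_kind() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder definitions; the real structures are far larger. */
struct drbd_resource { int unused; };
struct drbd_device { int unused; };
struct drbd_connection { int unused; };

/* Pick a label based on the static type of the argument. Passing any
 * other pointer type is rejected by the compiler. */
#define drbd_obj_kind(obj) _Generic((obj),		\
	struct drbd_resource *: "resource",		\
	struct drbd_device *: "device",			\
	struct drbd_connection *: "connection")

static const char *device_kind(struct drbd_device *device)
{
	assert(device != NULL);
	return drbd_obj_kind(device);
}
```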
DRBD was using dev_err() and similar all over the code; instead of having to
write dev_err(disk_to_dev(device->vdisk), ...) to convert a drbd_device into a
kernel device, a DEV macro was used which implicitly references the device
variable. This is terrible; introduce separate drbd_err() and similar macros
with an explicit device parameter instead.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
drbd: Replace conn_get_by_name() with drbd_find_resource()
So far, connections and resources always come in pairs, but in the future with
multiple connections per resource, the names will stick with the resources.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
In a first step, each resource has exactly one connection, and both objects are
allocated at the same time. The final result will be one resource and zero or
more connections.
Only allow deleting a resource if all its connections are C_STANDALONE.
Stop the worker threads of all connections early enough.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
drbd: Introduce "peer_device" object between "device" and "connection"
In a setup where a device (aka volume) can replicate to multiple peers and one
connection can be shared between multiple devices, we need separate objects to
represent devices on peer nodes and network connections.
As a first step to introduce multiple connections per device, give each
drbd_device object a single drbd_peer_device object which connects it to a
drbd_connection object.
Signed-off-by: Andreas Gruenbacher <agruen@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Rashika Kheria [Thu, 19 Dec 2013 09:33:36 +0000 (15:03 +0530)]
drivers: block: Remove unused function drbd_bm_write_lazy() in drbd_bitmap.c
Remove unused function drbd_bm_write_lazy() in drbd/drbd_bitmap.c.
This eliminates the following warning in drbd/drbd_bitmap.c:
drivers/block/drbd/drbd_bitmap.c:1208:5: warning: no previous prototype for ‘drbd_bm_write_lazy’ [-Wmissing-prototypes]
Rashika Kheria [Thu, 19 Dec 2013 09:34:47 +0000 (15:04 +0530)]
drivers: block: Mark function seq_printf_with_thousands_grouping() as static in drbd_proc.c
Mark function seq_printf_with_thousands_grouping() as static in
drbd/drbd_proc.c because it is not used outside this file.
This eliminates the following warning in drbd/drbd_proc.c:
drivers/block/drbd/drbd_proc.c:49:6: warning: no previous prototype for ‘seq_printf_with_thousands_grouping’ [-Wmissing-prototypes]
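For context, gcc's -Wmissing-prototypes fires when a non-static function is defined without a previous prototype; giving the function internal linkage with static silences it when the function is only used in one file. A user-space sketch, where format_with_thousands() is a hypothetical stand-in that writes to a buffer instead of a seq_file:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* `static` gives the helper internal linkage, so no prototype in a
 * header is needed and -Wmissing-prototypes has nothing to report.
 * Handles non-negative values below one billion. */
static int format_with_thousands(char *buf, size_t len, long v)
{
	assert(v >= 0 && v < 1000000000L);
	if (v >= 1000000)
		return snprintf(buf, len, "%ld,%03ld,%03ld",
				v / 1000000, (v / 1000) % 1000, v % 1000);
	if (v >= 1000)
		return snprintf(buf, len, "%ld,%03ld", v / 1000, v % 1000);
	return snprintf(buf, len, "%ld", v);
}
```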
Rashika Kheria [Thu, 19 Dec 2013 09:36:10 +0000 (15:06 +0530)]
drivers: block: Mark the function as static in drbd_worker.c
Mark functions drbd_endio_read_sec_final(), drbd_send_barrier(),
need_to_send_barrier(), dequeue_work_batch(), dequeue_work_item() and
wait_for_work() as static in drbd/drbd_worker.c because they are not
used outside this file.
This eliminates the following warnings in drbd/drbd_worker.c:
drivers/block/drbd/drbd_worker.c:99:6: warning: no previous prototype for ‘drbd_endio_read_sec_final’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1276:5: warning: no previous prototype for ‘drbd_send_barrier’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1774:6: warning: no previous prototype for ‘need_to_send_barrier’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1798:6: warning: no previous prototype for ‘dequeue_work_batch’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1806:6: warning: no previous prototype for ‘dequeue_work_item’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_worker.c:1815:6: warning: no previous prototype for ‘wait_for_work’ [-Wmissing-prototypes]
Rashika Kheria [Thu, 19 Dec 2013 09:37:47 +0000 (15:07 +0530)]
drivers: block: Move prototype declaration to appropriate header file from drbd_main.c
Move prototype declaration of functions drbdd_init() and drbd_asender()
from drbd/drbd_main.c to header file drbd/drbd_int.h because these
functions are used by more than one file.
This eliminates the following warnings in drbd/drbd_receiver.c:
drivers/block/drbd/drbd_receiver.c:4836:5: warning: no previous prototype for ‘drbdd_init’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_receiver.c:5245:5: warning: no previous prototype for ‘drbd_asender’ [-Wmissing-prototypes]
Rashika Kheria [Thu, 19 Dec 2013 09:41:09 +0000 (15:11 +0530)]
drivers: block: Mark functions as static in drbd_receiver.c
Mark functions conn_wait_active_ee_empty() and
drbd_crypto_alloc_digest_safe() as static in drbd/drbd_receiver.c
because they are not used outside this file.
This eliminates the following warnings in drbd/drbd_receiver.c:
drivers/block/drbd/drbd_receiver.c:1401:6: warning: no previous prototype for ‘conn_wait_active_ee_empty’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_receiver.c:3259:21: warning: no previous prototype for ‘drbd_crypto_alloc_digest_safe’ [-Wmissing-prototypes]
Rashika Kheria [Thu, 19 Dec 2013 09:42:27 +0000 (15:12 +0530)]
drivers: block: Mark functions as static in drbd_req.c
Mark functions drbd_request_prepare() and find_oldest_request() as
static in drbd/drbd_req.c because they are not used outside this file.
This eliminates the following warnings in drbd/drbd_req.c:
drivers/block/drbd/drbd_req.c:1037:1: warning: no previous prototype for ‘drbd_request_prepare’ [-Wmissing-prototypes]
drivers/block/drbd/drbd_req.c:1323:22: warning: no previous prototype for ‘find_oldest_request’ [-Wmissing-prototypes]