Lars Ellenberg [Mon, 19 Jul 2010 15:41:04 +0000 (17:41 +0200)]
drbd: drbd_md_sync before calling user space helpers
Just in case we have some pending meta data changes to sync, do it
before we call our userland helper, as that may take some time,
or even cause a hard reboot.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
The race:
drbd_md_sync()
if (!test_and_clear_bit(MD_DIRTY, &mdev->flags))
return;
==> RACE with drbd_md_mark_dirty() rearming the timer.
del_timer(&mdev->md_sync_timer);
Fixed by moving the del_timer before the test_and_clear_bit.
Additionally only rearm the timer in drbd_md_mark_dirty, if MD_DIRTY was
not already set, reduce the grace period from five to one second, and
add an ifdef'ed debuging aid to find code paths missing an explicit
drbd_md_sync, if any, as those are the only relevant ones for this race.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 1 Sep 2010 13:47:15 +0000 (15:47 +0200)]
drbd: Removed a race that could cause unexpected execution of w_make_resync_request()
The actual race happened int the drbd_start_resync() function. Where
drbd_resync_finished() -> __drbd_set_state() set STOP_SYNC_TIMER and
armed the timer.
If the timer fired before execution reaches the mod_timer statement
at the end of drbd_start_resync() the latter would cause an
unexpected call to w_make_resync_request().
Removed the STOP_SYNC_TIMER bit, and base it on the connection state.
The STOP_SYNC_TIMER bit probably originates probably the time before
the state engine.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Wed, 1 Sep 2010 12:39:30 +0000 (14:39 +0200)]
drbd: implicitly create unconfigured devices on sync-after dependencies
If pacemaker (for example) decided to initialize minor devices not in
the exact sync-after dependency order, the configuration partially
failed with an error "The sync-after minor number is invalid". (Bugz. #322)
We can avoid that by implicitly creating unconfigured minor devices,
if others depend on them.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Wed, 1 Sep 2010 07:50:23 +0000 (09:50 +0200)]
drbd: fix race between deconfiguring and reconfiguring network
If a drbd_nl_net_conf hits the small window between the state change
to C_STANDALONE and the corresponding cleanup in after_state_ch,
that cleanup would throw away stuff we now need again,
and later trigger BUG_ON()s.
Fixed by properly serializing the new config request with
any pending cleanup.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Tue, 31 Aug 2010 10:00:50 +0000 (12:00 +0200)]
drbd: Disable activity log updates when the whole device is out of sync
When the complete device is marked as out of sync, we can disable
updates of the on disk AL. Currently AL updates are only disabled
if one uses the "invalidate-remote" command on an unconnected,
primary device, or when at attach time all bits in the bitmap are
set.
As of now, AL updated do not get disabled when a all bits becomes
set due to application writes to an unconnected DRBD device.
While this is a missing feature, it is not considered important,
and might get added later.
BTW, after initializing a "one legged" DRBD device
drbdadm create-md resX
drbdadm -- --force primary resX
AL updates also get disabled, until the first connect.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Wed, 11 Aug 2010 19:21:50 +0000 (21:21 +0200)]
drbd: use rolling marks for resync speed calculation
The current resync speed as displayed in /proc/drbd fluctuates a lot.
Using an array of rolling marks makes this calculation much more stable.
We used to have this (a long time ago with 0.7), but it got lost somehow.
If "stalled", do not discard the rest of the information, just add a
" (stalled)" tag to the progress line.
This patch also shortens a spinlock critical section somewhat, and
reduces the number of atomic operations in put_ldev.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Lars Ellenberg [Fri, 9 Jul 2010 21:28:10 +0000 (23:28 +0200)]
drbd: fix list corruption (recent regression)
The commit 288f422ec13667de40b278535d2a5fb5c77352c4
drbd: Track all IO requests on the TL, not writes only
moved a list_add_tail(req, ) into a region where req
may have just been freed due to conflict detection.
Fix this by adding a proper cleanup section for that code path.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Wed, 23 Jun 2010 09:20:05 +0000 (11:20 +0200)]
drbd: Fixed a deadlock, probably only affected UP machines
After disconnect (most likely mdev->net_cnt == 0) and we are
still in an unstable state (!drbd_state_is_stable()). When we
get an IO request in drbd_get_max_buffers() (called from
__inc_ap_bio_cond(), called from inc_ap_bio()) we wake up
misc_wait. Misc_wait is also used in inc_ap_bio() to sleep
until the outcome of __inc_ap_bio_cond() changes. => Busy loop!
Solution: Have a dedicated wait queue for get_net_conf() and
put_net_conf().
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Fri, 11 Jun 2010 09:26:34 +0000 (11:26 +0200)]
drbd: Delayed creation of current-UUID
When a fencing policy of "resource-and-stonith" is configured,
and DRBD looses connection to it's peer, we can delay the
creation of a new current-UUID until IO gets thawed.
That allows one to deploy fence-peer handlers that actually
commit suicide on the machine they get started.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Thu, 10 Jun 2010 14:55:15 +0000 (16:55 +0200)]
drbd: Reduce the verbosity of some state transitions
State transitions in the space of non-allowed states used
to be very noisy. Reduce that, since that has little value
for the majority of the user base.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Philipp Reisner [Mon, 31 May 2010 08:14:17 +0000 (10:14 +0200)]
drbd: Finished the "on-no-data-accessible suspend-io;" functionality
When no data is accessible (no connection to the peer, nor a local disk)
allow the user to select to freeze all IO operations instead of getting
IO errors.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
floppy: switch to one queue per drive instead of sharing a queue
Pretty straight forward conversion. Note that we do round-robin
between the drives that have available requests, before we simply
used the drive that the IO scheduler told us to. Since the IO
scheduler doesn't care about multiple devices per queue, the resulting
sort would not have made sense.
Fixed by Vivek to get rid of a double lock problem in set_next_request()
Milan Broz [Mon, 23 Aug 2010 13:16:00 +0000 (15:16 +0200)]
loop: add some basic read-only sysfs attributes
Create /sys/block/loopX/loop directory and provide these attributes:
- backing_file
- autoclear
- offset
- sizelimit
This loop directory is present only if loop device is configured.
To be used in util-linux-ng (and possibly elsewhere like udev rules)
where code need to get loop attributes from kernel (and not store
duplicate info in userspace).
Moreover loop ioctls are not even able to provide full backing
file info because of buffer limits.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Linus Torvalds [Mon, 23 Aug 2010 02:55:14 +0000 (19:55 -0700)]
Merge branch 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev
* 'radix-tree' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev:
radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
radix-tree: clear all tags in radix_tree_node_rcu_free
Dave Chinner [Mon, 23 Aug 2010 00:33:53 +0000 (10:33 +1000)]
radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
omplement function radix_tree_range_tag_if_tagged") does not safely
set tags on on intermediate tree nodes. The code walks down the tree
setting tags before it has fully resolved the path to the leaf under
the assumption there will be a leaf slot with the tag set in the
range it is searching.
Unfortunately, this is not a valid assumption - we can abort after
setting a tag on an intermediate node if we overrun the number of
tags we are allowed to set in a batch, or stop scanning because we
we have passed the last scan index before we reach a leaf slot with
the tag we are searching for set.
As a result, we can leave the function with tags set on intemediate
nodes which can be tripped over later by tag-based lookups. The
result of these stale tags is that lookup may end prematurely or
livelock because the lookup cannot make progress.
The fix for the problem involves reocrding the traversal path we
take to the leaf nodes, and only propagating the tags back up the
tree once the tag is set in the leaf node slot. We are already
recording the path for efficient traversal, so there is no
additional overhead to do the intermediately node tag setting in
this manner.
This fixes a radix tree lookup livelock triggered by the new
writeback sync livelock avoidance code introduced in commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
livelock avoidance using page tagging").
Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Jan Kara <jack@suse.cz>
Dave Chinner [Mon, 23 Aug 2010 00:33:19 +0000 (10:33 +1000)]
radix-tree: clear all tags in radix_tree_node_rcu_free
Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
writeback livelock avoidance using page tagging") introduced a new
radix tree tag, increasing the number of tags in each node from 2 to
3. It did not, however, fix up the code in
radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
and hence could leave stray tags set in the new tag array.
The result is that the livelock avoidance code added in the the
above commit would hit stale tags when doing tag based lookups,
resulting in livelocks when trying to traverse the tree.
Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
again in the future by using a loop to walk all the tags up to
RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
leaves behind.
Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Nick Piggin <npiggin@kernel.dk> Acked-by: Jan Kara <jack@suse.cz>
Linus Torvalds [Sun, 22 Aug 2010 18:27:36 +0000 (11:27 -0700)]
Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PIT: free irq source id in handling error path
KVM: destroy workqueue on kvm_create_pit() failures
KVM: fix poison overwritten caused by using wrong xstate size
Linus Torvalds [Sun, 22 Aug 2010 18:03:27 +0000 (11:03 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (58 commits)
drm/i915,intel_agp: Add support for Sandybridge D0
drm/i915: fix render pipe control notify on sandybridge
agp/intel: set 40-bit dma mask on Sandybridge
drm/i915: Remove the conflicting BUG_ON()
drm/i915/suspend: s/IS_IRONLAKE/HAS_PCH_SPLIT/
drm/i915/suspend: Flush register writes before busy-waiting.
i915: disable DAC on Ironlake also when doing CRT load detection.
drm/i915: wait for actual vblank, not just 20ms
drm/i915: make sure eDP PLL is enabled at the right time
drm/i915: fix VGA plane disable for Ironlake+
drm/i915: eDP mode set sequence corrections
drm/i915: add panel reset workaround
drm/i915: Enable RC6 on Ironlake.
drm/i915/sdvo: Only set is_lvds if we have a valid fixed mode.
drm/i915: Set up a render context on Ironlake
drm/i915 invalidate indirect state pointers at end of ring exec
drm/i915: Wake-up wait_request() from elapsed hang-check (v2)
drm/i915: Apply i830 errata for cursor alignment
drm/i915: Only update i845/i865 CURBASE when disabled (v2)
drm/i915: FBC is updated within set_base() so remove second call in mode_set()
...
Chris Wilson [Sun, 15 Aug 2010 09:52:34 +0000 (10:52 +0100)]
drm/i915: Remove the conflicting BUG_ON()
We now attempt to free "active" objects following a GPU hang as either
the GPU will be reset or the hang is permenant. In either case, the GPU
writes will not be flushed to main memory and it should be safe to
return that memory back to the system.
The BUG_ON(active) is thus overkill and can erroneously fire after a
EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Eric Anholt <eric@anholt.net>
Jesse Barnes [Wed, 18 Aug 2010 20:20:54 +0000 (13:20 -0700)]
drm/i915: wait for actual vblank, not just 20ms
Waiting for a hard coded 20ms isn't always enough to make sure a vblank
period has actually occurred, so add code to make sure we really have
passed through a vblank period (or that the pipe is off when disabling).
This prevents problems with mode setting and link training, and seems to
fix a bug like https://bugs.freedesktop.org/show_bug.cgi?id=29278, but
on an HP 8440p instead. Hopefully also fixes
https://bugs.freedesktop.org/show_bug.cgi?id=29141.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Eric Anholt <eric@anholt.net>
Arjan van de Ven [Sat, 21 Aug 2010 20:07:26 +0000 (13:07 -0700)]
workqueue: Add basic tracepoints to track workqueue execution
With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.
This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.
With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):
Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samuel Thibault [Sat, 21 Aug 2010 19:32:41 +0000 (21:32 +0200)]
Replace Configure with Enable in description of MAXSMP
The "Configure" word tends to make user believe they have to say 'yes'
to be able to choose the number of procs/nodes. "Enable" should be
unambiguous enough.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:49:40 +0000 (16:49 -0700)]
mm: make stack guard page logic use vm_prev pointer
Like the mlock() change previously, this makes the stack guard check
code use vma->vm_prev to see what the mapping below the current stack
is, rather than have to look it up with find_vma().
Also, accept an abutting stack segment, since that happens naturally if
you split the stack with mlock or mprotect.
Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 20 Aug 2010 23:24:55 +0000 (16:24 -0700)]
mm: make the vma list be doubly linked
It's a really simple list, and several of the users want to go backwards
in it to find the previous vma. So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: Ian Campbell <ijc@hellion.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tilman Sauerbeck [Fri, 20 Aug 2010 21:01:47 +0000 (14:01 -0700)]
mtd: nand: Fix probe of Samsung NAND chips
Apparently, the check for a 6-byte ID string introduced by commit 426c457a3216fac74e3d44dd39729b0689f4c7ab ("mtd: nand: extend NAND flash
detection to new MLC chips") is NOT sufficient to determine whether or
not a Samsung chip uses their new MLC detection scheme or the old,
standard scheme. This adds a condition to check cell type.
Signed-off-by: Tilman Sauerbeck <tilman@code-monkey.de> Signed-off-by: Brian Norris <norris@broadcom.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org
Stefan Richter [Thu, 19 Aug 2010 21:13:43 +0000 (14:13 -0700)]
Documentation: kernel-locking: mutex_trylock cannot be used in interrupt context
Chapter 6 is right about mutex_trylock, but chapter 10 wasn't. This error
was introduced during semaphore-to-mutex conversion of the Unreliable
guide. :-)
If user context which performs mutex_lock() or mutex_trylock() is
preempted by interrupt context which performs mutex_trylock() on the same
mutex instance, a deadlock occurs. This is because these functions do not
disable local IRQs when they operate on mutex->wait_lock.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 19 Aug 2010 21:13:42 +0000 (14:13 -0700)]
drivers/scsi/qla4xxx: fix build
gcc-4.0.2:
drivers/scsi/qla4xxx/ql4_os.c: In function 'qla4_8xxx_error_recovery':
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2377: sorry, unimplemented: called from here
drivers/scsi/qla4xxx/ql4_glbl.h:135: sorry, unimplemented: inlining failed in call to 'qla4_8xxx_set_drv_active': function body not available
drivers/scsi/qla4xxx/ql4_os.c:2393: sorry, unimplemented: called from here
Cc: Ravi Anand <ravi.anand@qlogic.com> Cc: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 19 Aug 2010 21:13:40 +0000 (14:13 -0700)]
uml: fix compile error in dma_get_cache_alignment()
Fix uml compile error:
include/linux/dma-mapping.h:145: error: redefinition of 'dma_get_cache_alignment'
arch/um/include/asm/dma-mapping.h:99: note: previous definition of 'dma_get_cache_alignment' was here
Introduced by commit 4565f0170dfc ("dma-mapping: unify
dma_get_cache_alignment implementations")
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Jeff Dike <jdike@addtoit.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: __task_cred() need rcu_read_lock()
dump_tasks() needs to hold the RCU read lock around its access of the
target task's UID. To this end it should use task_uid() as it only needs
that one thing from the creds.
The fact that dump_tasks() holds tasklist_lock is insufficient to prevent the
target process replacing its credentials on another CPU.
Then, this patch change to call rcu_read_lock() explicitly.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 19 Aug 2010 21:13:39 +0000 (14:13 -0700)]
oom: fix tasklist_lock leak
Commit 0aad4b3124 ("oom: fold __out_of_memory into out_of_memory")
introduced a tasklist_lock leak. Then it caused following obvious
danger warnings and panic.
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
rsyslogd/1422 is leaving the kernel with locks still held!
1 lock held by rsyslogd/1422:
#0: (tasklist_lock){.+.+.+}, at: [<ffffffff810faf64>] out_of_memory+0x164/0x3f0
BUG: scheduling while atomic: rsyslogd/1422/0x00000002
INFO: lockdep is turned off.
This patch fixes it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kyungmin Park [Thu, 19 Aug 2010 21:13:37 +0000 (14:13 -0700)]
drivers/mmc/host/sdhci-s3c.c: use the correct mutex and card detect function
There's some merge problem between sdhic core and sdhci-s3c host. After
mutex is changed to spinlock. It needs to use use spin lock functions and
use the correct card detection function.
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>