git.karo-electronics.de Git - linux-beck.git/log

workqueue: deprecate __cancel_delayed_work()

Now that cancel_delayed_work() can be safely called from IRQ handlers,
there's no reason to use __cancel_delayed_work(). Use
cancel_delayed_work() instead of __cancel_delayed_work() and mark the
latter deprecated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Roland Dreier <roland@kernel.org>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>

workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()

cancel_delayed_work() can't be called from IRQ handlers due to its use
of del_timer_sync() and can't cancel work items which are already
transferred from timer to worklist.

Also, unlike other flush and cancel functions, a canceled delayed_work
would still point to the last associated cpu_workqueue. If the
workqueue is destroyed afterwards and the work item is re-used on a
different workqueue, the queueing code can oops trying to dereference
already freed cpu_workqueue.

This patch reimplements cancel_delayed_work() using
try_to_grab_pending() and set_work_cpu_and_clear_pending(). This
allows the function to be called from IRQ handlers and makes its
behavior consistent with other flush / cancel functions.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

workqueue: use mod_delayed_work() instead of __cancel + queue

Now that mod_delayed_work() is safe to call from IRQ handlers,
__cancel_delayed_work() followed by queue_delayed_work() can be
replaced with mod_delayed_work().

Most conversions are straight-forward except for the following.

* net/core/link_watch.c: linkwatch_schedule_work() was doing a quite
  elaborate dancing around its delayed_work.  Collapse it such that
  linkwatch_work is queued for immediate execution if LW_URGENT and
  existing timer is kept otherwise.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>

workqueue: use irqsafe timer for delayed_work

Up to now, for delayed_works, try_to_grab_pending() couldn't be used
from IRQ handlers because IRQs may happen while
delayed_work_timer_fn() is in progress leading to indefinite -EAGAIN.

This patch makes delayed_work use the new TIMER_IRQSAFE flag for
delayed_work->timer. This makes try_to_grab_pending() and thus
mod_delayed_work_on() safe to call from IRQ handlers.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: clean up delayed_work initializers and add missing one

Reimplement delayed_work initializers using new timer initializers
which take timer flags. This reduces code duplications and will ease
further initializer changes. This patch also adds a missing
initializer - INIT_DEFERRABLE_WORK_ONSTACK().

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: make deferrable delayed_work initializer names consistent

Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: cosmetic whitespace updates for macro definitions

Consistently use the last tab position for '\' line continuation in
complex macro definitions. This is to help the following patches.

This patch is cosmetic.

Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'timers/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into for-3.7

timer: Implement TIMER_IRQSAFE

Timer internals are protected with irq-safe locks but timer execution
isn't, so a timer being dequeued for execution and its execution
aren't atomic against IRQs.  This makes it impossible to wait for its
completion from IRQ handlers and difficult to shoot down a timer from
IRQ handlers.

This issue caused some issues for delayed_work interface.  Because
there's no way to reliably shoot down delayed_work->timer from IRQ
handlers, __cancel_delayed_work() can't share the logic to steal the
target delayed_work with cancel_delayed_work_sync(), and can only
steal delayed_works which are on queued on timer.  Similarly, the
pending mod_delayed_work() can't be used from IRQ handlers.

This patch adds a new timer flag TIMER_IRQSAFE, which makes the timer
to be executed without enabling IRQ after dequeueing such that its
dequeueing and execution are atomic against IRQ handlers.

This makes it safe to wait for the timer's completion from IRQ
handlers, for example, using del_timer_sync().  It can never be
executing on the local CPU and if executing on other CPUs it won't be
interrupted until done.

This will enable simplifying delayed_work cancel/mod interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-5-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Clean up timer initializers

Over time, timer initializers became messy with unnecessarily
duplicated code which are inconsistently spread across timer.h and
timer.c.

This patch cleans up timer initializers.

* timer.c::__init_timer() is renamed to do_init_timer().

* __TIMER_INITIALIZER() added.  It takes @flags and all initializers
  are wrappers around it.

* init_timer[_on_stack]_key() now take @flags.

* __init_timer[_on_stack]() added.  They take @flags and all init
  macros are wrappers around them.

* __setup_timer[_on_stack]() added.  It uses __init_timer() and takes
  @flags.  All setup macros are wrappers around the two.

Note that this patch doesn't add missing init/setup combinations -
e.g. init_timer_deferrable_on_stack().  Adding missing ones is
trivial.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-4-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Relocate declarations of init_timer_on_stack_key()

init_timer_on_stack_key() is used by init macro definitions. Move
init_timer_on_stack_key() and destroy_timer_on_stack() declarations
above init macro defs. This will make the next init cleanup patch
easier to read.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-3-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

timer: Generalize timer->base flags handling

To prepare for addition of another flag, generalize timer->base flags
handling.

* Rename from TBASE_*_FLAG to TIMER_* and make them LU constants.

* Define and use TIMER_FLAG_MASK for flags masking so that multiple
  flags can be handled correctly.

* Don't dereference timer->base directly even if
  !tbase_get_deferrable().  All two such places are already passed in
  @base, so use it instead.

* Make sure tvec_base's alignment is large enough for timer->base
  flags using BUILD_BUG_ON().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1344449428-24962-2-git-send-email-tj@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

workqueue: deprecate system_nrt[_freezable]_wq

system_nrt[_freezable]_wq are now spurious. Mark them deprecated and
convert all users to system[_freezable]_wq.

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant, so there's no reason to use system_nrt[_freezable]_wq.
Please use system[_freezable]_wq instead.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>

workqueue: deprecate flush[_delayed]_work_sync()

flush[_delayed]_work_sync() are now spurious. Mark them deprecated
and convert all users to flush[_delayed]_work().

If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>

workqueue: gut system_nrt[_freezable]_wq()

Now that all workqueues are non-reentrant, system[_freezable]_wq() are
equivalent to system_nrt[_freezable]_wq(). Replace the latter with
wrappers around system[_freezable]_wq(). The wrapping goes through
inline functions so that __deprecated can be added easily.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: gut flush[_delayed]_work_sync()

Now that all workqueues are non-reentrant, flush[_delayed]_work_sync()
are equivalent to flush[_delayed]_work(). Drop the separate
implementation and make them thin wrappers around
flush[_delayed]_work().

* start_flush_work() no longer takes @wait_executing as the only left
user - flush_work() - always sets it to %true.

* __cancel_work_timer() uses flush_work() instead of wait_on_work().

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: make all workqueues non-reentrant

By default, each per-cpu part of a bound workqueue operates separately
and a work item may be executing concurrently on different CPUs.  The
behavior avoids some cross-cpu traffic but leads to subtle weirdities
and not-so-subtle contortions in the API.

* There's no sane usefulness in allowing a single work item to be
  executed concurrently on multiple CPUs.  People just get the
  behavior unintentionally and get surprised after learning about it.
  Most either explicitly synchronize or use non-reentrant/ordered
  workqueue but this is error-prone.

* flush_work() can't wait for multiple instances of the same work item
  on different CPUs.  If a work item is executing on cpu0 and then
  queued on cpu1, flush_work() can only wait for the one on cpu1.

  Unfortunately, work items can easily cross CPU boundaries
  unintentionally when the queueing thread gets migrated.  This means
  that if multiple queuers compete, flush_work() can't even guarantee
  that the instance queued right before it is finished before
  returning.

* flush_work_sync() was added to work around some of the deficiencies
  of flush_work().  In addition to the usual flushing, it ensures that
  all currently executing instances are finished before returning.
  This operation is expensive as it has to walk all CPUs and at the
  same time fails to address competing queuer case.

  Incorrectly using flush_work() when flush_work_sync() is necessary
  is an easy error to make and can lead to bugs which are difficult to
  reproduce.

* Similar problems exist for flush_delayed_work[_sync]().

Other than the cross-cpu access concern, there's no benefit in
allowing parallel execution and it's plain silly to have this level of
contortion for workqueue which is widely used from core code to
extremely obscure drivers.

This patch makes all workqueues non-reentrant.  If a work item is
executing on a different CPU when queueing is requested, it is always
queued to that CPU.  This guarantees that any given work item can be
executing on one CPU at maximum and if a work item is queued and
executing, both are on the same CPU.

The only behavior change which may affect workqueue users negatively
is that non-reentrancy overrides the affinity specified by
queue_work_on().  On a reentrant workqueue, the affinity specified by
queue_work_on() is always followed.  Now, if the work item is
executing on one of the CPUs, the work item will be queued there
regardless of the requested affinity.  I've reviewed all workqueue
users which request explicit affinity, and, fortunately, none seems to
be crazy enough to exploit parallel execution of the same work item.

This adds an additional busy_hash lookup if the work item was
previously queued on a different CPU.  This shouldn't be noticeable
under any sane workload.  Work item queueing isn't a very
high-frequency operation and they don't jump across CPUs all the time.
In a micro benchmark to exaggerate this difference - measuring the
time it takes for two work items to repeatedly jump between two CPUs a
number (10M) of times with busy_hash table densely populated, the
difference was around 3%.

While the overhead is measureable, it is only visible in pathological
cases and the difference isn't huge.  This change brings much needed
sanity to workqueue and makes its behavior consistent with timer.  I
think this is the right tradeoff to make.

This enables significant simplification of workqueue API.
Simplification patches will follow.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: fix checkpatch issues

Fixed some checkpatch warnings.

tj: adapted to wq/for-3.7 and massaged pr_xxx() format strings a bit.

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
LKML-Reference: <1345326762-21747-1-git-send-email-valentin.ilie@gmail.com>

workqueue: use system_highpri_wq for unbind_work

To speed cpu down processing up, use system_highpri_wq.
As scheduling priority of workers on it is higher than system_wq and
it is not contended by other normal works on this cpu, work on it
is processed faster than system_wq.

tj: CPU up/downs care quite a bit about latency these days. This
shouldn't hurt anything and makes sense.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: use system_highpri_wq for highpri workers in rebind_workers()

In rebind_workers(), we do inserting a work to rebind to cpu for busy workers.
Currently, in this case, we use only system_wq. This makes a possible
error situation as there is mismatch between cwq->pool and worker->pool.

To prevent this, we should use system_highpri_wq for highpri worker
to match theses. This implements it.

tj: Rephrased comment a bit.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: introduce system_highpri_wq

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduce separate worker pool
for HIGHPRI. When we handle busyworkers for gcwq, it can be normal worker
or highpri worker. But, we don't consider this difference in rebind_workers(),
we use just system_wq for highpri worker. It makes mismatch between
cwq->pool and worker->pool.

It doesn't make error in current implementation, but possible in the future.
Now, we introduce system_highpri_wq to use proper cwq for highpri workers
in rebind_workers(). Following patch fix this issue properly.

tj: Even apart from rebinding, having system_highpri_wq generally
makes sense.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: change value of lcpu in __queue_delayed_work_on()

We assign cpu id into work struct's data field in __queue_delayed_work_on().
In current implementation, when work is come in first time,
current running cpu id is assigned.
If we do __queue_delayed_work_on() with CPU A on CPU B,
__queue_work() invoked in delayed_work_timer_fn() go into
the following sub-optimal path in case of WQ_NON_REENTRANT.

gcwq = get_gcwq(cpu);
if (wq->flags & WQ_NON_REENTRANT &&
(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {

Change lcpu to @cpu and rechange lcpu to local cpu if lcpu is WORK_CPU_UNBOUND.
It is sufficient to prevent to go into sub-optimal path.

tj: Slightly rephrased the comment.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: correct req_cpu in trace_workqueue_queue_work()

When we do tracing workqueue_queue_work(), it records requested cpu.
But, if !(@wq->flag & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND,
requested cpu is changed as local cpu.
In case of @wq->flag & WQ_UNBOUND, above change is not occured,
therefore it is reasonable to correct it.

Use temporary local variable for storing requested cpu.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: use enum value to set array size of pools in gcwq

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduce separate worker_pool
for HIGHPRI. Although there is NR_WORKER_POOLS enum value which represent
size of pools, definition of worker_pool in gcwq doesn't use it.
Using it makes code robust and prevent future mistakes.
So change code to use this enum value.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: add missing wmb() in clear_work_data()

Any operation which clears PENDING should be preceded by a wmb to
guarantee that the next PENDING owner sees all the changes made before
PENDING release.

There are only two places where PENDING is cleared -
set_work_cpu_and_clear_pending() and clear_work_data(). The caller of
the former already does smp_wmb() but the latter doesn't have any.

Move the wmb above set_work_cpu_and_clear_pending() into it and add
one to clear_work_data().

There hasn't been any report related to this issue, and, given how
clear_work_data() is used, it is extremely unlikely to have caused any
actual problems on any architecture.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

workqueue: fix CPU binding of flush_delayed_work[_sync]()

delayed_work encodes the workqueue to use and the last CPU in
delayed_work->work.data while it's on timer.  The target CPU is
implicitly recorded as the CPU the timer is queued on and
delayed_work_timer_fn() queues delayed_work->work to the CPU it is
running on.

Unfortunately, this leaves flush_delayed_work[_sync]() no way to find
out which CPU the delayed_work was queued for when they try to
re-queue after killing the timer.  Currently, it chooses the local CPU
flush is running on.  This can unexpectedly move a delayed_work queued
on a specific CPU to another CPU and lead to subtle errors.

There isn't much point in trying to save several bytes in struct
delayed_work, which is already close to a hundred bytes on 64bit with
all debug options turned off.  This patch adds delayed_work->cpu to
remember the CPU it's queued for.

Note that if the timer is migrated during CPU down, the work item
could be queued to the downed global_cwq after this change.  As a
detached global_cwq behaves like an unbound one, this doesn't change
much for the delayed_work.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

workqueue: use mod_delayed_work() instead of cancel + queue

Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straight-forward.  Ones worth mentioning are,

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
  use mod_delayed_work() and cancel loop in
  edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
  watchdog is active or not.  @fan_watchdog_active and related code
  dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
  delayed_work_pending() abuse going on here.
  [delayed_]work_pending() are unsynchronized and racy when used like
  this.  I converted one instance in fullbatt_handler().  Please
  conver the rest so that it invokes workqueue APIs for the intended
  target state rather than trying to game work item pending state
  transitions.  e.g. if timer should be modified - call
  mod_delayed_work(), canceled - call cancel_delayed_work[_sync]().

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
  simplified.  Note that round_jiffies() calls in this function are
  meaningless.  round_jiffies() work on absolute jiffies not delta
  delay used by delayed_work.

v2: Tomi pointed out that __cancel_delayed_work() users can't be
    safely converted to mod_delayed_work().  They could be calling it
    from irq context and if that happens while delayed_work_timer_fn()
    is running, it could deadlock.  __cancel_delayed_work() users are
    dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>

workqueue: implement mod_delayed_work[_on]()

Workqueue was lacking a mechanism to modify the timeout of an already
pending delayed_work.  delayed_work users have been working around
this using several methods - using an explicit timer + work item,
messing directly with delayed_work->timer, and canceling before
re-queueing, all of which are error-prone and/or ugly.

This patch implements mod_delayed_work[_on]() which behaves similarly
to mod_timer() - if the delayed_work is idle, it's queued with the
given delay; otherwise, its timeout is modified to the new value.
Zero @delay guarantees immediate execution.

v2: Updated to reflect try_to_grab_pending() changes.  Now safe to be
    called from bh context.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>

workqueue: mark a work item being canceled as such

There can be two reasons try_to_grab_pending() can fail with -EAGAIN.
One is when someone else is queueing or deqeueing the work item.  With
the previous patches, it is guaranteed that PENDING and queued state
will soon agree making it safe to busy-retry in this case.

The other is if multiple __cancel_work_timer() invocations are racing
one another.  __cancel_work_timer() grabs PENDING and then waits for
running instances of the target work item on all CPUs while holding
PENDING and !queued.  try_to_grab_pending() invoked from another task
will keep returning -EAGAIN while the current owner is waiting.

Not distinguishing the two cases is okay because __cancel_work_timer()
is the only user of try_to_grab_pending() and it invokes
wait_on_work() whenever grabbing fails.  For the first case, busy
looping should be fine but wait_on_work() doesn't cause any critical
problem.  For the latter case, the new contender usually waits for the
same condition as the current owner, so no unnecessarily extended
busy-looping happens.  Combined, these make __cancel_work_timer()
technically correct even without irq protection while grabbing PENDING
or distinguishing the two different cases.

While the current code is technically correct, not distinguishing the
two cases makes it difficult to use try_to_grab_pending() for other
purposes than canceling because it's impossible to tell whether it's
safe to busy-retry grabbing.

This patch adds a mechanism to mark a work item being canceled.
try_to_grab_pending() now disables irq on success and returns -EAGAIN
to indicate that grabbing failed but PENDING and queued states are
gonna agree soon and it's safe to busy-loop.  It returns -ENOENT if
the work item is being canceled and it may stay PENDING && !queued for
arbitrary amount of time.

__cancel_work_timer() is modified to mark the work canceling with
WORK_OFFQ_CANCELING after grabbing PENDING, thus making
try_to_grab_pending() fail with -ENOENT instead of -EAGAIN.  Also, it
invokes wait_on_work() iff grabbing failed with -ENOENT.  This isn't
necessary for correctness but makes it consistent with other future
users of try_to_grab_pending().

v2: try_to_grab_pending() was testing preempt_count() to ensure that
    the caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Updated so that try_to_grab_pending() disables irq on success
    rather than requiring preemption disabled by the caller.  This
    makes busy-looping easier and will allow try_to_grap_pending() to
    be used from bh/irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>

workqueue: reorganize try_to_grab_pending() and __cancel_timer_work()

* Use bool @is_dwork instead of @timer and let try_to_grab_pending()
  use to_delayed_work() to determine the delayed_work address.

* Move timer handling from __cancel_work_timer() to
  try_to_grab_pending().

* Make try_to_grab_pending() use -EAGAIN instead of -1 for
  busy-looping and drop the ret local variable.

* Add proper function comment to try_to_grab_pending().

This makes the code a bit easier to understand and will ease further
changes.  This patch doesn't make any functional change.

v2: Use @is_dwork instead of @timer.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: factor out __queue_delayed_work() from queue_delayed_work_on()

This is to prepare for mod_delayed_work[_on]() and doesn't cause any
functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: introduce WORK_OFFQ_FLAG_*

Low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain
WORK_STRUCT_FLAG_* and flush color.  If the work item is queued, the
rest point to the cpu_workqueue with WORK_STRUCT_CWQ set; otherwise,
WORK_STRUCT_CWQ is clear and the bits contain the last CPU number -
either a real CPU number or one of WORK_CPU_*.

Scheduled addition of mod_delayed_work[_on]() requires an additional
flag, which is used only while a work item is off queue.  There are
more than enough bits to represent off-queue CPU number on both 32 and
64bits.  This patch introduces WORK_OFFQ_FLAG_* which occupy the lower
part of the @work->data high bits while off queue.  This patch doesn't
define any actual OFFQ flag yet.

Off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which adds
the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_SHIFT, to
make room for OFFQ flags.

To avoid shift width warning with large WORK_OFFQ_FLAG_BITS, ulong
cast is added to WORK_STRUCT_NO_CPU and, just in case, BUILD_BUG_ON()
to check that there are enough bits to accomodate off-queue CPU number
is added.

This patch doesn't make any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: move try_to_grab_pending() upwards

try_to_grab_pending() will be used by to-be-implemented
mod_delayed_work[_on](). Move try_to_grab_pending() and related
functions above queueing functions.

This patch only moves functions around.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: fix zero @delay handling of queue_delayed_work_on()

If @delay is zero and the dealyed_work is idle, queue_delayed_work()
queues it for immediate execution; however, queue_delayed_work_on()
lacks this logic and always goes through timer regardless of @delay.

This patch moves 0 @delay handling logic from queue_delayed_work() to
queue_delayed_work_on() so that both functions behave the same.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: unify local CPU queueing handling

Queueing functions have been using different methods to determine the
local CPU.

* queue_work() superflously uses get/put_cpu() to acquire and hold the
  local CPU across queue_work_on().

* delayed_work_timer_fn() uses smp_processor_id().

* queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu
  which is interpreted as the local CPU.

* flush_delayed_work[_sync]() were using raw_smp_processor_id().

* __queue_work() interprets %WORK_CPU_UNBOUND as local CPU if the
  target workqueue is bound one but nobody uses this.

This patch converts all functions to uniformly use %WORK_CPU_UNBOUND
to indicate local CPU and use the local binding feature of
__queue_work().  unlikely() is dropped from %WORK_CPU_UNBOUND handling
in __queue_work().

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: set delayed_work->timer function on initialization

delayed_work->timer.function is currently initialized during
queue_delayed_work_on(). Export delayed_work_timer_fn() and set
delayed_work timer function during delayed_work initialization
together with other fields.

This ensures the timer function is always valid on an initialized
delayed_work. This is to help mod_delayed_work() implementation.

To detect delayed_work users which diddle with the internal timer,
trigger WARN if timer function doesn't match on queue.

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: disable irq while manipulating PENDING

Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access
to the target work item.  They first try to claim the bit and proceed
with queueing only after that succeeds and there's a window between
PENDING being set and the actual queueing where the task can be
interrupted or preempted.

There's also a similar window in process_one_work() when clearing
PENDING.  A work item is dequeued, gcwq->lock is released and then
PENDING is cleared and the worker might get interrupted or preempted
between releasing gcwq->lock and clearing PENDING.

cancel[_delayed]_work_sync() tries to claim or steal PENDING.  The
function assumes that a work item with PENDING is either queued or in
the process of being [de]queued.  In the latter case, it busy-loops
until either the work item loses PENDING or is queued.  If canceling
coincides with the above described interrupts or preemptions, the
canceling task will busy-loop while the queueing or executing task is
preempted.

This patch keeps irq disabled across claiming PENDING and actual
queueing and moves PENDING clearing in process_one_work() inside
gcwq->lock so that busy looping from PENDING && !queued doesn't wait
for interrupted/preempted tasks.  Note that, in process_one_work(),
setting last CPU and clearing PENDING got merged into single
operation.

This removes possible long busy-loops and will allow using
try_to_grab_pending() from bh and irq contexts.

v2: __queue_work() was testing preempt_count() to ensure that the
    caller has disabled preemption.  This triggers spuriously if
    !CONFIG_PREEMPT_COUNT.  Use preemptible() instead.  Reported by
    Fengguang Wu.

v3: Disable irq instead of preemption.  IRQ will be disabled while
    grabbing gcwq->lock later anyway and this allows using
    try_to_grab_pending() from bh and irq contexts.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>

workqueue: add missing smp_wmb() in process_one_work()

WORK_STRUCT_PENDING is used to claim ownership of a work item and
process_one_work() releases it before starting execution. When
someone else grabs PENDING, all pre-release updates to the work item
should be visible and all updates made by the new owner should happen
afterwards.

Grabbing PENDING uses test_and_set_bit() and thus has a full barrier;
however, clearing doesn't have a matching wmb. Given the preceding
spin_unlock and use of clear_bit, I don't believe this can be a
problem on an actual machine and there hasn't been any related report
but it still is theretically possible for clear_pending to permeate
upwards and happen before work->entry update.

Add an explicit smp_wmb() before work_clear_pending().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org

workqueue: make queueing functions return bool

All queueing functions return 1 on success, 0 if the work item was
already pending. Update them to return bool instead. This signifies
better that they don't return 0 / -errno.

This is cleanup and doesn't cause any functional difference.

While at it, fix comment opening for schedule_work_on().

Signed-off-by: Tejun Heo <tj@kernel.org>

workqueue: reorder queueing functions so that _on() variants are on top

Currently, queue/schedule[_delayed]_work_on() are located below the
counterpart without the _on postifx even though the latter is usually
implemented using the former. Swap them.

This is cleanup and doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>

Linux 3.6-rc1

Merge branch 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc

Pull OLPC platform updates from Andres Salomon:
"These move the OLPC Embedded Controller driver out of
  arch/x86/platform and into drivers/platform/olpc.

  OLPC machines are now ARM-based (which means lots of x86 and ARM
  changes), but are typically pretty self-contained..  so it makes more
  sense to go through a separate OLPC tree after getting the appropriate
  review/ACKs."

* 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc:
  x86: OLPC: move s/r-related EC cmds to EC driver
  Platform: OLPC: move global variables into priv struct
  Platform: OLPC: move debugfs support from x86 EC driver
  x86: OLPC: switch over to using new EC driver on x86
  Platform: OLPC: add a suspended flag to the EC driver
  Platform: OLPC: turn EC driver into a platform_driver
  Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it
  drivers: OLPC: update various drivers to include olpc-ec.h
  Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver

Merge tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc Marvell Orion device-tree updates from Olof Johansson:
"This contains a set of device-tree conversions for Marvell Orion
  platforms that were staged early but took a few tries to get the
  branch into a format where it was suitable for us to pick up.

  Given that most people working on these platforms are hobbyists with
  limited time, we were a bit more flexible with merging it even though
  it came in late."

* tag 'dt2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (21 commits)
  ARM: Kirkwood: Replace mrvl with marvell
  ARM: Kirkwood: Describe GoFlex Net LEDs and SATA in DT.
  ARM: Kirkwood: Describe Dreamplug LEDs in DT.
  ARM: Kirkwood: Describe iConnects LEDs in DT.
  ARM: Kirkwood: Describe iConnects temperature sensor in DT.
  ARM: Kirkwood: Describe IB62x0 LEDs in DT.
  ARM: Kirkwood: Describe IB62x0 gpio-keys in DT.
  ARM: Kirkwood: Describe DNS32? gpio-keys in DT.
  ARM: Kirkwood: Move common portions into a kirkwood-dnskw.dtsi
  ARM: Kirkwood: Replace DNS-320/DNS-325 leds with dt bindings
  ARM: Kirkwood: Describe DNS325 temperature sensor in DT.
  ARM: Kirkwood: Use DT to configure SATA device.
  ARM: kirkwood: use devicetree for SPI on dreamplug
  ARM: kirkwood: Add LS-XHL and LS-CHLv2 support
  ARM: Kirkwood: Initial DTS support for Kirkwood GoFlex Net
  ARM: Kirkwood: Add basic device tree support for QNAP TS219.
  ATA: sata_mv: Add device tree support
  ARM: Orion: DTify the watchdog timer.
  ARM: Orion: Add arch support needed for I2C via DT.
  ARM: kirkwood: use devicetree for orion-spi
  ...

Conflicts:
drivers/watchdog/orion_wdt.c

Merge tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull arm-soc cpuidle enablement for OMAP from Olof Johansson:
"Coupled cpuidle was meant to merge for 3.5 through Len Brown's tree,
  but didn't go in because the pull request ended up rejected.  So it
  just got merged, and we got this staged branch that enables the
  coupled cpuidle code on OMAP.

  With a stable git workflow from the other maintainer we could have
  staged this earlier, but that wasn't the case so we have had to merge
  it late.

  The alternative is to hold it off until 3.7 but given that the code is
  well-isolated to OMAP and they are eager to see it go in, I didn't
  push back hard in that direction."

* tag 'pm2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: OMAP4: CPUidle: Open broadcast clock-event device.
  ARM: OMAP4: CPUidle: add synchronization for coupled idle states
  ARM: OMAP4: CPUidle: Use coupled cpuidle states to implement SMP cpuidle.
  ARM: OMAP: timer: allow gp timer clock-event to be used on both cpus

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"A few fixes for merge window fallout, and a bugfix for timer resume on
  PRIMA2."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: mmp: add missing irqs.h
  arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs
  ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes
  ARM: mxc: Include missing irqs.h header

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (24 commits)
  sh: explicitly include sh_dma.h in setup-sh7722.c
  sh: ecovec: care CN5 VBUS if USB host mode
  sh: sh7724: fixup renesas_usbhs clock settings
  sh: intc: initial irqdomain support.
  sh: pfc: Fix up init ordering mess.
  serial: sh-sci: fix compilation breakage, when DMA is enabled
  dmaengine: shdma: restore partial transfer calculation
  sh: modify the sh_dmae_slave_config for RSPI in setup-sh7757
  sh: Fix up recursive fault in oops with unset TTB.
  sh: pfc: Build fix for pinctrl_remove_gpio_range() changes.
  sh: select the fixed regulator driver on several boards
  sh: ecovec: switch MMC power control to regulators
  sh: add fixed voltage regulators to se7724
  sh: add fixed voltage regulators to sdk7786
  sh: add fixed voltage regulators to rsk
  sh: add fixed voltage regulators to migor
  sh: add fixed voltage regulators to kfr2r09
  sh: add fixed voltage regulators to ap325rxa
  sh: add fixed voltage regulators to sh7757lcr
  sh: add fixed voltage regulators to sh2007
  ...

Merge tag 'md-3.6' of git://neil.brown.name/md

Pull additional md update from NeilBrown:
"This contains a few patches that depend on plugging changes in the
  block layer so needed to wait for those.

  It also contains a Kconfig fix for the new RAID10 support in dm-raid."

* tag 'md-3.6' of git://neil.brown.name/md:
  md/dm-raid: DM_RAID should select MD_RAID10
  md/raid1: submit IO from originating thread instead of md thread.
  raid5: raid5d handle stripe in batch way
  raid5: make_request use batch stripe release

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull two ceph fixes from Sage Weil:
"The first patch fixes up the old crufty open intent code to use the
  atomic_open stuff properly, and the second fixes a possible null deref
  and memory leak with the crypto keys."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix crypto key null deref, memory leak
  ceph: simplify+fix atomic_open

Merge tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
- Fixes a bug when the lower filesystem mount options include 'acl',
   but the eCryptfs mount options do not
- Cleanups in the messaging code
- Better handling of empty files in the lower filesystem to improve
   usability.  Failed file creations are now cleaned up and empty lower
   files are converted into eCryptfs during open().
- The write-through cache changes are being reverted due to bugs that
   are not easy to fix.  Stability outweighs the performance
   enhancements here.
- Improvement to the mount code to catch unsupported ciphers specified
   in the mount options

* tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  eCryptfs: check for eCryptfs cipher support at mount
  eCryptfs: Revert to a writethrough cache model
  eCryptfs: Initialize empty lower files when opening them
  eCryptfs: Unlink lower inode when ecryptfs_create() fails
  eCryptfs: Make all miscdev functions use daemon ptr in file private_data
  eCryptfs: Remove unused messaging declarations and function
  eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS update from Steve French:
"Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs.

  I am holding up a few more days on merging the remainder of the
  SMB2/SMB2.1 enablement although it is nearing review completion, in
  order to address some review comments from Jeff Layton on a few of the
  subsequent SMB2 patches, and also to debug an unrelated cifs problem
  that Pavel discovered."

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Add SMB2 support for rmdir
  CIFS: Move rmdir code to ops struct
  CIFS: Add SMB2 support for mkdir operation
  CIFS: Separate protocol specific part from mkdir
  CIFS: Simplify cifs_mkdir call

mm: remove node_start_pfn checking in new WARN_ON for now

Borislav Petkov reports that the new warning added in commit
88fdf75d1bb5 ("mm: warn if pg_data_t isn't initialized with zero")
triggers for him, and it is the node_start_pfn field that has already
been initialized once.

The call trace looks like this:

  x86_64_start_kernel ->
    x86_64_start_reservations ->
    start_kernel ->
    setup_arch ->
    paging_init ->
    zone_sizes_init ->
    free_area_init_nodes ->
    free_area_init_node

and (with the warning replaced by debug output), Borislav sees

  On node 0 totalpages: 4193848
    DMA zone: 64 pages used for memmap
    DMA zone: 6 pages reserved
    DMA zone: 3890 pages, LIFO batch:0
    DMA32 zone: 16320 pages used for memmap
    DMA32 zone: 798464 pages, LIFO batch:31
    Normal zone: 52736 pages used for memmap
    Normal zone: 3322368 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 4423680      <----
  On node 1 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 8617984      <----
  On node 2 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31
  free_area_init_node: pgdat->node_start_pfn: 12812288     <----
  On node 3 totalpages: 4194304
    Normal zone: 65536 pages used for memmap
    Normal zone: 4128768 pages, LIFO batch:31

so remove the bogus warning for now to avoid annoying people.  Minchan
Kim is looking at it.

Reported-by: Borislav Petkov <bp@amd64.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ARM: mmp: add missing irqs.h

arch/arm/mach-mmp/gplugd.c:195:13: error: ‘MMP_NR_IRQS’ undeclared here
(not in a function)
make[1]: *** [arch/arm/mach-mmp/gplugd.o] Error 1

Include <mach/irqs.h> to fix this issue.

Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

arm: mvebu: fix typo in .dtsi comment for Armada XP SoCs

The comment was wrongly referring to Armada 370 while the file is
related to Armada XP.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

ARM: PRIMA2: delete redundant codes to restore LATCHED when timer resumes

The only way to write LATCHED registers to write LATCH_BIT to LATCH register,
that will latch COUNTER into LATCHED.e.g.
writel_relaxed(SIRFSOC_TIMER_LATCH_BIT, sirfsoc_timer_base +
SIRFSOC_TIMER_LATCH);

Writing values to LATCHED registers directly is useless at all.

Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

libceph: fix crypto key null deref, memory leak

Avoid crashing if the crypto key payload was NULL, as when it was not correctly
allocated and initialized. Also, avoid leaking it.

Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>

ceph: simplify+fix atomic_open

The initial ->atomic_open op was carried over from the old intent code,
which was incomplete and didn't really work.  Replace it with a fresh
method.  In particular:

* always attempt to do an atomic open+lookup, both for the create case
   and for lookups of existing files.
* fix symlink handling by returning 1 to the VFS so that we can follow
   the link to its destination. This fixes a longstanding ceph bug (#2392).

Signed-off-by: Sage Weil <sage@inktank.com>

sh: explicitly include sh_dma.h in setup-sh7722.c

setup-sh7722.c defines several objects, whose types are defined in
sh_dma.h, so, it has to be included explicitly.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS updates from Ralf Baechle:
"The lion share of this pull request are fixes for clk-related breakage
  caused by other changes during this merge window.  For some platforms
  the fix was as simple as selecting HAVE_CLK, for others like the
  Loongson 2 significant restructuring was required.

  The remainder are changes required to get the Lantiq code to work
  again."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: Loongson 2: Sort out clock managment.
  MIPS: Loongson 1: more clk support and add select HAVE_CLK
  MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK
  MIPS: Lantiq: Platform specific CLK fixup
  MIPS: Lantiq: Add device_tree_init function
  MIPS: Lantiq: Fix interface clock and PCI control register offset

Merge branch 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml

Pull UML fixes from Richard Weinberger:
"This patch set contains mostly fixes and cleanups.  The UML tty driver
  uses now tty_port and is no longer broken like hell  :-)"

* 'for-linus-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  um: Add arch/x86/um to MAINTAINERS
  um: pass siginfo to guest process
  um: fix ubd_file_size for read-only files
  um: pull interrupt_end() into userspace()
  um: split syscall_trace(), pass pt_regs to it
  um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs
  um: set BLK_CGROUP=y in defconfig
  um: remove count_lock
  um: fully use tty_port
  um: Remove dead code
  um: remove line_ioctl()
  TTY: um/line, use tty from tty_port
  TTY: um/line, add tty_port

Merge branch 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM DMA engine updates from Russell King:
"This looks scary at first glance, but what it is is:
   - a rework of the sa11x0 DMA engine driver merged during the previous
     cycle, to extract a common set of helper functions for DMA engine
     implementations.
   - conversion of amba-pl08x.c to use these helper functions.
   - addition of OMAP DMA engine driver (using these helper functions),
     and conversion of some of the OMAP DMA users to use DMA engine.

  Nothing in the helper functions is ARM specific, so I hope that other
  implementations can consolidate some of their code by making use of
  these helpers.

  This has been sitting in linux-next most of the merge cycle, and has
  been tested by several OMAP folk.  I've tested it on sa11x0 platforms,
  and given it my best shot on my broken platforms which have the
  amba-pl08x controller.

  The last point is the addition to feature-removal-schedule.txt, which
  will have a merge conflict.  Between myself and TI, we're planning to
  remove the old TI DMA implementation next year."

Fix up trivial add/add conflicts in Documentation/feature-removal-schedule.txt
and drivers/dma/{Kconfig,Makefile}

* 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm: (53 commits)
  ARM: 7481/1: OMAP2+: omap2plus_defconfig: enable OMAP DMA engine
  ARM: 7464/1: mmc: omap_hsmmc: ensure probe returns error if DMA channel request fails
  Add feature removal of old OMAP private DMA implementation
  mtd: omap2: remove private DMA API implementation
  mtd: omap2: add DMA engine support
  spi: omap2-mcspi: remove private DMA API implementation
  spi: omap2-mcspi: add DMA engine support
  ARM: omap: remove mmc platform data dma_mask and initialization
  mmc: omap: remove private DMA API implementation
  mmc: omap: add DMA engine support
  mmc: omap_hsmmc: remove private DMA API implementation
  mmc: omap_hsmmc: add DMA engine support
  dmaengine: omap: add support for cyclic DMA
  dmaengine: omap: add support for setting fi
  dmaengine: omap: add support for returning residue in tx_state method
  dmaengine: add OMAP DMA engine driver
  dmaengine: sa11x0-dma: add cyclic DMA support
  dmaengine: sa11x0-dma: fix DMA residue support
  dmaengine: PL08x: ensure all descriptors are freed when channel is released
  dmaengine: PL08x: get rid of write only pool_ctr and free_txd locking
  ...

Merge branch 'audit' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM audit/signal updates from Russell King:
"ARM audit/signal handling updates from Al and Will.  This improves on
  the work Viro did last merge window, and sorts out some of the issues
  found with that work."

* 'audit' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7475/1: sys_trace: allow all syscall arguments to be updated via ptrace
  ARM: 7474/1: get rid of TIF_SYSCALL_RESTARTSYS
  ARM: 7473/1: deal with handlerless restarts without leaving the kernel
  ARM: 7472/1: pull all work_pending logics into C function
  ARM: 7471/1: Revert "7442/1: Revert "remove unused restart trampoline""
  ARM: 7470/1: Revert "7443/1: Revert "new way of handling ERESTART_RESTARTBLOCK""

Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
"This fixes various issues found during July"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7479/1: mm: avoid NULL dereference when flushing gate_vma with VIVT caches
  ARM: Fix undefined instruction exception handling
  ARM: 7480/1: only call smp_send_stop() on SMP
  ARM: 7478/1: errata: extend workaround for erratum #720789
  ARM: 7477/1: vfp: Always save VFP state in vfp_pm_suspend on UP
  ARM: 7476/1: vfp: only clear vfp state for current cpu in vfp_pm_suspend
  ARM: 7468/1: ftrace: Trace function entry before updating index
  ARM: 7467/1: mutex: use generic xchg-based implementation for ARMv6+
  ARM: 7466/1: disable interrupt before spinning endlessly
  ARM: 7465/1: Handle >4GB memory sizes in device tree and mem=size@start option

um: Add arch/x86/um to MAINTAINERS

Signed-off-by: Richard Weinberger <richard@nod.at>

um: pass siginfo to guest process

UML guest processes now get correct siginfo_t for SIGTRAP, SIGFPE,
SIGILL and SIGBUS. Specifically, si_addr and si_code are now correct
where previously they were si_addr = NULL and si_code = 128.

Signed-off-by: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

um: fix ubd_file_size for read-only files

Made ubd_file_size not request write access. Fixes use of read-only images.

Signed-off-by: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

md/dm-raid: DM_RAID should select MD_RAID10

Now that DM_RAID supports raid10, it needs to select that code
to ensure it is included.

Cc: Jonathan Brassow <jbrassow@redhat.com>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid1: submit IO from originating thread instead of md thread.

queuing writes to the md thread means that all requests go through the
one processor which may not be able to keep up with very high request
rates.

So use the plugging infrastructure to submit all requests on unplug.
If a 'schedule' is needed, we fall back on the old approach of handing
the requests to the thread for it to handle.

Signed-off-by: NeilBrown <neilb@suse.de>

raid5: raid5d handle stripe in batch way

Let raid5d handle stripe in batch way to reduce conf->device_lock locking.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

raid5: make_request use batch stripe release

make_request() does stripe release for every stripe and the stripe usually has
count 1, which makes previous release_stripe() optimization not work. In my
test, this release_stripe() becomes the heaviest pleace to take
conf->device_lock after previous patches applied.

Below patch makes stripe release batch. All the stripes will be released in
unplug. The STRIPE_ON_UNPLUG_LIST bit is to protect concurrent access stripe
lru.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: NeilBrown <neilb@suse.de>

um: pull interrupt_end() into userspace()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>

um: split syscall_trace(), pass pt_regs to it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[richard@nod.at: Fixed some minor build issues]
Signed-off-by: Richard Weinberger <richard@nod.at>

um: switch UPT_SET_RETURN_VALUE and regs_return_value to pt_regs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>

Merge tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6

Pull fbdev updates from Florian Tobias Schandinat:
- large updates for OMAP
   - support for LCD3 overlay manager (omap5)
   - omapdss output cleanup
   - removal of passive matrix LCD support as there are no drivers for
     such panels for DSS or DSS2 and nobody complained (cleanup)
- large updates for SH Mobile
   - overlay support
   - separating MERAM (cache) from framebuffer driver
- some updates for Exynos and da8xx-fb
- various other small patches

* tag 'fbdev-updates-for-3.6' of git://github.com/schandinat/linux-2.6: (78 commits)
  da8xx-fb: fix compile issue due to missing include
  fbdev: Make pixel_to_pat() failure mode more friendly
  da8xx-fb: do not turn ON LCD backlight unless LCDC is enabled
  fbdev: sh_mobile_lcdc: Fix vertical panning step
  video: exynos mipi dsi: Fix mipi dsi regulators handling issue
  video: da8xx-fb: do clock reset of revision 2 LCDC before enabling
  arm: da850: configure LCDC fifo threshold
  video: da8xx-fb: configure FIFO threshold to reduce underflow errors
  video: da8xx-fb: fix flicker due to 1 frame delay in updated frame
  video: da8xx-fb rev2: fix disabling of palette completion interrupt
  da8xx-fb: add missing FB_BLANK operations
  video: exynos_dp: use usleep_range instead of delay
  video: exynos_dp: check the only INTERLANE_ALIGN_DONE bit during Link Training
  fb: epson1355fb: Fix section mismatch
  video: exynos_dp: fix wrong DPCD address during Link Training
  video/smscufx: fix line counting in fb_write
  aty128fb: Fix coding style issues
  fbdev: sh_mobile_lcdc: Fix pan offset computation in YUV mode
  fbdev: sh_mobile_lcdc: Fix overlay registers update during pan operation
  fbdev: sh_mobile_lcdc: Support horizontal panning
  ...

Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes that have been found recently.  Most of
  the commits are regression fixes in HD-audio and some other random
  drivers."

* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: snd-usb: fix clock source validity index
  ALSA: hda - Fix mute-LED GPIO initialization for IDT codecs
  ALSA: hda - Add descriptions for missing IDT 92HD83x models
  ALSA: hda - Fix polarity of mute LED on HP Mini 210
  ALSA: es1688 - freeup resources on init failure
  ALSA: hda - Workaround for silent output on VAIO Z with ALC889
  ALSA: hda - Fix WARNING from HDMI/DP parser
  ALSA: hda - Detach from converter at closing in patch_hdmi.c
  ALSA: hda - Fix mute-LED GPIO setup for HP Mini 210
  ALSA: mpu401: Fix missing initialization of irq field
  ALSA: hda - Fix invalid D3 of headphone DAC on VT202x codecs

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...

MIPS: Loongson 2: Sort out clock managment.

For unexplainable reasons the Loongson 2 clock API was implemented in a
module so fixing this involved shifting large amounts of code around.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Merge branch 'for-3.6/drivers' of git://git.kernel.dk/linux-block

Pull block driver changes from Jens Axboe:

- Making the plugging support for drivers a bit more sane from Neil.
   This supersedes the plugging change from Shaohua as well.

- The usual round of drbd updates.

- Using a tail add instead of a head add in the request completion for
   ndb, making us find the most completed request more quickly.

- A few floppy changes, getting rid of a duplicated flag and also
   running the floppy init async (since it takes forever in boot terms)
   from Andi.

* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
  floppy: remove duplicated flag FD_RAW_NEED_DISK
  blk: pass from_schedule to non-request unplug functions.
  block: stack unplug
  blk: centralize non-request unplug handling.
  md: remove plug_cnt feature of plugging.
  block/nbd: micro-optimization in nbd request completion
  drbd: announce FLUSH/FUA capability to upper layers
  drbd: fix max_bio_size to be unsigned
  drbd: flush drbd work queue before invalidate/invalidate remote
  drbd: fix potential access after free
  drbd: call local-io-error handler early
  drbd: do not reset rs_pending_cnt too early
  drbd: reset congestion information before reporting it in /proc/drbd
  drbd: report congestion if we are waiting for some userland callback
  drbd: differentiate between normal and forced detach
  drbd: cleanup, remove two unused global flags
  floppy: Run floppy initialization asynchronous

Merge branch 'for-3.6/core' of git://git.kernel.dk/linux-block

Pull core block IO bits from Jens Axboe:
"The most complicated part if this is the request allocation rework by
  Tejun, which has been queued up for a long time and has been in
  for-next ditto as well.

  There are a few commits from yesterday and today, mostly trivial and
  obvious fixes.  So I'm pretty confident that it is sound.  It's also
  smaller than usual."

* 'for-3.6/core' of git://git.kernel.dk/linux-block:
  block: remove dead func declaration
  block: add partition resize function to blkpg ioctl
  block: uninitialized ioc->nr_tasks triggers WARN_ON
  block: do not artificially constrain max_sectors for stacking drivers
  blkcg: implement per-blkg request allocation
  block: prepare for multiple request_lists
  block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
  blkcg: inline bio_blkcg() and friends
  block: allocate io_context upfront
  block: refactor get_request[_wait]()
  block: drop custom queue draining used by scsi_transport_{iscsi|fc}
  mempool: add @gfp_mask to mempool_create_node()
  blkcg: make root blkcg allocation use %GFP_KERNEL
  blkcg: __blkg_lookup_create() doesn't need radix preload

Merge branch 'for-next' of git://neil.brown.name/md

Pull md updates from NeilBrown.

* 'for-next' of git://neil.brown.name/md:
  DM RAID: Add support for MD RAID10
  md/RAID1: Add missing case for attempting to repair known bad blocks.
  md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE.
  md/raid1: don't abort a resync on the first badblock.
  md: remove duplicated test on ->openers when calling do_md_stop()
  raid5: Add R5_ReadNoMerge flag which prevent bio from merging at block layer
  md/raid1: prevent merging too large request
  md/raid1: read balance chooses idlest disk for SSD
  md/raid1: make sequential read detection per disk based
  MD RAID10: Export md_raid10_congested
  MD: Move macros from raid1*.h to raid1*.c
  MD RAID1: rename mirror_info structure
  MD RAID10: rename mirror_info structure
  MD RAID10: Fix compiler warning.
  raid5: add a per-stripe lock
  raid5: remove unnecessary bitmap write optimization
  raid5: lockless access raid5 overrided bi_phys_segments
  raid5: reduce chance release_stripe() taking device_lock

locks: remove unused lm_release_private

In commit 3b6e2723f32d ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MIPS: Loongson 1: more clk support and add select HAVE_CLK

This fixes a redefinition of clk_*:

arch/mips/loongson1/common/clock.c:23:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/loongson1/common/clock.c:41:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
make[3]: *** [arch/mips/loongson1/common/clock.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Acked-by: Kelvin Cheung <keguang.zhang@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/4143/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: txx9: Fix redefinition of clk_* by adding select HAVE_CLK

arch/mips/txx9/generic/setup.c:87:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/txx9/generic/setup.c:97:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/txx9/generic/setup.c:103:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/txx9/generic/setup.c:108:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/txx9/generic/setup.c:114:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[3]: *** [arch/mips/txx9/generic/setup.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4142/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: BCM63xx: Fix redefinition of clk_* by adding select HAVE_CLK

arch/mips/bcm63xx/clk.c:249:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/bcm63xx/clk.c:259:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/bcm63xx/clk.c:268:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/bcm63xx/clk.c:275:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/bcm63xx/clk.c:302:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[2]: *** [arch/mips/bcm63xx/clk.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4141/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: AR7: Fix redefinition of clk_* by adding select HAVE_CLK

arch/mips/ar7/clock.c:420:5: error: redefinition of 'clk_enable'
include/linux/clk.h:295:19: note: previous definition of 'clk_enable' was here
arch/mips/ar7/clock.c:426:6: error: redefinition of 'clk_disable'
include/linux/clk.h:300:20: note: previous definition of 'clk_disable' was here
arch/mips/ar7/clock.c:431:15: error: redefinition of 'clk_get_rate'
include/linux/clk.h:302:29: note: previous definition of 'clk_get_rate' was here
arch/mips/ar7/clock.c:437:13: error: redefinition of 'clk_get'
include/linux/clk.h:281:27: note: previous definition of 'clk_get' was here
arch/mips/ar7/clock.c:454:6: error: redefinition of 'clk_put'
include/linux/clk.h:291:20: note: previous definition of 'clk_put' was here
make[2]: *** [arch/mips/ar7/clock.o] Error 1

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Cc: linux-mips@linux-mips.org
Reviewed-by: John Crispin <blogic@openwrt.org>
Acked-by: Florian Fainelli <florian@openwrt.org>
Patchwork: https://patchwork.linux-mips.org/patch/4140/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Lantiq: Platform specific CLK fixup

As we use CLKDEV_LOOKUP but dont have support for COMMON_CLK yet, we need to
provide our own version of of_clk_get_from_provider().

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4117/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Lantiq: Add device_tree_init function

Add a lantiq specific version of device_tree_init. The generic MIPS version
was removed by.

commit 594e966bc412d64eec9282d28ce511bdd62fea39
Author: David Daney <david.daney@cavium.com>
Date: Thu Jul 5 18:12:38 2012 +0200

MIPS: Prune some target specific code out of prom.c

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4116/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Lantiq: Fix interface clock and PCI control register offset

The XRX200 based SoC have a different register offset for the interface
clock and PCI control registers. This patch detects the SoC and sets the
register offset at runtime. This make PCI work on the VR9 SoC.

Signed-off-by: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4113/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

delousing target_core_file a bit

* set_fs(KERNEL_DS) + getname() is probably the weirdest implementation
of strdup() I've seen. Especially since they don't to copy it at all...
* filp_open() never returns NULL; it's ERR_PTR(-E...) on failure.
* file->f_dentry is never going to be NULL, TYVM.
* match_strdup() + snprintf() + kfree() is a bloody weird way to spell
match_strlcpy().

Pox on cargo-cult programmers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

DM RAID: Add support for MD RAID10

Support the MD RAID10 personality through dm-raid.c

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge commit 'c039c332f23e794deb6d6f37b9f07ff3b27fb2cf' into md

Pull in pre-requisites for adding raid10 support to dm-raid.

block: remove dead func declaration

__generic_unplug_device() function is removed with commit
7eaceaccab5f40bbfda044629a6298616aeaed50, which forgot to
remove the declaration at meantime. Here remove it.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: add partition resize function to blkpg ioctl

Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.

This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.

Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().

v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: uninitialized ioc->nr_tasks triggers WARN_ON

Hi,

I'm using the old-fashioned 'dump' backup tool, and I noticed that it spews the
below warning as of 3.5-rc1 and later (3.4 is fine):

[   10.886893] ------------[ cut here ]------------
[   10.886904] WARNING: at include/linux/iocontext.h:140 copy_process+0x1488/0x1560()
[   10.886905] Hardware name: Bochs
[   10.886906] Modules linked in:
[   10.886908] Pid: 2430, comm: dump Not tainted 3.5.0-rc7+ #27
[   10.886908] Call Trace:
[   10.886911]  [<ffffffff8107ce8a>] warn_slowpath_common+0x7a/0xb0
[   10.886912]  [<ffffffff8107ced5>] warn_slowpath_null+0x15/0x20
[   10.886913]  [<ffffffff8107c088>] copy_process+0x1488/0x1560
[   10.886914]  [<ffffffff8107c244>] do_fork+0xb4/0x340
[   10.886918]  [<ffffffff8108effa>] ? recalc_sigpending+0x1a/0x50
[   10.886919]  [<ffffffff8108f6b2>] ? __set_task_blocked+0x32/0x80
[   10.886920]  [<ffffffff81091afa>] ? __set_current_blocked+0x3a/0x60
[   10.886923]  [<ffffffff81051db3>] sys_clone+0x23/0x30
[   10.886925]  [<ffffffff8179bd73>] stub_clone+0x13/0x20
[   10.886927]  [<ffffffff8179baa2>] ? system_call_fastpath+0x16/0x1b
[   10.886928] ---[ end trace 32a14af7ee6a590b ]---

Reproducing is easy, I can hit it on a KVM system with a very basic
config (x86_64 make defconfig + enable the drivers needed). To hit it,
just install dump (on debian/ubuntu, not sure what the package might be
called on Fedora), and:

dump -o -f /tmp/foo /

You'll see the warning in dmesg once it forks off the I/O process and
starts dumping filesystem contents.

I bisected it down to the following commit:

commit f6e8d01bee036460e03bd4f6a79d014f98ba712e
Author: Tejun Heo <tj@kernel.org>
Date:   Mon Mar 5 13:15:26 2012 -0800

    block: add io_context->active_ref

    Currently ioc->nr_tasks is used to decide two things - whether an ioc
    is done issuing IOs and whether it's shared by multiple tasks.  This
    patch separate out the first into ioc->active_ref, which is acquired
    and released using {get|put}_io_context_active() respectively.

    This will be used to associate bio's with a given task.  This patch
    doesn't introduce any visible behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It seems like the init of ioc->nr_tasks was removed in that patch,
so it starts out at 0 instead of 1.

Tejun, is the right thing here to add back the init, or should something else
be done?

The below patch removes the warning, but I haven't done any more extensive
testing on it.

Signed-off-by: Olof Johansson <olof@lixom.net>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

block: do not artificially constrain max_sectors for stacking drivers

blk_set_stacking_limits is intended to allow stacking drivers to build
up the limits of the stacked device based on the underlying devices'
limits. But defaulting 'max_sectors' to BLK_DEF_MAX_SECTORS (1024)
doesn't allow the stacking driver to inherit a max_sectors larger than
1024 -- due to blk_stack_limits' use of min_not_zero.

It is now clear that this artificial limit is getting in the way so
change blk_set_stacking_limits's max_sectors to UINT_MAX (which allows
stacking drivers like dm-multipath to inherit 'max_sectors' from the
underlying paths).

Reported-by: Vijay Chauhan <vijay.chauhan@netapp.com>
Tested-by: Vijay Chauhan <vijay.chauhan@netapp.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

ALSA: snd-usb: fix clock source validity index

uac_clock_source_is_valid() uses the control selector value to access
the bmControls bitmap of the clock source unit. This is wrong, as
control selector values start from 1, while the bitmap uses all
available bits.

In other words, "Clock Validity Control" is stored in D3..2, not D5..4
of the clock selector unit's bmControls.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Andreas Koch <andreas@akdesigninc.com>
Cc: stable@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'common/irqdomain' into sh-latest

sh: ecovec: care CN5 VBUS if USB host mode

renesas_usbhs driver can control both USB Host/Gadget,
but it needs VBUS output if Host mode.

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: sh7724: fixup renesas_usbhs clock settings

8cc88a55b03bd4940390125c2521c99368513be5
(sh: sh7724: use runtime PM implementation) broke sh7724 clocks.

renesas_usbhs needs HWBLK_USB0/1 clock on sh7724

Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>

sh: intc: initial irqdomain support.

Trivial support for irq domains, using either a linear map or radix tree
depending on the vector layout.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>

Merge branch 'common/pinctrl' into sh-latest