]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
12 years agoMerge branches 'bigrt.2012.07.31a', 'doctorture.2012.08.03a', 'fixes.2012.08.03a...
Paul E. McKenney [Fri, 3 Aug 2012 15:56:17 +0000 (08:56 -0700)]
Merge branches 'bigrt.2012.07.31a', 'doctorture.2012.08.03a', 'fixes.2012.08.03a', 'hotplug.2012.08.03a' and 'idle.2012.07.31a' into dev.3.7.2012.08.03a

Conflicts:
kernel/rcutree.c

12 years agorcu: Fix obsolete rcu_initiate_boost() header comment
Paul E. McKenney [Wed, 1 Aug 2012 22:57:54 +0000 (15:57 -0700)]
rcu: Fix obsolete rcu_initiate_boost() header comment

Commit 1217ed1b (rcu: permit rcu_read_unlock() to be called while holding
runqueue locks) made rcu_initiate_boost() restore irq state when releasing
the rcu_node structure's ->lock, but failed to update the header comment
accordingly.  This commit therefore brings the header comment up to date.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Make offline-CPU checking allow for indefinite delays
Paul E. McKenney [Wed, 1 Aug 2012 21:29:20 +0000 (14:29 -0700)]
rcu: Make offline-CPU checking allow for indefinite delays

The rcu_implicit_offline_qs() function implicitly assumed that execution
would progress predictably when interrupts are disabled, which is of course
not guaranteed when running on a hypervisor.  Furthermore, this function
is short, and is called from one place only in a short function.

This commit therefore ensures that the timing is checked before
checking the condition, which guarantees correct behavior even given
indefinite delays.  It also inlines rcu_implicit_offline_qs() into
rcu_implicit_dynticks_qs().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Improve boost selection when moving tasks to root rcu_node
Paul E. McKenney [Tue, 31 Jul 2012 21:09:49 +0000 (14:09 -0700)]
rcu: Improve boost selection when moving tasks to root rcu_node

The rcu_preempt_offline_tasks() moves all tasks queued on a given leaf
rcu_node structure to the root rcu_node, which is done when the last CPU
corresponding the the leaf rcu_node structure goes offline.  Now that
RCU-preempt's synchronize_rcu_expedited() implementation blocks CPU-hotplug
operations during the initialization of each rcu_node structure's
->boost_tasks pointer, rcu_preempt_offline_tasks() can do a better job
of setting the root rcu_node's ->boost_tasks pointer.

The key point is that rcu_preempt_offline_tasks() runs as part of the
CPU-hotplug process, so that a concurrent synchronize_rcu_expedited() is
guaranteed to either have not started on the one hand (in which case there
is no boosting on behalf of the expedited grace period) to be completely
initialized on the other (in which case, in absence of other priority
boosting, all ->boost_tasks pointers will be initialized).  Therefore,
if rcu_preempt_offline_tasks() finds that the ->boost_tasks pointer is
equal to the ->exp_tasks pointer, it can be sure that it is correcty
placed.

The case where there was boosting ongoing at the time that the
synchronize_rcu_expedited() function started, different nodes might
start boosting the tasks blocking the expedited grace period at different
times.  In this mixed case, the root node will either be boosting tasks
for the expedited grace period already, or it will start as soon as it
gets done boosting for the normal grace period -- but in this latter
case, the root node's tasks needed to be boosted in any case.

This commit therefore adds a check of the ->boost_tasks pointer against
the ->exp_tasks pointer to the list that prevents updating ->boost_tasks.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Track CPU-hotplug duration statistics
Paul E. McKenney [Mon, 23 Jul 2012 19:05:55 +0000 (12:05 -0700)]
rcu: Track CPU-hotplug duration statistics

Many rcutorture runs include CPU-hotplug operations in their stress
testing.  This commit accumulates statistics on the durations of these
operations in deference to the recent concern about the overhead and
latency of these operations.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove _rcu_barrier() dependency on __stop_machine()
Paul E. McKenney [Fri, 3 Aug 2012 00:43:50 +0000 (17:43 -0700)]
rcu: Remove _rcu_barrier() dependency on __stop_machine()

Currently, _rcu_barrier() relies on preempt_disable() to prevent
any CPU from going offline, which in turn depends on CPU hotplug's
use of __stop_machine().

This patch therefore makes _rcu_barrier() use get_online_cpus() to
block CPU-hotplug operations.  This has the added benefit of removing
the need for _rcu_barrier() to adopt callbacks:  Because CPU-hotplug
operations are excluded, there can be no callbacks to adopt.  This
commit simplifies the code accordingly.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agosched: Make migration_call() safe for stop_machine()-free hotplug
Paul E. McKenney [Tue, 24 Jul 2012 21:57:17 +0000 (14:57 -0700)]
sched: Make migration_call() safe for stop_machine()-free hotplug

The CPU_DYING branch of migration_call() relies on the fact that
CPU-hotplug offline operations use stop_machine().  This commit therefore
attempts to remedy this situation by acquiring the relevant runqueue
locks.  Note that sched_ttwu_pending() remains outside of the scope of
these new runqueue-lock critical sections because (1) sched_ttwu_pending()
does its own runqueue-lock acquisition and (2) sched_ttwu_pending() handles
pending wakeups, and no further wakeups can select this CPU because it
is already marked as offline.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Permit RCU_NONIDLE() to be used from interrupt context
Paul E. McKenney [Tue, 31 Jul 2012 17:12:48 +0000 (10:12 -0700)]
rcu: Permit RCU_NONIDLE() to be used from interrupt context

There is a need to use RCU from interrupt context, but either before
rcu_irq_enter() is called or after rcu_irq_exit() is called.  If the
interrupt occurs from idle, then lockdep-RCU will complain about such
uses, as they appear to be illegal uses of RCU from the idle loop.
In other environments, RCU_NONIDLE() could be used to properly protect
the use of RCU, but RCU_NONIDLE() currently cannot be invoked except
from process context.

This commit therefore modifies RCU_NONIDLE() to permit its use more
globally.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Properly initialize ->boost_tasks on CPU offline
Paul E. McKenney [Fri, 27 Jul 2012 20:41:47 +0000 (13:41 -0700)]
rcu: Properly initialize ->boost_tasks on CPU offline

When rcu_preempt_offline_tasks() clears tasks from a leaf rcu_node
structure, it does not NULL out the structure's ->boost_tasks field.
This commit therefore fixes this issue.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move unanticipated condition from "if" to "WARN_ON()"
Paul E. McKenney [Fri, 27 Jul 2012 19:40:30 +0000 (12:40 -0700)]
rcu: Move unanticipated condition from "if" to "WARN_ON()"

The rcu_preempt_offline_tasks() checks for any of the rcu_node structure's
CPUs still being in an RCU critical section. However, it is extremely
unlikely that any will be due to the fact that all the CPUs must be
offline when this function is called.  This commit therefore moves the
check from the "if" statement to a "WARN_ON()".

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Reduce synchronize_rcu_expedited() latency
Paul E. McKenney [Tue, 31 Jul 2012 00:19:25 +0000 (17:19 -0700)]
rcu: Reduce synchronize_rcu_expedited() latency

The synchronize_rcu_expedited() function disables interrupts across a
scan of all leaf rcu_node structures, which is not good for real-time
scheduling latency on large systems (hundreds or especially thousands
of CPUs).  This commit therefore holds off CPU-hotplug operations using
get_online_cpus(), and removes the prior acquisiion of the ->onofflock
(which required disabling interrupts).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Eliminate signed overflow in synchronize_rcu_expedited()
Paul E. McKenney [Mon, 23 Jul 2012 23:03:51 +0000 (16:03 -0700)]
rcu: Eliminate signed overflow in synchronize_rcu_expedited()

In the C language, signed overflow is undefined.  It is true that
twos-complement arithmetic normally comes to the rescue, but if the
compiler can subvert this any time it has any information about the values
being compared.  For example, given "if (a - b > 0)", if the compiler
has enough information to realize that (for example) the value of "a"
is positive and that of "b" is negative, the compiler is within its
rights to optimize to a simple "if (1)", which might not be what you want.

This commit therefore converts synchronize_rcu_expedited()'s work-done
detection counter from signed to unsigned.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Update rcutorture defaults
Paul E. McKenney [Mon, 23 Jul 2012 19:30:22 +0000 (12:30 -0700)]
rcu: Update rcutorture defaults

A number of new features have been added to rcutorture over the years, but
the defaults have not been updated to include them.  This commit therefore
turns on a couple of them that have proven helpful and trustworthy, namely
periodic progress reports and testing of NO_HZ.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Userspace RCU extended QS selftest
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:40 +0000 (20:26 +0200)]
rcu: Userspace RCU extended QS selftest

Provide a config option that enables the userspace
RCU extended quiescent state on every CPUs by default.

This is for testing purpose.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agox86: Exit RCU extended QS on notify resume
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:39 +0000 (20:26 +0200)]
x86: Exit RCU extended QS on notify resume

do_notify_resume() may be called on irq or exception
exit. But at that time the exception has already called
rcu_user_enter() and the irq has already called rcu_irq_exit().

Since it can use RCU read side critical section, we must call
rcu_user_exit() before doing anything there. Then we must call
back rcu_user_enter() after this function because we know we are
going to userspace from there.

This complete support for userspace RCU extended quiescent state
in x86-64.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agox86: Use the new schedule_user API on userspace preemption
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:38 +0000 (20:26 +0200)]
x86: Use the new schedule_user API on userspace preemption

This way we can exit the RCU extended quiescent state before
we schedule a new task from irq/exception exit.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Exit RCU extended QS on user preemption
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:37 +0000 (20:26 +0200)]
rcu: Exit RCU extended QS on user preemption

When exceptions or irq are about to resume userspace, if
the task needs to be rescheduled, the arch low level code
calls schedule() directly.

At that time we may be in extended quiescent state from RCU
POV: the exception is not anymore protected inside
rcu_user_exit() - rcu_user_enter() and the irq has called
rcu_irq_exit() already.

Create a new API schedule_user() that calls schedule() inside
rcu_user_exit()-rcu_user_enter() in order to protect it. Archs
will need to rely on it now to implement user preemption safely.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Exit RCU extended QS on kernel preemption after irq/exception
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:36 +0000 (20:26 +0200)]
rcu: Exit RCU extended QS on kernel preemption after irq/exception

When an exception or an irq exits, and we are going to resume into
interrupted kernel code, the low level architecture code calls
preempt_schedule_irq() if there is a need to reschedule.

If the interrupt/exception occured between a call to rcu_user_enter()
(from syscall exit, exception exit, do_notify_resume exit, ...) and
a real resume to userspace (iret,...), preempt_schedule_irq() can be
called whereas RCU thinks we are in userspace. But preempt_schedule_irq()
is going to run kernel code and may be some RCU read side critical
section. We must exit the userspace extended quiescent state before
we call it.

To solve this, just call rcu_user_exit() in the beginning of
preempt_schedule_irq().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agox86: Exception hooks for userspace RCU extended QS
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:35 +0000 (20:26 +0200)]
x86: Exception hooks for userspace RCU extended QS

Add necessary hooks to x86 exception for userspace
RCU extended quiescent state support.

This includes traps, page fault, debug exceptions, etc...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agox86: Syscall hooks for userspace RCU extended QS
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:34 +0000 (20:26 +0200)]
x86: Syscall hooks for userspace RCU extended QS

Add syscall slow path hooks to notify syscall entry
and exit on CPUs that want to support userspace RCU
extended quiescent state.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Switch task's syscall hooks on context switch
Frederic Weisbecker [Mon, 16 Jul 2012 22:06:40 +0000 (15:06 -0700)]
rcu: Switch task's syscall hooks on context switch

Clear the syscalls hook of a task when it's scheduled out so that if
the task migrates, it doesn't run the syscall slow path on a CPU
that might not need it.

Also set the syscalls hook on the next task if needed.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Ignore userspace extended quiescent state by default
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:32 +0000 (20:26 +0200)]
rcu: Ignore userspace extended quiescent state by default

By default we don't want to enter into RCU extended quiescent
state while in userspace because doing this produces some overhead
(eg: use of syscall slowpath). Set it off by default and ready to
run when some feature like adaptive tickless need it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Allow rcu_user_enter()/exit() to nest
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:31 +0000 (20:26 +0200)]
rcu: Allow rcu_user_enter()/exit() to nest

Allow calls to rcu_user_enter() even if we are already
in userspace (as seen by RCU) and allow calls to rcu_user_exit()
even if we are already in the kernel.

This makes the APIs more flexible to be called from architectures.
Exception entries for example won't need to know if they come from
userspace before calling rcu_user_exit().

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Settle config for userspace extended quiescent state
Frederic Weisbecker [Wed, 11 Jul 2012 18:26:30 +0000 (20:26 +0200)]
rcu: Settle config for userspace extended quiescent state

Create a new config option under the RCU menu that put
CPUs under RCU extended quiescent state (as in dynticks
idle mode) when they run in userspace. This require
some contribution from architectures to hook into kernel
and userspace boundaries.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Make RCU_FAST_NO_HZ handle adaptive ticks
Paul E. McKenney [Thu, 28 Jun 2012 19:33:51 +0000 (12:33 -0700)]
rcu: Make RCU_FAST_NO_HZ handle adaptive ticks

The current implementation of RCU_FAST_NO_HZ tries reasonably hard to rid
the current CPU of RCU callbacks.  This is appropriate when the CPU is
entering idle, where it doesn't have much useful to do anyway, but is most
definitely not what you want when transitioning to user-mode execution.
This commit therefore detects the adaptive-tick case, and refrains from
burning CPU time getting rid of RCU callbacks in that case.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: New rcu_user_enter_irq() and rcu_user_exit_irq() APIs
Frederic Weisbecker [Mon, 4 Jun 2012 23:42:35 +0000 (16:42 -0700)]
rcu: New rcu_user_enter_irq() and rcu_user_exit_irq() APIs

In some cases, it is necessary to enter or exit userspace-RCU-idle mode
from an interrupt handler, for example, if some other CPU sends this
CPU a resched IPI.  In this case, the current CPU would enter the IPI
handler in userspace-RCU-idle mode, but would need to exit the IPI handler
after having exited that mode.

To allow this to work, this commit adds two new APIs to TREE_RCU:

- rcu_user_enter_irq(). This must be called from an interrupt between
rcu_irq_enter() and rcu_irq_exit().  After the irq calls rcu_irq_exit(),
the irq handler will return into an RCU extended quiescent state.
In theory, this interrupt is never a nested interrupt, but in practice
it might interrupt softirq, which looks to RCU like a nested interrupt.

- rcu_user_exit_irq(). This must be called from a non-nesting
interrupt, interrupting an RCU extended quiescent state, also
between rcu_irq_enter() and rcu_irq_exit(). After the irq calls
rcu_irq_exit(), the irq handler will return in an RCU non-quiescent
state.

[ Combined with "Allow calls to rcu_exit_user_irq from nesting irqs." ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: New rcu_user_enter() and rcu_user_exit() APIs
Frederic Weisbecker [Thu, 28 Jun 2012 18:20:21 +0000 (11:20 -0700)]
rcu: New rcu_user_enter() and rcu_user_exit() APIs

RCU currently insists that only idle tasks can enter RCU idle mode, which
prohibits an adaptive tickless kernel (AKA nohz cpusets), which in turn
would mean that usermode execution would always take scheduling-clock
interrupts, even when there is only one task runnable on the CPU in
question.

This commit therefore adds rcu_user_enter() and rcu_user_exit(), which
allow non-idle tasks to enter RCU idle mode.  These are quite similar
to rcu_idle_enter() and rcu_idle_exit(), respectively, except that they
omit the idle-task checks.

[ Updated to use "user" flag rather than separate check functions. ]

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Max Krasnyansky <maxk@qualcomm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Sven-Thorsten Dietrich <thebigcorporation@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
12 years agorcu: Remove callback acceleration from grace-period initialization
Paul E. McKenney [Thu, 12 Jul 2012 06:34:53 +0000 (23:34 -0700)]
rcu: Remove callback acceleration from grace-period initialization

Before grace-period initialization was moved to a kthread, the CPU
invoking this code would have at least one callback that needed
a grace period, often a newly registered callback.  However, moving
grace-period initialization means that the CPU with the callback
that was requesting a grace period is not necessarily the CPU that
is initializing the grace period, so this acceleration is less
valuable.  Because it also adds to the complexity of reasoning about
correctness, this commit removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Adjust for unconditional ->completed assignment
Paul E. McKenney [Wed, 11 Jul 2012 12:23:18 +0000 (05:23 -0700)]
rcu: Adjust for unconditional ->completed assignment

Now that the rcu_node structures' ->completed fields are unconditionally
assigned at grace-period cleanup time, they should already have the
correct value for the new grace period at grace-period initialization
time.  This commit therefore inserts a WARN_ON_ONCE() to verify this
invariant.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add random PROVE_RCU_DELAY to grace-period initialization
Paul E. McKenney [Sat, 7 Jul 2012 12:57:03 +0000 (05:57 -0700)]
rcu: Add random PROVE_RCU_DELAY to grace-period initialization

There are some additional potential grace-period initialization races
on systems with more than one rcu_node structure, for example:

1. CPU 0 completes a grace period, but needs an additional
grace period, so starts initializing one, initializing all
the non-leaf rcu_node strcutures and the first leaf rcu_node
structure.  Because CPU 0 is both completing the old grace
period and starting a new one, it marks the completion of
the old grace period and the start of the new grace period
in a single traversal of the rcu_node structures.

Therefore, CPUs corresponding to the first rcu_node structure
can become aware that the prior grace period has ended, but
CPUs corresponding to the other rcu_node structures cannot
yet become aware of this.

2. CPU 1 passes through a quiescent state, and therefore informs
the RCU core.  Because its leaf rcu_node structure has already
been initialized, so this CPU's quiescent state is applied to
the new (and only partially initialized) grace period.

3. CPU 1 enters an RCU read-side critical section and acquires
a reference to data item A.  Note that this critical section
will not block the new grace period.

4. CPU 16 exits dyntick-idle mode.  Because it was in dyntick-idle
mode, some other CPU informed the RCU core of its extended
quiescent state for the past several grace periods.  This means
that CPU 16 is not yet aware that these grace periods have ended.

5. CPU 16 on the second leaf rcu_node structure removes data item A
from its enclosing data structure and passes it to call_rcu(),
which queues a callback in the RCU_NEXT_TAIL segment of the
callback queue.

6. CPU 16 enters the RCU core, possibly because it has taken a
scheduling-clock interrupt, or alternatively because it has
more than 10,000 callbacks queued.  It notes that the second
most recent grace period has ended (recall that it cannot yet
become aware that the most recent grace period has completed),
and therefore advances its callbacks.  The callback for data
item A is therefore in the RCU_NEXT_READY_TAIL segment of the
callback queue.

7. CPU 0 completes initialization of the remaining leaf rcu_node
structures for the new grace period, including the structure
corresponding to CPU 16.

8. CPU 16 again enters the RCU core, again, possibly because it has
taken a scheduling-clock interrupt, or alternatively because
it now has more than 10,000 callbacks queued. It notes that
the most recent grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_NEXT_TAIL segment of the callback queue.

9. All CPUs other than CPU 1 pass through quiescent states, so that
the new grace period completes.  Note that CPU 1 is still in
its RCU read-side critical section, still referencing data item A.

10. Suppose that CPU 2 is the last CPU to pass through a quiescent
state for the new grace period, and suppose further that CPU 2
does not have any callbacks queued.  It therefore traverses
all of the rcu_node structures, marking the new grace period
as completed, but does not initialize a new grace period.

11. CPU 16 yet again enters the RCU core, yet again possibly because
it has taken a scheduling-clock interrupt, or alternatively
because it now has more than 10,000 callbacks queued. It notes
that the new grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_DONE_TAIL segment of the callback queue.  This means
that this callback is now considered ready to be invoked.

12. CPU 16 invokes the callback, freeing data item A while CPU 1
is still referencing it.

This sort of scenario represents a day-one bug for TREE_RCU, however,
the recent changes that permit RCU grace-period initialization to
be preempted made it much more probable.  Still, it is sufficiently
improbable to make validation lengthy and inconvenient, so this commit
adds an anti-heisenbug to greatly increase the collision cross section,
also known as the probability of occurrence.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix day-zero grace-period initialization/cleanup race
Paul E. McKenney [Sat, 7 Jul 2012 14:56:57 +0000 (07:56 -0700)]
rcu: Fix day-zero grace-period initialization/cleanup race

The current approach to grace-period initialization is vulnerable to
extremely low-probabity races.  These races stem fro the fact that the
old grace period is marked completed on the same traversal through the
rcu_node structure that is marking the start of the new grace period.
These races can result in too-short grace periods, as shown in the
following scenario:

1. CPU 0 completes a grace period, but needs an additional
grace period, so starts initializing one, initializing all
the non-leaf rcu_node strcutures and the first leaf rcu_node
structure.  Because CPU 0 is both completing the old grace
period and starting a new one, it marks the completion of
the old grace period and the start of the new grace period
in a single traversal of the rcu_node structures.

Therefore, CPUs corresponding to the first rcu_node structure
can become aware that the prior grace period has completed, but
CPUs corresponding to the other rcu_node structures will see
this same prior grace period as still being in progress.

2. CPU 1 passes through a quiescent state, and therefore informs
the RCU core.  Because its leaf rcu_node structure has already
been initialized, this CPU's quiescent state is applied to the
new (and only partially initialized) grace period.

3. CPU 1 enters an RCU read-side critical section and acquires
a reference to data item A.  Note that this critical section
started after the beginning of the new grace period, and
therefore will not block this new grace period.

4. CPU 16 exits dyntick-idle mode.  Because it was in dyntick-idle
mode, other CPUs informed the RCU core of its extended quiescent
state for the past several grace periods.  This means that CPU
16 is not yet aware that these past grace periods have ended.
Assume that CPU 16 corresponds to the second leaf rcu_node
structure.

5. CPU 16 removes data item A from its enclosing data structure
and passes it to call_rcu(), which queues a callback in the
RCU_NEXT_TAIL segment of the callback queue.

6. CPU 16 enters the RCU core, possibly because it has taken a
scheduling-clock interrupt, or alternatively because it has more
than 10,000 callbacks queued.  It notes that the second most
recent grace period has completed (recall that it cannot yet
become aware that the most recent grace period has completed),
and therefore advances its callbacks.  The callback for data
item A is therefore in the RCU_NEXT_READY_TAIL segment of the
callback queue.

7. CPU 0 completes initialization of the remaining leaf rcu_node
structures for the new grace period, including the structure
corresponding to CPU 16.

8. CPU 16 again enters the RCU core, again, possibly because it has
taken a scheduling-clock interrupt, or alternatively because
it now has more than 10,000 callbacks queued. It notes that
the most recent grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_WAIT_TAIL segment of the callback queue.

9. All CPUs other than CPU 1 pass through quiescent states.  Because
CPU 1 already passed through its quiescent state, the new grace
period completes.  Note that CPU 1 is still in its RCU read-side
critical section, still referencing data item A.

10. Suppose that CPU 2 wais the last CPU to pass through a quiescent
state for the new grace period, and suppose further that CPU 2
did not have any callbacks queued, therefore not needing an
additional grace period.  CPU 2 therefore traverses all of the
rcu_node structures, marking the new grace period as completed,
but does not initialize a new grace period.

11. CPU 16 yet again enters the RCU core, yet again possibly because
it has taken a scheduling-clock interrupt, or alternatively
because it now has more than 10,000 callbacks queued. It notes
that the new grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_DONE_TAIL segment of the callback queue.  This means
that this callback is now considered ready to be invoked.

12. CPU 16 invokes the callback, freeing data item A while CPU 1
is still referencing it.

This scenario represents a day-zero bug for TREE_RCU.  This commit
therefore ensures that the old grace period is marked completed in
all leaf rcu_node structures before a new grace period is marked
started in any of them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Prevent initialization-time quiescent-state race
Paul E. McKenney [Fri, 6 Jul 2012 18:18:51 +0000 (11:18 -0700)]
rcu: Prevent initialization-time quiescent-state race

Now the the grace-period initialization procedure is preemptible, it is
subject to the following race on systems whose rcu_node tree contains
more than one node:

1. CPU 31 starts initializing the grace period, including the
first leaf rcu_node structures, and is then preempted.

2. CPU 0 refers to the first leaf rcu_node structure, and notes
that a new grace period has started.  It passes through a
quiescent state shortly thereafter, and informs the RCU core
of this rite of passage.

3. CPU 0 enters an RCU read-side critical section, acquiring
a pointer to an RCU-protected data item.

4. CPU 31 removes the data item referenced by CPU 0 from the
data structure, and registers an RCU callback in order to
free it.

5. CPU 31 resumes initializing the grace period, including its
own rcu_node structure.  In invokes rcu_start_gp_per_cpu(),
which advances all callbacks, including the one registered
in #4 above, to be handled by the current grace period.

6. The remaining CPUs pass through quiescent states and inform
the RCU core, but CPU 0 remains in its RCU read-side critical
section, still referencing the now-removed data item.

7. The grace period completes and all the callbacks are invoked,
including the one that frees the data item that CPU 0 is still
referencing.  Oops!!!

This commit therefore moves the callback handling to precede initialization
of any of the rcu_node structures, thus avoiding this race.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Make rcutree module parameters visible in sysfs
Paul E. McKenney [Sun, 1 Jul 2012 22:42:33 +0000 (15:42 -0700)]
rcu: Make rcutree module parameters visible in sysfs

The module parameters blimit, qhimark, and qlomark (and more
recently, rcu_fanout_leaf) have permission masks of zero, so
that their values are not visible from sysfs.  This is unnecessary
and inconvenient to administrators who might like an easy way to
see what these values are on a running system.  This commit therefore
sets their permission masks to 0444, allowing them to be read but
not written.

Suggested-by: Rusty Russell <rusty@ozlabs.org>
Suggested-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove now-unused rcu_state fields
Paul E. McKenney [Fri, 29 Jun 2012 15:25:07 +0000 (08:25 -0700)]
rcu: Remove now-unused rcu_state fields

Moving the RCU grace-period processing to a kthread and adjusting the
tracing resulted in two of the rcu_state structure's fields being unused.
This commit therefore removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Control grace-period duration from sysfs
Paul E. McKenney [Wed, 27 Jun 2012 03:45:57 +0000 (20:45 -0700)]
rcu: Control grace-period duration from sysfs

Some uses of RCU benefit from shorter grace periods, while others benefit
more from the greater efficiency provided by longer grace periods.
Therefore, this commit allows the durations to be controlled from sysfs.
There are two sysfs parameters, one named "jiffies_till_first_fqs" that
specifies the delay in jiffies from the end of grace-period initialization
until the first attempt to force quiescent states, and the other named
"jiffies_till_next_fqs" that specifies the delay (again in jiffies)
between subsequent attempts to force quiescent states.  They both default
to three jiffies, which is compatible with the old hard-coded behavior.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Prevent force_quiescent_state() memory contention
Paul E. McKenney [Wed, 27 Jun 2012 00:00:35 +0000 (17:00 -0700)]
rcu: Prevent force_quiescent_state() memory contention

Large systems running RCU_FAST_NO_HZ kernels see extreme memory
contention on the rcu_state structure's ->fqslock field.  This
can be avoided by disabling RCU_FAST_NO_HZ, either at compile time
or at boot time (via the nohz kernel boot parameter), but large
systems will no doubt become sensitive to energy consumption.
This commit therefore uses a combining-tree approach to spread the
memory contention across new cache lines in the leaf rcu_node structures.
This can be thought of as a tournament lock that has only a try-lock
acquisition primitive.

The effect on small systems is minimal, because such systems have
an rcu_node "tree" consisting of a single node.  In addition, this
functionality is not used on fastpaths.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Adjust debugfs tracing for kthread-based quiescent-state forcing
Paul E. McKenney [Tue, 26 Jun 2012 21:00:48 +0000 (14:00 -0700)]
rcu: Adjust debugfs tracing for kthread-based quiescent-state forcing

Moving quiescent-state forcing into a kthread dispenses with the need
for the ->n_rp_need_fqs field, so this commit removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Allow RCU quiescent-state forcing to be preempted
Paul E. McKenney [Mon, 25 Jun 2012 15:41:11 +0000 (08:41 -0700)]
rcu: Allow RCU quiescent-state forcing to be preempted

RCU quiescent-state forcing is currently carried out without preemption
points, which can result in excessive latency spikes on large systems
(many hundreds or thousands of CPUs).  This patch therefore inserts
a voluntary preemption point into force_qs_rnp(), which should greatly
reduce the magnitude of these spikes.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move quiescent-state forcing into kthread
Paul E. McKenney [Sat, 23 Jun 2012 00:06:26 +0000 (17:06 -0700)]
rcu: Move quiescent-state forcing into kthread

As the first step towards allowing quiescent-state forcing to be
preemptible, this commit moves RCU quiescent-state forcing into the
same kthread that is now used to initialize and clean up after grace
periods.  This is yet another step towards keeping scheduling
latency down to a dull roar.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Pull TINY_RCU dyntick-idle tracing into non-idle region
Paul E. McKenney [Wed, 11 Jul 2012 07:24:57 +0000 (00:24 -0700)]
rcu: Pull TINY_RCU dyntick-idle tracing into non-idle region

Because TINY_RCU's idle detection keys directly off of the nesting
level, rather than from a separate variable as in TREE_RCU, the
TINY_RCU dyntick-idle tracing on transition to idle must happen
before the change to the nesting level.  This commit therefore makes
this change by passing the desired new value (rather than the old value)
of the nesting level in to rcu_idle_enter_common().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Add PROVE_RCU_DELAY to provoke difficult races
Paul E. McKenney [Mon, 2 Jul 2012 21:42:01 +0000 (14:42 -0700)]
rcu: Add PROVE_RCU_DELAY to provoke difficult races

There have been some recent bugs that were triggered only when
preemptible RCU's __rcu_read_unlock() was preempted just after setting
->rcu_read_lock_nesting to INT_MIN, which is a low-probability event.
Therefore, reproducing those bugs (to say nothing of gaining confidence
in alleged fixes) was quite difficult.  This commit therefore creates
a new debug-only RCU kernel config option that forces a short delay
in __rcu_read_unlock() to increase the probability of those sorts of
bugs occurring.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Segregate rcu_state fields to improve cache locality
Dimitri Sivanich [Fri, 29 Jun 2012 21:17:29 +0000 (14:17 -0700)]
rcu: Segregate rcu_state fields to improve cache locality

The fields in the rcu_state structure that are protected by the
root rcu_node structure's ->lock can share a cache line with the
fields protected by ->onofflock.  This can result in excessive
memory contention on large systems, so this commit applies
____cacheline_internodealigned_in_smp to the ->onofflock field in
order to segregate them.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
12 years agorcu: Provide OOM handler to motivate lazy RCU callbacks
Paul E. McKenney [Tue, 12 Jun 2012 00:39:43 +0000 (17:39 -0700)]
rcu: Provide OOM handler to motivate lazy RCU callbacks

In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
large number of lazy callbacks, which as the name implies will be slow
to be invoked.  This can be a problem on small-memory systems, where the
default 6-second sleep for CPUs having only lazy RCU callbacks could well
be fatal.  This commit therefore installs an OOM hander that ensures that
every CPU with non-lazy callbacks has at least one non-lazy callback,
in turn ensuring timely advancement for these callbacks.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Break up rcu_gp_kthread() into subfunctions
Paul E. McKenney [Fri, 22 Jun 2012 18:08:41 +0000 (11:08 -0700)]
rcu: Break up rcu_gp_kthread() into subfunctions

Then rcu_gp_kthread() function is too large and furthermore needs to
have the force_quiescent_state() code pulled in.  This commit therefore
breaks up rcu_gp_kthread() into rcu_gp_init() and rcu_gp_cleanup().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Prevent offline CPUs from executing RCU core code
Paul E. McKenney [Thu, 21 Jun 2012 16:54:10 +0000 (09:54 -0700)]
rcu: Prevent offline CPUs from executing RCU core code

Earlier versions of RCU invoked the RCU core from the CPU_DYING notifier
in order to note a quiescent state for the outgoing CPU.  Because the
CPU is marked "offline" during the execution of the CPU_DYING notifiers,
the RCU core had to tolerate being invoked from an offline CPU.  However,
commit b1420f1c (Make rcu_barrier() less disruptive) left only tracing
code in the CPU_DYING notifier, so the RCU core need no longer execute
on offline CPUs.  This commit therefore enforces this restriction.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Allow RCU grace-period cleanup to be preempted
Paul E. McKenney [Thu, 21 Jun 2012 15:19:05 +0000 (08:19 -0700)]
rcu: Allow RCU grace-period cleanup to be preempted

RCU grace-period cleanup is currently carried out with interrupts
disabled, which can result in excessive latency spikes on large systems
(many hundreds or thousands of CPUs).  This patch therefore makes the
RCU grace-period cleanup be preemptible, including voluntary preemption
points, which should eliminate those latency spikes.  Similar spikes from
forcing of quiescent states will be dealt with similarly by later patches.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix broken strings in RCU's source code.
Paul E. McKenney [Thu, 17 May 2012 22:12:45 +0000 (15:12 -0700)]
rcu: Fix broken strings in RCU's source code.

Although the C language allows you to break strings across lines, doing
this makes it hard for people to find the Linux kernel code corresponding
to a given console message.  This commit therefore fixes broken strings
throughout RCU's source code.

Suggested-by: Josh Triplett <josh@joshtriplett.org>
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move RCU grace-period cleanup into kthread
Paul E. McKenney [Thu, 21 Jun 2012 00:07:14 +0000 (17:07 -0700)]
rcu: Move RCU grace-period cleanup into kthread

As a first step towards allowing grace-period cleanup to be preemptible,
this commit moves the RCU grace-period cleanup into the same kthread
that is now used to initialize grace periods.  This is needed to keep
scheduling latency down to a dull roar.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Allow RCU grace-period initialization to be preempted
Paul E. McKenney [Wed, 20 Jun 2012 00:18:20 +0000 (17:18 -0700)]
rcu: Allow RCU grace-period initialization to be preempted

RCU grace-period initialization is currently carried out with interrupts
disabled, which can result in 200-microsecond latency spikes on systems
on which RCU has been configured for 4096 CPUs.  This patch therefore
makes the RCU grace-period initialization be preemptible, which should
eliminate those latency spikes.  Similar spikes from grace-period cleanup
and the forcing of quiescent states will be dealt with similarly by later
patches.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move RCU grace-period initialization into a kthread
Paul E. McKenney [Tue, 19 Jun 2012 01:36:08 +0000 (18:36 -0700)]
rcu: Move RCU grace-period initialization into a kthread

As the first step towards allowing grace-period initialization to be
preemptible, this commit moves the RCU grace-period initialization
into its own kthread.  This is needed to keep large-system scheduling
latency at reasonable levels.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix code-style issues involving "else"
Paul E. McKenney [Thu, 28 Jun 2012 15:08:25 +0000 (08:08 -0700)]
rcu: Fix code-style issues involving "else"

The Linux kernel coding style says that single-statement blocks should
omit curly braces unless the other leg of the "if" statement has
multiple statements, in which case the curly braces should be included.
This commit fixes RCU's violations of this rule.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agoMerge branches 'bigrtm.2012.07.04a', 'doctorture.2012.07.02a', 'fixes.2012.07.06a...
Paul E. McKenney [Fri, 6 Jul 2012 12:59:20 +0000 (05:59 -0700)]
Merge branches 'bigrtm.2012.07.04a', 'doctorture.2012.07.02a', 'fixes.2012.07.06a' and 'fnh.2012.07.02a' into HEAD

bigrtm: First steps towards getting RCU out of the way of
tens-of-microseconds real-time response on systems compiled
with NR_CPUS=4096.  Also cleanups for and increased concurrency
of rcu_barrier() family of primitives.
doctorture: rcutorture and documentation improvements.
fixes:  Miscellaneous fixes.
fnh: RCU_FAST_NO_HZ fixes and improvements.

12 years agorcu: Introduce check for callback list/count mismatch
Paul E. McKenney [Mon, 25 Jun 2012 19:54:17 +0000 (12:54 -0700)]
rcu: Introduce check for callback list/count mismatch

The recent bug that introduced the RCU callback list/count mismatch
showed the need for a diagnostic to check for this, which this commit
adds.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter
Paul E. McKenney [Sun, 24 Jun 2012 17:15:02 +0000 (10:15 -0700)]
rcu: Make RCU_FAST_NO_HZ respect nohz= boot parameter

If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
also disable itself.  This commit therefore checks for tick_nohz_enabled
being zero, disabling rcu_prepare_for_idle() if so.  This commit assumes
that tick_nohz_enabled can change at runtime: If this is not the case,
then a simpler approach suffices.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix qlen_lazy breakage
Paul E. McKenney [Tue, 5 Jun 2012 22:53:53 +0000 (15:53 -0700)]
rcu: Fix qlen_lazy breakage

Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
created a macro out of an inline function in order to avoid build
breakage for certain combinations of gcc flags.  Unfortunately, it also
converted a kfree_call_rcu() to a call_rcu(), which made the rcu_data
structure's ->qlen_lazy field lose counts.  This commit therefore changes
the call_rcu() back to kfree_call_rcu().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Round FAST_NO_HZ lazy timeout to nearest second
Paul E. McKenney [Tue, 5 Jun 2012 03:45:10 +0000 (20:45 -0700)]
rcu: Round FAST_NO_HZ lazy timeout to nearest second

Currently, if several CPUs in the same package have all lazy RCU
callbacks, their wakeups will be uncorrelated.  If all the CPUs are in the
same power domain (as is often the case), this will result in unnecessary
power-ups of the package.  This commit therefore uses round_jiffies()
to round the timeouts to a second boundary, increasing the odds that
they can be coalesced with each other or with other timeouts.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: The rcu_needs_cpu() function is not a quiescent state
Paul E. McKenney [Thu, 10 May 2012 22:37:48 +0000 (15:37 -0700)]
rcu: The rcu_needs_cpu() function is not a quiescent state

The TINY_PREEMPT_RCU() function rcu_preempt_needs_cpu(), which is called
from rcu_needs_cpu(), assumes that it is in a quiescent state with respect
to the CPU.  This is no longer the case.  This commit therefore updates
rcu_preempt_needs_cpu() to make it aware that it is not running in a
quiescent state.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
12 years agorcu: Dump only the current CPU's buffers for idle-entry/exit warnings
Paul E. McKenney [Wed, 9 May 2012 22:55:39 +0000 (15:55 -0700)]
rcu: Dump only the current CPU's buffers for idle-entry/exit warnings

Problems in RCU idle entry and exit are almost always confined to the
offending CPU.  This commit therefore switches ftrace_dump() from
DUMP_ALL to DUMP_ORIG.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: Pascal Chapperon <pascal.chapperon@wanadoo.fr>
12 years agorcu: Add check for CPUs going offline with callbacks queued
Paul E. McKenney [Thu, 21 Jun 2012 18:26:42 +0000 (11:26 -0700)]
rcu: Add check for CPUs going offline with callbacks queued

If a CPU goes offline with callbacks queued, those callbacks might be
indefinitely postponed, which can result in a system hang.  This commit
therefore inserts warnings for this condition.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Disable preemption in rcu_blocking_is_gp()
Paul E. McKenney [Tue, 19 Jun 2012 18:58:27 +0000 (11:58 -0700)]
rcu: Disable preemption in rcu_blocking_is_gp()

It is time to optimize CONFIG_TREE_PREEMPT_RCU's synchronize_rcu()
for uniprocessor optimization, which means that rcu_blocking_is_gp()
can no longer rely on RCU read-side critical sections having disabled
preemption.  This commit therefore disables preemption across
rcu_blocking_is_gp()'s scan of the cpu_online_mask.

(Updated from previous version to fix embarrassing bug spotted by
Wu Fengguang.)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Prevent uninitialized string in RCU CPU stall info
Carsten Emde [Tue, 19 Jun 2012 08:43:16 +0000 (10:43 +0200)]
rcu: Prevent uninitialized string in RCU CPU stall info

An uninitialized string may be displayed at the end of the rcu_preempt
detected stall info such as

0: (1 GPs behind) idle=075/140000000000000/0 =8?^D=8?^D
                                             ^^^^^^^^^^
if CONFIG_RCU_FAST_NO_HZ is not defined.

This trivial patch clears the string in this case.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU
Paul E. McKenney [Wed, 6 Jun 2012 14:12:43 +0000 (07:12 -0700)]
rcu: Fix rcu_is_cpu_idle() #ifdef in TINY_RCU

The rcu_is_cpu_idle() function is used if CONFIG_DEBUG_LOCK_ALLOC,
but TINY_RCU defines it only when CONFIG_PROVE_RCU.  This causes
build failures when CONFIG_DEBUG_LOCK_ALLOC=y but CONFIG_PROVE_RCU=n.
This commit therefore adjusts the #ifdefs for rcu_is_cpu_idle() so
that it is defined when CONFIG_DEBUG_LOCK_ALLOC=y.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Split RCU core processing out of __call_rcu()
Paul E. McKenney [Wed, 30 May 2012 10:21:48 +0000 (03:21 -0700)]
rcu: Split RCU core processing out of __call_rcu()

The __call_rcu() function is a bit overweight, so this commit splits
it into actual enqueuing of and accounting for the callback (__call_rcu())
and associated RCU-core processing (__call_rcu_core()).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Prevent __call_rcu() from invoking RCU core on offline CPUs
Paul E. McKenney [Sat, 26 May 2012 15:56:01 +0000 (08:56 -0700)]
rcu: Prevent __call_rcu() from invoking RCU core on offline CPUs

The __call_rcu() function will invoke the RCU core, for example, if
it detects that the current CPU has too many callbacks.  However, this
can happen on an offline CPU that is on its way to the idle loop, in
which case it is an error to invoke the RCU core, and the excess callbacks
will be adopted in any case.  This commit therefore adds checks to
__call_rcu() for running on an offline CPU, refraining from invoking
the RCU core in this case.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Make __call_rcu() handle invocation from idle
Paul E. McKenney [Wed, 23 May 2012 05:10:24 +0000 (22:10 -0700)]
rcu: Make __call_rcu() handle invocation from idle

Although __call_rcu() is handled correctly when called from a momentary
non-idle period, if it is called on a CPU that RCU believes to be idle
on RCU_FAST_NO_HZ kernels, the callback might be indefinitely postponed.
This commit therefore ensures that RCU is aware of the new callback and
has a chance to force the CPU out of dyntick-idle mode when a new callback
is posted.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Remove function versions of __kfree_rcu and __is_kfree_rcu_offset
Paul E. McKenney [Fri, 25 May 2012 21:25:58 +0000 (14:25 -0700)]
rcu: Remove function versions of __kfree_rcu and __is_kfree_rcu_offset

Commit d8169d4c (Make __kfree_rcu() less dependent on compiler choices)
added cpp macro versions of __kfree_rcu() and __is_kfree_rcu_offset(),
but failed to remove the old inline-function versions.  This commit does
this cleanup.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations
Paul E. McKenney [Mon, 21 May 2012 18:58:36 +0000 (11:58 -0700)]
rcu: Consolidate tree/tiny __rcu_read_{,un}lock() implementations

The CONFIG_TREE_PREEMPT_RCU and CONFIG_TINY_PREEMPT_RCU versions of
__rcu_read_lock() and __rcu_read_unlock() are identical, so this commit
consolidates them into kernel/rcupdate.h.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Remove return value from rcu_assign_pointer()
Paul E. McKenney [Wed, 16 May 2012 22:51:08 +0000 (15:51 -0700)]
rcu: Remove return value from rcu_assign_pointer()

The return value from rcu_assign_pointer() is not used, and using it
would be quite ugly, for example:

q = rcu_assign_pointer(global_p, p);

To prevent this sort of ugliness from spreading, this commit wraps
rcu_assign_pointer() in a do-while loop.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agokey: Remove extraneous parentheses from rcu_assign_keypointer()
Paul E. McKenney [Wed, 16 May 2012 23:31:38 +0000 (16:31 -0700)]
key: Remove extraneous parentheses from rcu_assign_keypointer()

This commit removes the extraneous parentheses from rcu_assign_keypointer()
so that rcu_assign_pointer() can be wrapped in do-while.  It also wraps
rcu_assign_keypointer() in a do-while and parenthesizes its final argument,
as suggested by David Howells.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Remove return value from RCU_INIT_POINTER()
Paul E. McKenney [Wed, 16 May 2012 22:42:30 +0000 (15:42 -0700)]
rcu: Remove return value from RCU_INIT_POINTER()

The return value from RCU_INIT_POINTER() is not used, and using it
would be quite ugly, for example:

q = RCU_INIT_POINTER(global_p, p);

To prevent this sort of ugliness from appearing, this commit wraps
RCU_INIT_POINTER() in a do-while loop.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: David Howells <dhowells@redhat.com>
12 years agorcu: Use new RCU_POINTER_INITIALIZER for gcc-style initializations
Paul E. McKenney [Wed, 16 May 2012 22:33:15 +0000 (15:33 -0700)]
rcu: Use new RCU_POINTER_INITIALIZER for gcc-style initializations

This commit applies the INIT_RCU_POINTER() macro to all uses of
RCU_POINTER_INITIALIZER() that were all too cleverly creating gcc-style
initializations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
12 years agorcu: Add a gcc-style structure initializer for RCU pointers
Paul E. McKenney [Wed, 16 May 2012 22:23:45 +0000 (15:23 -0700)]
rcu: Add a gcc-style structure initializer for RCU pointers

RCU_INIT_POINTER() returns a value that is never used, and which should
be abolished due to terminal ugliness:

q = RCU_INIT_POINTER(global_p, p);

However, there are two uses that cannot be handled by a do-while
formulation because they do gcc-style initialization:

RCU_INIT_POINTER(.real_cred, &init_cred),
RCU_INIT_POINTER(.cred, &init_cred),

This usage is clever, but not necessarily the nicest approach.
This commit therefore creates an RCU_POINTER_INITIALIZER() macro that
is specifically designed for gcc-style initialization.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: David Howells <dhowells@redhat.com>
12 years agorcu: Add ACCESS_ONCE() to ->qlen accesses
Paul E. McKenney [Wed, 9 May 2012 22:44:42 +0000 (15:44 -0700)]
rcu: Add ACCESS_ONCE() to ->qlen accesses

The _rcu_barrier() function accesses other CPUs' rcu_data structure's
->qlen field without benefit of locking.  This commit therefore adds
the required ACCESS_ONCE() wrappers around accesses and updates that
need it.

ACCESS_ONCE() is not needed when a CPU accesses its own ->qlen, or
in code that cannot run while _rcu_barrier() is sampling ->qlen fields.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Consolidate duplicate callback-list initialization
Paul E. McKenney [Wed, 9 May 2012 22:39:56 +0000 (15:39 -0700)]
rcu: Consolidate duplicate callback-list initialization

There are a couple of open-coded initializations of the rcu_data
structure's RCU callback list.  This commit therefore consolidates
them into a new init_callback_list() function.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Fix detection of abruptly-ending stall
Paul E. McKenney [Wed, 9 May 2012 15:45:12 +0000 (08:45 -0700)]
rcu: Fix detection of abruptly-ending stall

The code that attempts to identify stalls that end just as we detect
them is broken by both flavors of initialization failure.  This commit
therefore properly initializes and computes the count of the number
of reasons why the RCU grace period is stalled.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Make rcutorture fakewriters invoke rcu_barrier()
Paul E. McKenney [Wed, 30 May 2012 00:50:51 +0000 (17:50 -0700)]
rcu: Make rcutorture fakewriters invoke rcu_barrier()

The current rcutorture rcu_barrier() testing never intentionally runs
more than one instance of rcu_barrier() at a given time.  This fails
to test the the shiny new concurrency features of rcu_barrier().  This
commit therefore modifies the rcutorture fakewriter kthread to randomly
invoke rcu_barrier() rather than the usual synchronize_rcu().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Fix diagnostic-printk typo in rcutorture
Paul E. McKenney [Fri, 25 May 2012 01:17:48 +0000 (18:17 -0700)]
rcu: Fix diagnostic-printk typo in rcutorture

The rcu_torture_barrier() function has a copy-and-paste typo in the
string passed to rcutorture_shutdown_absorb(), which this commit fixes.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Fix bug in rcu_barrier() torture test
Paul E. McKenney [Tue, 29 May 2012 02:21:41 +0000 (19:21 -0700)]
rcu: Fix bug in rcu_barrier() torture test

The child threads in the rcu_torture_barrier_cbs() are improperly
synchronized, which can cause the rcu_barrier() tests to hang.  The
failure mode is as follows:

1. CPU 0 running in rcu_torture_barrier() sets barrier_cbs_count
     to n_barrier_cbs.

2. CPU 1 running in rcu_torture_barrier_cbs() wakes up, posts
     its RCU callback, and atomically decrements barrier_cbs_count.
     Because barrier_cbs_count is not zero, it does not do the wake_up().

3. CPU 2 running in rcu_torture_barrier_cbs() wakes up, but
     finds that barrier_cbs_count is not equal to n_barrier_cbs,
     and so returns to sleep.

4. The value of barrier_cbs_count therefore never reaches zero,
     which causes the test to hang.

This commit therefore uses a phase variable to coordinate the test,
preventing this scenario from occurring.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Test srcu_barrier() from rcutorture test suite
Paul E. McKenney [Tue, 8 May 2012 17:21:50 +0000 (10:21 -0700)]
rcu: Test srcu_barrier() from rcutorture test suite

SRCU now has a call_srcu() and an srcu_barrier(), but rcutorture does not
test them.  This commit adds the machinery to allow rcutorture's existing
tests for call_rcu() and rcu_barrier() to apply to the SRCU equivalents.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Rationalize ordering of torture_ops list
Paul E. McKenney [Mon, 7 May 2012 20:44:44 +0000 (13:44 -0700)]
rcu: Rationalize ordering of torture_ops list

Move the raw SRCU interfaces out of the middle of the normal SRCU
interfaces.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Update documentation to cover call_srcu() and srcu_barrier().
Paul E. McKenney [Mon, 7 May 2012 20:43:30 +0000 (13:43 -0700)]
rcu: Update documentation to cover call_srcu() and srcu_barrier().

The advent of call_srcu() and srcu_barrier() obsoleted some of the
documentation, so this commit brings that up to date.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: RCU_SAVE_DYNTICK code no longer ever dead
Paul E. McKenney [Tue, 12 Jun 2012 20:16:58 +0000 (13:16 -0700)]
rcu: RCU_SAVE_DYNTICK code no longer ever dead

Before RCU had unified idle, the RCU_SAVE_DYNTICK leg of the switch
statement in force_quiescent_state() was dead code for CONFIG_NO_HZ=n
kernel builds.  With unified idle, the code is never dead.  This commit
therefore removes the "if" statement designed to make gcc aware of when
the code was and was not dead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Use for_each_rcu_flavor() in TREE_RCU tracing
Paul E. McKenney [Tue, 12 Jun 2012 19:25:39 +0000 (12:25 -0700)]
rcu: Use for_each_rcu_flavor() in TREE_RCU tracing

This commit applies the new for_each_rcu_flavor() macro to the
kernel/rcutree_trace.c file.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Introduce for_each_rcu_flavor() and use it
Paul E. McKenney [Tue, 12 Jun 2012 18:01:13 +0000 (11:01 -0700)]
rcu: Introduce for_each_rcu_flavor() and use it

The arrival of TREE_PREEMPT_RCU some years back included some ugly
code involving either #ifdef or #ifdef'ed wrapper functions to iterate
over all non-SRCU flavors of RCU.  This commit therefore introduces
a for_each_rcu_flavor() iterator over the rcu_state structures for each
flavor of RCU to clean up a bit of the ugliness.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove unneeded __rcu_process_callbacks() argument
Paul E. McKenney [Tue, 12 Jun 2012 16:40:38 +0000 (09:40 -0700)]
rcu: Remove unneeded __rcu_process_callbacks() argument

With the advent of __this_cpu_ptr(), it is no longer necessary to pass
both the rcu_state and rcu_data structures into __rcu_process_callbacks().
This commit therefore computes the rcu_data pointer from the rcu_state
pointer within __rcu_process_callbacks() so that callers can pass in
only the pointer to the rcu_state structure.  This paves the way for
linking the rcu_state structures together and iterating over them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Add rcu_barrier() statistics to debugfs tracing
Paul E. McKenney [Fri, 25 May 2012 18:49:15 +0000 (11:49 -0700)]
rcu: Add rcu_barrier() statistics to debugfs tracing

This commit adds an rcubarrier file to RCU's debugfs statistical tracing
directory, providing diagnostic information on rcu_barrier().

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Add tracing for _rcu_barrier()
Paul E. McKenney [Thu, 24 May 2012 01:47:05 +0000 (18:47 -0700)]
rcu: Add tracing for _rcu_barrier()

This commit adds event tracing for _rcu_barrier() execution.  This
is defined only if RCU_TRACE=y.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Increase rcu_barrier() concurrency
Paul E. McKenney [Tue, 29 May 2012 21:56:46 +0000 (14:56 -0700)]
rcu: Increase rcu_barrier() concurrency

The traditional rcu_barrier() implementation has serialized all requests,
regardless of RCU flavor, and also does not coalesce concurrent requests.
In the past, this has been good and sufficient.

However, systems are getting larger and use of rcu_barrier() has been
increasing.  This commit therefore introduces a counter-based scheme
that allows _rcu_barrier() calls for the same flavor of RCU to take
advantage of each others' work.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Remove needless initialization
Paul E. McKenney [Sat, 16 Jun 2012 01:22:05 +0000 (18:22 -0700)]
rcu: Remove needless initialization

For global variables, C defaults all fields to zero.  The initialization
of the rcu_state structure's ->n_force_qs and ->n_force_qs_ngp fields
is therefore redundant, so this commit removes these initializations.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move rcu_barrier_mutex to rcu_state structure
Paul E. McKenney [Tue, 29 May 2012 12:18:53 +0000 (05:18 -0700)]
rcu: Move rcu_barrier_mutex to rcu_state structure

In order to allow each RCU flavor to concurrently execute its
rcu_barrier() function, it is necessary to move the relevant
state to the rcu_state structure.  This commit therefore moves the
rcu_barrier_mutex global variable to a new ->barrier_mutex field
in the rcu_state structure.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Move rcu_barrier_completion to rcu_state structure
Paul E. McKenney [Tue, 29 May 2012 10:03:37 +0000 (03:03 -0700)]
rcu: Move rcu_barrier_completion to rcu_state structure

In order to allow each RCU flavor to concurrently execute its
rcu_barrier() function, it is necessary to move the relevant
state to the rcu_state structure.  This commit therefore moves the
rcu_barrier_completion global variable to a new ->barrier_completion
field in the rcu_state structure.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Move rcu_barrier_cpu_count to rcu_state structure
Paul E. McKenney [Tue, 29 May 2012 07:34:56 +0000 (00:34 -0700)]
rcu: Move rcu_barrier_cpu_count to rcu_state structure

In order to allow each RCU flavor to concurrently execute its rcu_barrier()
function, it is necessary to move the relevant state to the rcu_state
structure.  This commit therefore moves the rcu_barrier_cpu_count global
variable to a new ->barrier_cpu_count field in the rcu_state structure.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Move _rcu_barrier()'s rcu_head structures to rcu_data structures
Paul E. McKenney [Tue, 29 May 2012 06:57:46 +0000 (23:57 -0700)]
rcu: Move _rcu_barrier()'s rcu_head structures to rcu_data structures

In order for multiple flavors of RCU to each concurrently run one
rcu_barrier(), each flavor needs its own per-CPU set of rcu_head
structures.  This commit therefore moves _rcu_barrier()'s set of
per-CPU rcu_head structures from per-CPU variables to the existing
per-CPU and per-RCU-flavor rcu_data structures.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Place pointer to call_rcu() in rcu_data structure
Paul E. McKenney [Tue, 29 May 2012 06:26:01 +0000 (23:26 -0700)]
rcu: Place pointer to call_rcu() in rcu_data structure

This is a preparatory commit for increasing rcu_barrier()'s concurrency.
It adds a pointer in the rcu_data structure to the corresponding call_rcu()
function.  This allows a pointer to the rcu_data structure to imply the
function pointer, which allows _rcu_barrier() state to be placed in the
rcu_state structure.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Prevent excessive line length in RCU_STATE_INITIALIZER()
Paul E. McKenney [Tue, 29 May 2012 02:41:41 +0000 (19:41 -0700)]
rcu: Prevent excessive line length in RCU_STATE_INITIALIZER()

Upcoming rcu_barrier() concurrency commits will result in line lengths
greater than 80 characters in the RCU_STATE_INITIALIZER(), so this commit
shortens the name of the macro's argument to prevent this.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
12 years agorcu: Size rcu_node tree from nr_cpu_ids rather than NR_CPUS
Paul E. McKenney [Wed, 9 May 2012 04:00:28 +0000 (21:00 -0700)]
rcu: Size rcu_node tree from nr_cpu_ids rather than NR_CPUS

The rcu_node tree array is sized based on compile-time constants,
including NR_CPUS.  Although this approach has worked well in the past,
the recent trend by many distros to define NR_CPUS=4096 results in
excessive grace-period-initialization latencies.

This commit therefore substitutes the run-time computed nr_cpu_ids for
the compile-time NR_CPUS when building the tree.  This can result in
much of the compile-time-allocated rcu_node array being unused.  If
this is a major problem, you are in a specialized situation anyway,
so you can manually adjust the NR_CPUS, RCU_FANOUT, and RCU_FANOUT_LEAF
kernel config parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Four-level hierarchy is no longer experimental
Paul E. McKenney [Sat, 16 Jun 2012 01:16:00 +0000 (18:16 -0700)]
rcu: Four-level hierarchy is no longer experimental

Time to make the four-level-hierarchy setting less scary, so this
commit removes "Experimental" from the boot-time message.  Leave the
message in order to get a heads-up on any possible need to expand to
a five-level hierarchy.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agorcu: Control RCU_FANOUT_LEAF from boot-time parameter
Paul E. McKenney [Mon, 23 Apr 2012 22:52:53 +0000 (15:52 -0700)]
rcu: Control RCU_FANOUT_LEAF from boot-time parameter

Although making RCU_FANOUT_LEAF a kernel configuration parameter rather
than a fixed constant makes it easier for people to decrease cache-miss
overhead for large systems, it is of little help for people who must
run a single pre-built kernel binary.

This commit therefore allows the value of RCU_FANOUT_LEAF to be
increased (but not decreased!) via a boot-time parameter named
rcutree.rcu_fanout_leaf.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
12 years agoRevert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"
Paul E. McKenney [Mon, 2 Jul 2012 14:08:42 +0000 (07:08 -0700)]
Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"

This reverts commit 616c310e83b872024271c915c1b9ab505b9efad9.
(Move PREEMPT_RCU preemption to switch_to() invocation).
Testing by Sasha Levin <levinsasha928@gmail.com> showed that this
can result in deadlock due to invoking the scheduler when one of
the runqueue locks is held.  Because this commit was simply a
performance optimization, revert it.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
12 years agoLinux 3.5-rc5 v3.5-rc5
Linus Torvalds [Sat, 30 Jun 2012 23:08:57 +0000 (16:08 -0700)]
Linux 3.5-rc5