Ram Pai [Thu, 22 Sep 2011 07:48:58 +0000 (15:48 +0800)]
Resource: fix wrong resource window calculation
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window. This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.
__find_resource() looks for gaps in resource allocation among the
children resource windows. When it encounters the last child window it
blindly tries the range next to the one allocated to the last child. Since
the last child's window ends at 0xffffffff, the calculation overflows,
leading the algorithm to believe that any window in the range 0x00000000
to 0xffffffff is available for allocation. This leads to a conflicting
window allocation.
Michal Ludvig reported this issue seen on his platform. The following
patch fixes the problem and has been verified by Michal. I believe this
bug has been there for ages. It got exposed by git commit 2bbc6942273b
("PCI : ability to relocate assigned pci-resources").
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
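Here is a minimal user-space sketch of the overflow described above. It assumes 32-bit resource addresses and inclusive window bounds. The struct and both helpers are hypothetical stand-ins, not the kernel's __find_resource(). The guarded variant simply refuses to look for a gap past a child that already ends at the parent's end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource window with inclusive bounds, 32-bit addresses. */
struct window {
	uint32_t start;
	uint32_t end;
};

/*
 * Buggy pattern: the gap after the last child starts at last->end + 1.
 * When last->end == 0xffffffff the addition wraps to 0, so the "gap"
 * appears to span the whole parent window.
 */
static bool naive_gap_after(const struct window *last, uint32_t parent_end,
			    struct window *gap)
{
	gap->start = last->end + 1;	/* wraps to 0 for 0xffffffff */
	gap->end = parent_end;
	return gap->start <= gap->end;
}

/* Guarded pattern: no gap can exist after a child that ends at the top. */
static bool guarded_gap_after(const struct window *last, uint32_t parent_end,
			      struct window *gap)
{
	assert(last->start <= last->end);
	if (last->end >= parent_end)
		return false;
	gap->start = last->end + 1;	/* cannot wrap: last->end < parent_end */
	gap->end = parent_end;
	return true;
}
```

Suppose the last child ends at 0xffffffff. In that case naive_gap_after() reports a free range of 0x00000000-0xffffffff, which is the conflicting allocation the commit describes.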
Merge branch 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus
* 'v4l_for_linus' of git://linuxtv.org/mchehab/for_linus:
[media] omap3isp: Fix build error in ispccdc.c
[media] uvcvideo: Fix crash when linking entities
[media] v4l: Make sure we hold a reference to the v4l2_device before using it
[media] v4l: Fix use-after-free case in v4l2_device_release
[media] uvcvideo: Set alternate setting 0 on resume if the bus has been reset
[media] OMAP_VOUT: Fix build break caused by update_mode removal in DSS2
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] cio: fix cio_tpi ignoring adapter interrupts
[S390] gmap: always up mmap_sem properly
[S390] Do not clobber personality flags on exec
* git://github.com/davem330/sparc:
sparc64: Force the execute bit in OpenFirmware's translation entries.
sparc: Make '-p' boot option meaningful again.
sparc, exec: remove redundant addr_limit assignment
sparc64: Future proof Niagara cpu detection.
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux
* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux:
drm/i915: FBC off for ironlake and older, otherwise on by default
drm/i915: Enable SDVO hotplug interrupts for HDMI and DVI
drm/i915: Enable dither whenever display bpc < frame buffer bpc
powerpc: Fix device-tree matching for Apple U4 bridge
Apple Quad G5 has some oddity in its device-tree which causes the new
generic matching code to fail to relate nodes for PCI-E devices below U4
with their respective struct pci_dev. This breaks graphics on those
machines among others.
This fixes it using a quirk which copies the node pointer from the host
bridge for the root complex, which makes the generic code work for the
children afterward.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bootup: move 'usermodehelper_enable()' a little earlier
Commit d5767c53535a ("bootup: move 'usermodehelper_enable()' to the end
of do_basic_setup()") moved 'usermodehelper_enable()' to the end of
do_basic_setup(), after the initcalls. But after that change, uvesafb no
longer works on my computer, and I lose the boot splash.
So maybe we could call usermodehelper_enable() a little earlier, so that
tasks that need user-mode help during early init can work.
[ I would *really* prefer that initcalls not call into user space - even
the real 'init' hasn't been execve'd yet, after all! But for uvesafb
it really does look like we don't have much choice.
I considered doing this when we mount the root filesystem, but
depending on config options that is in multiple places. We could do
the usermode helper enable as a rootfs_initcall().
So I'm just using wang yanqing's trivial patch. It's not wonderful,
but it's simple and should work. We should revisit this some day,
though. - Linus ]
David S. Miller [Thu, 29 Sep 2011 19:18:59 +0000 (12:18 -0700)]
sparc64: Force the execute bit in OpenFirmware's translation entries.
In the OF 'translations' property, the template TTEs in the mappings
never specify the executable bit. This is the case even though some
of these mappings are for OF's code segment.
Therefore, we need to force the execute bit on in every mapping.
This problem can only really trigger on Niagara/sun4v machines and the
history behind this is a little complicated.
Previous to sun4v, the sun4u TTE entries lacked a hardware execute
permission bit. So OF didn't have to ever worry about setting
anything to handle executable pages. Any valid TTE loaded into the
I-TLB would be respected by the chip.
But sun4v Niagara chips have a real hardware enforced executable bit
in their TTEs. So it has to be set or else the I-TLB throws an
instruction access exception with type code 6 (protection violation).
We've been extremely fortunate to not get bitten by this in the past.
The best I can tell is that the OF's mappings for its executable code
were mapped using permanent locked mappings on sun4v in the past.
Therefore, the fact that we didn't have the exec bit set in the OF
translations we would use did not matter in practice.
Thanks to Greg Onufer for helping me track this down.
Signed-off-by: David S. Miller <davem@davemloft.net>
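Here is a sketch of forcing the execute bit on every translation entry. The struct layout is a hypothetical simplification of the OF 'translations' property. The mask value 0x80 is meant to correspond to the sun4v TTE execute bit (_PAGE_EXEC_4V in the kernel), but that value is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed sun4v hardware execute-permission bit (illustrative value). */
#define HYP_TTE_EXEC_4V 0x80ULL

/* Hypothetical shape of one entry from OF's 'translations' property. */
struct of_translation {
	uint64_t virt;
	uint64_t size;
	uint64_t tte_template;	/* OF never sets the exec bit here */
};

/* Force the execute bit on in every template TTE before it is used. */
static void force_exec_bit(struct of_translation *trans, size_t n)
{
	assert(trans != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		trans[i].tte_template |= HYP_TTE_EXEC_4V;
}
```

OR-ing in the bit is idempotent, so entries that already carry it are unaffected.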
Paul E. McKenney [Wed, 24 Aug 2011 23:52:09 +0000 (16:52 -0700)]
rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
It is possible for the CPU that noted the end of the prior grace period
to not need a new one, and therefore to decide to propagate ->completed
throughout the rcu_node tree without starting another grace period.
However, in so doing, it releases the root rcu_node structure's lock,
which can allow some other CPU to start another grace period. The first
CPU will be propagating ->completed in parallel with the second CPU
initializing the rcu_node tree for the new grace period. In theory
this is harmless, but in practice we need to keep things simple.
This commit therefore moves the propagation of ->completed to
rcu_report_qs_rsp(), and refrains from marking the old grace period
as having been completed until it has finished doing this. This
prevents anyone from starting a new grace period concurrently with
marking the old grace period as having been completed.
Of course, the optimization where a CPU needing a new grace period
doesn't bother marking the old one completed is still in effect:
In that case, the marking happens implicitly as part of initializing
the new grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Sun, 21 Aug 2011 01:29:32 +0000 (18:29 -0700)]
rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
The purpose of rcu_needs_cpu_flush() was to iterate on pushing the
current grace period in order to help the current CPU enter dyntick-idle
mode. However, this can result in failures if the CPU starts entering
dyntick-idle mode, but then backs out. In this case, the call to
rcu_pending() from rcu_needs_cpu_flush() might end up announcing a
non-existing quiescent state.
This commit therefore removes rcu_needs_cpu_flush() in favor of letting
the dyntick-idle machinery at the end of the softirq handler push the
loop along via its call to rcu_pending().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Mike Galbraith [Fri, 19 Aug 2011 18:39:11 +0000 (11:39 -0700)]
rcu: Wire up RCU_BOOST_PRIO for rcutree
RCU boost threads start life at RCU_BOOST_PRIO, while others remain
at RCU_KTHREAD_PRIO. While here, change thread names to match other
kthreads, and adjust rcu_yield() to not override the priority set by
the user. This last change sets the stage for runtime changes to
priority in the -rt tree.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Thu, 18 Aug 2011 16:30:32 +0000 (09:30 -0700)]
rcu: Make rcu_torture_boost() exit loops at end of test
One of the loops in rcu_torture_boost() fails to check kthread_should_stop(),
and thus might be slowing or even stopping completion of rcutorture tests
at rmmod time. This commit adds the kthread_should_stop() check to the
offending loop.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Wed, 17 Aug 2011 19:39:34 +0000 (12:39 -0700)]
rcu: Make rcu_torture_fqs() exit loops at end of test
The rcu_torture_fqs() function can prevent the rcutorture tests from
completing, resulting in a hang. This commit therefore ensures that
rcu_torture_fqs() will exit its inner loops at the end of the test,
and also applies the newish ULONG_CMP_LT() macro to time comparisons.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
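For reference, here is a user-space sketch of the wraparound-safe comparison behind ULONG_CMP_LT(). The macro bodies use the usual "difference compared against half the range" construction and are assumed to match the kernel's definitions. time_reached() is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wraparound-safe comparisons in the style of ULONG_CMP_GE()/ULONG_CMP_LT(). */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Has the (jiffies-like) counter 'now' reached 'deadline'? */
static bool time_reached(unsigned long now, unsigned long deadline)
{
	return ULONG_CMP_GE(now, deadline);
}
```

A plain `a < b` gives the wrong answer once a jiffies-like counter wraps. The macros stay correct as long as the two values are less than half the range apart.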
Paul E. McKenney [Wed, 17 Aug 2011 00:46:46 +0000 (17:46 -0700)]
rcu: Permit rt_mutex_unlock() with irqs disabled
Create a separate lockdep class for the rt_mutex used for RCU priority
boosting and enable use of rt_mutex_unlock() with irqs disabled. This
prevents RCU priority boosting from falling prey to deadlocks when
someone begins an RCU read-side critical section in preemptible state,
but releases it with an irq-disabled lock held.
Unfortunately, the scheduler's runqueue and priority-inheritance locks
still must either completely enclose or be completely enclosed by any
overlapping RCU read-side critical section.
This version removes a redundant local_irq_restore() noted by
Yong Zhang.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Sun, 14 Aug 2011 22:56:54 +0000 (15:56 -0700)]
rcu: Avoid having just-onlined CPU resched itself when RCU is idle
CPUs set rdp->qs_pending when coming online to resolve races with
grace-period start. However, this means that if RCU is idle, the
just-onlined CPU might needlessly send itself resched IPIs. Adjust
the online-CPU initialization to avoid this, and also to correctly
cause the CPU to respond to the current grace period if needed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Josh Boyer <jwboyer@redhat.com>
Tested-by: Christian Hoffmann <email@christianhoffmann.info>
Paul E. McKenney [Sat, 13 Aug 2011 20:31:47 +0000 (13:31 -0700)]
rcu: Suppress NMI backtraces when stall ends before dump
It is possible for an RCU CPU stall to end just as it is detected, in
which case the current code will uselessly dump all CPUs' stacks.
This commit therefore checks for this condition and refrains from
sending needless NMIs.
And yes, the stall might also end just after we checked all CPUs and
tasks, but in that case we would at least have given some clue as
to which CPU/task was at fault.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Greater use of RCU during early boot (before the scheduler is operating)
is causing RCU to attempt to start grace periods during that time, which
in turn is resulting in both RCU and the callback functions attempting
to use the scheduler before it is ready.
This commit prevents these problems by prohibiting RCU grace periods
until after the scheduler has spawned the first non-idle task.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit 7765be (Fix RCU_BOOST race handling current->rcu_read_unlock_special)
introduced a new ->rcu_boosted field in the task structure. This is
redundant because the existing ->rcu_boost_mutex will be non-NULL at
any time that ->rcu_boosted is nonzero. Therefore, this commit removes
->rcu_boosted and tests ->rcu_boost_mutex instead.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcu: Prevent early boot set_need_resched() from __rcu_pending()
There isn't a whole lot of point in poking the scheduler before there
are other tasks to switch to. This commit therefore adds a check
for rcu_scheduler_fully_active in __rcu_pending() to suppress any
pre-scheduler calls to set_need_resched(). The downside of this approach
is additional runtime overhead in a reasonably hot code path.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcu: Dump local stack if cannot dump all CPUs' stacks
The trigger_all_cpu_backtrace() function is a no-op in architectures that
do not define arch_trigger_all_cpu_backtrace. On such architectures, RCU
CPU stall warning messages contain no stack trace information, which makes
debugging quite difficult. This commit therefore substitutes dump_stack()
for architectures that do not define arch_trigger_all_cpu_backtrace,
so that at least the local CPU's stack is dumped as part of the RCU CPU
stall warning message.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
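Here is a user-space sketch of the reporting logic from this commit, combined with the earlier one that suppresses NMI backtraces when the stall has already ended. All functions are hypothetical stubs. The sketch assumes the wrapper returns false when arch_trigger_all_cpu_backtrace is not defined, which is one way to let the caller fall back to a local dump. The message text is illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Returns true only if an architecture hook exists to dump all CPUs. */
static bool trigger_all_cpu_backtrace_stub(void)
{
#ifdef arch_trigger_all_cpu_backtrace
	arch_trigger_all_cpu_backtrace();
	return true;
#else
	return false;	/* no arch support: caller must fall back */
#endif
}

static int local_dumps;	/* counts simulated dump_stack() calls */

static void dump_stack_stub(void)
{
	local_dumps++;
	puts("(local CPU stack would be dumped here)");
}

/* Report a stall that was detected on 'ndetected' CPUs/tasks. */
static void report_stall(int ndetected)
{
	if (ndetected == 0) {
		puts("INFO: Stall ended before state dump start");
		return;	/* stall already over: send no NMIs */
	}
	if (!trigger_all_cpu_backtrace_stub())
		dump_stack_stub();	/* at least show the local CPU */
}
```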
rcu: Move __rcu_read_unlock()'s barrier() within if-statement
We only need to constrain the compiler if we are actually exiting
the top-level RCU read-side critical section. This commit therefore
moves the first barrier() call in __rcu_read_unlock() to inside the
"if" statement, thus avoiding needless register flushes for inner
rcu_read_unlock() calls.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
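Here is a simplified, single-threaded sketch of the barrier placement described above. It assumes GCC or Clang for the compiler-only barrier and uses a hypothetical task structure. It omits details of the real __rcu_read_unlock(), such as interrupt handling. It only shows that inner unlocks skip barrier() while the outermost unlock keeps it.

```c
#include <assert.h>

/* Compiler-only barrier (GCC/Clang), similar in spirit to barrier(). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical per-task state with only the fields this sketch needs. */
struct task {
	int rcu_read_lock_nesting;
	int rcu_read_unlock_special;
	int special_calls;
};

static void rcu_read_unlock_special_stub(struct task *t)
{
	t->special_calls++;
	t->rcu_read_unlock_special = 0;
}

static void rcu_read_lock_sketch(struct task *t)
{
	t->rcu_read_lock_nesting++;
	barrier();	/* keep the critical section after the increment */
}

static void rcu_read_unlock_sketch(struct task *t)
{
	assert(t->rcu_read_lock_nesting > 0);
	if (t->rcu_read_lock_nesting != 1) {
		/* Inner unlock: just drop the count, no barrier needed. */
		t->rcu_read_lock_nesting--;
		return;
	}
	/* Outermost unlock: critical section must stay before the exit. */
	barrier();
	t->rcu_read_lock_nesting = 0;
	barrier();
	if (t->rcu_read_unlock_special)
		rcu_read_unlock_special_stub(t);
}
```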
rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
The differences between rcu_assign_pointer() and RCU_INIT_POINTER() are
subtle, and it is easy to use the cheaper RCU_INIT_POINTER() when
the more-expensive rcu_assign_pointer() should have been used instead.
The consequences of this mistake are quite severe.
This commit therefore carefully lays out the situations in which it is
permissible to use RCU_INIT_POINTER().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Eric Dumazet [Mon, 1 Aug 2011 05:09:25 +0000 (22:09 -0700)]
rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
Recent changes to gcc give warning messages on rcu_assign_pointer()'s
checks that allow it to determine when it is OK to omit the memory
barrier. Stephen Hemminger tried a number of gcc tricks to silence
this warning, but #pragmas and CPP macros do not work together in the
way that would be required to make this work.
However, we now have RCU_INIT_POINTER(), which already omits this
memory barrier, and which therefore may be used when assigning NULL to
an RCU-protected pointer that is accessible to readers. This commit
therefore makes rcu_assign_pointer() unconditionally emit the memory
barrier.
Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
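Here is a sketch of the distinction, using C11 atomics as an analogy. rcu_assign_pointer() behaves like a release store that orders initialization before publication. RCU_INIT_POINTER() is a plain store, which is only safe when readers cannot observe anything new through the pointer, for example when storing NULL. The *_sketch macros are hypothetical. They are not the kernel's implementations, which are built from the kernel's own barrier primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
	int value;
};

/* RCU-protected global pointer that readers load concurrently. */
static _Atomic(struct config *) gp;

/* Publish: release ordering makes *p's initialization visible first. */
#define rcu_assign_pointer_sketch(ptr, p) \
	atomic_store_explicit(&(ptr), (p), memory_order_release)

/* No ordering: only safe when there is nothing new to see, e.g. NULL. */
#define RCU_INIT_POINTER_sketch(ptr, p) \
	atomic_store_explicit(&(ptr), (p), memory_order_relaxed)

/* Reader side: dependency-ordered load (compilers treat it as acquire). */
#define rcu_dereference_sketch(ptr) \
	atomic_load_explicit(&(ptr), memory_order_consume)

static struct config cfg_storage;

static void publish_config(int value)
{
	cfg_storage.value = value;			/* initialize first... */
	rcu_assign_pointer_sketch(gp, &cfg_storage);	/* ...then publish */
}

static void retract_config(void)
{
	RCU_INIT_POINTER_sketch(gp, NULL);	/* NULL carries no data to order */
}
```

A real updater would also wait for a grace period before reusing cfg_storage after retracting it. That step is omitted here.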
Paul E. McKenney [Sat, 30 Jul 2011 14:32:48 +0000 (07:32 -0700)]
rcu: Make rcu_implicit_dynticks_qs() locals be correct size
When the ->dynticks field in the rcu_dynticks structure changed to an
atomic_t, its size on 64-bit systems changed from 64 bits to 32 bits.
The local variables in rcu_implicit_dynticks_qs() need to change as
well, hence this commit.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
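Here is a sketch of why the locals should match the 32-bit counter. It assumes a hypothetical UINT_CMP_GE() helper built like ULONG_CMP_GE(). It also assumes the counter is even while the CPU is in dyntick-idle mode, so an even value, or an advance of at least 2 since the snapshot, means the CPU passed through a quiescent state. On an LP64 system, doing the same arithmetic in sign-extended 64-bit locals could break the wraparound comparison once the 32-bit counter crosses INT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical 32-bit analogue of ULONG_CMP_GE(). */
#define UINT_CMP_GE(a, b) (UINT_MAX / 2 >= (a) - (b))

/* Stand-in for the dynticks state after ->dynticks became an atomic_t. */
struct dynticks_sketch {
	atomic_int dynticks;	/* assumed even while in dyntick-idle mode */
};

/*
 * Returns true if the CPU is idle now or has passed through idle since
 * 'snap_raw' was taken. The locals are 32-bit to match the counter.
 */
static bool implicit_dynticks_qs_sketch(struct dynticks_sketch *d, int snap_raw)
{
	unsigned int curr = (unsigned int)atomic_load(&d->dynticks);
	unsigned int snap = (unsigned int)snap_raw;

	return (curr & 0x1) == 0 || UINT_CMP_GE(curr, snap + 2);
}
```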
rcu: Eliminate in_irq() checks in rcu_enter_nohz()
The in_irq() check in rcu_enter_nohz() is redundant because if we really
are in an interrupt, the attempt to re-enter dyntick-idle mode will invoke
rcu_needs_cpu() in any case, which will force the check for RCU callbacks.
So this commit removes the check along with the set_need_resched().
Suggested-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Shi, Alex [Thu, 28 Jul 2011 06:56:12 +0000 (14:56 +0800)]
nohz: Remove nohz_cpu_mask
RCU no longer uses this global variable, nor does anyone else. This
commit therefore removes this variable. This reduces memory footprint
and also removes some atomic instructions and memory barriers from
the dyntick-idle path.
Signed-off-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Thu, 21 Jul 2011 23:00:17 +0000 (16:00 -0700)]
rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
When rcutorture is compiled directly into the kernel
(instead of separately as a module), it is necessary to specify
rcutorture.stat_interval as a kernel command-line parameter, otherwise,
the rcu_torture_stats kthread is never started. However, when working
with the system after it has booted, it is convenient to be able to
change the time between statistic printing, particularly when logged
into the console.
This commit therefore allows the stat_interval parameter to be changed
at runtime.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Mon, 18 Jul 2011 23:54:51 +0000 (16:54 -0700)]
rcu: Remove unused and redundant interfaces
The rcu_dereference_bh_protected() and rcu_dereference_sched_protected()
macros are synonyms for rcu_dereference_protected() and are not used
anywhere in mainline. This commit therefore removes them.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Michal Hocko [Fri, 8 Jul 2011 15:48:24 +0000 (08:48 -0700)]
rcu: Not necessary to pass rcu_read_lock_held() to rcu_dereference_protected()
Since ca5ecddf (rcu: define __rcu address space modifier for sparse),
rcu_dereference_check() uses rcu_read_lock_held() as part of its condition
automatically. Therefore, callers of rcu_dereference_check() no longer
need to pass rcu_read_lock_held() to rcu_dereference_check().
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Mon, 27 Jun 2011 07:17:43 +0000 (00:17 -0700)]
rcu: Simplify quiescent-state accounting
There is often a delay between the time that a CPU passes through a
quiescent state and the time that this quiescent state is reported to the
RCU core. It is quite possible that the grace period ended before the
quiescent state could be reported, for example, some other CPU might have
deduced that this CPU passed through dyntick-idle mode. It is critically
important that a quiescent state be counted only against the grace period
that was in effect at the time that the quiescent state was detected.
Previously, this was handled by recording the number of the last grace
period to complete when passing through a quiescent state. The RCU
core then checks this number against the current value, and rejects
the quiescent state if there is a mismatch. However, one additional
possibility must be accounted for, namely that the quiescent state was
recorded after the prior grace period completed but before the current
grace period started. In this case, the RCU core must reject the
quiescent state, but the recorded number will match. This is handled
when the CPU becomes aware of a new grace period -- at that point,
it invalidates any prior quiescent state.
This works, but is a bit indirect. The new approach records the current
grace period, and the RCU core checks to see (1) that this is still the
current grace period and (2) that this grace period has not yet ended.
This approach simplifies reasoning about correctness, and this commit
changes over to this new approach.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
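Here is a minimal sketch of the new approach, using hypothetical, heavily simplified structures. In it, gpnum is the most recently started grace period and completed is the most recently finished one. The recorded grace-period number must still be current, and that grace period must not yet have ended. As a result, a quiescent state noted between two grace periods is rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified grace-period state. */
struct gp_state {
	unsigned long gpnum;		/* most recently started GP */
	unsigned long completed;	/* most recently completed GP */
};

struct cpu_qs {
	bool passed_quiesce;
	unsigned long passed_quiesce_gpnum;	/* GP current at the QS */
};

/* CPU side: record the grace period in effect at the quiescent state. */
static void note_quiescent_state(struct cpu_qs *c, const struct gp_state *gs)
{
	c->passed_quiesce = true;
	c->passed_quiesce_gpnum = gs->gpnum;
}

/* Core side: (1) that GP is still current and (2) it has not yet ended. */
static bool accept_quiescent_state(const struct cpu_qs *c,
				   const struct gp_state *gs)
{
	return c->passed_quiesce &&
	       c->passed_quiesce_gpnum == gs->gpnum &&
	       gs->completed != gs->gpnum;
}
```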
Paul E. McKenney [Sat, 25 Jun 2011 13:36:56 +0000 (06:36 -0700)]
rcu: Add grace-period, quiescent-state, and call_rcu trace events
Add trace events to record grace-period start and end, quiescent states,
CPUs noticing grace-period start and end, grace-period initialization,
call_rcu() invocation, tasks blocking in RCU read-side critical sections,
tasks exiting those same critical sections, force_quiescent_state()
detection of dyntick-idle and offline CPUs, CPUs entering and leaving
dyntick-idle mode (except from NMIs), CPUs coming online and going
offline, and CPUs being kicked for staying in dyntick-idle mode for too
long (as in many weeks, even on 32-bit systems).
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcu: Add the rcu flavor to callback trace events
The earlier trace events for registering RCU callbacks and for invoking
them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
This commit adds the RCU flavor to those trace events.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Sat, 18 Jun 2011 16:55:39 +0000 (09:55 -0700)]
rcu: Make TINY_RCU also use softirq for RCU_BOOST=n
This patch #ifdefs TINY_RCU kthreads out of the kernel unless RCU_BOOST=y,
thus eliminating context-switch overhead if RCU priority boosting has
not been configured.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Tue, 21 Jun 2011 08:59:33 +0000 (01:59 -0700)]
rcu: Move RCU_BOOST declarations to allow compiler checking
Andi Kleen noticed that one of the RCU_BOOST data declarations was
out of sync with the definition. Move the declarations so that the
compiler can do the checking in the future.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Tue, 21 Jun 2011 08:14:54 +0000 (01:14 -0700)]
rcu: Add RCU type to callback-invocation tracing
Add a string to the rcu_batch_start() and rcu_batch_end() trace
messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
"rcu_preempt"). The trace messages for the actual invocations
themselves are not marked, as it should be clear from the
rcu_batch_start() and rcu_batch_end() events before and after.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Tue, 21 Jun 2011 07:13:44 +0000 (00:13 -0700)]
rcu: Put names into TINY_RCU structures under RCU_TRACE
In order to allow event tracing to distinguish between flavors of
RCU, we need those names in the relevant RCU data structures. TINY_RCU
has avoided them for memory-footprint reasons, so add them only if
CONFIG_RCU_TRACE=y.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Sun, 19 Jun 2011 05:26:31 +0000 (22:26 -0700)]
rcu: Event-trace markers for computing RCU CPU utilization
This commit adds the trace_rcu_utilization() marker that is to be
used to allow postprocessing scripts to compute RCU's CPU utilization,
give or take event-trace overhead. Note that we do not include RCU's
dyntick-idle interface because event tracing requires RCU protection,
which is not available in dyntick-idle mode.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Fri, 17 Jun 2011 22:53:19 +0000 (15:53 -0700)]
rcu: Add event-tracing for RCU callback invocation
There was recently some controversy about the overhead of invoking RCU
callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
start and stop of a batch of callbacks and also for each callback invoked.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Tue, 21 Jun 2011 08:48:03 +0000 (01:48 -0700)]
rcu: Don't destroy rcu_torture_boost() callback until it is done
The rcu_torture_boost() cleanup code destroyed debug-objects state before
waiting for the last RCU callback to be invoked, resulting in rare but
very real debug-objects warnings. Move the destruction to after the
waiting to fix this problem.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcu: Drive configuration directly from SMP and PREEMPT
This commit eliminates the possibility of running TREE_PREEMPT_RCU
when SMP=n and of running TINY_RCU when PREEMPT=y. People who really
want these combinations can hand-edit init/Kconfig, but eliminating
them as choices for production systems reduces the amount of testing
required. It will also allow cutting out a few #ifdefs.
Note that running TREE_RCU and TINY_RCU on single-CPU systems using
SMP-built kernels is still supported.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
It has long been the case that the architecture must call nmi_enter()
and nmi_exit() rather than irq_enter() and irq_exit() in order to
permit RCU read-side critical sections in NMIs. Catch the documentation
up with reality.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Now that the RCU API contains synchronize_rcu_bh(), synchronize_sched(),
call_rcu_sched(), and synchronize_rcu_bh_expedited()...
Make rcutorture test synchronize_rcu_bh(), getting rid of the old
rcu_bh_torture_synchronize() workaround. Similarly, make rcutorture test
synchronize_sched(), getting rid of the old sched_torture_synchronize()
workaround. Make rcutorture test call_rcu_sched() instead of wrapping
synchronize_sched(). Also add testing of synchronize_rcu_bh_expedited().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Fri, 27 May 2011 05:14:36 +0000 (22:14 -0700)]
rcu: Abstract common code for RCU grace-period-wait primitives
Pull the code that waits for an RCU grace period into a single function,
which is then called by synchronize_rcu() and friends in the case of
TREE_RCU and TREE_PREEMPT_RCU, and from rcu_barrier() and friends in
the case of TINY_RCU and TINY_PREEMPT_RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
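Here is a single-threaded sketch of one shared wait routine, parameterized by the call_rcu() flavor. The names mirror kernel ones (wait_rcu_gp(), wakeme_after_rcu(), struct rcu_synchronize), but every definition is a simplified stand-in. The fake flavors end their grace period immediately, so the completion is already set when wait_rcu_gp() checks it. Real code would instead sleep in wait_for_completion().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

typedef void (*call_rcu_func_t)(struct rcu_head *head,
				void (*func)(struct rcu_head *head));

/* Minimal single-threaded stand-in for struct completion. */
struct completion {
	bool done;
};

struct rcu_synchronize {
	struct rcu_head head;
	struct completion completion;
};

/* Callback: runs after the grace period and signals the waiter. */
static void wakeme_after_rcu(struct rcu_head *head)
{
	struct rcu_synchronize *rcu =
		container_of(head, struct rcu_synchronize, head);
	rcu->completion.done = true;
}

/* Shared wait routine: the flavor is chosen by the crf argument. */
static void wait_rcu_gp(call_rcu_func_t crf)
{
	struct rcu_synchronize rcu = { .completion = { false } };

	crf(&rcu.head, wakeme_after_rcu);
	assert(rcu.completion.done);	/* real code: wait_for_completion() */
}

/* Two fake flavors whose grace periods end immediately. */
static int calls_a, calls_b;

static void call_rcu_fake_a(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	calls_a++;
	head->func = func;
	head->func(head);
}

static void call_rcu_fake_b(struct rcu_head *head, void (*func)(struct rcu_head *))
{
	calls_b++;
	head->func = func;
	head->func(head);
}

static void synchronize_fake_a(void) { wait_rcu_gp(call_rcu_fake_a); }
static void synchronize_fake_b(void) { wait_rcu_gp(call_rcu_fake_b); }
```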
Update rcutorture documentation to account for boosting, new types of
RCU torture testing that have been added over the past few years, and
the memory-barrier testing that was added an embarrassingly long time
ago.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Take a first step towards untangling Linux kernel header files by
placing the struct rcu_head definition into include/linux/types.h
and including include/linux/types.h in include/linux/rcupdate.h
where struct rcu_head used to be defined. The actual inclusion point
for include/linux/types.h is with the rest of the #include directives
rather than at the point where struct rcu_head used to be defined,
as suggested by Mathieu Desnoyers.
Once this is in place, then header files that need only rcu_head
can include types.h rather than rcupdate.h.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Paul E. McKenney [Tue, 24 May 2011 15:31:09 +0000 (08:31 -0700)]
rcu: Restore checks for blocking in RCU read-side critical sections
Long ago, using TREE_RCU with PREEMPT would result in "scheduling
while atomic" diagnostics if you blocked in an RCU read-side critical
section. However, PREEMPT now implies TREE_PREEMPT_RCU, which defeats
this diagnostic. This commit therefore adds a replacement diagnostic
based on PROVE_RCU.
Because rcu_lockdep_assert() and lockdep_rcu_dereference() are now being
used for things that have nothing to do with rcu_dereference(), rename
lockdep_rcu_dereference() to lockdep_rcu_suspicious() and add a third
argument that is a string indicating what is suspicious. This third
argument is passed in from a new third argument to rcu_lockdep_assert().
Update all calls to rcu_lockdep_assert() to add an informative third
argument.
Also, add a pair of rcu_lockdep_assert() calls from within
rcu_note_context_switch(), one complaining if a context switch occurs
in an RCU-bh read-side critical section and another complaining if a
context switch occurs in an RCU-sched read-side critical section.
These are present only if the PROVE_RCU kernel configuration option is enabled.
Finally, fix some checkpatch whitespace complaints in lockdep.c.
Again, you must enable PROVE_RCU to see these new diagnostics. But you
are enabling PROVE_RCU to check out new RCU uses in any case, aren't you?
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
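Here is a user-space sketch of an assertion macro that carries an explanatory string through to the reporting function, as described above. The *_sketch names are hypothetical stand-ins for rcu_lockdep_assert() and lockdep_rcu_suspicious(). The message text is illustrative.

```c
#include <assert.h>
#include <stdio.h>

static int suspicious_reports;

/* Stand-in for a reporter taking file, line, and a "what is suspicious" string. */
static void lockdep_rcu_suspicious_sketch(const char *file, int line,
					  const char *s)
{
	suspicious_reports++;
	fprintf(stderr, "suspicious RCU usage at %s:%d: %s\n", file, line, s);
}

/* Complain, with an explanation, when condition c does not hold. */
#define rcu_lockdep_assert_sketch(c, s)					\
	do {								\
		if (!(c))						\
			lockdep_rcu_suspicious_sketch(__FILE__, __LINE__, (s)); \
	} while (0)

/* Hypothetical check made at context-switch time. */
static void context_switch_check(int in_rcu_bh_reader)
{
	rcu_lockdep_assert_sketch(!in_rcu_bh_reader,
		"Illegal context switch in RCU-bh read-side critical section");
}
```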
Shaohua Li [Thu, 16 Jun 2011 23:02:54 +0000 (16:02 -0700)]
rcu: Avoid unnecessary self-wakeup of per-CPU kthreads
There are a number of cases where RCU can find additional work
for the per-CPU kthread within the context of that per-CPU kthread.
In such cases, the per-CPU kthread is already running, so attempting
to wake itself up does nothing except waste CPU cycles. This commit
therefore checks to see if it is in the per-CPU kthread context,
omitting the wakeup in this case.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
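Here is a sketch of the check, using hypothetical task handles. current and rcu_cpu_kthread_task stand in for the running task and the per-CPU kthread, and the wakeup is replaced by a counter. Work is always marked pending, but the wakeup is skipped when the caller is the kthread itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task handle; 'current' stands in for the running task. */
struct task {
	int id;
};

static struct task *current;
static struct task *rcu_cpu_kthread_task;	/* per-CPU in real code */
static bool rcu_cpu_has_work;
static int wakeups;

static void wake_up_process_stub(struct task *t)
{
	(void)t;
	wakeups++;
}

/* Mark work pending; wake the kthread only if someone else is running. */
static void invoke_rcu_cpu_kthread_sketch(void)
{
	rcu_cpu_has_work = true;
	if (rcu_cpu_kthread_task != NULL && current != rcu_cpu_kthread_task)
		wake_up_process_stub(rcu_cpu_kthread_task);
	/* Otherwise the kthread found the work itself and will handle it. */
}
```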
Eric Dumazet [Thu, 16 Jun 2011 22:53:18 +0000 (15:53 -0700)]
rcu: Use kthread_create_on_node()
Commit a26ac2455ffc (move TREE_RCU from softirq to kthread) added
per-CPU kthreads. However, kthread creation uses kthread_create(), which
can put the kthread's stack and task struct on the wrong NUMA node.
Therefore, use kthread_create_on_node() instead of kthread_create()
so that the stacks and task structs are placed on the correct NUMA node.
A similar change was carried out in commit 94dcf29a11b3 (kthread:
use kthread_create_on_node()).
Also change rcutorture's priority-boost-test kthread creation.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tejun Heo <tj@kernel.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
bootup: move 'usermodehelper_enable()' to the end of do_basic_setup()
Doing it just before starting to call into cpu_idle() made a sick kind
of sense only because the original bug we fixed (see commit 288d5abec831: "Boot up with usermodehelper disabled") was about problems
with some scheduler data structures not being initialized, and they had
better be initialized at that point.
But it really didn't make any other conceptual sense, and doing it after
the initial "schedule()" call for the idle thread actually opened up a
race: what if the main initialization thread did everything without
needing to sleep, and got all the way into user land too? Without
actually having scheduled back to the idle thread?
Now, in normal circumstances that doesn't ever happen, but it looks like
Richard Cochran triggered exactly that on his ARM IXP4xx machines:
"I have some ARM IXP4xx based machines that use the two on chip MAC
ports (aka NPEs). The NPE needs a firmware in order to function.
Ever since the following commit [that 288d5abec831 one], it is no
longer possible to bring up the interfaces during the init scripts."
with a call trace showing an ioctl coming from user space. Richard says:
"The init is busybox, and the startup script does mount, syslogd, and
then ifup, so that all can go by quickly."
The fix is to move the usermodehelper_enable() into the main 'init'
thread, and just put it after we've done all our initcalls. By then,
everything really should be up, but we've obviously not actually started
the user-mode portion of init yet.
Reported-and-tested-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sage Weil [Wed, 28 Sep 2011 17:11:04 +0000 (10:11 -0700)]
libceph: fix pg_temp mapping update
The incremental map updates have a record for each pg_temp mapping that is
to be added/updated (len > 0) or removed (len == 0). The old code was
written as if the updates were a complete enumeration; that was just wrong.
Update the code to remove 0-length entries and drop the rbtree traversal.
This avoids misdirected (and hung) requests that manifest as server
errors like
[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
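The incremental-update rule described above (len > 0 adds or replaces, len == 0 removes, everything else is untouched) can be sketched as follows. This is a minimal illustration, assuming a small flat array in place of libceph's rbtree; struct pg_temp_map, pg_temp_find() and pg_temp_apply() are hypothetical names, not libceph functions.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PGS  16
#define MAX_OSDS 4

/* Hypothetical flat stand-in for the pg_temp rbtree. */
struct pg_temp {
    unsigned int pgid;
    int len;
    int osds[MAX_OSDS];
};

struct pg_temp_map {
    struct pg_temp e[MAX_PGS];
    size_t n;
};

static struct pg_temp *pg_temp_find(struct pg_temp_map *m, unsigned int pgid)
{
    for (size_t i = 0; i < m->n; i++)
        if (m->e[i].pgid == pgid)
            return &m->e[i];
    return NULL;
}

/* Apply one incremental record: len > 0 adds or replaces the mapping,
 * len == 0 removes it. Entries not named in the update are left alone,
 * because the update is not a complete enumeration. Returns 0 on success,
 * -1 if the record is invalid or the map is full. */
static int pg_temp_apply(struct pg_temp_map *m, unsigned int pgid,
                         int len, const int *osds)
{
    struct pg_temp *t = pg_temp_find(m, pgid);

    if (len < 0 || len > MAX_OSDS)
        return -1;
    if (len == 0) {
        if (t)
            *t = m->e[--m->n]; /* remove by moving the last entry in */
        return 0;
    }
    if (!t) {
        if (m->n == MAX_PGS)
            return -1;
        t = &m->e[m->n++];
        t->pgid = pgid;
    }
    t->len = len;
    memcpy(t->osds, osds, (size_t)len * sizeof(*osds));
    return 0;
}
```

The key design point is that an update never clears the map wholesale: only the pgids it names are touched.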
Sage Weil [Wed, 28 Sep 2011 17:08:27 +0000 (10:08 -0700)]
libceph: fix pg_temp mapping calculation
We need to apply the modulo pg_num calculation before looking up a pgid in
the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes
(some) misdirected requests that result in messages like
[WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11
on the server, and stall the client without a reply (at least until the
pg_temp mapping goes away, but that can take a long, long time).
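The point of the fix is the order of operations: normalize the placement seed against pg_num first, then look it up. The sketch below assumes a linear table instead of the rbtree and uses plain `%` as a simplification of the folding the real client applies; struct pgid, struct pg_temp_entry and pg_temp_lookup() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical placement-group id: pool number plus placement seed. */
struct pgid {
    unsigned int pool;
    unsigned int ps;
};

struct pg_temp_entry {
    struct pgid pgid;
    int primary_osd;
};

/* Reduce the raw seed modulo pg_num *before* the lookup, so that a raw
 * seed such as 9 with pg_num 8 finds the entry stored under ps 1. */
static const struct pg_temp_entry *
pg_temp_lookup(const struct pg_temp_entry *tbl, size_t n,
               struct pgid raw, unsigned int pg_num)
{
    struct pgid pgid = { raw.pool, raw.ps % pg_num };

    for (size_t i = 0; i < n; i++)
        if (tbl[i].pgid.pool == pgid.pool && tbl[i].pgid.ps == pgid.ps)
            return &tbl[i];
    return NULL;
}
```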
* git://github.com/davem330/net:
ipv6-multicast: Fix memory leak in IPv6 multicast.
ipv6: check return value for dst_alloc
net: check return value for dst_alloc
ipv6-multicast: Fix memory leak in input path.
bnx2x: add missing break in bnx2x_dcbnl_get_cap
bnx2x: fix WOL by enablement PME in config space
bnx2x: fix hw attention handling
net: fix a typo in Documentation/networking/scaling.txt
ath9k: Fix a dma warning/memory leak
rtlwifi: rtl8192cu: Fix unitialized struct
iwlagn: fix dangling scan request
batman-adv: do_bcast has to be true for broadcast packets only
cfg80211: Fix validation of AKM suites
iwlegacy: do not use interruptible waits
iwlegacy: fix command queue timeout
ath9k_hw: Fix Rx DMA stuck for AR9003 chips
* git://bedivere.hansenpartnership.com/git/scsi-rc-fixes-2.6:
[SCSI] 3w-9xxx: fix iommu_iova leak
[SCSI] cxgb3i: convert cdev->l2opt to use rcu to prevent NULL dereference
[SCSI] scsi: qla4xxx needs libiscsi.o
[SCSI] libsas: fix failure to revalidate domain for anything but the first expander child.
[SCSI] aacraid: reset should disable MSI interrupt
Hannes Reinecke [Wed, 28 Sep 2011 14:07:01 +0000 (08:07 -0600)]
block: Free queue resources at blk_release_queue()
A kernel crash is observed when a mounted ext3/ext4 filesystem is
physically removed. The problem is that blk_cleanup_queue() frees up
some resources, e.g. by calling elevator_exit(), and normal operation
does not check whether they are still there. So we should rather move
these calls to the destructor function blk_release_queue(), as at that
point all remaining references are gone. However, in doing so we have to
ensure that any externally supplied queue_lock is disconnected, as the
driver might free up the lock after the call to blk_cleanup_queue().
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
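The split between "cleanup" and "release" described above is the usual reference-counted destructor pattern. The following is a simplified sketch, assuming a plain integer refcount; struct queue_sketch and its helpers are hypothetical and much smaller than the real request_queue. Cleanup only switches the queue back to its own lock and drops a reference; the elevator is freed when the last reference goes away.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified request queue. */
struct queue_sketch {
    int refcount;
    void *elevator;      /* still in use while references exist */
    int internal_lock;   /* the queue's own lock */
    int *queue_lock;     /* may point at a driver-owned lock */
};

static int releases;

/* Destructor: runs only when the last reference is dropped, so nothing
 * can still be using the elevator when it is freed. */
static void release_queue_sketch(struct queue_sketch *q)
{
    free(q->elevator);
    q->elevator = NULL;
    releases++;
}

static void put_queue_sketch(struct queue_sketch *q)
{
    if (--q->refcount == 0)
        release_queue_sketch(q);
}

/* Cleanup no longer frees the elevator. It disconnects any external
 * lock (the driver may free it right after this call) and drops the
 * driver's reference. */
static void cleanup_queue_sketch(struct queue_sketch *q)
{
    q->queue_lock = &q->internal_lock;
    put_queue_sketch(q);
}
```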
Merge branch 'for-linus' of git://github.com/tiwai/sound
* 'for-linus' of git://github.com/tiwai/sound:
ASoC: ssm2602: Re-enable oscillator after suspend
ALSA: usb-audio: Check for possible chip NULL pointer before clearing probing flag
ALSA: hda/realtek - Don't detect LO jack when identical with HP
ALSA: hda/realtek - Avoid bogus HP-pin assignment
ALSA: HDA: No power nids on 92HD93
ASoC: omap-mcbsp: Do not attempt to change DAI sysclk if stream is active
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
That flag no longer makes sense, since we don't look up automount points
as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.
With our new non-eager automount semantics, one discussion has been
about adding an AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.
So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.
Acked-by: Ian Kent <raven@themaw.net> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
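The trailing-slash workaround mentioned above needs nothing beyond standard stat(). A minimal user-space sketch, assuming a fixed 4096-byte path buffer; stat_force_dir_automount() is a hypothetical helper name, not an existing API.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Call stat() on "path/" so that, per the commit text, a directory
 * automount on the final component is triggered. Returns -1 if the path
 * does not fit in the buffer, otherwise stat()'s own return value. */
static int stat_force_dir_automount(const char *path, struct stat *st)
{
    char buf[4096];
    size_t len = strlen(path);
    int n;

    if (len > 0 && path[len - 1] == '/')
        return stat(path, st); /* already ends in a slash */
    n = snprintf(buf, sizeof(buf), "%s/", path);
    if (n < 0 || (size_t)n >= sizeof(buf))
        return -1;
    return stat(buf, st);
}
```

Note that this only works for directories: a trailing slash on a non-directory makes stat() fail (typically with ENOTDIR), which matches the commit's "at least directory automounting" wording.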
Currently the internal oscillator is powered down when entering BIAS_OFF
state, but not re-enabled when going back to BIAS_STANDBY. As a result the
CODEC will stop working after suspend if the internal oscillator is used to
generate the sysclock signal. This patch fixes it by clearing the appropriate
bit in the power down register when the CODEC is re-enabled.
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@kernel.org
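The fix amounts to a symmetric bit update on the power-down register. The sketch below assumes hypothetical bit positions (PWR_OSC_PDN, PWR_POWER_OFF), since the commit does not give the SSM2602 register layout; set_bias_sketch() is likewise an illustrative name, not the driver's function.

```c
#include <stdint.h>

/* Hypothetical bit assignments for a codec power-down register; a set bit
 * powers the corresponding block down. Not the real SSM2602 layout. */
#define PWR_OSC_PDN   (1u << 5)
#define PWR_POWER_OFF (1u << 7)

enum bias_level { BIAS_OFF, BIAS_STANDBY, BIAS_ON };

/* Return the new register value for a bias transition. Entering OFF
 * powers everything down, including the oscillator; the fix is that
 * returning to STANDBY also clears the oscillator's power-down bit. */
static uint16_t set_bias_sketch(uint16_t pwr, enum bias_level level)
{
    switch (level) {
    case BIAS_OFF:
        pwr |= PWR_POWER_OFF | PWR_OSC_PDN;
        break;
    case BIAS_STANDBY:
        pwr &= (uint16_t)~(PWR_POWER_OFF | PWR_OSC_PDN);
        break;
    case BIAS_ON:
        break;
    }
    return pwr;
}
```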
VFS: Fix the remaining automounter semantics regressions
The consensus seems to be that system calls such as stat() etc. should
not trigger an automount. Neither should the l* versions.
This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
force automounting for their own reasons - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)
Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).
But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup. Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.
This commit just adds the flag and logic, no users yet, though. It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.
Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
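The decision the two commits above describe can be summarized as a flag test on the final path component. This sketch models only that flag test; the numeric flag values are hypothetical (only the names come from the commit text), and should_automount_last() is an illustrative helper, not a kernel function.

```c
#include <stdbool.h>

/* Hypothetical flag values; only the names come from the commit text. */
#define LOOKUP_FOLLOW    0x0001
#define LOOKUP_DIRECTORY 0x0002
#define LOOKUP_AUTOMOUNT 0x0004
#define LOOKUP_PARENT    0x0010
#define LOOKUP_OPEN      0x0100
#define LOOKUP_CREATE    0x0200

/* Should the final path component trigger an automount? LOOKUP_FOLLOW
 * alone no longer does; the flags that already force it implicitly still
 * do, and LOOKUP_AUTOMOUNT lets callers such as a bind mount of a file or
 * quotactl ask for it explicitly. */
static bool should_automount_last(unsigned int flags)
{
    return flags & (LOOKUP_OPEN | LOOKUP_DIRECTORY | LOOKUP_CREATE |
                    LOOKUP_PARENT | LOOKUP_AUTOMOUNT);
}
```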
ARM: EXYNOS4: Rename sclk_cam clocks for FIMC driver
The sclk_cam clocks are now controlled by the top level FIMC media
device driver bound to "s5p-fimc-md" platform device.
Rename sclk_cam clocks so they are accessible to the corresponding
driver.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
ARM: S5PV210: Rename sclk_cam clocks for FIMC media driver
The sclk_cam clocks are now controlled by the top level FIMC media
device driver bound to "s5p-fimc-md" platform device.
Rename sclk_cam clocks so they are accessible to the corresponding
driver.
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Merge branch 'hwmon-for-linus' of git://github.com/groeck/linux
* 'hwmon-for-linus' of git://github.com/groeck/linux:
hwmon: (coretemp) remove struct platform_data * parameter from create_core_data()
hwmon: (coretemp) constify static data
hwmon: (coretemp) don't use kernel assigned CPU number as platform device ID
hwmon: (ds620) Fix handling of negative temperatures
hwmon: (w83791d) rename prototype parameter from 'register' to 'reg'
hwmon: (coretemp) Don't use threshold registers for tempX_max
hwmon: (coretemp) Let the user force TjMax
hwmon: (coretemp) Drop duplicate function get_pkg_tjmax
Merge branch 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm
* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
ARM: 7099/1: futex: preserve oldval in SMP __futex_atomic_op
ARM: dma-mapping: free allocated page if unable to map
ARM: fix vmlinux.lds.S discarding sections
ARM: nommu: fix warning with checksyscalls.sh
ARM: 7091/1: errata: D-cache line maintenance operation by MVA may not succeed
Proper DMA unmapping and freeing of skbs has to be done in the rx
cleanup for EDMA chipsets when the device is unloaded. This also seems
to address a warning that occasionally shows up when the device is
unloaded.
Larry Finger [Fri, 23 Sep 2011 03:59:02 +0000 (22:59 -0500)]
rtlwifi: rtl8192cu: Fix unitialized struct
Driver rtl8192cu assigns a new struct rtl_tcb_desc object, but fails to
clear it.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Stable <stable@kernel.org> [2.6.39+] Signed-off-by: John W. Linville <linville@tuxdriver.com>
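The fix pattern is simply to zero the descriptor before filling it in. A minimal sketch, assuming a trimmed-down hypothetical struct (the real struct rtl_tcb_desc has different and more fields); prepare_desc_sketch() is an illustrative name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down transmit descriptor. */
struct tcb_desc_sketch {
    bool use_shortgi;
    bool use_rts;
    unsigned char hw_rate;
};

/* Start from a fully zeroed descriptor so any field the fill-in code does
 * not set explicitly has a defined value instead of stack garbage. */
static void prepare_desc_sketch(struct tcb_desc_sketch *desc,
                                unsigned char rate)
{
    memset(desc, 0, sizeof(*desc));
    desc->hw_rate = rate;
}
```

For a stack object, `struct tcb_desc_sketch desc = {0};` achieves the same effect at the point of declaration.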
Johannes Berg [Thu, 22 Sep 2011 21:59:04 +0000 (14:59 -0700)]
iwlagn: fix dangling scan request
If iwl_scan_initiate() fails for any reason,
priv->scan_request and priv->scan_vif are left
dangling. This can lead to a crash later when
iwl_bg_scan_completed() tries to run a pending
scan request.
In practice, this seems to be very rare due to
the STATUS_SCANNING check earlier. That check,
however, is wrong -- it should allow a scan to
be queued when a reset/roc scan is going on.
When a normal scan is already going on, a new
one can't be issued by mac80211, so that code
can be removed completely. I introduced this
bug when adding off-channel support in commit 266af4c745952e9bebf687dd68af58df553cb59d.
Cc: stable@kernel.org [3.0] Reported-by: Peng Yan <peng.yan@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
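The "dangling request" problem is fixed by undoing the bookkeeping on the failure path. The sketch below assumes a hypothetical struct with only the two pointers named in the commit; scan_initiate_sketch() stands in for iwl_scan_initiate() and fails on demand.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver state; only the pointer names follow the commit. */
struct scan_priv_sketch {
    void *scan_request;
    void *scan_vif;
};

/* Stand-in for iwl_scan_initiate(); the parameter decides the outcome. */
static int scan_initiate_sketch(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Record the request, try to start the scan, and on failure clear both
 * pointers so a later completion handler cannot act on a stale request. */
static int mac_hw_scan_sketch(struct scan_priv_sketch *priv, void *req,
                              void *vif, int should_fail)
{
    int ret;

    priv->scan_request = req;
    priv->scan_vif = vif;
    ret = scan_initiate_sketch(should_fail);
    if (ret) {
        priv->scan_request = NULL;
        priv->scan_vif = NULL;
    }
    return ret;
}
```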
PM / Clocks: Do not acquire a mutex under a spinlock
Commit b7ab83e (PM: Use spinlock instead of mutex in clock
management functions) introduced a regression causing clocks_mutex
to be acquired under a spinlock. This happens because
pm_clk_suspend() and pm_clk_resume() call pm_clk_acquire() under
pcd->lock, but pm_clk_acquire() executes clk_get() which causes
clocks_mutex to be acquired. Similarly, __pm_clk_remove(),
executed under pcd->lock, calls clk_put(), which also causes
clocks_mutex to be acquired.
To fix those problems make pm_clk_add() call pm_clk_acquire(), so
that pm_clk_suspend() and pm_clk_resume() don't have to do that.
Change pm_clk_remove() and pm_clk_destroy() to separate
modifications of the pcd->clock_list list from the actual removal of
PM clock entry objects done by __pm_clk_remove().
Reported-and-tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
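The restructuring described above (acquire clocks before taking the lock; detach list entries under the lock but release them after dropping it) can be modeled deterministically in user space. This is a sketch under stated assumptions: list_lock_held is a hypothetical counter standing in for holding pcd->lock, malloc()/free() stand in for clk_get()/clk_put(), and clk_put_sketch() asserts that it is never called with the lock held, since the real clk_put() may sleep on clocks_mutex.

```c
#include <assert.h>
#include <stdlib.h>

struct clk_entry {
    struct clk_entry *next;
};

static struct clk_entry *clock_list;
static int list_lock_held; /* models holding pcd->lock (a spinlock) */
static int puts_done;

static void lock_list(void)   { list_lock_held++; }
static void unlock_list(void) { list_lock_held--; }

/* Models clk_put(): may sleep, so it must not run under the spinlock. */
static void clk_put_sketch(struct clk_entry *ce)
{
    assert(!list_lock_held);
    free(ce);
    puts_done++;
}

/* Add: acquire the clock (models a sleeping clk_get()) before taking the
 * lock, so suspend/resume never have to acquire it under the lock. */
static int pm_clk_add_sketch(void)
{
    struct clk_entry *ce = malloc(sizeof(*ce));

    if (!ce)
        return -1;
    lock_list();
    ce->next = clock_list;
    clock_list = ce;
    unlock_list();
    return 0;
}

/* Destroy: detach the whole list while holding the lock, then release
 * the entries only after dropping it. */
static void pm_clk_destroy_sketch(void)
{
    struct clk_entry *list, *next;

    lock_list();
    list = clock_list;
    clock_list = NULL;
    unlock_list();

    for (; list; list = next) {
        next = list->next;
        clk_put_sketch(list);
    }
}
```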
Analogous to git commit 59e4c3a2fe9cb1681bb2cff508ff79466f7585ba,
do not clear the additional personality flags on exec. We
need to inherit the personality bits in PER_MASK across exec.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
James Bottomley [Sun, 18 Sep 2011 14:56:20 +0000 (18:56 +0400)]
[SCSI] 3w-9xxx: fix iommu_iova leak
Following reports on the list, it looks like the 3w-9xxx driver will leak DMA
mappings every time we get a transient queueing error back from the card.
This is because it maps the sg list in the routine that sends the command, but
doesn't unmap again in the transient failure path (even though the command is
sent back to the block layer). Fix by unmapping before returning the status.
Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Acked-by: Adam Radford <aradford@gmail.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
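The leak and its fix come down to pairing every map with an unmap on each exit path. The sketch below assumes a hypothetical command struct that just counts live mappings, and uses -EBUSY as a stand-in for whatever busy status the real driver returns to the SCSI midlayer.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical command that counts its live DMA mappings, so a leak shows
 * up as a count that keeps growing. */
struct cmd_sketch {
    int mapped;
};

static void map_sg_sketch(struct cmd_sketch *c)   { c->mapped++; }
static void unmap_sg_sketch(struct cmd_sketch *c) { c->mapped--; }

/* Map the scatter-gather list and post the command. On a transient busy
 * condition the command goes back to the block layer for a retry, so the
 * mapping must be undone here before returning. */
static int post_command_sketch(struct cmd_sketch *c, bool card_busy)
{
    map_sg_sketch(c);
    if (card_busy) {
        unmap_sg_sketch(c);
        return -EBUSY;
    }
    return 0; /* success: the completion path unmaps later */
}
```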
The root cause was an EEH error, which sent us down the offload_close path in
the cxgb3 driver, which in turn sets cdev->l2opt to NULL without regard for
upper-layer drivers (like the cxgbi drivers) that might have execution contexts
in the middle of using it. The result is the reported oops, when t3_l2t_get
attempts to dereference L2DATA(cdev)->nentries in arp_hash right after the EEH
error handler sets it to NULL.
The fix is to defer setting the pointer to NULL until there are no further
users of it. The t3cdev->l2opt pointer is now an RCU pointer, and the L2DATA
macro is now called under the protection of rcu_read_lock(). When the EEH
error path
t3_adapter_error->offload_close->cxgb3_offload_deactivate
is executed, setting the l2opt pointer to NULL is now gated on an RCU
quiescent point, allowing L2DATA callers to safely check for a NULL pointer
without concern that the underlying data will be freed before the pointer is
dereferenced.
This has been tested by the reporter and shown to fix the reported oops.
[nhorman: fix up uninitialised variable reported by Dan Carpenter] Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reviewed-by: Karen Xie <kxie@chelsio.com> Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
ALSA: hda/realtek - Don't detect LO jack when identical with HP
The spec->autocfg.line_out_pins[] may contain the same pins as hp_pins[]
depending on the configuration. When they are identical, detecting the
line_jack_present flag screws up the auto-mute because alc_line_automute()
is called unconditionally at initialization while it won't be triggered
by unsol events, thus the old line_jack_present flag is kept for the
whole run.
To fix this buggy behavior, the driver needs to check whether the
line-outs are really individual, and skip detection if they are the same
as the headphone jacks.
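The "are the line-outs really individual?" check can be sketched as a pin-list comparison. This assumes pins are plain ints rather than hda_nid_t, and line_outs_are_individual() is a hypothetical helper, not the Realtek driver's actual function.

```c
#include <stdbool.h>
#include <string.h>

/* Return true only if the line-out pin list differs from the headphone
 * pin list; when they are identical, line-out jack detection should be
 * skipped so the stale line_jack_present flag cannot affect auto-mute. */
static bool line_outs_are_individual(const int *line_out, int n_line,
                                     const int *hp, int n_hp)
{
    if (n_line == 0)
        return false; /* no line-outs at all */
    if (n_line != n_hp)
        return true;
    return memcmp(line_out, hp, (size_t)n_line * sizeof(*line_out)) != 0;
}
```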
Will Deacon [Fri, 23 Sep 2011 13:34:12 +0000 (14:34 +0100)]
ARM: 7099/1: futex: preserve oldval in SMP __futex_atomic_op
The SMP implementation of __futex_atomic_op clobbers oldval with the
status flag from the exclusive store. This causes it to always read as
zero when performing the FUTEX_OP_CMP_* operation.
This patch updates the ARM __futex_atomic_op implementations to take a
tmp argument, allowing us to store the strex status flag without
overwriting the register containing oldval.
Cc: stable@kernel.org Reported-by: Minho Ban <mhban@samsung.com> Reviewed-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
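The real fix is in ARM assembly (a separate tmp register for the strex status). As a rough user-space analogue, the sketch below swaps in a C11 compare-and-swap loop: the store attempt's success flag lives in its own variable, so the previous value reported through oldval is never overwritten by it. futex_add_sketch() and futex_cmp_eq_sketch() are hypothetical names modeling a FUTEX_OP_ADD followed by a FUTEX_OP_CMP_EQ-style test.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically add oparg to *uaddr and report the previous value through
 * oldval. The per-attempt status is kept in ok, separate from old. */
static int futex_add_sketch(atomic_int *uaddr, int oparg, int *oldval)
{
    int old = atomic_load(uaddr);
    bool ok;

    do {
        /* On failure, old is refreshed with the current value. */
        ok = atomic_compare_exchange_weak(uaddr, &old, old + oparg);
    } while (!ok);

    *oldval = old; /* the value seen before the update, not a status */
    return 0;
}

/* FUTEX_OP_CMP_EQ-style comparison, which must see the real oldval. */
static int futex_cmp_eq_sketch(int oldval, int cmparg)
{
    return oldval == cmparg;
}
```

If oldval had been clobbered by the store status (always 0 on success), the comparison would effectively always test 0 against cmparg, which is the symptom the commit describes.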