Uma Sharma [Mon, 24 Mar 2014 05:32:09 +0000 (22:32 -0700)]
rcu: Variable name changed in tree_plugin.h and used in tree.c
The variable and struct both having the name "rcu_state" confuses
sparse in some situations, so this commit changes the variable to
"rcu_state_p" in order to avoid this confusion. This also makes
things easier for human readers.
Signed-off-by: Uma Sharma <uma.sharma523@gmail.com>
[ paulmck: Changed the declaration and several additional uses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This splat is due to the fact that torture_init_begin() and
torture_init_end() are both marked with __init, despite their use
at runtime. This commit therefore removes __init from both functions.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
torture: Check for multiple concurrent torture tests
The torture tests are designed to run in isolation, but do not enforce
this isolation. This commit therefore checks for concurrent torture
tests, and refuses to start new tests while old tests are running.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
locktorture: Remove reference to nonexistent Kconfig parameter
The locktorture module references CONFIG_LOCK_TORTURE_TEST_RUNNABLE,
which does not exist. Which is a good thing, because otherwise
randconfig testing could enable both rcutorture and locktorture
concurrently, which the torture tests are not set up for. This
commit therefore removes the reference, so that test is runnable
immediately only when inserted as a module.
Reported-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcutorture: Run rcu_torture_writer at normal priority
There are usually lots of readers and only one writer, so if there has
to be a choice, we would want rcu_torture_writer to win. This commit
therefore removes the set_user_nice() from rcu_torture_writer().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Mon, 24 Mar 2014 20:15:49 +0000 (13:15 -0700)]
rcutorture: Note diffs from git commits
The current scripting only keeps track of the git SHA-1 of the current
HEAD. This can cause confusion in cases where testing ran in a git
tree where changes had not yet been checked in. This commit therefore
also records the output of "git diff HEAD" to provide the information
needed to reconstruct the source tree that was tested.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Thomas Gleixner [Sun, 23 Mar 2014 15:58:27 +0000 (08:58 -0700)]
rcutorture: Add missing destroy_timer_on_stack()
The rcu_torture_reader() function uses an on-stack timer_list structure
which it initializes with setup_timer_on_stack(). However, it fails to
use destroy_timer_on_stack() before exiting, which results in leaking a
tracking object if DEBUG_OBJECTS is enabled. This commit therefore
invokes destroy_timer_on_stack() to avoid this leakage.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Fri, 21 Mar 2014 23:17:56 +0000 (16:17 -0700)]
rcutorture: Explicitly test synchronous grace-period primitives
The original rcu_torture_writer() avoided testing the synchronous
grace-period primitives because they were simply wrappers around
call_rcu() invocations. The testing of these synchronous primitives
was delegated to the fake writers. However, there really is no excuse
not to test them, especially in the case of SRCU, where the wrappering
is somewhat more elaborate. This commit therefore makes the default
rcutorture parameters cause rcu_torture_writer() to include synchronous
grace-period primitives in its testing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Tue, 18 Mar 2014 17:34:18 +0000 (10:34 -0700)]
torture: Use elapsed time to detect hangs
The kvm-test-1-run.sh currently counts "sleep 1" commands to detect
hangs. This can fail spectacularly on busy systems, where "sleep 1"
might take far longer than one second to complete. This commit
therefore changes hang detection to use elapsed time measurements.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Tue, 18 Mar 2014 03:56:45 +0000 (20:56 -0700)]
rcutorture: Check for rcu_torture_fqs creation errors
The return value from torture_create_kthread() is currently ignored
when creating the rcu_torture_fqs kthread. This commit therefore
captures the return value so that it can be tested for errors.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Mon, 17 Mar 2014 20:42:33 +0000 (13:42 -0700)]
torture: Better summary diagnostics for build failures
The reaction of kvm-recheck.sh is obscure at best, and easy to miss
completely. This commit therefore prints "BUG: Build failed" in the
summary at the end of a run. This commit also adds the line of dashes
in cases where performance info is not available, and also avoids
printing nonsense diagnostics in cases where some of the normal test
output is not available. In addition, this commit saves off the .config
file even when the build fails.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Iulia Manda [Mon, 17 Mar 2014 13:21:21 +0000 (15:21 +0200)]
torture: Notice if an all-zero cpumask is passed inside a critical section
In torture_shuffle_tasks function, the check if an all-zero mask can
be passed to set_cpus_allowed_ptr() is redundant after clearing the
shuffle_idle_cpu bit. If the mask had more than one bit set, after
clearing a bit it has at least one bit set. If the mask had only
one bit set, a check is made at the beginning, where the function
returns, as there is no need to shuffle only one cpu.
Also, this code is executed inside a critical section, delimited by
get_online_cpus(), and put_online_cpus(), preventing CPUs from leaving between
the check of num_online_cpus and the calls to set_cpus_allowed_ptr() function.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 12 Mar 2014 17:26:35 +0000 (10:26 -0700)]
rcutorture: Make rcu_torture_reader() use cond_resched()
The rcu_torture_reader() function currently uses schedule(). This commit
therefore speeds things up a bit by substituting cond_resched().
This change makes rcu_torture_reader() more CPU-bound, so this commit
also adjusts the number of readers (the "nreaders" module parameter,
which feeds into the "nrealreaders" variable) to allow one CPU to be
free of readers on SMP systems. The point of this is to increase the
probability that readers will be watching while an updater makes a change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Mon, 17 Mar 2014 04:36:25 +0000 (21:36 -0700)]
sched,rcu: Make cond_resched() report RCU quiescent states
Given a CPU running a loop containing cond_resched(), with no
other tasks runnable on that CPU, RCU will eventually report RCU
CPU stall warnings due to lack of quiescent states. Fortunately,
every call to cond_resched() is a perfectly good quiescent state.
Unfortunately, invoking rcu_note_context_switch() is a bit heavyweight
for cond_resched(), especially given the need to disable preemption,
and, for RCU-preempt, interrupts as well.
This commit therefore maintains a per-CPU counter that causes
cond_resched(), cond_resched_lock(), and cond_resched_softirq() to call
rcu_note_context_switch(), but only about once per 256 invocations.
This ratio was chosen in keeping with the relative time constants of
RCU grace periods.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 16 Apr 2014 17:07:09 +0000 (10:07 -0700)]
percpu: Fix raw_cpu_inc_return()
The definition for raw_cpu_add_return() uses the operation prefix
"raw_add_return_", but the definitions in the various percpu.h files
expect "raw_cpu_add_return_". This commit therefore appropriately
adjusts the definition of raw_cpu_add_return().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 12 Mar 2014 14:10:41 +0000 (07:10 -0700)]
rcutorture: Export RCU grace-period kthread wait state to rcutorture
This commit allows rcutorture to print additional state for the
RCU grace-period kthreads in cases where RCU seems reluctant to
start a new grace period.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
torture: Dump ftrace buffer when the RCU grace period stalls
This commit adds a call to rcutorture_trace_dump() to dump the ftrace
buffer when the RCU grace period stalls in order to help debug the
stall. Note that this is different than the RCU CPU stall warning,
as it is rcutorture detecting the stall rather than the underlying RCU
implementation.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
torture: Choose bzImage location based on architecture
Currently, the scripts hard-code arch/x86/boot/bzImage, which does not
work well for other architectures. This commit therefore provides a
identify_boot_image function that selects the correct bzImage location
relative to the top of the Linux source tree. This commit also adds a
--bootimage argument that allows selecting some other file, for example,
"vmlinux".
This change requires that the definition of the QEMU variable be
computed earlier in order to identify where to look for the boot image
when it comes time to copy it to the results directory.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Currently, all stuttered kthreads block a jiffy at a time, which can
result in them starting at different times. (Note: This is not an
energy-efficiency problem unless you run torture tests in production,
in which case you have other problems!) This commit increases the
intensity of the restart event by causing kthreads to spin through the
last jiffy, restarting when they see the variable change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
The current script does record qemu diagnostics, but the user has to
know where to look for them. This commit therefore puts them into the
Warnings file so that kvm-recheck.sh will display them. This change is
especially useful if you are in the habit of killing the qemu process
when you realize that you messed something up, but then later on wonder
why the process terminated early.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
torture: Include "Stopping" string to torture_kthread_stopping()
Currently, torture_kthread_stopping() prints only the name of the
kthread that is stopping, which can be unedifying. This commit therefore
adds "Stopping" to make things more evident.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
rcutorture: Print negatives for SRCU counter wraparound
The srcu_torture_stats() function prints SRCU's per-CPU c[] array with
an unsigned format, which means that the number one less than zero is
a very large number. This commit therefore prints this array with a
signed format in order to improve readability of the rcutorture output.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Fri, 28 Feb 2014 04:26:57 +0000 (20:26 -0800)]
torture: Make "--dryrun script" use same environment as normal run
In a normal torture-test run, the script inherits its environment
variables, but this does not work when producing a script that is
to run later. Therefore, definitions and exports are prepended to
a dryrun script but not to a script that is run immediately. This
commit reconciles this by placing definitions and exports at the
beginning of the script in both cases.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Conflicts:
tools/testing/selftests/rcutorture/bin/kvm.sh
Paul E. McKenney [Fri, 28 Feb 2014 01:11:11 +0000 (17:11 -0800)]
torture: Make "--dryrun script" output self-sufficient
The scripts produced by kvm.sh's "--dryrun script" argument were intended
for debugging rather than to run, but it is easier to debug if the script
output matches exactly what is run. This commit therefore makes this
script runnable.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Rashika Kheria [Thu, 27 Feb 2014 11:46:57 +0000 (17:16 +0530)]
rcutorture: Mark function as static in kernel/rcu/torture.c
Mark functions as static in kernel/rcu/torture.c because they are not
used outside this file.
This eliminates the following warning in kernel/rcu/torture.c:
kernel/rcu/torture.c:902:6: warning: no previous prototype for ‘rcutorture_trace_dump’ [-Wmissing-prototypes]
kernel/rcu/torture.c:1572:6: warning: no previous prototype for ‘rcu_torture_barrier_cbf’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Thu, 27 Feb 2014 00:50:48 +0000 (16:50 -0800)]
torture: Make config-fragment filtering RCU-independent
The torture tests need to set specific values for their respective
Kconfig options (e.g., CONFIG_LOCK_TORTURE_TEST), and must therefore
filter any conflicting definitions from the Kconfig fragment
file. Unfortunately, the code in kvm-build.sh was looking only for
CONFIG_RCU_TORTURE_TEST. This commit therefore handles the general case
of CONFIG_[A-Z]*_TORTURE_TEST.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 26 Feb 2014 23:39:41 +0000 (15:39 -0800)]
torture: Rename RCU_BUILDONLY to TORTURE_BUILDONLY
This commit makes the torture scripts a bit more RCU-independent by
changing RCU_BUILDONLY to TORTURE_BUILDONLY. It also removes an
unnecessary export command.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 26 Feb 2014 23:23:21 +0000 (15:23 -0800)]
torture: Rename RCU_KMAKE_ARG to TORTURE_KMAKE_ARG
This commit makes the torture scripts a bit more RCU-independent by
changing RCU_KMAKE_ARG to TORTURE_KMAKE_ARG. It also removes the
unnecessary export command.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 26 Feb 2014 22:28:43 +0000 (14:28 -0800)]
torture: Allow variations of "defconfig" to be specified
Some environments require some variation on "make defconfig" to initialize
the .config file. This commit therefore adds a --defconfig argument to
allow this to be specified. The default value is of course "defconfig".
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 26 Feb 2014 20:14:51 +0000 (12:14 -0800)]
torture: Intensify locking test
The current lock_torture_writer() spends too much time sleeping and not
enough time hammering locks, as in an eight-CPU test will often only be
utilizing a CPU or two. This commit therefore makes lock_torture_writer()
sleep less and hammer more.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Paul E. McKenney [Wed, 26 Feb 2014 18:57:04 +0000 (10:57 -0800)]
torture: Make parse-rcutorture.sh less RCU-specific
It can be a bit jarring to see a locking test complain about RCU, so
this commit renames parse-rcutorture.sh to parse-torture.sh and makes
the messages it emits more generic.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 26 Feb 2014 16:56:57 +0000 (08:56 -0800)]
torture: Remove obsolete builddir options
The --builddir and --relbuilddir options were initially intended to handle
parallel tests. However, since commit 43e38ab3d518 (Enable concurrent
rcutorture runs), the script manages multiple build directories as
needed for parallel testing. This commit therefore removes these two
obsolete options.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Wed, 19 Feb 2014 18:51:42 +0000 (10:51 -0800)]
rcutorture: Add forward-progress checking for writer
The rcutorture output currently does not distinguish between stalls in
the RCU implementation and stalls in the rcu_torture_writer() kthreads.
This commit therefore adds some diagnostics to help distinguish between
these two conditions, at least for the non-SRCU implementations. (SRCU
does not provide evidence of update-side forward progress by design.)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Eric Dumazet [Wed, 16 Apr 2014 16:06:24 +0000 (09:06 -0700)]
softirq: A single rcu_bh_qs() call per softirq set is enough
Calling rcu_bh_qs() after every softirq action is not really needed.
What RCU needs is at least one rcu_bh_qs() per softirq round to note a
quiescent state was passed for rcu_bh.
Note for Paul and myself : this could be inlined as a single instruction
and avoid smp_processor_id()
(sone this_cpu_write(rcu_bh_data.passed_quiesce, 1))
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcu: Make large and small sysidle systems use same state machine
Currently, small systems move back into RCU_SYSIDLE_NOT from
RCU_SYSIDLE_SHORT and large systems do not. This works because moving
aggressively to RCU_SYSIDLE_NOT affects only performance, not correctness,
and on small systems, the performance impact should be negligible. That
said, this difference does make RCU a bit more complex, and RCU does not
seem to be suffering from any lack of complexity. This commit therefore
adjusts small-system operation to match that of large systems, so that
the state never moves back to RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT.
Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Currently, RCU binds the grace-period kthreads to the timekeeping
CPU only if CONFIG_NO_HZ_FULL_SYSIDLE=y. This means that these
kthreads must be bound manually when CONFIG_NO_HZ_FULL_SYSIDLE=n and
CONFIG_NO_HZ_FULL=y: Otherwise, these kthreads will induce OS jitter on
random CPUs. Given that we are trying to reduce the amount of manual
tweaking required to make CONFIG_NO_HZ_FULL=y work nicely, this commit
makes this binding happen when CONFIG_NO_HZ_FULL=y, even in cases where
CONFIG_NO_HZ_FULL_SYSIDLE=n.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Mon, 31 Mar 2014 20:13:02 +0000 (13:13 -0700)]
rcu: Document RCU_INIT_POINTER()'s lack of ordering guarantees
Although rcu_assign_pointer() provides ordering guarantees,
RCU_INIT_POINTER() does not. This commit makes that explicit
in the docbook comment header.
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state()
This patch merges the function rcu_force_quiescent_state() with
rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly,
the rcu_sched_force_quiescent_state() function is deleted from the file
kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was
calling force_quiescent_state with the argument rcu_preempt_state pointer
was deleted as well. The new function that combines the old ones uses
the rcu_state pointer and is located after rcu_batches_completed_bh()
in kernel/rcu/tree.c.
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcu: Consolidate kfree_call_rcu() to use rcu_state pointer
kfree_call_rcu is defined two times. When defined under CONFIG_TREE_PREEMPT_RCU,
it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state.
This patch uses the rcu_state_pointer to combine the two definitions into one.
The resulting function is placed after the closing of the preprocessor
conditional CONFIG_TREE_PREEMPT_RCU.
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcu: Add event tracing to dyntick_save_progress_counter().
This patch adds event tracing to dyntick_save_progress_counter() in the case
where it returns 1. I used the tracepoint string "dti" because this function
returns 1 in case the CPU is in dynticks idle mode.
Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Himangi Saraogi [Wed, 12 Mar 2014 16:16:46 +0000 (21:46 +0530)]
rcu: Protect uses of ->jiffies_stall with ACCESS_ONCE()
Some of the accesses to the rcu_state structure's ->jiffies_stall
field are unprotected. This patch protects them with ACCESS_ONCE().
The following coccinelle script was used to acheive this:
/* coccinelle script to protect uses of ->jiffies_stall with ACCESS_ONCE() */
@@
identifier a;
@@
(
ACCESS_ONCE(a->jiffies_stall)
|
- a->jiffies_stall
+ ACCESS_ONCE(a->jiffies_stall)
)
Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Iulia Manda [Wed, 12 Mar 2014 16:37:24 +0000 (18:37 +0200)]
rcu: Remove "extern" from function declaration in include/linux/rcupdate.h
Because functions have the extern storage class specifier by default,
this keyword can be removed. It is redundant to use it explicitly.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Tue, 11 Mar 2014 20:02:16 +0000 (13:02 -0700)]
rcu: Make callers awaken grace-period kthread
The rcu_start_gp_advanced() function currently uses irq_work_queue()
to defer wakeups of the RCU grace-period kthread. This deferring
is necessary to avoid RCU-scheduler deadlocks involving the rcu_node
structure's lock, meaning that RCU cannot call any of the scheduler's
wake-up functions while holding one of these locks.
Unfortunately, the second and subsequent calls to irq_work_queue() are
ignored, and the first call will be ignored (aside from queuing the work
item) if the scheduler-clock tick is turned off. This is OK for many
uses, especially those where irq_work_queue() is called from an interrupt
or softirq handler, because in those cases the scheduler-clock-tick state
will be re-evaluated, which will turn the scheduler-clock tick back on.
On the next tick, any deferred work will then be processed.
However, this strategy does not always work for RCU, which can be invoked
at process level from idle CPUs. In this case, the tick might never
be turned back on, indefinitely defering a grace-period start request.
Note that the RCU CPU stall detector cannot see this condition, because
there is no RCU grace period in progress. Therefore, we can (and do!)
see long tens-of-seconds stalls in grace-period handling. In theory,
we could see a full grace-period hang, but rcutorture testing to date
has seen only the tens-of-seconds stalls. Event tracing demonstrates
that irq_work_queue() is being called repeatedly to no effect during
these stalls: The "newreq" event appears repeatedly from a task that is
not one of the grace-period kthreads.
In theory, irq_work_queue() might be fixed to avoid this sort of issue,
but RCU's requirements are unusual and it is quite straightforward to pass
wake-up responsibility up through RCU's call chain, so that the wakeup
happens when the offending locks are released.
This commit therefore makes this change. The rcu_start_gp_advanced(),
rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(),
__note_gp_changes(), and rcu_start_gp() functions now return a boolean
which indicates when a wake-up is needed. A new rcu_gp_kthread_wake()
does the wakeup when it is necessary and safe to do so: No self-wakes,
no wake-ups if the ->gp_flags field indicates there is no need (as in
someone else did the wake-up before we got around to it), and no wake-ups
before the grace-period kthread has been created.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Iulia Manda [Tue, 11 Mar 2014 13:22:28 +0000 (15:22 +0200)]
rcu: Protect uses of jiffies_stall field with ACCESS_ONCE()
Some of the uses of the rcu_state structure's ->jiffies_stall field
do not use ACCESS_ONCE(), despite there being unprotected accesses.
This commit therefore uses the ACCESS_ONCE() macro to protect this field.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Iulia Manda [Tue, 11 Mar 2014 11:18:22 +0000 (13:18 +0200)]
rcu: Remove unused rcu_data structure field
The ->preemptible field in rcu_data is only initialized in the function
rcu_init_percpu_data(), and never used. This commit therefore removes
this field.
Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Mon, 10 Mar 2014 17:55:52 +0000 (10:55 -0700)]
rcu: Update cpu_needs_another_gp() for futures from non-NOCB CPUs
In the old days, the only source of requests for future grace periods
was NOCB CPUs. This has changed: CPUs routinely post requests for
future grace periods in order to promote power efficiency and reduce
OS jitter with minimal impact on grace-period latency. This commit
therefore updates cpu_needs_another_gp() to invoke rcu_future_needs_gp()
instead of rcu_nocb_needs_gp(). The latter is no longer used, so is
now removed. This commit also adds tracing for the irq_work_queue()
wakeup case.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
rcu: Print negatives for stall-warning counter wraparound
The print_other_cpu_stall() and print_cpu_stall() functions print
grace-period numbers using an unsigned format, which means that the number
one less than zero is a very large number. This commit therefore causes
these numbers to be printed with a signed format in order to improve
readability of the RCU CPU stall-warning output.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Liu Ping Fan [Mon, 24 Feb 2014 14:18:09 +0000 (06:18 -0800)]
rcu: Fix incorrect notes for code
Signed-off-by: Liu Ping Fan <kernelfans@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Tue, 18 Feb 2014 17:47:13 +0000 (09:47 -0800)]
rcu: Protect ->gp_flags accesses with ACCESS_ONCE()
A number of ->gp_flags accesses don't have ACCESS_ONCE(), but all of
the can race against other loads or stores. This commit therefore
applies ACCESS_ONCE() to the unprotected ->gp_flags accesses.
Reported-by: Alexey Roytman <alexey.roytman@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
documentation: Record rcu_dereference() value mishandling
Recent LKML discussings (see http://lwn.net/Articles/586838/ and
http://lwn.net/Articles/588300/ for the LWN writeups) brought out
some ways of misusing the return value from rcu_dereference() that
are not necessarily completely intuitive. This commit therefore
documents what can and cannot safely be done with these values.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Paul E. McKenney [Tue, 25 Feb 2014 17:47:34 +0000 (09:47 -0800)]
documentation: Update sysfs path for rcu_cpu_stall_suppress
Commit 6bfc09e2327d (rcu: Provide RCU CPU stall warnings for tiny RCU)
moved the rcu_cpu_stall_suppress module parameter from rcutree.c to
rcupdate.c, but failed to update Documentation/RCU/stallwarn.txt. This
commit therefore repairs this omission.
Reported-by: Kirill Tkhai <tkhai@yandex.ru> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
cifs: Use min_t() when comparing "size_t" and "unsigned long"
On 32 bit, size_t is "unsigned int", not "unsigned long", causing the
following warning when comparing with PAGE_SIZE, which is always "unsigned
long":
fs/cifs/file.c: In function ‘cifs_readdata_to_iov’:
fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast
Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter
between the calls of cifs_readdata_to_iov()"), which changed the
signedness of "remaining" and the code from min_t() to min().
Merge branch 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux
Pull slab changes from Pekka Enberg:
"The biggest change is byte-sized freelist indices which reduces slab
freelist memory usage:
https://lkml.org/lkml/2013/12/2/64"
* 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux:
mm: slab/slub: use page->list consistently instead of page->lru
mm/slab.c: cleanup outdated comments and unify variables naming
slab: fix wrongly used macro
slub: fix high order page allocation problem with __GFP_NOFAIL
slab: Make allocations with GFP_ZERO slightly more efficient
slab: make more slab management structure off the slab
slab: introduce byte sized index for the freelist of a slab
slab: restrict the number of objects in a slab
slab: introduce helper functions to get/set free object
slab: factor out calculate nr objects in cache_estimate
Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild changes from Michal Marek:
"Here is the non-critical part of kbuild:
- One bogus coccinelle check removed, one check fixed not to suggest
the obsolete PTR_RET macro
- scripts/tags.sh does not index the generated *.mod.c files
- new objdiff tool to list differences between two versions of an
object file
- A fix for scripts/bootgraph.pl"
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/coccinelle: Use PTR_ERR_OR_ZERO
scripts/bootgraph.pl: Add graphic header
scripts: objdiff: detect object code changes between two commits
Coccicheck: Remove memcpy to struct assignment test
scripts/tags.sh: Ignore *.mod.c
sym53c8xx_2: Set DID_REQUEUE return code when aborting squeue
This patch fixes I/O errors with the sym53c8xx_2 driver when the disk
returns QUEUE FULL status.
When the controller encounters an error (including QUEUE FULL or BUSY
status), it aborts all not yet submitted requests in the function
sym_dequeue_from_squeue.
This function aborts them with DID_SOFT_ERROR.
If the disk has full tag queue, the request that caused the overflow is
aborted with QUEUE FULL status (and the scsi midlayer properly retries
it until it is accepted by the disk), but the sym53c8xx_2 driver aborts
the following requests with DID_SOFT_ERROR --- for them, the midlayer
does just a few retries and then signals the error up to sd.
The result is that disk returning QUEUE FULL causes request failures.
The error was reproduced on 53c895 with COMPAQ BD03685A24 disk
(rebranded ST336607LC) with command queue 48 or 64 tags. The disk has
64 tags, but under some access patterns it return QUEUE FULL when there
are less than 64 pending tags. The SCSI specification allows returning
QUEUE FULL anytime and it is up to the host to retry.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Mackerras [Fri, 11 Apr 2014 06:43:35 +0000 (16:43 +1000)]
powerpc: Don't try to set LPCR unless we're in hypervisor mode
Commit 8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on
interrupts) too early") added code to set the AIL bit in the LPCR
without checking whether the kernel is running in hypervisor mode. The
result is that when the kernel is running as a guest (i.e., under
PowerKVM or PowerVM), the processor takes a privileged instruction
interrupt at that point, causing a panic. The visible result is that
the kernel hangs after printing "returning from prom_init".
This fixes it by checking for hypervisor mode being available before
setting LPCR. If we are not in hypervisor mode, we enable relocation-on
interrupts later in pSeries_setup_arch using the H_SET_MODE hcall.
Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
futex: update documentation for ordering guarantees
Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.
The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.
Pull yet more networking updates from David Miller:
1) Various fixes to the new Redpine Signals wireless driver, from
Fariya Fatima.
2) L2TP PPP connect code takes PMTU from the wrong socket, fix from
Dmitry Petukhov.
3) UFO and TSO packets differ in whether they include the protocol
header in gso_size, account for that in skb_gso_transport_seglen().
From Florian Westphal.
4) If VLAN untagging fails, we double free the SKB in the bridging
output path. From Toshiaki Makita.
5) Several call sites of sk->sk_data_ready() were referencing an SKB
just added to the socket receive queue in order to calculate the
second argument via skb->len. This is dangerous because the moment
the skb is added to the receive queue it can be consumed in another
context and freed up.
It turns out also that none of the sk->sk_data_ready()
implementations even care about this second argument.
So just kill it off and thus fix all these use-after-free bugs as a
side effect.
6) Fix inverted test in tcp_v6_send_response(), from Lorenzo Colitti.
7) pktgen needs to do locking properly for LLTX devices, from Daniel
Borkmann.
8) xen-netfront driver initializes TX array entries in RX loop :-) From
Vincenzo Maffione.
9) After refactoring, some tunnel drivers allow a tunnel to be
configured on top itself. Fix from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
vti: don't allow to add the same tunnel twice
gre: don't allow to add the same tunnel twice
drivers: net: xen-netfront: fix array initialization bug
pktgen: be friendly to LLTX devices
r8152: check RTL8152_UNPLUG
net: sun4i-emac: add promiscuous support
net/apne: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
net: ipv6: Fix oif in TCP SYN+ACK route lookup.
drivers: net: cpsw: enable interrupts after napi enable and clearing previous interrupts
drivers: net: cpsw: discard all packets received when interface is down
net: Fix use after free by removing length arg from sk_data_ready callbacks.
Drivers: net: hyperv: Address UDP checksum issues
Drivers: net: hyperv: Negotiate suitable ndis version for offload support
Drivers: net: hyperv: Allocate memory for all possible per-pecket information
bridge: Fix double free and memory leak around br_allowed_ingress
bonding: Remove debug_fs files when module init fails
i40evf: program RSS LUT correctly
i40evf: remove open-coded skb_cow_head
ixgb: remove open-coded skb_cow_head
igbvf: remove open-coded skb_cow_head
...
Merge tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc
Pull remoteproc cleanups from Ohad Ben-Cohen:
"Several remoteproc cleanup patches coming from Jingoo Han, Julia
Lawall and Uwe Kleine-König"
* tag 'remoteproc-3.15-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/remoteproc:
remoteproc/ste_modem: staticize local symbols
remoteproc/davinci: simplify use of devm_ioremap_resource
remoteproc/davinci: drop needless devm_clk_put
Merge tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel
Pull llvm patches from Behan Webster:
"These are some initial updates to support compiling the kernel with
clang.
These patches have been through the proper reviews to the best of my
ability, and have been soaking in linux-next for a few weeks. These
patches by themselves still do not completely allow clang to be used
with the kernel code, but lay the foundation for other patches which
are still under review.
Several other of the LLVMLinux patches have been already added via
maintainer trees"
* tag 'llvmlinux-for-v3.15' of git://git.linuxfoundation.org/llvmlinux/kernel:
x86: LLVMLinux: Fix "incomplete type const struct x86cpu_device_id"
x86 kbuild: LLVMLinux: More cc-options added for clang
x86, acpi: LLVMLinux: Remove nested functions from Thinkpad ACPI
LLVMLinux: Add support for clang to compiler.h and new compiler-clang.h
LLVMLinux: Remove warning about returning an uninitialized variable
kbuild: LLVMLinux: Fix LINUX_COMPILER definition script for compilation with clang
Documentation: LLVMLinux: Update Documentation/dontdiff
kbuild: LLVMLinux: Adapt warnings for compilation with clang
kbuild: LLVMLinux: Add Kbuild support for building kernel with Clang
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Here are the target pending updates for v3.15-rc1. Apologies in
advance for waiting until the second to last day of the merge window
to send these out.
The highlights this round include:
- iser-target support for T10 PI (DIF) offloads (Sagi + Or)
- Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
- Pass in transport supported PI at session initialization (Sagi + MKP + nab)
- Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
- Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
- Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
- Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)
Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
metadata into KVM guest have been left-out for now, as there where a
few comments from MST + Paolo that where not able to be addressed in
time for v3.15. Please expect this feature for v3.16-rc1"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
ib_srpt: Use correct ib_sg_dma primitives
target/tcm_fc: Rename ft_tport_create to ft_tport_get
target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
target/tcm_fc: Rename structs and list members for clarity
target/tcm_fc: Limit to 1 TPG per wwn
target/tcm_fc: Don't export ft_lport_list
target/tcm_fc: Fix use-after-free of ft_tpg
target: Add check to prevent Abort Task from aborting itself
target: Enable READ_STRIP emulation in target_complete_ok_work
target/sbc: Add sbc_dif_read_strip software emulation
target: Enable WRITE_INSERT emulation in target_execute_cmd
target/sbc: Add sbc_dif_generate software emulation
target/sbc: Only expose PI read_cap16 bits when supported by fabric
target/spc: Only expose PI mode page bits when supported by fabric
target/spc: Only expose PI inquiry bits when supported by fabric
target: Pass in transport supported PI at session initialization
target/iblock: Fix double bioset_integrity_free bug
Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
target/rd: T10-Dif: RAM disk is allocating more space than required.
iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
...
Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"A series of bug fix patches for v3.15-rc1. Most are just driver
fixes. There are some changes at remote controller core level, fixing
some definitions on a new API added for Kernel v3.15.
It also adds the missing include at include/uapi/linux/v4l2-common.h,
to allow its compilation on userspace, as pointed by you"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (24 commits)
[media] gpsca: remove the risk of a division by zero
[media] stk1160: warrant a NUL terminated string
[media] v4l: ti-vpe: retain v4l2_buffer flags for captured buffers
[media] v4l: ti-vpe: Set correct field parameter for output and capture buffers
[media] v4l: ti-vpe: zero out reserved fields in try_fmt
[media] v4l: ti-vpe: Fix initial configuration queue data
[media] v4l: ti-vpe: Use correct bus_info name for the device in querycap
[media] v4l: ti-vpe: report correct capabilities in querycap
[media] v4l: ti-vpe: Allow usage of smaller images
[media] v4l: ti-vpe: Use video_device_release_empty
[media] v4l: ti-vpe: Make sure in job_ready that we have the needed number of dst_bufs
[media] lgdt3305: include sleep functionality in lgdt3304_ops
[media] drx-j: use customise option correctly
[media] m88rs2000: fix sparse static warnings
[media] r820t: fix size and init values
[media] rc-core: remove generic scancode filter
[media] rc-core: split dev->s_filter
[media] rc-core: do not change 32bit NEC scancode format for now
[media] rtl28xxu: remove duplicate ID 0458:707f Genius TVGo DVB-T03
[media] xc2028: add missing break to switch
...
Merge tag 'ntb-3.15' of git://github.com/jonmason/ntb
Pull PCIe non-transparent bridge fixes and features from Jon Mason:
"NTB driver bug fixes to address issues in list traversal, skb leak in
ntb_netdev, a typo, and a leak of msix entries in the error path.
Clean ups of the event handling logic, as well as a overall style
cleanup. Finally, the driver was converted to use the new
pci_enable_msix_range logic (and the refactoring to go along with it)"
* tag 'ntb-3.15' of git://github.com/jonmason/ntb:
ntb: Use pci_enable_msix_range() instead of pci_enable_msix()
ntb: Split ntb_setup_msix() into separate BWD/SNB routines
ntb: Use pci_msix_vec_count() to obtain number of MSI-Xs
NTB: Code Style Clean-up
NTB: client event cleanup
ntb: Fix leakage of ntb_device::msix_entries[] array
NTB: Fix typo in setting one translation register
ntb_netdev: Fix skb free issue in open
ntb_netdev: Fix list_for_each_entry exit issue
In file included from fs/ceph/super.h:4:0,
from fs/ceph/ioctl.c:3:
include/linux/ceph/ceph_debug.h:4:0: warning: "pr_fmt" redefined [enabled by default]
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
^
In file included from include/linux/kernel.h:13:0,
from include/linux/uio.h:12,
from include/linux/socket.h:7,
from include/uapi/linux/in.h:22,
from include/linux/in.h:23,
from fs/ceph/ioctl.c:1:
include/linux/printk.h:214:0: note: this is the location of the previous definition
#define pr_fmt(fmt) fmt
^
where the reason is that <linux/ceph_debug.h> is included much too late
for the "pr_fmt()" define.
The include of <linux/ceph_debug.h> needs to be the first include in the
file, but fs/ceph/ioctl.c had for some reason missed that, and it wasn't
noticeable until some unrelated header file changes brought in an
indirect earlier include of <linux/kernel.h>.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.
Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.
This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...
David S. Miller [Sat, 12 Apr 2014 21:03:20 +0000 (17:03 -0400)]
Merge branch 'tunnels'
Nicolas Dichtel says:
====================
tunnels: don't allow to add the same tunnel twice
This series fixes the check of an existing tunnel with the same
parameters when a new tunnel is added. I've checked all users of
ip_tunnel_newlink(): gre, gretap, ipip and vti. The bug exists only
for gre and vti.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:19 +0000 (15:51 +0200)]
vti: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit b9959fd3b0fa ("vti: switch to new ip tunnel code").
CC: Cong Wang <amwang@redhat.com> CC: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Fri, 11 Apr 2014 13:51:18 +0000 (15:51 +0200)]
gre: don't allow to add the same tunnel twice
Before the patch, it was possible to add two times the same tunnel:
ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
argument dev->type, which was set only later (when calling ndo_init handler
in register_netdevice()). Let's set this type in the setup handler, which is
called before newlink handler.
Introduced by commit c54419321455 ("GRE: Refactor GRE tunneling code.").
CC: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the initialization of an array used in the TX
datapath that was mistakenly initialized together with the
RX datapath arrays. An out of range array access could happen
when RX and TX rings had different sizes.
Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 12 Apr 2014 20:36:44 +0000 (16:36 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to e1000, e1000e, igb, igbvf, ixgb, ixgbe,
ixgbevf and i40evf.
Mark fixes an issue with ixgbe and ixgbevf by adding a bit to indicate
when workqueues have been initialized. This permits the register read
error handling from attempting to use them prior to that, which also
generates warnings. Checking for a detected removal after initializing
the work queues allows the probe function to return an error without
getting the workqueue involved. Further, if the error_detected
callback is entered before the workqueues are initialized, exit without
recovery since the device initialization was so truncated.
Francois Romieu provides several patches to all the drivers to remove
the open coded skb_cow_head.
Jakub Kicinski provides a fix for igb where last_rx_timestamp should be
updated only when Rx time stamp is read.
Mitch provides a fix for i40evf where a recent change broke the RSS LUT
programming causing it to be programmed with all 0's.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt:
"This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint and
have that tracepoint not be activated but no error was shown.
The design was for an out of tree module but broke in tree users. The
clean up was to remove the saving of the hash table of tracepoint
names such that they can be enabled before they exist (enabling a
module tracepoint before that module is loaded). This added more
complexity than needed. The clean up was to remove that code and just
enable tracepoints that exist or fail if they do not.
This removed a lot of code as well as the complexity that it brought.
As a side effect, instead of registering a tracepoint by its name, the
tracepoint needs to be registered with the tracepoint descriptor.
This removes having to duplicate the tracepoint names that are
enabled.
The second patch was added that simplified the way modules were
searched for.
This cleanup required changes that were in the 3.15 queue as well as
some changes that were added late in the 3.14-rc cycle. This final
change waited till the two were merged in upstream and then the change
was added and full tests were run. Unfortunately, the test found some
errors, but after it was already submitted to the for-next branch and
not to be rebased. Sparse errors were detected by Fengguang Wu's bot
tests, and my internal tests discovered that the anonymous union
initialization triggered a bug in older gcc compilers. Luckily, there
was a bugzilla for the gcc bug which gave a work around to the
problem. The third and fourth patch handled the sparse error and the
gcc bug respectively.
A final patch was tagged along to fix a missing documentation for the
README file"
* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add missing function triggers dump and cpudump to README
tracing: Fix anonymous unions in struct ftrace_event_call
tracepoint: Fix sparse warnings in tracepoint.c
tracepoint: Simplify tracepoint module search
tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints
* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...
Al Viro [Thu, 3 Apr 2014 14:27:17 +0000 (10:27 -0400)]
cifs: fix the race in cifs_writev()
O_APPEND handling there hadn't been completely fixed by Pavel's
patch; it checks the right value, but it's racy - we can't really
do that until i_mutex has been taken.
Fix by switching to __generic_file_aio_write() (open-coding
generic_file_aio_write(), actually) and pulling mutex_lock() above
inode_size_read().
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Daniel Borkmann [Fri, 11 Apr 2014 11:22:00 +0000 (13:22 +0200)]
pktgen: be friendly to LLTX devices
Similarly to commit 43279500deca ("packet: respect devices with
LLTX flag in direct xmit"), we can basically apply the very same
to pktgen. This will help testing against LLTX devices such as
dummy driver (or others), which only have a single netdevice txq
and would otherwise require locking their txq from pktgen side
while e.g. in dummy case, we would not need any locking. Fix this
by making use of HARD_TX_{UN,}LOCK API, so that NETIF_F_LLTX will
be respected.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the device is unplugged, the driver would try to disable the
device. Add checking the flag of RTL8152_UNPLUG to skip setting
the device when it is unplugged. This could shorten the time of
unloading the driver.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marc Zyngier [Fri, 11 Apr 2014 09:46:17 +0000 (10:46 +0100)]
net: sun4i-emac: add promiscuous support
The sun4i-emac driver is rather primitive, and doesn't support
promiscuous mode. This makes usage such as bridging impossible,
which is a shame on virtualization capable HW such as the
Allwinner A20.
The fix is fairly simple: move the RX setup code to the ndo_set_rx_mode
vector, and add the required HW configuration when IFF_PROMISC is passed
by the core code.
This has been tested on a generic A20 box running a few virtual
machines hanging off a bridge with the EMAC chip as the link to the
outside world.
Cc: Stefan Roese <sr@denx.de> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Stefan Roese <sr@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>