git.karo-electronics.de Git - karo-tx-linux.git/log

rcu: Handle unbalanced rcu_node configurations with few CPUs

If CONFIG_RCU_FANOUT_EXACT=y, if there are not enough CPUs (according
to nr_cpu_ids) to require more than a single rcu_node structure, but if
NR_CPUS is larger than would fit into a single rcu_node structure, then
the current rcu_init_levelspread() code is subject to integer overflow
in the eight-bit ->levelspread[] array in the rcu_state structure.

In this case, the solution is -not- to increase the size of the
elements in this array because the values in that array should be
constrained to the number of bits in an unsigned long. Instead, this
commit replaces NR_CPUS with nr_cpu_ids in the rcu_init_levelspread()
function's initialization of the cprv local variable. This results in
all of the arithmetic being consistently based off of the nr_cpu_ids
value, thus avoiding the overflow, which was caused by the mixing of
nr_cpu_ids and NR_CPUS.

Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Simplify quiescent-state detection

The current quiescent-state detection algorithm is needlessly
complex. It records the grace-period number corresponding to
the quiescent state at the time of the quiescent state, which
works, but it seems better to simply erase any record of previous
quiescent states at the time that the CPU notices the new grace
period. This has the further advantage of removing another piece
of RCU for which lockless reasoning is required.

Therefore, this commit makes this change.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Reduce synchronize_rcu_expedited() latency

The synchronize_rcu_expedited() function disables interrupts across a
scan of all leaf rcu_node structures, which is not good for real-time
scheduling latency on large systems (hundreds or especially thousands
of CPUs). This commit therefore holds off CPU-hotplug operations using
get_online_cpus(), and removes the prior acquisiion of the ->onofflock
(which required disabling interrupts).

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Eliminate signed overflow in synchronize_rcu_expedited()

In the C language, signed overflow is undefined. It is true that
twos-complement arithmetic normally comes to the rescue, but if the
compiler can subvert this any time it has any information about the values
being compared. For example, given "if (a - b > 0)", if the compiler
has enough information to realize that (for example) the value of "a"
is positive and that of "b" is negative, the compiler is within its
rights to optimize to a simple "if (1)", which might not be what you want.

This commit therefore converts synchronize_rcu_expedited()'s work-done
detection counter from signed to unsigned.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Adjust for unconditional ->completed assignment

Now that the rcu_node structures' ->completed fields are unconditionally
assigned at grace-period cleanup time, they should already have the
correct value for the new grace period at grace-period initialization
time. This commit therefore inserts a WARN_ON_ONCE() to verify this
invariant.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Add random PROVE_RCU_DELAY to grace-period initialization

Preemption greatly raised the probability of certain types of race
conditions, so this commit adds an anti-heisenbug to greatly increase
the collision cross section, also known as the probability of occurrence.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Fix day-zero grace-period initialization/cleanup race

The current approach to grace-period initialization is vulnerable to
extremely low-probability races.  These races stem from the fact that
the old grace period is marked completed on the same traversal through
the rcu_node structure that is marking the start of the new grace period.
This means that some rcu_node structures will believe that the old grace
period is still in effect at the same time that other rcu_node structures
believe that the new grace period has already started.

These sorts of disagreements can result in too-short grace periods,
as shown in the following scenario:

1. CPU 0 completes a grace period, but needs an additional
grace period, so starts initializing one, initializing all
the non-leaf rcu_node structures and the first leaf rcu_node
structure.  Because CPU 0 is both completing the old grace
period and starting a new one, it marks the completion of
the old grace period and the start of the new grace period
in a single traversal of the rcu_node structures.

Therefore, CPUs corresponding to the first rcu_node structure
can become aware that the prior grace period has completed, but
CPUs corresponding to the other rcu_node structures will see
this same prior grace period as still being in progress.

2. CPU 1 passes through a quiescent state, and therefore informs
the RCU core.  Because its leaf rcu_node structure has already
been initialized, this CPU's quiescent state is applied to the
new (and only partially initialized) grace period.

3. CPU 1 enters an RCU read-side critical section and acquires
a reference to data item A.  Note that this CPU believes that
its critical section started after the beginning of the new
grace period, and therefore will not block this new grace period.

4. CPU 16 exits dyntick-idle mode.  Because it was in dyntick-idle
mode, other CPUs informed the RCU core of its extended quiescent
state for the past several grace periods.  This means that CPU 16
is not yet aware that these past grace periods have ended.  Assume
that CPU 16 corresponds to the second leaf rcu_node structure --
which has not yet been made aware of the new grace period.

5. CPU 16 removes data item A from its enclosing data structure
and passes it to call_rcu(), which queues a callback in the
RCU_NEXT_TAIL segment of the callback queue.

6. CPU 16 enters the RCU core, possibly because it has taken a
scheduling-clock interrupt, or alternatively because it has
more than 10,000 callbacks queued.  It notes that the second
most recent grace period has completed (recall that because it
corresponds to the second as-yet-uninitialized rcu_node structure,
it cannot yet become aware that the most recent grace period has
completed), and therefore advances its callbacks.  The callback
for data item A is therefore in the RCU_NEXT_READY_TAIL segment
of the callback queue.

7. CPU 0 completes initialization of the remaining leaf rcu_node
structures for the new grace period, including the structure
corresponding to CPU 16.

8. CPU 16 again enters the RCU core, again, possibly because it has
taken a scheduling-clock interrupt, or alternatively because
it now has more than 10,000 callbacks queued. It notes that
the most recent grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_DONE_TAIL segment of the callback queue.

9. All CPUs other than CPU 1 pass through quiescent states.  Because
CPU 1 already passed through its quiescent state, the new grace
period completes.  Note that CPU 1 is still in its RCU read-side
critical section, still referencing data item A.

10. Suppose that CPU 2 wais the last CPU to pass through a quiescent
state for the new grace period, and suppose further that CPU 2
did not have any callbacks queued, therefore not needing an
additional grace period.  CPU 2 therefore traverses all of the
rcu_node structures, marking the new grace period as completed,
but does not initialize a new grace period.

11. CPU 16 yet again enters the RCU core, yet again possibly because
it has taken a scheduling-clock interrupt, or alternatively
because it now has more than 10,000 callbacks queued. It notes
that the new grace period has ended, and therefore advances
its callbacks. The callback for data item A is therefore in
the RCU_DONE_TAIL segment of the callback queue.  This means
that this callback is now considered ready to be invoked.

12. CPU 16 invokes the callback, freeing data item A while CPU 1
is still referencing it.

This scenario represents a day-zero bug for TREE_RCU.  This commit
therefore ensures that the old grace period is marked completed in
all leaf rcu_node structures before a new grace period is marked
started in any of them.

That said, it would have been insanely difficult to force this race to
happen before the grace-period initialization process was preemptible.
Therefore, this commit is not a candidate for -stable.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Conflicts:

kernel/rcutree.c

rcu: Make rcutree module parameters visible in sysfs

The module parameters blimit, qhimark, and qlomark (and more
recently, rcu_fanout_leaf) have permission masks of zero, so
that their values are not visible from sysfs. This is unnecessary
and inconvenient to administrators who might like an easy way to
see what these values are on a running system. This commit therefore
sets their permission masks to 0444, allowing them to be read but
not written.

Reported-by: Rusty Russell <rusty@ozlabs.org>
Reported-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Control grace-period duration from sysfs

Although almost everyone is well-served by the defaults, some uses of RCU
benefit from shorter grace periods, while others benefit more from the
greater efficiency provided by longer grace periods.  Situations requiring
a large number of grace periods to elapse (and wireshark startup has
been called out as an example of this) are helped by lower-latency
grace periods.  Furthermore, in some embedded applications, people are
willing to accept a small degradation in update efficiency (due to there
being more of the shorter grace-period operations) in order to gain the
lower latency.

In contrast, those few systems with thousands of CPUs need longer grace
periods because the CPU overhead of a grace period rises roughly
linearly with the number of CPUs.  Such systems normally do not make
much use of facilities that require large numbers of grace periods to
elapse, so this is a good tradeoff.

Therefore, this commit allows the durations to be controlled from sysfs.
There are two sysfs parameters, one named "jiffies_till_first_fqs" that
specifies the delay in jiffies from the end of grace-period initialization
until the first attempt to force quiescent states, and the other named
"jiffies_till_next_fqs" that specifies the delay (again in jiffies)
between subsequent attempts to force quiescent states.  They both default
to three jiffies, which is compatible with the old hard-coded behavior.

At some future time, it may be possible to automatically increase the
grace-period length with the number of CPUs, but we do not yet have
sufficient data to do a good job.  Preliminary data indicates that we
should add an addiitonal jiffy to each of the delays for every 200 CPUs
in the system, but more experimentation is needed.  For now, the number
of systems with more than 1,000 CPUs is small enough that this can be
relegated to boot-time hand tuning.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Prevent force_quiescent_state() memory contention

Large systems running RCU_FAST_NO_HZ kernels see extreme memory
contention on the rcu_state structure's ->fqslock field. This
can be avoided by disabling RCU_FAST_NO_HZ, either at compile time
or at boot time (via the nohz kernel boot parameter), but large
systems will no doubt become sensitive to energy consumption.
This commit therefore uses a combining-tree approach to spread the
memory contention across new cache lines in the leaf rcu_node structures.
This can be thought of as a tournament lock that has only a try-lock
acquisition primitive.

The effect on small systems is minimal, because such systems have
an rcu_node "tree" consisting of a single node. In addition, this
functionality is not used on fastpaths.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Adjust debugfs tracing for kthread-based quiescent-state forcing

Moving quiescent-state forcing into a kthread dispenses with the need
for the ->n_rp_need_fqs field, so this commit removes it.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Allow RCU quiescent-state forcing to be preempted

RCU quiescent-state forcing is currently carried out without preemption
points, which can result in excessive latency spikes on large systems
(many hundreds or thousands of CPUs). This patch therefore inserts
a voluntary preemption point into force_qs_rnp(), which should greatly
reduce the magnitude of these spikes.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Move quiescent-state forcing into kthread

As the first step towards allowing quiescent-state forcing to be
preemptible, this commit moves RCU quiescent-state forcing into the
same kthread that is now used to initialize and clean up after grace
periods. This is yet another step towards keeping scheduling
latency down to a dull roar.

Updated to change from raw_spin_lock_irqsave() to raw_spin_lock_irq()
and to remove the now-unused rcu_state structure fields as suggested by
Peter Zijlstra.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Segregate rcu_state fields to improve cache locality

The fields in the rcu_state structure that are protected by the
root rcu_node structure's ->lock can share a cache line with the
fields protected by ->onofflock. This can result in excessive
memory contention on large systems, so this commit applies
____cacheline_internodealigned_in_smp to the ->onofflock field in
order to segregate them.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Provide OOM handler to motivate lazy RCU callbacks

In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
large number of lazy callbacks, which as the name implies will be slow
to be invoked.  This can be a problem on small-memory systems, where the
default 6-second sleep for CPUs having only lazy RCU callbacks could well
be fatal.  This commit therefore installs an OOM hander that ensures that
every CPU with lazy callbacks has at least one non-lazy callback, in turn
ensuring timely advancement for these callbacks.

Updated to fix bug that disabled OOM killing, noted by Lai Jiangshan.

Updated to push the for_each_rcu_flavor() loop into rcu_oom_notify_cpu(),
thus reducing the number of IPIs, as suggested by Steven Rostedt.  Also
to make the for_each_online_cpu() loop be preemptible.  (Later, it might
be good to use smp_call_function(), as suggested by Peter Zijlstra.)

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Prevent offline CPUs from executing RCU core code

Earlier versions of RCU invoked the RCU core from the CPU_DYING notifier
in order to note a quiescent state for the outgoing CPU.  Because the
CPU is marked "offline" during the execution of the CPU_DYING notifiers,
the RCU core had to tolerate being invoked from an offline CPU.  However,
commit b1420f1c (Make rcu_barrier() less disruptive) left only tracing
code in the CPU_DYING notifier, so the RCU core need no longer execute
on offline CPUs.  This commit therefore enforces this restriction.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Break up rcu_gp_kthread() into subfunctions

Then rcu_gp_kthread() function is too large and furthermore needs to
have the force_quiescent_state() code pulled in. This commit therefore
breaks up rcu_gp_kthread() into rcu_gp_init() and rcu_gp_cleanup().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Allow RCU grace-period cleanup to be preempted

RCU grace-period cleanup is currently carried out with interrupts
disabled, which can result in excessive latency spikes on large systems
(many hundreds or thousands of CPUs). This patch therefore makes the
RCU grace-period cleanup be preemptible, including voluntary preemption
points, which should eliminate those latency spikes. Similar spikes from
forcing of quiescent states will be dealt with similarly by later patches.

Updated to replace uses of spin_lock_irqsave() with spin_lock_irq(), as
suggested by Peter Zijlstra.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Move RCU grace-period cleanup into kthread

As a first step towards allowing grace-period cleanup to be preemptible,
this commit moves the RCU grace-period cleanup into the same kthread
that is now used to initialize grace periods. This is needed to keep
scheduling latency down to a dull roar.

[ paulmck: Get rid of stray spin_lock_irqsave() calls. ]

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Allow RCU grace-period initialization to be preempted

RCU grace-period initialization is currently carried out with interrupts
disabled, which can result in 200-microsecond latency spikes on systems
on which RCU has been configured for 4096 CPUs. This patch therefore
makes the RCU grace-period initialization be preemptible, which should
eliminate those latency spikes. Similar spikes from grace-period cleanup
and the forcing of quiescent states will be dealt with similarly by later
patches.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

rcu: Prevent initialization-time quiescent-state race

The next step in reducing RCU's grace-period initialization latency on
large systems will make this initialization preemptible.  Unfortunately,
making the grace-period initialization subject to interrupts (let alone
preemption) exposes the following race on systems whose rcu_node tree
contains more than one node:

1. CPU 31 starts initializing the grace period, including the
     first leaf rcu_node structures, and is then preempted.

2. CPU 0 refers to the first leaf rcu_node structure, and notes
     that a new grace period has started.  It passes through a
     quiescent state shortly thereafter, and informs the RCU core
     of this rite of passage.

3. CPU 0 enters an RCU read-side critical section, acquiring
     a pointer to an RCU-protected data item.

4. CPU 31 takes an interrupt whose handler removes the data item
referenced by CPU 0 from the data structure, and registers an
RCU callback in order to free it.

5. CPU 31 resumes initializing the grace period, including its
     own rcu_node structure.  In invokes rcu_start_gp_per_cpu(),
     which advances all callbacks, including the one registered
     in #4 above, to be handled by the current grace period.

6. The remaining CPUs pass through quiescent states and inform
     the RCU core, but CPU 0 remains in its RCU read-side critical
     section, still referencing the now-removed data item.

7. The grace period completes and all the callbacks are invoked,
     including the one that frees the data item that CPU 0 is still
     referencing.  Oops!!!

One way to avoid this race is to remove grace-period acceleration from
rcu_start_gp_per_cpu().  Now, the only reason for this acceleration was
to allow CPUs bringing RCU out of idle state to have their callbacks
invoked after only one grace period, rather than the two grace periods
that would otherwise be required.  But this acceleration does not
work when RCU grace-period initialization is moved to a kthread because
the CPU posting the callback is no longer necessarily the CPU that is
initializing the resulting grace period.

This commit therefore removes this now-pointless (and soon to be dangerous)
grace-period acceleration, thus avoiding the above race.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

rcu: Move RCU grace-period initialization into a kthread

As the first step towards allowing grace-period initialization to be
preemptible, this commit moves the RCU grace-period initialization
into its own kthread. This is needed to keep large-system scheduling
latency at reasonable levels.

Also change raw_spin_lock_irqsave() to raw_spin_lock_irq() as suggested
by Peter Zijlstra in review comments.

Reported-by: Mike Galbraith <mgalbraith@suse.de>
Reported-by: Dimitri Sivanich <sivanich@sgi.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>

trace: Don't declare trace_*_rcuidle functions in modules

Tracepoints declare a static inline trace_*_rcuidle variant of the trace
function, to support safely generating trace events from the idle loop.
Module code never actually uses that variant of trace functions, because
modules don't run code that needs tracing with RCU idled. However, the
declaration of those otherwise unused functions causes the module to
reference rcu_idle_exit and rcu_idle_enter, which RCU does not export to
modules.

To avoid this, don't generate trace_*_rcuidle functions for tracepoints
declared in module code.

Link: http://lkml.kernel.org/r/20120905062306.GA14756@leaf
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

Linux 3.6-rc5

Merge branch 'fixes-for-3.6' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping

Pull DMA-mapping fixes from Marek Szyprowski:
"Another set of fixes for ARM dma-mapping subsystem.

  Commit e9da6e9905e6 replaced custom consistent buffer remapping code
  with generic vmalloc areas.  It however introduced some regressions
  caused by limited support for allocations in atomic context.  This
  series contains fixes for those regressions.

  For some subplatforms the default, pre-allocated pool for atomic
  allocations turned out to be too small, so a function for setting its
  size has been added.

  Another set of patches adds support for atomic allocations to
  IOMMU-aware DMA-mapping implementation.

  The last part of this pull request contains two fixes for Contiguous
  Memory Allocator, which relax too strict requirements."

* 'fixes-for-3.6' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
  ARM: dma-mapping: IOMMU allocates pages from atomic_pool with GFP_ATOMIC
  ARM: dma-mapping: Introduce __atomic_get_pages() for __iommu_get_pages()
  ARM: dma-mapping: Refactor out to introduce __in_atomic_pool
  ARM: dma-mapping: atomic_pool with struct page **pages
  ARM: Kirkwood: increase atomic coherent pool size
  ARM: DMA-Mapping: print warning when atomic coherent allocation fails
  ARM: DMA-Mapping: add function for setting coherent pool size from platform code
  ARM: relax conditions required for enabling Contiguous Memory Allocator
  mm: cma: fix alignment requirements for contiguous regions

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input subsystem updates from Dmitry Torokhov.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: wacom - add support for EMR on Cintiq 24HD touch
  Input: i8042 - add Gigabyte T1005 series netbooks to noloop table
  Input: imx_keypad - reset the hardware before enabling
  Input: edt-ft5x06 - fix build error when compiling wthout CONFIG_DEBUG_FS

Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID updates from Jiri Kosina:
"It contains a fix for Eaton Ellipse MAX UPS from Alan Stern,
  performance improvement (not processing debug data if noone is
  interested), by Henrik Rydberg, and allowing tpkbd-driven devices to
  work even with generic driver in a crippled mode, by Andres Freund."

* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
  HID: Only dump input if someone is listening
  HID: add NOGET quirk for Eaton Ellipse MAX UPS

HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured

c1dcad2d32d0252e8a3023d20311b52a187ecda3 added a new driver configured by
HID_LENOVO_TPKBD but made the hid_have_special_driver entry non-optional which
lead to a recognized but non-working device if the new driver wasn't
configured (which is the correct default).

Signed-off-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

Merge tag 'stable/for-linus-3.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
* Fix for TLB flushing introduced in v3.6
* Fix Xen-SWIOTLB not using proper DMA mask - device had 64bit but
   in a 32-bit kernel we need to allocate for coherent pages from a
   32-bit pool.
* When trying to re-use P2M nodes we had a one-off error and triggered
   a BUG_ON check with specific CONFIG_ option.
* When doing FLR in Xen-PCI-backend we would first do FLR then save the
   PCI configuration space. We needed to do it the other way around.

* tag 'stable/for-linus-3.6-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pciback: Fix proper FLR steps.
  xen: Use correct masking in xen_swiotlb_alloc_coherent.
  xen: fix logical error in tlb flushing
  xen/p2m: Fix one-off error in checking the P2M tree directory.

Merge tag '3.6-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI updates from Bjorn Helgaas:
"Power management
    - PCI/PM: Enable D3/D3cold by default for most devices
    - PCI/PM: Keep parent bridge active when probing device
    - PCI/PM: Fix config reg access for D3cold and bridge suspending
    - PCI/PM: Add ABI document for sysfs file d3cold_allowed
  Core
    - PCI: Don't print anything while decoding is disabled"

* tag '3.6-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: Don't print anything while decoding is disabled
  PCI/PM: Add ABI document for sysfs file d3cold_allowed
  PCI/PM: Fix config reg access for D3cold and bridge suspending
  PCI/PM: Keep parent bridge active when probing device
  PCI/PM: Enable D3/D3cold by default for most devices

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC bug fixes from Olof Johansson:
"Mostly Renesas and Atmel bugfixes this time, targeting boot and build
  problems.  A couple of patches for gemini and kirkwood as well.  On a
  whole nothing very controversial."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: gemini: fix the gemini build
  ARM: shmobile: armadillo800eva: enable rw rootfs mount
  ARM: Kirkwood: Fix 'SZ_1M' undeclared here for db88f6281-bp-setup.c
  ARM: shmobile: mackerel: fixup usb module order
  ARM: shmobile: armadillo800eva: fixup: sound card detection order
  ARM: shmobile: marzen: fixup smsc911x id for regulator
  ARM: at91/feature-removal-schedule: delay at91_mci removal
  ARM: mach-shmobile: armadillo800eva: Enable power button as wakeup source
  ARM: mach-shmobile: armadillo800eva: Fix GPIO buttons descriptions
  ARM: at91/dts: remove partial parameter in at91sam9g25ek.dts
  ARM: at91/clock: fix PLLA overclock warning
  ARM: at91: fix rtc-at91sam9 irq issue due to sparse irq support
  ARM: at91: fix system timer irq issue due to sparse irq support
  ARM: shmobile: sh73a0: fixup RELOC_BASE of intca_irq_pins_desc

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull a hwmon fix from Guenter Roeck:
"One patch, fixing DIV_ROUND_CLOSEST to support negative dividends.

  While the changes are not in the drivers/hwmon directory, the problem
  primarily affects hwmon drivers, and it makes sense to push the patch
  through the hwmon tree."

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  linux/kernel.h: Fix DIV_ROUND_CLOSEST to support negative dividends

Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull kbuild fixes from Michal Marek:
"These are two fixes that should go into 3.6.  The link-vmlinux.sh one
  is obvious.

  The other one fixes make firmware_install with certain configurations,
  where a file in the toplevel firmware tree gets installed first, and
  $(INSTALL_FW_PATH)/$$(dir <file>) results in /lib/firmware/./, which
  confuses make 3.82 for some reason."

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  firmware: fix directory creation rule matching with make 3.82
  link-vmlinux.sh: Fix stray "echo" in error message

Remove user-triggerable BUG from mpol_to_str

Trivially triggerable, found by trinity:

  kernel BUG at mm/mempolicy.c:2546!
  Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
  Call Trace:
    show_numa_map+0xd5/0x450
    show_pid_numa_map+0x13/0x20
    traverse+0xf2/0x230
    seq_read+0x34b/0x3e0
    vfs_read+0xac/0x180
    sys_pread64+0xa2/0xc0
    system_call_fastpath+0x1a/0x1f
  RIP: mpol_to_str+0x156/0x360

Cc: stable@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen/pciback: Fix proper FLR steps.

When we do FLR and save PCI config we did it in the wrong order.
The end result was that if a PCI device was unbind from
its driver, then binded to xen-pciback, and then back to its
driver we would get:

> lspci -s 04:00.0
04:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
13:42:12 # 4 :~/
> echo "0000:04:00.0" > /sys/bus/pci/drivers/pciback/unbind
> modprobe e1000e
e1000e: Intel(R) PRO/1000 Network Driver - 2.0.0-k
e1000e: Copyright(c) 1999 - 2012 Intel Corporation.
e1000e 0000:04:00.0: Disabling ASPM L0s L1
e1000e 0000:04:00.0: enabling device (0000 -> 0002)
xen: registering gsi 48 triggering 0 polarity 1
Already setup the GSI :48
e1000e 0000:04:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e: probe of 0000:04:00.0 failed with error -2

This fixes it by first saving the PCI configuration space, then
doing the FLR.

Reported-by: Ren, Yongjie <yongjie.ren@intel.com>
Reported-and-Tested-by: Tobias Geiger <tobias.geiger@vido.info>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org

Merge tag 'mmc-fixes-for-3.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
- a firmware bug on several Samsung MoviNAND eMMC models causes
   permanent corruption on the device when secure erase and secure trim
   requests are made, so we disable those requests on these eMMC devices.
- atmel-mci: fix a hang with some SD cards by waiting for not-busy flag.
- dw_mmc: low-power mode breaks SDIO interrupts; fix PIO error handling;
   fix handling of error interrupts.
- mxs-mmc: fix deadlocks; fix compile error due to dma.h arch change.
- omap: fix broken PIO mode causing memory corruption.
- sdhci-esdhc: fix card detection.

* tag 'mmc-fixes-for-3.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
  mmc: omap: fix broken PIO mode
  mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.
  mmc: dw_mmc: Disable low power mode if SDIO interrupts are used
  mmc: dw_mmc: fix error handling in PIO mode
  mmc: dw_mmc: correct mishandling error interrupt
  mmc: dw_mmc: amend using error interrupt status
  mmc: atmel-mci: not busy flag has also to be used for read operations
  mmc: sdhci-esdhc: break out early if clock is 0
  mmc: mxs-mmc: fix deadlock caused by recursion loop
  mmc: mxs-mmc: fix deadlock in SDIO IRQ case
  mmc: bfin_sdh: fix dma_desc_array build error

uml: fix compile error in deliver_alarm()

Fix the following compile error on UML.

  arch/um/os-Linux/time.c: In function 'deliver_alarm':
  arch/um/os-Linux/time.c:117:3: error: too few arguments to function 'alarm_handler'
  arch/um/os-Linux/internal.h:1:6: note: declared here

The error was introduced by commit d3c1cfcd ("um: pass siginfo to guest
process") in 3.6-rc1.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Martin Pärtel <martin.partel@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dj: memory scribble in logi_dj

Allocate a structure not a pointer to it !

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
"Here are a few fixes for 3.6 that were piling up while I was away or
  busy (I was mostly MIA a week or two before San Diego).

  Some fixes from Anton fixing up issues with our relatively new DSCR
  control feature, and a few other fixes that are either regressions or
  bugs nasty enough to warrant not waiting."

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Don't use __put_user() in patch_instruction
  powerpc: Make sure IPI handlers see data written by IPI senders
  powerpc: Restore correct DSCR in context switch
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/vphn: Fix arch_update_cpu_topology() return value

Merge tag 'gpio-fixes-for-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"These are some GPIO regression fixes for v3.6:
   - Erroneous debug message from of_get_named_gpio_flags()
   - Make sure the MC9S08DZ60 GPIO driver depend on I2C being compiled
     in (not module) or allmodconfig breaks.
   - Check return value from irq_alloc_descs() in the Emma Mobile GPIO
     driver.
   - Assign the owner field for the rdc321x driver so the module won't
     be removed if it has active GPIOs."

* tag 'gpio-fixes-for-v3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: rdc321x: Prevent removal of modules exporting active GPIOs
  gpio: em: Fix checking return value of irq_alloc_descs
  gpio: mc9s08dz60: Fix build error if I2C=m
  gpio: Fix debug message in of_get_named_gpio_flags()

Merge tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"There are nothing scaring, contains only small fixes for HD-audio and
  USB-audio:
   - EPSS regression fix and GPIO fix for HD-audio IDT codecs
   - A series of USB-audio regression fixes that are found since 3.5
     kernel"

* tag 'sound-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: snd-usb: fix cross-interface streaming devices
  ALSA: snd-usb: fix calls to next_packet_size
  ALSA: snd-usb: restore delay information
  ALSA: snd-usb: use list_for_each_safe for endpoint resources
  ALSA: snd-usb: Fix URB cancellation at stream start
  ALSA: hda - Don't trust codec EPSS bit for IDT 92HD83xx & co
  ALSA: hda - Avoid unnecessary parameter read for EPSS
  ALSA: hda - Do not set GPIOs for speakers on IDT if there are no speakers

Merge tag 'fbdev-fixes-for-3.6-1' of git://github.com/schandinat/linux-2.6

Pull fbdev fixes from Florian Tobias Schandinat:
- a fix by Paul Cercueil to prevent a possible buffer overflow
- a fix by Bruno Prémont to prevent a rare sleep in invalid context
- a fix by Julia Lawall for a double free in auo_k190x
- a fix by Dan Carpenter to prevent a division by zero in mb862xxfb
- a regression fix by Tomi Valkeinen for the SDI output in OMAP
- a fix by Grazvydas Ignotas to fix the console colors in OMAP

* tag 'fbdev-fixes-for-3.6-1' of git://github.com/schandinat/linux-2.6:
  OMAPFB: fix framebuffer console colors
  OMAPDSS: Fix SDI PLL locking
  video: mb862xxfb: prevent divide by zero bug
  drivers/video/auo_k190x.c: drop kfree of devm_kzalloc's data
  fbcon: Fix bit_putcs() call to kmalloc(s, GFP_KERNEL)
  fbcon: prevent possible buffer overflow.

Merge tag 'upstream-3.6-rc5' of git://git.infradead.org/linux-ubi

Pull ubi fix from Artem Bityutskiy:
"A single small fix for memory deallocation: we allocated memory using
  'kmem_cache_alloc()' but were freeing it using 'kfree()' in some
  cases.  Now we fix this by using 'kmem_cache_free()' instead."

* tag 'upstream-3.6-rc5' of git://git.infradead.org/linux-ubi:
  UBI: fix a horrible memory deallocation bug

Fix order of arguments to compat_put_time[spec|val]

Commit 644595f89620 ("compat: Handle COMPAT_USE_64BIT_TIME in
net/socket.c") introduced a bug where the helper functions to take
either a 64-bit or compat time[spec|val] got the arguments in the wrong
order, passing the kernel stack pointer off as a user pointer (and vice
versa).

Because of the user address range check, that in turn then causes an
EFAULT due to the user pointer range checking failing for the kernel
address. Incorrectly resuling in a failed system call for 32-bit
processes with a 64-bit kernel.

On odder architectures like HP-PA (with separate user/kernel address
spaces), it can be used read kernel memory.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xen: Use correct masking in xen_swiotlb_alloc_coherent.

When running 32-bit pvops-dom0 and a driver tries to allocate a coherent
DMA-memory the xen swiotlb-implementation returned memory beyond 4GB.

The underlaying reason is that if the supplied driver passes in a
DMA_BIT_MASK(64) ( hwdev->coherent_dma_mask is set to 0xffffffffffffffff)
our dma_mask will be u64 set to 0xffffffffffffffff even if we set it to
DMA_BIT_MASK(32) previously. Meaning we do not reset the upper bits.
By using the dma_alloc_coherent_mask function - it does the proper casting
and we get 0xfffffffff.

This caused not working sound on a system with 4 GB and a 64-bit
compatible sound-card with sets the DMA-mask to 64bit.

On bare-metal and the forward-ported xen-dom0 patches from OpenSuse a coherent
DMA-memory is always allocated inside the 32-bit address-range by calling
dma_alloc_coherent_mask.

This patch adds the same functionality to xen swiotlb and is a rebase of the
original patch from Ronny Hegewald which never got upstream b/c the
underlaying reason was not understood until now.

The original email with the original patch is in:
http://old-list-archives.xen.org/archives/html/xen-devel/2010-02/msg00038.html
the original thread from where the discussion started is in:
http://old-list-archives.xen.org/archives/html/xen-devel/2010-01/msg00928.html

Signed-off-by: Ronny Hegewald <ronny.hegewald@online.de>
Signed-off-by: Stefano Panella <stefano.panella@citrix.com>
Acked-By: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org

xen: fix logical error in tlb flushing

While TLB_FLUSH_ALL gets passed as 'end' argument to
flush_tlb_others(), the Xen code was made to check its 'start'
parameter. That may give a incorrect op.cmd to MMUEXT_INVLPG_MULTI
instead of MMUEXT_TLB_FLUSH_MULTI. Then it causes some page can not
be flushed from TLB.

This patch fixed this issue.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Yongjie Ren <yongjie.ren@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Merge commit '4cb38750d49010ae72e718d46605ac9ba5a851b4' into stable/for-linus-3.6

* commit '4cb38750d49010ae72e718d46605ac9ba5a851b4': (6849 commits)
  bcma: fix invalid PMU chip control masks
  [libata] pata_cmd64x: whitespace cleanup
  libata-acpi: fix up for acpi_pm_device_sleep_state API
  sata_dwc_460ex: device tree may specify dma_channel
  ahci, trivial: fixed coding style issues related to braces
  ahci_platform: add hibernation callbacks
  libata-eh.c: local functions should not be exposed globally
  libata-transport.c: local functions should not be exposed globally
  sata_dwc_460ex: support hardreset
  ata: use module_pci_driver
  drivers/ata/pata_pcmcia.c: adjust suspicious bit operation
  pata_imx: Convert to clk_prepare_enable/clk_disable_unprepare
  ahci: Enable SB600 64bit DMA on MSI K9AGM2 (MS-7327) v2
  [libata] Prevent interface errors with Seagate FreeAgent GoFlex
  drivers/acpi/glue: revert accidental license-related 6b66d95895c bits
  libata-acpi: add missing inlines in libata.h
  i2c-omap: Add support for I2C_M_STOP message flag
  i2c: Fall back to emulated SMBus if the operation isn't supported natively
  i2c: Add SCCB support
  i2c-tiny-usb: Add support for the Robofuzz OSIF USB/I2C converter
  ...

xen/p2m: Fix one-off error in checking the P2M tree directory.

We would traverse the full P2M top directory (from 0->MAX_DOMAIN_PAGES
inclusive) when trying to figure out whether we can re-use some of the
P2M middle leafs.

Which meant that if the kernel was compiled with MAX_DOMAIN_PAGES=512
we would try to use the 512th entry. Fortunately for us the p2m_top_index
has a check for this:

BUG_ON(pfn >= MAX_P2M_PFN);

which we hit and saw this:

(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.1.2-OVM  x86_64  debug=n  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff819cadeb>]
(XEN) RFLAGS: 0000000000000212   EM: 1   CONTEXT: pv guest
(XEN) rax: ffffffff81db5000   rbx: ffffffff81db4000   rcx: 0000000000000000
(XEN) rdx: 0000000000480211   rsi: 0000000000000000   rdi: ffffffff81db4000
(XEN) rbp: ffffffff81793db8   rsp: ffffffff81793d38   r8:  0000000008000000
(XEN) r9:  4000000000000000   r10: 0000000000000000   r11: ffffffff81db7000
(XEN) r12: 0000000000000ff8   r13: ffffffff81df1ff8   r14: ffffffff81db6000
(XEN) r15: 0000000000000ff8   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000661795000   cr2: 0000000000000000

Fixes-Oracle-Bug: 14570662
CC: stable@vger.kernel.org # only for v3.5
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

powerpc: Don't use __put_user() in patch_instruction

patch_instruction() can be called very early on ppc32, when the kernel
isn't yet running at it's linked address. That can cause the !
is_kernel_addr() test in __put_user() to trip and call might_sleep()
which is very bad at that point during boot.

Use a lower level function instead for now, at least until we get to
rework ppc32 boot process to do the code patching later, like ppc64
does.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Make sure IPI handlers see data written by IPI senders

We have been observing hangs, both of KVM guest vcpu tasks and more
generally, where a process that is woken doesn't properly wake up and
continue to run, but instead sticks in TASK_WAKING state.  This
happens because the update of rq->wake_list in ttwu_queue_remote()
is not ordered with the update of ipi_message in
smp_muxed_ipi_message_pass(), and the reading of rq->wake_list in
scheduler_ipi() is not ordered with the reading of ipi_message in
smp_ipi_demux().  Thus it is possible for the IPI receiver not to see
the updated rq->wake_list and therefore conclude that there is nothing
for it to do.

In order to make sure that anything done before smp_send_reschedule()
is ordered before anything done in the resulting call to scheduler_ipi(),
this adds barriers in smp_muxed_message_pass() and smp_ipi_demux().
The barrier in smp_muxed_message_pass() is a full barrier to ensure that
there is a full ordering between the smp_send_reschedule() caller and
scheduler_ipi().  In smp_ipi_demux(), we use xchg() rather than
xchg_local() because xchg() includes release and acquire barriers.
Using xchg() rather than xchg_local() makes sense given that
ipi_message is not just accessed locally.

This moves the barrier between setting the message and calling the
cause_ipi() function into the individual cause_ipi implementations.
Most of them -- those that used outb, out_8 or similar -- already had
a full barrier because out_8 etc. include a sync before the MMIO
store.  This adds an explicit barrier in the two remaining cases.

These changes made no measurable difference to the speed of IPIs as
measured using a simple ping-pong latency test across two CPUs on
different cores of a POWER7 machine.

The analysis of the reason why processes were not waking up properly
is due to Milton Miller.

Cc: stable@vger.kernel.org # v3.0+
Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Restore correct DSCR in context switch

During a context switch we always restore the per thread DSCR value.
If we aren't doing explicit DSCR management
(ie thread.dscr_inherit == 0) and the default DSCR changed while
the process has been sleeping we end up with the wrong value.

Check thread.dscr_inherit and select the default DSCR or per thread
DSCR as required.

This was found with the following test case, when running with
more threads than CPUs (ie forcing context switching):

http://ozlabs.org/~anton/junkcode/dscr_default_test.c

With the four patches applied I can run a combination of all
test cases successfully at the same time:

http://ozlabs.org/~anton/junkcode/dscr_default_test.c
http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c
http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Fix DSCR inheritance in copy_thread()

If the default DSCR is non zero we set thread.dscr_inherit in
copy_thread() meaning the new thread and all its children will ignore
future updates to the default DSCR. This is not intended and is
a change in behaviour that a number of our users have hit.

We just need to inherit thread.dscr and thread.dscr_inherit from
the parent which ends up being much simpler.

This was found with the following test case:

http://ozlabs.org/~anton/junkcode/dscr_default_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Keep thread.dscr and thread.dscr_inherit in sync

When we update the DSCR either via emulation of mtspr(DSCR) or via
a change to dscr_default in sysfs we don't update thread.dscr.
We will eventually update it at context switch time but there is
a period where thread.dscr is incorrect.

If we fork at this point we will copy the old value of thread.dscr
into the child. To avoid this, always keep thread.dscr in sync with
reality.

This issue was found with the following testcase:

http://ozlabs.org/~anton/junkcode/dscr_inherit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Update DSCR on all CPUs when writing sysfs dscr_default

Writing to dscr_default in sysfs doesn't actually change the DSCR -
we rely on a context switch on each CPU to do the work. There is no
guarantee we will get a context switch in a reasonable amount of time
so fire off an IPI to force an immediate change.

This issue was found with the following test case:

http://ozlabs.org/~anton/junkcode/dscr_explicit_test.c

Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: <stable@kernel.org> # 3.0+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/powernv: Always go into nap mode when CPU is offline

The CPU hotplug code for the powernv platform currently only puts
offline CPUs into nap mode if the powersave_nap variable is set.
However, HV-style KVM on this platform requires secondary CPU threads
to be offline and in nap mode. Since we know nap mode works just
fine on all POWER7 machines, and the only machines that support the
powernv platform are POWER7 machines, this changes the code to
always put offline CPUs into nap mode, regardless of powersave_nap.
Powersave_nap still controls whether or not CPUs go into nap mode
when idle, as before.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc: Give hypervisor decrementer interrupts their own handler

At the moment the handler for hypervisor decrementer interrupts is
the same as for decrementer interrupts, i.e. timer_interrupt().
This is bogus; if we ever do get a hypervisor decrementer interrupt
it won't have anything to do with the next timer event.  In fact
the only time we get hypervisor decrementer interrupts is when one
is left pending on exit from a KVM guest.

When we get a hypervisor decrementer interrupt we don't need to do
anything special to clear it, since they are edge-triggered on the
transition of HDEC from 0 to -1.  Thus this adds an empty handler
function for them.  We don't need to have them masked when interrupts
are soft-disabled, so we use STD_EXCEPTION_HV instead of
MASKABLE_EXCEPTION_HV.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

powerpc/vphn: Fix arch_update_cpu_topology() return value

arch_update_cpu_topology() should only return 1 when the topology has
actually changed, and should return 0 otherwise.

This patch fixes a potential bug where rebuild_sched_domains() would
reinitialize the sched domains even when the topology hasn't changed.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ARM: gemini: fix the gemini build

Test-compiling obscure machines I notice that the gemini (which
by the way lacks a defconfig) is broken since some time back.
Adding a simple missing include makes it build again.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

Two regression fixes and one boot-loader compatibility fix from Simon Horman.

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
  ARM: shmobile: armadillo800eva: enable rw rootfs mount
  ARM: shmobile: mackerel: fixup usb module order
  ARM: shmobile: armadillo800eva: fixup: sound card detection order

mmc: omap: fix broken PIO mode

After commit 26b88520b80695a6fa5fd95b5d97c03f4daf87e0 ("mmc:
omap_hsmmc: remove private DMA API implementation"), the Nokia N800
here stopped booting:

[    2.086181] Waiting for root device /dev/mmcblk0p1...
[    2.324066] Unhandled fault: imprecise external abort (0x406) at 0x00000000
[    2.331451] Internal error: : 406 [#1] ARM
[    2.335784] Modules linked in:
[    2.339050] CPU: 0    Not tainted  (3.6.0-rc3 #60)
[    2.344146] PC is at default_idle+0x28/0x30
[    2.348602] LR is at trace_hardirqs_on_caller+0x15c/0x1b0

...

This turned out to be due to memory corruption caused by long-broken
PIO code in drivers/mmc/host/omap.c.  (Previously, this driver had
been using DMA; but the above commit caused the MMC driver to fall
back to PIO mode with an unmodified Kconfig.)

The PIO code, added with the rest of the driver in commit
730c9b7e6630f786fcec026fb11d2e6f2c90fdcb ("[MMC] Add OMAP MMC host
driver"), confused bytes with 16-bit words.  This bug caused memory
located after the PIO transfer buffer to be corrupted with transfers
larger than 32 bytes.  The driver also did not increment the buffer
pointer after the transfer occurred.  This bug resulted in data
corruption during any transfer larger than 64 bytes.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Reviewed-by: Felipe Balbi <balbi@ti.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: card: Skip secure erase on MoviNAND; causes unrecoverable corruption.

For several MoviNAND eMMC parts, there are known issues with secure
erase and secure trim. For these specific MoviNAND devices, we skip
these operations.

Specifically, there is a bug in the eMMC firmware that causes
unrecoverable corruption when the MMC is erased with MMC_CAP_ERASE
enabled.

References:

http://forum.xda-developers.com/showthread.php?t=1644364
https://plus.google.com/111398485184813224730/posts/21pTYfTsCkB#111398485184813224730/posts/21pTYfTsCkB

Signed-off-by: Ian Chen <ian.cy.chen@samsung.com>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Cc: stable <stable@vger.kernel.org> [3.0+]
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: Disable low power mode if SDIO interrupts are used

The documentation for the dw_mmc part says that the low power
mode should normally only be set for MMC and SD memory and should
be turned off for SDIO cards that need interrupts detected.

The best place I could find to do this is when the SDIO interrupt
was first enabled. I rely on the fact that dw_mci_setup_bus()
will be called when it's time to reenable.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: fix error handling in PIO mode

Data transfer will be continued until all the bytes are transmitted,
even if data crc error occurs during a multiple-block data transfer.
This means RXDR/TXDR interrupts will occurs until data transfer is
terminated. Early setting of host->sg to NULL prevents going into
xxx_data_pio functions, hence permanent unhandled RXDR/TXDR interrupts
occurs. And checking error interrupt status in the xxx_data_pio functions
is no need because dw_mci_interrupt does do the same. This patch also
removes it.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: correct mishandling error interrupt

Datasheet of SYNOPSYS mentions that DTO(Data Transfer Over) interrupt
will be raised even if some error interrupts, however it is actually
found that DTO does not occur. SYNOPSYS has confirmed this issue.
Current implementation defers the call of tasklet_schedule until DTO
when the error interrupts is happened. This patch fixes error handling.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: dw_mmc: amend using error interrupt status

RINTSTS status includes masked interrupts as well as unmasked.
data_status and cmd_status are set by value of RINTSTS in interrupt handler
and tasklet finally uses it to decide whether error is happened or not.
In addition, MINTSTS status is used for setting data_status in PIO.
Masked error interrupt will not be handled and that status can be considered
non-error case.

Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
Reviewed By: Girish K S <girish.shivananjappa@linaro.org>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Acked-by: Will Newton <will.newton@imgtec.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: atmel-mci: not busy flag has also to be used for read operations

Even if the datasheet says that the not busy flag has to be used only
for write operations, it's false except for version lesser than v2xx.

Not waiting on the not busy flag for read operations can cause the
controller to hang-up during the initialization of some SD cards
with DMA after the first CMD6 -- the next command is sent too early.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Cc: stable <stable@vger.kernel.org> [3.5, 3.6]
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-esdhc: break out early if clock is 0

Since commit 30832ab56 ("mmc: sdhci: Always pass clock request value
zero to set_clock host op") was merged, esdhc_set_clock starts hitting
"if (clock == 0)" where ESDHC_SYSTEM_CONTROL has been operated. This
causes SDHCI card-detection function being broken. Fix the regression
by moving "if (clock == 0)" above ESDHC_SYSTEM_CONTROL operation.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mxs-mmc: fix deadlock caused by recursion loop

Release the lock before mmc_signal_sdio_irq is called by
mxs_mmc_enable_sdio_irq.

Backtrace:
[   65.470000] =============================================
[   65.470000] [ INFO: possible recursive locking detected ]
[   65.470000] 3.5.0-rc5 #2 Not tainted
[   65.470000] ---------------------------------------------
[   65.470000] ksdioirqd/mmc0/73 is trying to acquire lock:
[   65.470000]  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000]
[   65.470000] but task is already holding lock:
[   65.470000]  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000]
[   65.470000] other info that might help us debug this:
[   65.470000]  Possible unsafe locking scenario:
[   65.470000]
[   65.470000]        CPU0
[   65.470000]        ----
[   65.470000]   lock(&(&host->lock)->rlock#2);
[   65.470000]   lock(&(&host->lock)->rlock#2);
[   65.470000]
[   65.470000]  *** DEADLOCK ***
[   65.470000]
[   65.470000]  May be due to missing lock nesting notation
[   65.470000]
[   65.470000] 1 lock held by ksdioirqd/mmc0/73:
[   65.470000]  #0:  (&(&host->lock)->rlock#2){-.-...}, at: [<bf054120>] mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]
[   65.470000]
[   65.470000] stack backtrace:
[   65.470000] [<c0014990>] (unwind_backtrace+0x0/0xf4) from [<c005ccb8>] (__lock_acquire+0x14f8/0x1b98)
[   65.470000] [<c005ccb8>] (__lock_acquire+0x14f8/0x1b98) from [<c005d3f8>] (lock_acquire+0xa0/0x108)
[   65.470000] [<c005d3f8>] (lock_acquire+0xa0/0x108) from [<c02f671c>] (_raw_spin_lock_irqsave+0x48/0x5c)
[   65.470000] [<c02f671c>] (_raw_spin_lock_irqsave+0x48/0x5c) from [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc])
[   65.470000] [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]) from [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc])
[   65.470000] [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc]) from [<c0219b38>] (sdio_irq_thread+0x1bc/0x274)
[   65.470000] [<c0219b38>] (sdio_irq_thread+0x1bc/0x274) from [<c003c324>] (kthread+0x8c/0x98)
[   65.470000] [<c003c324>] (kthread+0x8c/0x98) from [<c00101ac>] (kernel_thread_exit+0x0/0x8)
[   65.470000] BUG: spinlock lockup suspected on CPU#0, ksdioirqd/mmc0/73
[   65.470000]  lock: 0xc3358724, .magic: dead4ead, .owner: ksdioirqd/mmc0/73, .owner_cpu: 0
[   65.470000] [<c0014990>] (unwind_backtrace+0x0/0xf4) from [<c01b46b0>] (do_raw_spin_lock+0x100/0x144)
[   65.470000] [<c01b46b0>] (do_raw_spin_lock+0x100/0x144) from [<c02f6724>] (_raw_spin_lock_irqsave+0x50/0x5c)
[   65.470000] [<c02f6724>] (_raw_spin_lock_irqsave+0x50/0x5c) from [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc])
[   65.470000] [<bf054120>] (mxs_mmc_enable_sdio_irq+0x18/0xdc [mxs_mmc]) from [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc])
[   65.470000] [<bf0541d0>] (mxs_mmc_enable_sdio_irq+0xc8/0xdc [mxs_mmc]) from [<c0219b38>] (sdio_irq_thread+0x1bc/0x274)
[   65.470000] [<c0219b38>] (sdio_irq_thread+0x1bc/0x274) from [<c003c324>] (kthread+0x8c/0x98)
[   65.470000] [<c003c324>] (kthread+0x8c/0x98) from [<c00101ac>] (kernel_thread_exit+0x0/0x8)

Reported-by: Attila Kinali <attila@kinali.ch>
Signed-off-by: Lauri Hintsala <lauri.hintsala@bluegiga.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: mxs-mmc: fix deadlock in SDIO IRQ case

Release the lock before mmc_signal_sdio_irq is called by mxs_mmc_irq_handler.

Backtrace:
[   79.660000] =============================================
[   79.660000] [ INFO: possible recursive locking detected ]
[   79.660000] 3.4.0-00009-g3e96082-dirty #11 Not tainted
[   79.660000] ---------------------------------------------
[   79.660000] swapper/0 is trying to acquire lock:
[   79.660000]  (&(&host->lock)->rlock#2){-.....}, at: [<c026ea3c>] mxs_mmc_enable_sdio_irq+0x18/0xd4
[   79.660000]
[   79.660000] but task is already holding lock:
[   79.660000]  (&(&host->lock)->rlock#2){-.....}, at: [<c026f744>] mxs_mmc_irq_handler+0x1c/0xe8
[   79.660000]
[   79.660000] other info that might help us debug this:
[   79.660000]  Possible unsafe locking scenario:
[   79.660000]
[   79.660000]        CPU0
[   79.660000]        ----
[   79.660000]   lock(&(&host->lock)->rlock#2);
[   79.660000]   lock(&(&host->lock)->rlock#2);
[   79.660000]
[   79.660000]  *** DEADLOCK ***
[   79.660000]
[   79.660000]  May be due to missing lock nesting notation
[   79.660000]
[   79.660000] 1 lock held by swapper/0:
[   79.660000]  #0:  (&(&host->lock)->rlock#2){-.....}, at: [<c026f744>] mxs_mmc_irq_handler+0x1c/0xe8
[   79.660000]
[   79.660000] stack backtrace:
[   79.660000] [<c0014bd0>] (unwind_backtrace+0x0/0xf4) from [<c005f9c0>] (__lock_acquire+0x1948/0x1d48)
[   79.660000] [<c005f9c0>] (__lock_acquire+0x1948/0x1d48) from [<c005fea0>] (lock_acquire+0xe0/0xf8)
[   79.660000] [<c005fea0>] (lock_acquire+0xe0/0xf8) from [<c03a8460>] (_raw_spin_lock_irqsave+0x44/0x58)
[   79.660000] [<c03a8460>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4)
[   79.660000] [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4) from [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8)
[   79.660000] [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8) from [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254)
[   79.660000] [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254) from [<c006bff8>] (handle_irq_event+0x3c/0x5c)
[   79.660000] [<c006bff8>] (handle_irq_event+0x3c/0x5c) from [<c006e6d0>] (handle_level_irq+0x90/0x110)
[   79.660000] [<c006e6d0>] (handle_level_irq+0x90/0x110) from [<c006b930>] (generic_handle_irq+0x38/0x50)
[   79.660000] [<c006b930>] (generic_handle_irq+0x38/0x50) from [<c00102fc>] (handle_IRQ+0x30/0x84)
[   79.660000] [<c00102fc>] (handle_IRQ+0x30/0x84) from [<c000f058>] (__irq_svc+0x38/0x60)
[   79.660000] [<c000f058>] (__irq_svc+0x38/0x60) from [<c0010520>] (default_idle+0x2c/0x40)
[   79.660000] [<c0010520>] (default_idle+0x2c/0x40) from [<c0010a90>] (cpu_idle+0x64/0xcc)
[   79.660000] [<c0010a90>] (cpu_idle+0x64/0xcc) from [<c04ff858>] (start_kernel+0x244/0x2c8)
[   79.660000] BUG: spinlock lockup on CPU#0, swapper/0
[   79.660000]  lock: c398cb2c, .magic: dead4ead, .owner: swapper/0, .owner_cpu: 0
[   79.660000] [<c0014bd0>] (unwind_backtrace+0x0/0xf4) from [<c01ddb1c>] (do_raw_spin_lock+0xf0/0x144)
[   79.660000] [<c01ddb1c>] (do_raw_spin_lock+0xf0/0x144) from [<c03a8468>] (_raw_spin_lock_irqsave+0x4c/0x58)
[   79.660000] [<c03a8468>] (_raw_spin_lock_irqsave+0x4c/0x58) from [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4)
[   79.660000] [<c026ea3c>] (mxs_mmc_enable_sdio_irq+0x18/0xd4) from [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8)
[   79.660000] [<c026f7fc>] (mxs_mmc_irq_handler+0xd4/0xe8) from [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254)
[   79.660000] [<c006bdd8>] (handle_irq_event_percpu+0x70/0x254) from [<c006bff8>] (handle_irq_event+0x3c/0x5c)
[   79.660000] [<c006bff8>] (handle_irq_event+0x3c/0x5c) from [<c006e6d0>] (handle_level_irq+0x90/0x110)
[   79.660000] [<c006e6d0>] (handle_level_irq+0x90/0x110) from [<c006b930>] (generic_handle_irq+0x38/0x50)
[   79.660000] [<c006b930>] (generic_handle_irq+0x38/0x50) from [<c00102fc>] (handle_IRQ+0x30/0x84)
[   79.660000] [<c00102fc>] (handle_IRQ+0x30/0x84) from [<c000f058>] (__irq_svc+0x38/0x60)
[   79.660000] [<c000f058>] (__irq_svc+0x38/0x60) from [<c0010520>] (default_idle+0x2c/0x40)
[   79.660000] [<c0010520>] (default_idle+0x2c/0x40) from [<c0010a90>] (cpu_idle+0x64/0xcc)
[   79.660000] [<c0010a90>] (cpu_idle+0x64/0xcc) from [<c04ff858>] (start_kernel+0x244/0x2c8)

Signed-off-by: Lauri Hintsala <lauri.hintsala@bluegiga.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: bfin_sdh: fix dma_desc_array build error

Descriptor array structure has been moved into blackfin dma.h head file.
This patch fix below error:

drivers/mmc/host/bfin_sdh.c:52:8: error: redefinition of 'struct
dma_desc_array'
make[4]: *** [drivers/mmc/host/bfin_sdh.o] Error 1

Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

UBI: fix a horrible memory deallocation bug

UBI was mistakingly using 'kfree()' instead of 'kmem_cache_free()' when
freeing "attach eraseblock" structures in vtbl.c. Thankfully, this happened
only when we were doing auto-format, so many systems were unaffected. However,
there are still many users affected.

It is strange, but the system did not crash and nothing bad happened when
the SLUB memory allocator was used. However, in case of SLOB we observed an
crash right away.

This problem was introduced in 2.6.39 by commit
"6c1e875 UBI: add slab cache for ubi_scan_leb objects"

A note for stable trees:
Because variable were renamed, this won't cleanly apply to older kernels.
Changing names like this should help:
1. ai -> si
2. aeb_slab_cache -> seb_slab_cache
3. new_aeb -> new_seb

Reported-by: Richard Genoud <richard.genoud@gmail.com>
Tested-by: Richard Genoud <richard.genoud@gmail.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v2.6.39+]
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>

ARM: shmobile: armadillo800eva: enable rw rootfs mount

armadillo800eva default boot loader is "hermit",
and it's tag->u.core.flags has flag when kernel boots.
Because of this, ${LINUX}/arch/arm/kernel/setup.c :: parse_tag_core()
didn't remove MS_RDONLY flag from root_mountflags.
Thus, the rootfs is mounted as "readonly".
This patch adds "rw" kernel parameter,
and enable read/write mounts for rootfs

Cc: Masahiro Nakai <nakai@atmark-techno.com>
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French.

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix cifs_do_create error hadnling
  cifs: print error code if smb signature verification fails
  CIFS: Fix log messages in packet checking for SMB2
  CIFS: Protect i_nlink from being negative

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) NLA_PUT* --> nla_put_* conversion got one case wrong in
    nfnetlink_log, fix from Patrick McHardy.

2) Missed error return check in ipw2100 driver, from Julia Lawall.

3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
    from Eric Dumazet.

4) SFC driver erroneously reversed src and dst when reporting filters
    via ethtool.

5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
    sja1000 can platform driver, from Alexey Khoroshilov and Sven
    Schmitt.

6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
    take the lock when we really need to.  From Eric Dumazet.

7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
    fix from Yuchung Cheng.

9) Revert netpoll change, and fix what was actually a driver specific
    problem.  From Amerigo Wang.  This should cure bootup hangs with
    netconsole some people reported.

10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
    pointer.  From Ian Campbell.

11) SIP NAT fix for expectiontation creation, from Pablo Neira Ayuso.

12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
    situation.  From Oliver Neukum.

14) The davinci ethernet driver triggers an OOPS on removal because it
    frees an MDIO object before unregistering it.  Fix from Bin Liu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
  net: qmi_wwan: add several new Gobi devices
  fddi: 64 bit bug in smt_add_para()
  net: ethernet: fix kernel OOPS when remove davinci_mdio module
  net/xfrm/xfrm_state.c: fix error return code
  net: ipv6: fix error return code
  net: qmi_wwan: new device: Foxconn/Novatel E396
  usbnet: fix deadlock in resume
  cs89x0 : packet reception not working
  netfilter: nf_conntrack: fix racy timer handling with reliable events
  bnx2x: Correct the ndo_poll_controller call
  bnx2x: Move netif_napi_add to the open call
  ipv4: must use rcu protection while calling fib_lookup
  bnx2x: fix 57840_MF pci id
  net: ipv4: ipmr_expire_timer causes crash when removing net namespace
  e1000e: DoS while TSO enabled caused by link partner with small MSS
  l2tp: avoid to use synchronize_rcu in tunnel free function
  gianfar: fix default tx vlan offload feature flag
  netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
  xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
  netfilter: nfnetlink_log: fix error return code in init path
  ...

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
ARM: shmobile: marzen: fixup smsc911x id for regulator

Merge branch 'fixes-for-v3.6-v2' of git://git.infradead.org/users/jcooper/linux into fixes

* 'fixes-for-v3.6-v2' of git://git.infradead.org/users/jcooper/linux:
ARM: Kirkwood: Fix 'SZ_1M' undeclared here for db88f6281-bp-setup.c

HID: Only dump input if someone is listening

Going through the motions of printing the debug message information
takes a long time; using the keyboard can lead to a 160 us irqsoff
latency. This patch skips hid_dump_input() when there are no open
handles, which brings latency down to 100 us.

Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

net: qmi_wwan: add several new Gobi devices

Gobi devices are composite, needing both the qcserial and
qmi_wwan drivers to support all functions. Re-syncing the
list of supported devices with qcserial.

Cc: Aleksander Morgado <aleksander@lanedo.com>
Cc: Thomas Tuttle <ttuttle@chromium.org>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@tempietto.lan>

fddi: 64 bit bug in smt_add_para()

The intent was to set 4 bytes of data so that's why the sp_len is set
to 4 on the next line. The cast to u_long pointer clears 8 bytes
on 64 bit arches.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@tempietto.lan>

linux/kernel.h: Fix DIV_ROUND_CLOSEST to support negative dividends

DIV_ROUND_CLOSEST returns a bad result for negative dividends:
DIV_ROUND_CLOSEST(-2, 2) = 0

Most of the time this does not matter. However, in the hardware monitoring
subsystem, DIV_ROUND_CLOSEST is sometimes used on integers which can be
negative (such as temperatures).

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Jean Delvare <khali@linux-fr.org>

Linux 3.6-rc4

time: Move ktime_t overflow checking into timespec_valid_strict

Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe <aeb@debian.org>
Cc: Zhouping Liu <zliu@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

gpio: rdc321x: Prevent removal of modules exporting active GPIOs

This driver can be built as a module, set the missing owner field of
struct gpio_chip to prevent removal of modules exporting active GPIOs.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Merge git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM bugfixes from Marcelo Tosatti.

* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: fix KVM_GET_MSR for PV EOI
kvm: Fix nonsense handling of compat ioctl

Merge tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6

Pull PARISC fixes from James Bottomley:
"This is a set of two bug fixes.  One is the ATOMIC problem which is
  now causing a compile failure in certain situations.  The other is
  mishandling of PER_LINUX32 which may also cause user visible effects.

Signed-off-by: James Bottomley <JBottomley@Parallels.com>"
* tag 'parisc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
  [PARISC] fix personality flag check in copy_thread()
  [PARISC] Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Martin Schwidefsky:
"A couple of s390 bug fixes for 3.5-rc4"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/32: Don't clobber personality flags on exec
  s390/smp: add missing smp_store_status() for !SMP
  s390/dasd: fix ioctl return value
  s390: Always use "long" for ssize_t to match size_t

gpio: em: Fix checking return value of irq_alloc_descs

irq_alloc_descs() returns negative error code on failure.

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

gpio: mc9s08dz60: Fix build error if I2C=m

Make GPIO_MC9S08DZ60 depend on I2C=y, this fixes below build error:

LD init/built-in.o
drivers/built-in.o: In function `mc9s08dz60_get_value':
clk-fixed-factor.c:(.text+0x7214): undefined reference to `i2c_smbus_read_byte_data'
drivers/built-in.o: In function `mc9s08dz60_set':
clk-fixed-factor.c:(.text+0x727c): undefined reference to `i2c_smbus_read_byte_data'
clk-fixed-factor.c:(.text+0x72bc): undefined reference to `i2c_smbus_write_byte_data'
drivers/built-in.o: In function `mc9s08dz60_i2c_driver_init':
clk-fixed-factor.c:(.init.text+0x290): undefined reference to `i2c_register_driver'
drivers/built-in.o: In function `mc9s08dz60_i2c_driver_exit':
clk-fixed-factor.c:(.exit.text+0x2c): undefined reference to `i2c_del_driver'
make: *** [vmlinux] Error 1

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

net: ethernet: fix kernel OOPS when remove davinci_mdio module

davinci mdio device is not unregistered from mdiobus when removing
the module, which causes BUG_ON() when free the device from mdiobus.

Calling mdiobus_unregister() before mdiobus_free() fixes the issue.

Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/xfrm/xfrm_state.c: fix error return code

Initialize return variable before exiting on an error path.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6: fix error return code

Initialize return variable before exiting on an error path.

The initial initialization of the return variable is also dropped, because
that value is never used.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 ($ret < 0\|ret != 0$)
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qmi_wwan: new device: Foxconn/Novatel E396

Foxconn-branded Novatel E396, Gobi3k modem.

Cc: Dan Williams <dcbw@redhat.com>
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Ben Chan <benchan@google.com>
Signed-off-by: Aleksander Morgado <aleksander@lanedo.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

usbnet: fix deadlock in resume

A usbnet device can share a multifunction device
with a storage device. If the storage device is autoresumed
the usbnet devices also needs to be autoresumed. Allocating
memory with GFP_KERNEL can deadlock in this case.

This should go back into all kernels that have
commit 65841fd5132c3941cdf5df09e70df3ed28323212
That is 3.5

Signed-off-by: Oliver Neukum <oneukum@suse.de>
CC: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

cs89x0 : packet reception not working

The RxCFG register of the CS89x0 could be configured incorrectly
(because of misplaced parentheses), resulting in the disabling
of packet reception.

Signed-off-by: Jaccon Bastiaansen <jaccon.bastiaansen@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ALSA: snd-usb: fix cross-interface streaming devices

Commit 68e67f40b ("ALSA: snd-usb: move calls to usb_set_interface")
saved us some unnecessary calls to snd_usb_set_interface() but ignored
the fact that there is at least one device out there which operates on
two endpoint in different interfaces simultaniously.

Take care for this by catching the case where data and sync endpoints
are located on different interfaces and calling snd_usb_set_interface()
between the start of the two endpoints.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Robert M. Albrecht <linux@romal.de>
Cc: stable@kernel.org [v3.5+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: snd-usb: fix calls to next_packet_size

In order to support devices with implicit feedback streaming models,
packet sizes are now stored with each individual urb, and the PCM
handling code which fills the buffers purely relies on the size fields
now.

However, calling snd_usb_audio_next_packet_size() for all possible
packets in an URB at once, prior to letting the PCM code do its job
does in fact not lead to the same behaviour than what the old code did:
The PCM code will break its loop once a period boundary is reached,
consequently using up less packets that it really could.

As snd_usb_audio_next_packet_size() implements a feedback mechanism to
the endpoints phase accumulator, the number of calls to that function
matters, and when called too often, the data rate runs out of bounds.

Fix this by making the next_packet function public, and call it from the
PCM code as before if the packet data sizes are not defined.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: stable@kernel.org [v3.5+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ALSA: snd-usb: restore delay information

Parts of commit 294c4fb8 ("ALSA: usb: refine delay information with USB
frame counter") were unfortunately lost during the refactoring of the
snd-usb driver in 3.5.

This patch adds them back, restoring the correct delay information
behaviour.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: stable@kernel.org [3.5+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>

Merge branch 'master' of git://1984.lsi.us.es/nf

ALSA: snd-usb: use list_for_each_safe for endpoint resources

snd_usb_endpoint_free() frees the structure that contains its argument.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>

ARM: Kirkwood: Fix 'SZ_1M' undeclared here for db88f6281-bp-setup.c

Linux-next has failed to compile for kirkwood since 23 August with:

arch/arm/mach-kirkwood/db88f6281-bp-setup.c:29: error: 'SZ_1M' undeclared here (not in a function)
arch/arm/mach-kirkwood/db88f6281-bp-setup.c:33: error: 'SZ_4M' undeclared here (not in a function)

Add missing <linux/sizes.h>

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jason Cooper <jason@lakedaemon.net>