Peter Zijlstra [Wed, 1 Jun 2016 18:58:15 +0000 (20:58 +0200)]
locking/mutex: Optimize mutex_trylock() fast-path
A while back Viro posted a number of 'interesting' mutex_is_locked()
users on IRC, one of those was RCU.
RCU seems to use mutex_is_locked() to avoid doing mutex_trylock(), i.e. the
regular load-before-modify pattern.
While the use isn't wrong per se, it's curious that it's needed at all:
mutex_trylock() should be good enough on its own to avoid the pointless
cacheline bounces.
So fix those and remove the mutex_is_locked() (ab)use from RCU.
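The pattern in question looks roughly like the sketch below (names are
hypothetical, not the actual RCU code); the point is that the explicit
mutex_is_locked() pre-check becomes redundant once the trylock fast path
itself reads the lock word before attempting the atomic:

    /* load-before-modify pattern that is now redundant */
    if (!mutex_is_locked(&m) && mutex_trylock(&m)) {
        do_work();
        mutex_unlock(&m);
    }

    /* with the optimized fast path, a plain trylock suffices */
    if (mutex_trylock(&m)) {
        do_work();
        mutex_unlock(&m);
    }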
Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Paul McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <Waiman.Long@hpe.com> Link: http://lkml.kernel.org/r/20160601185815.GW3190@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Wed, 18 May 2016 01:26:23 +0000 (21:26 -0400)]
locking/rwsem: Streamline the rwsem_optimistic_spin() code
This patch moves the owner loading and checking code entirely inside of
rwsem_spin_on_owner() to simplify the logic of the rwsem_optimistic_spin()
loop.
Suggested-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1463534783-38814-6-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Wed, 18 May 2016 01:26:22 +0000 (21:26 -0400)]
locking/rwsem: Improve reader wakeup code
In __rwsem_do_wake(), the reader wakeup code will assume a writer
has stolen the lock if the active reader/writer count is not 0.
However, this is not as reliable an indicator as the original
"< RWSEM_WAITING_BIAS" check. If another reader is present, the code
will still break out and exit even if the writer is gone. This patch
changes it to check the same "< RWSEM_WAITING_BIAS" condition to
reduce the chance of false positives.
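A minimal sketch of the idea (not the exact __rwsem_do_wake() code; on
rwsem-xadd, RWSEM_WAITING_BIAS is a large negative bias, so a count below it
implies an active writer):

    /* decide whether a writer stole the lock before granting readers */
    long count = atomic_long_read(&sem->count);

    if (count < RWSEM_WAITING_BIAS)
        goto out;   /* an active writer is present: abort the reader wakeup */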
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1463534783-38814-5-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Wed, 18 May 2016 01:26:20 +0000 (21:26 -0400)]
locking/rwsem: Protect all writes to owner by WRITE_ONCE()
Without using WRITE_ONCE(), the compiler can potentially break a
write into multiple smaller ones (store tearing). So a read from the
same data by another task concurrently may return a partial result.
This can result in a kernel crash if the data is a memory address
that is being dereferenced.
This patch changes all writes to rwsem->owner to use WRITE_ONCE()
to make sure that store tearing will not happen. READ_ONCE() may
not be needed for rwsem->owner as long as the value is only used for
comparison and not dereferencing.
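The resulting helpers look roughly like this (a sketch of the rwsem owner
accessors after the change):

    static inline void rwsem_set_owner(struct rw_semaphore *sem)
    {
        /* a single, non-torn store of the owning task */
        WRITE_ONCE(sem->owner, current);
    }

    static inline void rwsem_clear_owner(struct rw_semaphore *sem)
    {
        WRITE_ONCE(sem->owner, NULL);
    }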
Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Douglas Hatch <doug.hatch@hpe.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1463534783-38814-3-git-send-email-Waiman.Long@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Waiman Long [Wed, 18 May 2016 01:26:19 +0000 (21:26 -0400)]
locking/rwsem: Add reader-owned state to the owner field
Currently, it is not possible to determine for sure if a reader
owns a rwsem by looking at the content of the rwsem data structure.
This patch adds a new state RWSEM_READER_OWNED to the owner field
to indicate that readers currently own the lock. This enables us to
address the following 2 issues in the rwsem optimistic spinning code:
1) rwsem_can_spin_on_owner() will disallow optimistic spinning if
the owner field is NULL which can mean either the readers own
the lock or the owning writer hasn't set the owner field yet.
In the latter case, we miss the chance to do optimistic spinning.
2) While a writer is waiting in the OSQ and a reader takes the lock,
the writer will continue to spin when out of the OSQ in the main
rwsem_optimistic_spin() loop because the owner field is NULL, wasting
CPU cycles if some of the readers are sleeping.
Adding the new state allows optimistic spinning to go forward as long
as the owner field is not RWSEM_READER_OWNED and the owner, if set, is
running, but to stop immediately once that state has been reached.
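The new state is a sentinel value stored in the owner pointer; a sketch of
the idea (details may differ from the actual patch):

    /* no task_struct can live at address 1, so use it as the reader-owned mark */
    #define RWSEM_READER_OWNED	((struct task_struct *)1UL)

    static inline void rwsem_set_reader_owned(struct rw_semaphore *sem)
    {
        if (READ_ONCE(sem->owner) != RWSEM_READER_OWNED)
            WRITE_ONCE(sem->owner, RWSEM_READER_OWNED);
    }

    static inline bool rwsem_owner_is_reader(struct task_struct *owner)
    {
        return owner == RWSEM_READER_OWNED;
    }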
On a 4-socket Haswell machine running a 4.6-rc1 based kernel, fio
tests with multithreaded randrw and randwrite workloads on the same
file on an XFS partition on top of an NVDIMM were run; the aggregated
bandwidths before and after the patch were as follows:
Test BW before patch BW after patch % change
---- --------------- -------------- --------
randrw 988 MB/s 1192 MB/s +21%
randwrite 1513 MB/s 1623 MB/s +7.3%
The perf profile of the rwsem_down_write_failed() function in randrw
before and after the patch were:
Jason Low [Tue, 17 May 2016 00:38:02 +0000 (17:38 -0700)]
locking/rwsem: Remove rwsem_atomic_add() and rwsem_atomic_update()
The rwsem-xadd count has been converted to an atomic variable and the
rwsem code now directly uses atomic_long_add() and
atomic_long_add_return(), so we can remove the arch implementations of
rwsem_atomic_add() and rwsem_atomic_update().
Signed-off-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Waiman Long <Waiman.Long@hpe.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jason Low [Sat, 4 Jun 2016 05:26:02 +0000 (22:26 -0700)]
locking/rwsem: Convert sem->count to 'atomic_long_t'
Convert the rwsem count variable to an atomic_long_t since we use it
as an atomic variable. This also allows us to remove the
rwsem_atomic_{add,update}() "abstraction", which would now be an unnecessary
level of indirection. In follow-up patches, we also remove the
rwsem_atomic_{add,update}() definitions across the various architectures.
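In code the conversion amounts to replacing the arch wrappers with the
generic atomic_long_*() calls; a sketch:

    /* before: arch-provided wrappers around a plain long count */
    rwsem_atomic_add(RWSEM_ACTIVE_READ_BIAS, sem);
    count = rwsem_atomic_update(adjustment, sem);

    /* after: sem->count is an atomic_long_t, use the generic API directly */
    atomic_long_add(RWSEM_ACTIVE_READ_BIAS, &sem->count);
    count = atomic_long_add_return(adjustment, &sem->count);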
Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Jason Low <jason.low2@hpe.com>
[ Build warning fixes on various architectures. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Waiman Long <Waiman.Long@hpe.com> Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 8 Jun 2016 08:36:53 +0000 (10:36 +0200)]
locking/qspinlock: Add comments
I figured we need to document the spin_is_locked() and
spin_unlock_wait() constraints somewhere.
Ideally 'someone' would rewrite Documentation/atomic_ops.txt and we
could find a place in there. But currently that document is stale to
the point of hardly being useful.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 8 Jun 2016 07:12:30 +0000 (09:12 +0200)]
locking/qspinlock: Clarify xchg_tail() ordering
While going over the code I noticed that xchg_tail() is a RELEASE but
had no obvious pairing commented.
It pairs with a somewhat unique address dependency through
decode_tail().
So the store-release of xchg_tail() is paired with the address
dependency from the load in xchg_tail() to the subsequent dereference
of the pointer computed from that load.
The @old -> @prev transformation itself is pure, and therefore does
not depend on external state, so that is immaterial wrt. ordering.
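Both sides of the pairing, as a sketch modelled on the slowpath (not
verbatim kernel code):

    old = xchg_tail(lock, tail);        /* RELEASE: publishes our MCS node */

    if (old & _Q_TAIL_MASK) {
        prev = decode_tail(old);        /* address dependency on the old tail */

        /* the dereference below is therefore ordered after the previous
         * CPU's initialisation of its node */
        WRITE_ONCE(prev->next, node);
    }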
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 8 Jun 2016 08:19:51 +0000 (10:19 +0200)]
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit:
54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()")
... fixes spin_is_locked() and spin_unlock_wait() for the usage
in ipc/sem and netfilter, it does not in fact work right for the
usage in task_work and futex.
So while the 2 locks crossed problem:
    spin_lock(A)                    spin_lock(B)
    if (!spin_is_locked(B))         spin_unlock_wait(A)
      foo()                           foo();
... works with the smp_mb() injected by both spin_is_locked() and
spin_unlock_wait(), this is not sufficient for:
    flag = 1;
    smp_mb();                       spin_lock()
    spin_unlock_wait()              if (!flag)
                                      // add to lockless list
    // iterate lockless list
... because in this scenario, the store from spin_lock() can be delayed
past the load of flag, uncrossing the variables and losing the
guarantee.
This patch reworks spin_is_locked() and spin_unlock_wait() to work in
both cases by exploiting the observation that while the lock byte
store can be delayed, the contender must have registered itself
visibly in other state contained in the word.
It also allows for architectures to override both functions, as PPC
and ARM64 have an additional issue for which we currently have no
generic solution.
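For qspinlock, the reworked spin_is_locked() side boils down to looking at
the whole lock word rather than only the locked byte; roughly:

    static __always_inline int queued_spin_is_locked(struct qspinlock *lock)
    {
        /*
         * Any non-zero state (pending bit, tail) counts as locked, even
         * if the _Q_LOCKED_VAL store itself is not yet visible.
         */
        return atomic_read(&lock->val);
    }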
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Giovanni Gherdovich <ggherdovich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <waiman.long@hpe.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org # v4.2 and later Fixes: 54cf809b9512 ("locking,qspinlock: Fix spin_is_locked() and spin_unlock_wait()") Signed-off-by: Ingo Molnar <mingo@kernel.org>
locking/rtmutex: Only warn once on a trylock from bad context
One warning should be enough to get someone motivated to fix this. It is
possible that this happens more than once and starts flooding the
output; later prints will then be suppressed, so we only get half of them,
which, depending on the console system used, might not be helpful.
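The change itself is essentially a one-liner; a sketch assuming it lands in
rt_mutex_trylock()'s context check:

    int __sched rt_mutex_trylock(struct rt_mutex *lock)
    {
        /* warn only once instead of flooding the console on every call */
        if (WARN_ON_ONCE(in_irq() || in_nmi() || in_serving_softirq()))
            return 0;

        return rt_mutex_fasttrylock(lock, rt_mutex_slowtrylock);
    }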
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1464356838-1755-1-git-send-email-bigeasy@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 30 May 2016 16:31:33 +0000 (18:31 +0200)]
locking/lockdep: Use __jhash_mix() for iterate_chain_key()
Use __jhash_mix() to mix the class_idx into the class_key. This
function provides better mixing than the previously used, home grown
mix function.
Leave hashing to the professionals :-)
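The resulting helper is roughly the following (sketch from memory;
__jhash_mix() comes from <linux/jhash.h> and modifies its arguments in
place):

    static inline u64 iterate_chain_key(u64 key, u32 idx)
    {
        u32 k0 = key, k1 = key >> 32;

        __jhash_mix(idx, k0, k1);   /* macro that modifies its arguments! */

        return k0 | (u64)k1 << 32;
    }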
Suggested-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tejun Heo [Wed, 25 May 2016 20:11:57 +0000 (16:11 -0400)]
percpu, locking: Revert ("percpu: Replace smp_read_barrier_depends() with lockless_dereference()")
lockless_dereference() is planned to grow a sanity check to ensure
that the input parameter is a pointer. __ref_is_percpu() passes in an
unsigned long value which is a combination of a pointer and a flag.
While it can be cast to a pointer lvalue, the casting looks messy
and it's a special case anyway. Let's revert back to open-coding
READ_ONCE() and an explicit barrier.
This doesn't cause any functional changes.
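A sketch of the reverted-to pattern in __ref_is_percpu() (from memory, not
guaranteed verbatim):

    unsigned long percpu_ptr = READ_ONCE(ref->percpu_count_ptr);

    /* carry the dependency from the load above to the later dereference */
    smp_read_barrier_depends();

    if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
        return false;

    *percpu_countp = (unsigned long __percpu *)percpu_ptr;
    return true;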
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pranith Kumar <bobby.prani@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Link: http://lkml.kernel.org/g/20160522185040.GA23664@p183.telecom.by Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jason Low [Fri, 20 May 2016 22:19:36 +0000 (15:19 -0700)]
locking/mutex: Set and clear owner using WRITE_ONCE()
The mutex owner can get read and written to locklessly.
Use WRITE_ONCE when setting and clearing the owner field
in order to avoid optimizations such as store tearing. This
avoids situations where the owner field gets written to with
multiple stores and another thread could concurrently read
and use a partially written owner value.
Signed-off-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Acked-by: Waiman Long <Waiman.Long@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1463782776.2479.9.camel@j-VirtualBox Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jason Low [Tue, 17 May 2016 00:38:00 +0000 (17:38 -0700)]
locking/rwsem: Optimize write lock by reducing operations in slowpath
When acquiring the rwsem write lock in the slowpath, we first try
to set count to RWSEM_WAITING_BIAS. When that is successful,
we then atomically add the RWSEM_WAITING_BIAS in cases where
there are other tasks on the wait list. This causes write lock
operations to often issue multiple atomic operations.
We can instead make the list_is_singular() check first, and then
set the count accordingly, so that we issue at most 1 atomic
operation when acquiring the write lock and reduce unnecessary
cacheline contention.
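A sketch of the reworked sequence (simplified; the real code keys off
whether other waiters were already queued):

    /* after queueing ourselves on the wait list: */
    if (list_is_singular(&sem->wait_list))
        /* we are the only waiter: set the waiting bias ourselves */
        count = atomic_long_add_return(RWSEM_WAITING_BIAS, &sem->count);
    else
        /* earlier waiters already set the bias; a plain read suffices */
        count = atomic_long_read(&sem->count);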
Signed-off-by: Jason Low <jason.low2@hpe.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Waiman Long<Waiman.Long@hpe.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Terry Rudd <terry.rudd@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1463445486-16078-2-git-send-email-jason.low2@hpe.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Davidlohr Bueso [Fri, 13 May 2016 18:56:27 +0000 (11:56 -0700)]
locking/rwsem: Rework zeroing reader waiter->task
Readers that are awoken will expect a nil ->task indicating
that a wakeup has occurred. Because of the way readers are
implemented, there's a small chance that the waiter will never
block in the slowpath (rwsem_down_read_failed), and therefore
requires some form of reference counting to avoid the following
scenario:
    rwsem_down_read_failed()                  rwsem_wake()
      get_task_struct();
      spin_lock_irq(&wait_lock);
      list_add_tail(&waiter.list)
      spin_unlock_irq(&wait_lock);
                                                raw_spin_lock_irqsave(&wait_lock)
                                                __rwsem_do_wake()
      while (1) {
        set_task_state(TASK_UNINTERRUPTIBLE);
                                                waiter->task = NULL
        if (!waiter.task) // true
          break;
        schedule() // never reached
... and therefore race with do_exit() when the caller returns.
There is also a mismatch between the smp_mb() and its documentation,
in that the serialization is done between reading the task and the
nil store. Furthermore, in addition to having the overlapping of
loads and stores to waiter->task guaranteed to be ordered within
that CPU, both wake_up_process() originally and now wake_q_add()
already imply barriers upon successful calls, which serves the
comment.
Now, as an alternative to perhaps inverting the checks on the blocker
side (which has its own penalty in that schedule is unavoidable),
with lockless wakeups this situation is naturally addressed and we
can just use the refcount held by wake_q_add(), instead of doing so
explicitly. Of course, we must guarantee that the nil store is done
as the _last_ operation in that the task must already be marked for
deletion to not fall into the race above. Spurious wakeups are also
handled transparently in that the task's reference is only removed
when wake_up_q() is actually called _after_ the nil store.
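A sketch of the reworked wakeup step in __rwsem_do_wake() (roughly as in
the patch):

    tsk = waiter->task;

    wake_q_add(wake_q, tsk);    /* takes a reference on the task */

    /*
     * The nil store must be the last operation: the reader may return
     * (and exit) as soon as it sees waiter->task == NULL, but the
     * reference taken above keeps the task alive for wake_up_q().
     */
    smp_store_release(&waiter->task, NULL);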
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman.Long@hpe.com Cc: dave@stgolabs.net Cc: jason.low2@hp.com Cc: peter@hurleysoftware.com Link: http://lkml.kernel.org/r/1463165787-25937-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Davidlohr Bueso [Fri, 13 May 2016 18:56:26 +0000 (11:56 -0700)]
locking/rwsem: Enable lockless waiter wakeup(s)
As wake_qs gain users, we can teach rwsems about them such that
waiters can be awoken without the wait_lock. This applies to both
readers and writers, the former being the most ideal candidate
as we can batch the wakeups, shortening the critical region that
much more -- e.g. a writer task blocking a bunch of tasks waiting to
service page-faults (mmap_sem readers).
In general, applying wake_qs to rwsem (xadd) is not difficult as
the wait_lock is intended to be released soon _anyways_, with
the exception of when the writer slowpath proactively wakes up
any queued readers if it sees that the lock is owned by a reader,
in which case we simply do the wakeups with the lock held (see the comment
in __rwsem_down_write_failed_common()).
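The overall shape of the pattern, as a hedged sketch modelled on
rwsem_wake() (WAKE_Q declares an on-stack wake queue in this kernel
version):

    WAKE_Q(wake_q);

    raw_spin_lock_irqsave(&sem->wait_lock, flags);
    /* waiters are wake_q_add()ed instead of being woken directly */
    sem = __rwsem_do_wake(sem, RWSEM_WAKE_ANY, &wake_q);
    raw_spin_unlock_irqrestore(&sem->wait_lock, flags);

    wake_up_q(&wake_q);     /* the actual wakeups happen outside wait_lock */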
Similar to other locking primitives, delaying the waiter being
awoken does allow, at least in theory, the lock to be stolen in
the case of writers; however, no harm was seen in this (in fact
lock stealing tends to be a _good_ thing in most workloads), and
this is a tiny window anyways.
Some page-fault (pft) and mmap_sem intensive benchmarks show some
pretty constant reduction in systime (by up to ~8 and ~10%) on a
2-socket, 12 core AMD box. In addition, on an 8-core Westmere doing
page allocations (page_test)
Chris Wilson [Thu, 26 May 2016 20:08:17 +0000 (21:08 +0100)]
locking/ww_mutex: Report recursive ww_mutex locking early
Recursive locking for ww_mutexes was originally conceived as an
exception. However, it is heavily used by the DRM atomic modesetting
code. Currently, the recursive deadlock is checked after we have queued
up for a busy-spin and as we never release the lock, we spin until
kicked, whereupon the deadlock is discovered and reported.
A simple solution for the now common problem is to move the recursive
deadlock discovery to the first action when taking the ww_mutex.
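The check moves to the very start of the lock attempt; roughly:

    if (ww_ctx) {
        struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);

        /* the caller already holds this ww_mutex with the same context */
        if (unlikely(ww_ctx == READ_ONCE(ww->ctx)))
            return -EALREADY;
    }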
Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
seq = ({ p = READ_ONCE(latch);
smp_read_barrier_depends(); p })->seq;
is just wrong, because it loses the volatile read on seq, which can now
be torn or worse 'optimized'. And the read_depend barrier is also placed
wrong: we want it after the load of seq, to match the above data[]
up-to-date wmb()s.
Such that when we dereference latch->data[] below, we're guaranteed to
observe the right data.
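The corrected read side, as a sketch (keeping the volatile load on
->sequence and placing the dependency barrier after it):

    static inline int raw_read_seqcount_latch(seqcount_t *s)
    {
        int seq = READ_ONCE(s->sequence);   /* volatile load of seq itself */

        /* pairs with the wmb()s that publish data[] before bumping seq */
        smp_read_barrier_depends();
        return seq;
    }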
Linus Torvalds [Wed, 1 Jun 2016 19:38:50 +0000 (12:38 -0700)]
Merge tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for v4.7. Not much, and just driver
fixes:
- add device tree matches to MAINTAINERS
- inversion bug in the Nomadik driver
- dual edge handling bug in the mediatek driver"
* tag 'pinctrl-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: mediatek: fix dual-edge code defect
MAINTAINERS: Add file patterns for pinctrl device tree bindings
pinctrl: nomadik: fix inversion of gpio direction
1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized
properly. From Ezequiel Garcia.
3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
4) Missing unlocks in hwbm error paths, also from Gregory CLEMENT.
5) Fix deadlock on team->lock when propagating features, from Ivan
Vecera.
6) Work around buffer offset hw bug in alx chips, from Feng Tang.
7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin
Long.
8) Various statistics bug fixes in mlx4 from Eric Dumazet.
9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
10) All of l2tp was namespace aware, but the ipv6 support code was not
doing so. From Shmulik Ladkani.
11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
12) Propagate MAC changes properly through VLAN devices, from Mike
Manning.
13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
sfc: Track RPS flow IDs per channel instead of per function
usbnet: smsc95xx: fix link detection for disabled autonegotiation
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
bnx2x: avoid leaking memory on bnx2x_init_one() failures
fou: fix IPv6 Kconfig options
openvswitch: update checksum in {push,pop}_mpls
sctp: sctp_diag should dump sctp socket type
net: fec: update dirty_tx even if no skb
vlan: Propagate MAC address to VLANs
atm: iphase: off by one in rx_pkt()
atm: firestream: add more reserved strings
vxlan: Accept user specified MTU value when create new vxlan link
net: pktgen: Call destroy_hrtimer_on_stack()
timer: Export destroy_hrtimer_on_stack()
net: l2tp: Make l2tp_ip6 namespace aware
Documentation: ip-sysctl.txt: clarify secure_redirects
sfc: use flow dissector helpers for aRFS
ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
net: nps_enet: Disable interrupts before napi reschedule
net/lapb: tuse %*ph to dump buffers
...
Pull sparc fixes from David Miller:
"sparc64 mmu context allocation and trap return bug fixes"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix return from trap window fill crashes.
sparc: Harden signal return frame checks.
sparc64: Take ctx_alloc_lock properly in hugetlb_setup().
Christoph Fritz [Thu, 26 May 2016 02:06:47 +0000 (04:06 +0200)]
usbnet: smsc95xx: fix link detection for disabled autonegotiation
To detect link status up/down for connections where autonegotiation is
explicitly disabled, we don't get an irq but need to poll the status
register for link up/down detection.
This patch adds a workqueue to poll for link status.
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wangyunjian [Tue, 31 May 2016 03:52:43 +0000 (11:52 +0800)]
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
In virtnet_open() and virtnet_probe(), try_fill_recv() may
be executed at the same time. The VQ in virtqueue_add() is not protected
properly, and a BUG_ON will be triggered when virtio_net.ko is being removed.
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vitaly Kuznetsov [Mon, 30 May 2016 13:00:54 +0000 (15:00 +0200)]
bnx2x: avoid leaking memory on bnx2x_init_one() failures
bnx2x_init_bp() allocates memory with bnx2x_alloc_mem_bp() so if we
fail later in bnx2x_init_one() we need to free this memory
with bnx2x_free_mem_bp() to avoid leakages. E.g. I'm observing memory
leaks reported by kmemleak when a failure (unrelated) happens in
bnx2x_vfpf_acquire().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 31 May 2016 20:42:11 +0000 (22:42 +0200)]
fou: fix IPv6 Kconfig options
The Kconfig options I added to work around broken compilation ended
up screwing things up more, as I used the wrong symbol to control
compilation of the file, resulting in IPv6 fou support never being built
into the kernel.
Changing CONFIG_NET_FOU_IPV6_TUNNELS to CONFIG_IPV6_FOU fixes that
problem: I had renamed the symbol in one location but not the other,
and as the file is never used by other kernel code, this did not
lead to a build failure that I would have caught.
After that fix, another issue with the same patch becomes obvious, as we
'select INET6_TUNNEL', which is related to IPV6_TUNNEL, but not the same,
and this can still cause the original build failure when IPV6_TUNNEL is
not built-in but IPV6_FOU is. The fix is equally trivial, we just need
to select the right symbol.
I have successfully build 350 randconfig kernels with this patch
and verified that the driver is now being built.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Valentin Rothberg <valentinrothberg@gmail.com> Fixes: fabb13db448e ("fou: add Kconfig options for IPv6 support") Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Mon, 30 May 2016 05:04:25 +0000 (14:04 +0900)]
openvswitch: update checksum in {push,pop}_mpls
In the case of CHECKSUM_COMPLETE the skb checksum should be updated in
{push,pop}_mpls() as they update the type field in the ethernet header.
As suggested by Pravin Shelar.
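The helper doing this looks roughly like the following (a sketch; the
diff[] trick folds the ethertype change into the CHECKSUM_COMPLETE sum):

    static void update_ethertype(struct sk_buff *skb, struct ethhdr *hdr,
                                 __be16 ethertype)
    {
        if (skb->ip_summed == CHECKSUM_COMPLETE) {
            __be16 diff[] = { ~(hdr->h_proto), ethertype };

            skb->csum = ~csum_partial((char *)diff, sizeof(diff),
                                      ~skb->csum);
        }

        hdr->h_proto = ethertype;
    }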
Cc: Pravin Shelar <pshelar@nicira.com> Fixes: 25cd9ba0abc0 ("openvswitch: Add basic MPLS support to kernel") Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 29 May 2016 09:42:13 +0000 (17:42 +0800)]
sctp: sctp_diag should dump sctp socket type
Now we cannot distinguish whether one sk is a udp- or sctp-style socket when
we use ss to dump sctp_info; it's necessary to dump the type as well.
For sctp_diag, ss support is not officially available, thus there
are no official users of this yet, so we can add this field in the
middle of sctp_info without breaking user API.
v1->v2:
- move 'sctpi_s_type' field to the end of struct sctp_info, so
that it won't cause incompatibility with applications already
built.
- add __reserved3 in sctp_info to make sure sctp_info is 8-byte
alignment.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Troy Kisky [Fri, 27 May 2016 20:30:40 +0000 (13:30 -0700)]
net: fec: update dirty_tx even if no skb
If dirty_tx isn't updated, then dma_unmap_single
can be called twice.
This fixes a
[ 58.420980] ------------[ cut here ]------------
[ 58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
[ 58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not allocated [device address=0x0000000000000000] [size=66 bytes]
encountered by Holger
Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Tested-by: <holgerschurig@gmail.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Fri, 27 May 2016 16:45:07 +0000 (17:45 +0100)]
vlan: Propagate MAC address to VLANs
The MAC address of the physical interface is only copied to the VLAN
when it is first created, so after a MAC address change only newly
created VLANs have an up-to-date MAC, leaving older ones inconsistent.
The VLANs should continue inheriting the MAC address of the physical
interface until the VLAN MAC address is explicitly set to any value.
This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
to the MAC of the physical interface and thus for DAD to behave as
expected.
Signed-off-by: Mike Manning <mmanning@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 27 May 2016 10:34:35 +0000 (13:34 +0300)]
atm: iphase: off by one in rx_pkt()
The iadev->rx_open[] array holds "iadev->num_vc" pointers (this code
assumes that pointers are 32 bits). So the > here should be >= or else
we could end up reading a garbage pointer from one element beyond the
end of the array.
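Illustrative sketch only (variable names hypothetical): an index equal to
num_vc would read one element past the end of rx_open[num_vc], so the bound
check must use >=:

    if (vc_index >= iadev->num_vc) {    /* was: vc_index > iadev->num_vc */
        /* drop the packet rather than use a garbage pointer */
        goto drop;
    }
    vcc = iadev->rx_open[vc_index];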
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There are supposed to be 64 entries in this array and the missing
strings are clearly in the 30 40 range. I added them as reserved 37 to
reserved 40. It's possible that strings are really supposed to be added
in the middle instead of at the end, but this approach is safe, in that
it fixes the bug and doesn't break anything that wasn't already broken.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Haiquan [Fri, 27 May 2016 02:49:11 +0000 (10:49 +0800)]
vxlan: Accept user specified MTU value when create new vxlan link
When creating a new vxlan link, for example:
ip link add vtap mtu 1440 type vxlan vni 1 dev eth0
the "mtu" argument has no effect, because it is not copied into conf->mtu; the
default value is used in the vxlan_dev_configure() function instead.
This problem was introduced by commit 0dfbdf4102b9 (vxlan: Factor out device
configuration).
Fixes: 0dfbdf4102b9 (vxlan: Factor out device configuration) Signed-off-by: Chen Haiquan <oc@yunify.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Replace explicit computation of vma page count by a call to
vma_pages().
Also, include <linux/mm.h>
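For reference, a minimal sketch of the substitution (vma_pages() in
<linux/mm.h> performs exactly this computation):

    /* before */
    unsigned long count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

    /* after */
    unsigned long count = vma_pages(vma);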
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com> Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Linus Torvalds [Tue, 31 May 2016 16:43:24 +0000 (09:43 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three bugs fixes and an update for the default configuration"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: fix info leak in do_sigsegv
s390/config: update default configuration
s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
s390/bpf: reduce maximum program size to 64 KB
Rob Clark [Thu, 31 Mar 2016 20:26:50 +0000 (16:26 -0400)]
dma-buf: headerdoc fixes
Apparently nobody noticed that dma-buf.h wasn't actually pulled into the
docbook build, and as a result the headerdoc comments bitrotted a bit.
Add missing params/fields.
Signed-off-by: Rob Clark <robdclark@gmail.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Linus Torvalds [Tue, 31 May 2016 16:27:00 +0000 (09:27 -0700)]
Merge tag 'gpio-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"A bunch of GPIO fixes for the v4.7 series:
- Drop the lock before reading out the GPIO direction setting in
drivers supporting the .get_direction() callback: some of them may
be slowpath.
- Flush GPIO direction setting before locking a GPIO as an IRQ: some
electronics or other poking around in the registers behind our back
may have happened, so flush the direction status before trying to
lock the line for use by IRQs.
- Bail out silently when asked to perform operations on NULL GPIO
descriptors. That is what all the get_*_optional() is about: we
get optional GPIO handles, if they are not there, we get NULL.
- Handle compatible ioctl() correctly: we need to convert the ioctl()
pointer using compat_ptr() here like everyone else.
- Disable the broken .to_irq() on the LPC32xx platform. The whole
irqchip infrastructure was replaced in the last merge window, and a
new implementation will be needed"
* tag 'gpio-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: drop lock before reading GPIO direction
gpio: bail out silently on NULL descriptors
gpio: handle compatible ioctl() pointers
gpio: flush direction status in gpiochip_lock_as_irq()
gpio: lpc32xx: disable broken to_irq support
hongkun.cao [Sat, 21 May 2016 07:23:39 +0000 (15:23 +0800)]
pinctrl: mediatek: fix dual-edge code defect
When a dual-edge irq is triggered, an incorrect irq may be reported if
the external signal is not stable and that incorrect irq has been
registered.
Correct the register offset.
Andy Shevchenko [Mon, 30 May 2016 14:40:41 +0000 (17:40 +0300)]
lib/uuid: add a test module
It appears that somehow I missed a test of the latest UUID rework which
landed in the kernel. Present a small test module to avoid such cases
in the future.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Walleij [Mon, 30 May 2016 15:11:59 +0000 (17:11 +0200)]
gpio: drop lock before reading GPIO direction
When adding the gpiochip, the GPIO HW drivers' callback get_direction()
could get called in atomic context. Some of the GPIO HW drivers may
sleep when accessing the register.
Move the lock before initializing the descriptors.
Linus Walleij [Mon, 30 May 2016 14:48:39 +0000 (16:48 +0200)]
gpio: bail out silently on NULL descriptors
Commit fdeb8e1547cb9dd39d5d7223b33f3565cf86c28e
("gpio: reflect base and ngpio into gpio_device")
assumed that GPIO descriptors are either valid or error
pointers, but gpiod_get_[index_]optional() actually returns
NULL descriptors, and then all subsequent calls should just
bail out.
Cc: stable@vger.kernel.org Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Fixes: fdeb8e1547cb ("gpio: reflect base and ngpio into gpio_device") Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Walleij [Wed, 25 May 2016 08:56:03 +0000 (10:56 +0200)]
gpio: flush direction status in gpiochip_lock_as_irq()
As irqchip and gpiochip functions are orthogonal, the IRQ
set-up or something else can have changed the direction of
the GPIO line from what the GPIO descriptor knows when we
get into gpiochip_lock_as_irq(). Make sure to re-read the
direction setting if we have the .get_direction() callback
enabled for the chip.
Else we get problems like this:
iio iio:device2: interrupts on the rising edge
gpio gpiochip2: (8012e080.gpio): gpiochip_lock_as_irq:
tried to flag a GPIO set as output for IRQ
gpio gpiochip2: (8012e080.gpio): unable to lock HW IRQ 0 for IRQ
genirq: Failed to request resources for l3g4200d-trigger
(irq 111) on irqchip nmk1-32-63
iio iio:device2: failed to request trigger IRQ.
st-gyro-i2c: probe of 2-0068 failed with error -22
Fixes: 72d320006177 ("gpio: set up initial state from .get_direction()") Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Baozeng Ding [Thu, 26 May 2016 13:07:42 +0000 (21:07 +0800)]
ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
Fix a logic error to avoid potential null pointer dereference.
Signed-off-by: Baozeng Ding <sploving1@gmail.com> Reviewed-by: Stefan Schmidt<stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Kanfi [Thu, 26 May 2016 12:00:06 +0000 (15:00 +0300)]
net: nps_enet: Disable interrupts before napi reschedule
Since NAPI works by shutting down event interrupts when there's
work and turning them on when there's none, the net driver must
make sure that interrupts are disabled when it reschedules polling.
By calling napi_reschedule, the driver switches to polling mode,
therefore there should be no interrupt interference.
Any received packets will be handled in nps_enet_poll by polling the HW
indication of received packet until all packets are handled.
Signed-off-by: Elad Kanfi <eladkan@mellanox.com> Acked-by: Noam Camus <noamca@mellanox.com> Tested-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 26 May 2016 06:46:22 +0000 (09:46 +0300)]
ptp: oops in ptp_ioctl()
If we pass ERR_PTR(-EFAULT) to kfree() then it's going to oops.
Fixes: 2ece068e1b1d ('ptp: use memdup_user().') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 25 May 2016 14:50:46 +0000 (16:50 +0200)]
fou: add Kconfig options for IPv6 support
A previous patch added the fou6.ko module, but that failed to link
in a couple of configurations:
net/built-in.o: In function `ip6_tnl_encap_add_fou_ops':
net/ipv6/fou6.c:88: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:94: undefined reference to `ip6_tnl_encap_add_ops'
net/ipv6/fou6.c:97: undefined reference to `ip6_tnl_encap_del_ops'
net/built-in.o: In function `ip6_tnl_encap_del_fou_ops':
net/ipv6/fou6.c:106: undefined reference to `ip6_tnl_encap_del_ops'
net/ipv6/fou6.c:107: undefined reference to `ip6_tnl_encap_del_ops'
If CONFIG_IPV6=m, ip6_tnl_encap_add_ops/ip6_tnl_encap_del_ops
are in a module, but fou6.c can still be built-in, and that
obviously fails to link.
Also, if CONFIG_IPV6=y, but CONFIG_IPV6_TUNNEL=m or
CONFIG_IPV6_TUNNEL=n, the same problem happens for a different
reason.
This adds two new silent Kconfig symbols to work around both
problems:
- CONFIG_IPV6_FOU is now always set to 'm' if either CONFIG_NET_FOU=m
or CONFIG_IPV6=m
- CONFIG_IPV6_FOU_TUNNEL is set implicitly when IPV6_FOU is enabled
and NET_FOU_IP_TUNNELS is also turned on, and it will ensure
that CONFIG_IPV6_TUNNEL is also available.
The options could be made user-visible as well, to give additional
room for configuration, but it seems easier not to bother users
with more choice here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: aa3463d65e7b ("fou: Add encap ops for IPv6 tunnels") Signed-off-by: David S. Miller <davem@davemloft.net>
A recent cleanup moved MAX_IPTUN_ENCAP_OPS along with some other
definitions, but it is now invisible when CONFIG_INET is
not defined, while still being referenced from ip6_tunnel.h:
In file included from net/xfrm/xfrm_input.c:17:0:
include/net/ip6_tunnel.h:67:17: error: 'MAX_IPTUN_ENCAP_OPS' undeclared here (not in a function)
ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
^~~~~~~~~~~~~~~~~~~
This hides the ip6_encap_hlen and ip6_tnl_encap functions inside
of CONFIG_INET so we don't run into the problem.
Alternatively, we could move the macro out of the #ifdef again to
restore the previous behavior.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 55c2bc143224 ("net: Cleanup encap items in ip_tunnels.h") Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 29 May 2016 03:41:12 +0000 (20:41 -0700)]
sparc64: Fix return from trap window fill crashes.
We must handle data access exception as well as memory address unaligned
exceptions from return from trap window fill faults, not just normal
TLB misses.
Otherwise we can get an OOPS that looks like this:
The window trap handlers are slightly clever: the trap table entries for them are
composed of two pieces of code. First comes the code that actually performs
the window fill or spill trap handling, and then there are three instructions at
the end which are for exception processing.
And the way this works is that if any of those memory accesses
generate an exception, the exception handler can revector to one of
those final three branch instructions depending upon which kind of
exception the memory access took. In this way, the fault handler
doesn't have to know if it was a spill or a fill that it's handling
the fault for. It just always branches to the last instruction in
the parent trap's handler.
All window trap handlers are 0x80 aligned, so if we "or" 0x7c into the
trap time program counter, we'll get that final instruction in the
trap handler.
On return from trap, we have to pull the register window in but we do
this by hand instead of just executing a "restore" instruction for
several reasons. The largest being that from Niagara and onward we
simply don't have enough levels in the trap stack to fully resolve all
possible exception cases of a window fault when we are already at
trap level 1 (which we enter to get ready to return from the original
trap).
This is executed inline via the FILL_*_RTRAP handlers. rtrap_64.S's
code branches directly to these to do the window fill by hand if
necessary. Now if you look at them, we'll see at the end:
And oops, all three cases are handled like a fault.
This doesn't work because each of these trap types (data access
exception, memory address unaligned, and faults) store their auxiliary
info in different registers to pass on to the C handler which does the
real work.
So in the case where the stack was unaligned, the unaligned trap
handler sets up the arg registers one way, and then we branched to
the fault handler which expects them setup another way.
So the FAULT_TYPE_* value ends up basically being garbage, and
randomly would generate the backtrace seen above.
Reported-by: Nick Alcock <nix@esperi.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 29 May 2016 20:28:39 +0000 (13:28 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of four fixes noticed in the merge window. The aacraid
one is an optimisation, the mp3sas one fixes a spurious printk, the
sd_check_events one fixes a theoretical race and the failed zero
length commands fixes a bug in our completion/retry routines that has
been causing problems in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
aacraid: do not activate events on non-SRC adapters
mpt3sas: add missing curly braces
sd: get disk reference in sd_check_events()
scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
George Spelvin [Sun, 29 May 2016 05:26:41 +0000 (01:26 -0400)]
Rename other copy of hash_string to hashlen_string
The original name was simply hash_string(), but that conflicted with a
function with that name in drivers/base/power/trace.c, and I decided
that calling it "hashlen_" was better anyway.
But you have to do it in two places.
[ This caused build errors for architectures that don't define
CONFIG_DCACHE_WORD_ACCESS - Linus ]
Mikulas Patocka [Tue, 24 May 2016 20:49:18 +0000 (22:49 +0200)]
hpfs: implement the show_options method
The HPFS filesystem used generic_show_options to produce string that is
displayed in /proc/mounts. However, there is a problem that the options
may disappear after remount. If we mount the filesystem with option1
and then remount it with option2, /proc/mounts should show both option1
and option2, however it only shows option2 because the whole option
string is replaced with replace_mount_options in hpfs_remount_fs.
To fix this bug, implement the hpfs_show_options function that prints
options that are currently selected.
Mikulas Patocka [Tue, 24 May 2016 20:48:33 +0000 (22:48 +0200)]
affs: fix remount failure when there are no options changed
Commit c8f33d0bec99 ("affs: kstrdup() memory handling") checks if the
kstrdup function returns NULL due to out-of-memory condition.
However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL. In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists. The mount syscall then fails with
ENOMEM.
This patch fixes the bug. We fail with ENOMEM only if data is non-NULL.
The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).
Mikulas Patocka [Tue, 24 May 2016 20:47:00 +0000 (22:47 +0200)]
hpfs: fix remount failure when there are no options changed
Commit ce657611baf9 ("hpfs: kstrdup() out of memory handling") checks if
the kstrdup function returns NULL due to out-of-memory condition.
However, if we are remounting a filesystem with no change to
filesystem-specific options, the parameter data is NULL. In this case,
kstrdup returns NULL (because it was passed NULL parameter), although no
out of memory condition exists. The mount syscall then fails with
ENOMEM.
This patch fixes the bug. We fail with ENOMEM only if data is non-NULL.
The patch also changes the call to replace_mount_options - if we didn't
pass any filesystem-specific options, we don't call
replace_mount_options (thus we don't erase existing reported options).
Fixes: ce657611baf9 ("hpfs: kstrdup() out of memory handling") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
VDSO:
- Build microMIPS VDSO for microMIPS kernels.
- Fix aliasing warning by building with `-fno-strict-aliasing' for
debugging but also tracing them might result in recursion.
Misc:
- Add missing FROZEN hotplug notifier transitions.
- Fix clk binding example for varioius PIC32 devices.
- Fix cpu interrupt controller node-names in the DT files.
- Fix XPA CPU feature separation.
- Fix write_gc0_* macros when writing zero.
- Add inline asm encoding helpers.
- Add missing VZ accessor microMIPS encodings.
- Fix little endian microMIPS MSA encodings.
- Add 64-bit HTW fields and fix its configuration.
- Fix sigreturn via VDSO on microMIPS kernel.
- Lots of typo fixes.
- Add definitions of SegCtl registers and use them"
Linus Torvalds [Sat, 28 May 2016 23:15:25 +0000 (16:15 -0700)]
Merge branch 'hash' of git://ftp.sciencehorizons.net/linux
Pull string hash improvements from George Spelvin:
"This series does several related things:
- Makes the dcache hash (fs/namei.c) useful for general kernel use.
(Thanks to Bruce for noticing the zero-length corner case)
- Converts the string hashes in <linux/sunrpc/svcauth.h> to use the
above.
- Avoids 64-bit multiplies in hash_64() on 32-bit platforms. Two
32-bit multiplies will do well enough.
- Rids the world of the bad hash multipliers in hash_32.
This finishes the job started in commit 689de1d6ca95 ("Minimal
fix-up of bad hashing behavior of hash_64()")
The vast majority of Linux architectures have hardware support for
32x32-bit multiply and so derive no benefit from "simplified"
multipliers.
The few processors that do not (68000, h8/300 and some models of
Microblaze) have arch-specific implementations added. Those
patches are last in the series.
- Overhauls the dcache hash mixing.
The patch in commit 0fed3ac866ea ("namei: Improve hash mixing if
CONFIG_DCACHE_WORD_ACCESS") was an off-the-cuff suggestion.
Replaced with a much more careful design that's simultaneously
faster and better. (My own invention, as there was nothing suitable
in the literature I could find. Comments welcome!)
- Modify the hash_name() loop to skip the initial HASH_MIX(). This
would let us salt the hash if we ever wanted to.
- Sort out partial_name_hash().
The hash function is declared as using a long state, even though
it's truncated to 32 bits at the end and the extra internal state
contributes nothing to the result. And some callers do odd things:
- fs/hfs/string.c only allocates 32 bits of state
- fs/hfsplus/unicode.c uses it to hash 16-bit unicode symbols not bytes
- Modify bytemask_from_count to handle inputs of 1..sizeof(long)
rather than 0..sizeof(long)-1. This would simplify users other
than full_name_hash"
Special thanks to Bruce Fields for testing and finding bugs in v1. (I
learned some humbling lessons about "obviously correct" code.)
On the arch-specific front, the m68k assembly has been tested in a
standalone test harness, I've been in contact with the Microblaze
maintainers who mostly don't care, as the hardware multiplier is never
omitted in real-world applications, and I haven't heard anything from
the H8/300 world"
* 'hash' of git://ftp.sciencehorizons.net/linux:
h8300: Add <asm/hash.h>
microblaze: Add <asm/hash.h>
m68k: Add <asm/hash.h>
<linux/hash.h>: Add support for architecture-specific functions
fs/namei.c: Improve dcache hash function
Eliminate bad hash multipliers from hash_32() and hash_64()
Change hash_64() return value to 32 bits
<linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
fs/namei.c: Add hashlen_string() function
Pull out string hash to <linux/stringhash.h>
George Spelvin [Wed, 25 May 2016 18:19:49 +0000 (14:19 -0400)]
h8300: Add <asm/hash.h>
This will improve the performance of hash_32() and hash_64(), but due
to complete lack of multi-bit shift instructions on H8, performance will
still be bad in surrounding code.
Designing H8-specific hash algorithms to work around that is a separate
project. (But if the maintainers would like to get in touch...)
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
George Spelvin [Wed, 25 May 2016 15:06:09 +0000 (11:06 -0400)]
microblaze: Add <asm/hash.h>
Microblaze is an FPGA soft core that can be configured various ways.
If it is configured without a multiplier, the standard __hash_32()
will require a call to __mulsi3, which is a slow software loop.
Instead, use a shift-and-add sequence for the constant multiply.
GCC knows how to do this, but it's not as clever as some.
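As a tiny illustration of the technique (not the actual asm/hash.h
contents), a constant multiply can be decomposed into shifts and adds,
e.g. multiplying by 9:

    /* x * 9 without a hardware multiplier */
    static inline unsigned int mul9(unsigned int x)
    {
        return (x << 3) + x;
    }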
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Alistair Francis <alistair.francis@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com>
George Spelvin [Fri, 27 May 2016 02:11:51 +0000 (22:11 -0400)]
<linux/hash.h>: Add support for architecture-specific functions
This is just the infrastructure; there are no users yet.
This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
the existence of <asm/hash.h>.
That file may define its own versions of various functions, and define
HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
Included is a self-test (in lib/test_hash.c) that verifies the basics.
It is NOT in general required that the arch-specific functions compute
the same thing as the generic, but if a HAVE_* symbol is defined with
the value 1, then equality is tested.
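The shape of an override, as a hedged sketch of what an <asm/hash.h> might
contain (architecture and symbol details from memory, not authoritative):

    /* arch/foo/include/asm/hash.h -- hypothetical architecture */
    #ifndef _ASM_FOO_HASH_H
    #define _ASM_FOO_HASH_H

    #define HAVE_ARCH__HASH_32 1    /* value 1: the self-test checks equality */

    static inline u32 __hash_32(u32 x)
    {
        /* arch-tuned replacement; must match the generic result when
         * HAVE_ARCH__HASH_32 == 1 */
        return x * 0x61C88647;  /* GOLDEN_RATIO_32 */
    }

    #endif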
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Philippe De Muyter <phdm@macq.eu> Cc: linux-m68k@lists.linux-m68k.org Cc: Alistair Francis <alistai@xilinx.com> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: uclinux-h8-devel@lists.sourceforge.jp
George Spelvin [Mon, 23 May 2016 11:43:58 +0000 (07:43 -0400)]
fs/namei.c: Improve dcache hash function
Patch 0fed3ac866 improved the hash mixing, but the function is slower
than necessary; there's a 7-instruction dependency chain (10 on x86)
each loop iteration.
Word-at-a-time access is a very tight loop (which is good, because
link_path_walk() is one of the hottest code paths in the entire kernel),
and the hash mixing function must not have a longer latency to avoid
slowing it down.
There do not appear to be any published fast hash functions that:
1) Operate on the input a word at a time, and
2) Don't need to know the length of the input beforehand, and
3) Have a single iterated mixing function, not needing conditional
branches or unrolling to distinguish different loop iterations.
One of the algorithms which comes closest is Yann Collet's xxHash, but
that's two dependent multiplies per word, which is too much.
The key insights in this design are:
1) Barring expensive ops like multiplies, to diffuse one input bit
across 64 bits of hash state takes at least log2(64) = 6 sequentially
dependent instructions. That is more cycles than we'd like.
2) An operation like "hash ^= hash << 13" requires a second temporary
register anyway, and on a 2-operand machine like x86, it's three
instructions.
3) A better use of a second register is to hold a two-word hash state.
With careful design, no temporaries are needed at all, so it doesn't
increase register pressure. And this gets rid of register copying
on 2-operand machines, so the code is smaller and faster.
4) Using two words of state weakens the requirement for one-round mixing;
we now have two rounds of mixing before cancellation is possible.
5) A two-word hash state also allows operations on both halves to be
done in parallel, so on a superscalar processor we get more mixing
in fewer cycles.
I ended up using a mixing function inspired by the ChaCha and Speck
round functions. It is 6 simple instructions and 3 cycles per iteration
(assuming multiply by 9 can be done by an "lea" instruction):
    x ^= *input++;
    y ^= x;  x = ROL(x, K1);
    x += y;  y = ROL(y, K2);
    y *= 9;
Not only is this reversible, two consecutive rounds are reversible:
if you are given the initial and final states, but not the intermediate
state, it is possible to compute both input words. This means that at
least 3 words of input are required to create a collision.
(It also has the property, used by hash_name() to avoid a branch, that
it hashes all-zero to all-zero.)
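Spelled out as kernel-style C, the round is sketched below; K1 and K2 are placeholders for the rotate constants discussed next, and rol64() is the generic kernel rotate helper:
    /*
     * One mixing round over the two-word (x, y) state; 'a' is the next
     * input word.
     */
    #define HASH_MIX(x, y, a)                       \
            (       x ^= (a),                       \
                    y ^= x, x = rol64(x, K1),       \
                    x += y, y = rol64(y, K2),       \
                    y *= 9                          )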
The rotate constants K1 and K2 were found by experiment. The search took
a sample of random initial states (I used 1023) and considered the effect
of flipping each of the 64 input bits on each of the 128 output bits two
rounds later. Each of the 8192 pairs can be considered a biased coin, and
adding up the Shannon entropy of all of them produces a score.
The best-scoring shifts also did well in other tests (flipping bits in y,
trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
so the choice was made with the additional constraint that the sum of the
shifts is odd and not too close to the word size.
The final state is then folded into a 32-bit hash value by a less carefully
optimized multiply-based scheme. This also has to be fast, as pathname
components tend to be short (the most common case is one iteration!), but
there's some room for latency, as there is a fair bit of intervening logic
before the hash value is used for anything.
(Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs. I need
a better benchmark; the numbers seem to show a slight dip in performance
between 4.6.0 and this patch, but they're too noisy to quote.)
Special thanks to Bruce Fields for diligent testing, which uncovered a
nasty fencepost error in an earlier version of this patch.
[checkpatch.pl formatting complaints noted and respectfully disagreed with.]
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com>
George Spelvin [Fri, 27 May 2016 03:00:23 +0000 (23:00 -0400)]
Eliminate bad hash multipliers from hash_32() and hash_64()
The "simplified" prime multipliers made very bad hash functions, so get rid
of them. This completes the work of 689de1d6ca.
To avoid the inefficiency which was the motivation for the "simplified"
multipliers, hash_64() on 32-bit systems is changed to use a different
algorithm. It makes two calls to hash_32() instead.
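A hedged sketch of that 32-bit composition (helper names follow the generic hash.h API; the exact in-tree form may differ slightly):
    static inline u32 hash_64_32bit(u64 val, unsigned int bits)
    {
            /* Fold the high word into the low word with one full-width
             * 32-bit hash, then narrow to 'bits' bits with a second one:
             * two 32x32 multiplies instead of one 64x64 multiply. */
            return hash_32((u32)val ^ __hash_32((u32)(val >> 32)), bits);
    }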
drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
for some horrible reason, so it inherits a copy of the old definition.
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Cc: Antti Palosaari <crope@iki.fi> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
George Spelvin [Fri, 27 May 2016 02:22:01 +0000 (22:22 -0400)]
Change hash_64() return value to 32 bits
That's all that's ever asked for, and it makes the return
type of hash_long() consistent.
It also allows (upcoming patch) an optimized implementation
of hash_64 on 32-bit machines.
I tried adding a BUILD_BUG_ON to ensure the number of bits requested
was never more than 32 (most callers use a compile-time constant), but
adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf build
unless tools/perf/MANIFEST is updated, and understanding that code base
well enough to update it is too much trouble. I did the rest of an
allyesconfig build with such a check, and nothing tripped.
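For reference, one way such a check could be written; the __builtin_constant_p() guard is an assumption here, so non-constant callers would still compile (the patch ultimately left the check out):
    /* Sketch of the dropped check: complain only when 'bits' is a
     * compile-time constant greater than 32. */
    BUILD_BUG_ON(__builtin_constant_p(bits) && bits > 32);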
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
George Spelvin [Fri, 20 May 2016 17:31:33 +0000 (13:31 -0400)]
<linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string()
Finally, the first use of the previous two patches: eliminate the
separate ad-hoc string hash functions in the sunrpc code.
Now hash_str() is a wrapper around hashlen_string(), and hash_mem() is
likewise a wrapper around full_name_hash().
Note that sunrpc code *does* call hash_mem() with a zero length, which
is why the previous patch needed to handle that in full_name_hash().
(Thanks, Bruce, for finding that!)
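A rough sketch of what those wrappers could look like, with hashlen_hash() extracting the 32-bit hash from the hash-length pair; the exact narrowing to 'bits' in the real patch may differ:
    static inline unsigned long hash_str(char const *name, int bits)
    {
            /* full 32-bit string hash, narrowed to the requested width */
            return hashlen_hash(hashlen_string(name)) >> (32 - bits);
    }

    static inline unsigned long hash_mem(char const *buf, int length, int bits)
    {
            /* handles length == 0, as noted above */
            return full_name_hash(buf, length) >> (32 - bits);
    }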
This also eliminates the only caller of hash_long which asks for
more than 32 bits of output.
The comment about the quality of hashlen_string() and full_name_hash()
is jumping the gun by a few patches; they aren't very impressive now,
but will be improved greatly later in the series.
Signed-off-by: George Spelvin <linux@sciencehorizons.net> Tested-by: J. Bruce Fields <bfields@redhat.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: linux-nfs@vger.kernel.org
George Spelvin [Fri, 20 May 2016 12:41:37 +0000 (08:41 -0400)]
fs/namei.c: Add hashlen_string() function
We'd like to make more use of the highly-optimized dcache hash functions
throughout the kernel, rather than have every subsystem create its own,
and a function that hashes basic null-terminated strings is required
for that.
(The name is to emphasize that it returns both hash and length.)
It's actually useful in the dcache itself, specifically d_alloc_name().
Other uses in the next patch.
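For illustration, the return value packs both pieces the way the existing hashlen_* helpers in the dcache do; this is a sketch of the calling convention, not new API:
    static inline u32 example_name_hash(const char *name, u32 *lenp)
    {
            /* hashlen_string() returns the length (high 32 bits) and the
             * hash (low 32 bits) in a single u64 */
            u64 hashlen = hashlen_string(name);

            *lenp = hashlen_len(hashlen);
            return hashlen_hash(hashlen);
    }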
full_name_hash() is also tweaked to make it more generally useful:
1) Take a "char *" rather than "unsigned char *" argument, to
be consistent with hash_name().
2) Handle zero-length inputs. If we want more callers, we don't want
to make them worry about corner cases.
Signed-off-by: George Spelvin <linux@sciencehorizons.net>
Linus Torvalds [Sat, 28 May 2016 19:38:50 +0000 (12:38 -0700)]
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"A fix for a regression introduced yesterday.
The regression didn't show up here locally because I did not have
PAGE_POISONING enabled. And buildbots discovered this only after it
hit your tree. Thanks to Dan for the quick response"
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: dev: use after free in detach
Linus Torvalds [Sat, 28 May 2016 19:32:01 +0000 (12:32 -0700)]
Merge tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform
Pull chrome platform updates from Olof Johansson:
"A handful of Chrome driver and binding changes this merge window:
- a few patches to fix probing and configuration of pstore
- a few patches adding Elan touchpad registration on a few devices
- EC changes: a security fix dealing with max message sizes and
addition of compat_ioctl support.
- keyboard backlight control support
There was also an accidental duplicate registration of trackpads on
'Leon', which was reverted just recently"
* tag 'chrome-platform' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
Revert "platform/chrome: chromeos_laptop: Add Leon Touch"
platform/chrome: chromeos_laptop - Add Elan touchpad for Wolf
platform/chrome: chromeos_laptop - Add elan trackpad option for C720
platform/chrome: cros_ec_dev - Populate compat_ioctl
platform/chrome: cros_ec_lightbar - use name instead of ID to hide lightbar attributes
platform/chrome: cros_ec_dev - Fix security issue
platform/chrome: Add Chrome OS keyboard backlight LEDs support
platform/chrome: use to_platform_device()
platform/chrome: pstore: Move to larger record size.
platform/chrome: pstore: probe for ramoops buffer using acpi
platform/chrome: chromeos_laptop: Add Leon Touch
Linus Torvalds [Sat, 28 May 2016 19:23:12 +0000 (12:23 -0700)]
Merge tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull more sound updates from Takashi Iwai:
"This is the second update round for 4.7-rc1. Most of changes are
about the pending ASoC updates and fixes, including a few new drivers.
Below are some highlights:
ASoC:
- New drivers for MAX98371 and TAS5720
- SPI support for TLV320AIC32x4, along with the module split
- TDM support for STI Uniperf IPs
- Remaining topology API fixes / updates
HDA:
- A couple of Dell quirks and new Realtek codec support"
* tag 'sound-4.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (63 commits)
ALSA: hda - Fix headset mic detection problem for one Dell machine
spi: spi-ep93xx: Fix the PTR_ERR() argument
ALSA: hda/realtek - Add support for ALC295/ALC3254
ASoC: kirkwood: fix build failure
ALSA: hda - Fix headphone noise on Dell XPS 13 9360
ASoC: ak4642: Enable cache usage to fix crashes on resume
ASoC: twl6040: Disconnect AUX output pads on digital mute
ASoC: tlv320aic32x4: Properly implement the positive and negative pins into the mixers
rcar: src: skip disabled-SRC nodes
ASoC: max98371 Remove duplicate entry in max98371_reg
ASoC: twl6040: Select LPPLL during standby
ASoC: rsnd: don't use prohibited number to PDMACHCRn.SRS
ASoC: simple-card: Add pm callbacks to platform driver
ASoC: pxa: Fix module autoload for platform drivers
ASoC: topology: Fix memory leak in widget creation
ASoC: Add max98371 codec driver
ASoC: rsnd: count .probe/.remove for rsnd_mod_call()
ASoC: topology: Check size mismatch of ABI objects before parsing
ASoC: topology: Check failure to create a widget
ASoC: add support for TAS5720 digital amplifier
...
Linus Torvalds [Sat, 28 May 2016 19:04:17 +0000 (12:04 -0700)]
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
"Here are the outstanding target pending updates for v4.7-rc1.
The highlights this round include:
- Allow external PR/ALUA metadata path to be defined at runtime via top
level configfs attribute (Lee)
- Fix target session shutdown bug for ib_srpt multi-channel (hch)
- Make TFO close_session() and shutdown_session() optional (hch)
- Drop se_sess->sess_kref + convert tcm_qla2xxx to internal kref
(hch)
- Add tcm_qla2xxx endpoint attribute for basic FC jammer (Laurence)
- Refactor iscsi-target RX/TX PDU encode/decode into common code
(Varun)
- Extend iscsit_transport with xmit_pdu, release_cmd, get_rx_pdu,
validate_parameters, and get_r2t_ttt for generic ISO offload
(Varun)
- Initial merge of cxgb iscsi-segment offload target driver (Varun)
The bulk of the changes are Chelsio's new driver, along with a number
of iscsi-target common code improvements made by Varun + Co along the
way"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (29 commits)
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
cxgbit: Use type ISCSI_CXGBIT + cxgbit tpg_np attribute
iscsi-target: Convert transport drivers to signal rdma_shutdown
iscsi-target: Make iscsi_tpg_np driver show/store use generic code
tcm_qla2xxx Add SCSI command jammer/discard capability
iscsi-target: graceful disconnect on invalid mapping to iovec
target: need_to_release is always false, remove redundant check and kfree
target: remove sess_kref and ->shutdown_session
iscsi-target: remove usage of ->shutdown_session
tcm_qla2xxx: introduce a private sess_kref
target: make close_session optional
target: make ->shutdown_session optional
target: remove acl_stop
target: consolidate and fix session shutdown
cxgbit: add files for cxgbit.ko
iscsi-target: export symbols
iscsi-target: call complete on conn_logout_comp
iscsi-target: clear tx_thread_active
iscsi-target: add new offload transport type
iscsi-target: use conn_transport->transport_type in text rsp
...
Linus Torvalds [Sat, 28 May 2016 18:04:16 +0000 (11:04 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull more rdma updates from Doug Ledford:
"This is the second group of code for the 4.7 merge window. It looks
large, but only in one sense. I'll get to that in a minute. The list
of changes here breaks down as follows:
- Dynamic counter infrastructure in the IB drivers
This is sysfs-based code to allow free-form access to the
hardware counters RDMA devices might support, so drivers don't need
to code this up repeatedly themselves
- SendOnlyFullMember multicast support
- IB router support
- A couple misc fixes
- The big item on the list: hfi1 driver updates, plus moving the hfi1
driver out of staging
There was a group of 15 patches in the hfi1 list that I thought were in
the first pull request, but they weren't. So that added to the length
of the hfi1 section here.
As far as these go, everything but the hfi1 is pretty straightforward.
The hfi1 driver is, if you recall, the one Al complained about for the
way it overloaded the write/writev interfaces. The write portion of its
interface behaved like the write handler in the
IB stack proper and did bi-directional communications. The writev
interface, on the other hand, only accepts SDMA request structures.
The completions for those structures are sent back via an entirely
different event mechanism.
With the security patch, we put security checks on the write
interface; however, we also knew they would be going away soon. Now,
we've converted the write handler in the hfi1 driver to use ioctls
from the IB reserved magic area for its bidirectional communications.
With that change, Intel has addressed all of the items originally on
their TODO when they went into staging (as well as many items added to
the list later).
As such, I moved them out, and since they were the last item in the
staging/rdma directory, and I don't have immediate plans to use the
staging area again, I removed the staging/rdma area.
Because of the move out of staging, as well as a series of 5 patches
in the hfi1 driver that removed code people thought should be done in
a different way and was optional to begin with (a snoop debug
interface, an eeprom driver for an eeprom connected directly to their
hfi1 chip and not via an i2c bus, and a few other things like that),
the line count, especially the removal count, is high"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (56 commits)
staging/rdma: Remove the entire rdma subdirectory of staging
IB/core: Make device counter infrastructure dynamic
IB/hfi1: Fix pio map initialization
IB/hfi1: Correct 8051 link parameter settings
IB/hfi1: Update pkey table properly after link down or FM start
IB/rdamvt: Fix rdmavt s_ack_queue sizing
IB/rdmavt: Max atomic value should be a u8
IB/hfi1: Fix hard lockup due to not using save/restore spin lock
IB/hfi1: Add tracing support for send with invalidate opcode
IB/hfi1, qib: Add ieth to the packet header definitions
IB/hfi1: Move driver out of staging
IB/hfi1: Do not free hfi1 cdev parent structure early
IB/hfi1: Add trace message in user IOCTL handling
IB/hfi1: Remove write(), use ioctl() for user cmds
IB/hfi1: Add ioctl() interface for user commands
IB/hfi1: Remove unused user command
IB/hfi1: Remove snoop/diag interface
IB/hfi1: Remove EPROM functionality from data device
IB/hfi1: Remove UI char device
IB/hfi1: Remove multiple device cdev
...
Board "Leon" is otherwise known as "Toshiba CB35" and we already have
the entry that supports that board as of this commit : 963cb6f platform/chrome: chromeos_laptop - Add Toshiba CB35 Touch
Remove this duplicate.
Signed-off-by: Benson Leung <bleung@chromium.org> Signed-off-by: Olof Johansson <olof@lixom.net>
Dan Carpenter [Sat, 28 May 2016 05:01:46 +0000 (08:01 +0300)]
i2c: dev: use after free in detach
The call to put_i2c_dev() frees "i2c_dev" so there is a use after
free when we call cdev_del(&i2c_dev->cdev).
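A minimal sketch of the reordering that avoids this (not necessarily the exact hunk applied):
    /* finish using i2c_dev before dropping the reference that frees it */
    cdev_del(&i2c_dev->cdev);
    put_i2c_dev(i2c_dev);   /* may free i2c_dev; nothing touches it afterwards */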
Fixes: d6760b14d4a1 ('i2c: dev: switch from register_chrdev to cdev API') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
The corresponding FROZEN hotplug notifier transitions used on
suspend/resume are ignored. Therefore the switch case action argument
is masked with the frozen hotplug notifier transition mask.
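The usual shape of that masking in a CPU hotplug notifier is sketched below, assuming CPU_TASKS_FROZEN is the mask in question; the callback name is illustrative:
    static int example_cpu_callback(struct notifier_block *nb,
                                    unsigned long action, void *hcpu)
    {
            /* strip the FROZEN bit so the suspend/resume variants fall
             * into the same cases as the normal transitions */
            switch (action & ~CPU_TASKS_FROZEN) {
            case CPU_ONLINE:
                    /* ... */
                    break;
            case CPU_DEAD:
                    /* ... */
                    break;
            }
            return NOTIFY_OK;
    }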
James Hogan [Tue, 24 May 2016 08:35:11 +0000 (09:35 +0100)]
MIPS: Build microMIPS VDSO for microMIPS kernels
MicroMIPS kernels may be expected to run on microMIPS-only cores which
don't support the normal MIPS instruction set, so be sure to pass the
-mmicromips flag through to the VDSO cflags.
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/13349/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
James Hogan [Tue, 24 May 2016 08:35:10 +0000 (09:35 +0100)]
MIPS: Fix sigreturn via VDSO on microMIPS kernel
In microMIPS kernels, handle_signal() sets the isa16 mode bit in the
vdso address so that the sigreturn trampolines (which are offset from
the VDSO) get executed as microMIPS.
However commit ebb5e78cc634 ("MIPS: Initial implementation of a VDSO")
changed the offsets to come from the VDSO image, which already have the
isa16 mode bit set correctly since they're extracted from the VDSO
shared library symbol table.
Drop the isa16 mode bit handling from handle_signal() to fix sigreturn
for cores which support both microMIPS and normal MIPS. This doesn't fix
microMIPS-only cores, since the VDSO is still built for normal MIPS, but
that's a separate problem.
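For context, MIPS selects the 16-bit ISA on an indirect jump via bit 0 of the target address; the now-redundant adjustment removed here looked roughly like the sketch below (variable names are illustrative, not from the patch):
    unsigned long sigreturn = vdso_base + vdso_sigreturn_offset;

    #ifdef CONFIG_CPU_MICROMIPS
    /* redundant: the offset taken from the VDSO symbol table already
     * carries the ISA mode bit */
    sigreturn |= 1;
    #endif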
Fixes: ebb5e78cc634 ("MIPS: Initial implementation of a VDSO") Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 4.4.x-
Patchwork: https://patchwork.linux-mips.org/patch/13348/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>