The problem is almost exactly the same as the one fixed by feaff8e5b2cf.
We must take a reference to the struct file when running the LOCKU
compound to prevent the final fput from running until the operation is
complete.
Reported-by: Jean Spector <jean@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NeilBrown [Mon, 29 Jun 2015 04:28:54 +0000 (14:28 +1000)]
NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
It is possible to have an active open with one mode, and a delegation
for the same file with a different mode.
In particular, a WR_ONLY open and an RD_ONLY delegation.
This happens if a WR_ONLY open is followed by a RD_ONLY open which
provides a delegation, but is then close.
When returning the delegation, we currently try to claim opens for
every open type (n_rdwr, n_rdonly, n_wronly). As there is no harm
in claiming an open for a mode that we already have, this is often
simplest.
However if the delegation only provides a subset of the modes that we
currently have open, this will produce an error from the server.
So when claiming open modes prior to returning a delegation, skip the
open request if the mode is not covered by the delegation - the open_stateid
must already cover that mode, so there is nothing to do.
Trond Myklebust [Fri, 26 Jun 2015 18:51:32 +0000 (14:51 -0400)]
pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
This patch ensures that we record the value of 'ffl_flags' from
the layout, and then checks for the presence of the
FF_FLAGS_NO_LAYOUTCOMMIT flag before deciding whether or not to
call pnfs_set_layoutcommit().
The effect is that servers now can decide whether or not they want
the client to call layoutcommit before returning a writeable layout.
Trond Myklebust [Fri, 26 Jun 2015 18:01:59 +0000 (14:01 -0400)]
Merge branch 'layoutstats'
* layoutstats:
pnfs/flexfiles: protect ktime manipulation with mirror lock
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
pnfs/flexfiles: report layoutstat regularly
nfs42: serialize LAYOUTSTATS calls of the same file
pnfs/flexfiles: encode LAYOUTSTATS flexfiles specific data
pnfs/flexfiles: add ff_layout_prepare_layoutstats
pNFS/flexfiles: track when layout is first used
pNFS/flexfiles: add layoutstats tracking
pNFS/flexfiles: Remove unused struct members user_name, group_name
pnfs: add pnfs_report_layoutstat helper function
pNFS: fill in nfs42_layoutstat_ops
NFSv.2/pnfs Add a LAYOUTSTATS rpc function
Peng Tao [Thu, 25 Jun 2015 10:19:32 +0000 (18:19 +0800)]
nfs: provide pnfs_report_layoutstat when NFS42 is disabled
kbuild test robot reported:
fs/built-in.o: In function `pnfs_report_layoutstat':
>> (.text+0x151a1c): undefined reference to `nfs42_proc_layoutstats_generic'
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Commit 9597c13b forbade opens with O_APPEND|O_DIRECT for NFSv4:
nfs: verify open flags before allowing an atomic open
Currently, you can open a NFSv4 file with O_APPEND|O_DIRECT, but cannot
fcntl(F_SETFL,...) with those flags. This flag combination is explicitly
forbidden on NFSv3 opens, and it seems like it should also be on NFSv4.
However, you can still open a file with O_DIRECT|O_APPEND if there exists a
cached dentry for the file because nfs4_file_open() is used instead of
nfs_atomic_open() and the check is bypassed. Add the check in
nfs4_file_open() as well.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 24 Jun 2015 16:10:24 +0000 (12:10 -0400)]
nfs: always update creds in mirror, even when we have an already connected ds
A ds can be associated with more than one mirror, but we currently skip
setting a mirror's credentials if we find that it's already set up with
a connected client.
The upshot is that we can end up sending DS writes with MDS credentials
instead of properly setting them up. Fix nfs4_ff_layout_prepare_ds to
always verify that the mirror's credentials are set up, even when we
have a DS that's already connected.
Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 24 Jun 2015 16:10:23 +0000 (12:10 -0400)]
nfs: fix potential credential leak in ff_layout_update_mirror_cred
If we have two tasks racing to update a mirror's credentials, then they
can end up leaking one (or more) sets of credentials. The first task
will set mirror->cred and then the second task will just overwrite it.
Use a cmpxchg to ensure that the creds are only set once. If we get to
the point where we would set mirror->cred and find that they're already
set, then we just release the creds that were just found.
Peng Tao [Tue, 23 Jun 2015 11:52:01 +0000 (19:52 +0800)]
pnfs/flexfiles: add ff_layout_prepare_layoutstats
It fills in the generic part of LAYOUTSTATS call. One thing to note
is that we don't really track if IO is continuous or not. So just fake
to use the completed bytes for it.
Still missing flexfiles specific part, which will be included in the next patch.
Reviewed-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Mon, 22 Jun 2015 13:55:08 +0000 (09:55 -0400)]
Merge branch 'bugfixes'
* bugfixes:
NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes
pNFS: Fix a memory leak when attempted pnfs fails
NFS: Ensure that we update the sequence id under the slot table lock
nfs: Initialize cb_sequenceres information before validate_seqid()
nfs: Only update callback sequnce id when CB_SEQUENCE success
NFSv4: nfs4_handle_delegation_recall_error should ignore EAGAIN
Trond Myklebust [Sat, 20 Jun 2015 19:31:54 +0000 (15:31 -0400)]
SUNRPC: Set the TCP user timeout option on client sockets
Use the TCP_USER_TIMEOUT socket option to advertise to the server
how long we will keep the connection open if there is unacknowledged
data. See RFC5482.
Trond Myklebust [Fri, 19 Jun 2015 20:17:57 +0000 (16:17 -0400)]
SUNRPC: Ensure we release the TCP socket once it has been closed
This fixes a regression introduced by commit caf4ccd4e88cf2 ("SUNRPC:
Make xs_tcp_close() do a socket shutdown rather than a sock_release").
Prior to that commit, the autoclose feature would ensure that an
idle connection would result in the socket being both disconnected and
released, whereas now only gets disconnected.
While the current behaviour is harmless, it does leave the port bound
until either RPC traffic resumes or the RPC client is shut down.
Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Trond Myklebust [Fri, 19 Jun 2015 17:04:13 +0000 (13:04 -0400)]
SUNRPC: Handle connection issues correctly on the back channel
If the back channel is disconnected, we can and should just fail the
transmission. The expectation is that the NFSv4.1 server will always
retransmit any outstanding callbacks once the connection is
re-established.
Trond Myklebust [Wed, 17 Jun 2015 23:56:22 +0000 (19:56 -0400)]
NFS: Ensure we set NFS_CONTEXT_RESEND_WRITES when requeuing writes
If a write attempt fails, and the write is queued up for resending to
the server, as opposed to being dropped, then we need to set the
appropriate flag so that nfs_file_fsync() does the right thing.
Trond Myklebust [Wed, 17 Jun 2015 23:41:51 +0000 (19:41 -0400)]
pNFS: Fix a memory leak when attempted pnfs fails
pnfs_do_write() expects the call to pnfs_write_through_mds() to free the
pgio header and to release the layout segment before exiting. The problem
is that nfs_pgio_data_destroy() doesn't actually do this; it only frees
the memory allocated by nfs_generic_pgio().
Ditto for pnfs_do_read()...
Fix in both cases is to add a call to hdr->release(hdr).
Trond Myklebust [Tue, 16 Jun 2015 15:36:47 +0000 (11:36 -0400)]
Merge tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma
NFS: NFSoRDMA Client Changes
These patches continue to build up for improving the rsize and wsize that the
NFS client uses when talking over RDMA. In addition, these patches also add
in scalability enhancements and other bugfixes.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* tag 'nfs-rdma-for-4.2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: (142 commits)
xprtrdma: Reduce per-transport MR allocation
xprtrdma: Stack relief in fmr_op_map()
xprtrdma: Split rb_lock
xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy
xprtrdma: Remove ->ro_reset
xprtrdma: Remove unused LOCAL_INV recovery logic
xprtrdma: Acquire MRs in rpcrdma_register_external()
xprtrdma: Introduce an FRMR recovery workqueue
xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
xprtrdma: Introduce helpers for allocating MWs
xprtrdma: Use ib_device pointer safely
xprtrdma: Remove rr_func
xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt
xprtrdma: Warn when there are orphaned IB objects
...
Client can receives stateid-type error (eg., BAD_STATEID) on SETATTR when
delegation stateid was used. When no open state exists, in case of application
calling truncate() on the file, client has no state to recover and fails with
EIO.
Instead, upon such error, return the bad delegation and then resend the
SETATTR with a zero stateid.
Signed-off: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Tue, 9 Jun 2015 23:44:00 +0000 (19:44 -0400)]
nfs: make nfs4_init_uniform_client_string use a dynamically allocated buffer
Change the uniform client string generator to dynamically allocate the
NFSv4 client name string buffer. With this patch, we can eliminate the
buffers that are embedded within the "args" structs and simply use the
name string that is hanging off the client.
This uniform string case is a little simpler than the nonuniform since
we don't need to deal with RCU, but we do have two different cases,
depending on whether there is a uniquifier or not.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Tue, 9 Jun 2015 23:43:59 +0000 (19:43 -0400)]
nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer
The way the *_client_string functions work is a little goofy. They build
the string in an on-stack buffer and then use kstrdup to copy it. This
is not only stack-heavy but artificially limits the size of the client
name string. Change it so that we determine the length of the string,
allocate it and then scnprintf into it.
Since the contents of the nonuniform string depend on rcu-managed data
structures, it's possible that they'll change between when we allocate
the string and when we go to fill it. If that happens, free the string,
recalculate the length and try again. If it the mismatch isn't resolved
on the second try then just give up and return -EINVAL.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Tue, 9 Jun 2015 23:43:58 +0000 (19:43 -0400)]
nfs: update maxsz values for SETCLIENTID and EXCHANGE_ID
The spec allows for up to NFS4_OPAQUE_LIMIT (1k). While we'll almost
certainly never use that much, these ops are generally the only ones
in the compound so we might as well allow for them to be that large.
Also, the existing code didn't add in a word for the opaque length
field for either name string. Fix that while we're in there.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Tue, 9 Jun 2015 23:43:57 +0000 (19:43 -0400)]
nfs: convert setclientid and exchange_id encoders to use clp->cl_owner_id
...instead of buffers that are part of their arg structs. We already
hold a reference to the client, so we might as well use the allocated
buffer. In the event that we can't allocate the clp->cl_owner_id, then
just return -ENOMEM.
Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS.
It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID
in order to reclaim some memory, and the GFP_KERNEL allocations in the
existing code could cause recursion back into NFS reclaim.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Neil Brown [Mon, 15 Jun 2015 05:55:30 +0000 (15:55 +1000)]
SUNRPC: never enqueue a ->rq_cong request on ->sending
If the sending queue has a task without ->rq_cong set at the front,
and then a number of tasks with ->rq_cong set such that they use
the entire congestion window, then the queue deadlocks. The first
entry cannot be processed until later entries complete.
This scenario has been seen with a client using UDP to access a server,
and the network connection breaking for a period of time - it doesn't
recover.
It never really makes sense for an ->rq_cong request to be on the ->sending
queue, but it can happen when a request is being retried, and finds
the transport if locked (XPRT_LOCKED). In this case we simple call
__xprt_put_cong() and the deadlock goes away.
Chuck Lever [Tue, 26 May 2015 15:53:13 +0000 (11:53 -0400)]
xprtrdma: Split rb_lock
/proc/lock_stat showed contention between rpcrdma_buffer_get/put
and the MR allocation functions during I/O intensive workloads.
Now that MRs are no longer allocated in rpcrdma_buffer_get(),
there's no reason the rb_mws list has to be managed using the
same lock as the send/receive buffers. Split that lock. The
new lock does not need to disable interrupts because buffer
get/put is never called in an interrupt context.
struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws
are always in a different cacheline than rb_lock and the buffer
pointers.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:52:54 +0000 (11:52 -0400)]
xprtrdma: Remove ->ro_reset
An RPC can exit at any time. When it does so, xprt_rdma_free() is
called, and it calls ->op_unmap().
If ->ro_reset() is running due to a transport disconnect, the two
methods can race while processing the same rpcrdma_mw. The results
are unpredictable.
Because of this, in previous patches I've altered ->ro_map() to
handle MR reset. ->ro_reset() is no longer needed and can be
removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:52:35 +0000 (11:52 -0400)]
xprtrdma: Acquire MRs in rpcrdma_register_external()
Acquiring 64 MRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because most modern adapters
can transfer 100s of KBs of payload using just a single MR.
Instead, acquire MRs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.
Note: commit 539431a437d2 ("xprtrdma: Don't invalidate FRMRs if
registration fails") is reverted: There is now a valid case where
registration can fail (with -ENOMEM) but the QP is still in RTS.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:52:25 +0000 (11:52 -0400)]
xprtrdma: Introduce an FRMR recovery workqueue
After a transport disconnect, FRMRs can be left in an undetermined
state. In particular, the MR's rkey is no good.
Currently, FRMRs are fixed up by the transport connect worker, but
that can race with ->ro_unmap if an RPC happens to exit while the
transport connect worker is running.
A better way of dealing with broken FRMRs is to detect them before
they are re-used by ->ro_map. Such FRMRs are either already invalid
or are owned by the sending RPC, and thus no race with ->ro_unmap
is possible.
Introduce a mechanism for handing broken FRMRs to a workqueue to be
reset in a context that is appropriate for allocating resources
(ie. an ib_alloc_fast_reg_mr() API call).
This mechanism is not yet used, but will be in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:52:16 +0000 (11:52 -0400)]
xprtrdma: Acquire FMRs in rpcrdma_fmr_register_external()
Acquiring 64 FMRs in rpcrdma_buffer_get() while holding the buffer
pool lock is expensive, and unnecessary because FMR mode can
transfer up to a 1MB payload using just a single ib_fmr.
Instead, acquire ib_fmrs one-at-a-time as chunks are registered, and
return them to rb_mws immediately during deregistration.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:51:56 +0000 (11:51 -0400)]
xprtrdma: Use ib_device pointer safely
The connect worker can replace ri_id, but prevents ri_id->device
from changing during the lifetime of a transport instance. The old
ID is kept around until a new ID is created and the ->device is
confirmed to be the same.
Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
The cached copy can be used safely in code that does not serialize
with the connect worker.
Other code can use it to save an extra address generation (one
pointer dereference instead of two).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Tue, 26 May 2015 15:51:27 +0000 (11:51 -0400)]
xprtrdma: Warn when there are orphaned IB objects
WARN during transport destruction if ib_dealloc_pd() fails. This is
a sign that xprtrdma orphaned one or more RDMA API objects at some
point, which can pin lower layer kernel modules and cause shutdown
to hang.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-By: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Fri, 12 Jun 2015 01:13:58 +0000 (21:13 -0400)]
NFS: Ensure that we update the sequence id under the slot table lock
Fix a callback slot table regression.
Fixes: e937ee714b2d ("nfs: Only update callback sequnce id when CB_SEQUENCE success") Cc: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Thu, 4 Jun 2015 22:40:13 +0000 (18:40 -0400)]
nfs: deny backchannel RPCs with an incorrect authflavor instead of dropping them
A drop should really only be done when the frame is malformed or we have
reason to think that there is some sort of DoS going on. When we get an
RPC with bad auth, we should send back an error instead.
Cc: Andy Adamson <William.Adamson@netapp.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Chuck Lever [Thu, 11 Jun 2015 17:47:10 +0000 (13:47 -0400)]
SUNRPC: Address kbuild warning in net/sunrpc/debugfs.c
Cross-compile test on ARCH=mn10300:
In file included from include/linux/list.h:8:0,
from include/linux/wait.h:6,
from include/linux/fs.h:6,
from include/linux/debugfs.h:18,
from net/sunrpc/debugfs.c:7:
net/sunrpc/debugfs.c: In function 'fault_disconnect_write':
include/linux/kernel.h:723:17: warning: comparison of distinct pointer
types lacks a cast
(void) (&_min1 == &_min2); \
^
>> net/sunrpc/debugfs.c:307:8: note: in expansion of macro 'min'
len = min(len, sizeof(buffer) - 1);
Fixes: ('SUNRPC: Transport fault injection') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Kinglong Mee [Tue, 2 Jun 2015 10:58:46 +0000 (18:58 +0800)]
nfs: Only update callback sequnce id when CB_SEQUENCE success
When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED.
It is caused by nfs return NFS4ERR_DELAY before validate_seqid(),
don't update the sequnce id, but nfsd updates the sequnce id !!!
According to RFC5661 20.9.3,
" If CB_SEQUENCE returns an error, then the state of the slot
(sequence ID, cached reply) MUST NOT change. "
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Vaishali Thakkar [Wed, 10 Jun 2015 14:45:29 +0000 (20:15 +0530)]
NFS: Convert use of __constant_htonl to htonl
In little endian cases, the macro htonl unfolds to __swab32 which
provides special case for constants. In big endian cases,
__constant_htonl and htonl expand directly to the same expression.
So, replace __constant_htonl with htonl with the goal of getting
rid of the definition of __constant_htonl completely.
The semantic patch that performs this transformation is as follows:
Chuck Lever [Mon, 11 May 2015 18:02:25 +0000 (14:02 -0400)]
SUNRPC: Transport fault injection
It has been exceptionally useful to exercise the logic that handles
local immediate errors and RDMA connection loss. To enable
developers to test this regularly and repeatably, add logic to
simulate connection loss every so often.
Fault injection is disabled by default. It is enabled with
where "xxx" is a large positive number of transport method calls
before a disconnect. A value of several thousand is usually a good
number that allows reasonable forward progress while still causing a
lot of connection drops.
These hooks are disabled when SUNRPC_DEBUG is turned off.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Anna Schumaker [Wed, 10 Jun 2015 20:54:28 +0000 (16:54 -0400)]
NFS: Remove unused nfs_rw_ops->rw_release() function
This was only ever set to nfs_writeback_release_common(), a function
which is completely empty. Let's just drop this function pointer and
simplify the code a bit.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:29 +0000 (16:14 -0400)]
sunrpc: turn swapper_enable/disable functions into rpc_xprt_ops
RDMA xprts don't have a sock_xprt, but an rdma_xprt, so the
xs_swapper_enable/disable functions will likely oops when fed an RDMA
xprt. Turn these functions into rpc_xprt_ops so that that doesn't
occur. For now the RDMA versions are no-ops that just return -EINVAL
on an attempt to swapon.
Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:28 +0000 (16:14 -0400)]
sunrpc: lock xprt before trying to set memalloc on the sockets
It's possible that we could race with a call to xs_reset_transport, in
which case the xprt->inet pointer could be zeroed out while we're
accessing it. Lock the xprt before we try to set memalloc on it.
Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:27 +0000 (16:14 -0400)]
sunrpc: if we're closing down a socket, clear memalloc on it first
We currently increment the memalloc_socks counter if we have a xprt that
is associated with a swapfile. That socket can be replaced however
during a reconnect event, and the memalloc_socks counter is never
decremented if that occurs.
When tearing down a xprt socket, check to see if the xprt is set up for
swapping and sk_clear_memalloc before releasing the socket if so.
Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:26 +0000 (16:14 -0400)]
sunrpc: make xprt->swapper an atomic_t
Split xs_swapper into enable/disable functions and eliminate the
"enable" flag.
Currently, it's racy if you have multiple swapon/swapoff operations
running in parallel over the same xprt. Also fix it so that we only
set it to a memalloc socket on a 0->1 transition and only clear it
on a 1->0 transition.
Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Jeff Layton [Wed, 3 Jun 2015 20:14:25 +0000 (16:14 -0400)]
sunrpc: keep a count of swapfiles associated with the rpc_clnt
Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.
To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.
Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.
Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.
This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.
Acked-by: Mel Gorman <mgorman@suse.de> Reported-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Linus Torvalds [Sun, 7 Jun 2015 23:56:10 +0000 (16:56 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"Eight fixes across arch/mips. Nothing stands particuarly out nor is
complicated but fixes keep coming in at a higher than comfortable
rate"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: KVM: Do not sign extend on unsigned MMIO load
MIPS: BPF: Fix stack pointer allocation
MIPS: Loongson-3: Fix a cpu-hotplug issue in loongson3_ipi_interrupt()
MIPS: Fix enabling of DEBUG_STACKOVERFLOW
MIPS: c-r4k: Fix typo in probe_scache()
MIPS: Avoid an FPE exception in FCSR mask probing
MIPS: ath79: Add a missing new line in log message
MIPS: ralink: Fix clearing the illegal access interrupt
Linus Torvalds [Sun, 7 Jun 2015 05:37:45 +0000 (22:37 -0700)]
Merge tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two fixes for the driver core that resolve some reported
issues.
One is a regression from 4.0, the other a fixes a reported oops that
has been there since 3.19.
Both have been in linux-next for a while with no problems"
* tag 'driver-core-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers/base: cacheinfo: handle absence of caches
drivers: of/base: move of_init to driver_init
Linus Torvalds [Sun, 7 Jun 2015 05:33:08 +0000 (22:33 -0700)]
Merge tag 'staging-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging / IIO fixes from Greg KH:
"Here are some IIO driver fixes to resolve reported issues, some ozwpan
fixes for some reported CVE problems, and a rtl8712 driver fix for a
reported regression.
All have been in linux-next successfully"
* tag 'staging-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8712: fix stack dump
ozwpan: unchecked signed subtraction leads to DoS
ozwpan: divide-by-zero leading to panic
ozwpan: Use unsigned ints to prevent heap overflow
ozwpan: Use proper check to prevent heap overflow
iio: adc: twl6030-gpadc: Fix modalias
iio: adis16400: Fix burst transfer for adis16448
iio: adis16400: Fix burst mode
iio: adis16400: Compute the scan mask from channel indices
iio: adis16400: Use != channel indices for the two voltage channels
iio: adis16400: Report pressure channel scale
Linus Torvalds [Sun, 7 Jun 2015 05:14:23 +0000 (22:14 -0700)]
Merge tag 'tty-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are a few TTY and Serial driver fixes for reported regressions
and crashes.
All of these have been in linux-next with no reported problems"
* tag 'tty-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
n_tty: Fix auditing support for cannonical mode
serial: 8250_omap: provide complete custom startup & shutdown callbacks
n_tty: Fix calculation of size in canon_copy_from_read_buf
serial: imx: Fix DMA handling for IDLE condition aborts
serial/amba-pl011: Unconditionally poll for FIFO space before each TX char
Linus Torvalds [Sun, 7 Jun 2015 05:06:53 +0000 (22:06 -0700)]
Merge tag 'usb-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB and PHY driver fixes from Greg KH:
"Here are some USB and PHY driver fixes that resolve some reported
regressions. Also in here are some new device ids.
All of the details are in the shortlog and these patches have been in
linux-next with no problems"
* tag 'usb-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
USB: cp210x: add ID for HubZ dual ZigBee and Z-Wave dongle
usb: renesas_usbhs: Don't disable the pipe if Control write status stage
usb: renesas_usbhs: Fix fifo unclear in usbhsf_prepare_pop
usb: gadget: f_fs: fix check in read operation
usb: musb: fix order of conditions for assigning end point operations
usb: gadget: f_uac1: check return code from config_ep_by_speed
usb: gadget: ffs: fix: Always call ffs_closed() in ffs_data_clear()
usb: gadget: g_ffs: Fix counting of missing_functions
usb: s3c2410_udc: correct reversed pullup logic
usb: dwc3: gadget: Fix incorrect DEPCMD and DGCMD status macros
usb: phy: tahvo: Pass the IRQF_ONESHOT flag
usb: phy: ab8500-usb: Pass the IRQF_ONESHOT flag
usb: renesas_usbhs: Revise the binding document about the dma-names
usb: host: xhci: add mutex for non-thread-safe data
usb: make module xhci_hcd removable
USB: serial: ftdi_sio: Add support for a Motion Tracker Development Board
usb: gadget: f_midi: fix segfault when reading empty id
phy: phy-rcar-gen2: Fix USBHS_UGSTS_LOCK value
phy: omap-usb2: invoke pm_runtime_disable on error path
phy: fix Kconfig dependencies
...
Linus Torvalds [Sun, 7 Jun 2015 04:43:29 +0000 (21:43 -0700)]
Merge tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux
Pull devicetree fix from Grant Likely:
"Stupid typo fix for v4.1. One of the IS_ENABLED() macro calls forgot
the CONFIG_ prefix. Only affects a tiny number of platforms, but
still..."
* tag 'devicetree-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/glikely/linux:
of/dynamic: Fix test for PPC_PSERIES
Linus Torvalds [Sat, 6 Jun 2015 16:15:14 +0000 (09:15 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"i915 has a bunch of fixes, and Russell found a bug in sysfs writing
handling that results in userspace getting stuck"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm: fix writing to /sys/class/drm/*/status
drm/i915: Move WaBarrierPerformanceFixDisable:skl to skl code from chv code
drm/i915: Include G4X/VLV/CHV in self refresh status
drm/i915: Initialize HWS page address after GPU reset
drm/i915: Don't skip request retirement if the active list is empty
drm/i915/hsw: Fix workaround for server AUX channel clock divisor
Linus Torvalds [Sat, 6 Jun 2015 16:08:23 +0000 (09:08 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Just a couple touchpad drivers fixups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: alps - do not reduce trackpoint speed by half
Input: elantech - add new icbody type
Input: elantech - fix detection of touchpads where the revision matches a known rate
Markos Chandras [Thu, 4 Jun 2015 10:56:13 +0000 (11:56 +0100)]
MIPS: BPF: Fix stack pointer allocation
Fix stack pointer offset which could potentially corrupt
argument registers in the previous frame. The calculated offset
reflects the size of all the registers we need to preserve so there
is no need for this erroneous subtraction.
[ralf@linux-mips.org: Fixed conflict due to only applying this fix part
of the entire series as part of 4.1 fixes.]
Huacai Chen [Thu, 4 Jun 2015 06:53:47 +0000 (14:53 +0800)]
MIPS: Loongson-3: Fix a cpu-hotplug issue in loongson3_ipi_interrupt()
setup_per_cpu_areas() only setup __per_cpu_offset[] for each possible
cpu, but loongson_sysconf.nr_cpus can be greater than possible cpus
(due to reserved_cpus_mask). So in loongson3_ipi_interrupt(), percpu
access will touch the original varible in .data..percpu section which
has been freed. Without this patch, cpu-hotplug will cause memery
corruption.
Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: John Crispin <john@phrozen.org> Cc: Steven J. Hill <Steven.Hill@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Fuxin Zhang <zhangfx@lemote.com> Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: http://patchwork.linux-mips.org/patch/10524/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
James Hogan [Thu, 4 Jun 2015 12:25:27 +0000 (13:25 +0100)]
MIPS: Fix enabling of DEBUG_STACKOVERFLOW
Commit 334c86c494b9 ("MIPS: IRQ: Add stackoverflow detection") added
kernel stack overflow detection, however it only enabled it conditional
upon the preprocessor definition DEBUG_STACKOVERFLOW, which is never
actually defined. The Kconfig option is called DEBUG_STACKOVERFLOW,
which manifests to the preprocessor as CONFIG_DEBUG_STACKOVERFLOW, so
switch it to using that definition instead.
Chris Leech [Thu, 28 May 2015 19:51:51 +0000 (12:51 -0700)]
iscsi_ibft: filter null v4-mapped v6 addresses
I've had reports of UEFI platforms failing iSCSI boot in various
configurations, that ended up being caused by network initialization
scripts getting tripped up by unexpected null addresses (0.0.0.0) being
reported for gateways, dhcp servers, and dns servers.
The tianocore EDK2 iSCSI driver generates an iBFT table that always uses
IPv4-mapped IPv6 addresses for the NIC structure fields. This results
in values that are "not present or not specified" being reported as
::ffff:0.0.0.0 rather than all zeros as specified.
The iscsi_ibft module filters unspecified fields from the iBFT from
sysfs, preventing userspace from using invalid values and making it easy
to check for the presence of a value. This currently fails in regard to
these mapped null addresses.
In order to remain consistent with how the iBFT information is exposed,
we should accommodate the behavior of the tianocore iSCSI driver as it's
already in the wild in a large number of servers.
Tested under qemu using an OVMF build of tianocore EDK2.
Signed-off-by: Chris Leech <cleech@redhat.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The map_single() function is not defined as static, even though it
doesn't seem to be used anywhere else in the kernel. Make it static to
avoid namespace pollution since this is a rather generic symbol.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Russell King [Fri, 5 Jun 2015 22:27:30 +0000 (08:27 +1000)]
drm: fix writing to /sys/class/drm/*/status
Writing to a file is supposed to return the number of bytes written.
Returning zero unfortunately causes bash to constantly spin trying
to write to the sysfs file, to such an extent that even ^c and ^z
have no effect. The only way out of that is to kill the shell and
log back in. This isn't nice behaviour.
Fix it by returning the number of characters written to sysfs files.
[airlied: used suggestion from Al Viro] Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 5 Jun 2015 21:14:13 +0000 (07:14 +1000)]
Merge tag 'drm-intel-fixes-2015-06-05' of git://anongit.freedesktop.org/drm-intel into drm-fixes
bunch of i915 fixes.
* tag 'drm-intel-fixes-2015-06-05' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Move WaBarrierPerformanceFixDisable:skl to skl code from chv code
drm/i915: Include G4X/VLV/CHV in self refresh status
drm/i915: Initialize HWS page address after GPU reset
drm/i915: Don't skip request retirement if the active list is empty
drm/i915/hsw: Fix workaround for server AUX channel clock divisor
Linus Torvalds [Fri, 5 Jun 2015 17:22:06 +0000 (10:22 -0700)]
Merge tag 'pci-v4.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Resource management
- Fix IOV sorting by alignment (Wei Yang)
- Preserve resource size during alignment reordering (Yinghai Lu)
Miscellaneous
- MAINTAINERS: Add Pratyush for SPEAr13xx and DesignWare PCIe (Pratyush Anand)"
* tag 'pci-v4.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Preserve resource size during alignment reordering
PCI: Fix IOV resource sorting by alignment requirement
MAINTAINERS: Add Pratyush Anand as SPEAr13xx and DesignWare PCIe maintainer
Linus Torvalds [Fri, 5 Jun 2015 17:16:52 +0000 (10:16 -0700)]
Merge tag 'sound-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"It was a fairly calm week; here you can find only a few trivial quirks
and fixes for USB and HD-audio. All changes are pretty device
specific"
* tag 'sound-4.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: fix missing input volume controls in MAYA44 USB(+)
ALSA: usb-audio: add MAYA44 USB+ mixer control names
ALSA: hda/realtek - Add a fixup for another Acer Aspire 9420
ALSA: hda - Fix jack detection at resume with VT codecs
ALSA: usb-audio: don't try to get Outlaw RR2150 sample rate
ALSA: usb-audio: Add mic volume fix quirk for Logitech Quickcam Fusion
ALSA: hda/realtek - Suooprt Dell headset mode for ALC256
Linus Torvalds [Fri, 5 Jun 2015 17:14:51 +0000 (10:14 -0700)]
Merge tag 'iommu-fixes-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
"Only one patch:
- Revert "iommu/amd: Don't allocate with __GFP_ZERO in
alloc_coherent".
This patch caused problems with some drivers, so it is better to
revert it now until the drivers have been fixed"
* tag 'iommu-fixes-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
Revert "iommu/amd: Don't allocate with __GFP_ZERO in alloc_coherent"
Linus Torvalds [Fri, 5 Jun 2015 17:03:48 +0000 (10:03 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- early_idt_handlers[] fix that fixes the build with bleeding edge
tooling
- build warning fix on GCC 5.1
- vm86 fix plus self-test to make it harder to break it again"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/irq: Stop relying on magic JMP behavior for early_idt_handlers
x86/asm/entry/32, selftests: Add a selftest for kernel entries from VM86 mode
x86/boot: Add CONFIG_PARAVIRT_SPINLOCKS quirk to arch/x86/boot/compressed/misc.h
x86/asm/entry/32: Really make user_mode() work correctly for VM86 mode
Linus Torvalds [Fri, 5 Jun 2015 17:00:53 +0000 (10:00 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"The biggest chunk of the changes are two regression fixes: a HT
workaround fix and an event-group scheduling fix. It's been verified
with 5 days of fuzzer testing.
Linus Torvalds [Fri, 5 Jun 2015 16:58:36 +0000 (09:58 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Just two small fixes: one radeon, one amdkfd"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/amdkfd: fix topology bug with capability attr.
drm/radeon: use proper ACR regisiter for DCE3.2
Linus Torvalds [Fri, 5 Jun 2015 16:54:43 +0000 (09:54 -0700)]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c bug fixes from Wolfram Sang:
"Two small bugfixes for I2C"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: s3c2410: fix oops in suspend callback for non-dt platforms
i2c: hix5hd2: Fix modalias to make module auto-loading work
Trond Myklebust [Thu, 4 Jun 2015 19:37:10 +0000 (15:37 -0400)]
SUNRPC: Fix a backchannel race
We need to allow the server to send a new request immediately after we've
replied to the previous one. Right now, there is a window between the
send and the release of the old request in rpc_put_task(), where the
server could send us a new backchannel RPC call, and we have no
request to service it.