Anna Schumaker [Tue, 25 Jul 2017 20:10:47 +0000 (16:10 -0400)]
NFS: Use raw NFS access mask in nfs4_opendata_access()
Commit bd8b2441742b ("NFS: Store the raw NFS access mask in the inode's
access cache") changed how the access results are stored after an
access() call. An NFS v4 OPEN might have access bits returned with the
opendata, so we should use the NFS4_ACCESS values when determining the
return value in nfs4_opendata_access().
Fixes: bd8b2441742b ("NFS: Store the raw NFS access mask in the inode's
access cache") Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Tested-by: Takashi Iwai <tiwai@suse.de>
NFS: Be more careful about mapping file permissions
When mapping a directory, we want the MAY_WRITE permissions to reflect
whether or not we have permission to modify, add and delete the directory
entries. MAY_EXEC must map to lookup permissions.
On the other hand, for files, we want MAY_WRITE to reflect a permission
to modify and extend the file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
net/sunrpc/xprt_sock: fix regression in connection error reporting.
Commit 3d4762639dd3 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.
This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.
As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind. A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying. If it saw the ECONNREFUSED, it would abort.
To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().
Fixes: 3d4762639dd3 ("tcp: remove poll() flakes when receiving RST") Cc: stable@vger.kernel.org (v4.12) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
nfs: count correct array for mnt3_counts array size
Array size of mnt3_counts should be the size of array
mnt3_procedures, not mnt_procedures, though they're same in size
right now. Found this by code inspection.
Fixes: 1c5876ddbdb4 ("sunrpc: move p_count out of struct rpc_procinfo") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Steve Dickson [Thu, 29 Jun 2017 15:48:26 +0000 (11:48 -0400)]
mount: copy the port field into the cloned nfs_server structure.
Doing this copy eliminates the "port=0" entry in
the /proc/mounts entries
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=69241 Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS: Don't run wake_up_bit() when nobody is waiting...
"perf lock" shows fairly heavy contention for the bit waitqueue locks
when doing an I/O heavy workload.
Use a bit to tell whether or not there has been contention for a lock
so that we can optimise away the bit waitqueue options in those cases.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Peng Tao [Thu, 29 Jun 2017 13:34:53 +0000 (06:34 -0700)]
nfs: add export operations
This support for opening files on NFS by file handle, both through the
open_by_handle syscall, and for re-exporting NFS (for example using a
different version). The support is very basic for now, as each open by
handle will have to do an NFSv4 open operation on the wire. In the
future this will hopefully be mitigated by an open file cache, as well
as various optimizations in NFS for this specific case.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
[hch: incorporated various changes, resplit the patches, new changelog] Signed-off-by: Christoph Hellwig <hch@lst.de>
Peng Tao [Thu, 29 Jun 2017 13:34:51 +0000 (06:34 -0700)]
nfs: add a nfs_ilookup helper
This helper will allow to find an existing NFS inode by the file handle
and fattr.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
[hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
sunrpc: use constant time memory comparison for mac
Otherwise, we enable a MAC forgery via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Trond Myklebust <trond.myklebust@primarydata.com> Cc: Anna Schumaker <anna.schumaker@netapp.com> Cc: linux-nfs@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:24 +0000 (11:53 -0400)]
xprtrdma: Fix documenting comments in frwr_ops.c
Clean up.
FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:08 +0000 (11:53 -0400)]
xprtrdma: FMR does not need list_del_init()
Clean up.
Commit 38f1932e60ba ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:53:00 +0000 (11:53 -0400)]
xprtrdma: Demote "connect" log messages
Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.
Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:52 +0000 (11:52 -0400)]
NFSv4.1: Use seqid returned by EXCHANGE_ID after state migration
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
When CONFIRMED_R is set, the client throws away the sequence ID
returned by the server. During a Transparent State Migration, however
there's no other way for the client to know what sequence ID to use
with a lease that's been migrated.
Therefore, the client must save and use the contrived slot sequence
value returned by the destination server even when CONFIRMED_R is
set.
Note that some servers always return a seqid of 1 after a migration.
Reported-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:44 +0000 (11:52 -0400)]
NFSv4.1: Handle EXCHGID4_FLAG_CONFIRMED_R during NFSv4.1 migration
Transparent State Migration copies a client's lease state from the
server where a filesystem used to reside to the server where it now
resides. When an NFSv4.1 client first contacts that destination
server, it uses EXCHANGE_ID to detect trunking relationships.
The lease that was copied there is returned to that client, but the
destination server sets EXCHGID4_FLAG_CONFIRMED_R when replying to
the client. This is because the lease was confirmed on the source
server (before it was copied).
Normally, when CONFIRMED_R is set, a client purges the lease and
creates a new one. However, that throws away the entire benefit of
Transparent State Migration.
Therefore, the client must not purge that lease when it is possible
that Transparent State Migration has occurred.
Reported-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Xuan Qi <xuan.qi@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:36 +0000 (11:52 -0400)]
xprtrdma: Don't defer MR recovery if ro_map fails
Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.
Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.
Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.
Fixes: 42fe28f60763 ("xprtrdma: Do not leak an MW during a DMA ...") Fixes: 505bbe64dd04 ("xprtrdma: Refactor MR recovery work queues") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:28 +0000 (11:52 -0400)]
xprtrdma: Fix FRWR invalidation error recovery
When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.
I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.
Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.
Commit 7a89f9c626e3 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.
Fixes: d7a21c1bed54 ("xprtrdma: Reset MRs in frwr_op_unmap_sync()") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:20 +0000 (11:52 -0400)]
xprtrdma: Fix client lock-up after application signal fires
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.
The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.
With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.
This sets up a race between two processes:
1. After the signal, xprt_rdma_free calls ro_unmap_safe.
2. While ro_unmap_safe is still running, the server replies and
rpcrdma_reply_handler runs, calling ro_unmap_sync.
Both processes invoke ib_unmap_fmr on the same FMR.
The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.
If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.
But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.
Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305 Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:12 +0000 (11:52 -0400)]
xprtrdma: Rename rpcrdma_req::rl_free
Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.
This is a refactoring change only.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305 Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:52:04 +0000 (11:52 -0400)]
xprtrdma: Pass only the list of registered MRs to ro_unmap_sync
There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.
Since commit 9d6b04097882 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.
As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.
Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.
And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305 Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:51:56 +0000 (11:51 -0400)]
xprtrdma: Pre-mark remotely invalidated MRs
There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.
As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305 Fixes: 68791649a725 ('xprtrdma: Invalidate in the RPC reply ... ') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Thu, 8 Jun 2017 15:51:48 +0000 (11:51 -0400)]
xprtrdma: On invalidation failure, remove MWs from rl_registered
Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.
Fixes: 9d6b04097882 ("xprtrdma: Place registered MWs on a ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS: check for nfs_refresh_inode() errors in nfs_fhget()
If an NFS server returns a filehandle that we have previously
seen, and reports a different type, then nfs_refresh_inode()
will log a warning and return an error.
nfs_fhget() does not check for this error and may return an
inode with a different type than the one that the server
reported.
This is likely to cause confusion, and is one way that
->open_context() could return a directory inode as discussed
in the previous patch.
So if nfs_refresh_inode() returns and error, return that error
from nfs_fhget() to avoid the confusion propagating.
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS: guard against confused server in nfs_atomic_open()
A confused server could return a filehandle for an
NFSv4 OPEN request, which it previously returned for a directory.
So the inode returned by ->open_context() in nfs_atomic_open()
could conceivably be a directory inode.
This has particular implications for the call to
nfs_file_set_open_context() in nfs_finish_open().
If that is called on a directory inode, then the nfs_open_context
that gets stored in the filp->private_data will be linked to
nfs_inode->open_files.
When the directory is closed, nfs_closedir() will (ultimately)
free the ->private_data, but not unlink it from nfs_inode->open_files
(because it doesn't expect an nfs_open_context there).
Subsequently the memory could get used for something else and eventually
if the ->open_files list is walked, the walker will fall off the end and
crash.
So: change nfs_finish_open() to only call nfs_file_set_open_context()
for regular-file inodes.
This failure mode has been seen in a production setting (unknown NFS
server implementation). The kernel was v3.0 and the specific sequence
seen would not affect more recent kernels, but I think a risk is still
present, and caution is wise.
Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS: only invalidate dentrys that are clearly invalid.
Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate")
in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
to be invalidated even if it has filesystems mounted on or it or on a
descendant. The mounted filesystem is unmounted.
This means we need to be careful not to return 0 unless the directory
referred to truly is invalid. So -ESTALE or -ENOENT should invalidate
the directory. Other errors such a -EPERM or -ERESTARTSYS should be
returned from ->d_revalidate() so they are propagated to the caller.
A particular problem can be demonstrated by:
1/ mount an NFS filesystem using NFSv3 on /mnt
2/ mount any other filesystem on /mnt/foo
3/ ls /mnt/foo
4/ turn off network, or otherwise make the server unable to respond
5/ ls /mnt/foo &
6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
7/ kill -9 $! # this results in -ERESTARTSYS being returned
8/ observe that /mnt/foo has been unmounted.
This patch changes nfs_lookup_revalidate() to only treat
-ESTALE from nfs_lookup_verify_inode() and
-ESTALE or -ENOENT from ->lookup()
as indicating an invalid inode. Other errors are returned.
Also nfs_check_inode_attributes() is changed to return -ESTALE rather
than -EIO. This is consistent with the error returned in similar
circumstances from nfs_update_inode().
As this bug allows any user to unmount a filesystem mounted on an NFS
filesystem, this fix is suitable for stable kernels.
Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate") Cc: stable@vger.kernel.org (v3.18+) Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Upon receiving a stateid error such as BAD_STATEID, the client
should retry the operation against the MDS before deciding to
do stateid recovery.
Previously, the code would initiate state recovery and it could
lead to a race in a state manager that could chose an incorrect
recovery method which would lead to the EIO failure for the
application.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit fabbbee0eb0f "PNFS fix fallback to MDS if got error on
commit to DS" moved the pnfs_set_lo_fail() to unhandled errors
which was not correct and lead to a kernel oops on umount.
Instead, fix the original EACCESS on commit to DS error by
getting the new layout and re-doing the IO.
Fixes: fabbbee0eb0f ("PNFS fix fallback to MDS if got error on commit to DS") Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Cc: stable@vger.kernel.org # v4.12 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Tuo Chen Peng [Wed, 7 Jun 2017 06:42:44 +0000 (23:42 -0700)]
nfs: Fix fscache stat printing in nfs_show_stats()
nfs_show_stats() was incorrectly reading statistics for bytes when printing that
for fsc. It caused files like /proc/self/mountstats to report incorrect fsc
statistics for NFS mounts.
Signed-off-by: Tuo Chen Peng <tpeng@nvidia.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Commit 8ef9b0b9e1c0 open-coded nfs_pgarray_set(), and left out the
initialization of the nfs_page_array's npages. This mistake didn't show up
until testing with block layouts, and there shows that all pNFS reads
return -EIO.
Fixes: 8ef9b0b9e1c0 ("NFS: move nfs_pgarray_set() to open code") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # 4.12 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:38 +0000 (19:35 -0400)]
NFS: Fix commit policy for non-blocking calls to nfs_write_inode()
Now that the writes will schedule a commit on their own, we don't
need nfs_write_inode() to schedule one if there are outstanding
writes, and we're being called in non-blocking mode.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:37 +0000 (19:35 -0400)]
NFS: Ensure we commit after writeback is complete
If the page cache is being flushed, then we want to ensure that we
do start a commit once the pages are done being flushed.
If we just wait until all I/O is done to that file, we can end up
livelocking until the balance_dirty_pages() mechanism puts its
foot down and forces I/O to stop.
So instead we do more or less the same thing that O_DIRECT does,
and set up a counter to tell us when the flush is done,
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Tue, 20 Jun 2017 23:35:39 +0000 (19:35 -0400)]
SUNRPC: Make slot allocation more reliable
In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
between the atomic_add_unless() failure and the call to xprt_add_backlog().
We do not actually need to hold it across the memory allocation itself.
By dropping the lock, we can use a more resilient GFP_NOFS allocation,
just as we now do in the rest of the RPC client code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
NFS: nfs_rename() - revalidate directories on -ERESTARTSYS
An interrupted rename will leave the old dentry behind if the rename
succeeds. Fix this by forcing a lookup the next time through
->d_revalidate.
A previous attempt at solving this problem took the approach to complete
the work of the rename asynchronously, however that approach was wrong
since it would allow the d_move() to occur after the directory's i_mutex
had been dropped by the original process.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Anna Schumaker [Fri, 16 Jun 2017 16:06:59 +0000 (12:06 -0400)]
NFS: Set FATTR4_WORD0_TYPE for . and .. entries
The current code worked okay for getdents(), but getdents64() expects
the d_type field to get filled out properly in the stat structure.
Setting this field fixes xfstests generic/401.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Linus Torvalds [Sun, 11 Jun 2017 23:17:29 +0000 (16:17 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull key subsystem fixes from James Morris:
"Here are a bunch of fixes for Linux keyrings, including:
- Fix up the refcount handling now that key structs use the
refcount_t type and the refcount_t ops don't allow a 0->1
transition.
- Fix a potential NULL deref after error in x509_cert_parse().
- Don't put data for the crypto algorithms to use on the stack.
- Fix the handling of a null payload being passed to add_key().
- Fix incorrect cleanup an uninitialised key_preparsed_payload in
key_update().
- Explicit sanitisation of potentially secure data before freeing.
- Fixes for the Diffie-Helman code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (23 commits)
KEYS: fix refcount_inc() on zero
KEYS: Convert KEYCTL_DH_COMPUTE to use the crypto KPP API
crypto : asymmetric_keys : verify_pefile:zero memory content before freeing
KEYS: DH: add __user annotations to keyctl_kdf_params
KEYS: DH: ensure the KDF counter is properly aligned
KEYS: DH: don't feed uninitialized "otherinfo" into KDF
KEYS: DH: forbid using digest_null as the KDF hash
KEYS: sanitize key structs before freeing
KEYS: trusted: sanitize all key material
KEYS: encrypted: sanitize all key material
KEYS: user_defined: sanitize key payloads
KEYS: sanitize add_key() and keyctl() key payloads
KEYS: fix freeing uninitialized memory in key_update()
KEYS: fix dereferencing NULL payload with nonzero length
KEYS: encrypted: use constant-time HMAC comparison
KEYS: encrypted: fix race causing incorrect HMAC calculations
KEYS: encrypted: fix buffer overread in valid_master_desc()
KEYS: encrypted: avoid encrypting/decrypting stack buffers
KEYS: put keyring if install_session_keyring_to_cred() fails
KEYS: Delete an error message for a failed memory allocation in get_derived_key()
...
Linus Torvalds [Sun, 11 Jun 2017 22:51:56 +0000 (15:51 -0700)]
compiler, clang: properly override 'inline' for clang
Commit abb2ea7dfd82 ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.
So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.
Linus Torvalds [Sun, 11 Jun 2017 19:02:01 +0000 (12:02 -0700)]
Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull randomness fixes from Ted Ts'o:
"Improve performance by using a lockless update mechanism suggested by
Linus, and make sure we refresh per-CPU entropy returned get_random_*
as soon as the CRNG is initialized"
* tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: invalidate batched entropy after crng init
random: use lockless method of accessing and updating f->reg_idx
Linus Torvalds [Sun, 11 Jun 2017 18:57:47 +0000 (11:57 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix various bug fixes in ext4 caused by races and memory allocation
failures"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix fdatasync(2) after extent manipulation operations
ext4: fix data corruption for mmap writes
ext4: fix data corruption with EXT4_GET_BLOCKS_ZERO
ext4: fix quota charging for shared xattr blocks
ext4: remove redundant check for encrypted file on dio write path
ext4: remove unused d_name argument from ext4_search_dir() et al.
ext4: fix off-by-one error when writing back pages before dio read
ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
ext4: keep existing extra fields when inode expands
ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors
ext4: fix off-by-in in loop termination in ext4_find_unwritten_pgoff()
ext4: fix SEEK_HOLE
jbd2: preserve original nofs flag during journal restart
ext4: clear lockdep subtype for quota files on quota off
Linus Torvalds [Sun, 11 Jun 2017 18:34:27 +0000 (11:34 -0700)]
Merge tag 'gpio-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"A few overdue GPIO patches for the v4.12 kernel.
- Fix debounce logic on the Aspeed platform.
- Fix the "virtual gpio" things on the Intel Crystal Cove.
- Fix the blink counter selection on the MVEBU platform"
* tag 'gpio-v4.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpio: mvebu: fix gpio bank registration when pwm is used
gpio: mvebu: fix blink counter register selection
MAINTAINERS: remove self from GPIO maintainers
gpio: crystalcove: Do not write regular gpio registers for virtual GPIOs
gpio: aspeed: Don't attempt to debounce if disabled
Linus Torvalds [Sun, 11 Jun 2017 18:29:15 +0000 (11:29 -0700)]
Merge tag 'char-misc-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small driver fixes for 4.12-rc5. Nothing major here,
just some small bugfixes found by people testing, and a MAINTAINERS
file update for the genwqe driver.
All have been in linux-next with no reported issues"
[ The cxl driver fix came in through the powerpc tree earlier ]
* tag 'char-misc-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
cxl: Avoid double free_irq() for psl,slice interrupts
mei: make sysfs modalias format similar as uevent modalias
drivers: char: mem: Fix wraparound check to allow mappings up to the end
MAINTAINERS: Change maintainer of genwqe driver
goldfish_pipe: use GFP_ATOMIC under spin lock
firmware: vpd: do not leak kobjects
firmware: vpd: avoid potential use-after-free when destroying section
firmware: vpd: do not leave freed section attributes to the list
Linus Torvalds [Sun, 11 Jun 2017 18:25:51 +0000 (11:25 -0700)]
Merge tag 'staging-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"These are mostly all IIO driver fixes, resolving a number of tiny
issues. There's also a ccree and lustre fix in here as well, both fix
problems found in those codebases.
All have been in linux-next with no reported issues"
* tag 'staging-4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: ccree: fix buffer copy
staging/lustre/lov: remove set_fs() call from lov_getstripe()
staging: ccree: add CRYPTO dependency
iio: adc: sun4i-gpadc-iio: fix parent device being used in devm function
iio: light: ltr501 Fix interchanged als/ps register field
iio: adc: bcm_iproc_adc: swap primary and secondary isr handler's
iio: trigger: fix NULL pointer dereference in iio_trigger_write_current()
iio: adc: max9611: Fix attribute measure unit
iio: adc: ti_am335x_adc: allocating too much in probe
iio: adc: sun4i-gpadc-iio: Fix module autoload when OF devices are registered
iio: adc: sun4i-gpadc-iio: Fix module autoload when PLATFORM devices are registered
iio: proximity: as3935: fix iio_trigger_poll issue
iio: proximity: as3935: fix AS3935_INT mask
iio: adc: Max9611: checking for ERR_PTR instead of NULL in probe
iio: proximity: as3935: recalibrate RCO after resume
Linus Torvalds [Sun, 11 Jun 2017 18:21:08 +0000 (11:21 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of user visible fixes (excepting one format string
change).
Four of the qla2xxx fixes only affect the firmware dump path, but it's
still important to the enterprise. The rest are various NULL pointer
crash conditions or outright driver hangs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: cxgb4i: libcxgbi: in error case RST tcp conn
scsi: scsi_debug: Avoid PI being disabled when TPGS is enabled
scsi: qla2xxx: Fix extraneous ref on sp's after adapter break
scsi: lpfc: prevent potential null pointer dereference
scsi: lpfc: Avoid NULL pointer dereference in lpfc_els_abort()
scsi: lpfc: nvmet_fc: fix format string
scsi: qla2xxx: Fix crash due to NULL pointer dereference of ctx
scsi: qla2xxx: Fix mailbox pointer error in fwdump capture
scsi: qla2xxx: Set bit 15 for DIAG_ECHO_TEST MBC
scsi: qla2xxx: Modify T262 FW dump template to specify same start/end to debug customer issues
scsi: qla2xxx: Fix crash due to mismatch mumber of Q-pair creation for Multi queue
scsi: qla2xxx: Fix NULL pointer access due to redundant fc_host_port_name call
scsi: qla2xxx: Fix recursive loop during target mode configuration for ISP25XX leaving system unresponsive
scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats()
scsi: qla2xxx: don't disable a not previously enabled PCI device
Linus Torvalds [Sun, 11 Jun 2017 18:15:09 +0000 (11:15 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Dan Williams:
"We expanded the device-dax fs type in 4.12 to be a generic provider of
a struct dax_device with an embedded inode. However, Sasha found some
basic negative testing was not run to verify that this fs cleanly
handles being mounted directly.
Note that the fresh rebase was done to remove an unnecessary Cc:
<stable> tag, but this commit otherwise had a build success
notification from the 0day robot."
Wanpeng Li [Fri, 9 Jun 2017 03:13:40 +0000 (20:13 -0700)]
KVM: async_pf: avoid async pf injection when in guest mode
INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
Not tainted 4.12.0-rc4+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gnome-terminal- D 0 1734 1015 0x00000000
Call Trace:
__schedule+0x3cd/0xb30
schedule+0x40/0x90
kvm_async_pf_task_wait+0x1cc/0x270
? __vfs_read+0x37/0x150
? prepare_to_swait+0x22/0x70
do_async_page_fault+0x77/0xb0
? do_async_page_fault+0x77/0xb0
async_page_fault+0x28/0x30
This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.
This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.
This patch fixes the hang by doing async pf when executing L1 guest.
Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Guenter Roeck [Wed, 3 May 2017 03:44:16 +0000 (20:44 -0700)]
hexagon: Use raw_copy_to_user
Commit ac4691fac8ad ("hexagon: switch to RAW_COPY_USER") replaced
__copy_to_user_hexagon() with raw_copy_to_user(), but did not catch
all callers, resulting in the following build error.
arch/hexagon/mm/uaccess.c: In function '__clear_user_hexagon':
arch/hexagon/mm/uaccess.c:40:3: error:
implicit declaration of function '__copy_to_user_hexagon'
Fixes: ac4691fac8ad ("hexagon: switch to RAW_COPY_USER") Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Linus Torvalds [Sat, 10 Jun 2017 18:09:23 +0000 (11:09 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull UFS fixes from Al Viro:
"This is just the obvious backport fodder; I'm pretty sure that there
will be more - definitely so wrt performance and quite possibly
correctness as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ufs: we need to sync inode before freeing it
excessive checks in ufs_write_failed() and ufs_evict_inode()
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs: set correct ->s_maxsize
ufs: restore maintaining ->i_blocks
fix ufs_isblockset()
ufs: restore proper tail allocation
Linus Torvalds [Sat, 10 Jun 2017 18:06:05 +0000 (11:06 -0700)]
Merge branch 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Some fixes that Dave Sterba collected.
We've been hitting an early enospc problem on production machines that
Omar tracked down to an old int->u64 mistake. I waited a bit on this
pull to make sure it was really the problem from production, but it's
on ~2100 hosts now and I think we're good.
Omar also noticed a commit in the queue would make new early ENOSPC
problems. I pulled that out for now, which is why the top three
commits are younger than the rest.
Otherwise these are all fixes, some explaining very old bugs that
we've been poking at for a while"
* 'for-linus-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix delalloc accounting leak caused by u32 overflow
Btrfs: clear EXTENT_DEFRAG bits in finish_ordered_io
btrfs: tree-log.c: Wrong printk information about namelen
btrfs: fix race with relocation recovery and fs_root setup
btrfs: fix memory leak in update_space_info failure path
btrfs: use correct types for page indices in btrfs_page_exists_in_range
btrfs: fix incorrect error return ret being passed to mapping_set_error
btrfs: Make flush bios explicitely sync
btrfs: fiemap: Cache and merge fiemap extent before submit it to user
Linus Torvalds [Sat, 10 Jun 2017 17:51:25 +0000 (10:51 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: a Geode fix plus a microcode loader fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/intel: Clear patch pointer before jettisoning the initrd
x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoC