J. Bruce Fields [Tue, 30 Apr 2013 19:28:51 +0000 (15:28 -0400)]
nfsd4: don't remap EISDIR errors in rename
We're going out of our way here to remap an error to make rfc 3530
happy--but the rfc itself (like rfc 1813, which has similar language)
gives no justification. And it disagrees with local filesystem behavior,
with Linux and posix man pages, and with knfsd's implemented behavior for v2
and v3.
And the documented behavior seems better, in that it gives a little more
information--you could implement the 3530 behavior using the posix
behavior, but not the other way around.
Also, the Linux client makes no attempt to remap this error in the v4
case, so it can end up just returning EEXIST to the application in a
case where it should return EISDIR.
So honestly I think the rfcs are just buggy here--or in any case it
doesn't seem worth the trouble to remap this error.
Reported-by: Frank S Filz <ffilz@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4: Servers should only check SETATTR stateid open mode on size change
The NFSv4 and NFSv4.1 specs are both clear that the server should only check
stateid open mode if a SETATTR specifies the size attribute. If the
open mode is not one that allows writing, then it returns NFS4ERR_OPENMODE.
In the case where the SETATTR is not changing the size, the client will
still pass it the delegation stateid to ensure that the server does not
recall that delegation. In that case, the server should _ignore_ the
delegation open mode, and simply apply standard permission checks.
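A minimal sketch of the rule described above, assuming hypothetical attribute bits, access bits and status names (none of these are nfsd's real identifiers): only a SETATTR that changes the size is checked against the stateid's open mode.

```c
#include <assert.h>

/* Hypothetical attribute bits a SETATTR may carry. */
#define ATTR_SIZE_BIT  0x1u
#define ATTR_MODE_BIT  0x2u
#define ATTR_MTIME_BIT 0x4u

/* Hypothetical access bits carried by an open or delegation stateid. */
#define STATEID_READ   0x1u
#define STATEID_WRITE  0x2u

/* Hypothetical result codes; the error case corresponds to NFS4ERR_OPENMODE. */
enum setattr_status { SETATTR_OK, SETATTR_ERR_OPENMODE };

/*
 * Only a size change requires a stateid with write access. For any other
 * attribute the stateid's open mode is ignored, and the caller falls back
 * to the normal permission checks.
 */
static enum setattr_status check_setattr_openmode(unsigned int attr_mask,
                                                  unsigned int stateid_access)
{
    if ((attr_mask & ATTR_SIZE_BIT) && !(stateid_access & STATEID_WRITE))
        return SETATTR_ERR_OPENMODE;
    return SETATTR_OK;
}
```

A SETATTR that only changes mode or times therefore passes this check even with a read-only delegation stateid, and is left to the ordinary permission checks.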
J. Bruce Fields [Fri, 12 Apr 2013 22:10:56 +0000 (18:10 -0400)]
nfsd4: better error return to indicate SSV non-support
As 4.1 becomes less experimental and SSV still isn't implemented, we
have to admit it's not going to be, and return some sensible error
rather than just saying "our server's broken". Discussion in the ietf
group hasn't turned up any objections to using NFS4ERR_ENC_ALG_UNSUPP
for that purpose.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 15 Apr 2013 20:03:46 +0000 (16:03 -0400)]
nfsd: fix EXDEV checking in rename
We again check for the EXDEV a little later on, so the first check is
redundant. This check is also slightly racier, since a badly timed
eviction from the export cache could leave us with the two fh_export
pointers pointing to two different cache entries which each refer to the
same underlying export.
It's better to compare vfsmounts as the later check does, but that
leaves a minor security hole in the case where the two exports refer to
two different directories, especially if (for example) they have
different root-squashing options.
So, compare ex_path.dentry too.
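As a rough illustration, the resulting check amounts to comparing both halves of the export's path. The struct names below are hypothetical stand-ins for struct vfsmount, struct dentry and ex_path; only pointer identity matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct vfsmount, struct dentry and ex_path. */
struct mount_stub  { int id; };
struct dentry_stub { int id; };
struct export_path {
    struct mount_stub  *mnt;
    struct dentry_stub *dentry;
};

/*
 * Rename stays within one export only if both the mount and the export's
 * root dentry match. Comparing mounts alone would treat two exports of
 * different directories on the same filesystem as one export.
 */
static bool same_export(const struct export_path *a, const struct export_path *b)
{
    return a->mnt == b->mnt && a->dentry == b->dentry;
}
```

A rename whose source and target fail same_export() would be rejected with EXDEV.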
Reported-by: Joe Habermann <joe.habermann@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Simo Sorce [Fri, 25 May 2012 22:09:56 +0000 (18:09 -0400)]
SUNRPC: Use gssproxy upcall for server RPCGSS authentication.
The main advantage of this new upcall mechanism is that it can handle
big tickets as seen in Kerberos implementations where tickets carry
authorization data like the MS-PAC buffer with AD or the Posix Authorization
Data being discussed in IETF in the krbwg working group.
The Gssproxy program is used to perform the accept_sec_context call on the
kernel's behalf. The code is also changed to pass the input buffer straight
to the upcall mechanism, to avoid allocating and copying many pages, as tokens
can be as big as 64KiB (potentially more in future).
Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, negotiation api]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Simo Sorce [Fri, 25 May 2012 22:09:55 +0000 (18:09 -0400)]
SUNRPC: Add RPC based upcall mechanism for RPCGSS auth
This patch implements a sunrpc client to use the services of the gssproxy
userspace daemon.
In particular it allows calls to be performed in user space using an RPC
call instead of custom hand-coded upcall/downcall messages.
Currently only accept_sec_context is implemented, as that is all that is
needed for the server case.
File server modules like NFS and CIFS can use full gssapi services this way,
once init_sec_context is also implemented.
For the NFS server case this code allows lifting the 2k limit on the size of
krb5 tickets. This limit prevents legitimate kerberos deployments from using
krb5 authentication with the Linux NFS server, as their tickets are normally
many kilobytes large.
It will also allow lifting the limitation on the size of the credential set
(uid,gid,gids) passed down from user space for users that are associated with
very many groups. Currently the downcall mechanism used by rpc.svcgssd is limited
to around 2k secondary groups of the 65k allowed by kernel structures.
Signed-off-by: Simo Sorce <simo@redhat.com>
[bfields: containerization, concurrent upcalls, misc. fixes and cleanup]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Simo Sorce [Fri, 25 May 2012 22:09:53 +0000 (18:09 -0400)]
SUNRPC: conditionally return endtime from import_sec_context
We expose this parameter for a future caller.
It will be used to extract the endtime from the gss-proxy upcall mechanism,
in order to set the rsc cache expiration time.
Signed-off-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 11 Apr 2013 19:06:36 +0000 (15:06 -0400)]
SUNRPC: allow disabling idle timeout
In the gss-proxy case we don't want to have to reconnect at random--we
want to connect only on gss-proxy startup when we can steal gss-proxy's
context to do the connect in the right namespace.
So, provide a flag that allows the rpc_create caller to turn off the
idle timeout.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 26 Apr 2013 15:37:29 +0000 (11:37 -0400)]
Merge Trond's nfs-for-next
Merging Trond's nfs-for-next branch, mainly to get b7993cebb841b0da7a33e9d5ce301a9fd3209165 "SUNRPC: Allow rpc_create() to
request that TCP slots be unlimited", which a small piece of the
gss-proxy work depends on.
Merge branch 'rpcsec_gss-from_cel' into linux-next
* rpcsec_gss-from_cel: (21 commits)
NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
NFSv4: Don't clear the machine cred when client establish returns EACCES
NFSv4: Fix issues in nfs4_discover_server_trunking
NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available
NFS: Use server-recommended security flavor by default (NFSv3)
SUNRPC: Don't recognize RPC_AUTH_MAXFLAVOR
NFS: Use "krb5i" to establish NFSv4 state whenever possible
NFS: Try AUTH_UNIX when PUTROOTFH gets NFS4ERR_WRONGSEC
NFS: Use static list of security flavors during root FH lookup recovery
NFS: Avoid PUTROOTFH when managing leases
NFS: Clean up nfs4_proc_get_rootfh
NFS: Handle missing rpc.gssd when looking up root FH
SUNRPC: Remove EXPORT_SYMBOL_GPL() from GSS mech switch
SUNRPC: Make gss_mech_get() static
SUNRPC: Refactor nfsd4_do_encode_secinfo()
SUNRPC: Consider qop when looking up pseudoflavors
SUNRPC: Load GSS kernel module by OID
SUNRPC: Introduce rpcauth_get_pseudoflavor()
SUNRPC: Define rpcsec_gss_info structure
NFS: Remove unneeded forward declaration
...
NFSv4: Don't recheck permissions on open in case of recovery cached open
If we already checked the user access permissions on the original open,
then don't bother checking again on recovery. Doing so can cause a
deadlock with NFSv4.1, since the may_open() operation is not privileged.
Furthermore, we can't report an access permission failure here anyway.
The seconds field of an nfstime4 structure is 64-bit, but we are assuming
that the first 32 bits are zero-filled. So if the client tries to set
atime to a value before the epoch (touch -t 196001010101), then the
server will save the wrong value on disk.
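A sketch of the decoding bug and its fix, assuming the two 32-bit XDR words of the 64-bit nfstime4 seconds field have already been read and converted to host byte order (high word first). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The buggy pattern: assume the high word is zero and use only the low word. */
static int64_t nfstime4_seconds_lowword_only(uint32_t hi, uint32_t lo)
{
    (void)hi;                /* ignored: wrong for any pre-epoch time */
    return (int64_t)lo;
}

/*
 * Corrected: combine both words, then reinterpret as signed. The final
 * conversion relies on two's-complement wraparound, which GCC and Clang
 * define for out-of-range unsigned-to-signed conversions.
 */
static int64_t nfstime4_seconds(uint32_t hi, uint32_t lo)
{
    return (int64_t)(((uint64_t)hi << 32) | lo);
}
```

For a time one hour before the epoch (-3600 seconds) the words are 0xFFFFFFFF and 0xFFFFF1F0; ignoring the high word turns that into a large positive value.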
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4.1: Use the more efficient open_noattr call for open-by-filehandle
When we're doing open-by-filehandle in NFSv4.1, we shouldn't need to
do the cache consistency revalidation on the directory. It is
therefore more efficient to just use open_noattr, which returns the
file attributes, but not the directory attributes.
Chuck Lever [Mon, 22 Apr 2013 19:42:48 +0000 (15:42 -0400)]
NFS: Retry SETCLIENTID with AUTH_SYS instead of AUTH_NONE
Recently I changed the SETCLIENTID code to use AUTH_GSS(krb5i), and
then retry with AUTH_NONE if that didn't work. This was to enable
Kerberos NFS mounts to work without forcing Linux NFS clients to
have a keytab on hand.
Rick Macklem reports that the FreeBSD server accepts AUTH_NONE only
for NULL operations (thus certainly not for SETCLIENTID). Falling
back to AUTH_NONE means our proposed 3.10 NFS client will not
interoperate with FreeBSD servers over NFSv4 unless Kerberos is
fully configured on both ends.
If the Linux client falls back to using AUTH_SYS instead for
SETCLIENTID, all should work fine as long as the NFS server is
configured to allow AUTH_SYS for SETCLIENTID.
This may still prevent access to Kerberos-only FreeBSD servers by
Linux clients with no keytab. Rick is of the opinion that the
security settings the server applies to its pseudo-fs should also
apply to the SETCLIENTID operation.
Linux and Solaris NFS servers do not place that limitation on
SETCLIENTID. The security settings for the server's pseudo-fs are
determined automatically as the union of security flavors allowed on
real exports, as recommended by RFC 3530bis; and the flavors allowed
for SETCLIENTID are all flavors supported by the respective server
implementation.
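The SETCLIENTID flavor order after this change can be modeled as a pure function. This is a simplification: the real client tries krb5i and falls back when that attempt fails, whereas here "has a keytab" and the server's accepted-flavor list are explicit, hypothetical inputs. The numeric values are the standard AUTH_SYS flavor and the RPCSEC_GSS krb5i pseudoflavor; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAVOR_AUTH_SYS           1u  /* AUTH_SYS (AUTH_UNIX) */
#define FLAVOR_KRB5I         390004u  /* RPCSEC_GSS krb5i pseudoflavor */
#define FLAVOR_NONE_FOUND 0xFFFFFFFFu /* hypothetical "nothing worked" marker */

/* Order in which SETCLIENTID is attempted: krb5i, then AUTH_SYS (not AUTH_NONE). */
static const uint32_t setclientid_order[] = { FLAVOR_KRB5I, FLAVOR_AUTH_SYS };

static int flavor_in(uint32_t f, const uint32_t *set, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == f)
            return 1;
    return 0;
}

/*
 * Return the first flavor in setclientid_order that the client can use
 * (krb5i needs a keytab) and that the server accepts for SETCLIENTID.
 */
static uint32_t pick_setclientid_flavor(int have_keytab,
                                        const uint32_t *accepted, size_t n)
{
    size_t count = sizeof(setclientid_order) / sizeof(setclientid_order[0]);

    for (size_t i = 0; i < count; i++) {
        uint32_t f = setclientid_order[i];
        if (f == FLAVOR_KRB5I && !have_keytab)
            continue;
        if (flavor_in(f, accepted, n))
            return f;
    }
    return FLAVOR_NONE_FOUND;
}
```

A Kerberos-only server combined with a client lacking a keytab still yields no usable flavor, which is the remaining limitation described above.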
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
LOCKD: Ensure that nlmclnt_block resets block->b_status after a server reboot
After a server reboot, the reclaimer thread will recover all the existing
locks. For locks that are blocked, however, it will change the value
of block->b_status to nlm_lck_denied_grace_period in order to signal that
they need to wake up and resend the original blocking lock request.
Due to a bug, however, the block->b_status never gets reset after the
blocked locks have been woken up, and so the process goes into an
infinite loop of resends until the blocked lock is satisfied.
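A minimal model of the fix, using hypothetical names in place of lockd's real structures: the waiter that consumes the grace-period status resets it, so a later wakeup does not trigger yet another resend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of NLM status values. */
enum nlm_status { NLM_GRANTED, NLM_BLOCKED, NLM_DENIED_GRACE_PERIOD };

/* Hypothetical stand-in for the client's blocked-lock record. */
struct nlm_block_stub {
    enum nlm_status b_status;
};

/* Reclaimer: after a server reboot, tell blocked waiters to resend. */
static void reclaimer_mark_block(struct nlm_block_stub *b)
{
    b->b_status = NLM_DENIED_GRACE_PERIOD;
}

/*
 * Waiter: called after being woken. Returns true if the original blocking
 * request must be resent. Resetting b_status is the essence of the fix;
 * without it every later wakeup sees the grace-period value again.
 */
static bool waiter_should_resend(struct nlm_block_stub *b)
{
    if (b->b_status == NLM_DENIED_GRACE_PERIOD) {
        b->b_status = NLM_BLOCKED;
        return true;
    }
    return false;
}
```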
Reported-by: Marc Eshel <eshel@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
NFSv4: Use the open stateid if the delegation has the wrong mode
Fix nfs4_select_rw_stateid() so that it chooses the open stateid
(or an all-zero stateid) if the delegation does not match the selected
read/write mode.
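The selection rule can be sketched as follows; the mode bits, enum and function name are hypothetical and only mirror the idea of FMODE_READ/FMODE_WRITE.

```c
#include <assert.h>

/* Hypothetical access bits, mirroring the idea of FMODE_READ/FMODE_WRITE. */
#define FMODE_READ_BIT  0x1u
#define FMODE_WRITE_BIT 0x2u

enum stateid_kind { USE_DELEGATION, USE_OPEN_STATEID, USE_ZERO_STATEID };

/*
 * Use the delegation only if its mode covers the requested access;
 * otherwise fall back to the open stateid, or to the all-zero stateid
 * when no open state is available.
 */
static enum stateid_kind select_rw_stateid(unsigned int want,
                                           int have_deleg, unsigned int deleg_mode,
                                           int have_open)
{
    if (have_deleg && (deleg_mode & want) == want)
        return USE_DELEGATION;
    return have_open ? USE_OPEN_STATEID : USE_ZERO_STATEID;
}
```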
RFC 3530 says that the seconds value of an nfstime4 structure is a 64-bit
value, but we are instead sending a 32-bit 0 and then a 32-bit conversion
of the 64-bit Linux value. This means that if we try to set atime to a
value before the epoch (touch -t 196001010101) the client will only send
part of the new value due to lost precision.
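A sketch of the corrected encoding, assuming a hypothetical helper that fills the two 32-bit XDR words of nfstime4.seconds (high word first). The old client always sent 0 as the high word and a 32-bit truncation of the value as the low word.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Split a signed 64-bit seconds value into the two XDR words.
 * Converting to uint64_t is well defined for negative values (modulo 2^64).
 */
static void nfstime4_encode_seconds(int64_t sec, uint32_t words[2])
{
    uint64_t u = (uint64_t)sec;

    words[0] = (uint32_t)(u >> 32);         /* high word: all ones before the epoch */
    words[1] = (uint32_t)(u & 0xFFFFFFFFu); /* low word */
}
```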
NFSv4: Record the OPEN create mode used in the nfs4_opendata structure
If we're doing NFSv4.1 against a server that has persistent sessions,
then we should not need to call SETATTR in order to reset the file
attributes immediately after doing an exclusive create.
Note that since the create mode depends on the type of session that
has been negotiated with the server, we should not choose the
mode until after we've got a session slot.
A 4.1 server must notify a client that has had any state revoked using
the SEQ4_STATUS_RECALLABLE_STATE_REVOKED flag. The client can figure
out exactly which state is the problem using CHECK_STATEID and then free
it using FREE_STATEID. The status flag will be unset once all such
revoked stateids are freed.
Our server's only recallable state is delegations. So we keep with each
4.1 client a list of delegations that have timed out and been recalled,
but haven't yet been freed by FREE_STATEID.
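A simplified model of that bookkeeping, assuming a bare singly linked list and a 32-bit stand-in for a full stateid4 (all struct and function names are hypothetical). The flag value is the one RFC 5661 assigns to SEQ4_STATUS_RECALLABLE_STATE_REVOKED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ4_STATUS_RECALLABLE_STATE_REVOKED 0x00000040u /* RFC 5661 value */

/* Hypothetical, heavily simplified per-client bookkeeping. */
struct revoked_deleg {
    uint32_t stateid;              /* stand-in for a full stateid4 */
    struct revoked_deleg *next;
};

struct client_stub {
    struct revoked_deleg *revoked; /* timed out and recalled, not yet freed */
};

static void revoke_delegation(struct client_stub *c, struct revoked_deleg *d)
{
    d->next = c->revoked;
    c->revoked = d;
}

/* FREE_STATEID: unlink the entry if present; -1 if it is not on the list. */
static int free_stateid(struct client_stub *c, uint32_t stateid)
{
    for (struct revoked_deleg **pp = &c->revoked; *pp; pp = &(*pp)->next) {
        if ((*pp)->stateid == stateid) {
            *pp = (*pp)->next;
            return 0;
        }
    }
    return -1;
}

/* The SEQUENCE status flag stays set while any revoked stateid remains. */
static uint32_t sequence_status_flags(const struct client_stub *c)
{
    return c->revoked ? SEQ4_STATUS_RECALLABLE_STATE_REVOKED : 0u;
}
```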
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
SUNRPC: Fix a livelock problem in the xprt->backlog queue
This patch ensures that we throttle new RPC requests if there are
requests already waiting in the xprt->backlog queue. The reason for
doing this is to fix livelock issues that can occur when an existing
(high priority) task is waiting in the backlog queue, gets woken up
by xprt_free_slot(), but a new task then steals the slot.
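The admission rule can be sketched with a hypothetical slot counter; the real code works with RPC wait queues and tasks, which this model omits. A new request may only take a free slot when the backlog is empty, and a released slot goes to a waiter first.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transport slot table. */
struct xprt_stub {
    int free_slots;   /* slots available right now */
    int backlog_len;  /* tasks already queued waiting for a slot */
};

/* A newcomer takes a slot only if one is free AND nobody is already waiting. */
static bool try_reserve_slot(struct xprt_stub *x)
{
    if (x->free_slots > 0 && x->backlog_len == 0) {
        x->free_slots--;
        return true;
    }
    x->backlog_len++;   /* queue behind existing waiters */
    return false;
}

/* Releasing a slot hands it straight to the head of the backlog, if any. */
static void release_slot(struct xprt_stub *x)
{
    if (x->backlog_len > 0)
        x->backlog_len--;
    else
        x->free_slots++;
}
```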
NFSv4: Fix handling of revoked delegations by setattr
Currently, _nfs4_do_setattr() will use the delegation stateid if no
writeable open file stateid is available.
If the server revokes that delegation stateid, then the call to
nfs4_handle_exception() will fail to handle the error due to the
lack of a struct nfs4_state, and will just convert the error into
an EIO.
This patch just removes the requirement that we must have a
struct nfs4_state in order to invalidate the delegation and
retry.
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Tue, 9 Apr 2013 21:42:28 +0000 (17:42 -0400)]
nfsd4: clean up validate_stateid
The logic here is better expressed with a switch statement.
While we're here, CLOSED stateids (or stateids of an unknown type--which
would indicate a server bug) should probably return nfserr_bad_stateid,
though this behavior shouldn't affect any non-buggy client.
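A sketch of the switch-based shape, with hypothetical type and status names standing in for nfsd's stateid types and nfserr_* codes; per-type checks beyond the type itself are omitted.

```c
#include <assert.h>

/* Hypothetical stateid types. */
enum stid_type { STID_OPEN, STID_LOCK, STID_DELEG, STID_CLOSED };

/* Hypothetical status codes standing in for nfs_ok / nfserr_bad_stateid. */
enum status { STATUS_OK, STATUS_BAD_STATEID };

static enum status validate_stateid_type(enum stid_type type)
{
    switch (type) {
    case STID_OPEN:
    case STID_LOCK:
    case STID_DELEG:
        return STATUS_OK;          /* further per-type checks would go here */
    case STID_CLOSED:
    default:                       /* unknown type: would indicate a server bug */
        return STATUS_BAD_STATEID;
    }
}
```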
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 8 Apr 2013 20:44:14 +0000 (16:44 -0400)]
nfsd4: fix forechannel attribute negotiation
Negotiation of the 4.1 session forechannel attributes is a mess. Fix:
- Move it all into check_forechannel_attrs instead of spreading
it between that, alloc_session, and init_forechannel_attrs.
- set a minimum "slotsize" so that our drc memory limits apply
even for small maxresponsesize_cached. This also fixes some
bugs when slotsize becomes <= 0.
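To illustrate why a floor on slotsize matters, here is a hypothetical negotiation helper. The constants and names are illustrative assumptions, not nfsd's; the point is that subtracting header overhead from a small maxresponsesize_cached must not leave a zero (or wrapped) slot size before the DRC memory budget is divided by it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; nfsd's real names and values differ. */
#define SESSION_HDR_OVERHEAD 100u  /* reply header bytes not cached per slot */
#define MIN_SLOTSIZE          64u  /* floor so the memory accounting still applies */

static uint32_t slotsize_for(uint32_t maxresp_cached)
{
    /* Clamp before subtracting: unsigned subtraction would wrap. */
    uint32_t size = maxresp_cached > SESSION_HDR_OVERHEAD
                    ? maxresp_cached - SESSION_HDR_OVERHEAD : 0u;
    return size < MIN_SLOTSIZE ? MIN_SLOTSIZE : size;
}

/* Grant no more slots than the DRC memory budget can hold (0 means none fit). */
static uint32_t negotiate_slots(uint32_t requested, uint32_t maxresp_cached,
                                uint32_t drc_budget_bytes)
{
    uint32_t affordable = drc_budget_bytes / slotsize_for(maxresp_cached);
    return requested < affordable ? requested : affordable;
}
```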
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 29 Mar 2013 00:37:14 +0000 (20:37 -0400)]
nfsd4: don't close read-write opens too soon
Don't actually close any opens until we don't need them at all.
This means being left with write access when it's not really necessary,
but that's better than putting a file that might still have posix locks
held on it, as we have been.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Sun, 7 Apr 2013 17:28:16 +0000 (13:28 -0400)]
nfsd4: release lockowners on last unlock in 4.1 case
In the 4.1 case we're supposed to release lockowners as soon as they're
no longer used.
It would probably be more efficient to reference count them, but that's
slightly fiddly due to the need to have callbacks from locks.c to take
into account lock merging and splitting.
For most cases just scanning the inode's lock list on unlock for
matching locks will be sufficient.
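A sketch of that scan over a hypothetical singly linked list of the inode's POSIX locks: after an unlock, the lockowner can be released if no remaining lock on the inode belongs to it. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a lockowner and an inode's POSIX lock list. */
struct lockowner_stub { int id; };

struct posix_lock_stub {
    const struct lockowner_stub *owner;
    struct posix_lock_stub *next;
};

/* True if any lock still held on the inode belongs to this lockowner. */
static bool lockowner_still_used(const struct posix_lock_stub *inode_locks,
                                 const struct lockowner_stub *lo)
{
    for (const struct posix_lock_stub *fl = inode_locks; fl; fl = fl->next)
        if (fl->owner == lo)
            return true;
    return false;
}
```

A reference count would avoid the scan, but would have to follow lock merging and splitting, which is the complication mentioned above.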
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
NFSv4: Handle timeouts correctly when probing for lease validity
When we send a RENEW or SEQUENCE operation in order to probe if the
lease is still valid, we want it to be able to time out since the
lease we are probing is likely to time out too. Currently, because
we use soft mount semantics for these RPC calls, the return value
is EIO, which causes the state manager to exit with an "unhandled
error" message.
This patch changes the call semantics, so that the RPC layer returns
ETIMEDOUT instead of EIO. We then have the state manager default to
a simple retry instead of exiting.
J. Bruce Fields [Mon, 1 Apr 2013 20:37:12 +0000 (16:37 -0400)]
nfsd4: cleanup handling of nfsv4.0 closed stateid's
Closed stateids are kept around a little while to handle close replays
in the 4.0 case. So we stash the last-used stateid in the
oo_last_closed_stateid field of the open owner. We can free that in
encode_seqid_op_tail once the seqid on the open owner is next
incremented. But we don't want to do that on the close itself; so we
set the NFS4_OO_PURGE_CLOSE flag on the open owner, skip freeing it the
first time through encode_seqid_op_tail, then free it the next time we
see that flag set.
This is unnecessarily baroque.
Instead, just move the logic that increments the seqid out of the xdr
code and into the operation code itself.
The justification given for the current placement is that we need to
wait till the last minute to be sure we know whether the status is a
sequence-id-mutating error or not, but examination of the code shows
that can't actually happen.
Reported-by: Yanchuan Nian <ycnian@gmail.com>
Tested-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If the state manager thread is already running, we may end up
racing with it in nfs_client_return_marked_delegations. Better to
just allow the state manager thread to do the job.
NFSv4: Be less aggressive about returning delegations for open files
Currently, if the application that holds the file open isn't doing
I/O, we may end up returning the delegation. This means that we can
no longer cache the file as aggressively, and often also that we
multiply the state that both the server and the client need to track.
This patch adds a check for open files to the routine that scans
for delegations that are unreferenced.
NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_open_delegation_recall
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted; however,
the spec does not require the server to grant the open in this
instance.
NFSv4: Handle NFS4ERR_DELAY and NFS4ERR_GRACE in nfs4_lock_delegation_recall
A server shouldn't normally return NFS4ERR_GRACE if the client holds a
delegation, since no conflicting lock reclaims can be granted; however,
the spec does not require the server to grant the lock in this
instance.
Paul Bolle [Sat, 9 Mar 2013 16:02:31 +0000 (17:02 +0100)]
sunrpc: drop "select NETVM"
The Kconfig entry for SUNRPC_SWAP selects NETVM. That select statement
was added in commit a564b8f0398636ba30b07c0eaebdef7ff7837249 ("nfs:
enable swap on NFS"). But there's no Kconfig symbol NETVM. It apparently
was only used in development versions of the swap over nfs
functionality but never entered mainline. Anyhow, it is a nop and can
safely be dropped.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Jeff Layton [Mon, 25 Mar 2013 11:59:57 +0000 (07:59 -0400)]
nfs: allow the v4.1 callback thread to freeze
The v4.1 callback thread has set_freezable() at the top, but it doesn't
ever try to freeze within the loop. Have it call try_to_freeze() at the
top of the loop. If a freeze event occurs, recheck kthread_should_stop()
after thawing.
SUNRPC: Fix a potential memory leak in rpc_new_client
If the call to rpciod_up() fails, we currently leak a reference to the
struct rpc_xprt.
As part of the fix, we also remove the redundant check for xprt!=NULL.
This is already taken care of by the callers.
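A sketch of the error-path pattern, with hypothetical stand-ins for struct rpc_xprt, rpciod_up() and the client allocation: once the new client owns the caller's transport reference, every failure path has to drop it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted transport and client; names are illustrative. */
struct xprt_stub   { int refcount; };
struct client_stub { struct xprt_stub *xprt; };

static void xprt_put_stub(struct xprt_stub *x) { x->refcount--; }

/* Stand-ins for starting and stopping the worker (cf. rpciod_up/rpciod_down). */
static int start_worker_stub(int should_fail) { return should_fail ? -1 : 0; }
static void stop_worker_stub(void) { }

/*
 * Takes over the caller's reference on xprt (callers guarantee it is
 * non-NULL). The leak was a missing put when the worker failed to start.
 */
static struct client_stub *new_client(struct xprt_stub *xprt, int fail_worker)
{
    struct client_stub *clnt;

    if (start_worker_stub(fail_worker) != 0)
        goto out_put_xprt;

    clnt = malloc(sizeof(*clnt));
    if (!clnt)
        goto out_stop_worker;
    clnt->xprt = xprt;
    return clnt;

out_stop_worker:
    stop_worker_stub();
out_put_xprt:
    xprt_put_stub(xprt);
    return NULL;
}
```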
It is unsafe to use list_for_each_entry_safe() here, because
when we drop the nn->nfs_client_lock, we pin the _current_ list
entry and ensure that it stays in the list, but we don't do the
same for the _next_ list entry. Use of list_for_each_entry() is
therefore the correct thing to do.
Also fix the refcounting in nfs41_walk_client_list().
Finally, ensure that the nfs_client has finished being initialised
and, in the case of NFSv4.1, that the session is set up.
NFSv4: Fix issues in nfs4_discover_server_trunking
- Ensure that we exit with ENOENT if the call to ops->get_clid_cred()
fails.
- Handle the case where ops->detect_trunking() exits with an
unexpected error, and return EIO.
NFSv4: Fix the fallback to AUTH_NULL if krb5i is not available
If the rpcsec_gss_krb5 module cannot be loaded, the attempt to create
an rpc_client in nfs4_init_client will currently fail with an EINVAL.
Fix is to retry with AUTH_NULL.
Regression introduced by the commit "NFS: Use "krb5i" to establish NFSv4
state whenever possible"
Chuck Lever [Fri, 22 Mar 2013 16:53:17 +0000 (12:53 -0400)]
NFS: Use server-recommended security flavor by default (NFSv3)
Since commit ec88f28d in 2009, checking if the user-specified flavor
is in the server's flavor list has been the source of a few
noticeable regressions (now fixed), but there is one that is still
vexing.
An NFS server can list AUTH_NULL in its flavor list, which suggests
a client should try to mount the server with the flavor of the
client's choice, but the server will squash all accesses. In some
cases, our client fails to mount a server because of this check,
when the mount could have proceeded successfully.
Skip this check if the user has specified "sec=" on the mount
command line. But do consult the server-provided flavor list to
choose a security flavor if no sec= option is specified on the mount
command.
If a server lists Kerberos pseudoflavors before "sys" in its export
options, our client now chooses Kerberos over AUTH_UNIX for mount
points, when no security flavor is specified by the mount command.
This could be surprising to some administrators or users, who would
then need to have Kerberos credentials to access the export.
Or, a client administrator may not have enabled rpc.gssd. In this
case, auth_rpcgss.ko might still be loadable, which is enough for
the new logic to choose Kerberos over AUTH_UNIX. But the mount
would fail since no GSS context can be created without rpc.gssd
running.
To retain the use of AUTH_UNIX by default:
o The server administrator can ensure that "sys" is listed before
Kerberos flavors in its export security options (see
exports(5)),
o The client administrator can explicitly specify "sec=sys" on
its mount command line (see nfs(5)),
o The client administrator can use "Sec=sys" in an appropriate
section of /etc/nfsmount.conf (see nfsmount.conf(5)), or
o The client administrator can blacklist auth_rpcgss.ko.
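The resulting default-flavor logic can be sketched as below. The flavor numbers are the standard AUTH_SYS value and the RPCSEC_GSS Kerberos pseudoflavors; the function names, the gss_loadable input (standing in for whether auth_rpcgss.ko can be loaded) and the "no usable flavor" marker are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAVOR_AUTH_SYS            1u
#define FLAVOR_KRB5           390003u
#define FLAVOR_KRB5I          390004u
#define FLAVOR_KRB5P          390005u
#define FLAVOR_NONE_USABLE 0xFFFFFFFFu  /* hypothetical marker */

/* Kerberos counts as usable whenever the GSS module is loadable, even if
 * rpc.gssd is not running, which is why such a mount can later fail. */
static int client_supports(uint32_t f, int gss_loadable)
{
    switch (f) {
    case FLAVOR_AUTH_SYS:
        return 1;
    case FLAVOR_KRB5:
    case FLAVOR_KRB5I:
    case FLAVOR_KRB5P:
        return gss_loadable;
    default:
        return 0;
    }
}

/* sec= given: use it unchecked. Otherwise: first server-listed flavor we support. */
static uint32_t choose_mount_flavor(int user_sec_given, uint32_t user_sec,
                                    const uint32_t *server_list, size_t n,
                                    int gss_loadable)
{
    if (user_sec_given)
        return user_sec;
    for (size_t i = 0; i < n; i++)
        if (client_supports(server_list[i], gss_loadable))
            return server_list[i];
    return FLAVOR_NONE_USABLE;
}
```

Passing gss_loadable = 0 models blacklisting auth_rpcgss.ko; a server list with "sys" first models the server-side workaround.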
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Chuck Lever [Fri, 22 Mar 2013 16:53:08 +0000 (12:53 -0400)]
SUNRPC: Don't recognize RPC_AUTH_MAXFLAVOR
RPC_AUTH_MAXFLAVOR is an invalid flavor, on purpose. Don't allow
any processing whatsoever if a caller passes it to rpcauth_create()
or rpcauth_get_gssinfo().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
J. Bruce Fields [Thu, 21 Mar 2013 15:21:50 +0000 (11:21 -0400)]
nfsd4: shut down more of delegation earlier
Once we've unhashed the delegation, it's only hanging around for the
benefit of an outstanding recall, which only needs the encoded
filehandle, stateid, and dl_retries counter. No point keeping the file
around any longer, or keeping it hashed.
This also fixes a race: calls to idr_remove should really be serialized
by the caller, but the nfs4_put_delegation call from the callback code
isn't taking the state lock.
(Better might be to cancel the callback before destroying the
delegation, and remove any need for reference counting--but I don't see
an easy way to cancel an rpc call.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 19 Mar 2013 16:05:39 +0000 (12:05 -0400)]
nfsd4: don't destroy in-use session
This changes session destruction to be similar to client destruction in
that attempts to destroy a session while in use (which should be rare
corner cases) result in DELAY. This simplifies things somewhat and
helps meet a coming 4.2 requirement.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 2 Apr 2013 02:23:49 +0000 (22:23 -0400)]
nfsd4: don't destroy in-use clients
When a setclientid_confirm or create_session confirms a client after a
client reboot, it also destroys any previous state held by that client.
The shutdown of that previous state must be careful not to free the
client out from under threads processing other requests that refer to
the client.
This is a particular problem in the NFSv4.1 case when we hold a
reference to a session (hence a client) throughout compound processing.
The server attempts to handle this by unhashing the client at the time
it's destroyed, then delaying the final free to the end. But this still
leaves some races in the current code.
I believe it's simpler just to fail the attempt to destroy the client by
returning NFS4ERR_DELAY. This is a case that should never happen
anyway.
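In outline, destruction becomes a check rather than a deferred free. This sketch assumes a hypothetical in-use counter protected by the state lock; the same rule applies to sessions.

```c
#include <assert.h>

/* Hypothetical result codes; DESTROY_DELAY corresponds to NFS4ERR_DELAY. */
enum destroy_status { DESTROY_OK, DESTROY_DELAY };

/* Hypothetical client record: in_use counts compounds currently using it. */
struct client_rec {
    int in_use;
    int destroyed;
};

/* Caller is assumed to hold the state lock, so in_use cannot change here. */
static enum destroy_status try_destroy_client(struct client_rec *c)
{
    if (c->in_use > 0)
        return DESTROY_DELAY;  /* requester retries later */
    c->destroyed = 1;
    return DESTROY_OK;
}
```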
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Mon, 18 Mar 2013 21:31:30 +0000 (17:31 -0400)]
nfsd4: simplify bind_conn_to_session locking
The locking here is very fiddly, and there's no reason for us to be
setting cstate->session, since this is the only op in the compound.
Let's just take the state lock and drop the reference counting.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Tue, 12 Mar 2013 14:12:37 +0000 (10:12 -0400)]
nfsd4: warn on odd create_session state
This should never happen.
(Note: the comparable case in setclientid_confirm *can* happen, since
updating a client record can result in both confirmed and unconfirmed
records with the same clientid.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
ycnian@gmail.com [Mon, 11 Mar 2013 00:46:14 +0000 (08:46 +0800)]
nfsd: fix bug on nfs4 stateid deallocation
NFS4_OO_PURGE_CLOSE is not handled properly. To avoid a memory leak, the nfs4
stateid pointed to by oo_last_closed_stid is freed in nfsd4_close(),
but NFS4_OO_PURGE_CLOSE isn't cleared at the same time. So the stateid released in
THIS close procedure may be freed immediately in the subsequent encoding function.
Sorry that the Signed-off-by was forgotten in the last version.
Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Fri, 8 Mar 2013 14:30:43 +0000 (09:30 -0500)]
nfsd4: fix use-after-free of 4.1 client on connection loss
Once we drop the lock here there's nothing keeping the client around:
the only lock still held is the xpt_lock on this socket, but this socket
no longer has any connection with the client so there's no way for other
code to know we're still using the client.
The solution is simple: all nfsd4_probe_callback does is set a few
variables and queue some work, so there's no reason we can't just keep
it under the lock.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
J. Bruce Fields [Thu, 7 Mar 2013 22:26:18 +0000 (17:26 -0500)]
nfsd4: fix race on client shutdown
Dropping the session's reference count after the client's means we leave
a window where the session's se_client pointer is NULL. An xpt_user
callback that encounters such a session may then crash:
J. Bruce Fields [Thu, 28 Feb 2013 20:51:49 +0000 (12:51 -0800)]
nfsd4: handle seqid-mutating open errors from xdr decoding
If a client sets an owner (or group_owner or acl) attribute on open for
create, and the mapping of that owner to an id fails, then we return
BAD_OWNER. But BAD_OWNER is a seqid-mutating error, so we can't
shortcut the open processing in that case: we have to at least look up the
owner so we can find the seqid to bump.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: scale up the number of DRC hash buckets with cache size
We've now increased the size of the duplicate reply cache by quite a
bit, but the number of hash buckets has not changed. So, we've gone from
an average hash chain length of 16 in the old code to 4096 when the
cache is at its largest. Change the code to scale out the number of buckets
with the max size of the cache.
At the same time, we also need to fix the hash function since the
existing one isn't really suitable when there are more than 256 buckets.
Instead, switch to the stock hash_32 function for this. Testing on a
machine that had 2048 buckets showed that this gave a smaller
longest:average ratio than the existing hash function:
The formula here is longest hash bucket searched divided by average
number of entries per bucket at the time that we saw that longest
bucket:
old hash: 68/(39258/2048) == 3.547404
hash_32: 45/(33773/2048) == 2.728807
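A sketch of the two ideas, plus the ratio used in the measurements above. The target chain length is a hypothetical constant, and hash32_sketch() only imitates the multiply-and-shift shape of the kernel's hash_32(); its multiplier (about 2^32 divided by the golden ratio) is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target average chain length; not nfsd's actual constant. */
#define TARGET_CHAIN_LEN 16u

/* Hash bits so that max_entries / 2^bits is about TARGET_CHAIN_LEN. */
static unsigned int drc_hash_bits(uint32_t max_entries)
{
    uint32_t want = max_entries / TARGET_CHAIN_LEN;
    unsigned int bits = 0;

    while ((1u << bits) < want && bits < 31)  /* round up to a power of two */
        bits++;
    return bits ? bits : 1;
}

/* Multiply by a large odd constant and keep the top `bits` bits. */
static uint32_t hash32_sketch(uint32_t val, unsigned int bits)
{
    return (uint32_t)(val * 0x9E3779B9u) >> (32 - bits);
}

/* Longest chain searched divided by average entries per bucket at that time. */
static double longest_to_average(unsigned int longest, unsigned int entries,
                                 unsigned int buckets)
{
    return (double)longest / ((double)entries / buckets);
}
```

Plugging in the old-hash measurement, longest_to_average(68, 39258, 2048) evaluates to about 3.5474, matching the figure above.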
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Jeff Layton [Wed, 27 Mar 2013 14:15:39 +0000 (10:15 -0400)]
nfsd: keep stats on worst hash balancing seen so far
The typical case with the DRC is a cache miss, so if we keep track of
the max number of entries that we've ever walked over in a search, then
we should have a reasonable estimate of the longest hash chain that
we've ever seen.
With that, we'll also keep track of the total size of the cache when we
see the longest chain. In the case of a tie, we prefer to track the
smallest total cache size in order to properly gauge the worst-case
ratio of max vs. avg chain length.
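The bookkeeping described above, sketched with a hypothetical stats structure: a longer chain always replaces the record, and a tie keeps the smaller cache size, since the same longest chain over fewer entries means a worse max/avg ratio.

```c
#include <assert.h>

/* Hypothetical stats record. */
struct drc_stats {
    unsigned int longest_chain;           /* most entries walked in one search */
    unsigned int longest_chain_cachesize; /* cache size when that was seen */
};

static void drc_note_search(struct drc_stats *s, unsigned int walked,
                            unsigned int cache_size)
{
    if (walked > s->longest_chain) {
        s->longest_chain = walked;
        s->longest_chain_cachesize = cache_size;
    } else if (walked == s->longest_chain &&
               cache_size < s->longest_chain_cachesize) {
        s->longest_chain_cachesize = cache_size;
    }
}
```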
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>