]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
12 years agonfs: declare nfs_xdev_mount as static
Stanislav Kinsbursky [Tue, 2 Oct 2012 10:19:04 +0000 (14:19 +0400)]
nfs: declare nfs_xdev_mount as static

Sparse warning:
fs/nfs/super.c:2517:15: warning: symbol 'nfs_xdev_mount' was not declared.
Should it be static?

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agonfs: declare nfs_callback_tcp_port in header
Stanislav Kinsbursky [Tue, 2 Oct 2012 10:18:59 +0000 (14:18 +0400)]
nfs: declare nfs_callback_tcp_port in header

Sparse warning:
fs/nfs/super.c:2638:16: sparse: symbol 'nfs_callback_tcpport' was not
declared. Should it be static?

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agonfs: include NFSv4 header in netns.h
Stanislav Kinsbursky [Tue, 2 Oct 2012 10:18:54 +0000 (14:18 +0400)]
nfs: include NFSv4 header in netns.h

Build error:
fs/nfs/netns.h:27:15: error: 'NFS4_MAX_MINOR_VERSION' undeclared here (not in
a function)

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.0 reclaim reboot state when re-establishing clientid
Andy Adamson [Tue, 2 Oct 2012 00:42:32 +0000 (20:42 -0400)]
NFSv4.0 reclaim reboot state when re-establishing clientid

We should reclaim reboot state when the clientid is stale.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: nfs4_match_clientids is only used by NFSv4.1
Trond Myklebust [Mon, 1 Oct 2012 23:37:51 +0000 (16:37 -0700)]
NFSv4: nfs4_match_clientids is only used by NFSv4.1

Fix another compiler warning.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Fix the minor version callback channel startup
Trond Myklebust [Mon, 1 Oct 2012 23:33:18 +0000 (16:33 -0700)]
NFSv4: Fix the minor version callback channel startup

The current spaghetti code confuses some versions of gcc (and just
looks ugly as hell)! Clean up...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Fix up a merge conflict between migration and container changes
Trond Myklebust [Mon, 1 Oct 2012 23:17:31 +0000 (16:17 -0700)]
NFSv4: Fix up a merge conflict between migration and container changes

nfs_callback_tcpport is now per-net_namespace.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoMerge branch 'bugfixes' into nfs-for-next
Trond Myklebust [Mon, 1 Oct 2012 22:42:14 +0000 (15:42 -0700)]
Merge branch 'bugfixes' into nfs-for-next

12 years agonfs: replace strict_strto* with kstrto*
Daniel Walter [Wed, 26 Sep 2012 19:51:46 +0000 (21:51 +0200)]
nfs: replace strict_strto* with kstrto*

[nfs] replace strict_str* with kstr* variants

 * replace string conversions with newer kstr* functions

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Remove unnecessary semicolons (fs/nfs/client.c)
Yanchuan Nian [Mon, 10 Sep 2012 00:40:16 +0000 (08:40 +0800)]
NFS: Remove unnecessary semicolons (fs/nfs/client.c)

There are some unnecessary semicolons in function find_nfs_version. Just remove them.

Signed-off-by: Yanchuan Nian <ycnian@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agopnfsblock: use list_move_tail instead of list_del/list_add_tail
Wei Yongjun [Wed, 5 Sep 2012 06:38:03 +0000 (14:38 +0800)]
pnfsblock: use list_move_tail instead of list_del/list_add_tail

Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agopnfsblock: fix non-aligned DIO write
Peng Tao [Thu, 23 Aug 2012 16:27:53 +0000 (00:27 +0800)]
pnfsblock: fix non-aligned DIO write

For DIO writes, if it is not blocksize aligned, we need to do
internal serialization. It may slow down writers anyway. So we
just bail them out and resend to MDS.

Cc: stable <stable@vger.kernel.org> [since v3.4]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agopnfsblock: fix non-aligned DIO read
Peng Tao [Thu, 23 Aug 2012 16:27:52 +0000 (00:27 +0800)]
pnfsblock: fix non-aligned DIO read

For DIO read, if it is not sector aligned, we should reject it
and resend via MDS. Otherwise there might be data corruption.
Also teach bl_read_pagelist to handle partial page reads for DIO.

Cc: stable <stable@vger.kernel.org> [since v3.4]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agopnfsblock: fix partial page buffer wirte
Peng Tao [Thu, 23 Aug 2012 16:27:51 +0000 (00:27 +0800)]
pnfsblock: fix partial page buffer wirte

If applications use flock to protect its write range, generic NFS
will not do read-modify-write cycle at page cache level. Therefore
LD should know how to handle non-sector aligned writes. Otherwise
there will be data corruption.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoRevert "pnfsblock: bail out partial page IO"
Peng Tao [Thu, 23 Aug 2012 16:27:50 +0000 (00:27 +0800)]
Revert "pnfsblock: bail out partial page IO"

This reverts commit 159e0561e322dd8008fff59e36efff8d2bdd0b0e, in favor
of a more complete fix to the alignment issue.

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS41: fix error of setting blocklayoutdriver
Peng Tao [Thu, 23 Aug 2012 16:27:49 +0000 (00:27 +0800)]
NFS41: fix error of setting blocklayoutdriver

After commit e38eb650 (NFS: set_pnfs_layoutdriver() from
nfs4_proc_fsinfo()), set_pnfs_layoutdriver() is called inside
nfs4_proc_fsinfo(), but pnfs_blksize is not set. It causes setting
blocklayoutdriver failure and pnfsblock mount failure.

Cc: stable <stable@vger.kernel.org> [since v3.5]
Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv41: fix DIO write_io calculation
Peng Tao [Thu, 23 Aug 2012 16:27:48 +0000 (00:27 +0800)]
NFSv41: fix DIO write_io calculation

pnfs_within_mdsthreshold() is called inside pg_init. We need to set
read_io/write_io before that. Otherwise we fail pnfs_within_mdsthreshold()
and IO goes to MDS.
A simple test case:
dd if=foo of=/mnt/pnfs/bar bs=10M count=1 oflag=direct

Signed-off-by: Peng Tao <tao.peng@emc.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Add nfs4_unique_id boot parameter
Chuck Lever [Fri, 14 Sep 2012 21:24:41 +0000 (17:24 -0400)]
NFS: Add nfs4_unique_id boot parameter

An optional boot parameter is introduced to allow client
administrators to specify a string that the Linux NFS client can
insert into its nfs_client_id4 id string, to make it both more
globally unique, and to ensure that it doesn't change even if the
client's nodename changes.

If this boot parameter is not specified, the client's nodename is
used, as before.

Client installation procedures can create a unique string (typically,
a UUID) which remains unchanged during the lifetime of that client
instance.  This works just like creating a UUID for the label of the
system's root and boot volumes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Discover NFSv4 server trunking when mounting
Chuck Lever [Fri, 14 Sep 2012 21:24:32 +0000 (17:24 -0400)]
NFS: Discover NFSv4 server trunking when mounting

"Server trunking" is a fancy named for a multi-homed NFS server.
Trunking might occur if a client sends NFS requests for a single
workload to multiple network interfaces on the same server.  There
are some implications for NFSv4 state management that make it useful
for a client to know if a single NFSv4 server instance is
multi-homed.  (Note this is only a consideration for NFSv4, not for
legacy versions of NFS, which are stateless).

If a client cares about server trunking, no NFSv4 operations can
proceed until that client determines who it is talking to.  Thus
server IP trunking discovery must be done when the client first
encounters an unfamiliar server IP address.

The nfs_get_client() function walks the nfs_client_list and matches
on server IP address.  The outcome of that walk tells us immediately
if we have an unfamiliar server IP address.  It invokes
nfs_init_client() in this case.  Thus, nfs4_init_client() is a good
spot to perform trunking discovery.

Discovery requires a client to establish a fresh client ID, so our
client will now send SETCLIENTID or EXCHANGE_ID as the first NFS
operation after a successful ping, rather than waiting for an
application to perform an operation that requires NFSv4 state.

The exact process for detecting trunking is different for NFSv4.0 and
NFSv4.1, so a minorversion-specific init_client callout method is
introduced.

CLID_INUSE recovery is important for the trunking discovery process.
CLID_INUSE is a sign the server recognizes the client's nfs_client_id4
id string, but the client is using the wrong principal this time for
the SETCLIENTID operation.  The SETCLIENTID must be retried with a
series of different principals until one works, and then the rest of
trunking discovery can proceed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Use the same nfs_client_id4 for every server
Chuck Lever [Fri, 14 Sep 2012 21:24:21 +0000 (17:24 -0400)]
NFS: Use the same nfs_client_id4 for every server

Currently, when identifying itself to NFS servers, the Linux NFS
client uses a unique nfs_client_id4.id string for each server IP
address it talks with.  For example, when client A talks to server X,
the client identifies itself using a string like "AX".  The
requirements for these strings are specified in detail by RFC 3530
(and bis).

This form of client identification presents a problem for Transparent
State Migration.  When client A's state on server X is migrated to
server Y, it continues to be associated with string "AX."  But,
according to the rules of client string construction above, client
A will present string "AY" when communicating with server Y.

Server Y thus has no way to know that client A should be associated
with the state migrated from server X.  "AX" is all but abandoned,
interfering with establishing fresh state for client A on server Y.

To support transparent state migration, then, NFSv4.0 clients must
instead use the same nfs_client_id4.id string to identify themselves
to every NFS server; something like "A".

Now a client identifies itself as "A" to server X.  When a file
system on server X transitions to server Y, and client A identifies
itself as "A" to server Y, Y will know immediately that the state
associated with "A," whether it is native or migrated, is owned by
the client, and can merge both into a single lease.

As a pre-requisite to adding support for NFSv4 migration to the Linux
NFS client, this patch changes the way Linux identifies itself to NFS
servers via the SETCLIENTID (NFSv4 minor version 0) and EXCHANGE_ID
(NFSv4 minor version 1) operations.

In addition to removing the server's IP address from nfs_client_id4,
the Linux NFS client will also no longer use its own source IP address
as part of the nfs_client_id4 string.  On multi-homed clients, the
value of this address depends on the address family and network
routing used to contact the server, thus it can be different for each
server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Introduce "migration" mount option
Chuck Lever [Fri, 14 Sep 2012 21:24:11 +0000 (17:24 -0400)]
NFS: Introduce "migration" mount option

Currently, the Linux client uses a unique nfs_client_id4.id string
when identifying itself to distinct NFS servers.

To support transparent state migration, the Linux client will have to
use the same nfs_client_id4 string for all servers it communicates
with (also known as the "uniform client string" approach).  Otherwise
NFS servers can not recognize that open and lock state need to be
merged after a file system transition.

Unfortunately, there are some NFSv4.0 servers currently in the field
that do not tolerate the uniform client string approach.

Thus, by default, our NFSv4.0 mounts will continue to use the current
approach, and we introduce a mount option that switches them to use
the uniform model.  Client administrators must identify which servers
can be mounted with this option.  Eventually most NFSv4.0 servers will
be able to handle the uniform approach, and we can change the default.

The first mount of a server controls the behavior for all subsequent
mounts for the lifetime of that set of mounts of that server.  After
the last mount of that server is gone, the client erases the data
structure that tracks the lease.  A subsequent lease may then honor
a different "migration" setting.

This patch adds only the infrastructure for parsing the new mount
option.  Support for uniform client strings is added in a subsequent
patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Introduce rpc_clone_client_set_auth()
Chuck Lever [Fri, 14 Sep 2012 21:24:02 +0000 (17:24 -0400)]
SUNRPC: Introduce rpc_clone_client_set_auth()

An ULP is supposed to be able to replace a GSS rpc_auth object with
another GSS rpc_auth object using rpcauth_create().  However,
rpcauth_create() in 3.5 reliably fails with -EEXIST in this case.
This is because when gss_create() attempts to create the upcall pipes,
sometimes they are already there.  For example if a pipe FS mount
event occurs, or a previous GSS flavor was in use for this rpc_clnt.

It turns out that's not the only problem here.  While working on a
fix for the above problem, we noticed that replacing an rpc_clnt's
rpc_auth is not safe, since dereferencing the cl_auth field is not
protected in any way.

So we're deprecating the ability of rpcauth_create() to switch an
rpc_clnt's security flavor during normal operation.  Instead, let's
add a fresh API that clones an rpc_clnt and gives the clone a new
flavor before it's used.

This makes immediate use of the new __rpc_clone_client() helper.

This can be used in a similar fashion to rpcauth_create() when a
client is hunting for the correct security flavor.  Instead of
replacing an rpc_clnt's security flavor in a loop, the ULP replaces
the whole rpc_clnt.

To fix the -EEXIST problem, any ULP logic that relies on replacing
an rpc_clnt's rpc_auth with rpcauth_create() must be changed to use
this API instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Refactor rpc_clone_client()
Chuck Lever [Fri, 14 Sep 2012 21:23:52 +0000 (17:23 -0400)]
SUNRPC: Refactor rpc_clone_client()

rpc_clone_client() does most of the same tasks as rpc_new_client(),
so there is an opportunity for code re-use.  Create a generic helper
that makes it easy to clone an RPC client while replacing any of the
clnt's parameters.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Use __func__ in dprintk() in auth_gss.c
Chuck Lever [Fri, 14 Sep 2012 21:23:43 +0000 (17:23 -0400)]
SUNRPC: Use __func__ in dprintk() in auth_gss.c

Clean up: Some function names have changed, but debugging messages
were never updated.  Automate the construction of the function name
in debugging messages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Clean up dprintk messages in rpc_pipe.c
Chuck Lever [Fri, 14 Sep 2012 21:23:34 +0000 (17:23 -0400)]
SUNRPC: Clean up dprintk messages in rpc_pipe.c

Clean up: The blank space in front of the message must be spaces.
Tabs show up on the console as a graphical character.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Slow down state manager after an unhandled error
Chuck Lever [Fri, 14 Sep 2012 21:23:23 +0000 (17:23 -0400)]
NFS: Slow down state manager after an unhandled error

If the state manager thread is not actually able to fully recover from
some situation, it wakes up waiters, who kick off a new state manager
thread.  Quite often the fresh invocation of the state manager is just
as successful.

This results in a livelock as the client dumps thousands of NFS
requests a second on the network in a vain attempt to recover.  Not
very friendly.

To mitigate this situation, add a delay in the state manager after
an unhandled error, so that the client sends just a few requests
every second in this case.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: nfs_parsed_mount_options can use unsigned int
Chuck Lever [Fri, 14 Sep 2012 21:23:14 +0000 (17:23 -0400)]
NFS: nfs_parsed_mount_options can use unsigned int

fs/nfs/super.c: In function ‘nfs_compare_remount_data’:
fs/nfs/super.c:2042:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2043:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2044:20: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2046:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2047:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2048:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2049:21: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]
fs/nfs/super.c:2050:18: warning: comparison between signed and
    unsigned integer expressions [-Wsign-compare]

Seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agolockd: create and use per-net NSM RPC clients on MON/UNMON requests
Stanislav Kinsbursky [Tue, 18 Sep 2012 09:37:23 +0000 (13:37 +0400)]
lockd: create and use per-net NSM RPC clients on MON/UNMON requests

NSM RPC client can be required on NFSv3 umount, when child reaper is dying
(and destroying it's mount namespace). It means, that current nsproxy is set
to NULL already, but creation of RPC client requires UTS namespace for gaining
hostname string.

This patch creates reference-counted per-net NSM client on first monitor
request and destroys it after last unmonitor request.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agolockd: use rpc client's cl_nodename for id encoding
Stanislav Kinsbursky [Tue, 18 Sep 2012 09:37:18 +0000 (13:37 +0400)]
lockd: use rpc client's cl_nodename for id encoding

Taking hostname from uts namespace if not safe, because this cuold be
performind during umount operation on child reaper death. And in this case
current->nsproxy is NULL already.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agolockd: per-net NSM client creation and destruction helpers introduced
Stanislav Kinsbursky [Tue, 18 Sep 2012 09:37:12 +0000 (13:37 +0400)]
lockd: per-net NSM client creation and destruction helpers introduced

NSM RPC client can be required on NFSv3 umount, when child reaper is dying (and
destroying it's mount namespace). It means, that current nsproxy is set to
NULL already, but creation of RPC client requires UTS namespace for gaining
hostname string.
This patch introduces reference counted NFS RPC clients creation and
destruction helpers (similar to RPCBIND RPC clients).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: add debug messages to callback down function
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:51 +0000 (18:00 +0400)]
NFS: add debug messages to callback down function

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: callback per-net usage counting introduced
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:46 +0000 (18:00 +0400)]
NFS: callback per-net usage counting introduced

This patch also introduces refcount-aware nfs_callback_down_net() wrapper for
svc_shutdown_net().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: make nfs_callback_tcpport6 per network context
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:41 +0000 (18:00 +0400)]
NFS: make nfs_callback_tcpport6 per network context

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: make nfs_callback_tcpport per network context
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:36 +0000 (18:00 +0400)]
NFS: make nfs_callback_tcpport per network context

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: callback up - users counting cleanup
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:31 +0000 (18:00 +0400)]
NFS: callback up - users counting cleanup

Usage coutner now increased only is the service was started sccessfully.
Even if service is running already, then goto is not required anymore, because
service creation and start will be skipped.
With this patch code looks clearer.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: callback service start function introduced
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:26 +0000 (18:00 +0400)]
NFS: callback service start function introduced

This is just a code move, which from my POW makes code looks better.
I.e. now on start we have 3 different stages:
1) Service creation.
2) Service per-net data allocation.
3) Service start.

Patch also renames goto label "out_err:" into "err_start:" to reflect new
changes.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: callback up - transport backchannel cleanup
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:21 +0000 (18:00 +0400)]
NFS: callback up - transport backchannel cleanup

No need to assign transports backchannel server explicitly in
nfs41_callback_up() -  there is nfs_callback_bc_serv() function for this.
By using it, nfs4_callback_up() and nfs41_callback_up() can be called without
transport argument.

Note: service have to be passed to nfs_callback_bc_serv() instead of callback,
since callback link can be uninitialized.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: move per-net callback thread initialization to nfs_callback_up_net()
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:16 +0000 (18:00 +0400)]
NFS: move per-net callback thread initialization to nfs_callback_up_net()

v4:
1) Callback transport creation routine selection by version simlified.

This new function in now called before nfs_minorversion_callback_svc_setup()).

Also few small changes:
1) current network namespace in nfs_callback_up() was replaced by transport net.
2) svc_shutdown_net() was moved prior to callback usage counter decrement
(because in case of per-net data allocation faulure svc_shutdown_net() have to
be skipped).

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: callback service creation function introduced
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:11 +0000 (18:00 +0400)]
NFS: callback service creation function introduced

This function creates service if it's not exist, or increase usage counter of
the existent, and returns pointer to it.
Usage counter will be droppepd by svc_destroy() later in nfs_callback_up().

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: pass net to nfs_callback_down()
Stanislav Kinsbursky [Mon, 20 Aug 2012 14:00:06 +0000 (18:00 +0400)]
NFS: pass net to nfs_callback_down()

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Add ACCESS operation to OPEN compound
Weston Andros Adamson [Mon, 10 Sep 2012 18:00:46 +0000 (14:00 -0400)]
NFSv4: Add ACCESS operation to OPEN compound

The OPEN operation has no way to differentiate an open for read and an
open for execution - both look like read to the server. This allowed
users to read files that didn't have READ access but did have EXEC access,
which is obviously wrong.

This patch adds an ACCESS call to the OPEN compound to handle the
difference between OPENs for reading and execution.  Since we're going
through the trouble of calling ACCESS, we check all possible access bits
and cache the results hopefully avoiding an ACCESS call in the future.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Use kzalloc() instead of kmalloc() in the idmapper
Bryan Schumaker [Thu, 9 Aug 2012 18:05:51 +0000 (14:05 -0400)]
NFS: Use kzalloc() instead of kmalloc() in the idmapper

This will allocate memory that has already been zeroed, allowing us to
remove the memset later on.

Signed-off-by: Bryan Schumaker <bjchuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Remove bad delegations during open recovery
Bryan Schumaker [Wed, 26 Sep 2012 19:25:52 +0000 (15:25 -0400)]
NFS: Remove bad delegations during open recovery

I put the client into an open recovery loop by:
Client: Open file
read half
Server: Expire client (echo 0 > /sys/kernel/debug/nfsd/forget_clients)
Client: Drop vm cache (echo 3 > /proc/sys/vm/drop_caches)
finish reading file

This causes a loop because the client never updates the nfs4_state after
discovering that the delegation is invalid.  This means it will keep
trying to read using the bad delegation rather than attempting to re-open
the file.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
CC: stable@vger.kernel.org [3.4+]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Always use the open stateid when checking for expired opens
Bryan Schumaker [Wed, 26 Sep 2012 19:25:53 +0000 (15:25 -0400)]
NFS: Always use the open stateid when checking for expired opens

If we are reading through a delegation, and the delegation is OK then
state->stateid will still point to a delegation stateid and not an open
stateid.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Limit the rpciod workqueue concurrency
Trond Myklebust [Sat, 29 Sep 2012 00:24:16 +0000 (20:24 -0400)]
SUNRPC: Limit the rpciod workqueue concurrency

We shouldn't need more than 1 worker thread per cpu, since rpciod
is designed to run without sleeping in most cases.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: nfs4_proc_layoutreturn must always drop the plh_block_lgets count
Trond Myklebust [Mon, 24 Sep 2012 18:18:39 +0000 (14:18 -0400)]
NFSv4.1: nfs4_proc_layoutreturn must always drop the plh_block_lgets count

Currently it does not do so if the RPC call failed to start. Fix is to
move the decrement of plh_block_lgets into nfs4_layoutreturn_release.

Also remove a redundant test of task->tk_status in nfs4_layoutreturn_done:
if lrp->res.lrs_present is set, then obviously the RPC call succeeded.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failure
Trond Myklebust [Mon, 24 Sep 2012 17:49:27 +0000 (13:49 -0400)]
NFSv4.1: _pnfs_return_layout() shouldn't invalidate the layout on failure

Failure of the layoutreturn allocation fails is not a good reason to
mark the pnfs_layout_hdr as having failed a layoutget or i/o. Just
exit cleanly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Remove the NFS_LAYOUT_RETURNED state
Trond Myklebust [Fri, 21 Sep 2012 20:37:02 +0000 (16:37 -0400)]
NFSv4.1: Remove the NFS_LAYOUT_RETURNED state

It serves no purpose that the test for whether or not we have valid
layout segments doesn't already serve.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freed
Trond Myklebust [Fri, 21 Sep 2012 19:49:42 +0000 (15:49 -0400)]
NFSv4.1: Clear NFS_LAYOUT_BULK_RECALL when the layout segments are freed

Once all the affected layout segments have been freed up, clear the
NFS_LAYOUT_BULK_RECALL flag so that we can reuse the pnfs_layout_hdr

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED state
Trond Myklebust [Fri, 21 Sep 2012 18:48:04 +0000 (14:48 -0400)]
NFSv4.1: Get rid of the NFS_LAYOUT_DESTROYED state

We already have a mechanism for blocking LAYOUTGET by means of the
plh_block_lgets counter. The only "service" that NFS_LAYOUT_DESTROYED
provides at this point is to block layoutget once the layout segment
list is empty, which basically means that you have to wait until
the pnfs_layout_hdr is destroyed before you can do pNFS on that file
again.

This patch enables the reuse of the pnfs_layout_hdr if the layout
segment list is empty.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr()
Trond Myklebust [Fri, 21 Sep 2012 00:37:23 +0000 (20:37 -0400)]
NFSv4.1: Remove unused 'default allocation' for pnfs_alloc_layout_hdr()

...and ditto for pnfs_free_layout_hdr()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Get rid of pNFS spin lock debugging asserts...
Trond Myklebust [Fri, 21 Sep 2012 00:01:56 +0000 (20:01 -0400)]
NFSv4.1: Get rid of pNFS spin lock debugging asserts...

These are all in static declared functions that are called only once.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lseg
Trond Myklebust [Fri, 21 Sep 2012 00:57:11 +0000 (20:57 -0400)]
NFSv4.1: Balance pnfs_layout_hdr refcount in pnfs_layout_(insert|remove)_lseg

Ensure that the reference count for pnfs_layout_hdr reverts to the
original value after a call to pnfs_layout_remove_lseg().

Note that the caller is expected to hold a reference to the struct
pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Clean up pnfs_put_lseg()
Trond Myklebust [Fri, 21 Sep 2012 00:46:49 +0000 (20:46 -0400)]
NFSv4.1: Clean up pnfs_put_lseg()

There is no longer a need to use pnfs_free_lseg_list(). Just call
pnfs_free_lseg() directly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Clean up the removal of pnfs_layout_hdr from the server list
Trond Myklebust [Thu, 20 Sep 2012 21:31:43 +0000 (17:31 -0400)]
NFSv4.1: Clean up the removal of pnfs_layout_hdr from the server list

Move the code into pnfs_free_layout_hdr(), and add checks to
get_layout_by_fh_locked to ensure that they don't reference a layout
that is being freed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lock
Trond Myklebust [Thu, 20 Sep 2012 21:23:11 +0000 (17:23 -0400)]
NFSv4.1: Free the pnfs_layout_hdr outside the inode->i_lock

None of the existing pNFS layout drivers seem to require the inode
to be locked while they free the layout header.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Remove redundant reference to the pnfs_layout_hdr
Trond Myklebust [Thu, 20 Sep 2012 21:02:32 +0000 (17:02 -0400)]
NFSv4.1: Remove redundant reference to the pnfs_layout_hdr

Each layout segment already holds a reference to the pnfs_layout_hdr,
so there is no need to hold an extra reference that is released once
the last layout segment is freed.

Ensure that pnfs_find_alloc_layout() always returns a reference
to the pnfs_layout_hdr, which will be matched by the final call to
pnfs_put_layout_hdr() in pnfs_update_layout().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lseg
Trond Myklebust [Thu, 20 Sep 2012 20:33:30 +0000 (16:33 -0400)]
NFSv4.1: Rename the pnfs_put_lseg_common to pnfs_layout_remove_lseg

The latter name is more descriptive of the actual function.
Also rename pnfs_insert_layout to pnfs_layout_insert_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: reset the inode MDS threshold counters on layout destruction
Trond Myklebust [Thu, 20 Sep 2012 19:52:13 +0000 (15:52 -0400)]
NFSv4.1: reset the inode MDS threshold counters on layout destruction

Instead of resetting the inode MDS threshold counters when we mark
the layout for destruction, do it as part of freeing the layout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Get rid of pNFS layout state "NFS_LAYOUT_INVALID"
Trond Myklebust [Thu, 20 Sep 2012 19:07:45 +0000 (15:07 -0400)]
NFSv4.1: Get rid of pNFS layout state "NFS_LAYOUT_INVALID"

In all cases where we set NFS_LAYOUT_INVALID, we also set NFS_LAYOUT_DESTROYED.
Furthermore, in all cases where we test for NFS_LAYOUT_INVALID, we should
also be testing for NFS_LAYOUT_DESTROYED, since the latter means that
we hold no valid layout segments.
Ergo the two are redundant.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Simplify the pNFS return-on-close code
Trond Myklebust [Fri, 21 Sep 2012 00:31:51 +0000 (20:31 -0400)]
NFSv4.1: Simplify the pNFS return-on-close code

Confine it to the nfs4_do_close() code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Fix a race in the pNFS return-on-close code
Trond Myklebust [Fri, 21 Sep 2012 00:15:57 +0000 (20:15 -0400)]
NFSv4.1: Fix a race in the pNFS return-on-close code

If we sleep after dropping the inode->i_lock, then we are no longer
atomic with respect to the rpc_wake_up() call in pnfs_layout_remove_lseg().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegs
Trond Myklebust [Fri, 21 Sep 2012 01:19:43 +0000 (21:19 -0400)]
NFSv4.1: pnfs_layout_io_set_failed must clear invalid lsegs

If pnfs_layout_io_test_failed() authorises a retry of the failed layoutgets,
we should clear the existing layout segments so that we start afresh. Do
this in pnfs_layout_io_set_failed().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failure
Trond Myklebust [Mon, 24 Sep 2012 17:07:16 +0000 (13:07 -0400)]
NFSv4.1: Don't drop the pnfs_layout_hdr after a layoutget failure

We want to cache the pnfs_layout_hdr after a layoutget or i/o
failure so that pnfs_update_layout() can find it and know when
it is time to retry.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Fix a reference leak in pnfs_update_layout
Trond Myklebust [Fri, 21 Sep 2012 01:25:19 +0000 (21:25 -0400)]
NFSv4.1: Fix a reference leak in pnfs_update_layout

If we exit after the call to pnfs_find_alloc_layout(), we have to ensure
that we put the struct pnfs_layout_hdr.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: pNFS data servers may be temporarily offline
Trond Myklebust [Tue, 18 Sep 2012 23:51:12 +0000 (19:51 -0400)]
NFSv4.1: pNFS data servers may be temporarily offline

In cases where the pNFS data server is just temporarily out of service,
we want to mark it as such, and then try again later. Typically that will
be in cases of network connection errors etc.
This patch allows us to mark the devices as being "unavailable" for such
transient errors, and will make them available for retries after a
2 minute timeout period.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Retry pNFS after a 2 minute timeout
Trond Myklebust [Tue, 18 Sep 2012 21:01:12 +0000 (17:01 -0400)]
NFSv4.1: Retry pNFS after a 2 minute timeout

If we had to fall back to read/write through MDS, then assume that we should
retry pNFS after a suitable timeout period.
The following patch sets a timeout of 2 minutes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Add helpers for setting/reading the I/O fail bit
Trond Myklebust [Tue, 18 Sep 2012 20:41:18 +0000 (16:41 -0400)]
NFSv4.1: Add helpers for setting/reading the I/O fail bit

...and make them local to the pnfs.c file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggy
Trond Myklebust [Wed, 26 Sep 2012 15:21:40 +0000 (11:21 -0400)]
NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggy

Dereferencing nfsi->layout in order to read plh_flags without holding
a spin lock is bug prone. Furthermore, the dprintk() tells you nothing
about whether or not the call succeeded.
Replace it with something that tells you about whether or not a valid
layout segment was returned for the inode in question.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Replace get_device_info() with filelayout_get_device_info()
Trond Myklebust [Wed, 19 Sep 2012 01:02:29 +0000 (21:02 -0400)]
NFSv4.1: Replace get_device_info() with filelayout_get_device_info()

Fix the namespace pollution issue.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()
Trond Myklebust [Wed, 19 Sep 2012 00:57:08 +0000 (20:57 -0400)]
NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()
Trond Myklebust [Wed, 19 Sep 2012 00:51:13 +0000 (20:51 -0400)]
NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalid
Trond Myklebust [Wed, 19 Sep 2012 00:43:31 +0000 (20:43 -0400)]
NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalid

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Clean up the pNFS layoutget interface
Trond Myklebust [Mon, 17 Sep 2012 21:12:15 +0000 (17:12 -0400)]
NFS: Clean up the pNFS layoutget interface

Ensure that we do return errors from nfs4_proc_layoutget() and that we
don't mark the layout as having failed if the error was due to a
signal or resource problem on the client side.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Get rid of the redundant xprt->shutdown bit field
Trond Myklebust [Tue, 11 Sep 2012 21:21:25 +0000 (17:21 -0400)]
SUNRPC: Get rid of the redundant xprt->shutdown bit field

It is only set after everyone has dereferenced the transport,
and serves no useful purpose: setting it is racy, so all the
socket code, etc still needs to be able to cope with the cases
where they miss reading it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Write the entire file if a server reboot occurs during fsync()
Trond Myklebust [Tue, 11 Sep 2012 20:19:38 +0000 (16:19 -0400)]
NFS: Write the entire file if a server reboot occurs during fsync()

This is to ensure that we don't clear the NFS_CONTEXT_RESEND_WRITES
flag while there are still writes that haven't been resent.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Fix fdatasync/fsync() when confronted with a server reboot
Trond Myklebust [Tue, 11 Sep 2012 20:01:22 +0000 (16:01 -0400)]
NFS: Fix fdatasync/fsync() when confronted with a server reboot

If the server reboots before it can commit the unstable writes to disk,
then nfs_commit_release_pages() will detect this when it compares the
verifier returned by COMMIT to the one returned by WRITE. When this
happens, the client needs to resend those writes in order to guarantee
that they make it to stable storage.

This patch adds a signalling mechanism to notify fsync() that it
needs to retry all writes before it can exit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Convert the nfs4_lock_state->ls_flags to a bit field
Trond Myklebust [Mon, 10 Sep 2012 17:26:49 +0000 (13:26 -0400)]
NFSv4: Convert the nfs4_lock_state->ls_flags to a bit field

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Clean up helper function nfs4_select_rw_stateid()
Trond Myklebust [Mon, 13 Aug 2012 22:54:45 +0000 (18:54 -0400)]
NFS: Clean up helper function nfs4_select_rw_stateid()

We want to be able to pass on the information that the page was not
dirtied under a lock. Instead of adding a flag parameter, do this
by passing a pointer to a 'struct nfs_lock_owner' that may be NULL.

Also reuse this structure in struct nfs_lock_context to carry the
fl_owner_t and pid_t.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Convert nfs_get_lock_context to return an ERR_PTR on failure
Trond Myklebust [Mon, 13 Aug 2012 21:15:50 +0000 (17:15 -0400)]
NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure

We want to be able to distinguish between allocation failures, and
the case where the lock context is not needed (because there are no
locks).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Optimise away unnecessary data moves in xdr_align_pages
Trond Myklebust [Thu, 2 Aug 2012 17:21:43 +0000 (13:21 -0400)]
SUNRPC: Optimise away unnecessary data moves in xdr_align_pages

We only have to call xdr_shrink_pagelen() if the remaining RPC
message does not fit in the page buffer length that we supplied
to xdr_align_pages().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data
Trond Myklebust [Thu, 27 Sep 2012 20:15:00 +0000 (16:15 -0400)]
NFSv4: Ensure that idmap_pipe_downcall sanity-checks the downcall data

Use the idmapper upcall data to verify that the legacy idmapper daemon
is indeed responding to an upcall that we sent.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
12 years agoNFSv4: Clean up the legacy idmapper upcall
Trond Myklebust [Thu, 27 Sep 2012 19:44:19 +0000 (15:44 -0400)]
NFSv4: Clean up the legacy idmapper upcall

Replace the BUG_ON(idmap->idmap_key_cons != NULL) with a
WARN_ON_ONCE(). Then get rid of the ACCESS_ONCE(idmap->idmap_key_cons).

Then add helper functions for starting, finishing and aborting the
legacy upcall.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
12 years agoNFSv4: Remove BUG_ON() and ACCESS_ONCE() calls in the idmapper
Trond Myklebust [Fri, 28 Sep 2012 16:03:09 +0000 (12:03 -0400)]
NFSv4: Remove BUG_ON() and ACCESS_ONCE() calls in the idmapper

The use of ACCESS_ONCE() is wrong, since the various routines that set/clear
idmap->idmap_key_cons should be strictly ordered w.r.t. each other, and
the idmap->idmap_mutex ensures that only one thread at a time may be in
an upcall situation.

Also replace the BUG_ON()s with WARN_ON_ONCE() where appropriate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value
Trond Myklebust [Wed, 1 Aug 2012 18:21:12 +0000 (14:21 -0400)]
NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return value

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Fix the return value of xdr_align_pages()
Trond Myklebust [Wed, 1 Aug 2012 18:32:13 +0000 (14:32 -0400)]
SUNRPC: Fix the return value of xdr_align_pages()

The callers of xdr_align_pages() expect it to return the number of bytes
of actual XDR data remaining in the pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS4: avoid underflow when converting error to pointer.
NeilBrown [Mon, 17 Sep 2012 06:46:34 +0000 (16:46 +1000)]
NFS4: avoid underflow when converting error to pointer.

In nfs4_create_sec_client, 'flavor' can hold a negative error
code (returned from nfs4_negotiate_security), even though it
is an 'enum' and hence unsigned.

The code is careful to cast it to an (int) before testing if it
is negative, however it doesn't cast to an (int) before calling
ERR_PTR.

On a machine where "void*" is larger than "int", this results in
the unsigned equivalent of -1 (e.g. 0xffffffff) being converted
to a pointer.  Subsequent code determines that this is not
negative, and so  dereferences it with predictable results.

So: cast 'flavor' to a (signed) int before passing to ERR_PTR.

cc: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: fix the return value check by using IS_ERR
Wei Yongjun [Fri, 21 Sep 2012 04:27:41 +0000 (12:27 +0800)]
NFS: fix the return value check by using IS_ERR

In case of error, the function rpcauth_create() returns ERR_PTR()
and never returns NULL pointer. The NULL test in the return value
check should be replaced with IS_ERR().

dpatch engine is used to auto generated this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Set alloc_slot for backchannel tcp ops
Bryan Schumaker [Mon, 24 Sep 2012 17:39:01 +0000 (13:39 -0400)]
SUNRPC: Set alloc_slot for backchannel tcp ops

f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport
regression) introduced the "alloc_slot" function for xprt operations,
but never created one for the backchannel operations.  This patch fixes
a null pointer dereference when mounting NFS over v4.1.

Call Trace:
 [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc]
 [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc]
 [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc]
 [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc]
 [<ffffffff81073589>] process_one_work+0x139/0x500
 [<ffffffff81070e70>] ? alloc_worker+0x70/0x70
 [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc]
 [<ffffffff81073d1e>] worker_thread+0x15e/0x460
 [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70
 [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230
 [<ffffffff81079603>] kthread+0x93/0xa0
 [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10
 [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff81465d00>] ? gs_change+0x13/0x13

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT
Trond Myklebust [Wed, 12 Sep 2012 20:49:15 +0000 (16:49 -0400)]
SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAIT

Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@vger.kernel.org
12 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Wed, 19 Sep 2012 18:04:34 +0000 (11:04 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A small collection of driver fixes/updates and a core fix for 3.6.  It
  contains:

   - Bug fixes for mtip32xx, and support for new hardware (just addition
     of IDs).  They have been queued up for 3.7 for a few weeks as well.

   - rate-limit a failing command error message in block core.

   - A fix for an old cciss bug from Stephen.

   - Prevent overflow of partition count from Alan."

* 'for-linus' of git://git.kernel.dk/linux-block:
  cciss: fix handling of protocol error
  blk: add an upper sanity check on partition adding
  mtip32xx: fix user_buffer check in exec_drive_command
  mtip32xx: Remove dead code
  mtip32xx: Change printk to pr_xxxx
  mtip32xx: Proper reporting of write protect status on big-endian
  mtip32xx: Increase timeout for standby command
  mtip32xx: Handle NCQ commands during the security locked state
  mtip32xx: Add support for new devices
  block: rate-limit the error message from failing commands

12 years agoMerge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Linus Torvalds [Wed, 19 Sep 2012 18:03:55 +0000 (11:03 -0700)]
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.
  sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path
  sh: intc: Fix up multi-evt irq association.

12 years agoMerge tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg
Linus Torvalds [Wed, 19 Sep 2012 18:03:13 +0000 (11:03 -0700)]
Merge tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg

Pull rpmsg fix from Ohad Ben-Cohen:
 "A quick rpmsg fix from Fernando, fixing two buggy invocations of
  dma_free_coherent"

* tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
  rpmsg: fix dma_free_coherent dev parameter

12 years agoMerge tag 'md-3.6-fixes' of git://neil.brown.name/md
Linus Torvalds [Wed, 19 Sep 2012 18:01:38 +0000 (11:01 -0700)]
Merge tag 'md-3.6-fixes' of git://neil.brown.name/md

Pull md fixes from NeilBrown:
 "3 fixes for md in 3.6.

  One reverts a recent patch which turns out to not be such a good idea.

  Other two fix minor bugs with the new (since 3.3) 'replacement' code
  and have been tagged for -stable."

* tag 'md-3.6-fixes' of git://neil.brown.name/md:
  md: make sure metadata is updated when spares are activated or removed.
  md/raid5: fix calculate of 'degraded' when a replacement becomes active.
  Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."

12 years agoMerge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Wed, 19 Sep 2012 18:00:07 +0000 (11:00 -0700)]
Merge branch 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue / powernow-k8 fix from Tejun Heo:
 "This is the fix for the bug where cpufreq/powernow-k8 was tripping
  BUG_ON() in try_to_wake_up_local() by migrating workqueue worker to a
  different CPU.

    https://bugzilla.kernel.org/show_bug.cgi?id=47301

  As discussed, the fix is now two parts - one to reimplement
  work_on_cpu() so that it doesn't create a new kthread each time and
  the actual fix which makes powernow-k8 use work_on_cpu() instead of
  performing manual migration.

  While pretty late in the merge cycle, both changes are on the safer
  side.  Jiri and I verified two existing users of work_on_cpu() and
  Duncan confirmed that the powernow-k8 fix survived about 18 hours of
  testing."

* 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
  workqueue: reimplement work_on_cpu() using system_wq

12 years agocpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU
Tejun Heo [Tue, 18 Sep 2012 21:24:59 +0000 (14:24 -0700)]
cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU

powernowk8_target() runs off a per-cpu work item and if the
cpufreq_policy->cpu is different from the current one, it migrates the
kworker to the target CPU by manipulating current->cpus_allowed.  The
function migrates the kworker back to the original CPU but this is
still broken.  Workqueue concurrency management requires the kworkers
to stay on the same CPU and powernowk8_target() ends up triggerring
BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on
fidvid_mutex and sleeps.

It is unclear why this bug is being reported now.  Duncan says it
appeared to be a regression of 3.6-rc1 and couldn't reproduce it on
3.5.  Bisection seemed to point to 63d95a91 "workqueue: use @pool
instead of @gcwq or @cpu where applicable" which is an non-functional
change.  Given that the reproduce case sometimes took upto days to
trigger, it's easy to be misled while bisecting.  Maybe something made
contention on fidvid_mutex more likely?  I don't know.

This patch fixes the bug by using work_on_cpu() instead if @pol->cpu
isn't the same as the current one.  The code assumes that
cpufreq_policy->cpu is kept online by the caller, which Rafael tells
me is the case.

stable: ed48ece27c ("workqueue: reimplement work_on_cpu() using
        system_wq") should be applied before this; otherwise, the
        behavior could be horrible.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Duncan <1i5t5.duncan@cox.net>
Tested-by: Duncan <1i5t5.duncan@cox.net>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: stable@vger.kernel.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301

12 years agoworkqueue: reimplement work_on_cpu() using system_wq
Tejun Heo [Tue, 18 Sep 2012 19:48:43 +0000 (12:48 -0700)]
workqueue: reimplement work_on_cpu() using system_wq

The existing work_on_cpu() implementation is hugely inefficient.  It
creates a new kthread, execute that single function and then let the
kthread die on each invocation.

Now that system_wq can handle concurrent executions, there's no
advantage of doing this.  Reimplement work_on_cpu() using system_wq
which makes it simpler and way more efficient.

stable: While this isn't a fix in itself, it's needed to fix a
        workqueue related bug in cpufreq/powernow-k8.  AFAICS, this
        shouldn't break other existing users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@vger.kernel.org
12 years agomd: make sure metadata is updated when spares are activated or removed.
NeilBrown [Wed, 19 Sep 2012 02:54:22 +0000 (12:54 +1000)]
md: make sure metadata is updated when spares are activated or removed.

It isn't always necessary to update the metadata when spares are
removed as the presence-or-not of a spare isn't really important to
the integrity of an array.
Also activating a spare doesn't always require updating the metadata
as the update on 'recovery-completed' is usually sufficient.

However the introduction of 'replacement' devices have made these
transitions sometimes more important.  For example the 'Replacement'
flag isn't cleared until the original device is removed, so we need
to ensure a metadata update after that 'spare' is removed.

So set MD_CHANGE_DEVS whenever a spare is activated or removed, to
complement the current situation where it is set when a spare is added
or a device is failed (or a number of other less common situations).

This is suitable for -stable as out-of-data metadata could lead
to data corruption.
This is only relevant for 3.3 and later 9when 'replacement' as
introduced.

Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agomd/raid5: fix calculate of 'degraded' when a replacement becomes active.
NeilBrown [Wed, 19 Sep 2012 02:52:30 +0000 (12:52 +1000)]
md/raid5: fix calculate of 'degraded' when a replacement becomes active.

When a replacement device becomes active, we mark the device that it
replaces as 'faulty' so that it can subsequently get removed.
However 'calc_degraded' only pays attention to the primary device, not
the replacement, so the array appears to become degraded, which is
wrong.

So teach 'calc_degraded' to consider any replacement if a primary
device is faulty.

This is suitable for -stable as an incorrect 'degraded' value can
confuse md and could lead to data corruption.
This is only relevant for 3.3 and later.

Cc: stable@vger.kernel.org
Reported-by: Robin Hill <robin@robinhill.me.uk>
Reported-by: John Drescher <drescherjm@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agoRevert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."
NeilBrown [Wed, 19 Sep 2012 02:48:30 +0000 (12:48 +1000)]
Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."

This reverts commit 895e3c5c58a80bb9e4e05d9ac38b4f30e0f97d80.

While this patch seemed like a good idea and did help some workloads,
it hurts other workloads.
Large sequential O_DIRECT writes were faster,
Small random O_DIRECT writes were slower.

Other changes (batching RAID5 writes) have improved the sequential
writes using a different mechanism, so the net result of this patch
is definitely negative.  So revert it.

Reported-by: Shaohua Li <shli@kernel.org>
Tested-by: Jianpeng Ma <majianpeng@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>