]> git.karo-electronics.de Git - mv-sheeva.git/log
mv-sheeva.git
13 years agopnfs: CB_LAYOUTRECALL xdr code
Fred Isaman [Thu, 6 Jan 2011 11:36:29 +0000 (11:36 +0000)]
pnfs: CB_LAYOUTRECALL xdr code

This is the xdr decoding for CB_LAYOUTRECALL.

Signed-off-by: Alexandros Batsakis <batsakis@netapp.com>
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: change lo refcounting to atomic_t
Fred Isaman [Thu, 6 Jan 2011 11:36:28 +0000 (11:36 +0000)]
pnfs: change lo refcounting to atomic_t

This will be required to allow us to grab reference outside of i_lock.
While we are at it, make put_layout_hdr take the same argument as all the
related functions.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: check that partial LAYOUTGET return is ignored
Fred Isaman [Thu, 6 Jan 2011 11:36:27 +0000 (11:36 +0000)]
pnfs: check that partial LAYOUTGET return is ignored

Either a bad server reply, or our ignoring of multiple array segments in
a reply, can cause a reply to not meet our requirements.  Ensure
that we ignore such replies.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: add layout to client list before sending rpc
Fred Isaman [Thu, 6 Jan 2011 11:36:26 +0000 (11:36 +0000)]
pnfs: add layout to client list before sending rpc

Since this list will be used to search for layouts to recall,
this is necessary to avoid a race where the recall comes in,
sees there is nothing in the client list, and prepares to return
NOMATCHING, while the LAYOUTGET gets processed before the recall
updates the stateid.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: serialize LAYOUTGET(openstateid)
Fred Isaman [Thu, 6 Jan 2011 11:36:25 +0000 (11:36 +0000)]
pnfs: serialize LAYOUTGET(openstateid)

We shouldn't send a LAYOUTGET(openstateid) unless all outstanding RPCs
using the previous stateid are completed.  This requires choosing the
stateid to encode earlier, so we can abort if one is not available (we
want to use the open stateid, but a LAYOUTGET is already out using
it), and adding a count of the number of outstanding rpc calls using
layout state (which for now consist solely of LAYOUTGETs).

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: layoutget rpc code cleanup
Fred Isaman [Thu, 6 Jan 2011 11:36:24 +0000 (11:36 +0000)]
pnfs: layoutget rpc code cleanup

No functional changes, just some code minor code rearrangement and
comments.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: change how lsegs are removed from layout list
Fred Isaman [Thu, 6 Jan 2011 11:36:23 +0000 (11:36 +0000)]
pnfs: change how lsegs are removed from layout list

This is to prepare the way for sensible io draining.  Instead of just
removing the lseg from the list, we instead clear the VALID flag
(preventing new io from grabbing references to the lseg) and remove
the reference holding it in the list.  Thus the lseg will be removed
once any io in progress completes and any references still held are
dropped.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: change layout state seqlock to a spinlock
Fred Isaman [Thu, 6 Jan 2011 11:36:22 +0000 (11:36 +0000)]
pnfs: change layout state seqlock to a spinlock

This prepares for future changes, where the layout state needs
to change atomically with several other variables.  In particular,
it will need to know if lo->segs is empty, as we test that instead
of manipulating the NFS_LAYOUT_STATEID_SET bit.  Moreover, the
layoutstateid is not really a read-mostly structure, as it is
written almost as often as it is read.

The behavior of pnfs_get_layout_stateid is also slightly changed, so that
it no longer changes the stateid.  Its name is changed to +pnfs_choose_layoutget_stateid.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: add prefix to struct pnfs_layout_hdr fields
Fred Isaman [Thu, 6 Jan 2011 11:36:21 +0000 (11:36 +0000)]
pnfs: add prefix to struct pnfs_layout_hdr fields

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: add prefix to struct pnfs_layout_segment fields
Fred Isaman [Thu, 6 Jan 2011 11:36:20 +0000 (11:36 +0000)]
pnfs: add prefix to struct pnfs_layout_segment fields

While we are renaming all the fields, change lo->state to lo->plh_flags.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: remove unnecessary field lgp->status
Fred Isaman [Thu, 6 Jan 2011 11:36:19 +0000 (11:36 +0000)]
pnfs: remove unnecessary field lgp->status

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agopnfs: fix incorrect comment in destroy_lseg
Fred Isaman [Thu, 6 Jan 2011 11:36:18 +0000 (11:36 +0000)]
pnfs: fix incorrect comment in destroy_lseg

Comment references get_layout_hdr_locked, which never existed in
submitted code.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS rename client back channel transport field
Andy Adamson [Thu, 6 Jan 2011 02:04:35 +0000 (02:04 +0000)]
NFS rename client back channel transport field

Differentiate from server backchannel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS add session back channel draining
Andy Adamson [Thu, 6 Jan 2011 02:04:34 +0000 (02:04 +0000)]
NFS add session back channel draining

Currently session draining only drains the fore channel.
The back channel processing must also be drained.

Use the back channel highest_slot_used to indicate that a callback is being
processed by the callback thread.  Move the session complete to be per channel.

When the session is draininig, wait for any current back channel processing
to complete and stop all new back channel processing by returning NFS4ERR_DELAY
to the back channel client.

Drain the back channel, then the fore channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS RPC_AUTH_GSS unsupported on v4.1 back channel
Andy Adamson [Thu, 6 Jan 2011 02:04:33 +0000 (02:04 +0000)]
NFS RPC_AUTH_GSS unsupported on v4.1 back channel

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS refactor nfs_find_client and reference client across callback processing
Andy Adamson [Thu, 6 Jan 2011 02:04:32 +0000 (02:04 +0000)]
NFS refactor nfs_find_client and reference client across callback processing

Fixes a bug where the nfs_client could be freed during callback processing.
Refactor nfs_find_client to use minorversion specific means to locate the
correct nfs_client structure.

In the NFS layer, V4.0 clients are found using the callback_ident field in the
CB_COMPOUND header.  V4.1 clients are found using the sessionID in the
CB_SEQUENCE operation which is also compared against the sessionID associated
with the back channel thread after a successful CREATE_SESSION.

Each of these methods finds the one an only nfs_client associated
with the incoming callback request - so nfs_find_client_next is not needed.

In the RPC layer, the pg_authenticate call needs to find the nfs_client. For
the v4.0 callback service, the callback identifier has not been decoded so a
search by address, version, and minorversion is used.  The sessionid for the
sessions based callback service has (usually) not been set for the
pg_authenticate on a CB_NULL call which can be sent prior to the return
of a CREATE_SESSION call, so the sessionid associated with the back channel
thread is not used to find the client in pg_authenticate for CB_NULL calls.

Pass the referenced nfs_client to each CB_COMPOUND operation being proceesed
via the new cb_process_state structure. The reference is held across
cb_compound processing.

Use the new cb_process_state struct to move the NFS4ERR_RETRY_UNCACHED_REP
processing from process_op into nfs4_callback_sequence where it belongs.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS associate sessionid with callback connection
Andy Adamson [Thu, 6 Jan 2011 02:04:31 +0000 (02:04 +0000)]
NFS associate sessionid with callback connection

The sessions based callback service is started prior to the CREATE_SESSION call
so that it can handle CB_NULL requests which can be sent before the
CREATE_SESSION call returns and the session ID is known.

Set the callback sessionid after a sucessful CREATE_SESSION.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS implement v4.0 callback_ident
Andy Adamson [Thu, 6 Jan 2011 02:04:30 +0000 (02:04 +0000)]
NFS implement v4.0 callback_ident

Use the small id to pointer translator service to provide a unique callback
identifier per SETCLIENTID call used to identify the v4.0 callback service
associated with the clientid.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS do not clear minor version at nfs_client free
Andy Adamson [Thu, 6 Jan 2011 02:04:29 +0000 (02:04 +0000)]
NFS do not clear minor version at nfs_client free

Resetting the client minor version operations causes nfs4_destroy_callback
to fail to shutdown the NFSv4.1 callback service.

There is no reason to reset the client minorversion operations when the
nfs_client struct is being freed.

Remove the minorverion reset and rename the function.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS use svc_create_xprt for NFSv4.1 callback service
Andy Adamson [Thu, 6 Jan 2011 02:04:28 +0000 (02:04 +0000)]
NFS use svc_create_xprt for NFSv4.1 callback service

The new back channel transport means we call the normal creation routine as
well as svc_xprt_put.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC register and unregister the back channel transport
Andy Adamson [Thu, 6 Jan 2011 02:04:27 +0000 (02:04 +0000)]
SUNRPC register and unregister the back channel transport

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC new transport for the NFSv4.1 shared back channel
Andy Adamson [Thu, 6 Jan 2011 02:04:26 +0000 (02:04 +0000)]
SUNRPC new transport for the NFSv4.1 shared back channel

Move the current sock create and destroy routines into the new transport ops.
Back channel socket will be destroyed by the svc_closs_all call in svc_destroy.

Added check: only TCP supported on shared back channel.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC fix bc_send print
Andy Adamson [Thu, 6 Jan 2011 02:04:25 +0000 (02:04 +0000)]
SUNRPC fix bc_send print

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC move svc_drop to caller of svc_process_common
Andy Adamson [Thu, 6 Jan 2011 02:04:24 +0000 (02:04 +0000)]
SUNRPC move svc_drop to caller of svc_process_common

The NFSv4.1 shared back channel does not need to call svc_drop because the
callback service never outlives the single connection it services, and it
reuses it's buffers and keeps the trasport.

Signed-off-by: Andy Adamson <andros@netapp.com>
Acked-by: Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonfsv4: Switch to generic xattr handling code
Aneesh Kumar K.V [Thu, 9 Dec 2010 11:35:25 +0000 (11:35 +0000)]
nfsv4: Switch to generic xattr handling code

This patch make nfsv4 use the generic xattr handling code
to get the nfsv4 acl. This will help us to add richacl
support to nfsv4 in later patches

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonfs: Set MS_POSIXACL always
Aneesh Kumar K.V [Thu, 9 Dec 2010 11:35:14 +0000 (11:35 +0000)]
nfs: Set MS_POSIXACL always

We want to skip VFS applying mode for NFS. So set MS_POSIXACL always
and selectively use umask. Ideally we would want to use umask only
when we don't have inheritable ACEs set. But NFS currently don't
allow to send umask to the server. So this is best what we can do
and this is consistent with NFSv3

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: use ERR_CAST()
Namhyung Kim [Tue, 28 Dec 2010 17:02:46 +0000 (17:02 +0000)]
NFS: use ERR_CAST()

Use ERR_CAST() intead of wierd-looking cast.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonfs: fix mispelling of idmap CONFIG symbol
J. Bruce Fields [Tue, 21 Dec 2010 23:49:34 +0000 (23:49 +0000)]
nfs: fix mispelling of idmap CONFIG symbol

Trivial, but confusing when you're trying to grep through this
code....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agokernel panic when mount NFSv4
Trond Myklebust [Mon, 20 Dec 2010 21:19:26 +0000 (21:19 +0000)]
kernel panic when mount NFSv4

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
>   rpc_mkpipe
>     __rpc_lookup_create()          <=== find pipefile *idmap*
>     __rpc_mkpipe()                 <=== pipefile is *idmap*
>       __rpc_create_common()
>        ******  BUG_ON(!d_unhashed(dentry)); ******    *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot       *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
>  ((LOOPCOUNT += 1))
>  service nfs restart
>  /usr/sbin/rpc.idmapd
>  mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
>  ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
>  umount /mnt
>  echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G     D   -------------------2.6.32 #1
> Call Trace:
>  [<c080ad18>] ? panic+0x42/0xed
>  [<c080e42c>] ? oops_end+0xbc/0xd0
>  [<c040b090>] ? do_invalid_op+0x0/0x90
>  [<c040b10f>] ? do_invalid_op+0x7f/0x90
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
>  [<f0ed6508>] ? prc_call_sync+0x48/0x60[sunrpc]
>  [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
>  [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
>  [<c080d80b>] ? error_code+0x73/0x78
>  [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
>  [<c0532bda>] ? d_lookup+0x2a/0x40
>  [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
>  [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
>  [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
>  [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
>  [<c05e76aa>] ? strlcpy+0x3a/0x60
>  [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
>  [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
>  [<c04f1400>] ? krealloc+0x40/0x50
>  [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
>  [<c04f14ec>] ? kstrdup+0x3c/0x60
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
>  [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
>  [<f10afe6d>] ? nfs4_validate_text_mount_data+-x7d/0xf0[nfs]
>  [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
>  [<c0520739>] ? vfs_kern_mount+0x69/0x170
>  [<c05366d2>] ? get_fs_type+0x32/0xb0
>  [<c052089f>] ? do_kern_mount+0x3f/0xe0
>  [<c053954f>] ? do_mount+0x2ef/0x740
>  [<c0537740>] ? copy_mount_options+0xb0/0x120
>  [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
  Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.Myklebust@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
  rpc_mkpipe
      __rpc_lookup_create()          <=== find pipefile *idmap*
      __rpc_mkpipe()                 <=== pipefile is *idmap*
        __rpc_create_common()
         ******  BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: double unlock in next_host_state()
Dan Carpenter [Sun, 2 Jan 2011 20:20:42 +0000 (20:20 +0000)]
lockd: double unlock in next_host_state()

We unlock again after we goto out.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Don't leak in nfs_proc_symlink()
Jesper Juhl [Fri, 24 Dec 2010 22:22:37 +0000 (22:22 +0000)]
NFS: Don't leak in nfs_proc_symlink()

Hi,

In fs/nfs/proc.c::nfs_proc_symlink() we will leak memory if either
nfs_alloc_fhandle() or nfs_alloc_fattr() returns NULL but the other one
doesn't.
This patch ensures memory allocated by one when the other fails is always
released (this is safe since nfs_free_fattr() and nfs_free_fhandle() both
call kfree which deals gracefully with NULL pointers).

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFSv4: Convert a few commas into semicolons...
Trond Myklebust [Tue, 21 Dec 2010 15:52:24 +0000 (10:52 -0500)]
NFSv4: Convert a few commas into semicolons...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonet/sunrpc/clnt.c: Convert sprintf_symbol to %ps
Joe Perches [Tue, 21 Dec 2010 15:52:24 +0000 (10:52 -0500)]
net/sunrpc/clnt.c: Convert sprintf_symbol to %ps

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: suppressing showing of default mount port value in /proc fixed
Stanislav Kinsbursky [Wed, 8 Dec 2010 09:40:13 +0000 (12:40 +0300)]
NFS: suppressing showing of default mount port value in /proc fixed

Update: added check for zero value as it was before (note: can't simply check
mountd_port for positive value because it's typeof unsigned short)

Default value for mount server port is set to NFS_UNSPEC_PORT (-1) and will not
be changed during parsing mount options for mound data version 6. This default
value will be showed for mountport in /proc/mounts always since current default
check is for zero value. This small mistake leads to big problem, because
during umount.nfs execution from old user-space utils (at least nfs-utils
1.0.9) this value will be used as the server port to connect to. This request
will be rejected (since port is 65535) and thus nfs mount point can't be
unmounted.

Note from Chuck Lever (chuck.lever@oracle.com): this is only possible if
/etc/mtab is a link to /proc/mounts.  Not all systems have this configuration.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonfs4: fix units bug causing hang on recovery
J. Bruce Fields [Tue, 14 Dec 2010 00:05:46 +0000 (19:05 -0500)]
nfs4: fix units bug causing hang on recovery

Note that cl_lease_time is in jiffies.  This can cause a very long wait
in the NFS4ERR_CLID_INUSE case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agonfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc()
Jesper Juhl [Thu, 9 Dec 2010 22:17:15 +0000 (23:17 +0100)]
nfs: Take advantage of kmem_cache_zalloc() in nfs_page_alloc()

Take advantage of kmem_cache_zalloc() in nfs_page_alloc(). Save a call to
memset() and a few bytes.

Before:
 [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o
    text    data     bss     dec     hex filename
    1765       0       8    1773     6ed fs/nfs/pagelist.o
After:
 [jj@dragon linux-2.6]$ size fs/nfs/pagelist.o
    text    data     bss     dec     hex filename
    1749       0       8    1757     6dd fs/nfs/pagelist.o

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Remove redundant unlikely()
Tobias Klauser [Thu, 9 Dec 2010 14:53:28 +0000 (15:53 +0100)]
NFS: Remove redundant unlikely()

IS_ERR() already implies unlikely(), so it can be omitted here.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Remove src_sap and src_len from nlm_lookup_host_info struct
Chuck Lever [Tue, 14 Dec 2010 15:06:52 +0000 (15:06 +0000)]
lockd: Remove src_sap and src_len from nlm_lookup_host_info struct

Clean up.

The contents of the src_sap field is not used in nlm_alloc_host().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Remove nlm_lookup_host()
Chuck Lever [Tue, 14 Dec 2010 15:06:41 +0000 (15:06 +0000)]
lockd: Remove nlm_lookup_host()

Clean up.

Remove the now unused helper nlm_lookup_host().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Make nrhosts an unsigned long
Chuck Lever [Tue, 14 Dec 2010 15:06:32 +0000 (15:06 +0000)]
lockd: Make nrhosts an unsigned long

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Rename nlm_hosts
Chuck Lever [Tue, 14 Dec 2010 15:06:22 +0000 (15:06 +0000)]
lockd: Rename nlm_hosts

Clean up.

nlm_hosts now contains only server-side entries.  Rename it to match
convention of client side cache.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Clean up nlmsvc_lookup_host()
Chuck Lever [Tue, 14 Dec 2010 15:06:12 +0000 (15:06 +0000)]
lockd: Clean up nlmsvc_lookup_host()

Clean up.

Change nlmsvc_lookup_host() to be purpose-built for server-side
nlm_host management.  This replaces the generic nlm_lookup_host()
helper function, just like on the client side.  The lookup logic is
specialized for server host lookups.

The server side cache also gets its own specialized equivalent of the
nlm_release_host() function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Create client-side nlm_host cache
Chuck Lever [Tue, 14 Dec 2010 15:05:52 +0000 (15:05 +0000)]
lockd: Create client-side nlm_host cache

NFS clients don't need the garbage collection processing that is
performed on nlm_host structures.  The client picks up an nlm_host at
mount time and holds a reference to it until the file system is
unmounted.

Servers, on the other hand, don't have a precise way to tell when an
nlm_host is no longer being used, so zero refcount nlm_host entries
are left to expire in the cache after a time.

Basically there's nothing holding a reference to an nlm_host between
individual server-side NLM requests, but we can't afford the expense
of recreating them for every new NLM request from a client.  The
nlm_host cache adds some lifetime hysteresis to entries in the cache
so the next time a particular nlm_host is needed, it's likely to be
discovered by a lookup rather than created from whole cloth.

With the new implementation, client nlm_host cache items are no longer
garbage collected, and are destroyed directly by a new release
function specialized for client entries, nlmclnt_release_host().  They
are cached in their own data structure, and have their own lookup
logic, simplified and specialized for client nlm_host entries.

However, the client nlm_host cache still shares reboot recovery logic
with the server nlm_host cache.  The NSM "peer rebooted" downcall for
clients and servers still come through the same RPC call.  This is a
legacy formal API that would be difficult to alter, and besides, the
user space NSM implementation can't tell the difference between peers
that are clients or servers.

For this reason, the client cache continues to share the
nlm_host_mutex (and reboot recovery logic) with the server cache.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Split nlm_release_call()
Chuck Lever [Tue, 14 Dec 2010 15:05:42 +0000 (15:05 +0000)]
lockd: Split nlm_release_call()

The nlm_release_call() function is invoked from both the server and
the client side.  We're about to introduce a distinct server- and
client-side nlm_release_host(), so nlm_release_call() must first be
split into a client-side and a server-side version.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Add nlm_destroy_host_locked()
Chuck Lever [Tue, 14 Dec 2010 15:05:33 +0000 (15:05 +0000)]
lockd: Add nlm_destroy_host_locked()

Refactor the tail of nlm_gc_hosts() into nlm_destroy_host() so that
this logic can be used separately from garbage collection.

Rename it _locked() to document that it must be called with the hosts
cache mutex held.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Add nlm_alloc_host()
Chuck Lever [Tue, 14 Dec 2010 15:05:23 +0000 (15:05 +0000)]
lockd: Add nlm_alloc_host()

Refactor nlm_host allocation and initialization into a separate
function.  This will be the common piece of server and client nlm_host
lookup logic after the nlm_host cache is split.

Small change: use kmalloc() instead of kzalloc(), as we're overwriting
almost all fields in the new nlm_host struct with non-zero values
immediately after it is allocated.  An added benefit is we now have an
explicit reference to each field name where it is initialized (for all
you cscope fans out there).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: reorganize nlm_host_rebooted
J. Bruce Fields [Tue, 14 Dec 2010 15:05:13 +0000 (15:05 +0000)]
lockd: reorganize nlm_host_rebooted

Minor reorganization; no change in behavior.  This will save some
duplicated code after we split the client and server host caches.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
[ cel: Forward-ported to 2.6.37 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: define host_for_each{_safe} macros
J. Bruce Fields [Tue, 14 Dec 2010 15:05:03 +0000 (15:05 +0000)]
lockd: define host_for_each{_safe} macros

We've got a lot of loops like this, and I find them a little easier to
read with the macros.  More such loops are coming.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
[ cel: Forward-ported to 2.6.37 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC: New xdr_streams XDR decoder API
Chuck Lever [Tue, 14 Dec 2010 14:59:29 +0000 (14:59 +0000)]
SUNRPC: New xdr_streams XDR decoder API

Now that all client-side XDR decoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC res *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each decoder function.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC: New xdr_streams XDR encoder API
Chuck Lever [Tue, 14 Dec 2010 14:59:18 +0000 (14:59 +0000)]
SUNRPC: New xdr_streams XDR encoder API

Now that all client-side XDR encoder routines use xdr_streams, there
should be no need to support the legacy calling sequence [rpc_rqst *,
__be32 *, RPC arg *] anywhere.  We can construct an xdr_stream in the
generic RPC code, instead of in each encoder function.

Also, all the client-side encoder functions return 0 now, making a
return value superfluous.  Take this opportunity to convert them to
return void instead.

This is a refactoring change.  It should not cause different behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC: Determine value of "nrprocs" automatically
Chuck Lever [Tue, 14 Dec 2010 14:59:09 +0000 (14:59 +0000)]
SUNRPC: Determine value of "nrprocs" automatically

Clean up.

Just fixed a panic where the nrprocs field in a different upper layer
client was set by hand incorrectly.  Use the compiler-generated method
used by the other upper layer protocols.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoSUNRPC: Avoid return code checking in rpcbind XDR encoder functions
Chuck Lever [Tue, 14 Dec 2010 14:58:59 +0000 (14:58 +0000)]
SUNRPC: Avoid return code checking in rpcbind XDR encoder functions

Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the rpcbind XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Remove unused UMNT response data structure
Chuck Lever [Tue, 14 Dec 2010 14:58:49 +0000 (14:58 +0000)]
NFS: Remove unused UMNT response data structure

Clean up.

The UMNT request has a NULL response.  There's no need to set up a
mountres structure for it.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Avoid return code checking in mount XDR encoder functions
Chuck Lever [Tue, 14 Dec 2010 14:58:40 +0000 (14:58 +0000)]
NFS: Avoid return code checking in mount XDR encoder functions

Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the mount client XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNSM: Avoid return code checking in NSM XDR encoder functions
Chuck Lever [Tue, 14 Dec 2010 14:58:30 +0000 (14:58 +0000)]
NSM: Avoid return code checking in NSM XDR encoder functions

Clean up.

The trend in the other XDR encoder functions is to BUG() when encoding
problems occur, since a problem here is always due to a local coding
error.  Then, instead of a status, zero is unconditionally returned.

Update the NSM XDR encoders to behave this way.

To finish the update, use the new-style be32_to_cpup() and
cpu_to_be32() macros, and compute the buffer sizes using raw integers
instead of sizeof().  This matches the conventions used in other XDR
functions

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Squelch compiler warning in decode_getdeviceinfo()
Chuck Lever [Tue, 14 Dec 2010 14:58:21 +0000 (14:58 +0000)]
NFS: Squelch compiler warning in decode_getdeviceinfo()

Clean up.

.../linux/nfs-2.6/fs/nfs/nfs4xdr.c: In function â€˜decode_getdeviceinfo’:
.../linux/nfs-2.6/fs/nfs/nfs4xdr.c:5008: warning: comparison between signed and unsigned integer expressions

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Simplify ->decode_dirent() calling sequence
Chuck Lever [Tue, 14 Dec 2010 14:58:11 +0000 (14:58 +0000)]
NFS: Simplify ->decode_dirent() calling sequence

Clean up.

The pointer returned by ->decode_dirent() is no longer used as a
pointer.  The only call site (xdr_decode() in fs/nfs/dir.c) simply
extracts the errno value encoded in the pointer.  Replace the
returned pointer with a standard integer errno return value.

Also, pass the "server" argument as part of the nfs_entry instead of
as a separate parameter.  It's faster to derive "server" in
nfs_readdir_xdr_to_array() since we already have the directory's inode
handy.  "server" ought to be invariant for a set of entries in the
same directory, right?

The legacy versions of decode_dirent() don't use "server" anyway, so
it's wasted work for them to derive and pass "server" for each entry.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Fix hdrlen calculation in NFSv4's decode_read()
Chuck Lever [Tue, 14 Dec 2010 14:58:01 +0000 (14:58 +0000)]
NFS: Fix hdrlen calculation in NFSv4's decode_read()

When computing the length of the header, be sure to include the
four octets consumed by "count".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Move nlmdbg_cookie2a() to svclock.c
Chuck Lever [Tue, 14 Dec 2010 14:57:52 +0000 (14:57 +0000)]
lockd: Move nlmdbg_cookie2a() to svclock.c

Clean up.  nlmdbg_cookie2a() is used only in svclock.c.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Repair whitespace damage in NFS PROC macro
Chuck Lever [Tue, 14 Dec 2010 14:57:42 +0000 (14:57 +0000)]
NFS: Repair whitespace damage in NFS PROC macro

Clean up.

When I was making other changes in this area, checkscript.pl
complained about the use of leading blanks in the PROC macros in the
xdr files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFSD: Update XDR decoders in NFSv4 callback client
Chuck Lever [Tue, 14 Dec 2010 14:57:32 +0000 (14:57 +0000)]
NFSD: Update XDR decoders in NFSv4 callback client

Clean up.

Remove old-style NFSv4 XDR macros in favor of the style now used in
fs/nfs/nfs4xdr.c.  These were forgotten during the recent nfs4xdr.c
rewrite.

Additional whitespace cleanup adds to the size of this patch.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFSD: Update XDR encoders in NFSv4 callback client
Chuck Lever [Tue, 14 Dec 2010 14:57:22 +0000 (14:57 +0000)]
NFSD: Update XDR encoders in NFSv4 callback client

Clean up.

Remove old-style NFSv4 XDR macros in favor of the style now used in
fs/nfs/nfs4xdr.c.  These were forgotten during the recent nfs4xdr.c
rewrite.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Introduce new-style XDR functions for NLMv4
Chuck Lever [Tue, 14 Dec 2010 14:57:12 +0000 (14:57 +0000)]
lockd: Introduce new-style XDR functions for NLMv4

We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also want to be able to pass xdr_streams from
call_encode() to all XDR encoding functions, rather than building
an xdr_stream in every XDR encoding function in the kernel.

Same idea as the NLM v3 XDR overhaul.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Move and update xdr_decode_foo() functions that we're keeping
Chuck Lever [Tue, 14 Dec 2010 14:57:02 +0000 (14:57 +0000)]
NFS: Move and update xdr_decode_foo() functions that we're keeping

Clean up.

Move the timestamp decoder to match the placement and naming
conventions of the other helpers.  Fold xdr_decode_fattr() into
decode_fattr3(), which is now it's only user.  Fold
xdr_decode_wcc_attr() into decode_wcc_attr(), which is now it's only
user.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Remove unused old NFSv3 decoder functions
Chuck Lever [Tue, 14 Dec 2010 14:56:52 +0000 (14:56 +0000)]
NFS: Remove unused old NFSv3 decoder functions

Clean up.  Remove unused legacy result decoder functions, and any
now unused decoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Switch in new NFSv3 decoder functions
Chuck Lever [Tue, 14 Dec 2010 14:56:42 +0000 (14:56 +0000)]
NFS: Switch in new NFSv3 decoder functions

The naming scheme of the new decoder functions, which follows the
NFSv4 XDR decoder functions, is slightly different than the scheme
used for the old functions.  Rename the functions as a separate
step to keep the patches clean.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Introduce new-style XDR decoding functions for NFSv2
Chuck Lever [Tue, 14 Dec 2010 14:56:30 +0000 (14:56 +0000)]
NFS: Introduce new-style XDR decoding functions for NFSv2

We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_decode() to all XDR decoding functions, rather than building
an xdr_stream in every XDR decoding function in the kernel.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Update xdr_encode_foo() functions that we're keeping
Chuck Lever [Tue, 14 Dec 2010 14:56:20 +0000 (14:56 +0000)]
NFS: Update xdr_encode_foo() functions that we're keeping

Clean up.  Move the timestamp and the sattr encoder to match the
placement convention of the other helpers, update their coding style,
and refresh their documenting comments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Remove unused old NFSv3 encoder functions
Chuck Lever [Tue, 14 Dec 2010 14:56:10 +0000 (14:56 +0000)]
NFS: Remove unused old NFSv3 encoder functions

Clean up.  Remove unused legacy argument encoder functions, and any
now unused encoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Replace old NFSv3 encoder functions with xdr_stream-based ones
Chuck Lever [Tue, 14 Dec 2010 14:56:01 +0000 (14:56 +0000)]
NFS: Replace old NFSv3 encoder functions with xdr_stream-based ones

The naming scheme of the new encoder functions, which follows the
NFSv4 XDR encoder functions, is slightly different than the scheme
used for the old functions.  Rename the functions as a separate
step to keep the patches clean.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Introduce new-style XDR encoding functions for NFSv3
Chuck Lever [Tue, 14 Dec 2010 14:55:50 +0000 (14:55 +0000)]
NFS: Introduce new-style XDR encoding functions for NFSv3

We're interested in taking advantage of the safety benefits of
xdr_streams.  These data structures allow more careful checking for
buffer overflow while encoding.  More careful type checking is also
introduced in the new functions.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_encode() to all XDR encoding functions, rather than building
an xdr_stream in every XDR encoding function in the kernel.  To do
this means all encoders must be ready to handle a passed-in
xdr_stream.

The new encoders follow the modern paradigm for XDR encoders: BUG on
error, and always return a zero status code.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agolockd: Introduce new-style XDR functions for NLMv3
Chuck Lever [Tue, 14 Dec 2010 14:55:40 +0000 (14:55 +0000)]
lockd: Introduce new-style XDR functions for NLMv3

We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_encode() and call_decode() to all XDR encoding functions,
rather than building an xdr_stream in every XDR encoding and decoding
function in the kernel.

To do all of this, rewrite the XDR encoding and decoding functions in
fs/lockd/xdr.c to use xdr_streams.  This makes them more or less
incompatible with server-side XDR helper functions, so break them out
into a separate source file.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

SHARE-related functionality doesn't seem to be used, as those
functions are hiding behind a #define that isn't set anywhere that I
can find.  And, they've been in there forever (at least as far back as
the kernel's git history goes), yet remain unused.  Let's take the
opportunity to bin them.  It should be easy enough for someone to
introduce proper XDR functions if at some point SHARE-related NLM
functionality is desired.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Move and update xdr_decode_foo() functions that we're keeping
Chuck Lever [Tue, 14 Dec 2010 14:55:30 +0000 (14:55 +0000)]
NFS: Move and update xdr_decode_foo() functions that we're keeping

Clean up.

Move the timestamp decoder to match the placement and naming
conventions of the other helpers.  Fold xdr_decode_fattr() into
decode_fattr(), which is now it's only user.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Replace old NFSv2 decoder functions with xdr_stream-based ones
Chuck Lever [Tue, 14 Dec 2010 14:55:20 +0000 (14:55 +0000)]
NFS: Replace old NFSv2 decoder functions with xdr_stream-based ones

Clean up.  Remove unused legacy result decoder functions, and any
now unused decoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Introduce new-style XDR decoding functions for NFSv2
Chuck Lever [Tue, 14 Dec 2010 14:55:10 +0000 (14:55 +0000)]
NFS: Introduce new-style XDR decoding functions for NFSv2

We'd like to prevent local buffer overflows caused by malicious or
broken servers.  New xdr_stream style decoders can do that.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_decode() to all XDR decoding functions, rather than building
an xdr_stream in every XDR decoding function in the kernel.

nfs_decode_dirent() is renamed to follow the naming convention of the
other two dirent decoders.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Use the "nfs_stat" enum for nfs_stat_to_errno()'s argument
Chuck Lever [Tue, 14 Dec 2010 14:55:00 +0000 (14:55 +0000)]
NFS: Use the "nfs_stat" enum for nfs_stat_to_errno()'s argument

Clean up.

To distinguish more clearly between the on-the-wire NFSERR_ value and
our local errno values, use the proper type for the argument of
nfs_stat_to_errno().

Add a documenting comment appropriate for a global function shared
outside this source file.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Update xdr_encode_foo() functions that we're keeping
Chuck Lever [Tue, 14 Dec 2010 14:54:50 +0000 (14:54 +0000)]
NFS: Update xdr_encode_foo() functions that we're keeping

Clean up.

The new helper functions are kept in order by section of RFC 1094.
Move the two timestamp encoders we're keeping, update their coding
style, and refresh their documenting comments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Remove old NFSv2 encoder functions
Chuck Lever [Tue, 14 Dec 2010 14:54:40 +0000 (14:54 +0000)]
NFS: Remove old NFSv2 encoder functions

Clean up:  Remove unused legacy argument encoder functions, and any
now unused encoder helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoNFS: Introduce new-style XDR encoding functions for NFSv2
Chuck Lever [Tue, 14 Dec 2010 14:54:30 +0000 (14:54 +0000)]
NFS: Introduce new-style XDR encoding functions for NFSv2

We're interested in taking advantage of the safety benefits of
xdr_streams.  These data structures allow more careful checking for
buffer overflow while encoding.  More careful type checking is also
introduced in the new functions.

For efficiency, we also eventually want to be able to pass xdr_streams
from call_encode() to all XDR encoding functions, rather than building
an xdr_stream in every XDR encoding function in the kernel.  To do
this means all encoders must be ready to handle a passed-in
xdr_stream.

The new encoders follow the modern paradigm for XDR encoders: BUG on
any error, and always return a zero status code.

Static helper functions are left without the "inline" directive.  This
allows the compiler to choose automatically how to optimize these for
size or speed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
Linus Torvalds [Thu, 16 Dec 2010 16:51:57 +0000 (08:51 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  lguest: populate initial_page_table
  lguest: restore boot speed
  lguest: fix crash lguest_time_init

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke...
Linus Torvalds [Thu, 16 Dec 2010 16:34:22 +0000 (08:34 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix regression of garbage collection ioctl

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Thu, 16 Dec 2010 16:33:44 +0000 (08:33 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2
  Input: wacom - add another Bamboo Pen ID (0xd4)

13 years agolguest: populate initial_page_table
Rusty Russell [Thu, 16 Dec 2010 23:03:15 +0000 (17:03 -0600)]
lguest: populate initial_page_table

Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.

In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables.  With
the first patch, the direct mapping tables used that memory:

Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000

I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...

2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.

This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir.  So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.

For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org

13 years agolguest: restore boot speed
Rusty Russell [Thu, 16 Dec 2010 23:03:15 +0000 (17:03 -0600)]
lguest: restore boot speed

lguest is dumb and drops *all* the pagetables for set_pte (which is
only used for kernel mapping manipulation, so it's OK without highmem).

But it's used a lot in boot, too.  As a guest optimization, we
suppressed this flushing until the first page switch.  Now we have
initial_page_table, that happens much earlier, so extend the heuristic
to wait until we switch to something other than the swapper_pg_dir or
initial_page_table.

As measured on my laptop under kvm, this dropped the time-to-mount-root
from 48 seconds to 4.3 seconds.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
13 years agolguest: fix crash lguest_time_init
Rusty Russell [Thu, 16 Dec 2010 23:03:13 +0000 (17:03 -0600)]
lguest: fix crash lguest_time_init

fe25c7fc2e "x86: lguest: Convert to new irq chip functions" converted
enable_lguest_irq() to take a struct irq_data *, but didn't fix the one
internal caller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org

13 years agonilfs2: fix regression of garbage collection ioctl
Ryusuke Konishi [Thu, 16 Dec 2010 00:57:57 +0000 (09:57 +0900)]
nilfs2: fix regression of garbage collection ioctl

On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90cefc7d82a0 ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.

The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache.  Here, gc-inode is the inode which
buffers blocks to be relocated on GC.  That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag.  Thus, some of live blocks are wrongly
overrode without being moved to new logs.

This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
13 years agoLinux 2.6.37-rc6
Linus Torvalds [Thu, 16 Dec 2010 01:24:48 +0000 (17:24 -0800)]
Linux 2.6.37-rc6

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Thu, 16 Dec 2010 01:24:05 +0000 (17:24 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h

13 years agoMerge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Linus Torvalds [Wed, 15 Dec 2010 20:41:17 +0000 (12:41 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix typo which broke '..' detection in ext4_find_entry()
  ext4: Turn off multiple page-io submission by default

13 years agoxen: Provide a variant of __RING_SIZE() that is an integer constant expression
Jeremy Fitzhardinge [Wed, 8 Dec 2010 20:39:12 +0000 (12:39 -0800)]
xen: Provide a variant of __RING_SIZE() that is an integer constant expression

Without this, gcc 4.5 won't compile xen-netfront and xen-blkfront, where
this is being used to specify array sizes.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Miller <davem@davemloft.net>
Cc: Stable Kernel <stable@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMAINTAINERS: update MSM git tree
Daniel Walker [Thu, 9 Dec 2010 23:45:27 +0000 (15:45 -0800)]
MAINTAINERS: update MSM git tree

The MSM main git tree has changed over to this new address.

Signed-off-by: Daniel Walker <dwalker@codeaurora.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoinstall_special_mapping skips security_file_mmap check.
Tavis Ormandy [Thu, 9 Dec 2010 14:29:42 +0000 (15:29 +0100)]
install_special_mapping skips security_file_mmap check.

The install_special_mapping routine (used, for example, to setup the
vdso) skips the security check before insert_vm_struct, allowing a local
attacker to bypass the mmap_min_addr security restriction by limiting
the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can
be used to bypass any restrictions, I don't see any reason not to have
the security check.

  $ uname -m
  x86_64
  $ cat /proc/sys/vm/mmap_min_addr
  65536
  $ cat install_special_mapping.s
  section .bss
      resb BSS_SIZE
  section .text
      global _start
      _start:
          mov     eax, __NR_pause
          int     0x80
  $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
  $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
  $ ./install_special_mapping &
  [1] 14303
  $ cat /proc/14303/maps
  0000f000-00010000 r-xp 00000000 00:00 0                                  [vdso]
  00010000-00011000 r-xp 00001000 00:19 2453665                            /home/taviso/install_special_mapping
  00011000-ffffe000 rwxp 00000000 00:00 0                                  [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to
4096.

Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocrypto: ghash-intel - ghash-clmulni-intel_glue needs err.h
Randy Dunlap [Wed, 15 Dec 2010 09:58:57 +0000 (17:58 +0800)]
crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h

Add missing header file:

arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
13 years agoInput: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2
Dmitry Torokhov [Wed, 15 Dec 2010 07:53:21 +0000 (23:53 -0800)]
Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2

The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while
extending them to support large scancodes was a mistake. While we tried
to keep ABI intact (and we succeeded in doing that, programs compiled
on older kernels will work on newer ones) there is still a problem with
recompiling existing software with newer kernel headers.

New kernel headers will supply updated ioctl numbers and kernel will
expect that userspace will use struct input_keymap_entry to set and
retrieve keymap data. But since the names of ioctls are still the same
userspace will happily compile even if not adjusted to make use of the
new structure and will start miraculously fail in the field.

To avoid this issue let's revert EVIOCGKEYCODE/EVIOCSKEYCODE definitions
and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly
select the style of ioctls it wants to employ.

Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Acked-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Linus Torvalds [Wed, 15 Dec 2010 02:50:10 +0000 (18:50 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: It is likely that WORKER_NOT_RUNNING is true
  MAINTAINERS: Add workqueue entry
  workqueue: check the allocation of system_unbound_wq

13 years agoMerge branch 'for-linus' of git://neil.brown.name/md
Linus Torvalds [Wed, 15 Dec 2010 02:49:40 +0000 (18:49 -0800)]
Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
  md: protect against NULL reference when waiting to start a raid10.
  md: fix bug with re-adding of partially recovered device.
  md: fix possible deadlock in handling flush requests.
  md: move code in to submit_flushes.
  md: remove handling of flush_pending in md_submit_flush_data

13 years agodw_spi: Fix missing final read in some polling situations
Major Lee [Fri, 10 Dec 2010 10:13:49 +0000 (10:13 +0000)]
dw_spi: Fix missing final read in some polling situations

There is a possibility that the last word of a transaction will be lost
if data is not ready.  Re-read in poll_transfer() to solve this issue
when poll_mode is enabled.

Verified on SPI touch screen device.

Signed-off-by: Major Lee <major_lee@wistron.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoi2c_intel_mid: Fix slash in sysfs name
Alan Cox [Tue, 14 Dec 2010 15:29:08 +0000 (15:29 +0000)]
i2c_intel_mid: Fix slash in sysfs name

This gets caught by the new sanity check code. Instead of the slash use a
different symbol. This was originally found by Major Lee who proposed a
rather more complex patch which changed the name according to the chip
type.

On the basis that we are in a late -rc and making Linus grumpy isn't always
a good idea (however fun) this is a simple alternative.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoext4: fix typo which broke '..' detection in ext4_find_entry()
Aaro Koskinen [Wed, 15 Dec 2010 02:45:31 +0000 (21:45 -0500)]
ext4: fix typo which broke '..' detection in ext4_find_entry()

There should be a check for the NUL character instead of '0'.

Fortunately the only thing that cares about this is NFS serving, which
is why we didn't notice this in the merge window testing.

Reported-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
13 years agoMerge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 15 Dec 2010 01:37:08 +0000 (17:37 -0800)]
Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6

* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: wire up accept4 syscall (non-multiplexed path)
  sh: Enable deprecated IRQ chip APIs for MFD and GPIOLIB drivers.