git.karo-electronics.de Git - linux-beck.git/log
12 years agoNFS: Clean up - Simplify reference counting in fs/nfs/direct.c
Trond Myklebust [Wed, 9 May 2012 17:54:53 +0000 (13:54 -0400)]
NFS: Clean up - Simplify reference counting in fs/nfs/direct.c

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Clean up - Rename nfs_unlock_request and nfs_unlock_request_dont_release
Trond Myklebust [Wed, 9 May 2012 18:04:55 +0000 (14:04 -0400)]
NFS: Clean up - Rename nfs_unlock_request and nfs_unlock_request_dont_release

Function rename to ensure that the functionality of nfs_unlock_request()
mirrors that of nfs_lock_request(). Then let nfs_unlock_and_release_request()
do the work of what used to be called nfs_unlock_request()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Clean up - simplify nfs_lock_request()
Trond Myklebust [Wed, 9 May 2012 17:19:15 +0000 (13:19 -0400)]
NFS: Clean up - simplify nfs_lock_request()

We only have two places where we need to grab a reference when trying
to lock the nfs_page. We're better off making that explicit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: nfs_set_page_writeback no longer needs to reference the page
Trond Myklebust [Wed, 9 May 2012 17:37:43 +0000 (13:37 -0400)]
NFS: nfs_set_page_writeback no longer needs to reference the page

We now hold a reference to the nfs_page across the calls to
nfs_set_page_writeback and nfs_end_page_writeback, and that
means we already have a reference to the struct page.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Prevent a deadlock in the new writeback code
Trond Myklebust [Wed, 9 May 2012 18:30:35 +0000 (14:30 -0400)]
NFS: Prevent a deadlock in the new writeback code

We have to unlock the nfs_page before we call nfs_end_page_writeback
to avoid races with functions that expect the page to be unlocked
when PG_locked and PG_writeback are not set.
The problem is that nfs_unlock_request also releases the nfs_page,
causing a deadlock if the release of the nfs_open_context
triggers an iput() while the PG_writeback flag is still set...

The solution is to separate the unlocking and release of the nfs_page,
so that we can do the former before nfs_end_page_writeback and the
latter after.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
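
The ordering described in the commit above can be sketched in userspace C. This is a minimal model, not the kernel code: struct req and the req_* helpers are hypothetical stand-ins for struct nfs_page and the NFS helpers, and the asserts only encode the rule that the release step must run after the writeback flag is cleared.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct nfs_page; not the kernel type. */
struct req {
    bool locked;     /* models the request lock */
    bool writeback;  /* models PG_writeback on the underlying page */
    int  refcount;   /* models the reference held on the request */
};

/* Step 1: drop the lock only; the reference is kept. */
static void req_unlock(struct req *r)
{
    assert(r->locked);
    r->locked = false;
}

/* Step 2: clear the writeback flag while a reference is still held. */
static void req_end_writeback(struct req *r)
{
    r->writeback = false;
}

/* Step 3: drop the reference. Returns true if it was the last one.
 * The first assert encodes the ordering rule: never release while
 * writeback is still flagged. */
static bool req_release(struct req *r)
{
    assert(!r->writeback);
    assert(r->refcount > 0);
    return --r->refcount == 0;
}

/* Completion path: unlock, end writeback, and only then release. */
static bool req_complete_write(struct req *r)
{
    req_unlock(r);
    req_end_writeback(r);
    return req_release(r);
}
```

With unlock and release fused into one helper, step 3 would have to run before step 2, which is the situation the commit message describes as deadlock-prone.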
12 years agoNFSv4: nfs_client_return_marked_delegations can't flush data
Trond Myklebust [Sun, 6 May 2012 23:46:30 +0000 (19:46 -0400)]
NFSv4: nfs_client_return_marked_delegations can't flush data

Since even filemap_flush() needs to lock pages that are dirty, we
cannot risk calling it from the state manager context. Therefore,
we need to move the call to filemap_flush() to
nfs_async_inode_return_delegation().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: nfs_inode_return_delegation() should always flush dirty data
Trond Myklebust [Sun, 6 May 2012 23:34:17 +0000 (19:34 -0400)]
NFS: nfs_inode_return_delegation() should always flush dirty data

The assumption is that if you are in a situation where you need to
return the delegation, then you should probably stop caching the
data anyway.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Don't do a full flush to disk on close() if we hold a delegation
Trond Myklebust [Sun, 6 May 2012 23:10:59 +0000 (19:10 -0400)]
NFS: Don't do a full flush to disk on close() if we hold a delegation

If we hold a delegation then we know that it should be safe to continue
to cache the data beyond the close(). However, since the process that wrote
the data may die after close(), we may still want to send the data to the
server before those RPCSEC_GSS credentials expire. We therefore compromise
by starting writeback to the server, but don't wait for completion.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Fix sparse warnings
Trond Myklebust [Fri, 4 May 2012 17:54:24 +0000 (13:54 -0400)]
NFS: Fix sparse warnings

Fix the following sparse warnings:

fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
not declared. Should it be static?
fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
'nfs_readhdr_alloc'
fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
was not declared. Should it be static?

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
12 years agoNFS: Fix O_DIRECT compile warnings
Trond Myklebust [Fri, 4 May 2012 17:47:16 +0000 (13:47 -0400)]
NFS: Fix O_DIRECT compile warnings

Fix the following compile warnings:
fs/nfs/direct.c: In function 'nfs_direct_read_schedule_segment':
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:325:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:352:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c: In function 'nfs_direct_write_schedule_segment':
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:622:11: warning: comparison of distinct pointer types
lacks a cast [enabled by default]
fs/nfs/direct.c:650:27: warning: comparison of distinct pointer types
lacks a cast [enabled by default]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
12 years agoNFS: Adapt readdirplus to application usage patterns
Trond Myklebust [Tue, 1 May 2012 21:37:59 +0000 (17:37 -0400)]
NFS: Adapt readdirplus to application usage patterns

While the use of READDIRPLUS is significantly more efficient than
READDIR followed by many LOOKUP calls, it is still less efficient
than just READDIR if the attributes are not required.

This patch tracks when lookups are attempted on the directory,
and uses that information to selectively disable READDIRPLUS
on that directory.
The first 'readdir' call is always served using READDIRPLUS.
Subsequent calls only use READDIRPLUS if there was a successful
lookup or revalidation on a child in the meantime.

Credit for the original idea should go to Neil Brown. See:
      http://www.spinics.net/lists/linux-nfs/msg19996.html
However, the implementation in this patch differs from Neil's
in that it focuses on tracking lookups rather than calls to
stat().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Neil Brown <neilb@suse.de>
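
The heuristic described above can be sketched as a single per-directory hint. This is an illustrative userspace model; struct dir_state, its field, and the dir_* helpers are hypothetical names and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-directory state; the field name is illustrative only. */
struct dir_state {
    bool advise_rdplus;  /* true when READDIRPLUS is expected to pay off */
};

/* A fresh directory: the first readdir is always served by READDIRPLUS. */
static void dir_init(struct dir_state *d)
{
    d->advise_rdplus = true;
}

/* Called after a successful lookup or revalidation of a child. */
static void dir_note_lookup(struct dir_state *d)
{
    d->advise_rdplus = true;
}

/* Decide how to serve this readdir, then clear the hint so that the next
 * readdir falls back to plain READDIR unless a lookup happens in between. */
static bool dir_use_rdplus(struct dir_state *d)
{
    assert(d != NULL);
    bool use = d->advise_rdplus;
    d->advise_rdplus = false;
    return use;
}
```

A pure listing workload therefore pays for attributes once, while a workload that looks up the entries it lists keeps re-arming the hint.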
12 years agoNFSv4: COMMIT does not need post-op attributes
Trond Myklebust [Sun, 29 Apr 2012 14:44:42 +0000 (10:44 -0400)]
NFSv4: COMMIT does not need post-op attributes

No attributes are supposed to change during a COMMIT call, so there
is no need to request post-op attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Don't request cache consistency attributes on some writes
Trond Myklebust [Sat, 28 Apr 2012 18:55:16 +0000 (14:55 -0400)]
NFSv4: Don't request cache consistency attributes on some writes

We don't need cache consistency information when we're doing O_DIRECT
writes. Ditto for the case of delegated writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Simplify the NFSv4 REMOVE, LINK and RENAME compounds
Trond Myklebust [Fri, 27 Apr 2012 17:48:19 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 REMOVE, LINK and RENAME compounds

Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Simplify the NFSv4 CREATE compound
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 CREATE compound

Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Simplify the NFSv4 OPEN compound
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv4: Simplify the NFSv4 OPEN compound

Get rid of the post-op GETATTR on the directory in order to reduce
the amount of processing done on the server.

The cost is that if we later need to stat() the directory, then we
know that the ctime and mtime are likely to be invalid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Simplify the cache invalidation code
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFS: Simplify the cache invalidation code

Now that NFSv2 and NFSv3 have simulated change attributes,
instead of using all three of mtime, ctime and change attribute to
manage data cache consistency, we can simplify the code to just use
the change attribute.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv2/v3: Simulate the change attribute
Trond Myklebust [Fri, 27 Apr 2012 17:48:18 +0000 (13:48 -0400)]
NFSv2/v3: Simulate the change attribute

Use the ctime to simulate a change attribute for NFSv2 and NFSv3.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
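
One way to derive a change attribute from a ctime is to pack seconds and nanoseconds into one 64-bit value. The sketch below assumes that packing; the commit message does not spell out the exact formula, and struct ctime_val and ctime_to_change_attr() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a ctime value. */
struct ctime_val {
    int64_t sec;
    int32_t nsec;
};

/* Pack the ctime into a single 64-bit value. Nanoseconds are always below
 * 2^30, so shifting the seconds left by 30 bits keeps the result ordered
 * the same way as the original timestamps (for non-negative seconds). */
static uint64_t ctime_to_change_attr(struct ctime_val t)
{
    assert(t.nsec >= 0 && t.nsec < 1000000000);
    return ((uint64_t)t.sec << 30) + (uint64_t)t.nsec;
}
```

Comparing two such values for inequality is then enough to detect that the ctime moved, which is all a change attribute is needed for.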
12 years agoNFS: Change attribute updates should set NFS_INO_REVAL_PAGECACHE
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFS: Change attribute updates should set NFS_INO_REVAL_PAGECACHE

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Simplify nfs_fhget()
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFS: Simplify nfs_fhget()

If the inode is being initialised, there is no point in
setting flags such as NFS_INO_INVALID_ACCESS,
NFS_INO_INVALID_ACL or NFS_INO_INVALID_DATA since there are
no cached access calls, acls or data caches to invalidate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Always trust the PageUptodate flag when we have a delegation
Trond Myklebust [Sun, 29 Apr 2012 16:50:01 +0000 (12:50 -0400)]
NFS: Always trust the PageUptodate flag when we have a delegation

We can always use the optimal full page write if we know that we
hold a delegation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Optimise away nfs_check_inode_attributes() when holding a delegation
Trond Myklebust [Sun, 29 Apr 2012 16:30:19 +0000 (12:30 -0400)]
NFS: Optimise away nfs_check_inode_attributes() when holding a delegation

We already know that the attribute cache is valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Don't force page cache revalidations when holding a delegation
Trond Myklebust [Sun, 29 Apr 2012 15:23:50 +0000 (11:23 -0400)]
NFS: Don't force page cache revalidations when holding a delegation

If we're holding a delegation, then we already know that our
page cache is valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Retrieve attributes _before_ calling delegreturn
Trond Myklebust [Sat, 28 Apr 2012 20:05:03 +0000 (16:05 -0400)]
NFSv4: Retrieve attributes _before_ calling delegreturn

In order to retrieve cache consistency attributes before
anyone else has a chance to change the inode, we need to
put the GETATTR op _before_ the DELEGRETURN op.

We can then use that as part of a 'nfs_post_op_update_inode_force_wcc()'
call, to ensure that we update the attributes without clearing our
cached data.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Delegreturn only needs the cache consistency bitmask
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFSv4: Delegreturn only needs the cache consistency bitmask

In order to do close-to-open cache consistency checking after
a delegreturn, we don't need to retrieve the full set of
attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4: Fix a typo in NFS4_enc_link_sz
Trond Myklebust [Fri, 27 Apr 2012 17:48:17 +0000 (13:48 -0400)]
NFSv4: Fix a typo in NFS4_enc_link_sz

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Simplify the nfs_read_completion functions
Trond Myklebust [Tue, 1 May 2012 16:49:58 +0000 (12:49 -0400)]
NFS: Simplify the nfs_read_completion functions

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Clean up nfs read and write error paths
Trond Myklebust [Tue, 1 May 2012 16:07:22 +0000 (12:07 -0400)]
NFS: Clean up nfs read and write error paths

Move the error handling for nfs_generic_pagein() into a single function.
Ditto for nfs_generic_flush().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Read cleanups
Trond Myklebust [Tue, 1 May 2012 15:21:43 +0000 (11:21 -0400)]
NFS: Read cleanups

Remove unused variables, and reformat some code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Fix a compile issue when CONFIG_NFS_V4_1 is undefined
Trond Myklebust [Mon, 30 Apr 2012 22:39:20 +0000 (18:39 -0400)]
NFS: Fix a compile issue when CONFIG_NFS_V4_1 is undefined

struct nfs_direct_req can't compile when struct pnfs_ds_commit_info
is undefined.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Use kmem_cache_zalloc() in nfs_direct_req_alloc
Trond Myklebust [Mon, 30 Apr 2012 22:31:49 +0000 (18:31 -0400)]
NFS: Use kmem_cache_zalloc() in nfs_direct_req_alloc

Simplify the initialisation of O_DIRECT requests.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Simplify O_DIRECT page referencing
Trond Myklebust [Mon, 30 Apr 2012 17:27:31 +0000 (13:27 -0400)]
NFS: Simplify O_DIRECT page referencing

The O_DIRECT code shouldn't need to hold 2 references to each page. The
reference held by the struct nfs_page should suffice.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: O_DIRECT pgio_completion_ops error_cleanup must unlock the request
Trond Myklebust [Mon, 30 Apr 2012 17:40:06 +0000 (13:40 -0400)]
NFS: O_DIRECT pgio_completion_ops error_cleanup must unlock the request

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Ensure that we break out of read/write_schedule_segment on error
Trond Myklebust [Mon, 30 Apr 2012 17:22:54 +0000 (13:22 -0400)]
NFS: Ensure that we break out of read/write_schedule_segment on error

Currently we do break out of the for() loop, but we also need to
break out of the enclosing do {} while()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
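
The control-flow problem above is easy to reproduce outside the kernel. The following sketch uses hypothetical names (schedule_segment(), submit_fn, and the two example hooks); it only shows that a break inside the for() loop needs a second check to leave the enclosing do {} while ().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page submit hook: 0 on success, negative on error. */
typedef int (*submit_fn)(size_t page_index);

/* Example hooks for illustration only. */
static int always_ok(size_t i)    { (void)i; return 0; }
static int fail_at_five(size_t i) { return i == 5 ? -1 : 0; }

/* Submit 'total' pages in batches of 'batch'. Returns the number of pages
 * submitted before the first error, or 'total' if there was none. */
static size_t schedule_segment(size_t total, size_t batch, submit_fn submit)
{
    size_t done = 0;
    int err = 0;

    assert(batch > 0);
    do {
        size_t left = total - done;
        size_t n = left < batch ? left : batch;

        for (size_t i = 0; i < n; i++) {
            err = submit(done);
            if (err < 0)
                break;      /* leaves only the for() loop */
            done++;
        }
        if (err < 0)
            break;          /* also leave the enclosing do {} while () */
    } while (done < total);

    return done;
}
```

Without the second check, the outer loop would start another batch and call the failing hook again.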
12 years agoNFS: Define dummy nfs_init_cinfo() and nfs_init_cinfo_from_inode()
Bryan Schumaker [Mon, 30 Apr 2012 18:30:22 +0000 (14:30 -0400)]
NFS: Define dummy nfs_init_cinfo() and nfs_init_cinfo_from_inode()

These are needed when v3 and v4 are not enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Define nfs_direct_write_schedule_work() when v3 and v4 are disabled
Bryan Schumaker [Mon, 30 Apr 2012 17:27:11 +0000 (13:27 -0400)]
NFS: Define nfs_direct_write_schedule_work() when v3 and v4 are disabled

v2 doesn't have commits, so this function can be a no-op.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: pnfs_pageio_init_read() and init_write() need an extra argument
Bryan Schumaker [Mon, 30 Apr 2012 17:06:53 +0000 (13:06 -0400)]
NFS: pnfs_pageio_init_read() and init_write() need an extra argument

This is only when CONFIG_NFS_V4_1 isn't enabled.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c
Trond Myklebust [Fri, 27 Apr 2012 18:31:47 +0000 (14:31 -0400)]
NFS: Fix a use-before-initialised warning in fs/nfs/write.c and fs/nfs/pnfs.c

If the allocation of nfs_write_header fails, the list of nfs_pages that
needs to be cleaned up is still on desc->pg_list...

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Fred Isaman <iisaman@netapp.com>
12 years agoNFS: Remove extra rpc_clnt argument to proc_lookup
Bryan Schumaker [Fri, 27 Apr 2012 17:27:46 +0000 (13:27 -0400)]
NFS: Remove extra rpc_clnt argument to proc_lookup

Now that I'm doing secinfo automatically in the v4 code this extra
argument isn't needed.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Create a submount rpc_op
Bryan Schumaker [Fri, 27 Apr 2012 17:27:45 +0000 (13:27 -0400)]
NFS: Create a submount rpc_op

This simplifies the code for v2 and v3 and gives v4 a chance to decide
on referrals without needing to modify the generic client.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Remove secinfo knowledge out of the generic client
Bryan Schumaker [Fri, 27 Apr 2012 17:27:44 +0000 (13:27 -0400)]
NFS: Remove secinfo knowledge out of the generic client

And also remove the unneeded rpc_op.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Prevent garbage cinfo->ds from leaking out
Fred Isaman [Tue, 24 Apr 2012 18:50:34 +0000 (14:50 -0400)]
NFS: Prevent garbage cinfo->ds from leaking out

This bugfix applies on top of the previous directio patches and
fixes a bug introduced in "NFS: create struct nfs_commit_info".

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: rewrite directio write to use async coalesce code
Fred Isaman [Fri, 20 Apr 2012 18:47:57 +0000 (14:47 -0400)]
NFS: rewrite directio write to use async coalesce code

This also has the advantage that it allows directio to use pnfs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: avoid some stat gathering for direct io
Fred Isaman [Fri, 20 Apr 2012 18:47:56 +0000 (14:47 -0400)]
NFS: avoid some stat gathering for direct io

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: add dreq to nfs_commit_info
Fred Isaman [Fri, 20 Apr 2012 18:47:55 +0000 (14:47 -0400)]
NFS: add dreq to nfs_commit_info

Need this to pass into nfs_commitdata_init, in order to keep data->dreq
accurate.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create nfs_commit_completion_ops
Fred Isaman [Fri, 20 Apr 2012 18:47:54 +0000 (14:47 -0400)]
NFS: create nfs_commit_completion_ops

Factors out the code that needs to change when directio
starts using these code paths.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create struct nfs_commit_info
Fred Isaman [Fri, 20 Apr 2012 18:47:53 +0000 (14:47 -0400)]
NFS: create struct nfs_commit_info

It is COMMIT that is handled the most differently between
the paged and direct paths.  Create a structure that encapsulates
everything either path needs to know about the commit state.

We could use void to hide some of the layout driver stuff, but
Trond suggests pulling it out to ensure type checking, given the
huge changes being made, and the fact that it doesn't interfere
with other drivers.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create nfs_generic_commit_list
Fred Isaman [Fri, 20 Apr 2012 18:47:52 +0000 (14:47 -0400)]
NFS: create nfs_generic_commit_list

Simple refactoring.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: rewrite directio read to use async coalesce code
Fred Isaman [Fri, 20 Apr 2012 18:47:51 +0000 (14:47 -0400)]
NFS: rewrite directio read to use async coalesce code

This also has the advantage that it allows directio to use pnfs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: prepare coalesce testing for directio
Fred Isaman [Fri, 20 Apr 2012 23:55:31 +0000 (19:55 -0400)]
NFS: prepare coalesce testing for directio

The coalesce code made assumptions that will no longer be true once
non-page aligned io occurs.  This introduces no change in
current behavior, but allows for more general situations to come.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: remove unused wb_complete field from struct nfs_page
Fred Isaman [Fri, 20 Apr 2012 18:47:49 +0000 (14:47 -0400)]
NFS: remove unused wb_complete field from struct nfs_page

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create completion structure to pass into page_init functions
Fred Isaman [Fri, 20 Apr 2012 18:47:48 +0000 (14:47 -0400)]
NFS: create completion structure to pass into page_init functions

Factors out the code that will need to change when directio
starts using these code paths.  This will allow directio to use
the generic pagein and flush routines.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: merge _full and _partial write rpc_ops
Fred Isaman [Fri, 20 Apr 2012 18:47:47 +0000 (14:47 -0400)]
NFS: merge _full and _partial write rpc_ops

Decouple nfs_pgio_header and nfs_write_data, and have (possibly
multiple) nfs_write_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_write_header as a way to preallocate a single
nfs_write_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes a bug in pnfs_ld_handle_write_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
the replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: merge _full and _partial read rpc_ops
Fred Isaman [Fri, 20 Apr 2012 18:47:46 +0000 (14:47 -0400)]
NFS: merge _full and _partial read rpc_ops

Decouple nfs_pgio_header and nfs_read_data, and have (possibly
multiple) nfs_read_datas each take a refcount on nfs_pgio_header.

For the moment keeps nfs_read_header as a way to preallocate a single
nfs_read_data with the nfs_pgio_header.  The code doesn't need this,
and would be prettier without, but given the amount of churn I am
already introducing I didn't want to play with tuning new mempools.

This also fixes a bug in pnfs_ld_handle_read_error.  In the case of
desc->pg_bsize < PAGE_CACHE_SIZE, the pages list was empty, causing
the replay attempt to do nothing.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create struct nfs_page_array
Fred Isaman [Fri, 20 Apr 2012 18:47:45 +0000 (14:47 -0400)]
NFS: create struct nfs_page_array

Both nfs_read_data and nfs_write_data devote several fields which
can be combined into a single shared struct.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: create common nfs_pgio_header for both read and write
Fred Isaman [Fri, 20 Apr 2012 18:47:44 +0000 (14:47 -0400)]
NFS: create common nfs_pgio_header for both read and write

In order to avoid duplicating all the data in nfs_read_data whenever we
split it up into multiple RPC calls (either due to a short read result
or due to rsize < PAGE_SIZE), we split out the bits that are the same
per RPC call into a separate "header" structure.

The goal this patch moves towards is to have a single header
refcounted by several rpc_data structures.  Thus, we always want to refer
from rpc_data to the header, and not the other way.  This patch comes
close to that ideal, but the directio code currently needs some
special casing, isolated in the nfs_direct_[read_write]hdr_release()
functions.  This will be dealt with in a future patch.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
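
The ownership rule stated above (each data structure points at the header and holds a reference on it, never the reverse) can be sketched as follows. This is a userspace illustration with hypothetical names (struct io_header, struct io_data and their helpers), not the kernel's nfs_pgio_header code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared header: the state that is common to all RPC calls. */
struct io_header {
    int refcount;
};

/* Hypothetical per-RPC data: refers to the header, not the other way. */
struct io_data {
    struct io_header *hdr;
};

/* Allocate a header with one reference owned by the creator. */
static struct io_header *header_alloc(void)
{
    struct io_header *hdr = calloc(1, sizeof(*hdr));
    if (hdr)
        hdr->refcount = 1;
    return hdr;
}

/* Drop one reference. Returns 1 if the header was freed, 0 otherwise. */
static int header_put(struct io_header *hdr)
{
    assert(hdr->refcount > 0);
    if (--hdr->refcount == 0) {
        free(hdr);
        return 1;
    }
    return 0;
}

/* Each per-RPC data structure takes its own reference on the header. */
static struct io_data *data_alloc(struct io_header *hdr)
{
    struct io_data *d = malloc(sizeof(*d));
    if (!d)
        return NULL;
    hdr->refcount++;
    d->hdr = hdr;
    return d;
}

/* Free the data and drop its header reference; returns header_put(). */
static int data_free(struct io_data *d)
{
    struct io_header *hdr = d->hdr;
    free(d);
    return header_put(hdr);
}
```

Because the header never points back at its data structures, it can be freed by whichever RPC call completes last without walking any list.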
12 years agoNFS: use req_offset where appropriate
Fred Isaman [Fri, 20 Apr 2012 18:47:43 +0000 (14:47 -0400)]
NFS: use req_offset where appropriate

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: remove unnecessary casts of void pointers in nfs4filelayout.c
Fred Isaman [Fri, 20 Apr 2012 18:47:42 +0000 (14:47 -0400)]
NFS: remove unnecessary casts of void pointers in nfs4filelayout.c

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: reverse arg order in nfs_initiate_[read|write]
Fred Isaman [Fri, 20 Apr 2012 18:47:41 +0000 (14:47 -0400)]
NFS: reverse arg order in nfs_initiate_[read|write]

Make it consistent with nfs_initiate_commit.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: dprintks in directio code were referencing task after put
Fred Isaman [Fri, 20 Apr 2012 18:47:40 +0000 (14:47 -0400)]
NFS: dprintks in directio code were referencing task after put

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: add a struct nfs_commit_data to replace nfs_write_data in commits
Fred Isaman [Fri, 20 Apr 2012 18:47:39 +0000 (14:47 -0400)]
NFS: add a struct nfs_commit_data to replace nfs_write_data in commits

Commits don't need the vectors of pages, etc. that writes do. Split out
a separate structure for the commit operation.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS4.1: Add lseg to struct nfs4_fl_commit_bucket
Fred Isaman [Fri, 20 Apr 2012 18:47:38 +0000 (14:47 -0400)]
NFS4.1: Add lseg to struct nfs4_fl_commit_bucket

Also create a commit_info structure to hold the bucket array and push
it up from the lseg to the layout where it really belongs.

While we are at it, fix a refcounting bug due to an (incorrect)
implicit assumption that filelayout_scan_ds_commit_list always
completely emptied the src list.

This clarifies refcounting, removes the ugly find_only_write_lseg
functions, and pushes the file layout commit code along on the path to
supporting multiple lsegs.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS4.1: make pnfs_ld_[read|write]_done consistent
Fred Isaman [Fri, 20 Apr 2012 18:47:37 +0000 (14:47 -0400)]
NFS4.1: make pnfs_ld_[read|write]_done consistent

The two functions had diverged quite a bit, with the write function
being a bit more robust than the read.

However, these still break badly in the desc->pg_bsize < PAGE_CACHE_SIZE case,
as then there is nothing hanging on the data->pages list, and the resend
ends up doing nothing.  This will be fixed in a patch later in the series.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: grab open context in direct read
Fred Isaman [Fri, 20 Apr 2012 18:47:36 +0000 (14:47 -0400)]
NFS: grab open context in direct read

Signed-off-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Remove unused function nfs_lookup_with_sec()
Bryan Schumaker [Fri, 27 Apr 2012 17:27:43 +0000 (13:27 -0400)]
NFS: Remove unused function nfs_lookup_with_sec()

This fixes a compiler warning.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Honor the authflavor set in the clone mount data
Bryan Schumaker [Fri, 27 Apr 2012 17:27:42 +0000 (13:27 -0400)]
NFS: Honor the authflavor set in the clone mount data

The authflavor is set in an nfs_clone_mount structure and passed to the
xdev_mount() functions where it was promptly ignored.  Instead, use it
to initialize an rpc_clnt for the cloned server.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Fix following referral mount points with different security
Bryan Schumaker [Fri, 27 Apr 2012 17:27:41 +0000 (13:27 -0400)]
NFS: Fix following referral mount points with different security

I create a new proc_lookup_mountpoint() to use when submounting an NFS
v4 share.  This function returns an rpc_clnt to use for performing an
fs_locations() call on a referral's mountpoint.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Do secinfo as part of lookup
Bryan Schumaker [Fri, 27 Apr 2012 17:27:40 +0000 (13:27 -0400)]
NFS: Do secinfo as part of lookup

Whenever lookup sees wrongsec, do a secinfo and retry the lookup to find
attributes of the file or directory, such as "is this a referral
mountpoint?".  This also allows me to remove handling -NFS4ERR_WRONGSEC
as part of getattr xdr decoding.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Handle exceptions coming out of nfs4_proc_fs_locations()
Bryan Schumaker [Fri, 27 Apr 2012 17:27:39 +0000 (13:27 -0400)]
NFS: Handle exceptions coming out of nfs4_proc_fs_locations()

We don't want to return -NFS4ERR_WRONGSEC to the VFS because it could
cause the kernel to oops.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFS: Fix SECINFO_NO_NAME
Bryan Schumaker [Fri, 27 Apr 2012 17:27:38 +0000 (13:27 -0400)]
NFS: Fix SECINFO_NO_NAME

I was using the same decoder function for SECINFO and SECINFO_NO_NAME,
so it was returning an error when it tried to decode an OP_SECINFO_NO_NAME
header as OP_SECINFO.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: traverse clients tree on PipeFS event
Stanislav Kinsbursky [Fri, 27 Apr 2012 09:00:17 +0000 (13:00 +0400)]
SUNRPC: traverse clients tree on PipeFS event

v2: recursion was replaced by loop

If a client is a clone, then its parent may not be in the list,
but the parent's PipeFS dentries still have to be created and destroyed.

Note: this introduces an event skip helper for clients.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: set per-net PipeFS superblock before notification
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:19:56 +0000 (18:19 +0400)]
SUNRPC: set per-net PipeFS superblock before notification

There can be a case where, on a MOUNT event, the RPC client (after its dentries
were created) is no longer held by anyone except the notification callback,
i.e. on release this client will be destroyed, and its dentries have to be
destroyed as well. That in turn requires the per-net PipeFS superblock to be set.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: skip clients with program without PipeFS entries
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:19:18 +0000 (18:19 +0400)]
SUNRPC: skip clients with program without PipeFS entries

1) This is sane.
2) Otherwise there will be soft lockup:

do {
rpc_get_client_for_event (clnt->cl_dentry == NULL ==> choose)
__rpc_pipefs_event (clnt->cl_program->pipe_dir_name == NULL ==> return)
} while (1)
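
The loop above can be modelled in user space. This is a simplified sketch
with made-up structure and function names, assuming the only state that
matters is whether a client has a PipeFS directory name and whether its
dentry exists yet:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for an RPC client; field names are illustrative. */
struct client {
	const char *pipe_dir_name;	/* NULL: program has no PipeFS directory */
	bool has_dentry;		/* true once the directory was created */
};

/*
 * Pick the next client that still needs a dentry, skipping those that can
 * never get one. Without the pipe_dir_name test, such a client would be
 * chosen again and again because the handler never changes its state.
 */
static struct client *next_client_for_event(struct client *c, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (c[i].pipe_dir_name == NULL)
			continue;	/* nothing to create: skip */
		if (!c[i].has_dentry)
			return &c[i];
	}
	return NULL;
}

/* Handle a "mount" event; returns the number of dentries created. */
static int handle_mount_event(struct client *c, size_t n)
{
	struct client *clnt;
	int created = 0;

	while ((clnt = next_client_for_event(c, n)) != NULL) {
		clnt->has_dentry = true;
		created++;
	}
	return created;
}
```

With the skip in place the while loop terminates, because every client it
returns makes progress.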

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoSUNRPC: skip dead but not buried clients on PipeFS events
Stanislav Kinsbursky [Fri, 20 Apr 2012 14:11:02 +0000 (18:11 +0400)]
SUNRPC: skip dead but not buried clients on PipeFS events

These clients can't be safely dereferenced if their counter is 0.

Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoAvoid beyond bounds copy while caching ACL
Sachin Prabhu [Tue, 17 Apr 2012 13:36:40 +0000 (14:36 +0100)]
Avoid beyond bounds copy while caching ACL

When attempting to cache ACLs returned from the server, if the bitmap
size + the ACL size is greater than a PAGE_SIZE but the ACL size itself
is smaller than a PAGE_SIZE, we can read past the buffer page boundary.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoAvoid reading past buffer when calling GETACL
Sachin Prabhu [Tue, 17 Apr 2012 13:35:39 +0000 (14:35 +0100)]
Avoid reading past buffer when calling GETACL

Bug noticed in commit
bf118a342f10dafe44b14451a1392c3254629a1f

When calling GETACL, if the combined size of the bitmap array, the length
attribute and the ACL returned by the server is greater than the
allocated buffer (args.acl_len), we can Oops with a General Protection
fault at _copy_from_pages() when we attempt to read past the pages
allocated.

This patch allocates an extra PAGE for the bitmap and checks to see that
the bitmap + attribute_length + ACLs don't exceed the buffer space
allocated to it.
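
The kind of check described here can be sketched as follows. This is an
illustration only: the function and parameter names are hypothetical and
the kernel patch is structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: the reply holds a bitmap, an attribute-length word
 * and the ACL itself, back to back. Return true only if all three fit in
 * a buffer of buf_len bytes. Subtracting from the remaining space, rather
 * than summing the lengths, avoids wraparound on a huge server-supplied
 * length.
 */
static bool acl_reply_fits(size_t bitmap_len, size_t attrlen_len,
			   size_t acl_len, size_t buf_len)
{
	size_t remaining = buf_len;

	if (bitmap_len > remaining)
		return false;
	remaining -= bitmap_len;
	if (attrlen_len > remaining)
		return false;
	remaining -= attrlen_len;
	return acl_len <= remaining;
}
```

An ACL that is itself smaller than one page can still fail the check once
the bitmap and length word are counted, which is the case the bug missed.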

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reported-by: Jian Li <jiali@redhat.com>
[Trond: Fixed a size_t vs unsigned int printk() warning]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agofix page number calculation bug for block layout decode buffer
Jim Rees [Tue, 10 Apr 2012 02:33:39 +0000 (22:33 -0400)]
fix page number calculation bug for block layout decode buffer

Signed-off-by: Jim Rees <rees@umich.edu>
Suggested-by: Andy Adamson <andros@netapp.com>
Suggested-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoNFSv4.1 fix page number calculation bug for filelayout decode buffers
Andy Adamson [Sat, 14 Apr 2012 07:56:35 +0000 (03:56 -0400)]
NFSv4.1 fix page number calculation bug for filelayout decode buffers

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agopnfs-obj: Remove unused variable from objlayout_get_deviceinfo()
Sachin Bhamare [Fri, 30 Mar 2012 21:29:59 +0000 (14:29 -0700)]
pnfs-obj: Remove unused variable from objlayout_get_deviceinfo()

Local variable 'sb' was not being used in objlayout_get_deviceinfo().

Signed-off-by: Sachin Bhamare <sbhamare@panasas.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agonfs4: fix referrals on mounts that use IPv6 addrs
Weston Andros Adamson [Tue, 24 Apr 2012 20:50:37 +0000 (16:50 -0400)]
nfs4: fix referrals on mounts that use IPv6 addrs

All referrals (IPv4 addr, IPv6 addr, and DNS) are broken on mounts of
IPv6 addresses, because the validation code uses a path that is parsed
from the dev_name ("<server>:<path>") by splitting on the first colon, and
colons also appear in IPv6 addrs.
This patch ignores colons within IPv6 addresses that are enclosed in '[' and ']'.
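
A user-space sketch of the parsing rule, assuming a bracketed IPv6 literal
can only appear at the start of dev_name. The helper names are hypothetical
and this is not the kernel's implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return a pointer to the ':' that separates
 * "<server>" from "<path>" in dev_name, or NULL if there is none.
 * A server that starts with '[' is treated as a bracketed IPv6 literal,
 * so colons before the closing ']' are skipped.
 */
static const char *find_path_separator(const char *dev_name)
{
	const char *p = dev_name;

	if (*p == '[') {
		const char *close = strchr(p, ']');

		if (close == NULL)
			return NULL;	/* unterminated bracket: malformed */
		p = close + 1;
	}
	return strchr(p, ':');
}

/* Return the export path part of dev_name, or NULL if it cannot be found. */
static const char *export_path(const char *dev_name)
{
	const char *sep = find_path_separator(dev_name);

	return sep ? sep + 1 : NULL;
}
```

For "[fe80::1]:/export" this yields "/export", whereas splitting on the
first colon would have yielded ":1]:/export".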

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
12 years agoMerge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Thu, 26 Apr 2012 04:38:44 +0000 (21:38 -0700)]
Merge tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 - Fix NFSv4 infinite loops on open(O_TRUNC)
 - Fix an Oops and an infinite loop in the NFSv4 flock code
 - Don't register the PipeFS filesystem until it has been set up
 - Fix an Oops in nfs_try_to_update_request
 - Don't reuse NFSv4 open owners: fixes a bad sequence id storm.

* tag 'nfs-for-3.4-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Keep dropped state owners on the LRU list for a while
  NFSv4: Ensure that we don't drop a state owner more than once
  NFSv4: Ensure we do not reuse open owner names
  nfs: Enclose hostname in brackets when needed in nfs_do_root_mount
  NFS: put open context on error in nfs_flush_multi
  NFS: put open context on error in nfs_pagein_multi
  NFSv4: Fix open(O_TRUNC) and ftruncate() error handling
  NFSv4: Ensure that we check lock exclusive/shared type against open modes
  NFSv4: Ensure that the LOCK code sets exception->inode
  NFS: check for req==NULL in nfs_try_to_update_request cleanup
  SUNRPC: register PipeFS file system after pernet sybsystem

12 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 26 Apr 2012 04:29:26 +0000 (21:29 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from H. Peter Anvin.

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x32, siginfo: Provide proper overrides for x32 siginfo_t
  asm-generic: Allow overriding clock_t and add attributes to siginfo_t
  x32: Check __ILP32__ instead of __LP64__ for x32
  x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler
  ACPI: Convert wake_sleep_flags to a value instead of function
  x86, apic: APIC code touches invalid MSR on P5 class machines
  i387: ptrace breaks the lazy-fpu-restore logic
  x86/platform: Remove incorrect error message in x86_default_fixup_cpu_id()
  x86, efi: Add dedicated EFI stub entry point
  x86/amd: Remove broken links from comment and kernel message
  x86, microcode: Ensure that module is only loaded on supported AMD CPUs
  x86, microcode: Fix sysfs warning during module unload on unsupported CPUs

12 years agoMerge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86
Linus Torvalds [Thu, 26 Apr 2012 04:28:10 +0000 (21:28 -0700)]
Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86

Pull x86 platform driver fixes from Matthew Garrett:
 "One annoyance fix (make intel_ips stop complaining unnecessarily) and
  one oops fix (unterminated list in dell-laptop).  Both have been in
  -next for a while with no complaints."

* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
  dell-laptop: Terminate quirks list properly
  intel_ips: Hush the i915 symbols message

12 years agomm: memcg: move pc lookup point to commit_charge()
Johannes Weiner [Tue, 24 Apr 2012 18:22:33 +0000 (20:22 +0200)]
mm: memcg: move pc lookup point to commit_charge()

None of the callsites actually need the page_cgroup descriptor
themselves, so just pass the page and do the look up in there.

We already had two bugs (6568d4a 'mm: memcg: update the correct soft
limit tree during migration' and 'memcg: fix Bad page state after
replace_page_cache') where the passed page and pc were not referring
to the same page frame.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agomm: nobootmem: Correct alloc_bootmem semantics.
David Miller [Wed, 25 Apr 2012 20:10:50 +0000 (16:10 -0400)]
mm: nobootmem: Correct alloc_bootmem semantics.

The comments above __alloc_bootmem_node() claim that the code will
first try the allocation using 'goal' and if that fails it will
try again but with the 'goal' requirement dropped.

Unfortunately, this is not what the code does, so fix it to do so.

This is important for nobootmem conversions to architectures such
as sparc where MAX_DMA_ADDRESS is infinity.

On such architectures all of the allocations done by generic spots,
such as the sparse-vmemmap implementation, will pass in:

__pa(MAX_DMA_ADDRESS)

as the goal, and with the limit given as "-1" this will always fail
unless we add the appropriate fallback logic here.
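
The fallback can be illustrated with a toy allocator. Everything below is
a hypothetical model (a bump allocator over a fixed arena), not the
nobootmem code; it only shows the "try with goal, then retry without it"
control flow.

```c
#include <stddef.h>

/* Toy stand-in for a physical memory arena; all names are hypothetical. */
struct arena {
	size_t next_free;	/* lowest unallocated offset */
	size_t size;		/* total bytes in the arena */
};

#define ALLOC_FAILED ((size_t)-1)

/* Allocate 'len' bytes at or above 'goal'; fail if that does not fit. */
static size_t alloc_at_or_above(struct arena *a, size_t len, size_t goal)
{
	size_t start = a->next_free > goal ? a->next_free : goal;

	if (start > a->size || len > a->size - start)
		return ALLOC_FAILED;
	a->next_free = start + len;
	return start;
}

/* Try with the goal first, then retry with the goal dropped. */
static size_t alloc_with_fallback(struct arena *a, size_t len, size_t goal)
{
	size_t off = alloc_at_or_above(a, len, goal);

	if (off == ALLOC_FAILED && goal != 0)
		off = alloc_at_or_above(a, len, 0);
	return off;
}
```

A goal far beyond the end of the arena, comparable to an "infinite"
MAX_DMA_ADDRESS, fails the first attempt and succeeds on the retry.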

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes
Linus Torvalds [Tue, 24 Apr 2012 15:22:25 +0000 (08:22 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes

Pull gfs2 fixes from Steven Whitehouse.

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
  GFS2: Instruct DLM to avoid queue convert slowdown

12 years agoMerge tag 'hsi_fixes_for_3.4' of git://gitorious.org/kernel-hsi/kernel-hsi
Linus Torvalds [Tue, 24 Apr 2012 15:20:28 +0000 (08:20 -0700)]
Merge tag 'hsi_fixes_for_3.4' of git://gitorious.org/kernel-hsi/kernel-hsi

Pull HSI fixes and ABI documentation from Carlos Chinea

* tag 'hsi_fixes_for_3.4' of git://gitorious.org/kernel-hsi/kernel-hsi:
  HSI: Add HSI ABI documentation
  HSI: hsi_char: Remove max_data_size from sysfs
  HSI: hsi: Rework hsi_event interface
  HSI: hsi: Remove controllers and ports from the bus
  HSI: hsi: Fix error path cleanup on client registration
  HSI: hsi: Rework hsi_controller release

12 years agoGFS2: Instruct DLM to avoid queue convert slowdown
Bob Peterson [Tue, 10 Apr 2012 18:45:24 +0000 (14:45 -0400)]
GFS2: Instruct DLM to avoid queue convert slowdown

This patch instructs DLM to prevent an "in place" conversion, where the
lock just stays on the granted queue, and instead forces the conversion to
the back of the convert queue. This is done on upward conversions only.

This is useful in cases where, for example, a lock is frequently needed in
PR on one node, but another node needs it temporarily in EX to update it.
This may happen, for example, when the rindex is being updated by gfs2_grow.
The gfs2_grow needs to have the lock in EX, but the other nodes need to
re-read it to retrieve the updates. The glock is already granted in PR on
the non-growing nodes, so this prevents them from continually re-granting
the lock in PR, and forces the EX from gfs2_grow to go through.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
12 years agoMerge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso...
Linus Torvalds [Tue, 24 Apr 2012 02:52:00 +0000 (19:52 -0700)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bug fixes from Ted Ts'o:
 "These are two low-risk bug fixes for ext4, fixing a compile warning
  and a potential deadlock."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  super.c: unused variable warning without CONFIG_QUOTA
  jbd2: use GFP_NOFS for blkdev_issue_flush

12 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux...
Linus Torvalds [Tue, 24 Apr 2012 02:50:48 +0000 (19:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel

Pull Hexagon fixes from Richard Kuo:
 "It's mostly compile fixes and the Hexagon portion of a CPU hotplug
  patch set."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel:
  hexagon: add missing cpu.h include
  hexagon/CPU hotplug: Add missing call to notify_cpu_starting()
  hexagon:  use renamed tick_nohz_idle_* functions
  Hexagon: misc compile warning/error cleanup due to missing headers

12 years agoMerge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Linus Torvalds [Tue, 24 Apr 2012 02:45:19 +0000 (19:45 -0700)]
Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild

Pull build system failure fix from Michal Marek:
 "This fixes build failure with newer gcc that adds some internal
  symbols that end in "__mod_*_device_table", but are not actually the
  tables themselves."

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  Fix modpost failures in fedora 17

12 years agosuper.c: unused variable warning without CONFIG_QUOTA
Eldad Zack [Sun, 22 Apr 2012 15:50:52 +0000 (17:50 +0200)]
super.c: unused variable warning without CONFIG_QUOTA

sb info is only checked with quota support.

fs/ext4/super.c: In function ‘parse_options’:
fs/ext4/super.c:1600:23: warning: unused variable ‘sbi’ [-Wunused-variable]

Signed-off-by: Eldad Zack <eldad@fogrefinery.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
12 years agojbd2: use GFP_NOFS for blkdev_issue_flush
Shaohua Li [Fri, 13 Apr 2012 02:27:35 +0000 (10:27 +0800)]
jbd2: use GFP_NOFS for blkdev_issue_flush

The flush request is issued in the transaction commit code path, so using
GFP_KERNEL to allocate memory for the flush request bio looks like it falls
into the classic deadlock issue.  I saw btrfs and dm get it right, but ext4,
xfs and md are using GFP_KERNEL.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
12 years agoMerge tag 'md-3.4-fixes' of git://neil.brown.name/md
Linus Torvalds [Tue, 24 Apr 2012 01:25:01 +0000 (18:25 -0700)]
Merge tag 'md-3.4-fixes' of git://neil.brown.name/md

Pull a few more md bug fixes from NeilBrown:
 "2 are tagged for -stable, one being for a fairly serious bug that can
  corrupt metadata and make it hard to recover an array.  The other is
  for a more recent regression since 3.3"

* tag 'md-3.4-fixes' of git://neil.brown.name/md:
  md: fix possible corruption of array metadata on shutdown.
  md: don't call ->add_disk unless there is good reason.
  DM RAID: Use safe version of rdev_for_each

12 years agoMerge tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland...
Linus Torvalds [Tue, 24 Apr 2012 01:22:42 +0000 (18:22 -0700)]
Merge tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm fixes from David Teigland:
 "This includes one short patch fixing the behavior of the QUECVT flag,
  which the gfs2 folks are waiting on."

* tag 'dlm-fixes-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix QUECVT when convert queue is empty

12 years agomm: fix s390 BUG by __set_page_dirty_no_writeback on swap
Hugh Dickins [Mon, 23 Apr 2012 18:14:50 +0000 (11:14 -0700)]
mm: fix s390 BUG by __set_page_dirty_no_writeback on swap

Mel reports a BUG_ON(slot == NULL) in radix_tree_tag_set() on s390
3.0.13: called from __set_page_dirty_nobuffers() when page_remove_rmap()
tries to transfer dirty flag from s390 storage key to struct page and
radix_tree.

That would be because of reclaim's shrink_page_list() calling
add_to_swap() on this page at the same time: first PageSwapCache is set
(causing page_mapping(page) to appear as &swapper_space), then
page->private set, then tree_lock taken, then page inserted into
radix_tree - so there's an interval before taking the lock when the
radix_tree slot is empty.

We could fix this by moving __add_to_swap_cache()'s spin_lock_irq up
before the SetPageSwapCache.  But a better fix is simply to do what's
five years overdue: Ken Chen introduced __set_page_dirty_no_writeback()
(if !PageDirty TestSetPageDirty) for tmpfs to skip all the radix_tree
overhead, and swap is just the same - it ignores the radix_tree tag, and
does not participate in dirty page accounting, so should be using
__set_page_dirty_no_writeback() too.

s390 testing now confirms that this does indeed fix the problem.

Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ken Chen <kenchen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
12 years agox32, siginfo: Provide proper overrides for x32 siginfo_t
H. Peter Anvin [Mon, 23 Apr 2012 23:34:12 +0000 (16:34 -0700)]
x32, siginfo: Provide proper overrides for x32 siginfo_t

Provide the proper override macros for x32 siginfo_t.  The combination
of a special type here and an overall alignment constraint actually
ends up with all the types being properly aligned, but the hack is
needed to keep the substructures inside siginfo_t from adding padding.

Note: use __attribute__((aligned())) since __aligned() is not exported
to user space.

[ v2: fix stray semicolon ]

Reported-by: H.J. Lu <hjl.rools@gmail.com>
Cc: Bruce J. Beare <bruce.j.beare@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/CAMe9rOqF6Kh6-NK7oP0Fpzkd4SBAWU%2BG53hwBbSD4iA2UzyxuA@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
12 years agomd: fix possible corruption of array metadata on shutdown.
NeilBrown [Tue, 24 Apr 2012 00:23:16 +0000 (10:23 +1000)]
md: fix possible corruption of array metadata on shutdown.

commit c744a65c1e2d59acc54333ce8
  md: don't set md arrays to readonly on shutdown.

removed the possibility of a 'BUG' when data is written to an array
that has just been switched to read-only, but also introduced the
possibility that the array metadata could be corrupted.

If, when md_notify_reboot gets the mddev lock, the array is
in a state where it is assembled but hasn't been started (as can
happen if the personality module is not available, or in other unusual
situations), then incorrect metadata will be written out making it
impossible to re-assemble the array.

So only call __md_stop_writes() if the array has actually been
activated.

This patch is needed for any stable kernel which has had the above
commit applied.

Cc: stable@vger.kernel.org
Reported-by: Christoph Nelles <evilazrael@evilazrael.de>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agomd: don't call ->add_disk unless there is good reason.
NeilBrown [Tue, 24 Apr 2012 00:23:14 +0000 (10:23 +1000)]
md: don't call ->add_disk unless there is good reason.

Commit 7bfec5f35c68121e7b18

   md/raid5: If there is a spare and a want_replacement device, start replacement.

caused md_check_recovery to call ->add_disk much more often.
Instead of only when the array is degraded, it is now called whenever
md_check_recovery finds anything useful to do, which includes
updating the metadata for clean<->dirty transition.
This causes unnecessary work, and causes info messages from ->add_disk
to be reported much too often.

So refine md_check_recovery to only do any actual recovery checking
(including ->add_disk) if MD_RECOVERY_NEEDED is set.

This fix is suitable for 3.3.y:

Cc: stable@vger.kernel.org
Reported-by: Jan Ceuleers <jan.ceuleers@computer.org>
Signed-off-by: NeilBrown <neilb@suse.de>
12 years agoDM RAID: Use safe version of rdev_for_each
Jonathan Brassow [Tue, 24 Apr 2012 00:23:13 +0000 (10:23 +1000)]
DM RAID: Use safe version of rdev_for_each

Fix segfault caused by using rdev_for_each instead of rdev_for_each_safe

Commit dafb20fa34320a472deb7442f25a0c086e0feb33 mistakenly replaced a safe
iterator with an unsafe one when making some macro changes.
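
Why the safe variant matters can be shown with a plain linked list. This is
a generic sketch with a hypothetical node type, not the md rdev list or the
kernel's list macros: the successor is read before the current node may be
freed.

```c
#include <stdlib.h>

/* Hypothetical singly linked list node; not the md rdev structure. */
struct node {
	int value;
	struct node *next;
};

/* Push a new node on the front of the list; returns the new head. */
static struct node *push(struct node *head, int value)
{
	struct node *n = malloc(sizeof(*n));

	if (n == NULL)
		return head;
	n->value = value;
	n->next = head;
	return n;
}

/*
 * Free every node whose value is odd and return how many were removed.
 * The successor is saved in 'next' before the node can be freed, which is
 * the point of a "safe" iterator: an unsafe loop would read n->next from
 * freed memory.
 */
static int remove_odd(struct node **head)
{
	struct node **link = head;
	struct node *n, *next;
	int removed = 0;

	for (n = *head; n != NULL; n = next) {
		next = n->next;		/* fetch before n may be freed */
		if (n->value % 2 != 0) {
			*link = next;
			free(n);
			removed++;
		} else {
			link = &n->next;
		}
	}
	return removed;
}
```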

Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>