Jeff Layton [Wed, 12 Dec 2012 16:38:44 +0000 (11:38 -0500)]
nfs: don't zero out the rest of the page if we hit the EOF on a DIO READ
Eryu provided a test program that would segfault when attempting to read
past the EOF on file that was opened O_DIRECT. The buffer given to the
read() call was on the stack, and when he attempted to read past it it
would scribble over the rest of the stack page.
If we hit the end of the file on a DIO READ request, then we don't want
to zero out the rest of the buffer. These aren't pagecache pages after
all, and there's no guarantee that the buffers that were passed in
represent entire pages.
Cc: <stable@vger.kernel.org> # v3.5+ Cc: Fred Isaman <iisaman@netapp.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Tue, 11 Dec 2012 17:10:14 +0000 (12:10 -0500)]
NFSv4.1: Be conservative about the client highest slotid
If the server sends us a target that looks like an outlier, but
is lower than the existing target, then respect it anyway.
However defer actually updating the generation counter until
we get a target that doesn't look like an outlier.
Trond Myklebust [Tue, 11 Dec 2012 15:31:12 +0000 (10:31 -0500)]
NFSv4.1: Handle NFS4ERR_BADSLOT errors correctly
Most (all) NFS4ERR_BADSLOT errors are due to the client failing to
respect the server's sr_highest_slotid limit. This mainly happens
due to reordered RPC requests.
The way to handle it is simply to drop the slot that we're using,
and retry using the new highest_slotid limits.
While there's no locking involved here, the operations are serialized,
so CTO should prevent corruption.
The first write to the file is fine and writes 4 bytes. The file is then
extended on the server. When it's reopened a GETATTR is issued and the
size change is noticed. This causes NFS_INO_INVALID_DATA to be set on
the file. Because the file is opened for write only,
nfs_want_read_modify_write() returns 0 to nfs_write_begin().
nfs_updatepage then calls nfs_write_pageuptodate() to see if it should
extend the nfs_page to cover the whole page. NFS_INO_INVALID_DATA is
still set on the file at that point, but that flag is ignored and
nfs_pageuptodate erroneously extends the write to cover the whole page,
with the write done on the server side filled in with zeroes.
This patch just has that function check for NFS_INO_INVALID_DATA in
addition to NFS_INO_REVAL_PAGECACHE. This fixes the bug, but looking
over the code, I wonder if we might have a similar bug in
nfs_revalidate_size(). The difference between those two flags is very
subtle, so it seems like we ought to be checking for
NFS_INO_INVALID_DATA in most of the places that we look for
NFS_INO_REVAL_PAGECACHE.
I believe this is regression introduced by commit 8d197a568. The code
did check for NFS_INO_INVALID_DATA prior to that patch.
Sven Wegener [Sat, 8 Dec 2012 14:30:18 +0000 (15:30 +0100)]
NFSv4: Check for buffer length in __nfs4_get_acl_uncached
Commit 1f1ea6c "NFSv4: Fix buffer overflow checking in
__nfs4_get_acl_uncached" accidently dropped the checking for too small
result buffer length.
If someone uses getxattr on "system.nfs4_acl" on an NFSv4 mount
supporting ACLs, the ACL has not been cached and the buffer suplied is
too short, we still copy the complete ACL, resulting in kernel and user
space memory corruption.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Trond Myklebust [Sun, 2 Dec 2012 18:54:59 +0000 (13:54 -0500)]
NFSv4.1: Try to eliminate outliers when updating target_highest_slotid
Look for sudden changes in the first and second derivatives in order
to eliminate outlier changes to target_highest_slotid (which are
due to out-of-order RPC replies).
Currently, the priority queues attempt to be 'fair' to lower priority
tasks by scheduling them after a certain number of higher priority tasks
have run. The problem is that both the transport send queue and
the NFSv4.1 session slot queue have strong ordering requirements.
This patch therefore removes the fairness code in favour of strong
ordering of task priorities.
Trond Myklebust [Thu, 29 Nov 2012 22:27:47 +0000 (17:27 -0500)]
NFSv4.1: Ensure smooth handover of slots from one task to the next waiting
Currently, we see a lot of bouncing for the value of highest_used_slotid
due to the fact that slots are getting freed, instead of getting instantly
transmitted to the next waiting task.
Trond Myklebust [Mon, 29 Oct 2012 23:02:20 +0000 (19:02 -0400)]
NFSv4: Clean up handling of privileged operations
Privileged rpc calls are those that are run by the state recovery thread,
in cases where we're trying to recover the system after a server reboot
or a network partition. In those cases, we want to fence off all other
rpc calls (see nfs4_begin_drain_session()) so that they don't end up
using stateids or clientids that are in the process of being recovered.
Prior to this patch, we had to set up special callback functions in
order to declare an rpc call as being privileged.
By adding a new field to the sequence arguments, this patch simplifies
things considerably, and allows us to declare the rpc call as privileged
before it is run.
Trond Myklebust [Thu, 1 Nov 2012 21:07:07 +0000 (17:07 -0400)]
NFSv4.1: Remove the 'FIFO' behaviour for nfs41_setup_sequence
It is more important to preserve the task priority behaviour, which ensures
that things like reclaim writes take precedence over background and kupdate
writes.
Trond Myklebust [Tue, 23 Oct 2012 00:28:44 +0000 (20:28 -0400)]
NFSv4.1: Simplify the sequence setup
Nobody calls nfs4_setup_sequence or nfs41_setup_sequence without
also calling rpc_call_start() on success. This commit therefore
folds the rpc_call_start call into nfs41_setup_sequence().
Trond Myklebust [Mon, 26 Nov 2012 21:16:54 +0000 (16:16 -0500)]
NFSv4.1: Ping server when our session table limits are too high
If the server requests a lower target_highest_slotid, then ensure
that we ping it with at least one RPC call containing an
appropriate SEQUENCE op. This ensures that the server won't need to
send a recall callback in order to shrink the slot table.
Trond Myklebust [Thu, 22 Nov 2012 03:34:45 +0000 (22:34 -0500)]
NFSv4.1: Set the maximum slot table size to 1024 slots
This means that we end up statically allocating 128 bytes for the
bitmap on each slot table.
For a server that supports 1MB write and read I/O sizes this means
that we can completely fill the maximum 1GB TCP send/receive
windows.
Trond Myklebust [Mon, 26 Nov 2012 18:13:29 +0000 (13:13 -0500)]
NFSv4: Move nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease
nfs4_wait_clnt_recover and nfs4_client_recover_expired_lease are both
generic state related functions. As such, they belong in nfs4state.c,
and not nfs4proc.c
Trond Myklebust [Fri, 23 Nov 2012 18:09:38 +0000 (13:09 -0500)]
NFSv4.1: Clean up session draining
Coalesce nfs4_check_drain_bc_complete and nfs4_check_drain_fc_complete
into a single function that can be called when the slot table is known
to be empty, then change nfs4_callback_free_slot() and nfs4_free_slot()
to use it.
Trond Myklebust [Thu, 22 Nov 2012 18:21:02 +0000 (13:21 -0500)]
NFSv4.1: If slot allocation fails due to OOM, retry more quickly
If the NFSv4.1 session slot allocation fails due to an ENOMEM condition,
then set the task->tk_timeout to 1/4 second to ensure that we do retry
the slot allocation more quickly.
Trond Myklebust [Wed, 21 Nov 2012 14:06:11 +0000 (09:06 -0500)]
NFSv4.1: CB_RECALL_SLOT must schedule a sequence op after updating targets
RFC5661 requires us to make sure that the server knows we've updated
our slot table size by sending at least one SEQUENCE op containing the
new 'highest_slotid' value.
We can do so using the 'CHECK_LEASE' functionality of the state
manager.
Trond Myklebust [Wed, 21 Nov 2012 01:12:38 +0000 (20:12 -0500)]
NFSv4.1: Remove the state manager code to resize the slot table
The state manager no longer needs any special machinery to stop the
session flow and resize the slot table. It is all done on the fly by
the SEQUENCE op code now.
Trond Myklebust [Tue, 20 Nov 2012 23:10:30 +0000 (18:10 -0500)]
NFSv4.1: Reset the sequence number for slots that have been deallocated
When the server tells us that it is dynamically resizing the session
replay cache, we should reset the sequence number for those slots
that have been deallocated.
Trond Myklebust [Tue, 20 Nov 2012 17:49:27 +0000 (12:49 -0500)]
NFSv4.1: Ensure that the client tracks the server target_highest_slotid
Dynamic slot allocation in NFSv4.1 depends on the client being able to
track the server's target value for the highest slotid in the
slot table. See the reference in Section 2.10.6.1 of RFC5661.
To avoid ordering problems in the case where 2 SEQUENCE replies contain
conflicting updates to this target value, we also introduce a generation
counter, to track whether or not an RPC containing a SEQUENCE operation
was launched before or after the last update.
Also rename the nfs4_slot_table target_max_slots field to
'target_highest_slotid' to avoid confusion with a slot
table size or number of slots.
Trond Myklebust [Fri, 16 Nov 2012 21:10:11 +0000 (16:10 -0500)]
NFSv4.1: Simplify slot allocation
Clean up the NFSv4.1 slot allocation by replacing nfs_find_slot() with
a function nfs_alloc_slot() that returns a pointer to the nfs4_slot
instead of an offset into the slot table.
Yanchuan Nian [Mon, 12 Nov 2012 01:27:37 +0000 (09:27 +0800)]
nfs: Fix wrong slab cache in nfs_commit_mempool
The slab cache in nfs_commit_mempool is wrong, and I think it is just a slip.
I tested it on a x86-32 machine, the size of nfs_write_header is 544, and
the size of nfs_commit_data is 408, so it works fine. It is also true that
sizeof(struct nfs_write_header) > sizeof(struct nfs_commit_data) on other
platforms in my opinoin. Just fix it.
Jim Rees [Fri, 16 Nov 2012 23:12:06 +0000 (18:12 -0500)]
NFS: Reduce stack use in encode_exchange_id()
encode_exchange_id() uses more stack space than necessary, giving a compile
time warning. Reduce the size of the static buffer for implementation name.
Trond Myklebust [Tue, 20 Nov 2012 16:13:12 +0000 (11:13 -0500)]
NFSv4.1: We must bump the clientid sequence number after CREATE_SESSION
We must always bump the clientid sequence number after a successful
call to CREATE_SESSION on the server. The result of
nfs4_verify_channel_attrs() is irrelevant to that requirement.
Trond Myklebust [Tue, 20 Nov 2012 15:53:39 +0000 (10:53 -0500)]
NFSv4.1: Adjust CREATE_SESSION arguments when mounting a new filesystem
If we're mounting a new filesystem, ensure that the session has negotiated
large enough request and reply sizes to match the wsize and rsize mount
arguments.
Trond Myklebust [Wed, 21 Nov 2012 14:22:14 +0000 (09:22 -0500)]
NFSv4.1: Handle session reset and bind_conn_to_session before lease check
We can't send a SEQUENCE op unless the session is OK, so it is pointless
to handle the CHECK_LEASE state before we've dealt with SESSION_RESET
and BIND_CONN_TO_SESSION.
Bryan Schumaker [Mon, 12 Nov 2012 21:55:38 +0000 (16:55 -0500)]
NFS: Add sequence_priviliged_ops for nfs4_proc_sequence()
If I mount an NFS v4.1 server to a single client multiple times and then
run xfstests over each mountpoint I usually get the client into a state
where recovery deadlocks. The server informs the client of a
cb_path_down sequence error, the client then does a
bind_connection_to_session and checks the status of the lease.
I found that bind_connection_to_session sets the NFS4_SESSION_DRAINING
flag on the client, but this flag is never unset before
nfs4_check_lease() reaches nfs4_proc_sequence(). This causes the client
to deadlock, halting all NFS activity to the server. nfs4_proc_sequence()
is only called by the state manager, so I can change it to run in privileged
mode to bypass the NFS4_SESSION_DRAINING check and avoid the deadlock.
Trond Myklebust [Thu, 8 Nov 2012 15:01:26 +0000 (10:01 -0500)]
SUNRPC: Fix validity issues with rpc_pipefs sb->s_fs_info
rpc_kill_sb() must defer calling put_net() until after the notifier
has been called, since most (all?) of the notifier callbacks assume
that sb->s_fs_info points to a valid net namespace. It also must not
call put_net() if the call to rpc_fill_super was unsuccessful.
Pull gfs2 fixes from Steven Whitehouse:
"Here are a number of GFS2 bug fixes. There are three from Andy Price
which fix various issues spotted by automated code analysis. There
are two from Lukas Czerner fixing my mistaken assumptions as to how
FITRIM should work. Finally Ben Marzinski has fixed a bug relating to
mmap and atime and also a bug relating to a locking issue in the
transaction code."
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Test bufdata with buffer locked and gfs2_log_lock held
GFS2: Don't call file_accessed() with a shared glock
GFS2: Fix FITRIM argument handling
GFS2: Require user to provide argument for FITRIM
GFS2: Clean up some unused assignments
GFS2: Fix possible null pointer deref in gfs2_rs_alloc
GFS2: Fix an unchecked error from gfs2_rs_alloc
GFS2: Test bufdata with buffer locked and gfs2_log_lock held
In gfs2_trans_add_bh(), gfs2 was testing if a there was a bd attached to the
buffer without having the gfs2_log_lock held. It was then assuming it would
stay attached for the rest of the function. However, without either the log
lock being held of the buffer locked, __gfs2_ail_flush() could detach bd at any
time. This patch moves the locking before the test. If there isn't a bd
already attached, gfs2 can safely allocate one and attach it before locking.
There is no way that the newly allocated bd could be on the ail list,
and thus no way for __gfs2_ail_flush() to detach it.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
GFS2: Don't call file_accessed() with a shared glock
file_accessed() was being called by gfs2_mmap() with a shared glock. If it
needed to update the atime, it was crashing because it dirtied the inode in
gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
checked if the caller was already holding a glock, but it didn't make sure that
the glock was in the exclusive state. Now, instead of calling file_accessed()
while holding the shared lock in gfs2_mmap(), file_accessed() is called after
grabbing and releasing the glock to update the inode. If file_accessed() needs
to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().
gfs2_dirty_inode() now also checks to make sure that if the calling process has
already locked the glock, it has an exclusive lock.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Lukas Czerner [Tue, 16 Oct 2012 09:39:08 +0000 (11:39 +0200)]
GFS2: Fix FITRIM argument handling
Currently implementation in gfs2 uses FITRIM arguments as it were in
file system blocks units which is wrong. The FITRIM arguments
(fstrim_range.start, fstrim_range.len and fstrim_range.minlen) are
actually in bytes.
Moreover, check for start argument beyond the end of file system, len
argument being smaller than file system block and minlen argument being
bigger than biggest resource group were missing.
This commit converts the code to convert FITRIM argument to file system
blocks and also adds appropriate checks mentioned above.
All the problems were recognised by xfstests 251 and 260.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Lukas Czerner [Tue, 16 Oct 2012 09:39:07 +0000 (11:39 +0200)]
GFS2: Require user to provide argument for FITRIM
When the fstrim_range argument is not provided by user in FITRIM ioctl
we should just return EFAULT and not promoting bad behaviour by filling
the structure in kernel. Let the user deal with it.
Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Andrew Price [Fri, 12 Oct 2012 15:45:09 +0000 (16:45 +0100)]
GFS2: Fix possible null pointer deref in gfs2_rs_alloc
Despite the return value from kmem_cache_zalloc() being checked, the
error wasn't being returned until after a possible null pointer
dereference. This patch returns the error immediately, allowing the
removal of the error variable.
Signed-off-by: Andrew Price <anprice@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Linus Torvalds [Wed, 7 Nov 2012 03:16:41 +0000 (04:16 +0100)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A single radeon typo fix for a regressions and two fixes for a
regression in the open helper address space stuff."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/radeon: fix typo in evergreen_mc_resume()
drm: set dev_mapping before calling drm_open_helper
drm: restore open_count if drm_setup fails
Linus Torvalds [Wed, 7 Nov 2012 03:14:45 +0000 (04:14 +0100)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull arm fixes from Russell King:
"Not much here again.
The two most notable things here are the sched_clock() fix, which was
causing problems with the scheduling of threaded IRQs after a suspend
event, and the vfp fix, which afaik has only been seen on some older
OMAP boards. Nevertheless, both are fairly important fixes."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7569/1: mm: uninitialized warning corrections
ARM: 7567/1: io: avoid GCC's offsettable addressing modes for halfword accesses
ARM: 7566/1: vfp: fix save and restore when running on pre-VFPv3 and CONFIG_VFPv3 set
ARM: 7565/1: sched: stop sched_clock() during suspend
Alex Deucher [Mon, 5 Nov 2012 16:34:58 +0000 (16:34 +0000)]
drm/radeon: fix typo in evergreen_mc_resume()
Add missing index that may have led us to enabling
more crtcs than necessary.
May also fix:
https://bugs.freedesktop.org/show_bug.cgi?id=56139
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Jean Delvare [Mon, 5 Nov 2012 20:54:40 +0000 (21:54 +0100)]
hwmon: Fix chip feature table headers
These got broken by recent patches fixing checkpatch warnings in these
drivers. The trick is that the patches themselves looked good, but the
source files after applying them do not. That's why I am not a big fan
of using tabs inside comments.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Mon, 5 Nov 2012 20:54:39 +0000 (21:54 +0100)]
hwmon: (w83627ehf) Force initial bank selection
Don't assume bank 0 is selected at device probe time. This may not be
the case. Force bank selection at first register access to guarantee
that we read the right registers upon driver loading.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org
Replace multiple BUG_ON() calls with WARN_ON_ONCE() and early return when
sanity checking socket ownership (lock). The bind call will fail if the
socket was unsuccessfully reclassified.
Replace BUG_ON() with WARN_ON_ONCE(). The error condition is a simple
ref counting sanity check and the following code will not free anything
until final put.
Replace BUG_ON() with WARN_ON_ONCE() - rpc_run_bc_task calls rpc_init_task()
then increments the tk_count, so this is a simple sanity check that
if hit once would hit every time this code path is executed.
Trond Myklebust [Mon, 15 Oct 2012 21:14:38 +0000 (17:14 -0400)]
lockd: Remove unnecessary BUG_ON()s in the xdr client code
- Offset bound checks are done in the NFS client code.
- So are filehandle size checks
- The cookie length is a constant
- The utsname()->nodename is already bounded