git.karo-electronics.de Git - karo-tx-linux.git/log

]> git.karo-electronics.de Git - karo-tx-linux.git/log

Jeff Layton [Tue, 11 Dec 2012 17:10:11 +0000 (12:10 -0500)]

vfs: have do_sys_truncate retry once on an ESTALE error

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:10 +0000 (12:10 -0500)]

vfs: fix renameat to retry on ESTALE errors

...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 21:38:04 +0000 (16:38 -0500)]

vfs: make do_unlinkat retry once on ESTALE errors

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 21:28:33 +0000 (16:28 -0500)]

vfs: make do_rmdir retry once on ESTALE errors

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:09 +0000 (12:10 -0500)]

vfs: add a flags argument to user_path_parent

...so we can pass in LOOKUP_REVAL. For now, nothing does yet.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 21:15:38 +0000 (16:15 -0500)]

vfs: fix linkat to retry once on ESTALE errors

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:08 +0000 (12:10 -0500)]

vfs: fix symlinkat to retry on ESTALE errors

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 21:04:09 +0000 (16:04 -0500)]

vfs: fix mkdirat to retry once on an ESTALE error

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 21:00:10 +0000 (16:00 -0500)]

vfs: fix mknodat to retry on ESTALE errors

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]

vfs: turn is_dir argument to kern_path_create into a lookup_flags arg

Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags
passed in here are currently ignored.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:06 +0000 (12:10 -0500)]

vfs: fix readlinkat to retry on ESTALE

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 17:10:05 +0000 (12:10 -0500)]

vfs: make fstatat retry on ESTALE errors from getattr call

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Thu, 20 Dec 2012 19:59:40 +0000 (14:59 -0500)]

vfs: add a retry_estale helper function to handle retries on ESTALE

This function is expected to be called from path-based syscalls to help
them decide whether to try the lookup and call again in the event that
they got an -ESTALE return back on an earier try.

Currently, we only retry the call once on an ESTALE error, but in the
event that we decide that that's not enough in the future, we should be
able to change the logic in this helper without too much effort.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 20 Dec 2012 23:49:14 +0000 (18:49 -0500)]

Merge branch 'fscache' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into for-linus

commit | commitdiff | tree

NeilBrown [Fri, 9 Nov 2012 00:09:37 +0000 (16:09 -0800)]

vfs: d_obtain_alias() needs to use "/" as default name.

NFS appears to use d_obtain_alias() to create the root dentry rather than
d_make_root.  This can cause 'prepend_path()' to complain that the root
has a weird name if an NFS filesystem is lazily unmounted.  e.g.  if
"/mnt" is an NFS mount then

{ cd /mnt; umount -l /mnt ; ls -l /proc/self/cwd; }

will cause a WARN message like
   WARNING: at /home/git/linux/fs/dcache.c:2624 prepend_path+0x1d7/0x1e0()
   ...
   Root dentry has weird name <>

to appear in kernel logs.

So change d_obtain_alias() to use "/" rather than "" as the anonymous
name.

Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Alessio Igor Bogani [Thu, 13 Dec 2012 11:22:39 +0000 (12:22 +0100)]

vfs: Remove useless function prototypes

Commit 8e22cc88d68ca1a46d7d582938f979eb640ed30f removes the (un)lock_super
function definitions but forgets to remove their prototypes.

Signed-off-by: Alessio Igor Bogani <abogani@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 11:00:38 +0000 (12:00 +0100)]

documentation: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 11:00:02 +0000 (12:00 +0100)]

mm: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:59:20 +0000 (11:59 +0100)]

vfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:58:36 +0000 (11:58 +0100)]

ntfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:57:37 +0000 (11:57 +0100)]

nilfs2: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:57:03 +0000 (11:57 +0100)]

ncpfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:56:25 +0000 (11:56 +0100)]

minix: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:55:42 +0000 (11:55 +0100)]

logfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:55:07 +0000 (11:55 +0100)]

hfsplus: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:54:25 +0000 (11:54 +0100)]

jfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:53:50 +0000 (11:53 +0100)]

hpfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

David Howells [Fri, 14 Dec 2012 11:02:22 +0000 (11:02 +0000)]

FS-Cache: Clear remaining page count on retrieval cancellation

Provide fscache_cancel_op() with a pointer to a function it should invoke under
lock if it cancels an operation.

Use this to clear the remaining page count upon cancellation of a pending
retrieval operation so that fscache_release_retrieval_op() doesn't get an
assertion failure (see below).  This can happen when a signal occurs, say from
CTRL-C being pressed during data retrieval.

FS-Cache: Assertion failed
3 == 0 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:237!
invalid opcode: 0000 [#641] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 6075, comm: slurp-q Tainted: GF     D      3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007f328>]  [<ffffffffa007f328>] fscache_release_retrieval_op+0x75/0xff [fscache]
RSP: 0000:ffff88001c6d7988  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff880014cdfe00 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88001c6d7998 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000fffffe00
R13: ffff88001c6d7ab4 R14: ffff88001a8638a0 R15: ffff88001552b190
FS:  00007f877aaf0700(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff11378fd2 CR3: 000000001c6c6000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 6075, threadinfo ffff88001c6d6000, task ffff88001c6c4080)
Stack:
ffffffffa007ec07 ffff880014cdfe00 ffff88001c6d79c8 ffffffffa007db4d
ffffffffa007ec07 ffff880014cdfe00 00000000fffffe00 ffff88001c6d7ab4
ffff88001c6d7a38 ffffffffa008116d 0000000000000000 ffff88001c6c4080
Call Trace:
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa007db4d>] fscache_put_operation+0x135/0x2ed [fscache]
[<ffffffffa007ec07>] ? fscache_cancel_op+0x194/0x1cf [fscache]
[<ffffffffa008116d>] __fscache_read_or_alloc_pages+0x413/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 13 Dec 2012 20:03:13 +0000 (20:03 +0000)]

FS-Cache: Mark cancellation of in-progress operation

Mark as cancelled an operation that is in progress rather than pending at the
time it is cancelled, and call fscache_complete_op() to cancel an operation so
that blocked ops can be started.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Fri, 7 Dec 2012 10:41:26 +0000 (10:41 +0000)]

FS-Cache: One of the write operation paths doesn't set the object state

In fscache_write_op(), if the object is determined to have become inactive or
to have lost its cookie, we don't move the operation state from in-progress,
and so an assertion in fscache_put_operation() fails with an assertion (see
below).

Instrumenting fscache_op_work_func() indicates that it called
fscache_write_op() before calling fscache_put_operation() - where the assertion
failed.  The assertion at line 433 indicates that the operation state is
IN_PROGRESS rather than being COMPLETE or CANCELLED.

Instrumenting fscache_write_op() showed that it was being called on an object
that had had its cookie removed and that this was due to relinquishment of the
cookie by the netfs.  At this point fscache no longer has access to the pages
of netfs data that were requested to be written, and so simply cancelling the
operation is the thing to do.

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:433!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 0
Pid: 1035, comm: kworker/u:3 Tainted: GF            3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007db22>]  [<ffffffffa007db22>] fscache_put_operation+0x11a/0x2ed [fscache]
RSP: 0018:ffff88003e32bcf8  EFLAGS: 00010296
RAX: 000000000000000f RBX: ffff88001818eb78 RCX: ffffffff6c102000
RDX: ffffffff8102d1ad RSI: ffffffff6c102000 RDI: ffffffff8102d1d6
RBP: ffff88003e32bd18 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa00811da
R13: 0000000000000001 R14: 0000000100625d26 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff7dd31c68 CR3: 000000003d730000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:3 (pid: 1035, threadinfo ffff88003e32a000, task ffff88003bb38080)
Stack:
ffffffff8102d1ad ffff88001818eb78 ffffffffa00811da 0000000000000001
ffff88003e32bd48 ffffffffa007f0ad ffff88001818eb78 ffffffff819583c0
ffff88003df24e00 ffff88003882c3e0 ffff88003e32bde8 ffffffff81042de0
Call Trace:
[<ffffffff8102d1ad>] ? vprintk_emit+0x3c6/0x41a
[<ffffffffa00811da>] ? __fscache_read_or_alloc_pages+0x4bc/0x4bc [fscache]
[<ffffffffa007f0ad>] fscache_op_work_func+0xec/0x123 [fscache]
[<ffffffff81042de0>] process_one_work+0x21c/0x3b0
[<ffffffff81042d82>] ? process_one_work+0x1be/0x3b0
[<ffffffffa007efc1>] ? fscache_operation_gc+0x23e/0x23e [fscache]
[<ffffffff8104332e>] worker_thread+0x202/0x2df
[<ffffffff8104312c>] ? rescuer_thread+0x18e/0x18e
[<ffffffff81047c1c>] kthread+0xd0/0xd8
[<ffffffff81421bfa>] ? _raw_spin_unlock_irq+0x29/0x3e
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55
[<ffffffff814227ec>] ret_from_fork+0x7c/0xb0
[<ffffffff81047b4c>] ? __init_kthread_worker+0x55/0x55

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Fri, 7 Dec 2012 18:08:02 +0000 (18:08 +0000)]

FS-Cache: Fix signal handling during waits

wait_on_bit() with TASK_INTERRUPTIBLE returns 1 rather than a negative error
code, so change what we check for.  This means that the signal handling in
fscache_wait_for_retrieval_activation()  should now work properly.

Without this, the following bug can be seen if CTRL-C is pressed during
fscache read operation:

FS-Cache: Assertion failed
2 == 3 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:347!
invalid opcode: 0000 [#1] SMP
Modules linked in: cachefiles(F) nfsv4(F) nfsv3(F) nfsv2(F) nfs(F) fscache(F) auth_rpcgss(F) nfs_acl(F) lockd(F) sunrpc(F)
CPU 1
Pid: 15006, comm: slurp-q Tainted: GF            3.7.0-rc8-fsdevel+ #411                  /DG965RY
RIP: 0010:[<ffffffffa007fcb4>]  [<ffffffffa007fcb4>] fscache_wait_for_retrieval_activation+0x167/0x177 [fscache]
RSP: 0018:ffff88002a4c39a8  EFLAGS: 00010292
RAX: 000000000000001a RBX: ffff88002d3dc158 RCX: 0000000000008685
RDX: ffffffff8102ccd6 RSI: 0000000000000001 RDI: ffffffff8102d1d6
RBP: ffff88002a4c39c8 R08: 0000000000000002 R09: 0000000000000000
R10: ffffffff8163afa0 R11: ffff88003bd11900 R12: ffffffffa00868c8
R13: ffff880028306458 R14: ffff88002d3dc1b0 R15: ffff88001372e538
FS:  00007f17426a0700(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f1742494a44 CR3: 0000000031bd7000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process slurp-q (pid: 15006, threadinfo ffff88002a4c2000, task ffff880023de3040)
Stack:
ffff88002d3dc158 ffff88001372e538 ffff88002a4c3ab4 ffff8800283064e0
ffff88002a4c3a38 ffffffffa0080f6d 0000000000000000 ffff880023de3040
ffff88002a4c3ac8 ffffffff810ac8ae ffff880028306458 ffff88002a4c3bc8
Call Trace:
[<ffffffffa0080f6d>] __fscache_read_or_alloc_pages+0x24f/0x4bc [fscache]
[<ffffffff810ac8ae>] ? __alloc_pages_nodemask+0x195/0x75c
[<ffffffffa00aab0f>] __nfs_readpages_from_fscache+0x86/0x13d [nfs]
[<ffffffffa00a5fe0>] nfs_readpages+0x186/0x1bd [nfs]
[<ffffffff810d23c8>] ? alloc_pages_current+0xc7/0xe4
[<ffffffff810a68b5>] ? __page_cache_alloc+0x84/0x91
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afaa3>] __do_page_cache_readahead+0x237/0x2e0
[<ffffffff810af912>] ? __do_page_cache_readahead+0xa6/0x2e0
[<ffffffff810afe3e>] ra_submit+0x1c/0x20
[<ffffffff810b019b>] ondemand_readahead+0x359/0x382
[<ffffffff810b0279>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff810a77b5>] generic_file_aio_read+0x26b/0x637
[<ffffffffa00f1852>] ? nfs_mark_delegation_referenced+0xb/0xb [nfsv4]
[<ffffffffa009cc85>] nfs_file_read+0xaa/0xcf [nfs]
[<ffffffff810db5b3>] do_sync_read+0x91/0xd1
[<ffffffff810dbb8b>] vfs_read+0x9b/0x144
[<ffffffff810dbc78>] sys_read+0x44/0x75
[<ffffffff81422892>] system_call_fastpath+0x16/0x1b

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 16:31:49 +0000 (16:31 +0000)]

NFS4: Open files for fscaching

nfs4_file_open() should open files for fscaching.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]

FS-Cache: Add transition to handle invalidate immediately after lookup

Add a missing transition to the FS-Cache object state machine to handle an
invalidation event occuring between the back end completing the object lookup
by calling fscache_obtained_object() (which moves to state OBJECT_AVAILABLE)
and the backend returning to fscache_lookup_object() and thence to
fscache_object_state_machine() which then does a goto lookup_transit to handle
the transition - but lookup_transit doesn't handle EV_INVALIDATE.

Without this, the following BUG can be logged:

FS-Cache: Unsupported event 2 [5/f7] in state OBJECT_AVAILABLE
------------[ cut here ]------------
kernel BUG at fs/fscache/object.c:357!

Where event 2 is EV_INVALIDATE.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:49 +0000 (13:34 +0000)]

NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page

nfs_migrate_page() does not wait for FS-Cache to finish with a page, probably
leading to the following bad-page-state:

BUG: Bad page state in process python-bin  pfn:17d39b
page:ffffea00053649e8 flags:004000000000100c count:0 mapcount:0 mapping:(null)
index:38686 (Tainted: G    B      ---------------- )
Pid: 31053, comm: python-bin Tainted: G    B      ----------------
2.6.32-71.24.1.el6.x86_64 #1
Call Trace:
[<ffffffff8111bfe7>] bad_page+0x107/0x160
[<ffffffff8111ee69>] free_hot_cold_page+0x1c9/0x220
[<ffffffff8111ef19>] __pagevec_free+0x59/0xb0
[<ffffffff8104b988>] ? flush_tlb_others_ipi+0x128/0x130
[<ffffffff8112230c>] release_pages+0x21c/0x250
[<ffffffff8115b92a>] ? remove_migration_pte+0x28a/0x2b0
[<ffffffff8115f3f8>] ? mem_cgroup_get_reclaim_stat_from_page+0x18/0x70
[<ffffffff81122687>] ____pagevec_lru_add+0x167/0x180
[<ffffffff811226f8>] __lru_cache_add+0x58/0x70
[<ffffffff81122731>] lru_cache_add_lru+0x21/0x40
[<ffffffff81123f49>] putback_lru_page+0x69/0x100
[<ffffffff8115c0bd>] migrate_pages+0x13d/0x5d0
[<ffffffff81122687>] ? ____pagevec_lru_add+0x167/0x180
[<ffffffff81152ab0>] ? compaction_alloc+0x0/0x370
[<ffffffff8115255c>] compact_zone+0x4cc/0x600
[<ffffffff8111cfac>] ? get_page_from_freelist+0x15c/0x820
[<ffffffff810672f4>] ? check_preempt_wakeup+0x1c4/0x3c0
[<ffffffff8115290e>] compact_zone_order+0x7e/0xb0
[<ffffffff81152a49>] try_to_compact_pages+0x109/0x170
[<ffffffff8111e94d>] __alloc_pages_nodemask+0x5ed/0x850
[<ffffffff814c9136>] ? thread_return+0x4e/0x778
[<ffffffff81150d43>] alloc_pages_vma+0x93/0x150
[<ffffffff81167ea5>] do_huge_pmd_anonymous_page+0x135/0x340
[<ffffffff814cb6f6>] ? rwsem_down_read_failed+0x26/0x30
[<ffffffff81136755>] handle_mm_fault+0x245/0x2b0
[<ffffffff814ce383>] do_page_fault+0x123/0x3a0
[<ffffffff814cbdf5>] page_fault+0x25/0x30

nfs_migrate_page() calls nfs_fscache_release_page() which doesn't actually wait
- even if __GFP_WAIT is set.  The reason that doesn't wait is that
fscache_maybe_release_page() might deadlock the allocator as the work threads
writing to the cache may all end up sleeping on memory allocation.

However, I wonder if that is actually a problem.  There are a number of things
I can do to deal with this:

(1) Make nfs_migrate_page() wait.

(2) Make fscache_maybe_release_page() honour the __GFP_WAIT flag.

(3) Set a timeout around the wait.

(4) Make nfs_migrate_page() return an error if the page is still busy.

For the moment, I'll select (2) and (4).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]

FS-Cache: Exclusive op submission can BUG if there's been an I/O error

The function to submit an exclusive op (fscache_submit_exclusive_op()) can BUG
if there's been an I/O error because it may see the parent cache object in an
unexpected state.  It should only BUG if there hasn't been an I/O error.

In this case the problem was produced by remounting the cache partition to be
R/O.  The EROFS state was detected and the cache was aborted, but not
everything handled the aborting correctly.

SysRq : Emergency Remount R/O
EXT4-fs (sda6): re-mounted. Opts: (null)
Emergency Remount complete
CacheFiles: I/O Error: Failed to update xattr with error -30
FS-Cache: Cache cachefiles stopped due to I/O error
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:128!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 6612, comm: kworker/u:2 Not tainted 3.1.0-rc8-fsdevel+ #1093                  /DG965RY
RIP: 0010:[<ffffffffa00739c0>]  [<ffffffffa00739c0>] fscache_submit_exclusive_op+0x2ad/0x2c2 [fscache]
RSP: 0018:ffff880000853d40  EFLAGS: 00010206
RAX: ffff880038ac72a8 RBX: ffff8800181f2260 RCX: ffffffff81f2b2b0
RDX: 0000000000000001 RSI: ffffffff8179a478 RDI: ffff8800181f2280
RBP: ffff880000853d60 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff880038ac7268
R13: ffff8800181f2280 R14: ffff88003a359190 R15: 000000010122b162
FS:  0000000000000000(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000034cc4a77f0 CR3: 0000000010e96000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 6612, threadinfo ffff880000852000, task ffff880014c3c040)
Stack:
ffff8800181f2260 ffff8800181f2310 ffff880038ac7268 ffff8800181f2260
ffff880000853dc0 ffffffffa0072375 ffff880037ecfe00 ffff88003a359198
ffff880000853dc0 0000000000000246 0000000000000000 ffff88000a91d308
Call Trace:
[<ffffffffa0072375>] fscache_object_work_func+0x792/0xe65 [fscache]
[<ffffffff81047e44>] process_one_work+0x1eb/0x37f
[<ffffffff81047de6>] ? process_one_work+0x18d/0x37f
[<ffffffffa0071be3>] ? fscache_enqueue_dependents+0xd8/0xd8 [fscache]
[<ffffffff810482e4>] worker_thread+0x15a/0x21a
[<ffffffff8104818a>] ? rescuer_thread+0x188/0x188
[<ffffffff8104bf96>] kthread+0x7f/0x87
[<ffffffff813ad6f4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abd1d>] ? retint_restore_args+0xe/0xe
[<ffffffff8104bf17>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6f0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:48 +0000 (13:34 +0000)]

FS-Cache: Limit the number of I/O error reports for a cache

Limit the number of I/O error reports for a cache to 1 to prevent massive
amounts of noise. After the first I/O error the cache is taken off line
automatically, so must be restarted to resume caching.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:47 +0000 (13:34 +0000)]

FS-Cache: Don't mask off the object event mask when printing it

Don't mask off the object event mask when printing it. That way it can be seen
if threre are bits set that shouldn't be.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]

FS-Cache: Initialise the object event mask with the calculated mask

Initialise the object event mask with the calculated mask rather than unmasking
undefined events also.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:46 +0000 (13:34 +0000)]

FS-Cache: Convert the object event ID #defines into an enum

Convert the fscache_object event IDs from #defines into an enum. Also add an
extra label to the enum to carry the event count and redefine the event mask
in terms of that.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Wed, 5 Dec 2012 13:34:45 +0000 (13:34 +0000)]

CacheFiles: Add missing retrieval completions

CacheFiles is missing some calls to fscache_retrieval_complete() in the error
handling/collision paths of its reader functions.

This can be seen by the following assertion tripping in fscache_put_operation()
whereby the operation being destroyed is still in the in-progress state and has
not been cancelled or completed:

FS-Cache: Assertion failed
3 == 5 is false
------------[ cut here ]------------
kernel BUG at fs/fscache/operation.c:408!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: xfs ioatdma dca loop joydev evdev
psmouse dcdbas pcspkr serio_raw i5000_edac edac_core i5k_amb shpchp
pci_hotplug sg sr_mod]

Pid: 8062, comm: httpd Not tainted 3.1.0-rc8 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<ffffffff81197b24>]  [<ffffffff81197b24>] fscache_put_operation+0x304/0x330
RSP: 0018:ffff880062f739d8  EFLAGS: 00010296
RAX: 0000000000000025 RBX: ffff8800c5122e84 RCX: ffffffff81ddf040
RDX: 00000000ffffffff RSI: 0000000000000082 RDI: ffffffff81ddef30
RBP: ffff880062f739f8 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000003 R12: ffff8800c5122e40
R13: ffff880037a2cd20 R14: ffff880087c7a058 R15: ffff880087c7a000
FS:  00007f63dcf636e0(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f0c0a91f000 CR3: 0000000062ec2000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process httpd (pid: 8062, threadinfo ffff880062f72000, task ffff880087e58000)
Stack:
ffff880062f73bf8 0000000000000000 ffff880062f73bf8 ffff880037a2cd20
ffff880062f73a68 ffffffff8119aa7e ffff88006540e000 ffff880062f73ad4
ffff88008e9a4308 ffff880037a2cd20 ffff880062f73a48 ffff8800c5122e40
Call Trace:
[<ffffffff8119aa7e>] __fscache_read_or_alloc_pages+0x1fe/0x530
[<ffffffff81250780>] __nfs_readpages_from_fscache+0x70/0x1c0
[<ffffffff8123142a>] nfs_readpages+0xca/0x1e0
[<ffffffff815f3c06>] ? rpc_do_put_task+0x36/0x50
[<ffffffff8122755b>] ? alloc_nfs_open_context+0x4b/0x110
[<ffffffff815ecd1a>] ? rpc_call_sync+0x5a/0x70
[<ffffffff810e7e9a>] __do_page_cache_readahead+0x1ca/0x270
[<ffffffff810e7f61>] ra_submit+0x21/0x30
[<ffffffff810e818d>] ondemand_readahead+0x11d/0x250
[<ffffffff810e83b6>] page_cache_sync_readahead+0x36/0x60
[<ffffffff810dffa4>] generic_file_aio_read+0x454/0x770
[<ffffffff81224ce1>] nfs_file_read+0xe1/0x130
[<ffffffff81121bd9>] do_sync_read+0xd9/0x120
[<ffffffff8114088f>] ? mntput+0x1f/0x40
[<ffffffff811238cb>] ? fput+0x1cb/0x260
[<ffffffff81122938>] vfs_read+0xc8/0x180
[<ffffffff81122af5>] sys_read+0x55/0x90

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:38 +0000 (21:52 +0000)]

NFS: Use FS-Cache invalidation

Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.

The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete.  There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).

This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064                  /DG965RY
RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
[<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
[<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
[<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
[<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
[<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

Reported-by: Mark Moseley <moseleymark@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]

CacheFiles: Implement invalidation

Implement invalidation for CacheFiles.  This is in two parts:

(1) Provide an invalidation method (which just truncates the backing file).

(2) Abort attempts to copy anything read from the backing file whilst
     invalidation is in progress.

Question: CacheFiles uses truncation in a couple of places.  It has been using
notify_change() rather than sys_truncate() or something similar.  This means
it bypasses a bunch of checks and suchlike that it possibly should be making
(security, file locking, lease breaking, vfsmount write).  Should it be using
vfs_truncate() as added by a preceding patch or should it use notify_write()
and assume that anyone poking around in the cache files on disk gets
everything they deserve?

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]

VFS: Make more complete truncate operation available to CacheFiles

Make a more complete truncate operation available to CacheFiles (including
security checks and suchlike) so that it can use this to clear invalidated
cache files.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:36 +0000 (21:52 +0000)]

FS-Cache: Provide proper invalidation

Provide a proper invalidation method rather than relying on the netfs retiring
the cookie it has and getting a new one. The problem with this is that isn't
easy for the netfs to make sure that it has completed/cancelled all its
outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait
for all currently outstanding operations before invalidating the cache, and
will cause new operations to queue up behind that. Whilst invalidation is in
progress, some requests will be rejected until the cache can stack a barrier on
the operation queue to cause new operations to be deferred behind it.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:35 +0000 (21:52 +0000)]

FS-Cache: Fix operation state management and accounting

Fix the state management of internal fscache operations and the accounting of
what operations are in what states.

This is done by:

(1) Give struct fscache_operation a enum variable that directly represents the
     state it's currently in, rather than spreading this knowledge over a bunch
     of flags, who's processing the operation at the moment and whether it is
     queued or not.

     This makes it easier to write assertions to check the state at various
     points and to prevent invalid state transitions.

(2) Add an 'operation complete' state and supply a function to indicate the
     completion of an operation (fscache_op_complete()) and make things call
     it.  The final call to fscache_put_operation() can then check that an op
     in the appropriate state (complete or cancelled).

(3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better
     govern the state of an object:

(a) The ->n_ops is now the number of extant operations on the object
    and is now decremented by fscache_put_operation() only.

(b) The ->n_in_progress is simply the number of objects that have been
    taken off of the object's pending queue for the purposes of being
    run.  This is decremented by fscache_op_complete() only.

(c) The ->n_exclusive is the number of exclusive ops that have been
    submitted and queued or are in progress.  It is decremented by
    fscache_op_complete() and by fscache_cancel_op().

     fscache_put_operation() and fscache_operation_gc() now no longer try to
     clean up ->n_exclusive and ->n_in_progress.  That was leading to double
     decrements against fscache_cancel_op().

     fscache_cancel_op() now no longer decrements ->n_ops.  That was leading to
     double decrements against fscache_put_operation().

     fscache_submit_exclusive_op() now decides whether it has to queue an op
     based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter
     will persist in being true even after all preceding operations have been
     cancelled or completed.  Furthermore, if an object is active and there are
     runnable ops against it, there must be at least one op running.

(4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and
     provide a function to record completion of the pages as they complete.

     When n_pages reaches 0, the operation is deemed to be complete and
     fscache_op_complete() is called.

     Add calls to fscache_retrieval_complete() anywhere we've finished with a
     page we've been given to read or allocate for.  This includes places where
     we just return pages to the netfs for reading from the server and where
     accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the
following where the operation completion gets out of sync with return of the
cookie by the netfs.  This is possible because the cache unlocks and returns
all the netfs pages before recording its completion - which means that there's
nothing to stop the netfs discarding them and returning the cookie.

FS-Cache: Cookie 'NFS.fh' still has outstanding reads
------------[ cut here ]------------
kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP
CPU 1
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090                  /DG965RY
RIP: 0010:[<ffffffffa007050a>]  [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache]
RSP: 0018:ffff8800368cfb00  EFLAGS: 00010282
RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000
RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c
RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000
R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98
R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370
FS:  0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040)
Stack:
ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0
ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0
ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91
Call Trace:
[<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs]
[<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs]
[<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs]
[<ffffffff810d8d47>] evict+0xa1/0x15c
[<ffffffff810d8e2e>] dispose_list+0x2c/0x38
[<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b
[<ffffffff810c56b7>] prune_super+0xd5/0x140
[<ffffffff8109b615>] shrink_slab+0x102/0x1ab
[<ffffffff8109d690>] balance_pgdat+0x2f2/0x595
[<ffffffff8103e009>] ? process_timeout+0xb/0xb
[<ffffffff8109dba3>] kswapd+0x270/0x289
[<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46
[<ffffffff8109d933>] ? balance_pgdat+0x595/0x595
[<ffffffff8104bf7a>] kthread+0x7f/0x87
[<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10
[<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0
[<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe
[<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53
[<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:35 +0000 (21:52 +0000)]

FS-Cache: Make cookie relinquishment wait for outstanding reads

Make fscache_relinquish_cookie() log a warning and wait if there are any
outstanding reads left on the cookie it was given.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:34 +0000 (21:52 +0000)]

CacheFiles: Make some debugging statements conditional

Downgrade some debugging statements to not unconditionally print stuff, but
rather be conditional on the appropriate module parameter setting.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:33 +0000 (21:52 +0000)]

FS-Cache: Check that there are no read ops when cookie relinquished

Check that the netfs isn't trying to relinquish a cookie that still has read
operations in progress upon it. If there are, then give log a warning and BUG.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:33 +0000 (21:52 +0000)]

CacheFiles: Downgrade the requirements passed to the allocator

Downgrade the requirements passed to the allocator in the gfp flags parameter.
FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to
store an object or a page in the cache.

Signed-off-by: David Howells <dhowells@redhat.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 21:52:32 +0000 (21:52 +0000)]

CacheFiles: Fix the marking of cached pages

Under some circumstances CacheFiles defers the marking of pages with PG_fscache
so that it can take advantage of pagevecs to reduce the number of calls to
fscache_mark_pages_cached() and the netfs's hook to keep track of this.

There are, however, two problems with this:

(1) It can lead to the PG_fscache mark being applied _after_ the page is set
     PG_uptodate and unlocked (by the call to fscache_end_io()).

(2) CacheFiles's ref on the page is dropped immediately following
     fscache_end_io() - and so may not still be held when the mark is applied.
     This can lead to the page being passed back to the allocator before the
     mark is applied.

Fix this by, where appropriate, marking the page before calling
fscache_end_io() and releasing the page.  This means that we can't take
advantage of pagevecs and have to make a separate call for each page to the
marking routines.

The symptoms of this are Bad Page state errors cropping up under memory
pressure, for example:

BUG: Bad page state in process tar  pfn:002da
page:ffffea0000009fb0 count:0 mapcount:0 mapping:          (null) index:0x1447
page flags: 0x1000(private_2)
Pid: 4574, comm: tar Tainted: G        W   3.1.0-rc4-fsdevel+ #1064
Call Trace:
[<ffffffff8109583c>] ? dump_page+0xb9/0xbe
[<ffffffff81095916>] bad_page+0xd5/0xea
[<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a
[<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662
[<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267
[<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
[<ffffffff81098d7b>] ra_submit+0x1c/0x20
[<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
[<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a
[<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
[<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
[<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
[<ffffffff810c22c4>] do_sync_read+0xba/0xfa
[<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
[<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
[<ffffffff810c29a4>] vfs_read+0xaa/0x13a
[<ffffffff810c2a79>] sys_read+0x45/0x6c
[<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags.

Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was
set appropriately showed that sometimes it wasn't.  This led to the discovery
that sometimes the page has apparently been reclaimed by the time the marker
got to see it.

Reported-by: M. Stevens <m@tippett.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:53:15 +0000 (11:53 +0100)]

hfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:52:33 +0000 (11:52 +0100)]

bfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:51:53 +0000 (11:51 +0100)]

affs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:51:11 +0000 (11:51 +0100)]

adfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:50:20 +0000 (11:50 +0100)]

ocfs2: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:49:42 +0000 (11:49 +0100)]

omfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:48:48 +0000 (11:48 +0100)]

procfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:47:31 +0000 (11:47 +0100)]

reiserfs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:45:58 +0000 (11:45 +0100)]

sysv: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Marco Stornelli [Sat, 15 Dec 2012 10:45:14 +0000 (11:45 +0100)]

ufs: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jan Kara [Wed, 5 Dec 2012 13:40:14 +0000 (14:40 +0100)]

fs: Fix imbalance in freeze protection in mark_files_ro()

File descriptors (even those for writing) do not hold freeze protection.
Thus mark_files_ro() must call __mnt_drop_write() to only drop protection
against remount read-only. Calling mnt_drop_write_file() as we do now
results in:

[ BUG: bad unlock balance detected! ]
3.7.0-rc6-00028-g88e75b6 #101 Not tainted
-------------------------------------
kworker/1:2/79 is trying to release lock (sb_writers) at:
[<ffffffff811b33b4>] mnt_drop_write+0x24/0x30
but there are no more locks to release!

Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Wed, 28 Nov 2012 16:30:53 +0000 (11:30 -0500)]

vfs: remove DCACHE_NEED_LOOKUP

The code that relied on that flag was ripped out of btrfs quite some
time ago, and never added back. Josef indicated that he was going to
take a different approach to the problem in btrfs, and that we
could just eliminate this flag.

Cc: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Al Viro [Thu, 20 Dec 2012 18:41:28 +0000 (13:41 -0500)]

path_init(): make -ENOTDIR failure exits consistent

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Jeff Layton [Tue, 11 Dec 2012 13:56:16 +0000 (08:56 -0500)]

vfs: remove unneeded permission check from path_init

When path_init is called with a valid dfd, that code checks permissions
on the open directory fd and returns an error if the check fails. This
permission check is redundant, however.

Both callers of path_init immediately call link_path_walk afterward. The
first thing that link_path_walk does for pathnames that do not consist
only of slashes is to check for exec permissions at the starting point of
the path walk. And this check in path_init() is on the path taken only
when *name != '/' && *name != '\0'.

In most cases, these checks are very quick, but when the dfd is for a
file on a NFS mount with the actimeo=0, each permission check goes
out onto the wire. The result is 2 identical ACCESS calls.

Given that these codepaths are fairly "hot", I think it makes sense to
eliminate the permission check in path_init and simply assume that the
caller will eventually check the permissions before proceeding.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Miao Xie [Fri, 16 Nov 2012 09:23:50 +0000 (17:23 +0800)]

vfs, freeze: use ACCESS_ONCE() to guard access to ->mnt_flags

The compiler may optimize the while loop and make the check just be done once,
so we should use ACCESS_ONCE() to guard access to ->mnt_flags

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

commit | commitdiff | tree

Vaibhav Bedia [Wed, 19 Dec 2012 06:53:10 +0000 (06:53 +0000)]

ARM: OMAP: Fix build breakage due to missing include in i2c.c

Merge commit 752451f01c45 ("Merge branch 'i2c-embedded/for-next' of
git://git.pengutronix.de/git/wsa/linux") resulted in a build breakage
for OMAP

  arch/arm/mach-omap2/i2c.c: In function 'omap_pm_set_max_mpu_wakeup_lat_compat':
  arch/arm/mach-omap2/i2c.c:130:2: error: implicit declaration of function 'omap_pm_set_max_mpu_wakeup_lat'
  make[1]: *** [arch/arm/mach-omap2/i2c.o] Error 1

Fix this by including the appropriate header file with the function
prototype.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Vaibhav Bedia <vaibhav.bedia@ti.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 16:37:04 +0000 (08:37 -0800)]

Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux

Pull virtio update from Rusty Russell:
"Some nice cleanups, and even a patch my wife did as a "live" demo for
  Latinoware 2012.

  There's a slightly non-trivial merge in virtio-net, as we cleaned up
  the virtio add_buf interface while DaveM accepted the mq virtio-net
  patches."

* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
  virtio_console: Add support for remoteproc serial
  virtio_console: Merge struct buffer_token into struct port_buffer
  virtio: add drv_to_virtio to make code clearly
  virtio: use dev_to_virtio wrapper in virtio
  virtio-mmio: Fix irq parsing in command line parameter
  virtio_console: Free buffers from out-queue upon close
  virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
  virtio_console: Use kmalloc instead of kzalloc
  virtio_console: Free buffer if splice fails
  virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
  virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
  virtio: console: don't rely on virtqueue_add_buf() returning capacity.
  virtio_net: don't rely on virtqueue_add_buf() returning capacity.
  virtio-net: remove unused skb_vnet_hdr->num_sg field
  virtio-net: correct capacity math on ring full
  virtio: move queue_index and num_free fields into core struct virtqueue.
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:52:13 +0000 (07:52 -0800)]

Merge tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"This update contains overall only driver-specific fixes.  Slightly
  large LOC are seen in usb-audio driver for a couple of new device
  quirks and cs42l71 ASoC driver for enhanced features.  The others are
  a few small (regression) fixes HD-audio, and yet other small / trival
  ASoC fixes."

* tag 'sound-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: usb-audio: Support for Digidesign Mbox 2 USB sound card:
  ALSA: HDA: Fix sound resume hang
  ALSA: hda - bug fix for invalid connection list of Haswell HDMI codec pins
  ALSA: hda - Fix the wrong pincaps set in ALC861VD dallas/hp fixup
  ALSA: hda - Set codec->single_adc_amp flag for Realtek codecs
  ASoC: atmel-ssc: change disable to disable in dts node
  ASoC: Prevent pop_wait overwrite
  ALSA: usb-audio: ignore-quirk for HP Wireless Audio
  ALSA: hda - Always turn on pins for HDMI/DP
  ALSA: hda - Fix pin configuration of HP Pavilion dv7
  ASoC: core: Fix splitting of log messages
  ASoC: cs42l73: Change VSPIN/VSPOUT to VSPINOUT
  ASoC: cs42l73: Add DAPM events for power down.
  ASoC: cs42l73: Add DMIC's as DAPM inputs.
  ASoC: sigmadsp: Fix endianness conversion issue
  ASoC: tpa6130a2: Use devm_* APIs

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:39:03 +0000 (07:39 -0800)]

Merge tag 'upstream-3.8-rc1' of git://git.infradead.org/linux-ubi

Pull UBI update from Artem Bityutskiy:
"Nothing exciting, just clean-ups and nicification.  Oh, and one small
  optimization which makes UBI to use less RAM."

* tag 'upstream-3.8-rc1' of git://git.infradead.org/linux-ubi:
  UBI: embed ubi_debug_info field in ubi_device struct
  UBI: introduce helpers dbg_chk_{io, gen}
  UBI: replace memcpy with struct assignment
  UBI: remove spurious comment
  UBI: gluebi: rename misleading variables
  UBI: do not allocate the memory unnecessarily
  UBI: use list_move_tail instead of list_del/list_add_tail

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:27:44 +0000 (07:27 -0800)]

Merge tags 'disintegrate-h8300-20121219', 'disintegrate-m32r-20121219' and 'disintegrate-score-20121220' of git://git.infradead.org/users/dhowells/linux-headers

Pull UAPI disintegration for H8/300, M32R and Score from David Howells.

Scripted UAPI patches for architectures that apparently never reacted to
it on their own.

* tag 'disintegrate-h8300-20121219' of git://git.infradead.org/users/dhowells/linux-headers:
  UAPI: (Scripted) Disintegrate arch/h8300/include/asm

* tag 'disintegrate-m32r-20121219' of git://git.infradead.org/users/dhowells/linux-headers:
  UAPI: (Scripted) Disintegrate arch/m32r/include/asm

* tag 'disintegrate-score-20121220' of git://git.infradead.org/users/dhowells/linux-headers:
  UAPI: (Scripted) Disintegrate arch/score/include/asm

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:24:17 +0000 (07:24 -0800)]

Merge tag 'cris-for-linus-3.8' of git://jni.nu/cris

Pull CRIS changes from Jesper Nilsson.

... mainly the UAPI disintegration.

* tag 'cris-for-linus-3.8' of git://jni.nu/cris:
  UAPI: Fix up empty files in arch/cris/
  CRIS: locking: fix the return value of arch_read_trylock()
  CRIS: use kbuild.h instead of defining macros in asm-offset.c
  UAPI: (Scripted) Disintegrate arch/cris/include/asm
  UAPI: (Scripted) Disintegrate arch/cris/include/arch-v32/arch
  UAPI: (Scripted) Disintegrate arch/cris/include/arch-v10/arch

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:21:54 +0000 (07:21 -0800)]

Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"This is a batch of fixes for arm-soc platforms, most of it is for OMAP
  but there are others too (i.MX, Tegra, ep93xx).  Fixes warnings, some
  broken platforms and drivers, etc.  A bit all over the map really."

There was some concern about commit 68136b10 ("RM: sunxi: Change device
tree naming scheme for sunxi"), but Tony says:
"Looks like that's trivial to fix as needed, no need to rebuild the
  branch to fix that AFAIK.

  The fix can be done once Olof is available online again.

  Linus, I suggest that you go ahead and pull this if there are no other
  issues with this branch."

* tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (32 commits)
  ARM: sunxi: Change device tree naming scheme for sunxi
  ARM: ux500: fix missing include
  ARM: u300: delete custom pin hog code
  ARM: davinci: fix build break due to missing include
  ARM: exynos: Fix warning due to missing 'inline' in stub
  ARM: imx: Move platform-mx2-emma to arch/arm/mach-imx/devices
  ARM i.MX51 clock: Fix regression since enabling MIPI/HSP clocks
  ARM: dts: mx27: Fix the AIPI bus for FEC
  ARM: OMAP2+: common: remove use of vram
  ARM: OMAP3/4: cpuidle: fix sparse and checkpatch warnings
  ARM: OMAP4: clock data: DPLLs are missing bypass clocks in their parent lists
  ARM: OMAP4: clock data: div_iva_hs_clk is a power-of-two divider
  ARM: OMAP4: Fix EMU clock domain always on
  ARM: OMAP4460: Workaround ABE DPLL failing to turn-on
  ARM: OMAP4: Enhance support for DPLLs with 4X multiplier
  ARM: OMAP4: Add function table for non-M4X dplls
  ARM: OMAP4: Update timer clock aliases
  ARM: OMAP: Move plat/omap-serial.h to include/linux/platform_data/serial-omap.h
  ARM: dts: Add build target for omap4-panda-a4
  ARM: dts: OMAP2420: Correct H4 board memory size
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:18:29 +0000 (07:18 -0800)]

Merge tag 'tag-for-linus-3.8' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf

Pull dma-buf updates from Sumit Semwal:
"A fairly small dma-buf pull request for 3.8 - only 2 patches"

* tag 'tag-for-linus-3.8' of git://git.linaro.org/people/sumitsemwal/linux-dma-buf:
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
dma-buf: might_sleep() in dma_buf_unmap_attachment()

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 15:07:18 +0000 (07:07 -0800)]

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull hwmon subsystem update from Jean Delvare:
"There are many improvements to the it87 driver, as well as suspend
  support for the Winbond Super-I/O chips, and a few other fixes."

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon-vid: Add support for AMD family 11h to 15h processors
  hwmon: (it87) Support PECI for additional chips
  hwmon: (it87) Report thermal sensor type as Intel PECI if appropriate
  hwmon: (it87) Manage device specific features with table
  hwmon: (it87) Replace pwm group macro with direct attribute definitions
  hwmon: (it87) Avoid quoted string splits across lines
  hwmon: (it87) Save fan registers in 2-dimensional array
  hwmon: (it87) Introduce support for tempX_offset sysfs attribute
  hwmon: (it87) Replace macro defining tempX_type sensors with direct definitions
  hwmon: (it87) Save voltage register values in 2-dimensional array
  hwmon: (it87) Save temperature registers in 2-dimensional array
  hwmon: (w83627ehf) Get rid of smatch warnings
  hwmon: (w83627hf) Don't touch nonexistent I2C address registers
  hwmon: (w83627ehf) Add support for suspend
  hwmon: (w83627hf) Add support for suspend
  hwmon: Fix PCI device reference leak in quirk

commit | commitdiff | tree

Hugh Dickins [Thu, 20 Dec 2012 01:44:29 +0000 (17:44 -0800)]

ksm: make rmap walks more scalable

The rmap walks in ksm.c are like those in rmap.c: they can safely be
done with anon_vma_lock_read().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Hugh Dickins [Thu, 20 Dec 2012 01:42:16 +0000 (17:42 -0800)]

sched: numa: ksm: fix oops in task_numa_placment()

task_numa_placement() oopsed on NULL p->mm when task_numa_fault() got
called in the handling of break_ksm() for ksmd. That might be a
peculiar case, which perhaps KSM could takes steps to avoid? but it's
more robust if task_numa_placement() allows for such a possibility.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Zlatko Calusic [Wed, 19 Dec 2012 23:25:13 +0000 (00:25 +0100)]

mm: do not sleep in balance_pgdat if there's no i/o congestion

On a 4GB RAM machine, where Normal zone is much smaller than DMA32 zone,
the Normal zone gets fragmented in time.  This requires relatively more
pressure in balance_pgdat to get the zone above the required watermark.
Unfortunately, the congestion_wait() call in there slows it down for a
completely wrong reason, expecting that there's a lot of
writeback/swapout, even when there's none (much more common).  After a
few days, when fragmentation progresses, this flawed logic translates to
a very high CPU iowait times, even though there's no I/O congestion at
all.  If THP is enabled, the problem occurs sooner, but I was able to
see it even on !THP kernels, just by giving it a bit more time to occur.

The proper way to deal with this is to not wait, unless there's
congestion.  Thanks to Mel Gorman, we already have the function that
perfectly fits the job.  The patch was tested on a machine which nicely
revealed the problem after only 1 day of uptime, and it's been working
great.

Signed-off-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

David Howells [Mon, 22 Oct 2012 12:18:44 +0000 (14:18 +0200)]

UAPI: Fix up empty files in arch/cris/

Fix up three empty files in arch/cris/ by sticking placeholder comments in
there to prevent the patch program from deleting them.

I decided not to delete the arch-v*/Kbuild files as it's possibly someone might
want to use them for genhdr-y lines in the future, but they could be deleted
and the pointer lines removed from asm/Kbuild. The uapi/arch-v*/Kbuild files
ought to be uneffected by such a change.

asm/swab.h didn't have anything outside of __KERNEL__ so nothing appeared in
uapi/asm/swab.h. The latter, however, is exported by Kbuild.asm.

This needs to be applied after the CRIS UAPI disintegration patch.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>

commit | commitdiff | tree

Wei Yongjun [Wed, 17 Oct 2012 14:54:27 +0000 (16:54 +0200)]

CRIS: locking: fix the return value of arch_read_trylock()

arch_write_trylock() should return 'ret' instead of always
return 1.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>

commit | commitdiff | tree

Jesper Nilsson [Thu, 20 Dec 2012 11:48:53 +0000 (12:48 +0100)]

Merge tag 'disintegrate-cris-20121009' of git://git.infradead.org/users/dhowells/linux-headers into for-linus2

UAPI Disintegration 2012-10-09

* tag 'disintegrate-cris-20121009' of git://git.infradead.org/users/dhowells/linux-headers:
  UAPI: (Scripted) Disintegrate arch/cris/include/asm
  UAPI: (Scripted) Disintegrate arch/cris/include/arch-v32/arch
  UAPI: (Scripted) Disintegrate arch/cris/include/arch-v10/arch

commit | commitdiff | tree

James Hogan [Thu, 11 Oct 2012 09:00:58 +0000 (11:00 +0200)]

CRIS: use kbuild.h instead of defining macros in asm-offset.c

This is modelled on commits such as the one below:

Commit fc1c3a003edb8a6778e64e10ef671a38c76c969e ("sh: use kbuild.h
instead of defining macros in asm-offsets.c") introduced in v2.6.26.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>

commit | commitdiff | tree

David Howells [Thu, 20 Dec 2012 10:53:58 +0000 (10:53 +0000)]

UAPI: (Scripted) Disintegrate arch/score/include/asm

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Acked-by: Lennox Wu <lennox.wu@gmail.com>
Acked-by: Liqin Chen <liqin299@gmail.com>

commit | commitdiff | tree

Maarten Lankhorst [Wed, 12 Dec 2012 09:23:03 +0000 (10:23 +0100)]

dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER

Documentation says that code requiring dma-buf should add it to
select, so inline fallbacks are not going to be used. A link error
will make it obvious what went wrong, instead of silently doing
nothing at runtime.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>

commit | commitdiff | tree

Rob Clark [Fri, 28 Sep 2012 07:29:43 +0000 (09:29 +0200)]

dma-buf: might_sleep() in dma_buf_unmap_attachment()

We never really clarified if unmap could be done in atomic context.
But since mapping might require sleeping, this implies mutex in use
to synchronize mapping/unmapping, so unmap could sleep as well. Add
a might_sleep() to clarify this.

Signed-off-by: Rob Clark <rob@ti.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 04:31:02 +0000 (20:31 -0800)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
"Please pull to get these sparc AES/DES/CAMELLIA crypto bug fixes as
  well as an addition of a pte_accessible() define for sparc64 and a
  hugetlb fix from Dave Kleikamp."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in CAMELLIA code.
  sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in DES code.
  sparc64: Fix ECB looping constructs in AES code.
  sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in AES code.
  sparc64: Fix AES ctr mode block size.
  sparc64: Fix unrolled AES 256-bit key loops.
  sparc64: Define pte_accessible()
  sparc: huge_ptep_set_* functions need to call set_huge_pte_at()

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 04:29:15 +0000 (20:29 -0800)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Really fix tuntap SKB use after free bug, from Eric Dumazet.

2) Adjust SKB data pointer to point past the transport header before
    calling icmpv6_notify() so that the headers are in the state which
    that function expects.  From Duan Jiong.

3) Fix ambiguities in the new tuntap multi-queue APIs.  From Jason
    Wang.

4) mISDN needs to use del_timer_sync(), from Konstantin Khlebnikov.

5) Don't destroy mutex after freeing up device private in mac802154,
    fix also from Konstantin Khlebnikov.

6) Fix INET request socket leak in TCP and DCCP, from Christoph Paasch.

7) SCTP HMAC kconfig rework, from Neil Horman.

8) Fix SCTP jprobes function signature, otherwise things explode, from
    Daniel Borkmann.

9) Fix typo in ipv6-offload Makefile variable reference, from Simon
    Arlott.

10) Don't fail USBNET open just because remote wakeup isn't supported,
    from Oliver Neukum.

11) be2net driver bug fixes from Sathya Perla.

12) SOLOS PCI ATM driver bug fixes from Nathan Williams and David
    Woodhouse.

13) Fix MTU changing regression in 8139cp driver, from John Greene.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
  solos-pci: ensure all TX packets are aligned to 4 bytes
  solos-pci: add firmware upgrade support for new models
  solos-pci: remove superfluous debug output
  solos-pci: add GPIO support for newer versions on Geos board
  8139cp: Prevent dev_close/cp_interrupt race on MTU change
  net: qmi_wwan: add ZTE MF880
  drivers/net: Use of_match_ptr() macro in smsc911x.c
  drivers/net: Use of_match_ptr() macro in smc91x.c
  ipv6: addrconf.c: remove unnecessary "if"
  bridge: Correctly encode addresses when dumping mdb entries
  bridge: Do not unregister all PF_BRIDGE rtnl operations
  use generic usbnet_manage_power()
  usbnet: generic manage_power()
  usbnet: handle PM failure gracefully
  ksz884x: fix receive polling race condition
  qlcnic: update driver version
  qlcnic: fix unused variable warnings
  net: fec: forbid FEC_PTP on SoCs that do not support
  be2net: fix wrong frag_idx reported by RX CQ
  be2net: fix be_close() to ensure all events are ack'ed
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 04:26:16 +0000 (20:26 -0800)]

Merge tags 'dt-for-linus', 'gpio-for-linus' and 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull devicetree, gpio and spi bugfixes from Grant Likely:
"Device tree v3.8 bug fix:
   - Fixes an undefined struct device build error and a missing symbol
     export.

  GPIO device driver bug fixes:
   - gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG
   - gpio/ich: Add missing spinlock init

  SPI device driver bug fixes:
   - Most of this is bug fixes to the core code and the sh-hspi and
     s3c64xx device drivers.

   - There is also a patch here to add DT support to the Atmel driver.
     This one should have been in the first round, but I missed it.
     It's a low risk change contained within a single driver and the
     Atmel maintainer has requested it."

* tag 'dt-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  of: define struct device in of_platform.h if !OF_DEVICE and !OF_ADDRESS
  of: Fix export of of_find_matching_node_and_match()

* tag 'gpio-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  gpio/mvebu-gpio: Make mvebu-gpio depend on OF_CONFIG
  gpio/ich: Add missing spinlock init

* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  spi/sh-hspi: fix return value check in hspi_probe().
  spi: fix tegra SPI binding examples
  spi/atmel: add DT support
  of/spi: Fix SPI module loading by using proper "spi:" modalias prefixes.
  spi: Change FIFO flush operation and spi channel off
  spi: Keep chipselect assertion during one message

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 04:24:25 +0000 (20:24 -0800)]

Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

Pull drm bugfix from Dave Airlie:
"Just a single urgent regression fix, seeing a few wierd behaviours I'd
like not to persist."

* 'drm-next' of git://people.freedesktop.org/~airlied/linux:
drm/ttm: fix delayed ttm_bo_cleanup_refs_and_unlock delayed handling

commit | commitdiff | tree

Linus Torvalds [Thu, 20 Dec 2012 04:23:37 +0000 (20:23 -0800)]

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random updates from Ted Ts'o:
"A few /dev/random improvements for the v3.8 merge window."

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: Mix cputime from each thread that exits to the pool
  random: prime last_data value per fips requirements
  random: fix debug format strings
  random: make it possible to enable debugging without rebuild

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:44:31 +0000 (15:44 -0800)]

sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in CAMELLIA code.

We use the FPU and therefore cannot sleep during the crypto
loops.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:43:38 +0000 (15:43 -0800)]

sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in DES code.

We use the FPU and therefore cannot sleep during the crypto
loops.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:30:07 +0000 (15:30 -0800)]

sparc64: Fix ECB looping constructs in AES code.

Things works better when you increment the source buffer pointer
properly.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:22:03 +0000 (15:22 -0800)]

sparc64: Set CRYPTO_TFM_REQ_MAY_SLEEP consistently in AES code.

We use the FPU and therefore cannot sleep during the crypto
loops.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:20:23 +0000 (15:20 -0800)]

sparc64: Fix AES ctr mode block size.

Like the generic versions, we need to support a block size
of '1' for CTR mode AES.

This was discovered thanks to all of the new test cases added by
Jussi Kivilinna.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Wed, 19 Dec 2012 23:19:11 +0000 (15:19 -0800)]

sparc64: Fix unrolled AES 256-bit key loops.

The basic scheme of the block mode assembler is that we start by
enabling the FPU, loading the key into the floating point registers,
then iterate calling the encrypt/decrypt routine for each block.

For the 256-bit key cases, we run short on registers in the unrolled
loops.

So the {ENCRYPT,DECRYPT}_256_2() macros reload the key registers that
get clobbered.

The unrolled macros, {ENCRYPT,DECRYPT}_256(), are not mindful of this.

So if we have a mix of multi-block and single-block calls, the
single-block unrolled 256-bit encrypt/decrypt can run with some
of the key registers clobbered.

Handle this by always explicitly loading those registers before using
the non-unrolled 256-bit macro.

This was discovered thanks to all of the new test cases added by
Jussi Kivilinna.

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Woodhouse [Wed, 19 Dec 2012 11:01:21 +0000 (11:01 +0000)]

solos-pci: ensure all TX packets are aligned to 4 bytes

The FPGA can't handled unaligned DMA (yet). So copy into an aligned buffer,
if skb->data isn't suitably aligned.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nathan Williams [Wed, 19 Dec 2012 11:01:20 +0000 (11:01 +0000)]

solos-pci: add firmware upgrade support for new models

Signed-off-by: Nathan Williams <nathan@traverse.com.au>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nathan Williams [Wed, 19 Dec 2012 11:01:19 +0000 (11:01 +0000)]

solos-pci: remove superfluous debug output

Signed-off-by: Nathan Williams <nathan@traverse.com.au>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Nathan Williams [Wed, 19 Dec 2012 11:01:18 +0000 (11:01 +0000)]

solos-pci: add GPIO support for newer versions on Geos board

dwmw2: Tidy up a little, simpler matching on which GPIO is being accessed,
only register on newer boards, register under PCI device instead of
duplicating them under each ATM device.

Signed-off-by: Nathan Williams <nathan@traverse.com.au>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

John Greene [Wed, 19 Dec 2012 09:47:48 +0000 (09:47 +0000)]

8139cp: Prevent dev_close/cp_interrupt race on MTU change

commit:  cb64edb6b89491edfdbae52ba7db9a8b8391d339 upstream

Above commit may introduce a race between cp_interrupt and dev_close
/ change MTU / dev_open up state. Changes cp_interrupt to tolerate
this.  Change spin_locking in cp_interrupt to avoid possible
but unobserved race.

Reported-by: "Francois Romieu" <romieu@fr.zoreil.com>
Tested on virtual hardware, Tx MTU size up to 4096, max tx payload
    was ping -s 4068 for MTU of 4096. No real hardware, need test
    assist.

Signed-off-by: "John Greene" <jogreene@redhat.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: "David Woodhouse" <David.Woodhouse@intel.com>
Tested-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux kernel for KaRo TX COM modules

RSS Atom