]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
11 years agoext3 ->tmpfile() support
Al Viro [Tue, 11 Jun 2013 08:52:02 +0000 (12:52 +0400)]
ext3 ->tmpfile() support

In this case we do need a bit more than usual, due to orphan
list handling.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoallow the temp files created by open() to be linked to
Al Viro [Tue, 11 Jun 2013 04:34:36 +0000 (08:34 +0400)]
allow the temp files created by open() to be linked to

O_TMPFILE | O_CREAT => linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/<n>
as oldpath (i.e. flink()) will create a link
O_TMPFILE | O_CREAT | O_EXCL => ENOENT on attempt to link those guys

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...
Al Viro [Fri, 7 Jun 2013 05:20:27 +0000 (01:20 -0400)]
[O_TMPFILE] it's still short a few helpers, but infrastructure should be OK now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoallow build_open_flags() to return an error
Al Viro [Tue, 11 Jun 2013 04:23:01 +0000 (08:23 +0400)]
allow build_open_flags() to return an error

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agolift file_*_write out of do_splice_direct()
Al Viro [Fri, 24 May 2013 00:10:34 +0000 (20:10 -0400)]
lift file_*_write out of do_splice_direct()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agolift file_*_write out of do_splice_from()
Al Viro [Fri, 24 May 2013 00:07:11 +0000 (20:07 -0400)]
lift file_*_write out of do_splice_from()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodo_last(): fix missing checks for LAST_BIND case
Al Viro [Thu, 6 Jun 2013 13:12:33 +0000 (09:12 -0400)]
do_last(): fix missing checks for LAST_BIND case

/proc/self/cwd with O_CREAT should fail with EISDIR.  /proc/self/exe, OTOH,
should fail with ENOTDIR when opened with O_DIRECTORY.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agopcm_native: switch to fdget()/fdput()
Al Viro [Wed, 5 Jun 2013 18:09:55 +0000 (14:09 -0400)]
pcm_native: switch to fdget()/fdput()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] constify ->actor
Al Viro [Thu, 23 May 2013 02:22:04 +0000 (22:22 -0400)]
[readdir] constify ->actor

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] ->readdir() is gone
Al Viro [Thu, 23 May 2013 01:44:23 +0000 (21:44 -0400)]
[readdir] ->readdir() is gone

everything's converted to ->iterate()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ecryptfs
Al Viro [Thu, 23 May 2013 01:23:40 +0000 (21:23 -0400)]
[readdir] convert ecryptfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert coda
Al Viro [Thu, 23 May 2013 01:15:30 +0000 (21:15 -0400)]
[readdir] convert coda

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ocfs2
Al Viro [Thu, 23 May 2013 01:06:00 +0000 (21:06 -0400)]
[readdir] convert ocfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert fatfs
Al Viro [Wed, 22 May 2013 22:37:16 +0000 (18:37 -0400)]
[readdir] convert fatfs

... pox upon the idiotic ioctls; life would be much easier without
those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert xfs
Al Viro [Wed, 22 May 2013 21:07:56 +0000 (17:07 -0400)]
[readdir] convert xfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert btrfs
Al Viro [Wed, 22 May 2013 20:48:09 +0000 (16:48 -0400)]
[readdir] convert btrfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert hostfs
Al Viro [Wed, 22 May 2013 20:34:19 +0000 (16:34 -0400)]
[readdir] convert hostfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert afs
Al Viro [Wed, 22 May 2013 20:31:14 +0000 (16:31 -0400)]
[readdir] convert afs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ncpfs
Al Viro [Wed, 22 May 2013 19:11:27 +0000 (15:11 -0400)]
[readdir] convert ncpfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert hfsplus
Al Viro [Wed, 22 May 2013 18:59:39 +0000 (14:59 -0400)]
[readdir] convert hfsplus

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert hfs
Al Viro [Wed, 22 May 2013 18:29:35 +0000 (14:29 -0400)]
[readdir] convert hfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert befs
Al Viro [Wed, 22 May 2013 17:44:05 +0000 (13:44 -0400)]
[readdir] convert befs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert cifs
Al Viro [Wed, 22 May 2013 20:17:25 +0000 (16:17 -0400)]
[readdir] convert cifs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert freevxfs
Al Viro [Sat, 18 May 2013 07:15:00 +0000 (03:15 -0400)]
[readdir] convert freevxfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert fuse
Al Viro [Sat, 18 May 2013 07:03:58 +0000 (03:03 -0400)]
[readdir] convert fuse

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert hpfs
Al Viro [Sat, 18 May 2013 06:58:57 +0000 (02:58 -0400)]
[readdir] convert hpfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoreiserfs: switch reiserfs_readdir_dentry to inode
Al Viro [Sat, 18 May 2013 02:58:58 +0000 (22:58 -0400)]
reiserfs: switch reiserfs_readdir_dentry to inode

... and clean the callers up a bit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoreiserfs: is_privroot_deh() needs only directory inode, actually
Al Viro [Sat, 18 May 2013 02:45:29 +0000 (22:45 -0400)]
reiserfs: is_privroot_deh() needs only directory inode, actually

... and that - only to get the superblock.  Privroot is a directory
and we don't allow hardlinks to those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert reiserfs
Al Viro [Sat, 18 May 2013 02:42:17 +0000 (22:42 -0400)]
[readdir] convert reiserfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ntfs
Al Viro [Sat, 18 May 2013 01:22:31 +0000 (21:22 -0400)]
[readdir] convert ntfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert isofs
Al Viro [Sat, 18 May 2013 01:11:59 +0000 (21:11 -0400)]
[readdir] convert isofs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert jffs2
Al Viro [Fri, 17 May 2013 22:08:49 +0000 (18:08 -0400)]
[readdir] convert jffs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert f2fs
Al Viro [Fri, 17 May 2013 22:02:17 +0000 (18:02 -0400)]
[readdir] convert f2fs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert 9p
Al Viro [Fri, 17 May 2013 21:51:41 +0000 (17:51 -0400)]
[readdir] convert 9p

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert affs
Al Viro [Fri, 17 May 2013 21:44:42 +0000 (17:44 -0400)]
[readdir] convert affs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert adfs
Al Viro [Fri, 17 May 2013 21:30:10 +0000 (17:30 -0400)]
[readdir] convert adfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert logfs
Al Viro [Fri, 17 May 2013 21:06:34 +0000 (17:06 -0400)]
[readdir] convert logfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert jfs
Al Viro [Fri, 17 May 2013 21:00:34 +0000 (17:00 -0400)]
[readdir] convert jfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ceph
Al Viro [Fri, 17 May 2013 20:52:26 +0000 (16:52 -0400)]
[readdir] convert ceph

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert nfs
Al Viro [Fri, 17 May 2013 20:34:50 +0000 (16:34 -0400)]
[readdir] convert nfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ext4
Al Viro [Fri, 17 May 2013 20:08:53 +0000 (16:08 -0400)]
[readdir] convert ext4

and trim the living hell out bogosities in inline dir case

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert qnx6
Al Viro [Fri, 17 May 2013 19:32:10 +0000 (15:32 -0400)]
[readdir] convert qnx6

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert qnx4
Al Viro [Fri, 17 May 2013 19:17:59 +0000 (15:17 -0400)]
[readdir] convert qnx4

... and use strnlen() instead of strlen() - it's done on untrusted data,
after all.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert omfs
Al Viro [Fri, 17 May 2013 19:05:25 +0000 (15:05 -0400)]
[readdir] convert omfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert nilfs2
Al Viro [Thu, 16 May 2013 18:36:14 +0000 (14:36 -0400)]
[readdir] convert nilfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert sysfs
Al Viro [Thu, 16 May 2013 18:31:02 +0000 (14:31 -0400)]
[readdir] convert sysfs

get rid of the kludges in sysfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert gfs2
Al Viro [Thu, 16 May 2013 18:14:48 +0000 (14:14 -0400)]
[readdir] convert gfs2

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert exofs
Al Viro [Thu, 16 May 2013 17:48:17 +0000 (13:48 -0400)]
[readdir] convert exofs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert bfs
Al Viro [Thu, 16 May 2013 17:41:48 +0000 (13:41 -0400)]
[readdir] convert bfs

... and get rid of that ridiculous mutex in bfs_readdir()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert procfs
Al Viro [Thu, 16 May 2013 16:07:31 +0000 (12:07 -0400)]
[readdir] convert procfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert openpromfs
Al Viro [Thu, 16 May 2013 05:52:12 +0000 (01:52 -0400)]
[readdir] convert openpromfs

what the hell is op_mutex for, BTW?

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert efs
Al Viro [Thu, 16 May 2013 05:41:10 +0000 (01:41 -0400)]
[readdir] convert efs

* sanity checks belong before risky operation, not after it
* don't quit as soon as we'd found an entry

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert configfs
Al Viro [Thu, 16 May 2013 05:28:34 +0000 (01:28 -0400)]
[readdir] convert configfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert romfs
Al Viro [Thu, 16 May 2013 05:22:00 +0000 (01:22 -0400)]
[readdir] convert romfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert squashfs
Al Viro [Thu, 16 May 2013 05:17:58 +0000 (01:17 -0400)]
[readdir] convert squashfs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ubifs
Al Viro [Thu, 16 May 2013 05:14:46 +0000 (01:14 -0400)]
[readdir] convert ubifs

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert udf
Al Viro [Thu, 16 May 2013 05:09:37 +0000 (01:09 -0400)]
[readdir] convert udf

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] convert ext3
Al Viro [Thu, 16 May 2013 01:02:48 +0000 (21:02 -0400)]
[readdir] convert ext3

new helper: dir_relax(inode).  Call when you are in location that will
_not_ be invalidated by directory modifications (block boundary, in case
of ext*).  Returns whether the directory has survived (dropping i_mutex
allows rmdir to kill the sucker; if it returns false to us, ->iterate()
is obviously done)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] switch dcache_readdir() users to ->iterate()
Al Viro [Thu, 16 May 2013 00:23:06 +0000 (20:23 -0400)]
[readdir] switch dcache_readdir() users to ->iterate()

new helpers - dir_emit_dot(file, ctx, dentry), dir_emit_dotdot(file, ctx),
dir_emit_dots(file, ctx).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] simple local unixlike: switch to ->iterate()
Al Viro [Wed, 15 May 2013 22:51:49 +0000 (18:51 -0400)]
[readdir] simple local unixlike: switch to ->iterate()

ext2, ufs, minix, sysv

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] introduce ->iterate(), ctx->pos, dir_emit()
Al Viro [Wed, 15 May 2013 22:49:12 +0000 (18:49 -0400)]
[readdir] introduce ->iterate(), ctx->pos, dir_emit()

New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
update file->f_pos (or look at it, for that matter); iterate_dir() does the
update.

Note that dir_emit() takes the offset from ctx->pos (and eventually
filldir_t will lose that argument).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years ago[readdir] introduce iterate_dir() and dir_context
Al Viro [Wed, 15 May 2013 17:52:59 +0000 (13:52 -0400)]
[readdir] introduce iterate_dir() and dir_context

iterate_dir(): new helper, replacing vfs_readdir().

struct dir_context: contains the readdir callback (and will get more stuff
in it), embedded into whatever data that callback wants to deal with;
eventually, we'll be passing it to ->readdir() replacement instead of
(data,filldir) pair.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomove linux/loop.h to drivers/block
Al Viro [Sun, 12 May 2013 14:14:07 +0000 (10:14 -0400)]
move linux/loop.h to drivers/block

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agocompat.c: LOOP_CLR_FD is taken care of in loop.c itself...
Al Viro [Sun, 12 May 2013 14:12:11 +0000 (10:12 -0400)]
compat.c: LOOP_CLR_FD is taken care of in loop.c itself...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agopxa3xx: VM_IO is set by io_remap_pfn_range()
Al Viro [Sat, 11 May 2013 16:39:26 +0000 (12:39 -0400)]
pxa3xx: VM_IO is set by io_remap_pfn_range()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoau1100fb: VM_IO is set by io_remap_pfn_range()
Al Viro [Sat, 11 May 2013 16:38:38 +0000 (12:38 -0400)]
au1100fb: VM_IO is set by io_remap_pfn_range()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoau1200fb: io_remap_pfn_range() sets VM_IO
Al Viro [Sat, 11 May 2013 16:37:38 +0000 (12:37 -0400)]
au1200fb: io_remap_pfn_range() sets VM_IO

... and single return is quite sufficient to get out of function, TYVM

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agovfio: remap_pfn_range() sets all those flags...
Al Viro [Sat, 11 May 2013 16:33:31 +0000 (12:33 -0400)]
vfio: remap_pfn_range() sets all those flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoi810: VM_IO is set by io_remap_pfn_range()
Al Viro [Sat, 11 May 2013 16:27:16 +0000 (12:27 -0400)]
i810: VM_IO is set by io_remap_pfn_range()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agodrm: io_remap_pfn_range() sets VM_IO...
Al Viro [Sat, 11 May 2013 16:23:17 +0000 (12:23 -0400)]
drm: io_remap_pfn_range() sets VM_IO...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agosparc: __pci_mmap_set_flags() is useless
Al Viro [Sat, 11 May 2013 16:21:55 +0000 (12:21 -0400)]
sparc: __pci_mmap_set_flags() is useless

io_remap_pfn_range() does all we need

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomn10300: don't bother with VM_IO
Al Viro [Sat, 11 May 2013 16:19:34 +0000 (12:19 -0400)]
mn10300: don't bother with VM_IO

io_remap_pfn_range() sets it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agohose_mmap_page_range(): io_remap_pfn_range() will set all those flags...
Al Viro [Sat, 11 May 2013 16:18:01 +0000 (12:18 -0400)]
hose_mmap_page_range(): io_remap_pfn_range() will set all those flags...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agosamsung: don't bother with setting VM_IO
Al Viro [Sat, 11 May 2013 16:15:47 +0000 (12:15 -0400)]
samsung: don't bother with setting VM_IO

io_remap_pfn_range() will set it just fine

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoconsolidate io_remap_pfn_range definitions
Al Viro [Sat, 11 May 2013 16:13:10 +0000 (12:13 -0400)]
consolidate io_remap_pfn_range definitions

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoUBIFS: fix a horrid bug
Artem Bityutskiy [Fri, 28 Jun 2013 11:15:15 +0000 (14:15 +0300)]
UBIFS: fix a horrid bug

Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

This means that 'file->private_data' can be freed while 'ubifs_readdir()' uses
it, and this is a very bad bug: not only 'ubifs_readdir()' can return garbage,
but this may corrupt memory and lead to all kinds of problems like crashes an
security holes.

This patch fixes the problem by using the 'file->f_version' field, which
'->llseek()' always unconditionally sets to zero. We set it to 1 in
'ubifs_readdir()' and whenever we detect that it became 0, we know there was a
seek and it is time to clear the state saved in 'file->private_data'.

I tested this patch by writing a user-space program which runds readdir and
seek in parallell. I could easily crash the kernel without these patches, but
could not crash it with these patches.

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoUBIFS: prepare to fix a horrid bug
Artem Bityutskiy [Fri, 28 Jun 2013 11:15:14 +0000 (14:15 +0300)]
UBIFS: prepare to fix a horrid bug

Al Viro pointed me to the fact that '->readdir()' and '->llseek()' have no
mutual exclusion, which means the 'ubifs_dir_llseek()' can be run while we are
in the middle of 'ubifs_readdir()'.

First of all, this means that 'file->private_data' can be freed while
'ubifs_readdir()' uses it.  But this particular patch does not fix the problem.
This patch is only a preparation, and the fix will follow next.

In this patch we make 'ubifs_readdir()' stop using 'file->f_pos' directly,
because 'file->f_pos' can be changed by '->llseek()' at any point. This may
lead 'ubifs_readdir()' to returning inconsistent data: directory entry names
may correspond to incorrect file positions.

So here we introduce a local variable 'pos', read 'file->f_pose' once at very
the beginning, and then stick to 'pos'. The result of this is that when
'ubifs_dir_llseek()' changes 'file->f_pos' while we are in the middle of
'ubifs_readdir()', the latter "wins".

Cc: stable@vger.kernel.org
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoaout32 coredump compat fix
Al Viro [Sat, 22 Jun 2013 07:01:38 +0000 (11:01 +0400)]
aout32 coredump compat fix

dump_seek() does SEEK_CUR, not SEEK_SET; native binfmt_aout
handles it correctly (seeks by PAGE_SIZE - sizeof(struct user),
getting the current position to PAGE_SIZE), compat one seeks
by PAGE_SIZE and ends up at PAGE_SIZE + already written...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agosplice: don't pass the address of ->f_pos to methods
Al Viro [Thu, 20 Jun 2013 14:58:36 +0000 (18:58 +0400)]
splice: don't pass the address of ->f_pos to methods

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomconsole: we'd better initialize pos before passing it to vfs_read()...
Al Viro [Wed, 19 Jun 2013 08:35:42 +0000 (12:35 +0400)]
mconsole: we'd better initialize pos before passing it to vfs_read()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agolseek(fd, n, SEEK_END) does *not* go to eof - n
Al Viro [Sun, 16 Jun 2013 17:06:06 +0000 (18:06 +0100)]
lseek(fd, n, SEEK_END) does *not* go to eof - n

When you copy some code, you are supposed to read it.  If nothing else,
there's a chance to spot and fix an obvious bug instead of sharing it...

X-Song: "I Got It From Agnes", by Tom Lehrer
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Tom Lehrer? You're dating yourself, Al ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoLinux 3.10-rc6
Linus Torvalds [Sat, 15 Jun 2013 21:51:07 +0000 (11:51 -1000)]
Linux 3.10-rc6

11 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Sat, 15 Jun 2013 21:49:48 +0000 (11:49 -1000)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "These are a little later than I planned on since I got caught up with
  handling merges for 3.11 most of the week.

  Another week, another batch of fixes for arm-soc platforms.

  Again, nothing controversial.  A few more than would be ideal, but all
  are valid fixes.  In particular the prima2 panic patch is critical
  since it fixes a problem where multiplatform kernels panic on all but
  prima2 hardware."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: SAMSUNG: pm: Adjust for pinctrl- and DT-enabled platforms
  ARM: prima2: fix incorrect panic usage
  arm: mvebu: armada-xp-{gp,openblocks-ax3-4}: specify PCIe range
  ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().
  ARM: omap3: clock: fix wrong container_of in clock36xx.c
  ARM: dts: OMAP5: Fix missing PWM capability to timer nodes
  ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line
  ARM: dts: AM33xx: Fix properties on gpmc node
  arm: omap2: fix AM33xx hwmod infos for UART2
  ARM: OMAP3: Fix iva2_pwrdm settings for 3703

11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Sat, 15 Jun 2013 21:47:56 +0000 (11:47 -1000)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Fix RTNL locking in batman-adv, from Matthias Schiffer.

 2) Don't allow non-passthrough macvlan devices to set NOPROMISC via
    netlink, otherwise we can end up with corrupted promisc counter
    values on the device.  From Michael S Tsirkin.

 3) Fix stmmac driver build with debugging defines enabled, from Dinh
    Nguyen.

 4) Make sure name string we give in socket address in AF_PACKET is NULL
    terminated, from Daniel Borkmann.

 5) Fix leaking of two uninitialized bytes of memory to userspace in
    l2tp, from Guillaume Nault.

 6) Clear IPCB(skb) before tunneling otherwise we touch dangling IP
    options state and crash.  From Saurabh Mohan.

 7) Fix suspend/resume for davinci_mdio by using suspend_late and
    resume_early.  From Mugunthan V N.

 8) Don't tag ip_tunnel_init_net and ip_tunnel_delete_net with
    __net_{init,exit}, they can be called outside of those contexts.
    From Eric Dumazet.

 9) Fix RX length error in sh_eth driver, from Yoshihiro Shimoda.

10) Fix missing sctp_outq initialization in some code paths of SCTP
    stack, from Neil Horman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
  sctp: fully initialize sctp_outq in sctp_outq_init
  netiucv: Hold rtnl between name allocation and device registration.
  tulip: Properly check dma mapping result
  net: sh_eth: fix incorrect RX length error if R8A7740
  ip_tunnel: remove __net_init/exit from exported functions
  drivers: net: davinci_mdio: restore mdio clk divider in mdio resume
  drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver
  net/ipv4: ip_vti clear skb cb before tunneling.
  tg3: Wait for boot code to finish after power on
  l2tp: Fix sendmsg() return value
  l2tp: Fix PPP header erasure and memory leak
  bonding: fix igmp_retrans type and two related races
  bonding: reset master mac on first enslave failure
  packet: packet_getname_spkt: make sure string is always 0-terminated
  net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used
  be2net: Fix 32-bit DMA Mask handling
  xen-netback: don't de-reference vif pointer after having called xenvif_put()
  macvlan: don't touch promisc without passthrough
  batman-adv: Don't handle address updates when bla is disabled
  batman-adv: forward late OGMs from best next hop
  ...

11 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Sat, 15 Jun 2013 05:25:04 +0000 (19:25 -1000)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

Pull powerpc fixes from Benjamin Herrenschmidt:
 "So here are 3 fixes still for 3.10.  Fixes are simple, bugs are nasty
  (though not recent regressions, nasty enough) and all targeted at
  stable"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc: Fix missing/delayed calls to irq_work
  powerpc: Fix emulation of illegal instructions on PowerNV platform
  powerpc: Fix stack overflow crash in resume_kernel when ftracing

11 years agosmp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().
David Daney [Fri, 14 Jun 2013 18:13:59 +0000 (11:13 -0700)]
smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu().

Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts
are enabled too early"), "bloody murder" is now being screamed.

With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
result of the time_init() call.  Because the !SMP version of
on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
      show_stack+0x68/0x80
      warn_slowpath_common+0x78/0xb0
      warn_slowpath_fmt+0x38/0x48
      start_kernel+0x250/0x410

Suggested fix: Do what we already do in the SMP version of
on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
need a flags variable, make it a static inline to avoid name space
issues.

[ Change from v1: Convert on_each_cpu to a static inline function, add
  #include <linux/irqflags.h> to avoid build breakage on some files.

  on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
  on_each_cpu(), but they are not causing !SMP bugs for me, so I will
  defer changing them to a less urgent patch. ]

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sat, 15 Jun 2013 05:18:56 +0000 (19:18 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS fixes from Al Viro:
 "Several fixes + obvious cleanup (you've missed a couple of open-coded
  can_lookup() back then)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  snd_pcm_link(): fix a leak...
  use can_lookup() instead of direct checks of ->i_op->lookup
  move exit_task_namespaces() outside of exit_notify()
  fput: task_work_add() can fail if the caller has passed exit_task_work()
  ncpfs: fix rmdir returns Device or resource busy

11 years agoMerge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Sat, 15 Jun 2013 05:16:31 +0000 (19:16 -1000)]
Merge tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 - Remove noisy warnings about experimental support which spams the logs
 - Add padding to align directory and attr structures correctly
 - Set block number on child buffer on a root btree split
 - Disable verifiers during log recovery for non-CRC filesystems

* tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
  xfs: don't shutdown log recovery on validation errors
  xfs: ensure btree root split sets blkno correctly
  xfs: fix implicit padding in directory and attr CRC formats
  xfs: don't emit v5 superblock warnings on write

11 years agoMerge tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregk...
Linus Torvalds [Sat, 15 Jun 2013 05:15:36 +0000 (19:15 -1000)]
Merge tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc fixes from Greg Kroah-Hartman:
 "Here are some small mei driver fixes for 3.10-rc6 that fix some
  reported problems"

* tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  mei: me: clear interrupts on the resume path
  mei: nfc: fix nfc device freeing
  mei: init: Flush scheduled work before resetting the device

11 years agoMerge tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sat, 15 Jun 2013 05:14:39 +0000 (19:14 -1000)]
Merge tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are some small USB driver fixes that resolve some reported
  problems for 3.10-rc6

  Nothing major, just 3 USB serial driver fixes, and two chipidea fixes"

* tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: fix id change handling
  usb: chipidea: fix no transceiver case
  USB: pl2303: fix device initialisation at open
  USB: spcp8x5: fix device initialisation at open
  USB: f81232: fix device initialisation at open

11 years agopowerpc: Fix missing/delayed calls to irq_work
Benjamin Herrenschmidt [Sat, 15 Jun 2013 02:13:40 +0000 (12:13 +1000)]
powerpc: Fix missing/delayed calls to irq_work

When replaying interrupts (as a result of the interrupt occurring
while soft-disabled), in the case of the decrementer, we are exclusively
testing for a pending timer target. However we also use decrementer
interrupts to trigger the new "irq_work", which in this case would
be missed.

This change the logic to force a replay in both cases of a timer
boundary reached and a decrementer interrupt having actually occurred
while disabled. The former test is still useful to catch cases where
a CPU having been hard-disabled for a long time completely misses the
interrupt due to a decrementer rollover.

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
11 years agopowerpc: Fix emulation of illegal instructions on PowerNV platform
Paul Mackerras [Fri, 14 Jun 2013 10:07:41 +0000 (20:07 +1000)]
powerpc: Fix emulation of illegal instructions on PowerNV platform

Normally, the kernel emulates a few instructions that are unimplemented
on some processors (e.g. the old dcba instruction), or privileged (e.g.
mfpvr).  The emulation of unimplemented instructions is currently not
working on the PowerNV platform.  The reason is that on these machines,
unimplemented and illegal instructions cause a hypervisor emulation
assist interrupt, rather than a program interrupt as on older CPUs.
Our vector for the emulation assist interrupt just calls
program_check_exception() directly, without setting the bit in SRR1
that indicates an illegal instruction interrupt.  This fixes it by
making the emulation assist interrupt set that bit before calling
program_check_interrupt().  With this, old programs that use no-longer
implemented instructions such as dcba now work again.

CC: <stable@vger.kernel.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
11 years agopowerpc: Fix stack overflow crash in resume_kernel when ftracing
Michael Ellerman [Thu, 13 Jun 2013 11:04:56 +0000 (21:04 +1000)]
powerpc: Fix stack overflow crash in resume_kernel when ftracing

It's possible for us to crash when running with ftrace enabled, eg:

  Bad kernel stack pointer bffffd12 at c00000000000a454
  cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
      pc: c00000000000a454: resume_kernel+0x34/0x60
      lr: c00000000000335c: performance_monitor_common+0x15c/0x180
      sp: bffffd12
     msr: 8000000000001032
     dar: bffffd12
   dsisr: 42000000

If we look at current's stack (paca->__current->stack) we see it is
equal to c0000002ecab0000. Our stack is 16K, and comparing to
paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
kernel stack. This leads to us writing over our struct thread_info, and
in this case we have corrupted thread_info->flags and set
_TIF_EMULATE_STACK_STORE.

Dumping the stack we see:

  3:mon> t c0000002ecab0000
  [c0000002ecab0000c00000000002131c .performance_monitor_exception+0x5c/0x70
  [c0000002ecab0080c00000000000335c performance_monitor_common+0x15c/0x180
  --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
  [c0000002ecab0370c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
  [c0000002ecab0410c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab04b0c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab0520c0000000000d6b58 .idle_cpu+0x18/0x90
  [c0000002ecab05a0c00000000000a934 .return_to_handler+0x0/0x34
  [c0000002ecab0620c00000000001e660 .timer_interrupt+0x160/0x300
  [c0000002ecab06d0c0000000000025dc decrementer_common+0x15c/0x180
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
  [c0000002ecab09c0c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
  [c0000002ecab0fb0c00000000016fe3c .trace_graph_entry+0x13c/0x280
  [c0000002ecab1050c00000000003d038 .prepare_ftrace_return+0x98/0x130
  [c0000002ecab10f0c00000000000a920 .ftrace_graph_caller+0x14/0x28
  [c0000002ecab1160c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
  [c0000002ecab11d0c00000000000a934 .return_to_handler+0x0/0x34
  --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

  ... and so on

__ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
path. At that point the irq state is not consistent, ie. interrupts are
hard disabled (by the exception entry), but the paca soft-enabled flag
may be out of sync.

This leads to the local_irq_restore() in trace_graph_entry() actually
enabling interrupts, which we do not want. Because we have not yet
reprogrammed the decrementer we immediately take another decrementer
exception, and recurse.

The fix is twofold. Firstly make sure we call DISABLE_INTS before
calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
the irq state in the paca with the hardware, making it safe again to
call local_irq_save/restore().

Although that should be sufficient to fix the bug, we also mark the
runlatch routines as notrace. They are called very early in the
exception entry and we are asking for trouble tracing them. They are
also fairly uninteresting and tracing them just adds unnecessary
overhead.

[ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
  "powerpc: Rework runlatch code" by myself --BenH
]

CC: <stable@vger.kernel.org> [v3.4+]
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
11 years agosnd_pcm_link(): fix a leak...
Al Viro [Wed, 5 Jun 2013 18:07:08 +0000 (14:07 -0400)]
snd_pcm_link(): fix a leak...

in case when snd_pcm_stream_linked(substream) is true, we end up leaking
group.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agouse can_lookup() instead of direct checks of ->i_op->lookup
Al Viro [Thu, 6 Jun 2013 23:33:47 +0000 (19:33 -0400)]
use can_lookup() instead of direct checks of ->i_op->lookup

a couple of places got missed back when Linus has introduced that one...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agomove exit_task_namespaces() outside of exit_notify()
Oleg Nesterov [Fri, 14 Jun 2013 19:09:49 +0000 (21:09 +0200)]
move exit_task_namespaces() outside of exit_notify()

exit_notify() does exit_task_namespaces() after
forget_original_parent(). This was needed to ensure that ->nsproxy
can't be cleared prematurely, an exiting child we are going to
reparent can do do_notify_parent() and use the parent's (ours) pid_ns.

However, after 32084504 "pidns: use task_active_pid_ns in
do_notify_parent" ->nsproxy != NULL is no longer needed, we rely
on task_active_pid_ns().

Move exit_task_namespaces() from exit_notify() to do_exit(), after
exit_fs() and before exit_task_work().

This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy()
does fput() which needs task_work_add().

Note: this particular problem can be fixed if we change fput(), and
that change makes sense anyway. But there is another reason to move
the callsite. The original reason for exit_task_namespaces() from
the middle of exit_notify() was subtle and it has already gone away,
now this looks confusing. And this allows us do simplify exit_notify(),
we can avoid unlock/lock(tasklist) and we can use ->exit_state instead
of PF_EXITING in forget_original_parent().

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agofput: task_work_add() can fail if the caller has passed exit_task_work()
Oleg Nesterov [Fri, 14 Jun 2013 19:09:47 +0000 (21:09 +0200)]
fput: task_work_add() can fail if the caller has passed exit_task_work()

fput() assumes that it can't be called after exit_task_work() but
this is not true, for example free_ipc_ns()->shm_destroy() can do
this. In this case fput() silently leaks the file.

Change it to fallback to delayed_fput_work if task_work_add() fails.
The patch looks complicated but it is not, it changes the code from

if (PF_KTHREAD) {
schedule_work(...);
return;
}
task_work_add(...)

to
if (!PF_KTHREAD) {
if (!task_work_add(...))
return;
/* fallback */
}
schedule_work(...);

As for shm_destroy() in particular, we could make another fix but I
think this change makes sense anyway. There could be another similar
user, it is not safe to assume that task_work_add() can't fail.

Reported-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
11 years agoxfs: don't shutdown log recovery on validation errors
Dave Chinner [Wed, 12 Jun 2013 02:19:06 +0000 (12:19 +1000)]
xfs: don't shutdown log recovery on validation errors

Unfortunately, we cannot guarantee that items logged multiple times
and replayed by log recovery do not take objects back in time. When
they are taken back in time, the go into an intermediate state which
is corrupt, and hence verification that occurs on this intermediate
state causes log recovery to abort with a corruption shutdown.

Instead of causing a shutdown and unmountable filesystem, don't
verify post-recovery items before they are written to disk. This is
less than optimal, but there is no way to detect this issue for
non-CRC filesystems If log recovery successfully completes, this
will be undone and the object will be consistent by subsequent
transactions that are replayed, so in most cases we don't need to
take drastic action.

For CRC enabled filesystems, leave the verifiers in place - we need
to call them to recalculate the CRCs on the objects anyway. This
recovery problem can be solved for such filesystems - we have a LSN
stamped in all metadata at writeback time that we can to determine
whether the item should be replayed or not. This is a separate piece
of work, so is not addressed by this patch.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 9222a9cf86c0d64ffbedf567412b55da18763aa3)

11 years agoxfs: ensure btree root split sets blkno correctly
Dave Chinner [Wed, 12 Jun 2013 02:19:08 +0000 (12:19 +1000)]
xfs: ensure btree root split sets blkno correctly

For CRC enabled filesystems, the BMBT is rooted in an inode, so it
passes through a different code path on root splits than the
freespace and inode btrees. This is much less traversed by xfstests
than the other trees. When testing on a 1k block size filesystem,
I've been seeing ASSERT failures in generic/234 like:

XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317

which are generally preceded by a lblock check failure. I noticed
this in the bmbt stats:

$ pminfo -f xfs.btree.block_map

xfs.btree.block_map.lookup
    value 39135

xfs.btree.block_map.compare
    value 268432

xfs.btree.block_map.insrec
    value 15786

xfs.btree.block_map.delrec
    value 13884

xfs.btree.block_map.newroot
    value 2

xfs.btree.block_map.killroot
    value 0
.....

Very little coverage of root splits and merges. Indeed, on a 4k
filesystem, block_map.newroot and block_map.killroot are both zero.
i.e. the code is not exercised at all, and it's the only generic
btree infrastructure operation that is not exercised by a default run
of xfstests.

Turns out that on a 1k filesystem, generic/234 accounts for one of
those two root splits, and that is somewhat of a smoking gun. In
fact, it's the same problem we saw in the directory/attr code where
headers are memcpy()d from one block to another without updating the
self describing metadata.

Simple fix - when copying the header out of the root block, make
sure the block number is updated correctly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit ade1335afef556df6538eb02e8c0dc91fbd9cc37)

11 years agoxfs: fix implicit padding in directory and attr CRC formats
Dave Chinner [Wed, 12 Jun 2013 02:19:07 +0000 (12:19 +1000)]
xfs: fix implicit padding in directory and attr CRC formats

Michael L. Semon has been testing CRC patches on a 32 bit system and
been seeing assert failures in the directory code from xfs/080.
Thanks to Michael's heroic efforts with printk debugging, we found
that the problem was that the last free space being left in the
directory structure was too small to fit a unused tag structure and
it was being corrupted and attempting to log a region out of bounds.
Hence the assert failure looked something like:

.....
#5 calling xfs_dir2_data_log_unused() 36 32
#1 4092 4095 4096
#2 8182 8183 4096
XFS: Assertion failed: first <= last && last < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

Where #1 showed the first region of the dup being logged (i.e. the
last 4 bytes of a directory buffer) and #2 shows the corrupt values
being calculated from the length of the dup entry which overflowed
the size of the buffer.

It turns out that the problem was not in the logging code, nor in
the freespace handling code. It is an initial condition bug that
only shows up on 32 bit systems. When a new buffer is initialised,
where's the freespace that is set up:

[  172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
[  172.316346] #9 calling xfs_dir2_data_log_unused()
[  172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
[  172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

Note the offset of the first region being logged? It's 60 bytes into
the buffer. Once I saw that, I pretty much knew that the bug was
going to be caused by this.

Essentially, all direct entries are rounded to 8 bytes in length,
and all entries start with an 8 byte alignment. This means that we
can decode inplace as variables are naturally aligned. With the
directory data supposedly starting on a 8 byte boundary, and all
entries padded to 8 bytes, the minimum freespace in a directory
block is supposed to be 8 bytes, which is large enough to fit a
unused data entry structure (6 bytes in size). The fact we only have
4 bytes of free space indicates a directory data block alignment
problem.

And what do you know - there's an implicit hole in the directory
data block header for the CRC format, which means the header is 60
byte on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
padding. And while looking at the structures, I found the same
problem in the attr leaf header. Fix them both.

Note that this only affects 32 bit systems with CRCs enabled.
Everything else is just fine. Note that CRC enabled filesystems created
before this fix on such systems will not be readable with this fix
applied.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Debugged-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 8a1fd2950e1fe267e11fc8c85dcaa6b023b51b60)