git.karo-electronics.de Git - linux-beck.git/log

dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885ab04dc6e0bb0ef35e0e23c1a7364d9e5 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption that DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing. Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache: use IS_ROOT to decide where dentry is hashed

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

- d_obtain_alias, which adds an IS_ROOT() dentry to s_anon.

- __d_rehash, called by _d_rehash: hashes to the dentry's
  parent, and all callers of _d_rehash appear to have d_parent
  set to a "real" parent.

- __d_rehash, called by __d_move: rehashes the moved dentry to the
  hash chain determined by target, and assigns target's d_parent
  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on s_anon if and only if IS_ROOT(dentry) is
true.
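
A minimal userspace sketch of the rule argued above, not the kernel code
itself. The names dentry_model, is_root and hash_chain_for are hypothetical;
the only kernel fact relied on is that IS_ROOT() tests whether a dentry is its
own parent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dentry; only what the argument needs. */
struct dentry_model {
    struct dentry_model *d_parent;  /* points to itself for a root dentry */
    bool disconnected;              /* stand-in for DCACHE_DISCONNECTED */
};

enum hash_chain { CHAIN_ANON, CHAIN_HASHTABLE };

/* Same test as the kernel's IS_ROOT(): a root dentry is its own parent. */
static bool is_root(const struct dentry_model *d)
{
    return d == d->d_parent;
}

/* Choose the chain from parentage alone; the disconnected flag is
 * deliberately ignored, because it may still be set while a dentry is
 * already hashed under a real parent. */
static enum hash_chain hash_chain_for(const struct dentry_model *d)
{
    assert(d != NULL && d->d_parent != NULL);
    return is_root(d) ? CHAIN_ANON : CHAIN_HASHTABLE;
}
```

With this model, a dentry that has a real parent but still carries the
disconnected flag is reported as being on the main hash table, which is the
case the old flag-based check got wrong.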

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d246 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ocfs2: get rid of impossible checks

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

qnx4: i_sb is never NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).
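
The message does not spell out the mechanism. One plausible reading, sketched
here purely as an assumption, is that a 64-bit inode number held in a field
that is only 32 bits wide on a 32-bit server no longer compares equal to the
full number reported by the filesystem. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models keeping an inode number in a 32-bit-wide field, as an
 * unsigned long would be on a 32-bit machine (assumption). */
static uint32_t store_in_32bit_field(uint64_t ino)
{
    return (uint32_t)ino;   /* the high 32 bits are silently dropped */
}

/* Hypothetical lookup check: does the full 64-bit inode number of a
 * directory entry match the number we are searching for? */
static bool ino_matches(uint64_t dirent_ino, uint64_t wanted)
{
    return dirent_ino == wanted;
}
```

In this model, small inode numbers survive the round trip, while any number
above 2^32 - 1 fails to match once truncated, which would look like a missing
entry to the caller.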

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: split out vfs_getattr_nosec

The filehandle lookup code wants this version of getattr.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

iget/iget5: don't bother with ->i_lock until we find a match

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Put a small type field into struct dentry::d_flags

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

Miss (negative dentry)
Directory
"Automount" directory (defective - no i_op->lookup())
Symlink
Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

d_set_type(dentry, type)
d_is_directory(dentry)
d_is_autodir(dentry)
d_is_symlink(dentry)
d_is_file(dentry)
d_is_negative(dentry)
d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.
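
A rough sketch of how a small type field can share a flags word with other
bits. The bit position, the numeric type values and the *_model names are
illustrative assumptions, not the constants or accessors the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: three bits of d_flags hold the type. */
#define TYPE_SHIFT 20u
#define TYPE_MASK  (7u << TYPE_SHIFT)

enum { T_MISS = 0, T_DIRECTORY = 1, T_AUTODIR = 2, T_SYMLINK = 3, T_FILE = 4 };

struct dentry_model { unsigned int d_flags; };

/* Replace the type bits and leave every other flag untouched. */
static void set_type_model(struct dentry_model *d, unsigned int type)
{
    assert(type <= T_FILE);
    d->d_flags = (d->d_flags & ~TYPE_MASK) | (type << TYPE_SHIFT);
}

static unsigned int get_type_model(const struct dentry_model *d)
{
    return (d->d_flags & TYPE_MASK) >> TYPE_SHIFT;
}

/* A negative dentry is one whose type is "miss". */
static bool is_negative_model(const struct dentry_model *d)
{
    return get_type_model(d) == T_MISS;
}

static bool is_positive_model(const struct dentry_model *d)
{
    return !is_negative_model(d);
}

static bool is_symlink_model(const struct dentry_model *d)
{
    return get_type_model(d) == T_SYMLINK;
}
```

Clearing the field back to the "miss" value is what turning a dentry negative
looks like in this model; unrelated flag bits are preserved throughout.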

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo)

we can't get to do_coredump() if that condition isn't satisfied...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

constify do_coredump() argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

constify copy_siginfo_to_user{,32}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

... and kill anon_inode_getfile_private()

it's a seriously misguided API, now fortunately without users.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rework aio migrate pages to use aio fs

Don't abuse anon_inodes.c to host private files needed by aio;
we can bloody well declare a mini-fs of our own instead of
patching up what anon_inodes can create for us.

Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

take anon inode allocation to libfs.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: dump_align()

dump_skip to given alignment...
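
The arithmetic behind skipping to an alignment can be sketched as follows.
This is an illustration only: pad_to_align is a hypothetical name, and the
power-of-two requirement is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes to skip so that pos becomes a multiple of align.
 * align must be a non-zero power of two. Returns 0 if pos is already
 * aligned. */
static size_t pad_to_align(size_t pos, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t mod = pos & (align - 1);   /* pos modulo align */
    return mod ? align - mod : 0;
}
```

A caller would pass the number of bytes written so far as pos and skip the
returned amount.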

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

spufs: get rid of dump_emit() wrappers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dump_skip(): dump_seek() replacement taking coredump_params

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make dump_emit() use vfs_write() instead of banging at ->f_op->write directly

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.
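
A userspace sketch of the short-write handling described above, under stated
assumptions: the writer callback, the dump_state struct and the demo sink are
all hypothetical, and the size limit is modelled as a plain byte budget.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical writer: returns bytes accepted, or <= 0 on failure. */
typedef ssize_t (*write_fn)(void *ctx, const void *buf, size_t len);

struct dump_state {
    write_fn write;
    void *ctx;
    size_t written;   /* bytes emitted so far */
    size_t limit;     /* maximum total bytes allowed */
};

/* Emit the whole buffer, looping until every byte has been accepted.
 * Refuses up front if the buffer would exceed the limit. */
static bool emit_all(struct dump_state *st, const void *buf, size_t len)
{
    const char *p = buf;

    if (st->written > st->limit || len > st->limit - st->written)
        return false;
    while (len > 0) {
        ssize_t n = st->write(st->ctx, p, len);
        if (n <= 0)
            return false;
        p += n;
        len -= (size_t)n;
        st->written += (size_t)n;
    }
    return true;
}

/* Demo sink that accepts at most 3 bytes per call, like a nearly full pipe. */
struct sink { char data[64]; size_t used; };

static ssize_t short_writer(void *ctx, const void *buf, size_t len)
{
    struct sink *s = ctx;
    size_t n = len < 3 ? len : 3;

    if (s->used + n > sizeof(s->data))
        return -1;
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return (ssize_t)n;
}
```

The point of the loop is that a single successful call to the writer says
nothing about how much was written; only the returned count does.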

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

binfmt_elf: count notes towards coredump limit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aout: switch to dump_emit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch elf_coredump_extra_notes_write() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

convert the rest of binfmt_elf_fdpic to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

binfmt_elf: convert writing actual dump pages to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch elf_core_write_extra_data() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch elf_core_write_extra_phdrs() to dump_emit()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: dump_emit()

dump_write() analog, takes core_dump_params instead of file,
keeps track of the amount written in cprm->written and checks for
cprm->limit. Start using it in binfmt_elf.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

restore 32bit aout coredump

just getting rid of bitrot

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

no need to keep brlock macros anymore...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

coda_revalidate_inode(): switch to passing inode...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold __d_shrink() into its only remaining caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

get rid of s_files and files_lock

The only thing we need it for is alt-sysrq-r (emergency remount r/o)
and these days we can do just as well without going through the
list of files.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

get rid of {lock,unlock}_rcu_walk()

those have become aliases for rcu_read_{lock,unlock}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

RCU'd vfsmounts

* RCU-delayed freeing of vfsmounts
* vfsmount_lock replaced with a seqlock (mount_lock)
* sequence number from mount_lock is stored in nameidata->m_seq and
used when we exit RCU mode
* new vfsmount flag - MNT_SYNC_UMOUNT.  Set by umount_tree() when its
caller knows that vfsmount will have no surviving references.
* synchronize_rcu() done between unlocking namespace_sem in namespace_unlock()
and doing pending mntput().
* new helper: legitimize_mnt(mnt, seq).  Checks the mount_lock sequence
number against seq, then grabs reference to mnt.  Then it rechecks mount_lock
again to close the race and either returns success or drops the reference it
has acquired.  The subtle point is that in case of MNT_SYNC_UMOUNT we can
simply decrement the refcount and sod off - aforementioned synchronize_rcu()
makes sure that final mntput() won't come until we leave RCU mode.  We need
that, since we don't want to end up with some lazy pathwalk racing with
umount() and stealing the final mntput() from it - caller of umount() may
expect it to return only once the fs is shut down and we don't want to break
that.  In other cases (i.e. with MNT_SYNC_UMOUNT absent) we have to do
full-blown mntput() in case of mount_lock sequence number mismatch happening
just as we'd grabbed the reference, but in those cases we won't be stealing
the final mntput() from anything that would care.
* mntput_no_expire() doesn't lock anything on the fast path now.  Incidentally,
SMP and UP cases are handled the same way - no ifdefs there.
* normal pathname resolution does *not* do any writes to mount_lock.  It does,
of course, bump the refcounts of vfsmount and dentry in the very end, but that's
it.
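
The check, grab, recheck sequence of legitimize_mnt() can be modelled in
userspace roughly as below. This is a simplified sketch, not the kernel
implementation: a plain atomic counter stands in for the mount_lock sequence
number, and mount_model, full_put and legitimize are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the mount_lock sequence number; bumped on every change. */
static atomic_uint mount_seq;

struct mount_model {
    atomic_int refcount;
    bool sync_umount;   /* stand-in for MNT_SYNC_UMOUNT */
    int full_puts;      /* counts trips through the slow put path */
};

/* Slow path: in the kernel this would be a full mntput(). */
static void full_put(struct mount_model *m)
{
    atomic_fetch_sub(&m->refcount, 1);
    m->full_puts++;
}

/* Returns true if a reference was taken while the sequence number
 * stayed equal to seq; otherwise no reference is left held. */
static bool legitimize(struct mount_model *m, unsigned int seq)
{
    if (atomic_load(&mount_seq) != seq)
        return false;                       /* stale before we started */
    atomic_fetch_add(&m->refcount, 1);      /* grab the reference */
    if (atomic_load(&mount_seq) == seq)
        return true;                        /* nothing changed meanwhile */
    if (m->sync_umount)
        atomic_fetch_sub(&m->refcount, 1);  /* plain decrement is enough */
    else
        full_put(m);                        /* may be the final reference */
    return false;
}
```

The two failure branches correspond to the distinction drawn above: with the
sync-umount flag set, a bare decrement suffices; without it, the reference
just taken has to be dropped through the full put path.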

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch shrink_dcache_for_umount() to use of d_walk()

we have too many iterators in fs/dcache.c...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse: rcu-delay freeing fuse_conn

makes ->permission() and ->d_revalidate() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pid_namespace: make freeing struct pid_namespace rcu-delayed

makes procfs ->permission() instances safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ncpfs: rcu-delay unload_nls() and freeing ncp_server

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fat: rcu-delay unloading nls and freeing sbi

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cifs: rcu-delay unload_nls() and freeing sbi

makes ->d_hash(), ->d_compare() and ->permission() safety in RCU mode
independent from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

autofs4: make freeing sbi rcu-delayed

makes ->d_managed() safety in RCU mode independent from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

adfs: delayed freeing of sbi

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

hpfs: make freeing sbi and codetables rcu-delayed

makes ->d_hash() and ->d_compare() safety in RCU mode independent
from vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make freeing super_block rcu-delayed

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: introduce d_instantiate_no_diralias()

...which just returns -EBUSY if a directory alias would be created.

This is to be used by fuse mkdir to make sure that a buggy or malicious
userspace filesystem doesn't do anything nasty. Previously fuse used a
private mutex for this purpose, which can now go away.
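
A small model of the behaviour described, assuming a per-inode alias count;
the struct layouts and the function name are hypothetical and reference
counting of the inode is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct inode_model {
    bool is_dir;
    int nr_aliases;     /* dentries currently pointing at this inode */
};

struct dentry_model {
    struct inode_model *d_inode;
};

/* Attach inode to dentry unless that would give a directory a second
 * alias, in which case nothing is changed and -EBUSY is returned. */
static int instantiate_no_diralias(struct dentry_model *d,
                                   struct inode_model *inode)
{
    assert(d != NULL && inode != NULL);
    if (inode->is_dir && inode->nr_aliases > 0)
        return -EBUSY;
    d->d_inode = inode;
    inode->nr_aliases++;
    return 0;
}
```

Regular files may still gain several aliases in this model (hard links);
only directories are restricted to one.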

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>

move taking vfsmount_lock down into prepend_path()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

split __lookup_mnt() in two functions

Instead of passing the direction as argument (and checking it on every
step through the hash chain), just have separate __lookup_mnt() and
__lookup_mnt_last(). And use the standard iterators...
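
The shape of the split can be illustrated with an array standing in for the
hash chain. The struct, the integer keys and the function names are
assumptions made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain entry keyed by (parent, mountpoint). */
struct mount_model {
    int parent;
    int mountpoint;
    int id;
};

/* Forward scan: first entry on the chain with a matching key. */
static const struct mount_model *
lookup_first(const struct mount_model *chain, size_t n, int parent, int mp)
{
    for (size_t i = 0; i < n; i++)
        if (chain[i].parent == parent && chain[i].mountpoint == mp)
            return &chain[i];
    return NULL;
}

/* Backward scan: last entry on the chain with a matching key. */
static const struct mount_model *
lookup_last(const struct mount_model *chain, size_t n, int parent, int mp)
{
    for (size_t i = n; i-- > 0; )
        if (chain[i].parent == parent && chain[i].mountpoint == mp)
            return &chain[i];
    return NULL;
}
```

Each function has a fixed direction, so neither loop needs a per-step branch
on a direction argument.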

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

uninline destroy_super(), consolidate alloc_super()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

isofs: don't pass dentry to isofs_hash{i,}_common()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helpers: lock_mount_hash/unlock_mount_hash

aka br_write_{lock,unlock} of vfsmount_lock. Inlines in fs/mount.h,
vfsmount_lock extern moved over there as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

don't bother with vfsmount_lock in mounts_poll()

wake_up_interruptible/poll_wait provide sufficient barriers;
just use ACCESS_ONCE() to fetch ns->event and that's it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namespace.c: get rid of mnt_ghosts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold dup_mnt_ns() into its only surviving caller

should've been done 6 years ago...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

mnt_set_expiry() doesn't need vfsmount_lock

->mnt_expire is protected by namespace_sem

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

finish_automount() doesn't need vfsmount_lock for removal from expiry list

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs/namespace.c: bury long-dead define

MNT_WRITER_UNDERFLOW_LIMIT was missed 4 years ago when it became unused.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold mntfree() into mntput_no_expire()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

do_remount(): pull touch_mnt_namespace() up

... and don't bother with dropping and regaining vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dup_mnt_ns(): get rid of pointless grabbing of vfsmount_lock

mnt_list is protected by namespace_sem, not vfsmount_lock

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fs_is_visible only needs namespace_sem held shared

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

initialize namespace_sem statically

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

file->f_op is never NULL...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rtl8188eu: remove dead code

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dmxdev: get rid of pointless clearing ->f_op

nobody else will see that struct file after return from ->release()
anyway; just leave ->f_op as is and let __fput() do that fops_put().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

consolidate the reassignments of ->f_op in ->open() instances

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

put_mnt_ns(): use drop_collected_mounts()

... rather than open-coding it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ncpfs: switch to %p[dD]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ubifs: switch to %pd

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

sunrpc: switch to %pd

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfsd: switch to %p[dD]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

nfs: use %p[dD] instead of open-coded (and often racy) equivalents

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

befs: split symlink iops in two - for short and long symlinks resp.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: kfree_put_link()

duplicated to hell and back...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

libfs: get exports to definitions of objects being exported...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: ->lower_path.dentry is never NULL

... on anything found via ->d_fsdata

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: get rid of ecryptfs_set_dentry_lower{,_mnt}

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: don't leave RCU pathwalk immediately

If the underlying dentry doesn't have ->d_revalidate(), there's no need to
force dropping out of RCU mode. All we need for that is to make freeing
ecryptfs_dentry_info RCU-delayed.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ecryptfs: check DCACHE_OP_REVALIDATE instead of ->d_op

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

9p: make v9fs_cache_inode_{get,put,set}_cookie empty inlines for !9P_CACHEFS

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Linux 3.12-rc4

net: Update the sysctl permissions handler to test effective uid/gid

Modify the code to use current_euid() and in_egroup_p(), as is done
in fs/proc/proc_sysctl.c:test_perm()
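
A userspace model of that style of check, for illustration. The two boolean
parameters are hypothetical stand-ins for "current_euid() is root" and
"in_egroup_p() of the root group"; the function name is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAY_EXEC  1
#define MAY_WRITE 2
#define MAY_READ  4

/* Select the owner, group or other permission triplet of mode based on
 * the caller's effective identity, then check the requested operation
 * against it. Returns 0 if allowed, -EACCES otherwise. */
static int test_perm_model(int mode, int op,
                           bool euid_is_root, bool in_root_egroup)
{
    if (euid_is_root)
        mode >>= 6;     /* owner bits */
    else if (in_root_egroup)
        mode >>= 3;     /* group bits */
    /* otherwise the low three "other" bits apply */
    if ((op & ~mode & (MAY_READ | MAY_WRITE | MAY_EXEC)) == 0)
        return 0;
    return -EACCES;
}
```

For a mode of 0644, only a caller whose effective uid is root may write in
this model; everyone else is limited to reading.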

Cc: stable@vger.kernel.org
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reported-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"Here are the outstanding target fixes queued up for v3.12-rc4 code.

  The highlights include:

   - Make vhost/scsi tag percpu_ida_alloc() use GFP_ATOMIC
   - Allow sess_cmd_map allocation failure fallback to use vzalloc
   - Fix COMPARE_AND_WRITE se_cmd->data_length bug with FILEIO backends
   - Fixes for COMPARE_AND_WRITE callback recursive failure OOPs + non
     zero scsi_status bug
   - Make iscsi-target do acknowledgement tag release from RX context
   - Setup iscsi-target with extra (cmdsn_depth / 2) percpu_ida tags

  Also included is a iscsi-target patch CC'ed for v3.10+ that avoids
  legacy wait_for_task=true release during fast-path StatSN
  acknowledgement, and two other SRP target related patches that address
  long-standing issues that are CC'ed for v3.3+.

  Extra thanks to Thomas Glanzmann for his testing feedback with
  COMPARE_AND_WRITE + EXTENDED_COPY VAAI logic"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target; Allow an extra tag_num / 2 number of percpu_ida tags
  iscsi-target: Perform release of acknowledged tags from RX context
  iscsi-target: Only perform wait_for_tasks when performing shutdown
  target: Fail on non zero scsi_status in compare_and_write_callback
  target: Fix recursive COMPARE_AND_WRITE callback failure
  target: Reset data_length for COMPARE_AND_WRITE to NoLB * block_size
  ib_srpt: always set response for task management
  target: Fall back to vzalloc upon ->sess_cmd_map kzalloc failure
  vhost/scsi: Use GFP_ATOMIC with percpu_ida_alloc for obtaining tag
  ib_srpt: Destroy cm_id before destroying QP.
  target: Fix xop->dbl assignment in target_xcopy_parse_segdesc_02

Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dmaengine fixes from Vinod Koul:
"Here are the slave dmaengine fixes.  We have the fix for deadlock issue
  on imx-dma by Michael and Josh's edma config fix along with author
  change"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: imx-dma: fix callback path in tasklet
  dmaengine: imx-dma: fix lockdep issue between irqhandler and tasklet
  dmaengine: imx-dma: fix slow path issue in prep_dma_cyclic
  dma/Kconfig: Make TI_EDMA select TI_PRIV_EDMA
  edma: Update author email address

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

Pull btrfs fixes from Chris Mason:
"This is a small collection of fixes, including a regression fix from
  Liu Bo that solves rare crashes with compression on.

  I've merged my for-linus up to 3.12-rc3 because the top commit is only
  meant for 3.12.  The rest of the fixes are also available in my master
  branch on top of my last 3.11 based pull"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: Fix crash due to not allocating integrity data for a bioset
  Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing
  Btrfs: eliminate races in worker stopping code
  Btrfs: fix crash of compressed writes
  Btrfs: fix transid verify errors when recovering log tree

Merge tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
"Two patches for the OMAP driver, dealing with setting up IRQs properly
  on the device tree boot path"

* tag 'gpio-v3.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio/omap: auto-setup a GPIO when used as an IRQ
  gpio/omap: maintain GPIO and IRQ usage separately

Merge tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Here are nine fixes for various USB driver problems.  The majority are
  gadget/musb fixes, but there are some new device ids in here as well"

* tag 'usb-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: chipidea: add Intel Clovertrail pci id
  usb: gadget: s3c-hsotg: fix can_write limit for non-periodic endpoints
  usb: gadget: f_fs: fix error handling
  usb: musb: dsps: do not bind to "musb-hdrc"
  USB: serial: option: Ignore card reader interface on Huawei E1750
  usb: musb: gadget: fix otg active status flag
  usb: phy: gpio-vbus: fix deferred probe from __init
  usb: gadget: pxa25x_udc: fix deferred probe from __init
  usb: musb: fix otg default state

Merge tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty fixes from Greg KH:
"Here are two tty driver fixes for 3.12-rc4.

  One fixes the reported regression in the n_tty code that a number of
  people found recently, and the other one fixes an issue with xen
  consoles that broke in 3.10"

* tag 'tty-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  xen/hvc: allow xenboot console to be used again
  tty: Fix pty master read() after slave closes

Merge tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging fixes from Greg KH:
"Here are 4 tiny staging and iio driver fixes for 3.12-rc4.  Nothing
  major, just some small fixes for reported issues"

* tag 'staging-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: comedi: ni_65xx: (bug fix) confine insn_bits to one subdevice
  iio:magnetometer: Bugfix magnetometer default output registers
  iio: Remove debugfs entries in iio_device_unregister()
  iio: amplifiers: ad8366: Remove regulator_put

btrfs: Fix crash due to not allocating integrity data for a bioset

When btrfs creates a bioset, we must also allocate the integrity data pool.
Otherwise btrfs will crash when it tries to submit a bio to a checksumming
disk:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
PGD 2305e4067 PUD 23063d067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
RSP: 0018:ffff880230699688  EFLAGS: 00010286
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
Stack:
  ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
  ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
  0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
Call Trace:
  [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
  [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
  [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
  [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
  [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
  [<ffffffff8123e58a>] generic_make_request+0xca/0x100
  [<ffffffff8123e639>] submit_bio+0x79/0x160
  [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
  [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
  [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
  [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
  [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
  [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
  [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs]
  [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
  [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
  [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
  [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
  [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
  [<ffffffff81176ba0>] mount_fs+0x20/0xd0
  [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
  [<ffffffff81193320>] do_mount+0x200/0xa40
  [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
  [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
  [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
  RSP <ffff880230699688>
CR2: 0000000000000018
---[ end trace 7a96042017ed21e2 ]---

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Merge branch 'for-linus' into for-linus-3.12

Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
"Small set of cifs fixes.  Most important is Jeff's fix that works
  around disconnection problems which can be caused by simultaneous use
  of user space tools (starting a long running smbclient backup then
  doing a cifs kernel mount) or multiple cifs mounts through a NAT, and
  Jim's fix to deal with reexport of cifs share.

  I expect to send two more cifs fixes next week (being tested now) -
  fixes to address an SMB2 unmount hang when server dies and a fix for
  cifs symlink handling of Windows "NFS" symlinks"

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] update cifs.ko version
  [CIFS] Remove ext2 flags that have been moved to fs.h
  [CIFS] Provide sane values for nlink
  cifs: stop trying to use virtual circuits
  CIFS: FS-Cache: Uncache unread pages in cifs_readpages() before freeing them

Merge tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fix from Bjorn Helgaas:
"We merged what was intended to be an MMCONFIG cleanup, but in fact,
  for systems without _CBA (which is almost everything), it broke
  extended config space for domain 0 and it broke all config space for
  other domains.

  This reverts the change"

* tag 'pci-v3.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero"

Revert "x86/PCI: MMCONFIG: Check earlier for MMCONFIG region at address zero"

This reverts commit 07f9b61c3915e8eb156cb4461b3946736356ad02.

07f9b61c was intended to be a cleanup that didn't change anything, but in
fact, for systems without _CBA (which is almost everything), it broke
extended config space for domain 0 and all config space for other domains.

Reference: http://lkml.kernel.org/r/20131004011806.GE20450@dangermouse.emea.sgi.com
Reported-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Merge tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:

- The resume part of user space driven hibernation (s2disk) is now
   broken after the change that moved the creation of memory bitmaps to
   after the freezing of tasks, because I forgot that the resume utility
   loaded the image before freezing tasks and needed the bitmaps for
   that.  The fix adds special handling for that case.

- One of recent commits changed the export of acpi_bus_get_device() to
   EXPORT_SYMBOL_GPL(), which was technically correct but broke existing
   binary modules using that function including one in particularly
   widespread use.  Change it back to EXPORT_SYMBOL().

- The intel_pstate driver sometimes fails to disable turbo if its
   no_turbo sysfs attribute is set.  Fix from Srinivas Pandruvada.

- One of recent cpufreq fixes forgot to update a check in cpufreq-cpu0
   which still (incorrectly) treats non-NULL as non-error.  Fix from
   Philipp Zabel.

- The SPEAr cpufreq driver uses a wrong variable type in one place
   preventing it from catching errors returned by one of the functions
   called by it.  Fix from Sachin Kamat.
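
The cpufreq-cpu0 item above concerns error pointers being mistaken for valid
ones by a non-NULL test. A userspace model of that pitfall, with hypothetical
lower-case helper names patterned on the kernel's ERR_PTR()/IS_ERR() idea:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer in the top 4095 addresses. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True only for pointers that encode an error; NULL is not an error. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* The flawed test: any non-NULL pointer is assumed to be usable. */
static bool non_null_check(const void *p)
{
    return p != NULL;
}
```

An encoded error passes the non-NULL test even though it must not be
dereferenced, while NULL, which may be a legitimate "nothing here" value,
is rejected by it. Checking with is_err() gets both cases right.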

* tag 'pm+acpi-3.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: Use EXPORT_SYMBOL() for acpi_bus_get_device()
  intel_pstate: fix no_turbo
  cpufreq: cpufreq-cpu0: NULL is a valid regulator, part 2
  cpufreq: SPEAr: Fix incorrect variable type
  PM / hibernate: Fix user space driven resume regression

Merge tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:
"There are lockdep annotations for project quotas, a fix for dirent
  dtype support on v4 filesystems, a fix for a memory leak in recovery,
  and a fix for the build error that resulted from it.  D'oh"

* tag 'xfs-for-linus-v3.12-rc4' of git://oss.sgi.com/xfs/xfs:
  xfs: Use kmem_free() instead of free()
  xfs: fix memory leak in xlog_recover_add_to_trans
  xfs: dirent dtype presence is dependent on directory magic numbers
  xfs: lockdep needs to know about 3 dquot-deep nesting

selinux: remove 'flags' parameter from avc_audit()

Now avc_audit() has no more users with that parameter. Remove it.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

selinux: avc_has_perm_flags has no more users

.. so get rid of it. The only indirect users were all the
avc_has_perm() callers which just expanded to have a zero flags
argument.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing

free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
can be processed before btrfs_scratch_superblock is called, which would
result in a use-after-free on btrfs_device contents. Fix this by
zeroing the superblock before the rcu callback is registered.

Cc: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: eliminate races in worker stopping code

The current implementation of worker threads in Btrfs has races in
worker stopping code, which cause all kinds of panics and lockups when
running btrfs/011 xfstest in a loop.  The problem is that
btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
check_busy_worker and __btrfs_start_workers.

E.g., check_idle_worker race flow:

       btrfs_stop_workers():            check_idle_worker(aworker):
- grabs the lock
- splices the idle list into the
  working list
- removes the first worker from the
  working list
- releases the lock to wait for
  its kthread's completion
                                  - grabs the lock
                                  - if aworker is on the working list,
                                    moves aworker from the working list
                                    to the idle list
                                  - releases the lock
- grabs the lock
- puts the worker
- removes the second worker from the
  working list
                              ......
        btrfs_stop_workers returns, aworker is on the idle list
                 FS is umounted, memory is freed
                              ......
              aworker is woken up, fireworks ensue

With this applied, I wasn't able to trigger the problem in 48 hours,
whereas previously I could reliably reproduce at least one of these
races within an hour.
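
The check_idle_worker flow above can be replayed deterministically in a
single-threaded model. This is a sketch under assumptions: the "stopping"
flag is one plausible way to synchronize the two paths and is not
described in this message, all names are hypothetical, and the lock is
elided because the steps run in a fixed order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum list_id { LIST_NONE, LIST_IDLE, LIST_WORKING };

struct pool_model { bool stopping; };       /* hypothetical pool state */
struct worker_model { enum list_id on; };   /* which list the worker is on */

/* Stop path, first step: splice the idle list into the working list. */
static void stop_begin(struct pool_model *p, struct worker_model *w,
		       size_t n, bool use_flag)
{
	if (use_flag)
		p->stopping = true;
	for (size_t i = 0; i < n; i++)
		if (w[i].on == LIST_IDLE)
			w[i].on = LIST_WORKING;
}

/* Stop path: remove one worker from the working list, if any is left. */
static bool stop_take_one(struct worker_model *w, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (w[i].on == LIST_WORKING) {
			w[i].on = LIST_NONE;
			return true;
		}
	}
	return false;
}

/* Racing path: move a worker from the working list to the idle list. */
static void check_idle(struct pool_model *p, struct worker_model *w)
{
	if (p->stopping)
		return;		/* assumed guard: pool is being torn down */
	if (w->on == LIST_WORKING)
		w->on = LIST_IDLE;
}

/* Replays the interleaving; returns how many workers are left idle. */
static size_t replay_race(bool use_flag)
{
	struct pool_model pool = { false };
	struct worker_model w[2] = { { LIST_IDLE }, { LIST_WORKING } };
	size_t left = 0;

	stop_begin(&pool, w, 2, use_flag);
	stop_take_one(w, 2);		/* lock dropped to wait for kthread */
	check_idle(&pool, &w[1]);	/* racing thread runs in the window */
	while (stop_take_one(w, 2))
		;
	for (size_t i = 0; i < 2; i++)
		if (w[i].on == LIST_IDLE)
			left++;
	return left;
}
```

Without the guard, replay_race() leaves one worker on the idle list after
the stop path has finished, which is the state the diagram ends in. With
the guard, no worker is left behind.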

Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix crash of compressed writes

The crash[1] was found by xfstests/generic/208 with "-o compress".
It is not reproduced every time, but it does panic.

The bug is quite interesting: it was actually introduced by a recent commit
(573aecafca1cf7a974231b759197a1aebcf39c2a,
Btrfs: actually limit the size of delalloc range).

Btrfs implements delayed allocation, so during writeback we
(1) get a page A and lock it,
(2) search the state tree for delalloc bytes and lock all pages within the range,
(3) process the delalloc range, including finding disk space, creating an
    ordered extent and so on,
(4) submit the page A.

This runs well in normal cases, but in a racy case, e.g.
buffered compressed writes and aio-dio writes,
we may sometimes fail to lock all pages in the 'delalloc' range,
in which case we need to fall back to searching the state tree again with
a smaller range limit (max_bytes = PAGE_CACHE_SIZE - offset).

The mentioned commit has a side effect: in the fallback case,
we can find delalloc bytes before the index of the page we already have locked,
so we hit the case (delalloc_end <= *start) and return with (found > 0).

As a result, the delalloc pages are not locked but ->writepage still
processes them, and the crash happens.

Fix this by treating that case as if nothing was found and returning to the
caller, which knows how to deal with it properly.
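
The check can be sketched as a small standalone function. This is a
simplified model, not the btrfs code: struct delalloc_search and
usable_delalloc() are hypothetical names, and locked_page_start plays the
role of *start in the condition quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of searching the state tree for delalloc bytes. */
struct delalloc_search {
	uint64_t found;			/* number of delalloc bytes found */
	uint64_t delalloc_start;	/* first byte of the range found */
	uint64_t delalloc_end;		/* last byte of the range found */
};

/*
 * Returns the number of delalloc bytes the caller may process.
 * A range that ends at or before the page we already hold locked is
 * reported as "nothing found", so the caller never processes pages
 * that were not locked.
 */
static uint64_t usable_delalloc(const struct delalloc_search *s,
				uint64_t locked_page_start)
{
	if (s->found == 0 || s->delalloc_end <= locked_page_start)
		return 0;
	return s->found;
}
```

For example, a range covering bytes 0..4095 with the locked page starting
at 8192 yields 0, while a range covering 4096..12287 is passed through
unchanged.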

[1]:
------------[ cut here ]------------
kernel BUG at mm/page-writeback.c:2170!
[...]
CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
[...]
RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[...]
[ 4934.248731] Stack:
[ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
[ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
[ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
[ 4934.248731] Call Trace:
[ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
[ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
[ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
[ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
[ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
[ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
[ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
[ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
[ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
[ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
[ 4934.248731]  RSP <ffff8801869f9c48>
[ 4934.280307] ---[ end trace 36f06d3f8750236a ]---

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix transid verify errors when recovering log tree

If we crash with a log, remount and recover that log, and then crash before we
can commit another transaction, we will get transid verify errors on the next
mount.  This is because we were not zeroing out the log when we committed the
transaction after recovery.  This is ok as long as we commit another transaction
at some point in the future, but if you abort or something else goes wrong you
can end up in this weird state, because the recovery code says that the tree log
should have a generation+1 of the super generation, which won't be the case for
the transaction that was started for recovery.  Fix this by removing the check
and _always_ zeroing out the log portion of the super when we commit a transaction.
This fixes the transid verify issues I was seeing with my force errors tests.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>