git.karo-electronics.de Git - linux-beck.git/log

vfs: fix typo in s_op->alloc_inode() documentation

The function which calls s_op->alloc_inode() is not inode_alloc(), but
instead alloc_inode(), which lives in fs/inode.c.

The typo has been there since 5ea626aa (VFS: update documentation,
2005); there has never been a standalone inode_alloc() in the whole
kernel history.

Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
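
As an illustration of the call path named in the commit above, here is a
minimal userspace sketch of alloc_inode() dispatching to an optional
s_op->alloc_inode() hook. It is a simplified model, not kernel code: the
struct layouts, the calloc() fallback, and the myfs_* names are
assumptions made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified userspace model; these are not the kernel's definitions. */
struct super_block;

struct inode {
	struct super_block *i_sb;
};

struct super_operations {
	/* Optional hook: a filesystem may allocate its own inode container. */
	struct inode *(*alloc_inode)(struct super_block *sb);
};

struct super_block {
	const struct super_operations *s_op;
};

/* Hypothetical filesystem that embeds the generic inode in its own struct. */
struct myfs_inode {
	long private_data;
	struct inode vfs_inode;
};

int myfs_alloc_calls;	/* counts how often the hook ran (for the demo only) */

struct inode *myfs_alloc_inode(struct super_block *sb)
{
	struct myfs_inode *mi = calloc(1, sizeof(*mi));

	(void)sb;
	myfs_alloc_calls++;
	return mi ? &mi->vfs_inode : NULL;
}

/* Plays the role of alloc_inode(): prefer the hook, else fall back. */
struct inode *alloc_inode(struct super_block *sb)
{
	struct inode *inode;

	if (sb->s_op->alloc_inode)
		inode = sb->s_op->alloc_inode(sb);
	else
		inode = calloc(1, sizeof(*inode));	/* stand-in for the generic allocation */

	if (inode)
		inode->i_sb = sb;
	return inode;
}
```

A superblock whose s_op leaves alloc_inode NULL gets the generic
allocation; one that sets it gets the filesystem's own container.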

constify file_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

handle suicide on late failure exits in execve() in search_binary_handler()

... rather than doing that in the guts of ->load_binary().
[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache.c: call ->d_prune() regardless of d_unhashed()

the only in-tree instance checks d_unhashed() anyway,
out-of-tree code can preserve the current behaviour by
adding such check if they want it and we get an ability
to use it in cases where we *want* to be notified of
killing being inevitable before ->d_lock is dropped,
whether it's unhashed or not. In particular, autofs
would benefit from that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d_prune_alias(): just lock the parent and call __dentry_kill()

The only reason for games with ->d_prune() was __d_drop(), which
was needed only to force dput() into killing the sucker off.

Note that lock_parent() can be called under ->i_lock and won't
drop it, so dentry is safe from somebody managing to kill it
under us - it won't happen while we are holding ->i_lock.

__dentry_kill() is called only with ->d_lockref.count being 0
(here and when picked from shrink list) or 1 (dput() and dropping
the ancestors in shrink_dentry_list()), so it will never be called
twice - the first thing it's doing is making ->d_lockref.count
negative and once that happens, nothing will increment it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
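
The "never called twice" argument in the commit above can be sketched
with a tiny userspace model of the reference count. This is only an
illustration: the struct and function names are hypothetical, the value
-128 is used as an arbitrary negative marker, and the real code performs
these steps under ->d_lock.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a dentry; only the count matters here. */
struct dentry_model {
	int count;	/* models ->d_lockref.count */
};

/* Taking a reference succeeds only while the count is not negative. */
bool get_not_dead(struct dentry_model *d)
{
	if (d->count < 0)
		return false;
	d->count++;
	return true;
}

/*
 * Models the first step of the kill path: make the count negative.
 * Returns false if the object was already marked dead, i.e. a second
 * kill attempt is detected instead of proceeding.
 */
bool mark_dead(struct dentry_model *d)
{
	if (d->count < 0)
		return false;
	d->count = -128;
	return true;
}
```

Once mark_dead() has run, get_not_dead() can no longer raise the count,
so no later path can bring the object back to a state in which it would
be killed again.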

proc: Update proc_flush_task_mnt to use d_invalidate

Now that d_invalidate always succeeds and flushes mount points, use
it instead of a combination of shrink_dcache_parent and d_drop
in proc_flush_task_mnt. This removes the danger of a mount point
under /proc/<pid>/... becoming unreachable after the d_drop.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Remove d_drop calls from d_revalidate implementations

Now that d_invalidate always succeeds it is no longer necessary or
desirable to hard-code d_drop calls into filesystem-specific
d_revalidate implementations.

Remove the unnecessary d_drop calls and rely on d_invalidate
to drop the dentries. Using d_invalidate ensures that paths
to mount points will not be dropped.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Make d_invalidate return void

Now that d_invalidate can no longer fail, stop returning a useless
return code. For the few callers that checked the return code,
remove the handling of d_invalidate failure.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Merge check_submounts_and_drop and d_invalidate

Now that d_invalidate is the only caller of check_submounts_and_drop,
expand check_submounts_and_drop inline in d_invalidate.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Remove unnecessary calls of check_submounts_and_drop

Now that check_submounts_and_drop cannot fail and is called from
d_invalidate, there is no longer a need to call check_submounts_and_drop
from filesystem d_revalidate methods, so remove it.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Lazily remove mounts on unlinked files and directories.

With the introduction of mount namespaces and bind mounts it became
possible to access files and directories that on some paths are mount
points but are not mount points on other paths.  It is very confusing
when rm -rf somedir returns -EBUSY simply because somedir is mounted
somewhere else.  With the addition of user namespaces allowing
unprivileged mounts, this condition has gone from annoying to allowing
a DoS attack on other users in the system.

The possibility for mischief is removed by updating the vfs to support
rename, unlink and rmdir on a dentry that is a mountpoint and by
lazily unmounting mountpoints on deleted dentries.

In particular this change allows rename, unlink and rmdir system calls
on a dentry without a mountpoint in the current mount namespace to
succeed, and it allows rename, unlink, and rmdir performed on a
distributed filesystem to update the vfs cache even when there is a
mount in some namespace on the original dentry.

There are two common patterns of maintaining mounts.  The first is
mounts on trusted paths, with the parent directory of the mount point
and all ancestor directories up to / owned by root and modifiable only
by root (i.e. /media/xxx, /dev, /dev/pts, /proc, /sys,
/sys/fs/cgroup/{cpu, cpuacct, ...}, /usr, /usr/local).  The second is
mounts on unprivileged directories maintained by fusermount.

In the case of mounts in trusted directories owned by root and
modifiable only by root, the current parent directory permissions are
sufficient to ensure a mount point on a trusted path is not removed
or renamed by anyone other than root, even if there is a context
where there are no mount points to prevent this.

In the case of mounts in directories owned by less privileged users,
races with users modifying the path of a mount point are already a
danger.  fusermount already uses a combination of chdir,
/proc/<pid>/fd/NNN, and UMOUNT_NOFOLLOW to prevent these races.  The
removal of global rename, unlink, and rmdir protection really adds
nothing new to consider, only a widening of the attack window, and
fusermount is already safe against unprivileged users modifying the
directory simultaneously.

In principle, for perfect userspace programs, returning -EBUSY for
unlink, rmdir, and rename of dentries that have mounts in the local
namespace is actually unnecessary.  Unfortunately not all userspace
programs are perfect, so retaining -EBUSY for unlink, rmdir and rename
of dentries that have mounts in the current mount namespace plays an
important role in maintaining consistency with historical behavior and
making imperfect userspace applications hard to exploit.

v2: Remove spurious old_dentry.
v3: Optimized shrink_submounts_and_drop
    Removed unused afs label
v4: Simplified the changes to check_submounts_and_drop
    Do not rename check_submounts_and_drop to shrink_submounts_and_drop
    Document why we need atomicity in check_submounts_and_drop
    Rely on the parent inode mutex to make d_revalidate and d_invalidate
    an atomic unit.
v5: Refcount the mountpoint to detach in case of simultaneous
    renames.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Add a function to lazily unmount all mounts from any dentry.

The new function detach_mounts comes in two pieces.  The first piece
is a static inline test of d_mountpoint that returns immediately
without taking any locks if d_mountpoint is not set.  In the common
case when mountpoints are absent this allows the vfs to continue
running with its same cacheline footprint.

The second piece of detach_mounts, __detach_mounts, actually does the
work.  It assumes that a mountpoint is present, so it is slow: it
takes namespace_sem for write, and then locks the mount hash (aka
mount_lock) after a struct mountpoint has been found.

With those two locks held, each entry on the list of mounts on a
mountpoint is selected and lazily unmounted until all of the mounts
have been lazily unmounted.

v7: Wrote a proper change description and removed the changelog
    documenting deleted wrong turns.

Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
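
The two-piece structure described in the commit above can be sketched in
userspace as follows. This is a model under stated assumptions, not the
kernel implementation: the struct, the slow_path_calls counter and the
name detach_mounts_slow() (standing in for __detach_mounts) are
hypothetical, and the locking is only indicated in comments.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a dentry that may carry mounts. */
struct dentry_model {
	bool mounted;	/* models the d_mountpoint() flag */
	int nr_mounts;	/* models the list of mounts on the mountpoint */
};

int slow_path_calls;	/* demo counter: how often the slow path ran */

static inline bool d_mountpoint(const struct dentry_model *d)
{
	return d->mounted;
}

/* Slow piece: only reached when a mountpoint is known to be present. */
void detach_mounts_slow(struct dentry_model *d)
{
	slow_path_calls++;
	/* The kernel takes namespace_sem for write and then mount_lock here. */
	while (d->nr_mounts > 0)
		d->nr_mounts--;	/* each iteration stands for one lazy unmount */
	d->mounted = false;
}

/* Fast piece: a cheap inline test that takes no locks in the common case. */
static inline void detach_mounts(struct dentry_model *d)
{
	if (!d_mountpoint(d))
		return;
	detach_mounts_slow(d);
}
```

Calling detach_mounts() on a dentry without mounts costs one flag test;
only a dentry that really is a mountpoint pays for the slow path.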

vfs: factor out lookup_mountpoint from new_mountpoint

I am shortly going to add a new user of struct mountpoint that
needs to look up existing entries but does not want to create
a struct mountpoint if one does not exist. Therefore to keep
the code simple and easy to read split out lookup_mountpoint
from new_mountpoint.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Keep a list of mounts on a mount point

To spot any possible problems, call BUG if a mountpoint
is put when its list of mounts is not empty.

AV: use hlist instead of list_head

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Eric W. Biederman <ebiederman@twitter.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
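
A rough userspace sketch of the invariant from the commit above: the
final put of a mountpoint must find its list of mounts empty. All names
here are hypothetical, a plain singly linked list replaces the hlist,
and assert() stands in for BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct mount_model {
	struct mount_model *next;	/* link in the mountpoint's list */
};

struct mountpoint_model {
	int count;			/* references held on this mountpoint */
	struct mount_model *mounts;	/* head of the list of mounts */
};

struct mountpoint_model *new_mountpoint_model(void)
{
	struct mountpoint_model *mp = calloc(1, sizeof(*mp));

	if (mp)
		mp->count = 1;
	return mp;
}

/* Attach a mount: link it in and take a reference on the mountpoint. */
void add_mount(struct mountpoint_model *mp, struct mount_model *m)
{
	m->next = mp->mounts;
	mp->mounts = m;
	mp->count++;
}

/* Unlink a mount; the caller drops its reference with put_mountpoint_model(). */
void del_mount(struct mountpoint_model *mp, struct mount_model *m)
{
	struct mount_model **pp;

	for (pp = &mp->mounts; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			m->next = NULL;
			return;
		}
	}
}

/* Drop one reference; returns true when the mountpoint was freed. */
bool put_mountpoint_model(struct mountpoint_model *mp)
{
	if (--mp->count > 0)
		return false;
	assert(mp->mounts == NULL);	/* stands in for the BUG() check */
	free(mp);
	return true;
}
```

The check costs nothing in correct code and turns a leaked list entry
into an immediate, visible failure.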

vfs: Don't allow overwriting mounts in the current mount namespace

In preparation for allowing mountpoints to be renamed and unlinked
in remote filesystems and in other mount namespaces test if on a dentry
there is a mount in the local mount namespace before allowing it to
be renamed or unlinked.

The primary motivation here is old versions of fusermount unmount,
which are not safe if a path can be renamed or unlinked while it is
verifying the mount is safe to unmount. More recent versions are simpler
and safer, simply using UMOUNT_NOFOLLOW when unmounting a mount
in a directory owned by an arbitrary user.

Miklos Szeredi <miklos@szeredi.hu> reports this approach is good
enough to remove concerns about new kernels mixed with old versions
of fusermount.

A secondary motivation for restrictions here is that removing empty
directories that have non-empty mount points on them appears to
violate the rule that rmdir cannot remove non-empty directories. As
Linus Torvalds pointed out, this is useful for programs (like git) that
test if a directory is empty with rmdir.

Therefore this patch arranges to enforce the existing mount point
semantics for local mount namespace.

v2: Rewrote the test to be a drop in replacement for d_mountpoint
v3: Use bool instead of int as the return type of is_local_mountpoint

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: More precise tests in d_invalidate

The current comments in d_invalidate about what it is doing and why
are wildly off-base. This is not surprising, as the comments date
back to a last-minute bug fix of the 2.2 kernel.

The big fat lie of a comment said: If it's a directory, we can't drop
it for fear of somebody re-populating it with children (even though
dropping it would make it unreachable from that root, we still might
repopulate it if it was a working directory or similar).

[AV] What we really need to avoid is multiple dentry aliases of the
same directory inode; on all filesystems that have ->d_revalidate()
we either declare all positive dentries always valid (and thus never
fed to d_invalidate()) or use d_materialise_unique() and/or d_splice_alias(),
which take care of alias prevention.

The current rules are:
- To prevent mount point leaks dentries that are mount points or that
have children that are mount points may not be unhashed.
- All dentries may be unhashed.
- Directories may be rehashed with d_materialise_unique

check_submounts_and_drop implements this already for well maintained
remote filesystems so implement the current rules in d_invalidate
by just calling check_submounts_and_drop.

The one difference between d_invalidate and check_submounts_and_drop
is that d_invalidate must respect it when a d_revalidate method has
earlier called d_drop so preserve the d_unhashed check in
d_invalidate.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: Document the effect of d_revalidate on d_find_alias

d_drop or check_submounts_and_drop called from d_revalidate can result
in renamed directories with child dentries being unhashed. These
renamed and dropped directory dentries can be rehashed after
d_materialise_unique uses d_find_alias to find them.

Reviewed-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

delayed mntput

On final mntput() we want fs shutdown to happen before return to
userland; however, the only case where we want it to happen right
there (i.e. where task_work_add won't do) is MNT_INTERNAL victim.
Those have to be fully synchronous - failure halfway through module
init might count on having vfsmount killed right there. Fortunately,
final mntput on MNT_INTERNAL vfsmounts happens on shallow stack.
So we handle those synchronously and do an analog of delayed fput
logic for everything else.

As a result, we are guaranteed that fs shutdown will always happen
on shallow stack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

autofs - remove obsolete d_invalidate() from expire

Biederman's umount-on-rmdir series changes d_invalidate() to summarily
remove mounts under the passed-in dentry regardless of whether they are
busy or not. So calling this in fs/autofs4/expire.c:autofs4_tree_busy()
is definitely the wrong thing to do because it will silently umount
entries instead of just cleaning stale dentries.

But this call shouldn't be needed and testing shows that automounting
continues to function without it.

As Al Viro correctly surmises the original intent of the call was to
perform what shrink_dcache_parent() does.

If at some time in the future I see stale dentries accumulating
following failed mounts I'll revisit the issue and possibly add a
shrink_dcache_parent() call if needed.

Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Allow sharing external names after __d_move()

* external dentry names get a small structure prepended to them
(struct external_name).
* it contains an atomic refcount, matching the number of struct dentry
instances that have ->d_name.name pointing to that external name.  The
first thing free_dentry() does is decrementing refcount of external name,
so the instances that are between the call of free_dentry() and
RCU-delayed actual freeing do not contribute.
* __d_move(x, y, false) makes the name of x equal to the name of y,
external or not.  If y has an external name, extra reference is grabbed
and put into x->d_name.name.  If x used to have an external name, the
reference to the old name is dropped and, should it reach zero, freeing
is scheduled via kfree_rcu().
* free_dentry() in dentry with external name decrements the refcount of
that name and, should it reach zero, does RCU-delayed call that will
free both the dentry and external name.  Otherwise it does what it
used to do, except that __d_free() doesn't even look at ->d_name.name;
it simply frees the dentry.

All non-RCU accesses to dentry external name are safe wrt freeing since they
all should happen before free_dentry() is called.  RCU accesses might run
into a dentry seen by free_dentry() or into an old name that got already
dropped by __d_move(); however, in both cases dentry must have been
alive and refer to that name at some point after we'd done rcu_read_lock(),
which means that any freeing must be still pending.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
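
The refcounting scheme described in the commit above can be sketched in
userspace with C11 atomics. This is a simplified model, not the kernel's
struct external_name: the *_model and ext_name_* names are hypothetical,
and the name is freed immediately on the last put, where the kernel
defers the free via RCU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Small header prepended to an externally allocated name. */
struct external_name_model {
	atomic_int count;	/* number of owners pointing at name[] */
	char name[];		/* the NUL-terminated name itself */
};

/* Allocate a name with one reference held by the caller. */
struct external_name_model *ext_name_alloc(const char *s)
{
	size_t len = strlen(s);
	struct external_name_model *p = malloc(sizeof(*p) + len + 1);

	if (!p)
		return NULL;
	atomic_init(&p->count, 1);
	memcpy(p->name, s, len + 1);
	return p;
}

/* Share the name with another owner (what a non-exchanging move does). */
struct external_name_model *ext_name_get(struct external_name_model *p)
{
	atomic_fetch_add(&p->count, 1);
	return p;
}

/* Drop one reference; returns true if this was the last one and the name was freed. */
bool ext_name_put(struct external_name_model *p)
{
	if (atomic_fetch_sub(&p->count, 1) != 1)
		return false;
	free(p);	/* the kernel would schedule an RCU-delayed free instead */
	return true;
}
```

Two owners can share one allocation; whichever drops its reference last
is the one that frees it.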

missing data dependency barrier in prepend_name()

AFAICS, prepend_name() is broken on SMP alpha.  Disclaimer: I don't have
SMP alpha boxen to reproduce it on.  However, it really looks like the race
is real.

CPU1: d_path() on /mnt/ramfs/<255-character>/foo
CPU2: mv /mnt/ramfs/<255-character> /mnt/ramfs/<63-character>

CPU2 does d_alloc(), which allocates an external name, stores the name there
including terminating NUL, does smp_wmb() and stores its address in
dentry->d_name.name.  It proceeds to d_add(dentry, NULL) and d_move()
old dentry over to that.  ->d_name.name value ends up in that dentry.

In the meanwhile, CPU1 gets to prepend_name() for that dentry.  It fetches
->d_name.name and ->d_name.len; the former ends up pointing to new name
(64-byte kmalloc'ed array), the latter - 255 (length of the old name).
Nothing to force the ordering there, and normally that would be OK, since we'd
run into the terminating NUL and stop.  Except that it's alpha, and we'd need
a data dependency barrier to guarantee that we see that store of NUL
__d_alloc() has done.  In a similar situation dentry_cmp() would survive; it
does explicit smp_read_barrier_depends() after fetching ->d_name.name.
prepend_name() doesn't and it risks walking past the end of kmalloc'ed object
and possibly oops due to taking a page fault in kernel mode.

Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
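
A userspace sketch of the reader side discussed in the commit above,
assuming C11 atomics. The writer publishes the name pointer with a
release store after the bytes (including the NUL) are written; the
reader uses an acquire load, which is stronger than the data dependency
barrier the commit talks about, and then copies at most len bytes while
stopping at the NUL. The struct and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Models ->d_name: a pointer and a length that are updated separately. */
struct name_model {
	_Atomic(const char *) name;
	atomic_uint len;
};

/* Writer: the bytes of s (with NUL) must be in place before this store. */
void publish_name(struct name_model *n, const char *s)
{
	atomic_store_explicit(&n->name, s, memory_order_release);
}

/*
 * Reader: len may be stale relative to name, so never trust it alone.
 * Copies into out (capacity outsz) and returns the number of bytes copied.
 */
size_t copy_name(struct name_model *n, char *out, size_t outsz)
{
	unsigned int len = atomic_load_explicit(&n->len, memory_order_relaxed);
	const char *s = atomic_load_explicit(&n->name, memory_order_acquire);
	size_t i = 0;

	if (outsz == 0)
		return 0;
	while (i < len && i + 1 < outsz) {
		char c = s[i];

		if (!c)
			break;	/* the NUL bounds the walk when len is stale */
		out[i++] = c;
	}
	out[i] = '\0';
	return i;
}
```

With a stale, too-large len and a freshly published short name, the copy
still ends at the terminator because the reader is guaranteed to see the
bytes stored before the pointer was published.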

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fixes from Al Viro:
"Assorted fixes + unifying __d_move() and __d_materialise_dentry() +
  minimal regression fix for d_path() of victims of overwriting rename()
  ported on top of that"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  vfs: Don't exchange "short" filenames unconditionally.
  fold swapping ->d_name.hash into switch_names()
  fold unlocking the children into dentry_unlock_parents_for_move()
  kill __d_materialise_dentry()
  __d_materialise_dentry(): flip the order of arguments
  __d_move(): fold manipulations with ->d_child/->d_subdirs
  don't open-code d_rehash() in d_materialise_unique()
  pull rehashing and unlocking the target dentry into __d_materialise_dentry()
  ufs: deal with nfsd/iget races
  fuse: honour max_read and max_write in direct_io mode
  shmem: fix nlink for rename overwrite directory

Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"This is quite late but these need to be backported anyway.

  This is the fix for a long-standing cpuset bug which existed from
  2009.  cpuset makes use of PF_SPREAD_{PAGE|SLAB} flags to modify the
  task's memory allocation behavior according to the settings of the
  cpuset it belongs to; unfortunately, when those flags have to be
  changed, cpuset did so directly even while the target task is running,
  which is obviously racy as task->flags may be modified by the task
  itself at any time.  This obscure bug manifested as corrupt
  PF_USED_MATH flag leading to a weird crash.

  The bug is fixed by moving the flag to task->atomic_flags.  The first
  two are preparatory ones to help define atomic_flags accessors and the
  third one is the actual fix"

* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags
  sched: add macros to define bitops for task atomic flags
  sched: fix confusing PFA_NO_NEW_PRIVS constant

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
"Here's our last set of fixes for 3.17.  Most of these are for TI
  platforms, fixing some noisy Kconfig issues, runtime clock and power
  issues on several platforms and NAND timings on DRA7.

  There are also a couple of bug fixes for i.MX, one for QCOM and a
  small fix to avoid section mismatch noise on PXA.

  Diffstat looks large, partially due to some tables being updated and
  thus touching many lines.  The qcom gsbi change also restructures
  clock management a bit and thus touches a bunch of lines.

  All in all, a bit more changes than we'd like at this point, but
  nothing stands out as risky either so it seems like the right thing to
  send it up now instead of holding it to the merge window"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  drivers/soc: qcom: do not disable the iface clock in probe
  ARM: imx: fix .is_enabled() of shared gate clock
  ARM: OMAP3: Fix I/O chain clock line assertion timed out error
  ARM: keystone: dts: fix bindings for pcie and usb clock nodes
  bus: omap_l3_noc: Fix connID for OMAP4
  ARM: DT: imx53: fix lvds channel 1 port
  ARM: dts: cm-t54: fix serial console power supply.
  ARM: dts: dra7-evm: Fix NAND GPMC timings
  ARM: pxa: fix section mismatch warning for pxa_timer_nodt_init
  ARM: OMAP: Fix Kconfig warning for omap1

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
"The final round of fixes.  One corner case in the math emulator and
  another one in the mcount function for ftrace"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: mcount: Adjust stack pointer for static trace in MIPS32
  MIPS: Fix MFC1 & MFHC1 emulation for 64-bit MIPS systems

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:
"This has:

   - EFI revert to fix a boot regression
   - early_ioremap() fix for boot failure
   - KASLR fix for possible boot failures
   - EFI fix for corrupted string printing
   - remove a misleading EFI bootup 'failed!' error message

  Unfortunately it's all rather close to the merge window"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Truncate 64-bit values when calling 32-bit OutputString()
  x86/efi: Delete misleading efi_printk() error message
  Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"
  x86/kaslr: Avoid the setup_data area when picking location
  x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8

vfs: Don't exchange "short" filenames unconditionally.

Only exchange source and destination filenames
if flags contain RENAME_EXCHANGE.
If an executable file was running and has been replaced by
another file, /proc/PID/exe should still show the correct file name,
not the old name of the file by which it was replaced.

The scenario when this bug manifests itself was like this:
* ALT Linux uses rpm and start-stop-daemon;
* during a package upgrade rpm creates a temporary file
  for an executable to rename it upon successful unpacking;
* start-stop-daemon is run subsequently and it obtains
  the (nonexistent) temporary filename via /proc/PID/exe
  thus failing to identify the running process.

Note that "long" filenames (> DNAME_INLINE_LEN) are still
exchanged without RENAME_EXCHANGE and this behaviour has existed
for long enough (it should apparently be fixed too).
So this patch is just an interim workaround that restores
behavior for "short" names as it was before changes
introduced by commit da1ce0670c14 ("vfs: add cross-rename").

See https://lkml.org/lkml/2014/9/7/6 for details.

AV: the comments about being more careful with ->d_name.hash
than with ->d_name.name are from back in 2.3.40s; they
became obsolete by 2.3.60s, when we started to unhash the
target instead of swapping hash chain positions followed
by d_delete() as we used to do when dcache was first
introduced.

Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: da1ce0670c14 "vfs: add cross-rename"
Signed-off-by: Mikhail Efremov <sem@altlinux.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
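
The behaviour restored by the commit above can be modelled in a few
lines of userspace C. This is a sketch under assumptions: the struct and
the function switch_names_model() are hypothetical, names are fixed-size
inline buffers only, and hashes and lengths are left out.

```c
#include <stdbool.h>
#include <string.h>

#define NAME_LEN_MODEL 32

/* Hypothetical stand-in for a dentry with a short, inline name. */
struct dentry_model {
	char name[NAME_LEN_MODEL];
};

/*
 * Give dentry the name of target. Only a true exchange also hands the
 * old name of dentry over to target; a plain rename leaves the victim's
 * name alone.
 */
void switch_names_model(struct dentry_model *dentry,
			struct dentry_model *target, bool exchange)
{
	char tmp[NAME_LEN_MODEL];

	if (!exchange) {
		memcpy(dentry->name, target->name, sizeof(dentry->name));
		return;
	}
	memcpy(tmp, dentry->name, sizeof(tmp));
	memcpy(dentry->name, target->name, sizeof(dentry->name));
	memcpy(target->name, tmp, sizeof(target->name));
}
```

In the rpm scenario the temporary file is the dentry and the running
executable is the target: without the exchange flag the victim keeps its
original name, so a lookup through /proc/PID/exe would not report the
temporary filename.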

fold swapping ->d_name.hash into switch_names()

and do it along with ->d_name.len there

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fold unlocking the children into dentry_unlock_parents_for_move()

... renaming it into dentry_unlock_for_move() and making it more
symmetric with dentry_lock_for_move().

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill __d_materialise_dentry()

it folds into __d_move() now

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

__d_materialise_dentry(): flip the order of arguments

... thus making it much closer to (now unreachable, BTW) IS_ROOT(dentry)
case in __d_move(). A bit more and it'll fold in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

__d_move(): fold manipulations with ->d_child/->d_subdirs

list_del() + list_add() is a slightly pessimised list_move()
list_del() + INIT_LIST_HEAD() is a slightly pessimised list_del_init()

Interleaving those makes the resulting code even worse. And harder to follow...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
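
The two equivalences stated in the commit above can be checked with a
minimal userspace copy of a circular doubly linked list. The helpers
below follow the familiar list_head shape but are written for this
example; they are not the kernel's <linux/list.h>.

```c
/* Minimal circular doubly linked list in the style of list_head. */
struct list_head {
	struct list_head *next, *prev;
};

void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from its neighbours; entry's own pointers are left stale. */
void list_del_entry(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Same effect as list_del() followed by list_add(), in one helper. */
void list_move(struct list_head *entry, struct list_head *head)
{
	list_del_entry(entry);
	list_add(entry, head);
}

/* Same effect as list_del() followed by INIT_LIST_HEAD(), in one helper. */
void list_del_init(struct list_head *entry)
{
	list_del_entry(entry);
	INIT_LIST_HEAD(entry);
}
```

Using the combined helpers keeps each list manipulation in one place
instead of interleaving the unlink and relink steps of two lists.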

don't open-code d_rehash() in d_materialise_unique()

... and get rid of duplicate BUG_ON() there

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

pull rehashing and unlocking the target dentry into __d_materialise_dentry()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ufs: deal with nfsd/iget races

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fuse: honour max_read and max_write in direct_io mode

The third argument of fuse_get_user_pages(), "nbytesp", refers to the
number of bytes a caller asked to pack into the fuse request. This value
may be less than the capacity of the fuse request or iov_iter. So
fuse_get_user_pages() must ensure that *nbytesp won't grow.

Now, when helper iov_iter_get_pages() performs all hard work of extracting
pages from iov_iter, it can be done by passing properly calculated
"maxsize" to the helper.

The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
this capability, so pass LONG_MAX as the maxsize argument here.

Fixes: c9c37e2e6378 ("fuse: switch to iov_iter_get_pages()")
Reported-by: Werner Baumann <werner.baumann@onlinehome.de>
Tested-by: Maxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

shmem: fix nlink for rename overwrite directory

If overwriting an empty directory with rename, we need to drop the
extra nlink.

Test prog:

#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <sys/stat.h>

int main(void)
{
	const char *test_dir1 = "test-dir1";
	const char *test_dir2 = "test-dir2";
	int res;
	int fd;
	struct stat statbuf;

	res = mkdir(test_dir1, 0777);
	if (res == -1)
		err(1, "mkdir(\"%s\")", test_dir1);

	res = mkdir(test_dir2, 0777);
	if (res == -1)
		err(1, "mkdir(\"%s\")", test_dir2);

	/* Keep the directory that is about to be overwritten open */
	fd = open(test_dir2, O_RDONLY);
	if (fd == -1)
		err(1, "open(\"%s\")", test_dir2);

	res = rename(test_dir1, test_dir2);
	if (res == -1)
		err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);

	res = fstat(fd, &statbuf);
	if (res == -1)
		err(1, "fstat(%i)", fd);

	/* The overwritten directory must have no links left */
	if (statbuf.st_nlink != 0) {
		fprintf(stderr, "nlink is %lu, should be 0\n",
			(unsigned long)statbuf.st_nlink);
		return 1;
	}

	return 0;
}

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

Pull input fix from Dmitry Torokhov:
"A small fixup to i8042 adding Asus X450LCP to the nomux list"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: i8042 - fix Asus X450LCP touchpad detection

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:
"A CONFIG_STACK_GROWSUP=y fix, and a hotplug llc CPU mask fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix unreleased llc_shared_mask bit during CPU hotplug
  sched: Fix end_of_stack() and location of stack canary for architectures using CONFIG_STACK_GROWSUP

Merge branch 'akpm' (fixes from Andrew Morton)

Merge fixes from Andrew Morton:
"9 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: softdirty: keep bit when zapping file pte
  fs/cachefiles: add missing \n to kerror conversions
  genalloc: fix device node resource counter
  drivers/rtc/rtc-efi.c: add missing module alias
  mm, slab: initialize object alignment on cache creation
  mm: softdirty: addresses before VMAs in PTE holes aren't softdirty
  ocfs2/dlm: do not get resource spinlock if lockres is new
  nilfs2: fix data loss with mmap()
  ocfs2: free vol_label in ocfs2_delete_osb()

mm: softdirty: keep bit when zapping file pte

This fixes the same bug as b43790eedd31 ("mm: softdirty: don't forget to
save file map softdiry bit on unmap") and 9aed8614af5a ("mm/memory.c:
don't forget to set softdirty on file mapped fault") where the return
value of pte_*mksoft_dirty was being ignored.

To be sure that no other pte/pmd "mk" function return values were being
ignored, I annotated the functions in arch/x86/include/asm/pgtable.h
with __must_check and rebuilt.

The userspace effect of this bug is that the softdirty mark might be
lost if a file mapped pte gets zapped.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Jamie Liu <jamieliu@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org> [3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
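
The class of bug fixed by the commit above, and the annotation used to
hunt for it, can be sketched in userspace C. The type, the bit value and
the *_model names are assumptions for the example; the attribute is the
GCC/Clang warn_unused_result attribute, which is the usual basis for a
"must check" annotation.

```c
#if defined(__GNUC__) || defined(__clang__)
#define must_check_model __attribute__((warn_unused_result))
#else
#define must_check_model
#endif

/* Hypothetical page table entry: a value type, passed and returned by copy. */
typedef struct {
	unsigned long val;
} pte_model_t;

#define PTE_SOFT_DIRTY_MODEL (1UL << 11)	/* arbitrary bit for the example */

/*
 * Returns a modified copy; the argument itself is not changed, so a
 * caller that drops the return value silently loses the bit.
 */
must_check_model pte_model_t pte_mksoft_dirty_model(pte_model_t pte)
{
	pte.val |= PTE_SOFT_DIRTY_MODEL;
	return pte;
}

int pte_soft_dirty_model(pte_model_t pte)
{
	return (pte.val & PTE_SOFT_DIRTY_MODEL) != 0;
}
```

Correct use assigns the result back, as in
"pte = pte_mksoft_dirty_model(pte);". With the annotation in place, a
bare call whose result is discarded produces a compiler warning.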

fs/cachefiles: add missing \n to kerror conversions

Commit 0227d6abb378 ("fs/cachefiles: replace kerror by pr_err") didn't
include the newline that was part of the original kerror definition.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org> [3.16.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

genalloc: fix device node resource counter

Decrement the np_pool device_node refcount, which was incremented on
the preceding of_parse_phandle() call.

Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-efi.c: add missing module alias

Without a proper alias the kernel module is not loaded for the rtc-efi
driver.

Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
Cc: dann frazier <dannf@dannf.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm, slab: initialize object alignment on cache creation

Since commit 4590685546a3 ("mm/sl[aou]b: Common alignment code"), the
"ralign" automatic variable in __kmem_cache_create() may be used as
uninitialized.

The proper alignment defaults to BYTES_PER_WORD and can be overridden by
SLAB_RED_ZONE or the alignment specified by the caller.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=85031

Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrei Elovikov <a.elovikov@gmail.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: softdirty: addresses before VMAs in PTE holes aren't softdirty

In PTE holes that contain VM_SOFTDIRTY VMAs, unmapped addresses before
VM_SOFTDIRTY VMAs are reported as softdirty by /proc/pid/pagemap.  This
bug was introduced in commit 68b5a6524856 ("mm: softdirty: respect
VM_SOFTDIRTY in PTE holes").  That commit made /proc/pid/pagemap look at
VM_SOFTDIRTY in PTE holes but neglected to observe the start of VMAs
returned by find_vma.
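
To illustrate the missed check, here is a minimal user-space sketch.
struct vma, toy_find_vma() and the two softdirty_*() helpers are
hypothetical stand-ins, not kernel code. The only kernel behaviour
assumed is that find_vma() returns the first VMA whose end lies above
the address, so the returned VMA may start after that address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's vm_area_struct. */
struct vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool softdirty;		/* models VM_SOFTDIRTY */
};

/* Like find_vma(): first VMA with addr < end, or NULL (sorted input). */
const struct vma *toy_find_vma(const struct vma *vmas, size_t n,
			       unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Buggy variant: trusts the flag even if addr lies before the VMA. */
bool softdirty_without_start_check(const struct vma *vmas, size_t n,
				   unsigned long addr)
{
	const struct vma *v = toy_find_vma(vmas, n, addr);

	return v && v->softdirty;
}

/* Fixed variant: the address must actually fall inside the VMA. */
bool softdirty_with_start_check(const struct vma *vmas, size_t n,
				unsigned long addr)
{
	const struct vma *v = toy_find_vma(vmas, n, addr);

	return v && v->start <= addr && v->softdirty;
}
```

For a single softdirty VMA covering 0x2000-0x3000, the unmapped address
0x1000 is reported as softdirty by the first variant only.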

Tested:
  Wrote a selftest that creates a PMD-sized VMA then unmaps the first
  page and asserts that the page is not softdirty. I'm going to send the
  pagemap selftest in a later commit.

Signed-off-by: Peter Feiner <pfeiner@google.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Jamie Liu <jamieliu@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2/dlm: do not get resource spinlock if lockres is new

There is a deadlock case which was reported by Guozhonghua:
  https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.

It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
helpers").  Since lockres is new, it does not require the
&res->spinlock.  So remove it.

Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Guozhonghua <guozhonghua@h3c.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nilfs2: fix data loss with mmap()

This bug leads to reproducible silent data loss, despite the use of
msync(), sync() and a clean unmount of the file system.  It is easily
reproducible with the following script:

  ----------------[BEGIN SCRIPT]--------------------
  mkfs.nilfs2 -f /dev/sdb
  mount /dev/sdb /mnt

  dd if=/dev/zero bs=1M count=30 of=/mnt/testfile

  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_BEFORE="$(md5sum /mnt/testfile)"

  /root/mmaptest/mmaptest /mnt/testfile 30 10 5

  sync
  CHECKSUM_AFTER="$(md5sum /mnt/testfile)"
  umount /mnt
  mount /dev/sdb /mnt
  CHECKSUM_AFTER_REMOUNT="$(md5sum /mnt/testfile)"
  umount /mnt

  echo "BEFORE MMAP:\t$CHECKSUM_BEFORE"
  echo "AFTER MMAP:\t$CHECKSUM_AFTER"
  echo "AFTER REMOUNT:\t$CHECKSUM_AFTER_REMOUNT"
  ----------------[END SCRIPT]--------------------

The mmaptest tool looks something like this (very simplified, with
error checking removed):

  ----------------[BEGIN mmaptest]--------------------
  data = mmap(NULL, file_size - file_offset, PROT_READ | PROT_WRITE,
              MAP_SHARED, fd, file_offset);

  for (i = 0; i < write_count; ++i) {
        memcpy(data + i * 4096, buf, sizeof(buf));
        msync(data, file_size - file_offset, MS_SYNC);
  }
  ----------------[END mmaptest]--------------------

The output of the script looks something like this:

  BEFORE MMAP:    281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile
  AFTER MMAP:     6604a1c31f10780331a6850371b3a313  /mnt/testfile
  AFTER REMOUNT:  281ed1d5ae50e8419f9b978aab16de83  /mnt/testfile

So it is clear that the changes made using mmap() do not survive a
remount.  This can be reproduced 100% of the time.  The problem was
introduced in commit 136e8770cd5d ("nilfs2: fix issue of
nilfs_set_page_dirty() for page at EOF boundary").

If the page was read with mpage_readpage() or mpage_readpages() for
example, then it has no buffers attached to it.  In that case
page_has_buffers(page) in nilfs_set_page_dirty() will be false.
Therefore nilfs_set_file_dirty() is never called and the pages are never
collected and never written to disk.

This patch fixes the problem by also calling nilfs_set_file_dirty() if the
page has no buffers attached to it.

[akpm@linux-foundation.org: s/PAGE_SHIFT/PAGE_CACHE_SHIFT/]
Signed-off-by: Andreas Rohner <andreas.rohner@gmx.net>
Tested-by: Andreas Rohner <andreas.rohner@gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: free vol_label in ocfs2_delete_osb()

osb->vol_label is malloced in ocfs2_initialize_super but not freed if
an error occurs or during umount, thus causing a memory leak.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MIPS: mcount: Adjust stack pointer for static trace in MIPS32

Every mcount() call in the MIPS 32-bit kernel is done as follows:

[...]
move at, ra
jal _mcount
addiu sp, sp, -8
[...]

but upon returning from the mcount() function, the stack pointer
is not adjusted properly. This is explained in detail in 58b69401c797
(MIPS: Function tracer: Fix broken function tracing).

Commit ad8c396936e3 ("MIPS: Unbreak function tracer for 64-bit kernel.")
fixed the stack manipulation for 64-bit but it didn't fix it completely
for MIPS32.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: <stable@vger.kernel.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7792/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

MIPS: Fix MFC1 & MFHC1 emulation for 64-bit MIPS systems

Commit bbd426f542cb "MIPS: Simplify FP context access" modified the
SIFROMREG & SIFROMHREG macros such that they return unsigned rather
than signed 32b integers. I had believed that to be fine, but
inadvertently missed the MFC1 & MFHC1 cases which write to a struct
pt_regs regs element. On MIPS32 this is fine, but on 64 bit those
saved regs' fields are 64 bit wide. Using unsigned values caused the
32 bit value from the FP register to be zero rather than sign extended
as the architecture specifies, causing incorrect emulation of the
MFC1 & MFHC1 instructions. Fix by reintroducing the casts to signed
integers, and therefore the sign extension.
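
The difference can be sketched in plain C as follows. The function
names are hypothetical and this is not the emulator code; it only
shows what happens when a 32-bit word is widened into a 64-bit
register slot with and without the signed cast.

```c
#include <stdint.h>

/* Unsigned source: the upper 32 bits of the 64-bit slot become zero. */
uint64_t store_unsigned(uint32_t fp_word)
{
	return (uint64_t)fp_word;
}

/*
 * Signed cast first: bit 31 is replicated into the upper 32 bits.
 * The uint32_t -> int32_t conversion of large values is
 * implementation-defined in ISO C; GCC and Clang wrap modulo 2^32.
 */
uint64_t store_signed(uint32_t fp_word)
{
	return (uint64_t)(int64_t)(int32_t)fp_word;
}
```

For the word 0x80000000, store_unsigned() yields 0x0000000080000000
while store_signed() yields 0xffffffff80000000.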

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: stable@vger.kernel.org # v3.15+
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7848/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

Merge tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI hotplug, cpufreq, hibernation, ACPI
  LPSS driver), fixes for stuff that never worked correctly (ACPI GPIO
  support in some cases and a wrong sign of an error code in the ACPI
  core in one place), and one blacklist item for ACPI backlight
  handling.

  Specifics:

   - Revert of a recent hibernation core commit that introduced a NULL
     pointer dereference during resume for at least one user (Rafael J
     Wysocki).

   - Fix for the ACPI LPSS (Low-Power Subsystem) driver to disable
     asynchronous PM callback execution for LPSS devices during system
     suspend/resume (introduced in 3.16) which turns out to break
     ordering expectations on some systems.  From Fu Zhonghui.

   - cpufreq core fix related to the handling of sysfs nodes during
     system suspend/resume that has been broken for intel_pstate since
     3.15 from Lan Tianyu.

   - Restore the generation of "online" uevents for ACPI container
     devices that was removed in 3.14, but some user space utilities
     turn out to need them (Rafael J Wysocki).

   - The cpufreq core fails to release a lock in an error code path
     after changes made in 3.14.  Fix from Prarit Bhargava.

   - ACPICA and ACPI/GPIO fixes to make the handling of ACPI GPIO
     operation regions (which means AML using GPIOs) work correctly in
     all cases from Bob Moore and Srinivas Pandruvada.

   - Fix for a wrong sign of the ACPI core's create_modalias() return
     value in case of an error from Mika Westerberg.

   - ACPI backlight blacklist entry for ThinkPad X201s from Aaron Lu"

* tag 'pm+acpi-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
  gpio / ACPI: Use pin index and bit length
  ACPICA: Update to GPIO region handler interface.
  ACPI / platform / LPSS: disable async suspend/resume of LPSS devices
  cpufreq: release policy->rwsem on error
  cpufreq: fix cpufreq suspend/resume for intel_pstate
  ACPI / scan: Correct error return value of create_modalias()
  ACPI / video: disable native backlight for ThinkPad X201s
  ACPI / hotplug: Generate online uevents for ACPI containers

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"This is probably not the kind of pull request you want to see that
  late in the cycle.  Yet, the ACPI refactorization was problematic
  again and caused another two issues which need fixing.  My holidays
  with limited internet (plus travelling) and the developer's illness
  didn't help either :(

  The details:

   - ACPI code was refactored out into a separate file and as a
     side-effect, the i2c-core module got renamed.  Jean Delvare
     rightfully complained about the rename being problematic for
     distributions.  So, Mika and I thought the least problematic way to
     deal with it is to move all the code back into the main i2c core
     source file.  This is mainly a huge code move with some #ifdeffery
     applied.  No functional code changes.  Our personal tests and the
     testbots did not find problems.  (I was thinking about reverting,
     too, yet that would also have ~800 lines changed)

   - The new ACPI code also had a NULL pointer exception, thanks to
     Peter for finding and fixing it.

   - Mikko fixed a locking problem by decoupling clock_prepare and
     clock_enable.

   - Addy learnt that the datasheet was wrong and reimplemented the
     frequency setup according to the new algorithm.

   - Fan fixed an off-by-one error when copying data

   - Janusz fixed a copy'n'paste bug which gave a wrong error message

   - Sergei made sure that "don't touch" bits are not accessed"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: acpi: Fix NULL Pointer dereference
  i2c: move acpi code back into the core
  i2c: rk3x: fix divisor calculation for SCL frequency
  i2c: mxs: fix error message in pio transfer
  i2c: ismt: use correct length when copy buffer
  i2c: rcar: fix RCAR_IRQ_ACK_{RECV|SEND}
  i2c: tegra: Move clk_prepare/clk_set_rate to probe

MAINTAINERS: new Documentation maintainer

Transfer Documentation maintainership to Jiri Kosina.
Thanks, Jiri.

I'll still be reviewing and working on documentation.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branches 'pm-cpufreq' and 'pm-sleep'

* pm-cpufreq:
  cpufreq: release policy->rwsem on error
  cpufreq: fix cpufreq suspend/resume for intel_pstate

* pm-sleep:
  Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"

Merge branches 'acpi-hotplug', 'acpi-scan', 'acpi-lpss', 'acpi-gpio' and 'acpi-video'

* acpi-hotplug:
  ACPI / hotplug: Generate online uevents for ACPI containers

* acpi-scan:
  ACPI / scan: Correct error return value of create_modalias()

* acpi-lpss:
  ACPI / platform / LPSS: disable async suspend/resume of LPSS devices

* acpi-gpio:
  gpio / ACPI: Use pin index and bit length
  ACPICA: Update to GPIO region handler interface.

* acpi-video:
  ACPI / video: disable native backlight for ThinkPad X201s

Merge tag 'gpio-v3.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull gpio fixes from Linus Walleij:
"Two GPIO fixes:

   - GPIO direction flags were handled wrong in the new descriptor-
     based API, so direction changes did not always "take".

   - Fix a handler installation race in the generic GPIO irqchip code"

* tag 'gpio-v3.17-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Fix potential NULL handler data in chained irqchip handler
  gpio: Fix gpio direction flags not getting set

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

  * Revert the static library changes from the merge window since they're
    causing issues for Macbooks and Fedora + Grub2 (Matt Fleming)

  * Delete the misleading "setup_efi_pci() failed!" message which some
    people are seeing when booting EFI (Matt Fleming)

  * Fix printing strings from the 32-bit EFI boot stub by only passing
    32-bit addresses to the firmware (Matt Fleming)

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux

Pull devicetree bug fixes and documentation from Grant Likely:
"Several bug fix commits for issues found in the v3.17 rc series.

  Most of these are minor in that they aren't actively dangerous, but
  they have been seen in the wild.  The one important fix is commit
  7dbe5849fb50 ("of: make sure of_alias is initialized before accessing
  it"), without which some powerpc platforms will fail to find stdout
  for the console"

* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
  of/fdt: fix memory range check
  of: Fix memory block alignment in early_init_dt_add_memory_arch()
  of: make sure of_alias is initialized before accessing it
  of: Documentation regarding attaching OF Selftest testdata
  of: Disabling OF functions that use sysfs if CONFIG_SYSFS disabled
  of: correct of_console_check()'s return value

i2c: acpi: Fix NULL Pointer dereference

If adapter->dev.parent == NULL there is a NULL pointer dereference in
acpi_i2c_install_space_handler and acpi_i2c_remove_space_handler.

This is present since introduction of this code:
366047515c6e "i2c: rework kernel config I2C_ACPI" or even
da3c6647ee08 "I2C/ACPI: Clean up I2C ACPI code and Add CONFIG_I2C_ACPI"

The adapter->dev.parent == NULL case is valid for the i2c_stub,
so loading i2c_stub with ACPI_I2C_OPREGION enabled results in an oops.
This is also valid at least for i2c_tiny_usb and i2c_robotfuzz_osif.

Fix by checking whether it is null before calling ACPI_HANDLE.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>

i2c: move acpi code back into the core

Commit 5d98e61d337c ("I2C/ACPI: Add i2c ACPI operation region support")
renamed the i2c-core module. This may cause regressions for
distributions, so put the ACPI code back into the core.

Reported-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Tested-by: Lan Tianyu <tianyu.lan@intel.com>
Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>

of/fdt: fix memory range check

In cases where the board has the memory DT node below:

memory{
device_type = "memory";
reg = <0x80000000 0x80000000>;
};

Check on the memory range in fdt.c will always fail because it is
comparing MAX_PHYS_ADDR with base + size, in fact it should compare
it with base + size - 1.

This issue was originally noticed on Qualcomm IFC6410 board.
Without this patch the kernel shows unnecessary warnings:

[    0.000000] Machine model: Qualcomm APQ8064/IFC6410
[    0.000000] Ignoring memory range 0xffffffff - 0x100000000
[    0.000000] cma: Reserved 64 MiB at ab800000

As a result the size gets reduced to 0x7fffffff, which looks wrong.

This patch fixes the check involved in generating this warning and
as a result it also fixes the wrong size calculation.
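
The arithmetic can be checked with a small user-space sketch.
TOY_MAX_PHYS_ADDR and both clamp_size_*() helpers are hypothetical
names, and the sketch assumes 32-bit physical addresses, a non-zero
size and a base within the addressable range.

```c
#include <stdint.h>

/* Assumed for illustration: last valid 32-bit physical address. */
#define TOY_MAX_PHYS_ADDR 0xffffffffULL

/* Old check: compares the exclusive end with the last valid address. */
uint64_t clamp_size_old(uint64_t base, uint64_t size)
{
	if (base + size > TOY_MAX_PHYS_ADDR)
		size = TOY_MAX_PHYS_ADDR - base;
	return size;
}

/* Fixed check: compares the last byte of the range (inclusive end). */
uint64_t clamp_size_fixed(uint64_t base, uint64_t size)
{
	if (base + size - 1 > TOY_MAX_PHYS_ADDR)
		size = TOY_MAX_PHYS_ADDR - base + 1;
	return size;
}
```

For base = 0x80000000 and size = 0x80000000, the old check shrinks the
size to 0x7fffffff, while the fixed check leaves it at 0x80000000.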

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
[grant.likely: adjust new size calculation also]
Signed-off-by: Grant Likely <grant.likely@linaro.org>

cpuset: PF_SPREAD_PAGE and PF_SPREAD_SLAB should be atomic flags

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.
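
A user-space analogue of the idea, using C11 atomics rather than the
kernel's bitops: each flag update is a single atomic read-modify-write,
so two threads changing different bits of the same word cannot lose
each other's update. struct toy_task, the TOY_* bit numbers and the
toy_*_flag() helpers are hypothetical names for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, not the kernel's values. */
#define TOY_SPREAD_PAGE 0
#define TOY_SPREAD_SLAB 1

/* Hypothetical stand-in for the task's atomic flag word. */
struct toy_task {
	atomic_ulong atomic_flags;
};

/* Atomically set one bit without disturbing the others. */
void toy_set_flag(struct toy_task *t, unsigned int bit)
{
	atomic_fetch_or(&t->atomic_flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
void toy_clear_flag(struct toy_task *t, unsigned int bit)
{
	atomic_fetch_and(&t->atomic_flags, ~(1UL << bit));
}

/* Read the current state of one bit. */
bool toy_test_flag(struct toy_task *t, unsigned int bit)
{
	return (atomic_load(&t->atomic_flags) >> bit) & 1UL;
}
```

A plain "flags |= mask" on a shared word, by contrast, is a separate
load and store, which is the kind of update the report above ran into.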

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Kees Cook <keescook@chromium.org>
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Cc: <stable@vger.kernel.org> # 2.6.31+
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched: add macros to define bitops for task atomic flags

This will simplify code when we add new flags.

v3:
- Kees pointed out that no_new_privs should never be cleared, so we
shouldn't define task_clear_no_new_privs(). we define 3 macros instead
of a single one.

v2:
- updated scripts/tags.sh, suggested by Peter

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

sched: fix confusing PFA_NO_NEW_PRIVS constant

Commit 1d4457f99928 ("sched: move no_new_privs into new atomic flags")
defined PFA_NO_NEW_PRIVS as a hexadecimal value, but that is confusing
because it is used as a bit number. Redefine it as a decimal bit number.

Note this changes the bit position of PFA_NO_NEW_PRIVS from 1 to 0.
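
A small sketch of why the hexadecimal spelling was misleading. The
OLD_BITNR/NEW_BITNR macros and bit_mask() are hypothetical names; the
point is only that a bit number is an argument to a shift, not a mask.

```c
#include <assert.h>
#include <limits.h>

/* Reads like a mask, but was used as bit number 1. */
#define OLD_BITNR 0x00000001
/* Plain decimal bit number 0. */
#define NEW_BITNR 0

/* Convert a bit number into the mask it selects. */
unsigned long bit_mask(unsigned int nr)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << nr;
}
```

Here bit_mask(OLD_BITNR) is 0x2 and bit_mask(NEW_BITNR) is 0x1, which
matches the change of bit position from 1 to 0.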

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Kees Cook <keescook@chromium.org>
[ lizf: slightly modified subject and changelog ]
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Input: i8042 - fix Asus X450LCP touchpad detection

We need to add this module to the nomux table to be able to detect the
touchpad.

Cc: stable@vger.kernel.org
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

Revert "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"

Revert commit 6efde38f0769 (PM / Hibernate: Iterate over set bits
instead of PFNs in swsusp_free()) that introduced a NULL pointer
dereference during system resume from hibernation:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
PGD b39c2067 PUD b39c1067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: <irrelevant list of modules>
CPU: 1 PID: 4898 Comm: s2disk Tainted: G         C     3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000
RIP: 0010:[<ffffffff810a8cc1>]  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
RSP: 0018:ffff8800b3b17ea8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001
RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000
RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000
R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590
FS:  00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0
Stack:
ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a
ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000
ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007
Call Trace:
[<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0
[<ffffffff811a955a>] ? __fput+0xca/0x1d0
[<ffffffff81080627>] ? task_work_run+0x97/0xd0
[<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0
[<ffffffff8151452a>] ? int_signal+0x12/0x17
Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40
RIP  [<ffffffff810a8cc1>] swsusp_free+0x21/0x190
RSP <ffff8800b3b17ea8>
CR2: 0000000000000000
---[ end trace f02be86a1ec0cccb ]---

due to forbidden_pages_map being NULL in swsusp_free().

Fixes: 6efde38f0769 "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()"
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Some final radeon and i915 fixes, black screens mostly"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon/cik: use a separate counter for CP init timeout
  drm/i915/hdmi: fix hdmi audio state readout
  drm/i915: Don't leak command parser tables on suspend/resume
  drm/radeon: add PX quirk for asus K53TK
  drm/radeon: add a backlight quirk for Amilo Xi 2550
  drm/radeon: add a module parameter for backlight control (v2)
  drm/radeon: Update IH_RB_RPTR register after each processed interrupt
  drm/radeon: Make IH ring overflow debugging output more useful
  drm/radeon: Clear RB_OVERFLOW bit earlier

gpio / ACPI: Use pin index and bit length

Fix the code for the case where the operation region callback is for a
GPIO which is not at index 0, and for partial pins in a GPIO definition.
For example:
Name (GMOD, ResourceTemplate ()
{
//3 Outputs that define the Power mode of the device
GpioIo (Exclusive, PullDown, , , , "\\_SB.GPI2") {10, 11, 12}
})
}

If the opregion callback is called for:
- Set pin 10, then address = 0 and bit length = 1
- Set pin 11, then address = 1 and bit length = 1
- Set for both pin 11 and pin 12, then address = 1, bit length = 2

This change requires updated ACPICA gpio operation handler code to
send the pin index and bit length.
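
The mapping in the list above can be sketched as follows. toy_pins[]
mirrors the {10, 11, 12} pin list from the example, and
toy_resolve_pins() is a hypothetical helper, not the handler in the
GPIO ACPI code: the address is treated as an index into the pin list
and the bit length as the number of consecutive pins.

```c
#include <stddef.h>

/* Pin list from the GpioIo example above. */
static const unsigned int toy_pins[] = { 10, 11, 12 };
#define TOY_PIN_COUNT (sizeof(toy_pins) / sizeof(toy_pins[0]))

/*
 * Resolve an (address, bit length) pair to pin numbers.
 * Returns the number of pins written to out, or 0 if the request
 * does not fit inside the pin list.
 */
size_t toy_resolve_pins(size_t address, size_t bit_length,
			unsigned int *out)
{
	if (address >= TOY_PIN_COUNT ||
	    bit_length > TOY_PIN_COUNT - address)
		return 0;

	for (size_t i = 0; i < bit_length; i++)
		out[i] = toy_pins[address + i];
	return bit_length;
}
```

With address = 1 and bit length = 2 the helper returns pins 11 and 12,
as in the third case above.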

Fixes: 473ed7be0da0 (gpio / ACPI: Add support for ACPI GPIO operation regions)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+: 75ec6e55f138 ACPICA: Update to GPIO region handler interface.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

x86/efi: Truncate 64-bit values when calling 32-bit OutputString()

If we're executing the 32-bit efi_char16_printk() code path (i.e.
running on top of 32-bit firmware) we know that efi_early->text_output
will be a 32-bit value, even though ->text_output has type u64.

Unfortunately, we currently pass ->text_output directly to
efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
value onto the stack, causing the other parameters to be misaligned.

The way we handle this in the rest of the EFI boot stub is to pass
pointers as arguments to efi_early->call(), which automatically do the
right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
the upper 32-bits of the argument register if running in 64-bit mode
with 32-bit firmware).

This fixes a corruption bug when printing strings from the 32-bit EFI
boot stub.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

ACPICA: Update to GPIO region handler interface.

Changes to correct several GPIO issues:

1) The update_rule in a GPIO field definition is now ignored;
a read-modify-write operation is never performed for GPIO fields.
(Internally, this means that the field assembly/disassembly
code is completely bypassed for GPIO.)

2) The Address parameter passed to a GPIO region handler is
now the bit offset of the field from a previous Connection()
operator. Thus, it becomes a "Pin Number Index" into the
Connection() resource descriptor.

3) The bit_width parameter passed to a GPIO region handler is
now the exact bit width of the GPIO field. Thus, it can be
interpreted as "number of pins".

Overall, we can now say that the region handler interface
to GPIO handlers is a raw "bit/pin" addressed interface, not
a byte-addressed interface like the system_memory handler interface.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge branch 'drm-fixes-3.17' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

- fix a backlight regression resulting in dark screen
- add a PX quirk to avoid a hang with runtime pm
- fix an init issue on the CIK compute rings
- fix IH ring buffer overflows gracefully

* 'drm-fixes-3.17' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/cik: use a separate counter for CP init timeout
  drm/radeon: add PX quirk for asus K53TK
  drm/radeon: add a backlight quirk for Amilo Xi 2550
  drm/radeon: add a module parameter for backlight control (v2)
  drm/radeon: Update IH_RB_RPTR register after each processed interrupt
  drm/radeon: Make IH ring overflow debugging output more useful
  drm/radeon: Clear RB_OVERFLOW bit earlier

Merge tag 'drm-intel-fixes-2014-09-24' of git://anongit.freedesktop.org/drm-intel into drm-fixes

a couple of small fixes for 3.17 still.

* tag 'drm-intel-fixes-2014-09-24' of git://anongit.freedesktop.org/drm-intel:
  drm/i915/hdmi: fix hdmi audio state readout
  drm/i915: Don't leak command parser tables on suspend/resume

ACPI / platform / LPSS: disable async suspend/resume of LPSS devices

On some systems (Asus T100 in particular) there are strict ordering
dependencies between LPSS devices with respect to power management
that break if they suspend/resume asynchronously.

In theory it should be possible to follow those dependencies in the
async suspend/resume case too (the ACPI tables tell us that the
dependencies are there), but since we're missing infrastructure
for that at the moment, disable async suspend/resume for all of
the LPSS devices for the time being.

Link: http://marc.info/?l=linux-acpi&m=141158962321905&w=2
Fixes: 8ce62f85a81f (ACPI / platform / LPSS: Enable async suspend/resume of LPSS devices)
Signed-off-by: Li Aubrey <aubrey.li@linux.intel.com>
Signed-off-by: Fu Zhonghui <zhonghui.fu@linux.intel.com>
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"Here is a quick pull request primarily meant to address the defconfig
  fallout from changing SCSI_NETLINK from being used via 'select' to
  being used via 'depends'.

  I applied a set of 5 patches written by Michal Marek, and then I
  carefully audited all of the remaining config files, basically:

   1) I scanned every arch config file, and if it mentioned CONFIG_INET
      or CONFIG_UNIX, I made sure it had CONFIG_NET=y

   2) After that, I scanned every arch config file, and if it did not
      have CONFIG_NET=y I made sure it did not reference any networking
      config options.

  Finally, we have some late breaking wireless fixes in here from John
  Linville and co"

[ And there's a sparc bpf fix snuck in too ]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sparc: bpf_jit: fix loads from negative offsets
  parisc: Update defconfigs which were missing CONFIG_NET.
  powerpc: Update defconfigs which were missing CONFIG_NET.
  s390: Update defconfigs which were missing CONFIG_NET.
  mips: Update some more defconfigs which were missing CONFIG_NET.
  sparc: Set CONFIG_NET=y in defconfigs
  sh: Set CONFIG_NET=y in defconfigs
  powerpc: Set CONFIG_NET=y in defconfigs
  parisc: Set CONFIG_NET=y in defconfigs
  mips: Set CONFIG_NET=y in defconfigs
  brcmfmac: Fix off by one bug in brcmf_count_20mhz_channels()
  ath9k: Fix NULL pointer dereference on early irq
  net: rfkill: gpio: Fix clock status
  NFC: st21nfca: Fix potential depmod dependency cycle
  NFC: st21nfcb: Fix depmod dependency cycle
  NFC: microread: Potential overflows in microread_target_discovered()

sparc: bpf_jit: fix loads from negative offsets

- fix BPF_LD|ABS|IND from negative offsets:
  make sure to sign extend lower 32 bits in 64-bit register
  before calling C helpers from JITed code, otherwise 'int k'
  argument of bpf_internal_load_pointer_neg_helper() function
  will be added as large unsigned integer, causing packet size
  check to trigger and abort the program.

  It's worth noting that JITed code for 'A = A op K' will affect
  upper 32 bits differently depending on whether K is simm13 or not,
  since small constants are sign extended, whereas large constants
  are stored in a temp register and zero extended.
  That is ok and we don't have to pay a penalty of sign extension
  for every sethi, since all classic BPF instructions have 32-bit
  semantics and we only need to set correct upper bits when
  transitioning from JITed code into C.

- though instructions 'A &= 0' and 'A *= 0' are odd, JIT compiler
  should not optimize them out

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'master-2014-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
pull request: wireless 2014-09-23

Please consider pulling this one last batch of fixes intended for the 3.17 stream!

For the NFC bits, Samuel says:

"Hopefully not too late for a handful of NFC fixes:

- 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a
depmod dependency cycle.
- One potential buffer overflow in the microread driver."

On top of that...

Emil Goode provides a fix for a brcmfmac off-by-one regression which
was introduced in the 3.17 cycle.

Loic Poulain fixes a polarity mismatch for a variable assignment
inside of rfkill-gpio.

Wojciech Dubowik prevents a NULL pointer dereference in ath9k.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

parisc: Update defconfigs which were missing CONFIG_NET.

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.

Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc: Update defconfigs which were missing CONFIG_NET.

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.

Signed-off-by: David S. Miller <davem@davemloft.net>

s390: Update defconfigs which were missing CONFIG_NET.

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.

Signed-off-by: David S. Miller <davem@davemloft.net>

mips: Update some more defconfigs which were missing CONFIG_NET.

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.") removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.

Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: Set CONFIG_NET=y in defconfigs

Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh: Set CONFIG_NET=y in defconfigs

Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-sh@vger.kernel.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc: Set CONFIG_NET=y in defconfigs

Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

parisc: Set CONFIG_NET=y in defconfigs

Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-parisc@vger.kernel.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

mips: Set CONFIG_NET=y in defconfigs

Commit 5d6be6a5 ("scsi_netlink : Make SCSI_NETLINK dependent on NET
instead of selecting NET") removed what happened to be the only instance
of 'select NET'. Defconfigs that were relying on the select now lack
networking support.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: linux-mips@linux-mips.org
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull one last block fix from Jens Axboe:
"We've had an issue with scsi-mq where probing takes forever.  This was
  bisected down to the percpu changes for blk_mq_queue_enter(), and the
  fact we now suffer an RCU grace period when killing a queue.  SCSI
  creates and destroys tons of queues, so this led to 10s of seconds of
  stalls at boot for some.

  Tejun has a real fix for this, but it's too involved for 3.17.  So
  this is a temporary workaround to expedite the queue killing until we
  can fold in the real fix for 3.18 when that merge window opens"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe

Merge tag 'pci-v3.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
"Here are a few fixes that should be in v3.17.

   - Reverting "Don't scan random busses" covers up a CardBus regression
     having to do with allocating CardBus bus numbers.

   - Reverting "Make sure bus numbers stay within parents bounds" covers
     up an ACPI _CRS bug that makes us reconfigure a bridge, causing a
     broken device behind it to stop responding.

   - The pciehp timeout change fixes some code we added in v3.17.
     Without the fix, we can send a new hotplug command too early,
     before the timeout has expired.

  I hope for better fixes for the reverts, but those will have to come
  after v3.17"

* tag 'pci-v3.17-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: pciehp: Fix pcie_wait_cmd() timeout
  Revert "PCI: Make sure bus number resources stay within their parents bounds"
  Revert "PCI: Don't scan random busses in pci_scan_bridge()"

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"This fixes three issues:

   - if ccp is loaded on a machine without ccp, it will incorrectly
     activate causing all requests to fail.  Fixed by preventing ccp
     from loading if hardware isn't available.

   - not all IRQs were enabled for the qat driver, leading to potential
     stalls when it is used

   - disabled buggy AVX CTR implementation in aesni"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aesni - disable "by8" AVX CTR optimization
  crypto: ccp - Check for CCP before registering crypto algs
  crypto: qat - Enable all 32 IRQs

Merge tag 'media/v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
"For some last-minute fixes:
   - a regression detected on Kernel 3.16 related to VBI Teletext
     application breakage on drivers using videobuf2 (see
     https://bugzilla.kernel.org/show_bug.cgi?id=84401).  The bug was
     noticed on saa7134 (migrated to VB2 on 3.16), but also affects
     em28xx (migrated on 3.9 to VB2);
   - two additional sanity checks at videobuf2;
   - two fixups to restore proper VBI support at the em28xx driver;
   - two Kernel oops fixups (at cx24123 and cx2341x drivers);
   - a bug at adv7604 where an if was doing just the opposite of what
     would be expected;
   - some documentation fixups to match the behavior defined at the
     Kernel"

* tag 'media/v3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] em28xx-v4l: get rid of field "users" in struct em28xx_v4l2"
  [media] em28xx: fix VBI handling logic
  [media] DocBook media: improve the poll() documentation
  [media] DocBook media: fix the poll() 'no QBUF' documentation
  [media] vb2: fix VBI/poll regression
  [media] cx2341x: fix kernel oops
  [media] cx24123: fix kernel oops due to missing parent pointer
  [media] adv7604: fix inverted condition
  [media] media/radio: fix radio-miropcm20.c build with io.h header file
  [media] vb2: fix plane index sanity check in vb2_plane_cookie()
  [media] DocBook media: update version number and V4L2 changes
  [media] DocBook media: fix fieldname in struct v4l2_subdev_selection
  [media] vb2: fix vb2 state check when start_streaming fails
  [media] videobuf2-core.h: fix comment
  [media] videobuf2-core: add comments before the WARN_ON
  [media] videobuf2-dma-sg: fix for wrong GFP mask to sg_alloc_table_from_pages

Merge tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md

Pull bugfixes for md/raid1 from Neil Brown:
"It is amazing how much easier it is to find bugs when you know one is
  there.  Two bug reports resulted in finding 7 bugs!

  All are tagged for -stable.  Those that can't cause (rare) data
  corruption, cause lockups.

  Particularly, but not only, fixing new "resync" code"

* tag 'md/3.17-more-fixes' of git://git.neil.brown.name/md:
  md/raid1: fix_read_error should act on all non-faulty devices.
  md/raid1: count resync requests in nr_pending.
  md/raid1: update next_resync under resync_lock.
  md/raid1: Don't use next_resync to determine how far resync has progressed
  md/raid1: make sure resync waits for conflicting writes to complete.
  md/raid1: clean up request counts properly in close_sync()
  md/raid1:  be more cautious where we read-balance during resync.
  md/raid1: intialise start_next_window for READ case to avoid hang

blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe

blk-mq uses percpu_ref for its usage counter, which tracks the number
of in-flight commands and is used to synchronously drain the queue on
freeze.  percpu_ref shutdown takes measurable wallclock time as it
involves a sched RCU grace period.  This means that draining a blk-mq
takes measurable wallclock time.  One would think that this shouldn't
matter as queue shutdown should be a rare event which takes place
asynchronously w.r.t. userland.

Unfortunately, SCSI probing involves synchronously setting up and then
tearing down a lot of request_queues back-to-back for non-existent
LUNs.  This means that SCSI probing may take more than ten seconds
when scsi-mq is used.

This will be properly fixed by implementing a mechanism to keep
q->mq_usage_counter in atomic mode till genhd registration; however,
that involves rather big updates to percpu_ref which is difficult to
apply late in the devel cycle (v3.17-rc6 at the moment).  As a
stop-gap measure till the proper fix can be implemented in the next
cycle, this patch introduces __percpu_ref_kill_expedited() and makes
blk_mq_freeze_queue() use it.  This is heavy-handed but should work
for testing the experimental SCSI blk-mq implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de
Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count")
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>

crypto: aesni - disable "by8" AVX CTR optimization

The "by8" implementation introduced in commit 22cddcc7df8f ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only treats the rightmost
32 bits as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.
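
The difference between the two overflow behaviours can be sketched in
plain C. This is an illustrative userland model only, not the aesni
assembly; the helper names ctr_inc_full() and ctr_inc_low32() are
hypothetical, and the counter block is assumed to be a 16-byte big-endian
value.

```c
#include <stdint.h>

#define CTR_BLOCK_SIZE 16

/* Increment the whole 16-byte block as one big-endian integer. */
static void ctr_inc_full(uint8_t ctr[CTR_BLOCK_SIZE])
{
	for (int i = CTR_BLOCK_SIZE - 1; i >= 0; i--) {
		if (++ctr[i] != 0)
			break;	/* no carry out of this byte, done */
	}
}

/*
 * Increment only the rightmost 32 bits. A carry out of those four
 * bytes is dropped instead of propagating into byte 11.
 */
static void ctr_inc_low32(uint8_t ctr[CTR_BLOCK_SIZE])
{
	for (int i = CTR_BLOCK_SIZE - 1; i >= CTR_BLOCK_SIZE - 4; i--) {
		if (++ctr[i] != 0)
			break;
	}
}
```

Starting from a block whose last four bytes are all 0xff, ctr_inc_full()
carries into byte 11 while ctr_inc_low32() wraps the low word to zero and
leaves byte 11 untouched, so the two counters diverge from that block on.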

As we're quite late in the release cycle, just disable the "by8" variant
for now.

Reported-by: Romain Francoise <romain@orebokech.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

sched: Fix unreleased llc_shared_mask bit during CPU hotplug

The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domain CPU topology.

This patch fixes it by releasing the llc_shared_mask correctly
during CPU disable.
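
The idea can be sketched with a toy userland model. This is not the
kernel code: llc_mask[], toy_cpu_up() and toy_cpu_down() are hypothetical
names, the masks are plain 64-bit words rather than cpumasks, and the
socket layout is passed in as an array.

```c
#include <stdint.h>

#define TOY_NR_CPUS 8	/* hypothetical toy size, not the kernel's NR_CPUS */

/* llc_mask[cpu] has bit n set when CPU n shares a last level cache with cpu. */
static uint64_t llc_mask[TOY_NR_CPUS];

/* CPU up: link cpu with every already-online CPU in the same socket. */
static void toy_cpu_up(int cpu, const int socket_of[], uint64_t online)
{
	llc_mask[cpu] |= 1ULL << cpu;
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		if ((online & (1ULL << i)) && socket_of[i] == socket_of[cpu]) {
			llc_mask[cpu] |= 1ULL << i;
			llc_mask[i] |= 1ULL << cpu;
		}
	}
}

/*
 * CPU down, with the release step: unlink cpu from each sibling's mask,
 * then clear its own mask so no stale bits survive a later renumbering.
 */
static void toy_cpu_down(int cpu)
{
	for (int i = 0; i < TOY_NR_CPUS; i++) {
		if (llc_mask[cpu] & (1ULL << i))
			llc_mask[i] &= ~(1ULL << cpu);
	}
	llc_mask[cpu] = 0;
}
```

Because toy_cpu_up() only ORs bits in, skipping the clearing in
toy_cpu_down() would leave the old siblings in the mask when the CPU
number comes back in a different socket layout.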

Yasuaki also reported that this can happen on real hardware:

https://lkml.org/lkml/2014/7/22/1018

His case is here:

==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockets is numbered as
follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

It means that last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.

When hot-removing socket#2 and #3, each core of sockets is
numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
still has 0x3fff80000001fffc0000000.

After that, when hot-adding socket#2 and #3, each core of
sockets is numbered as follows:

         | CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Linn Crosetto <linn@hp.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

x86/efi: Delete misleading efi_printk() error message

A number of people are reporting seeing the "setup_efi_pci() failed!"
error message in what used to be a quiet boot,

https://bugzilla.kernel.org/show_bug.cgi?id=81891

The message isn't all that helpful because setup_efi_pci() can return a
non-success error code for a variety of reasons, not all of them fatal.

Let's drop the return code from setup_efi_pci*() altogether, since
there's no way to process it in any meaningful way outside of the inner
__setup_efi_pci*() functions.

Reported-by: Darren Hart <dvhart@linux.intel.com>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Ulf Winkelvos <ulf@winkelvos.de>
Cc: Andre Müller <andre.muller@web.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>

gpio: Fix potential NULL handler data in chained irqchip handler

With misconfigured pins, an interrupt may fire immediately after
irq_set_chained_handler() is called in gpiochip_set_chained_irqchip().
If the handler runs before irq_set_handler_data() has been called, it
gets NULL handler data.

Fix this by moving irq_set_handler_data() call before
irq_set_chained_handler() in gpiochip_set_chained_irqchip().
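
The ordering problem can be illustrated with a toy userland model. None
of this is the gpiolib code: struct toy_irq and the toy_*() helpers are
hypothetical stand-ins, and the "pending" flag simply models a pin whose
interrupt is already asserted when the handler is installed.

```c
#include <stddef.h>

/* Hypothetical stand-in for an IRQ descriptor; not the kernel's struct irq_desc. */
struct toy_irq {
	void (*handler)(struct toy_irq *irq);
	void *handler_data;
	int pending;		/* interrupt already asserted at setup time */
	int saw_null_data;	/* set when the handler ran without its data */
};

static void toy_chained_handler(struct toy_irq *irq)
{
	if (irq->handler_data == NULL)
		irq->saw_null_data = 1;	/* a real handler would dereference NULL here */
}

static void toy_set_handler_data(struct toy_irq *irq, void *data)
{
	irq->handler_data = data;
}

/* Installing the handler delivers an already pending interrupt right away. */
static void toy_set_chained_handler(struct toy_irq *irq,
				    void (*handler)(struct toy_irq *irq))
{
	irq->handler = handler;
	if (irq->pending) {
		irq->pending = 0;
		irq->handler(irq);
	}
}

/* Old ordering: handler first, data second. Returns 1 if NULL data was seen. */
static int toy_setup_old(struct toy_irq *irq, void *chip)
{
	toy_set_chained_handler(irq, toy_chained_handler);
	toy_set_handler_data(irq, chip);
	return irq->saw_null_data;
}

/* Fixed ordering: data first, so the handler always finds it. */
static int toy_setup_fixed(struct toy_irq *irq, void *chip)
{
	toy_set_handler_data(irq, chip);
	toy_set_chained_handler(irq, toy_chained_handler);
	return irq->saw_null_data;
}
```

With a pending interrupt, toy_setup_old() lets the handler observe NULL
data, while toy_setup_fixed() does not.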

Cc: Stable <stable@vger.kernel.org> # 3.15+
Reviewed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>

gpio: Fix gpio direction flags not getting set

GPIO direction flags are not getting set because
an 'if' statement is the wrong way around.

Cc: Stable <stable@vger.kernel.org> # 3.15+
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

crypto: ccp - Check for CCP before registering crypto algs

If the ccp is built as a built-in module, then ccp-crypto (whether
built as a module or a built-in module) will be able to load and
it will register its crypto algorithms. If the system does not have
a CCP this will result in -ENODEV being returned whenever a command
is attempted to be queued by the registered crypto algorithms.

Add an API, ccp_present(), that checks for the presence of a CCP
on the system. The ccp-crypto module can use this to determine if it
should register its crypto algorithms.
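
A minimal userland sketch of that check follows. It is not the driver
code: the toy_ prefixed names are hypothetical, and the return convention
(0 when a CCP is present, -ENODEV otherwise) is an assumption made for
the illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device pointer: NULL models a system without a CCP. */
static void *toy_ccp_device;

/* Records whether the algorithms were registered. */
static int toy_algs_registered;

/* Assumed convention: 0 if a CCP is present, -ENODEV if not. */
static int toy_ccp_present(void)
{
	return toy_ccp_device ? 0 : -ENODEV;
}

/* Module init: refuse to load, and register nothing, without hardware. */
static int toy_ccp_crypto_init(void)
{
	int ret = toy_ccp_present();

	if (ret)
		return ret;

	toy_algs_registered = 1;	/* stands in for registering the algorithms */
	return 0;
}
```

Failing at init time means no algorithm is ever registered on a machine
without the hardware, so requests cannot be routed to an implementation
that would only return -ENODEV.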

Cc: stable@vger.kernel.org
Reported-by: Scot Doyle <lkml14@scotdoyle.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Tested-by: Scot Doyle <lkml14@scotdoyle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

drivers/soc: qcom: do not disable the iface clock in probe

Since commit 31964ffebbb9 ("tty: serial: msm: Remove direct access to GSBI"),
serial hangs if earlyprintk is enabled.

This hang is noticed only when the GSBI driver is probed and all the
earlyprintks before gsbi probe are seen on the console.
The reason why it hangs is because GSBI driver disables hclk in its
probe function without realizing that the serial IP might be in use by
a bootconsole. As gsbi driver disables the clock in probe the
bootconsole locks up.

Turning off hclk's could be dangerous if there are system components
like earlyprintk using the hclk.

This patch fixes the issue by delegating the clock management to
probe and remove functions in gsbi rather than disabling the clock in probe.

More detailed problem description can be found here:
http://www.spinics.net/lists/linux-arm-msm/msg10589.html

Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
"Last late set of InfiniBand/RDMA fixes for 3.17:

   - fixes for the new memory region re-registration support
   - iSER initiator error path fixes
   - grab bag of small fixes for the qib and ocrdma hardware drivers
   - larger set of fixes for mlx4, especially in RoCE mode"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/mlx4: Fix VF mac handling in RoCE
  IB/mlx4: Do not allow APM under RoCE
  IB/mlx4: Don't update QP1 in native mode
  IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
  mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
  IB/core: When marshaling uverbs path, clear unused fields
  IB/mlx4: Avoid executing gid task when device is being removed
  IB/mlx4: Fix lockdep splat for the iboe lock
  IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
  IB/mlx4: Reorder steps in RoCE GID table initialization
  IB/mlx4: Don't duplicate the default RoCE GID
  IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
  IB/iser: Bump version to 1.4.1
  IB/iser: Allow bind only when connection state is UP
  IB/iser: Fix RX/TX CQ resource leak on error flow
  RDMA/ocrdma: Use right macro in query AH
  RDMA/ocrdma: Resolve L2 address when creating user AH
  mlx4: Correct error flows in rereg_mr
  IB/qib: Correct reference counting in debugfs qp_stats
  IPoIB: Remove unnecessary port query
  ...