Naoya Horiguchi [Wed, 14 May 2014 00:01:32 +0000 (10:01 +1000)]
mm, hugetlbfs: fix rmapping for anonymous hugepages with page_pgoff()
page->index stores pagecache index when the page is mapped into file
mapping region, and the index is in pagecache size unit, so it depends on
the page size. Some of users of reverse mapping obviously assumes that
page->index is in PAGE_CACHE_SHIFT unit, so they don't work for anonymous
hugepage.
For example, consider that we have 3-hugepage vma and try to mbind the 2nd
hugepage to migrate to another node. Then the vma is split and
migrate_page() is called for the 2nd hugepage (belonging to the middle
vma.) In migrate operation, rmap_walk_anon() tries to find the relevant
vma to which the target hugepage belongs, but here we miscalculate pgoff.
So anon_vma_interval_tree_foreach() grabs invalid vma, which fires
VM_BUG_ON.
This patch introduces a new API that is usable both for normal page and
hugepage to get PAGE_SIZE offset from page->index. Users should clearly
distinguish page_index for pagecache index and page_pgoff for page offset.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stephen Rothwell [Wed, 14 May 2014 00:01:32 +0000 (10:01 +1000)]
mm: get rid of __GFP_KMEMCG fix
export kmalloc_order() to modules
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vladimir Davydov <vdavydov@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Davydov [Wed, 14 May 2014 00:01:31 +0000 (10:01 +1000)]
mm: get rid of __GFP_KMEMCG
Currently to allocate a page that should be charged to kmemcg (e.g.
threadinfo), we pass __GFP_KMEMCG flag to the page allocator. The page
allocated is then to be freed by free_memcg_kmem_pages. Apart from
looking asymmetrical, this also requires intrusion to the general
allocation path. So let's introduce separate functions that will
alloc/free pages charged to kmemcg.
The new functions are called alloc_kmem_pages and free_kmem_pages. They
should be used when the caller actually would like to use kmalloc, but has
to fall back to the page allocator for the allocation is large. They only
differ from alloc_pages and free_pages in that besides allocating or
freeing pages they also charge them to the kmem resource counter of the
current memory cgroup.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Vladimir Davydov [Wed, 14 May 2014 00:01:31 +0000 (10:01 +1000)]
sl[au]b: charge slabs to kmemcg explicitly
We have only a few places where we actually want to charge kmem so instead
of intruding into the general page allocation path with __GFP_KMEMCG it's
better to explictly charge kmem there. All kmem charges will be easier to
follow that way.
This is a step towards removing __GFP_KMEMCG. It removes __GFP_KMEMCG
from memcg caches' allocflags. Instead it makes slab allocation path call
memcg_charge_kmem directly getting memcg to charge from the cache's memcg
params.
This also eliminates any possibility of misaccounting an allocation going
from one memcg's cache to another memcg, because now we always charge
slabs against the memcg the cache belongs to. That's why this patch
removes the big comment to memcg_kmem_get_cache.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dave Hansen [Wed, 14 May 2014 00:01:31 +0000 (10:01 +1000)]
mm: slub: fix ALLOC_SLOWPATH stat
There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH
got bumped in that exit path. Now there are two, and a bunch of gotos.
ALLOC_SLOWPATH can now get set more than once during a single call to
__slab_alloc() which is pretty bogus. Here's the sequence:
1. Enter __slab_alloc(), fall through all the way to the
stat(s, ALLOC_SLOWPATH);
2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to
new_slab (goto #1)
3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo
(goto #2)
4. Fall through in the same path we did before all the way to
stat(s, ALLOC_SLOWPATH)
5. bump ALLOC_REFILL stat, then return
Doing this is obviously bogus. It keeps us from being able to accurately
compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means that the total
number of allocs always exceeds the total number of frees.
This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same place
that __slab_alloc() is. This makes it much less likely that
ALLOC_SLOWPATH will get botched again in the spaghetti-code inside
__slab_alloc().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Only define count_free() when CONFIG_SLUB_DEBUG since that's the only
context in which it is referenced. Only define count_partial() when
CONFIG_SLUB_DEBUG or CONFIG_SYSFS since the sysfs interface still uses it
for partial slab counts.
Also only define node_nr_objs() when CONFIG_SLUB_DEBUG since that's the
only context in which it is referenced.
Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Rientjes [Wed, 14 May 2014 00:01:30 +0000 (10:01 +1000)]
mm, slab: suppress out of memory warning unless debug is enabled
When the slab or slub allocators cannot allocate additional slab pages,
they emit diagnostic information to the kernel log such as current number
of slabs, number of objects, active objects, etc. This is always coupled
with a page allocation failure warning since it is controlled by
!__GFP_NOWARN.
Suppress this out of memory warning if the allocator is configured without
debug supported. The page allocation failure warning will indicate it is
a failed slab allocation, the order, and the gfp mask, so this is only
useful to diagnose allocator issues.
Since CONFIG_SLUB_DEBUG is already enabled by default for the slub
allocator, there is no functional change with this patch. If debug is
disabled, however, the warnings are now suppressed.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:30 +0000 (10:01 +1000)]
mm/slub.c: convert vnsprintf-static to va_format
Inspired by Joe Perches suggestion in ntfs logging clean-up.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Christoph Lameter <cl@linux.com> Cc: Joe Perches <joe@perches.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:30 +0000 (10:01 +1000)]
mm/slub.c: convert printk to pr_foo()
-All printk(KERN_foo converted to pr_foo()
-Default printk converted to pr_warn()
-Coalesce format fragments
Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Christoph Lameter <cl@linux.com> Cc: Joe Perches <joe@perches.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aaron Tomlin [Wed, 14 May 2014 00:01:29 +0000 (10:01 +1000)]
nmi: provide the option to issue an NMI back trace to every cpu but current
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:29 +0000 (10:01 +1000)]
fs/9p: kerneldoc fixes
Function parameters comment fixing.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:29 +0000 (10:01 +1000)]
fs/9p/v9fs.c: add __init to v9fs_sysfs_init
v9fs_sysfs_init is only called by __init init_v9fs
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Josh Hunt [Wed, 14 May 2014 00:01:29 +0000 (10:01 +1000)]
block: restore /proc/partitions to not display non-partitionable removable devices
We found with newer kernels we started seeing the cdrom device showing
up in /proc/partitions, but it was not there before.
Looking into this I found that commit d27769ec ("block: add
GENHD_FL_NO_PART_SCAN") introduces this change in behavior. It's not
clear to me from the commit's changelog if this change was intentional or
not. This comment still remains: /* Don't show non-partitionable
removeable devices or empty devices */ so I've decided to send a patch to
restore the behavior of not printing unpartitionable removable devices.
Signed-off-by: Josh Hunt <johunt@akamai.com> Cc: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In case of error, V2 restored the previous number of segments but left
the BIO_SEG_FLAG set.
To avoid problems, after the page is removed from the bio vec,
V3 performs a recount of the segments in the error code path.
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
bio: modify __bio_add_page() to accept pages that don't start a new segment
The original behaviour is to refuse to add a new page if the maximum
number of segments has been reached, regardless of the fact the page we
are going to add can be merged into the last segment or not.
Unfortunately, when the system runs under heavy memory fragmentation
conditions, a driver may try to add multiple pages to the last segment.
The original code won't accept them and EBUSY will be reported to
userspace.
This patch modifies the function so it refuses to add a page only in case
the latter starts a new segment and the maximum number of segments has
already been reached.
The bug can be easily reproduced with the st driver:
1) set CONFIG_SCSI_MPT2SAS_MAX_SGE or CONFIG_SCSI_MPT3SAS_MAX_SGE to 16
2) modprobe st buffer_kbs=1024
3) #dd if=/dev/zero of=/dev/st0 bs=1M count=10
dd: error writing `/dev/st0': Device or resource busy
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
- Invalid maintainer e-mail address:
Mail server reply:
Recipient address rejected: User unknown in virtual alias table
- Remove no longer working webpage URL
- Remove obsolete "Person" field
- Move status to "Orphan"
- Add Dave Jeffery and Jack Hammer to the CREDITS file
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com> Reviewed-by: Jean Delvare <jdelvare@suse.de> Cc: David Jeffery <dhjeffery@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Paul Bolle <pebolle@tiscali.nl> Reviewed-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
jiangyiwen [Wed, 14 May 2014 00:01:27 +0000 (10:01 +1000)]
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
ocfs2_mknod(), iput() will not be called during dput(dentry) because no
d_instantiate(), and this will lead to umount hung.
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
jiangyiwen [Wed, 14 May 2014 00:01:27 +0000 (10:01 +1000)]
ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
The following case may lead to endless loop during umount.
node A node B node C node D
umount volume,
migrate lockres1
to B
want to lock lockres1,
send
MASTER_REQUEST_MSG
to C
init block mle
send
MIGRATE_REQUEST_MSG
to C
find a block
mle, and then
return
DLM_MIGRATE_RESPONSE_MASTERY_REF
to B
set C in refmap
umount successfully
try to umount, endless
loop occurs when migrate
lockres1 since C is in
refmap
So we can fix this endless loop case by only returning
DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
MIGRATE_REQUEST_MSG.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yiwen Jiang [Wed, 14 May 2014 00:01:27 +0000 (10:01 +1000)]
ocfs2: fix a tiny race when running dirop_fileop_racer
When running dirop_fileop_racer we found a dead lock case.
2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
/race/16/1 in the filesystem, and let the inode number of dir 16 is less
than the inode number of dir race.
Node A Node B
mv /race/16/1 /race/
right after Node A has got the
EX mode of /race/16/, and tries to
get EX mode of /race
ls /race/16/
In this case, Node A has got the EX mode of /race/16/, and wants to get EX
mode of /race/. Node B has got the PR mode of /race/, and wants to get
the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead
lock happens.
This patch fixes this case by locking in ancestor order before trying
inode number order.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When o2net-accept-one() rejects an illegal connection, it terminates the
loop picking up the remaining queued connections. This fix will continue
accepting connections till the queue is emtpy.
Signed-off-by: Tariq Saseed <tariq.x.saeed@oracle.com> Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tariq Saeed [Wed, 14 May 2014 00:01:26 +0000 (10:01 +1000)]
ocfs2/o2net: incorrect to terminate accepting connections loop upon rejecting an invalid one
When o2net-accept-one() rejects an illegal connection, it terminates the
loop picking up the remaining queued connections. This fix will continue
accepting connections till the queue is emtpy.
alex chen [Wed, 14 May 2014 00:01:26 +0000 (10:01 +1000)]
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
There are two files a and b in dir /mnt/ocfs2.
node A node B
mv a b
In ocfs2_rename(), after calling
ocfs2_orphan_add(), the inode of
file b will be added into orphan
dir.
If ocfs2_update_entry() fails,
ocfs2_rename return error and mv
operation fails. But file b still
exists in the parent dir.
ocfs2_queue_orphan_scan
-> ocfs2_queue_recovery_completion
-> ocfs2_complete_recovery
-> ocfs2_recover_orphans
The inode of the file b will be
put with iput().
ocfs2_evict_inode
-> ocfs2_delete_inode
-> ocfs2_wipe_inode
-> ocfs2_remove_inode
OCFS2_VALID_FL in the inode
i_flags will be cleared.
The file b still can be accessed
on node B.
ls /mnt/ocfs2
When first read the file b with
ocfs2_read_inode_block(). It will
validate the inode using
ocfs2_validate_inode_block().
Because OCFS2_VALID_FL not set in
the inode i_flags, so the file
system will be readonly.
So we should add inode into orphan dir after updating entry in
ocfs2_rename().
Signed-off-by: alex.chen <alex.chen@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Wed, 14 May 2014 00:01:26 +0000 (10:01 +1000)]
ocfs2-limit-printk-when-journal-is-aborted-fix
document the msleep
Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joseph Qi [Wed, 14 May 2014 00:01:26 +0000 (10:01 +1000)]
ocfs2: limit printk when journal is aborted
Once JBD2_ABORT is set, ocfs2_commit_cache will fail in
ocfs2_commit_thread. Then it will get into a loop with mass logs. This
will meaninglessly consume a larger number of resource and may lead to the
system hanging. So limit printk in this case.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
George Spelvin [Wed, 14 May 2014 00:01:25 +0000 (10:01 +1000)]
ocfs2: remove some redundant casting
There are two standard techniques for dereferencing structures pointed
to by void *: cast to the right type each time they're used, or assign
to local variables of the right type.
But there's no need to do *both*.
Signed-off-by: George Spelvin <linux@horizon.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:25 +0000 (10:01 +1000)]
fs/ocfs2/super.c: use OCFS2_MAX_VOL_LABEL_LEN and strlcpy
Replace strncpy(size 63) by defined value.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:25 +0000 (10:01 +1000)]
ocfs2: remove NULL assignments on static
Static values are automatically initialized to NULL.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
__get_cpu_var() is used for multiple purposes in the kernel source. One
of them is address calculation via the form &__get_cpu_var(x). This
calculates the address for the instance of the percpu variable of the
current processor based on an offset.
Other use cases are for storing and retrieving data from the current
processors percpu area. __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.
__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.
this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.
This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset. Thereby address calculations are avoided and less
registers are used when code is generated.
At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.
The patch set includes passes over all arches as well. Once these
operations are used throughout then specialized macros can be defined in
non -x86 arches as well in order to optimize per cpu access by f.e. using
a global register that may be set to the per cpu base.
Transformations done to __get_cpu_var()
1. Determine the address of the percpu instance of the current processor.
DEFINE_PER_CPU(int, y);
int *x = &__get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(&y);
2. Same as #1 but this time an array structure is involved.
DEFINE_PER_CPU(int, y[20]);
int *x = __get_cpu_var(y);
Converts to
int *x = this_cpu_ptr(y);
3. Retrieve the content of the current processors instance of a per cpu
variable.
DEFINE_PER_CPU(int, y);
int x = __get_cpu_var(y)
Converts to
int x = __this_cpu_read(y);
4. Retrieve the content of a percpu struct
DEFINE_PER_CPU(struct mystruct, y);
struct mystruct x = __get_cpu_var(y);
Converts to
memcpy(&x, this_cpu_ptr(&y), sizeof(x));
5. Assignment to a per cpu variable
DEFINE_PER_CPU(int, y)
__get_cpu_var(y) = x;
Converts to
__this_cpu_write(y, x);
6. Increment/Decrement etc of a per cpu variable
DEFINE_PER_CPU(int, y);
__get_cpu_var(y)++
Converts to
__this_cpu_inc(y)
Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> [compilation only] Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:22 +0000 (10:01 +1000)]
ntfs: remove NULL value assignments
Static values are automatically initialized to NULL.
Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Samuel Thibault [Wed, 14 May 2014 00:01:22 +0000 (10:01 +1000)]
input: route kbd LEDs through the generic LEDs layer
This permits to reassign keyboard LEDs to something else than keyboard
"leds" state, by adding keyboard led and modifier triggers connected to a
series of VT input LEDs, themselves connected to VT input triggers, which
per-input device LEDs use by default. Userland can thus easily change the
LED behavior of (a priori) all input devices, or of particular input
devices.
This also permits to fix #7063 from userland by using a modifier to
implement proper CapsLock behavior and have the keyboard caps lock led
show that modifier state.
[ebroder@mokafive.com: Rebased to 3.2-rc1 or so, cleaned up some includes, and fixed some constants]
[blogic@openwrt.org: CONFIG_INPUT_LEDS stubs should be static inline]
[akpm@linux-foundation.org: remove unneeded `extern', fix comment layout] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Evan Broder <evan@ebroder.net> Reviewed-by: David Herrmann <dh.herrmann@gmail.com> Tested-by: Pavel Machek <pavel@ucw.cz> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Cc: Pavel Machek <pavel@ucw.cz> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Bryan Wu <cooloney@gmail.com> Cc: Arnaud Patard <arnaud.patard@rtp-net.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Matt Sealey <matt@genesi-usa.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Niels de Vos <devos@fedoraproject.org> Cc: Steev Klimaszewski <steev@genesi-usa.com> Signed-off-by: John Crispin <blogic@openwrt.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Stephen Boyd [Wed, 14 May 2014 00:01:22 +0000 (10:01 +1000)]
sched_clock: document 4Mhz vs 1Mhz decision
Bo Shen sent a patch to change this to 1Mhz instead of 4Mhz but according
to Russell King the use of 4Mhz was intentional. Add a comment to this
effect so that others don't try to change the code as well.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Cc: Bo Shen <voice.shen@atmel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/notify/fanotify/fanotify_user.c: fix FAN_MARK_FLUSH flag checking
If fanotify_mark is called with illegal value of arguments flags and marks
it usually returns EINVAL.
When fanotify_mark is called with FAN_MARK_FLUSH the argument flags is not
checked for irrelevant flags like FAN_MARK_IGNORED_MASK.
The patch removes this inconsistency.
If an irrelevant flag is set error EINVAL is returned.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Acked-by: Michael Kerrisk <mtk.manpages@gmail.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
David Cohen [Wed, 14 May 2014 00:01:20 +0000 (10:01 +1000)]
fs/notify/mark.c: trivial cleanup
Do not initialize private_destroy_list twice. list_replace_init() already
takes care of initializing private_destroy_list. We don't need to
initialize it with LIST_HEAD() beforehand.
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Before the patch,
read creates FAN_ACCESS_PERM and FAN_ACCESS events,
readdir creates only FAN_ACCESS_PERM events.
This is inconsistent.
After the patch,
readdir creates FAN_ACCESS_PERM and FAN_ACCESS events.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fanotify: FAN_MARK_FLUSH: avoid having to provide a fake/invalid fd and path
Originally from Tvrtko Ursulin (https://lkml.org/lkml/2011/1/12/112)
Avoid having to provide a fake/invalid fd and path when flushing marks
Currently for a group to flush marks it has set it needs to provide a fake
or invalid (but resolvable) file descriptor and path when calling
fanotify_mark. This patch pulls the flush handling a bit up so file
descriptor and path are completely ignored when flushing.
I reworked the patch to be applicable again (the signature of fanotify_mark
has changed since Tvrtko's work).
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Tvrtko Ursulin <tvrtko.ursulin@onelan.co.uk> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Wed, 14 May 2014 00:01:19 +0000 (10:01 +1000)]
fs/cifs: remove obsolete __constant
Replace all __constant_foo to foo() except in smb2status.h (1700 lines to
update).
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Steve French <sfrench@samba.org> Cc: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Yinghai Lu [Wed, 14 May 2014 00:01:18 +0000 (10:01 +1000)]
x86, mm: probe memory block size for generic x86 64bit
On system with 2TiB ram, current x86_64 have 128M as section size, and one
memory_block only include one section. So will have 16400 entries under
/sys/devices/system/memory/.
Current code try to use block id to find block pointer in /sys
for any section, and reuse that block pointer. that finding will take some time
even after
that will skip the search in that case during booting up.
So solution could be increase block size just like SGI UV system did.
(harded code to 2g).
This patch is trying to probe the block size to make it match mmio remap
size. for example, Intel Nehalem later system will have memory range [0,
TOML), [4g, TOMH]. If the memory hole is 2g and total is 128g, TOM will
be 2g, and TOM2 will be 130g.
We could use 2g as block size instead of default 128M. That will reduce
number of entries in /sys/devices/system/memory/
On system 6TiB system will reduce boot time by 35 seconds.
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Wed, 14 May 2014 00:01:18 +0000 (10:01 +1000)]
x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels -fix 2
powerpc has NUMA_BALANCING and non-NUMA_BALANCING versions of pte_present
and I missed that when testing cross-compiling. This patch replaces
x86-define-_page_numa-by-reusing-software-bits-on-the-pmd-and-pte-levels-fix.patch
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Wed, 14 May 2014 00:01:17 +0000 (10:01 +1000)]
x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting
faults on x86. Care is taken such that _PAGE_NUMA is used only in
situations where the VMA flags distinguish between NUMA hinting faults and
prot_none faults. This decision was x86-specific and conceptually it is
difficult requiring special casing to distinguish between PROTNONE and
NUMA ptes based on context.
Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
between an entry that is really unmapped and a page that is protected for
NUMA hinting faults as if the PTE is not present then a fault will be
trapped.
Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This
patch shrinks the maximum possible swap size and uses the bit to uniquely
distinguish between NUMA hinting ptes and swap ptes.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mel Gorman [Wed, 14 May 2014 00:01:17 +0000 (10:01 +1000)]
x86: require x86-64 for automatic NUMA balancing
32-bit support for NUMA is an oddity on its own but with automatic NUMA
balancing on top there is a reasonable risk that the CPUPID information
cannot be stored in the page flags. This patch removes support for
automatic NUMA support on 32-bit x86.
Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Johannes Weiner [Wed, 14 May 2014 00:01:17 +0000 (10:01 +1000)]
mm: madvise: fix MADV_WILLNEED on shmem swapouts
MADV_WILLNEED currently does not read swapped out shmem pages back in.
0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix
trees") made find_get_page() filter exceptional radix tree entries but
failed to convert all find_get_page() callers that WANT exceptional
entries over to find_get_entry(). One of them is shmem swap readahead in
madvise, which now skips over any swap-out records.
Convert it to find_get_entry().
Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jens Axboe [Wed, 14 May 2014 00:01:17 +0000 (10:01 +1000)]
mm/filemap.c: avoid always dirtying mapping->flags on O_DIRECT
In some testing I ran today (some fio jobs that spread over two nodes), we
end up spending 40% of the time in filemap_check_errors(). That smells
fishy. Looking further, this is basically what happens:
blkdev_aio_read()
generic_file_aio_read()
filemap_write_and_wait_range()
if (!mapping->nr_pages)
filemap_check_errors()
and filemap_check_errors() always attempts two test_and_clear_bit() on the
mapping flags, thus dirtying it for every single invocation. The patch
below tests each of these bits before clearing them, avoiding this issue.
In my test case (4-socket box), performance went from 1.7M IOPS to 4.0M
IOPS.
Signed-off-by: Jens Axboe <axboe@fb.com> Acked-by: Jeff Moyer <jmoyer@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Chen Yucong [Wed, 14 May 2014 00:01:16 +0000 (10:01 +1000)]
HWPOSION, hugetlb: lock_page/unlock_page does not match for handling a free hugepage
For handling a free hugepage in memory failure, the race will happen if
another thread hwpoisoned this hugepage concurrently. So we need to check
PageHWPoison instead of !PageHWPoison.
If hwpoison_filter(p) returns true or a race happens, then we need to
unlock_page(hpage).
Linus Torvalds [Tue, 13 May 2014 02:28:52 +0000 (11:28 +0900)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix resource leak as well as broken store function in emc1403 driver,
and add support for additional chip revisions"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (emc1403) Support full range of known chip revision numbers
hwmon: (emc1403) Fix resource leak on module unload
hwmon: (emc1403) fix inverted store_hyst()
Linus Torvalds [Tue, 13 May 2014 02:25:56 +0000 (11:25 +0900)]
Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull a percpu fix from Tejun Heo:
"Fix for a percpu allocator bug where it could try to kfree() a memory
region allocated using vmalloc(). The bug has been there for years
now and is unlikely to have ever triggered given the size of struct
pcpu_chunk. It's still theoretically possible and the fix is simple
and safe enough, so the patch is marked with -stable"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()
Linus Torvalds [Tue, 13 May 2014 02:24:07 +0000 (11:24 +0900)]
Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
"Fixes for two bugs in workqueue.
One is exiting with internal mutex held in a failure path of
wq_update_unbound_numa(). The other is a subtle and unlikely
use-after-possible-last-put in the rescuer logic. Both have been
around for quite some time now and are unlikely to have triggered
noticeably often. All patches are marked for -stable backport"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix a possible race condition between rescuer and pwq-release
workqueue: make rescuer_thread() empty wq->maydays list before exiting
workqueue: fix bugs in wq_update_unbound_numa() failure path
Linus Torvalds [Tue, 13 May 2014 02:22:57 +0000 (11:22 +0900)]
Merge branch 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"During recent restructuring, device_cgroup unified config input check
and enforcement logic; unfortunately, it turned out to share too much.
Aristeu's patches fix the breakage and marked for -stable backport.
The other two patches are fallouts from kernfs conversion. The blkcg
change is temporary and will go away once kernfs internal locking gets
simplified (patches pending)"
* 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats()
device_cgroup: check if exception removal is allowed
device_cgroup: fix the comment format for recently added functions
device_cgroup: rework device access check and exception checking
cgroup: fix the retry path of cgroup_mount()
Linus Torvalds [Tue, 13 May 2014 02:21:01 +0000 (11:21 +0900)]
Merge tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix arm64 crash on boot.
- Quiet a noisy arm build warning (virt_to_pfn() redefined).
* tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm64: introduce virt_to_pfn
xen/events/fifo: correctly align bitops
arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h
Linus Torvalds [Tue, 13 May 2014 02:11:48 +0000 (11:11 +0900)]
Merge tag 'md/3.15-fixes' of git://neil.brown.name/md
Pull md bugfixes from Neil Brown:
"Two bugfixes for md in 3.15
Both tagged for -stable"
* tag 'md/3.15-fixes' of git://neil.brown.name/md:
md: avoid possible spinning md thread at shutdown.
md/raid10: call wait_barrier() for each request submitted.
Linus Torvalds [Tue, 13 May 2014 02:07:02 +0000 (11:07 +0900)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Seems like we've had more fixes than usual this release cycle, but
there's nothing in particular that we're doing differently. Perhaps
it's just one of those cycles where more people are finding more
regressions (and/or that the latency of when people actually test
what's been in the tree for a while is catching up so that we get the
bug reports now).
The bigger changes here are are for TI and Marvell platforms:
* Timing changes for GPMC (generic localbus) on OMAP causing some
largeish DTS deltas.
* Fixes to window allocation on PCI for mvebu touching drivers/
stuff. Patches have acks from subsystem maintainers where needed.
* A fix from Thomas for a botched DT conversion in drivers/edma.
There's a handful of other fixes for the above platforms as well as
sunxi, at91, i.MX. I also included a MAINTAINER update for Broadcom,
and a trivial move of a binding doc.
I know you said you'd be offline this week, but I might as well post
it for when you return. :)"
I'm not quite offline yet. Doing a few pulls in the last hour before my
internet goes away..
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (31 commits)
MAINTAINERS: update Broadcom ARM tree location and add an SoC family
ARM: dts: i.MX53: Fix ipu register space size
ARM: dts: kirkwood: fix mislocated pcie-controller nodes
ARM: sunxi: Enable GMAC in sunxi_defconfig
ARM: common: edma: Fix xbar mapping
ARM: sun7i: Fix i2c4 base address
ARM: Kirkwood: T5325: Fix double probe of Codec
ARM: mvebu: enable the SATA interface on Armada 375 DB
ARM: mvebu: specify I2C bus frequency on Armada 370 DB
ARM: mvebu: use qsgmii phy-mode for Armada XP GP interfaces
ARM: mvebu: fix NOR bus-width in Armada XP OpenBlocks AX3 Device Tree
ARM: mvebu: fix NOR bus-width in Armada XP DB Device Tree
ARM: mvebu: fix NOR bus-width in Armada XP GP Device Tree
ARM: dts: AM3517: Disable absent IPs inherited from OMAP3
ARM: dts: OMAP2: Fix interrupts for OMAP2420 mailbox
ARM: dts: OMAP5: Add mailbox dt node to fix boot warning
ARM: OMAP5: Switch to THUMB mode if needed on secondary CPU
ARM: dts: am437x-gp-evm: Do not reset gpio5
ARM: dts: omap3-igep0020: use SMSC9221 timings
PCI: mvebu: split PCIe BARs into multiple MBus windows when needed
...
Josef Gajdusek [Mon, 12 May 2014 11:48:26 +0000 (13:48 +0200)]
hwmon: (emc1403) Support full range of known chip revision numbers
The datasheet for EMC1413/EMC1414, which is fully compatible to
EMC1403/1404 and uses the same chip identification, references revision
numbers 0x01, 0x03, and 0x04. Accept the full range of revision numbers
from 0x01 to 0x04 to make sure none are missed.
Signed-off-by: Josef Gajdusek <atx@atx.name> Cc: stable@vger.kernel.org
[Guenter Roeck: Updated headline and description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Jean Delvare [Mon, 12 May 2014 09:44:51 +0000 (11:44 +0200)]
hwmon: (emc1403) Fix resource leak on module unload
Commit 454aee17f claims to convert driver emc1403 to use
devm_hwmon_device_register_with_groups, however the patch itself makes
use of hwmon_device_register_with_groups instead. As the driver remove
function was still dropped, the hwmon device is no longer unregistered
on driver removal, leading to a resource leak.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Fixes: 454aee17f hwmon: (emc1403) Convert to use devm_hwmon_device_register_with_groups Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org [3.13+] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Josef Gajdusek [Sun, 11 May 2014 12:40:44 +0000 (14:40 +0200)]
hwmon: (emc1403) fix inverted store_hyst()
Attempts to set the hysteresis value to a temperature below the target
limit fails with "write error: Numerical result out of range" due to
an inverted comparison.
Signed-off-by: Josef Gajdusek <atx@atx.name> Reviewed-by: Jean Delvare <jdelvare@suse.de> Cc: stable@vger.kernel.org
[Guenter Roeck: Updated headline and description] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
virt_to_pfn has been defined in arch/arm/include/asm/memory.h by commit e26a9e0 "ARM: Better virt_to_page() handling" and Xen has come to rely
on it. Introduce virt_to_pfn on arm64 too.
Linus Torvalds [Sun, 11 May 2014 09:06:13 +0000 (18:06 +0900)]
Merge branch 'for-3.15' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields.
* 'for-3.15' of git://linux-nfs.org/~bfields/linux:
NFSD: Call ->set_acl with a NULL ACL structure if no entries
NFSd: call rpc_destroy_wait_queue() from free_client()
NFSd: Move default initialisers from create_client() to alloc_client()
Linus Torvalds [Sun, 11 May 2014 08:56:53 +0000 (17:56 +0900)]
Merge branch 'akpm' (incoming from Andrew)
Merge misc fixes from Andrew Morton:
"4 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, thp: close race between mremap() and split_huge_page()
mm: postpone the disabling of kmemleak early logging
MAINTAINERS: update maintainership of LTP
drivers/rtc/rtc-hym8563.c: set uie_unsupported
mm, thp: close race between mremap() and split_huge_page()
It's critical for split_huge_page() (and migration) to catch and freeze
all PMDs on rmap walk. It gets tricky if there's concurrent fork() or
mremap() since usually we copy/move page table entries on dup_mm() or
move_page_tables() without rmap lock taken. To get it work we rely on
rmap walk order to not miss any entry. We expect to see destination VMA
after source one to work correctly.
But after switching rmap implementation to interval tree it's not always
possible to preserve expected walk order.
It works fine for dup_mm() since new VMA has the same vma_start_pgoff()
/ vma_last_pgoff() and explicitly insert dst VMA after src one with
vma_interval_tree_insert_after().
But on move_vma() destination VMA can be merged into adjacent one and as
result shifted left in interval tree. Fortunately, we can detect the
situation and prevent race with rmap walk by moving page table entries
under rmap lock. See commit 38a76013ad80.
Problem is that we miss the lock when we move transhuge PMD. Most
likely this bug caused the crash[1].
Fixes: 108d6642ad81 ("mm anon rmap: remove anon_vma_moveto_tail") Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Acked-by: Michel Lespinasse <walken@google.com> Cc: Dave Jones <davej@redhat.com> Cc: David Miller <davem@davemloft.net> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.7+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Catalin Marinas [Fri, 9 May 2014 22:36:59 +0000 (15:36 -0700)]
mm: postpone the disabling of kmemleak early logging
Commit 8910ae896c8c ("kmemleak: change some global variables to int"),
in addition to the atomic -> int conversion, moved the disabling of
kmemleak_early_log to the beginning of the kmemleak_init() function,
before the full kmemleak tracing is actually enabled. In this small
window, kmem_cache_create() is called by kmemleak which triggers
additional memory allocation that are not traced. This patch restores
the original logic with kmemleak_early_log disabling when kmemleak is
fully functional.
Fixes: 8910ae896c8c (kmemleak: change some global variables to int) Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Stuebner [Fri, 9 May 2014 22:36:57 +0000 (15:36 -0700)]
drivers/rtc/rtc-hym8563.c: set uie_unsupported
The alarm of the hym8563 only supports a minute accuracy, while the uie
wants an alarm one second in the future. Therefore things like the
select() syscall will fail with a timeout, because the next alarm will
happen in a worst case of 60 seconds.
Olof Johansson [Sun, 11 May 2014 03:25:07 +0000 (20:25 -0700)]
Merge tag 'sunxi-fixes-for-3.15' of https://github.com/mripard/linux into fixes
Merge 'Allwinner fixes for 3.15' from Maxime Ripard:
Set of fixes for the Allwinner support for 3.15
Some minor things, the major thing being the enabling of the GMAC driver in
sunxi_defconfig that will un-break Olof's autobooters.
* tag 'sunxi-fixes-for-3.15' of https://github.com/mripard/linux:
ARM: sunxi: Enable GMAC in sunxi_defconfig
ARM: sun7i: Fix i2c4 base address
ARM: sun7i: fix PLL4 clock and add PLL8
Linus Torvalds [Fri, 9 May 2014 19:24:20 +0000 (12:24 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"A somewhat unpleasantly large collection of small fixes. The big ones
are the __visible tree sweep and a fix for 'earlyprintk=efi,keep'. It
was using __init functions with predictably suboptimal results.
Another key fix is a build fix which would produce output that simply
would not decompress correctly in some configuration, due to the
existing Makefiles picking up an unfortunate local label and mistaking
it for the global symbol _end.
Additional fixes include the handling of 64-bit numbers when setting
the vdso data page (a latent bug which became manifest when i386
started exporting a vdso with time functions), a fix to the new MSR
manipulation accessors which would cause features to not get properly
unblocked, a build fix for 32-bit userland, and a few new platform
quirks"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
x86: Fix typo preventing msr_set/clear_bit from having an effect
x86/intel: Add quirk to disable HPET for the Baytrail platform
x86/hpet: Make boot_hpet_disable extern
x86-64, build: Fix stack protector Makefile breakage with 32-bit userland
x86/reboot: Add reboot quirk for Certec BPC600
asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*
asmlinkage, x86: Add explicit __visible to arch/x86/*
asmlinkage: Revert "lto: Make asmlinkage __visible"
x86, build: Don't get confused by local symbols
x86/efi: earlyprintk=efi,keep fix
Boris Ostrovsky [Fri, 9 May 2014 15:11:27 +0000 (11:11 -0400)]
x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)
So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.
Andres Freund [Fri, 9 May 2014 01:29:16 +0000 (03:29 +0200)]
x86: Fix typo preventing msr_set/clear_bit from having an effect
Due to a typo the msr accessor function introduced in 22085a66c2fab6cf9b9393c056a3600a6b4735de didn't have any lasting
effects because they accidentally wrote the old value back.
After c0a639ad0bc6b178b46996bd1f821a04643e2bde this at the very least
this causes cpuid limits not to be lifted on some cpus leading to
missing capabilities for those.
Jeff Layton [Fri, 9 May 2014 15:41:54 +0000 (11:41 -0400)]
locks: only validate the lock vs. f_mode in F_SETLK codepaths
v2: replace missing break in switch statement (as pointed out by Dave
Jones)
commit bce7560d4946 (locks: consolidate checks for compatible
filp->f_mode values in setlk handlers) introduced a regression in the
F_GETLK handler.
flock64_to_posix_lock is a shared codepath between F_GETLK and F_SETLK,
but the f_mode checks should only be applicable to the F_SETLK codepaths
according to POSIX.
Instead of just reverting the patch, add a new function to do this
checking and have the F_SETLK handlers call it.
Cc: Dave Jones <davej@redhat.com> Reported-and-Tested-by: Reuben Farrelly <reuben@reub.net> Signed-off-by: Jeff Layton <jlayton@poochiereds.net>
Linus Torvalds [Fri, 9 May 2014 02:20:45 +0000 (19:20 -0700)]
Merge tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Dave Chinner:
"The main fix is adding support for default ACLs on O_TMPFILE opened
inodes to bring XFS into line with other filesystems. Metadata CRCs
are now also considered well enough tested to be fully supported, so
we're removing the shouty warnings issued at mount time for
filesystems with that format. And there's transaction block
reservation overrun fix.
Summary:
- fix a remote attribute size calculation bug that leads to a
transaction overrun
- add default ACLs to O_TMPFILE files
- Remove the EXPERIMENTAL tag from filesystems with metadata CRC
support"
* tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute overwrite causes transaction overrun
xfs: initialize default acls for ->tmpfile()
xfs: fully support v5 format filesystems
Linus Torvalds [Thu, 8 May 2014 21:17:13 +0000 (14:17 -0700)]
Merge tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This contains two fixes.
The first is a long standing bug that causes bogus data to show up in
the refcnt field of the module_refcnt tracepoint. It was introduced
by a merge conflict resolution back in 2.6.35-rc days.
The result should be 'refcnt = incs - decs', but instead it did
'refcnt = incs + decs'.
The second fix is to a bug that was introduced in this merge window
that allowed for a tracepoint funcs pointer to be used after it was
freed. Moving the location of where the probes are released solved
the problem"
* tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix use of tracepoint funcs after rcu free
trace: module: Maintain a valid user count
Linus Torvalds [Thu, 8 May 2014 21:06:45 +0000 (14:06 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem fixes from Dmitry Torokhov:
"Just a few fixups to various drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - fix touchpad initialization on Gigabyte U2442
Input: tca8418 - fix loading this driver as a module from a device tree
Input: bma150 - extend chip detection for bma180
Input: atkbd - fix keyboard not working on some LG laptops
Input: synaptics - add min/max quirk for ThinkPad Edge E431
Linus Torvalds [Thu, 8 May 2014 20:51:53 +0000 (13:51 -0700)]
Merge tag 'sound-3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A bunch of small fixes for USB-audio and HD-audio, where most of them
are for regressions: USB-audio PM fixes, ratelimit annoyance fix, HDMI
offline state fix, and a couple of device-specific quirks"
* tag 'sound-3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - hdmi: Set converter channel count even without sink
ALSA: usb-audio: work around corrupted TEAC UD-H01 feedback data
ALSA: usb-audio: Fix deadlocks at resuming
ALSA: usb-audio: Save mixer status only once at suspend
ALSA: usb-audio: Prevent printk ratelimiting from spamming kernel log while DEBUG not defined
ALSA: hda - add headset mic detect quirk for a Dell laptop
Linus Torvalds [Thu, 8 May 2014 19:41:14 +0000 (12:41 -0700)]
Merge tag 'mfd-mmc-fixes-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull mmc/rtsx revert from Lee Jones.
* tag 'mfd-mmc-fixes-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mmc: rtsx: Revert "mmc: rtsx: add support for pre_req and post_req"
tracepoint: Fix use of tracepoint funcs after rcu free
Commit de7b2973903c "tracepoint: Use struct pointer instead of name hash
for reg/unreg tracepoints" introduces a use after free by calling
release_probes on the old struct tracepoint array before the newly
allocated array is published with rcu_assign_pointer. There is a race
window where tracepoints (RCU readers) can perform a
"use-after-grace-period-after-free", which shows up as a GPF in
stress-tests.
Romain Izard [Tue, 4 Mar 2014 09:09:39 +0000 (10:09 +0100)]
trace: module: Maintain a valid user count
The replacement of the 'count' variable by two variables 'incs' and
'decs' to resolve some race conditions during module unloading was done
in parallel with some cleanup in the trace subsystem, and was integrated
as a merge.
Unfortunately, the formula for this replacement was wrong in the tracing
code, and the refcount in the traces was not usable as a result.
Use 'count = incs - decs' to compute the user count.