Linus Torvalds [Thu, 17 Oct 2013 04:36:03 +0000 (21:36 -0700)]
Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (21 commits)
mm: revert mremap pud_free anti-fix
mm: fix BUG in __split_huge_page_pmd
swap: fix set_blocksize race during swapon/swapoff
procfs: call default get_unmapped_area on MMU-present architectures
procfs: fix unintended truncation of returned mapped address
writeback: fix negative bdi max pause
percpu_refcount: export symbols
fs: buffer: move allocation failure loop into the allocator
mm: memcg: handle non-error OOM situations more gracefully
tools/testing/selftests: fix uninitialized variable
block/partitions/efi.c: treat size mismatch as a warning, not an error
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
mm/zswap: bugfix: memory leak when re-swapon
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
mm: migration: do not lose soft dirty bit if page is in migration state
gcov: MAINTAINERS: Add an entry for gcov
mm/hugetlb.c: correct missing private flag clearing
mm/vmscan.c: don't forget to free shrinker->nr_deferred
ipc/sem.c: synchronize semop and semctl with IPC_RMID
ipc: update locking scheme comments
...
Hugh Dickins [Wed, 16 Oct 2013 20:47:09 +0000 (13:47 -0700)]
mm: revert mremap pud_free anti-fix
Revert commit 1ecfd533f4c5 ("mm/mremap.c: call pud_free() after fail
calling pmd_alloc()").
The original code was correct: pud_alloc(), pmd_alloc(), pte_alloc_map()
ensure that the pud, pmd, pt is already allocated, and seldom do they
need to allocate; on failure, upper levels are freed if appropriate by
the subsequent do_munmap(). Whereas commit 1ecfd533f4c5 did an
unconditional pud_free() of a most-likely still-in-use pud: saved only
by the near-impossiblity of pmd_alloc() failing.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chen Gang <gang.chen@asianux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 16 Oct 2013 20:47:08 +0000 (13:47 -0700)]
mm: fix BUG in __split_huge_page_pmd
Occasionally we hit the BUG_ON(pmd_trans_huge(*pmd)) at the end of
__split_huge_page_pmd(): seen when doing madvise(,,MADV_DONTNEED).
It's invalid: we don't always have down_write of mmap_sem there: a racing
do_huge_pmd_wp_page() might have copied-on-write to another huge page
before our split_huge_page() got the anon_vma lock.
Forget the BUG_ON, just go back and try again if this happens.
Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap: fix set_blocksize race during swapon/swapoff
Fix race between swapoff and swapon. Swapoff used old_block_size from
swap_info outside of swapon_mutex so it could be overwritten by
concurrent swapon.
The race has visible effect only if more than one swap block device
exists with different block sizes (e.g. /dev/sda1 with block size 4096
and /dev/sdb1 with 512). In such case it leads to setting the blocksize
of swapped off device with wrong blocksize.
The bug can be triggered with multiple concurrent swapoff and swapon:
0. Swap for some device is on.
1. swapoff:
First the swapoff is called on this device and "struct swap_info_struct
*p" is assigned. This is done under swap_lock however this lock is
released for the call try_to_unuse().
2. swapon:
After the assignment above (and before acquiring swapon_mutex &
swap_lock by swapoff) the swapon is called on the same device.
The p->old_block_size is assigned to the value of block_size the device.
This block size should be the same as previous but sometimes it is not.
The swapon ends successfully.
3. swapoff:
Swapoff resumes, grabs the locks and mutex and continues to disable this
swap device. Now it sets the block size to value taken from swap_info
which was overwritten by swapon in 2.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Reported-by: Weijie Yang <weijie.yang.kh@gmail.com> Cc: Bob Liu <bob.liu@oracle.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Shaohua Li <shli@fusionio.com> Cc: Minchan Kim <minchan@kernel.org> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:05 +0000 (13:47 -0700)]
procfs: call default get_unmapped_area on MMU-present architectures
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.
To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file
operation in the procfs file, is not defined.
Reported-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
HATAYAMA Daisuke [Wed, 16 Oct 2013 20:47:04 +0000 (13:47 -0700)]
procfs: fix unintended truncation of returned mapped address
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64. This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.
Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
proc file mmap(2)").
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David S. Miller <davem@davemloft.net> Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since pause is bounded by [min_pause, max_pause] where min_pause is also
bounded by max_pause. It's suspected and demonstrated that the
max_pause calculation goes wrong:
The problem lies in the two "long = unsigned long" assignments in
bdi_max_pause() which might go negative if the highest bit is 1, and the
min_t(long, ...) check failed to protect it falling under 0. Fix all of
them by using "unsigned long" throughout the function.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Richard Weinberger <richard@nod.at> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:47:00 +0000 (13:47 -0700)]
fs: buffer: move allocation failure loop into the allocator
Buffer allocation has a very crude indefinite loop around waking the
flusher threads and performing global NOFS direct reclaim because it can
not handle allocation failures.
The most immediate problem with this is that the allocation may fail due
to a memory cgroup limit, where flushers + direct reclaim might not make
any progress towards resolving the situation at all. Because unlike the
global case, a memory cgroup may not have any cache at all, only
anonymous pages but no swap. This situation will lead to a reclaim
livelock with insane IO from waking the flushers and thrashing unrelated
filesystem cache in a tight loop.
Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
any looping happens in the page allocator, which knows how to
orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
allows memory cgroups to detect allocations that can't handle failure
and will allow them to ultimately bypass the limit if reclaim can not
make progress.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Wed, 16 Oct 2013 20:46:59 +0000 (13:46 -0700)]
mm: memcg: handle non-error OOM situations more gracefully
Commit 3812c8c8f395 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead. But there are many more and it's impractical to annotate
them all.
First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well. This simplifies the code quite a bit for
added bonus.
Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts. If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: azurIt <azurit@pobox.sk> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doug Anderson [Wed, 16 Oct 2013 20:46:57 +0000 (13:46 -0700)]
block/partitions/efi.c: treat size mismatch as a warning, not an error
In commit 27a7c642174e ("partitions/efi: account for pmbr size in lba")
we started treating bad sizes in lba field of the partition that has the
0xEE (GPT protective) as errors.
However, we may run into these "bad sizes" in the real world if someone
uses dd to copy an image from a smaller disk to a bigger disk. Since
this case used to work (even without using force_gpt), keep it working
and treat the size mismatch as a warning instead of an error.
Reported-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Sean Paul <seanpaul@chromium.org> Signed-off-by: Doug Anderson <dianders@chromium.org> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Tested-by: Artem Bityutskiy <dedekind1@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Wed, 16 Oct 2013 20:46:56 +0000 (13:46 -0700)]
mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
Commit 11feeb498086 ("kvm: optimize away THP checks in
kvm_is_mmio_pfn()") introduced a memory leak when KVM is run on gigantic
compound pages.
That commit depends on the assumption that PG_reserved is identical for
all head and tail pages of a compound page. So that if get_user_pages
returns a tail page, we don't need to check the head page in order to
know if we deal with a reserved page that requires different
refcounting.
The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic hugepages
allocated through bootmem don't clear the PG_reserved on the tail pages
(the clearing of PG_reserved is done later only if the gigantic hugepage
is freed).
This patch corrects the gigantic compound page initialization so that we
can retain the optimization in 11feeb498086. The cacheline was already
modified in order to set PG_tail so this won't affect the boot time of
large memory systems.
[akpm@linux-foundation.org: tweak comment layout and grammar] Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: andy123 <ajs124.ajs124@gmail.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Acked-by: Rafael Aquini <aquini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Weijie Yang [Wed, 16 Oct 2013 20:46:54 +0000 (13:46 -0700)]
mm/zswap: bugfix: memory leak when re-swapon
zswap_tree is not freed when swapoff, and it got re-kmalloced in swapon,
so a memory leak occurs.
Free the memory of zswap_tree in zswap_frontswap_invalidate_area().
Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Reviewed-by: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org>
From: Weijie Yang <weijie.yang@samsung.com>
Subject: mm/zswap: bugfix: memory leak when invalidate and reclaim occur concurrently
Consider the following scenario:
thread 0: reclaim entry x (get refcount, but not call zswap_get_swap_cache_page)
thread 1: call zswap_frontswap_invalidate_page to invalidate entry x.
finished, entry x and its zbud is not freed as its refcount != 0
now, the swap_map[x] = 0
thread 0: now call zswap_get_swap_cache_page
swapcache_prepare return -ENOENT because entry x is not used any more
zswap_get_swap_cache_page return ZSWAP_SWAPCACHE_NOMEM
zswap_writeback_entry do nothing except put refcount
Now, the memory of zswap_entry x and its zpage leak.
Modify:
- check the refcount in fail path, free memory if it is not referenced.
- use ZSWAP_SWAPCACHE_FAIL instead of ZSWAP_SWAPCACHE_NOMEM as the fail path
can be not only caused by nomem but also by invalidate.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Acked-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:53 +0000 (13:46 -0700)]
mm: /proc/pid/pagemap: inspect _PAGE_SOFT_DIRTY only on present pages
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 16 Oct 2013 20:46:51 +0000 (13:46 -0700)]
mm: migration: do not lose soft dirty bit if page is in migration state
If page migration is turned on in config and the page is migrating, we
may lose the soft dirty bit. If fork and mprotect are called on
migrating pages (once migration is complete) pages do not obtain the
soft dirty bit in the correspond pte entries. Fix it adding an
appropriate test on swap entries.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Mackall <mpm@selenic.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Wed, 16 Oct 2013 20:46:48 +0000 (13:46 -0700)]
mm/hugetlb.c: correct missing private flag clearing
We should clear the page's private flag when returing the page to the
hugepage pool. Otherwise, marked hugepage can be allocated to the user
who tries to allocate the non-reserved hugepage. If this user fail to
map this hugepage, he would try to return the page to the hugepage pool.
Since this page has a private flag, resv_huge_pages would mistakenly
increase. This patch fixes this situation.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Davidlohr Bueso <davidlohr.bueso@hp.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Manfred Spraul [Wed, 16 Oct 2013 20:46:45 +0000 (13:46 -0700)]
ipc/sem.c: synchronize semop and semctl with IPC_RMID
After acquiring the semlock spinlock, operations must test that the
array is still valid.
- semctl() and exit_sem() would walk stale linked lists (ugly, but
should be ok: all lists are empty)
- semtimedop() would sleep forever - and if woken up due to a signal -
access memory after free.
The patch also:
- standardizes the tests for .deleted, so that all tests in one
function leave the function with the same approach.
- unconditionally tests for .deleted immediately after every call to
sem_lock - even it it means that for semctl(GETALL), .deleted will be
tested twice.
Both changes make the review simpler: After every sem_lock, there must
be a test of .deleted, followed by a goto to the cleanup code (if the
function uses "goto cleanup").
The only exception is semctl_down(): If sem_ids().rwsem is locked, then
the presence in ids->ipcs_idr is equivalent to !.deleted, thus no
additional test is required.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Mike Galbraith <efault@gmx.de> Acked-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Wed, 16 Oct 2013 20:46:45 +0000 (13:46 -0700)]
ipc: update locking scheme comments
The initial documentation was a bit incomplete, update accordingly.
[akpm@linux-foundation.org: make it more readable in 80 columns] Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 16 Oct 2013 20:46:43 +0000 (13:46 -0700)]
mm, memcg: protect mem_cgroup_read_events for cpu hotplug
for_each_online_cpu() needs the protection of {get,put}_online_cpus() so
cpu_online_mask doesn't change during the iteration.
cpu_hotplug.lock is held while a cpu is going down, it's a coarse lock
that is used kernel-wide to synchronize cpu hotplug activity. Memcg has
a cpu hotplug notifier, called while there may not be any cpu hotplug
refcounts, which drains per-cpu event counts to memcg->nocpu_base.events
to maintain a cumulative event count as cpus disappear. Without
get_online_cpus() in mem_cgroup_read_events(), it's possible to account
for the event count on a dying cpu twice, and this value may be
significantly large.
In fact, all memcg->pcp_counter_lock use should be nested by
{get,put}_online_cpus().
This fixes that issue and ensures the reported statistics are not vastly
over-reported during cpu hotplug.
Signed-off-by: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
driver core: Release device_hotplug_lock when store_mem_state returns EINVAL
When inserting a wrong value to /sys/devices/system/memory/memoryX/state file,
following messages are shown. And device_hotplug_lock is never released.
================================================
[ BUG: lock held when returning to user space! ]
3.12.0-rc4-debug+ #3 Tainted: G W
------------------------------------------------
bash/6442 is leaving the kernel with locks still held!
1 lock held by bash/6442:
#0: (device_hotplug_lock){+.+.+.}, at: [<ffffffff8146cbb5>] lock_device_hotplug_sysfs+0x15/0x50
This issue was introdued by commit fa2be40 (drivers: base: use standard
device online/offline for state change).
This patch releases device_hotplug_lcok when store_mem_state returns EINVAL.
Linus Torvalds [Thu, 17 Oct 2013 00:16:57 +0000 (17:16 -0700)]
Merge tag 'dm-3.12-fix-cve' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper fix from Alasdair Kergon:
"A patch to avoid data corruption in a device-mapper snapshot.
This is primarily a data corruption bug that all users of
device-mapper snapshots will want to fix. The CVE is due to a data
leak under specific circumstances if, for example, the snapshot is
presented to a virtual machine: a block written as data inside the VM
can get interpreted incorrectly on the host outside the VM as
metadata, causing the host to provide the VM with access to blocks it
would not otherwise see. This is likely to affect few, if any,
people"
* tag 'dm-3.12-fix-cve' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm snapshot: fix data corruption
Linus Torvalds [Thu, 17 Oct 2013 00:15:57 +0000 (17:15 -0700)]
Merge tag 'gpio-v3.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull gpio fixes from Linus Walleij:
"Three GPIO fixes for the v3.12 series:
- A fix to the Lynxpoint IRQ handler
- Two late fixes to fallout from the gpiod refactoring"
* tag 'gpio-v3.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib: let gpiod_request() return -EPROBE_DEFER
gpiolib: safer implementation of desc_to_gpio()
gpio/lynxpoint: check if the interrupt is enabled in IRQ handler
Matthew Dawson [Wed, 16 Oct 2013 04:11:24 +0000 (00:11 -0400)]
usb: misc: usb3503: Fix compile error due to incorrect regmap depedency
The USB3503 driver had an incorrect depedency on REGMAP, instead of
REGMAP_I2C. This caused the build to fail since the necessary regmap
i2c pieces were not available.
Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
usb/chipidea: fix oops on memory allocation failure
When CMA fails to initialize in v3.12-rc4, the chipidea driver oopses
the kernel while trying to remove and put the HCD which doesn't exist:
WARNING: CPU: 0 PID: 6 at /home/rmk/git/linux-rmk/arch/arm/mm/dma-mapping.c:511
__dma_alloc+0x200/0x240()
coherent pool not initialised!
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/u2:0 Tainted: G W 3.12.0-rc4+ #56
Workqueue: deferwq deferred_probe_work_func
Backtrace:
[<c001218c>] (dump_backtrace+0x0/0x10c) from [<c0012328>] (show_stack+0x18/0x1c)
r6:c05fd9cc r5:000001ff r4:00000000 r3:df86ad00
[<c0012310>] (show_stack+0x0/0x1c) from [<c05f3a4c>] (dump_stack+0x70/0x8c)
[<c05f39dc>] (dump_stack+0x0/0x8c) from [<c00230a8>] (warn_slowpath_common+0x6c/0x8c)
r4:df883a60 r3:df86ad00
[<c002303c>] (warn_slowpath_common+0x0/0x8c) from [<c002316c>] (warn_slowpath_fmt+0x38/0x40)
r8:ffffffff r7:00001000 r6:c083b808 r5:00000000 r4:df2efe80
[<c0023134>] (warn_slowpath_fmt+0x0/0x40) from [<c00196bc>] (__dma_alloc+0x200/0x240)
r3:00000000 r2:c05fda00
[<c00194bc>] (__dma_alloc+0x0/0x240) from [<c001982c>] (arm_dma_alloc+0x88/0xa0)
[<c00197a4>] (arm_dma_alloc+0x0/0xa0) from [<c03e2904>] (ehci_setup+0x1f4/0x438)
[<c03e2710>] (ehci_setup+0x0/0x438) from [<c03cbd60>] (usb_add_hcd+0x18c/0x664)
[<c03cbbd4>] (usb_add_hcd+0x0/0x664) from [<c03e89f4>] (host_start+0xf0/0x180)
[<c03e8904>] (host_start+0x0/0x180) from [<c03e7c34>] (ci_hdrc_probe+0x360/0x670
)
r6:df2ef410 r5:00000000 r4:df2c3010 r3:c03e8904
[<c03e78d4>] (ci_hdrc_probe+0x0/0x670) from [<c0311044>] (platform_drv_probe+0x20/0x24)
[<c0311024>] (platform_drv_probe+0x0/0x24) from [<c030fcac>] (driver_probe_device+0x9c/0x234)
...
---[ end trace c88ccaf3969e8422 ]---
Unable to handle kernel NULL pointer dereference at virtual address 00000028
pgd = c0004000
[00000028] *pgd=00000000
Internal error: Oops: 17 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/u2:0 Tainted: G W 3.12.0-rc4+ #56
Workqueue: deferwq deferred_probe_work_func
task: df86ad00 ti: df882000 task.ti: df882000
PC is at usb_remove_hcd+0x10/0x150
LR is at host_stop+0x1c/0x3c
pc : [<c03cacec>] lr : [<c03e88e4>] psr: 60000013
sp : df883b50 ip : df883b78 fp : df883b74
r10: c11f4c54 r9 : c0836450 r8 : df30c400
r7 : fffffff4 r6 : df2ef410 r5 : 00000000 r4 : df2c3010
r3 : 00000000 r2 : 00000000 r1 : df86b0a0 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c53c7d Table: 2f29404a DAC: 00000015
Process kworker/u2:0 (pid: 6, stack limit = 0xdf882240)
Stack: (0xdf883b50 to 0xdf884000)
...
Backtrace:
[<c03cacdc>] (usb_remove_hcd+0x0/0x150) from [<c03e88e4>] (host_stop+0x1c/0x3c)
r6:df2ef410 r5:00000000 r4:df2c3010
[<c03e88c8>] (host_stop+0x0/0x3c) from [<c03e8aa0>] (ci_hdrc_host_destroy+0x1c/0x20)
r5:00000000 r4:df2c3010
[<c03e8a84>] (ci_hdrc_host_destroy+0x0/0x20) from [<c03e7c80>] (ci_hdrc_probe+0x3ac/0x670)
[<c03e78d4>] (ci_hdrc_probe+0x0/0x670) from [<c0311044>] (platform_drv_probe+0x20/0x24)
[<c0311024>] (platform_drv_probe+0x0/0x24) from [<c030fcac>] (driver_probe_device+0x9c/0x234)
[<c030fc10>] (driver_probe_device+0x0/0x234) from [<c030ff28>] (__device_attach+0x44/0x48)
...
---[ end trace c88ccaf3969e8423 ]---
Fix this so at least we can continue booting and get to a shell prompt.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oliver Neukum [Mon, 14 Oct 2013 13:24:55 +0000 (15:24 +0200)]
usb-storage: add quirk for mandatory READ_CAPACITY_16
Some USB drive enclosures do not correctly report an
overflow condition if they hold a drive with a capacity
over 2TB and are confronted with a READ_CAPACITY_10.
They answer with their capacity modulo 2TB.
The generic layer cannot cope with that. It must be told
to use READ_CAPACITY_16 from the beginning.
Roel Kluin [Mon, 14 Oct 2013 21:21:15 +0000 (23:21 +0200)]
serial: vt8500: add missing braces
Due to missing braces on an if statement, in presence of a device_node a
port was always assigned -1, regardless of any alias entries in the
device tree. Conversely, if device_node was NULL, an unitialized port
ended up being used.
This patch adds the missing braces, fixing the issues.
Mikulas Patocka [Wed, 16 Oct 2013 02:17:47 +0000 (03:17 +0100)]
dm snapshot: fix data corruption
This patch fixes a particular type of data corruption that has been
encountered when loading a snapshot's metadata from disk.
When we allocate a new chunk in persistent_prepare, we increment
ps->next_free and we make sure that it doesn't point to a metadata area
by further incrementing it if necessary.
When we load metadata from disk on device activation, ps->next_free is
positioned after the last used data chunk. However, if this last used
data chunk is followed by a metadata area, ps->next_free is positioned
erroneously to the metadata area. A newly-allocated chunk is placed at
the same location as the metadata area, resulting in data or metadata
corruption.
This patch changes the code so that ps->next_free skips the metadata
area when metadata are loaded in function read_exceptions.
The patch also moves a piece of code from persistent_prepare_exception
to a separate function skip_metadata to avoid code duplication.
CVE-2013-4299
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Linus Torvalds [Wed, 16 Oct 2013 00:14:13 +0000 (17:14 -0700)]
Merge tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux
Pull device tree fixes and reverts from Grant Likely:
"One bug fix and three reverts. The reverts back out the slightly
controversial feeding the entire device tree into the random pool and
the reserved-memory binding which isn't fully baked yet. Expect the
reserved-memory patches at least to resurface for v3.13.
The bug fixes removes a scary but harmless warning on SPARC that was
introduced in the v3.12 merge window. v3.13 will contain a proper fix
that makes the new code work on SPARC.
On the plus side, the diffstat looks *awesome*. I love removing lines
of code"
* tag 'devicetree-for-linus' of git://git.secretlab.ca/git/linux:
Revert "drivers: of: add initialization code for dma reserved memory"
Revert "ARM: init: add support for reserved memory defined by device tree"
Revert "of: Feed entire flattened device tree into the random pool"
of: fix unnecessary warning on missing /cpus node
Linus Torvalds [Tue, 15 Oct 2013 23:22:11 +0000 (16:22 -0700)]
Merge tag 'stable/for-linus-3.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from Stefano Stabellini:
"A small fix for Xen on x86_32 and a build fix for xen-tpmfront on
arm64"
* tag 'stable/for-linus-3.12-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Fix possible user space selector corruption
tpm: xen-tpmfront: fix missing declaration of xen_domain
Russell King [Tue, 15 Oct 2013 23:09:02 +0000 (00:09 +0100)]
ARM: sa11x0/assabet: ensure CS2 is configured appropriately
The CS2 region contains the Assabet board configuration and status
registers, which are 32-bit. Unfortunately, some boot loaders do not
configure this region correctly, leaving it setup as a 16-bit region.
Fix this.
Cc: <stable@vger.kernel.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Raghavendra K T [Wed, 9 Oct 2013 09:03:21 +0000 (14:33 +0530)]
KVM: Enable pvspinlock after jump_label_init() to avoid VM hang
We use jump label to enable pv-spinlock. With the changes in (442e0973e927
Merge branch 'x86/jumplabel'), the jump label behaviour has changed
that would result in eventual hang of the VM since we would end up in a
situation where slow path locks would halt the vcpus but we will not be
able to wakeup the vcpu by lock releaser using unlock kick.
Similar problem in Xen and more detailed description is available in a945928ea270 (xen: Do not enable spinlocks before jump_label_init()
has executed)
This patch splits kvm_spinlock_init to separate jump label changes with
pvops patching and also make jump label enabling after jump_label_init().
Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Gleb Natapov <gleb@redhat.com>
The pcm_usb_stream plugin requires the mremap explicitly for the read
buffer, as it expands itself once after reading the required size.
But the commit [314e51b9: mm: kill vma flag VM_RESERVED and
mm->reserved_vm counter] converted blindly to a combination of
VM_DONTEXPAND | VM_DONTDUMP like other normal drivers, and this
resulted in the failure of mremap().
For fixing this regression, we need to remove VM_DONTEXPAND for the
read-buffer mmap.
Reported-and-tested-by: James Miller <jamesstewartmiller@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
Marek Szyprowski [Fri, 11 Oct 2013 07:27:28 +0000 (09:27 +0200)]
Revert "drivers: of: add initialization code for dma reserved memory"
This reverts commit 9d8eab7af79cb4ce2de5de39f82c455b1f796963. There is
still no consensus on the bindings for the reserved memory and various
drawbacks of the proposed solution has been shown, so the best now is to
revert it completely and start again from scratch later.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>
Marek Szyprowski [Fri, 11 Oct 2013 07:27:27 +0000 (09:27 +0200)]
Revert "ARM: init: add support for reserved memory defined by device tree"
This reverts commit 10bcdfb8ba24760f715f0a700c3812747eddddf5. There is
no consensus on the bindings for the reserved memory, so the code for
handing it will be reverted.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Grant Likely <grant.likely@linaro.org>
Linus Torvalds [Tue, 15 Oct 2013 00:43:33 +0000 (17:43 -0700)]
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband updates from Roland Dreier:
"Last batch of IB changes for 3.12: many mlx5 hardware driver fixes
plus one trivial semicolon cleanup"
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
IB: Remove unnecessary semicolons
IB/mlx5: Ensure proper synchronization accessing memory
IB/mlx5: Fix alignment of reg umr gather buffers
IB/mlx5: Fix eq names to display nicely in /proc/interrupts
mlx5: Fix error code translation from firmware to driver
IB/mlx5: Fix opt param mask according to firmware spec
mlx5: Fix opt param mask for sq err to rts transition
IB/mlx5: Disable atomic operations
mlx5: Fix layout of struct mlx5_init_seg
mlx5: Keep polling to reclaim pages while any returned
IB/mlx5: Avoid async events on invalid port number
IB/mlx5: Decrease memory consumption of mr caches
mlx5: Remove checksum on command interface commands
IB/mlx5: Fix memory leak in mlx5_ib_create_srq
IB/mlx5: Flush cache workqueue before destroying it
IB/mlx5: Fix send work queue size calculation
So revert this patch, as Felipe says:
It's unfortunate that tusb6010 is so messed up
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Acked-by: Felipe Balbi <balbi@ti.com> Cc: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Avinash Patil [Sat, 12 Oct 2013 01:31:32 +0000 (18:31 -0700)]
mwifiex: inform cfg80211 about disconnect for P2P client interface
This patch adds missing cfg80211_disconnected event for P2P client
interface upon successful deauthenticate command, deauthenticate
event or disassociate event from FW.
Signed-off-by: Avinash Patil <patila@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Avinash Patil [Sat, 12 Oct 2013 01:31:31 +0000 (18:31 -0700)]
mwifiex: inform cfg80211 about disconnect if device is removed
If device is surprise removed, commands sent to FW including
deauthenticate command fail as bus writes fail.
We update our media_connected status to false and inform cfg80211
about disconnection only when command is successful. Since cfg80211
assumes device is still connected, it results into following
WARN_ON during unload:
WARNING: CPU: 0 PID: 18245 at net/wireless/core.c:937
cfg80211_netdev_notifier_call+0x175/0x4d0 [cfg80211]()
Avoid this by emitting cfg80211_disconnected event even if the
deauthenticate command fails.
Signed-off-by: Avinash Patil <patila@marvell.com> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Linus Torvalds [Mon, 14 Oct 2013 17:02:23 +0000 (10:02 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"Some more ARM fixes, nothing particularly major here. The biggest
change is to fix the SMP_ON_UP code so that it works with TI's Aegis
cores"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()
ARM: 7846/1: Update SMP_ON_UP code to detect A9MPCore with 1 CPU devices
ARM: 7845/1: sharpsl_param.c: fix invalid memory access for pxa devices
ARM: 7843/1: drop asm/types.h from generic-y
ARM: 7842/1: MCPM: don't explode if invoked without being initialized first
Linus Torvalds [Mon, 14 Oct 2013 16:31:08 +0000 (09:31 -0700)]
Merge tag 'pm+acpi-3.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These fix two recent bugs in ACPIPHP (ACPI-based PCI hotplug) and
update a bunch of web links and e-mail addresses in MAINTAINERS, docs
and Kconfig that either are stale or will expire soon.
Specifics:
- The WARN_ON() in acpiphp_enumerate_slots() triggers as a false
positive in some cases, so drop it.
- Add a missing pci_dev_put() to an error code path in
acpiphp_enumerate_slots().
- Replace my old e-mail address that's going to expire with a new
one.
- Update ACPI web links and git tree information in MAINTAINERS.
- Update links to the Linux-ACPI project's page in MAINTAINERS.
- Update some stale links and e-mail addresses under Documentation
and in the ACPI Kconfig file"
* tag 'pm+acpi-3.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / hotplug / PCI: Drop WARN_ON() from acpiphp_enumerate_slots()
ACPI / hotplug / PCI: Fix error code path in acpiphp_enumerate_slots()
ACPI / PM / Documentation: Replace outdated project links and addresses
MAINTAINERS / ACPI: Update links to the Linux-ACPI project web page
MAINTAINERS / ACPI: Update links and git tree information
MAINTAINERS / Documentation: Update Rafael's e-mail address
ARM: 7805/1: mm: change max*pfn to include the physical offset of memory
Most of the kernel code assumes that max*pfn is maximum pfns because
the physical start of memory is expected to be PFN0. Since this
assumption is not true on ARM architectures, the meaning of max*pfn
is number of memory pages. This is done to keep drivers happy which
are making use of of these variable to calculate the dma bounce limit
using dma_mask.
Now since we have a architecture override possibility for DMAable
maximum pfns, lets make meaning of max*pfns as maximum pnfs on ARM
as well.
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM: 7797/1: mmc: Use dma_max_pfn(dev) helper for bounce_limit calculations
DMA bounce limit is the maximum direct DMA'able memory beyond which
bounce buffers has to be used to perform dma operations. MMC queue layr
relies on dma_mask but its calculation is based on max_*pfn which
don't have uniform meaning across architectures. So make use of
dma_max_pfn() which is expected to return the DMAable maximum pfn
value across architectures.
Cc: Chris Ball <cjb@laptop.org> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM: 7796/1: scsi: Use dma_max_pfn(dev) helper for bounce_limit calculations
DMA bounce limit is the maximum direct DMA'able memory beyond which
bounce buffers has to be used to perform dma operations. SCSI driver
relies on dma_mask but its calculation is based on max_*pfn which
don't have uniform meaning across architectures. So make use of
dma_max_pfn() which is expected to return the DMAable maximum pfn
value across architectures.
Cc: linux-scsi@vger.kernel.org Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM: 7795/1: mm: dma-mapping: Add dma_max_pfn(dev) helper function
Most of the kernel assumes that PFN0 is the start of the physical
memory (RAM). This assumptions is not true on most of the ARM SOCs
and hence and if one try to update the ARM port to follow the assumptions,
we end of breaking the dma bounce limit for few block layer drivers.
One such example is trying to unify the meaning of max*_pfn on ARM
as the bootmem layer expects, breaks few block layer driver dma
bounce limit.
To fix this problem, we introduce dma_max_pfn(dev) generic helper with
a possibility of override from the architecture code. The helper converts
a DMA bitmask of bits to a block PFN number. In all the generic cases,
it is just "dev->dma_mask >> PAGE_SHIFT" and hence default behavior
is maintained as is.
Subsequent patches will make use of the helper. No functional change.
Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
ARM: 7794/1: block: Rename parameter dma_mask to max_addr for blk_queue_bounce_limit()
The blk_queue_bounce_limit() API parameter 'dma_mask' is actually the
maximum address the device can handle rather than a dma_mask. Rename
it accordingly to avoid it being interpreted as dma_mask.
No functional change.
The idea is to fix the bad assumptions about dma_mask wherever it could
be miss-interpreted.
Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Tue, 9 Jul 2013 11:14:49 +0000 (12:14 +0100)]
ARM: DMA-API: better handing of DMA masks for coherent allocations
We need to start treating DMA masks as something which is specific to
the bus that the device resides on, otherwise we're going to hit all
sorts of nasty issues with LPAE and 32-bit DMA controllers in >32-bit
systems, where memory is offset from PFN 0.
In order to start doing this, we convert the DMA mask to a PFN using
the device specific dma_to_pfn() macro. This is the reverse of the
pfn_to_dma() macro which is used to get the DMA address for the device.
This gives us a PFN mask, which we can then check against the PFN
limit of the DMA zone.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The dma mask is not configured in the current code.
This was triggered by soc-dmaengine-pcm which allocate the dma
buffers with the imx-sdma as device.
This commit fix audio on imx31.
Signed-off-by: Philippe Rétornaz <philippe.retornaz@epfl.ch> Acked-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Steven Capper [Mon, 14 Oct 2013 08:49:10 +0000 (09:49 +0100)]
ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page aware
The memory pinning code in uaccess_with_memcpy.c does not check
for HugeTLB or THP pmds, and will enter an infinite loop should
a __copy_to_user or __clear_user occur against a huge page.
This patch adds detection code for huge pages to pin_page_for_write.
As this code can be executed in a fast path it refers to the actual
pmds rather than the vma. If a HugeTLB or THP is found (they have
the same pmd representation on ARM), the page table spinlock is
taken to prevent modification whilst the page is pinned.
On ARM, huge pages are only represented as pmds, thus no huge pud
checks are performed. (For huge puds one would lock the page table
in a similar manner as in the pmd case).
Two helper functions are introduced; pmd_thp_or_huge will check
whether or not a page is huge or transparent huge (which have the
same pmd layout on ARM), and pmd_hugewillfault will detect whether
or not a page fault will occur on write to the page.
Running the following test (with the chunking from read_zero
removed):
$ dd if=/dev/zero of=/dev/null bs=10M count=1024
Gave: 2.3 GB/s backed by normal pages,
2.9 GB/s backed by huge pages,
5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.
After some discussion, it was decided not to adopt the HPAGE_MASK,
as this would have a significant detrimental effect on the overall
system latency due to page_table_lock being held for too long.
This could be revisited if split huge page locks are adopted.
Signed-off-by: Steve Capper <steve.capper@linaro.org> Reviewed-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rob Herring [Wed, 9 Oct 2013 16:26:44 +0000 (17:26 +0100)]
ARM: 7855/1: Add check for Cortex-A15 errata 798181 ECO
The work-around for A15 errata 798181 is not needed if appropriate ECO
fixes have been applied to r3p2 and earlier core revisions. This can be
checked by reading REVIDR register bits 4 and 9. If only bit 4 is set,
then the IPI broadcast can be skipped.
Signed-off-by: Rob Herring <rob.herring@calxeda.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Wed, 9 Oct 2013 16:19:22 +0000 (17:19 +0100)]
ARM: 7854/1: lockref: add support for lockless lockrefs using cmpxchg64
Our spinlocks are only 32-bit (2x16-bit tickets) and, on processors
with 64-bit atomic instructions, cmpxchg64 makes use of the double-word
exclusive accessors.
This patch wires up the cmpxchg-based lockless lockref implementation
for ARM.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Will Deacon [Wed, 9 Oct 2013 16:17:18 +0000 (17:17 +0100)]
ARM: 7853/1: cmpxchg: implement cmpxchg64_relaxed
This patch introduces cmpxchg64_relaxed for arm, which performs a 64-bit
cmpxchg operation without barrier semantics. cmpxchg64_local is updated
to use the new operation.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Our cmpxchg64 macros are wrappers around atomic64_cmpxchg. Whilst this is
great for code re-use, there is a case for barrier-less cmpxchg where it
is known to be safe (for example cmpxchg64_local and cmpxchg-based
lockrefs).
This patch introduces a 64-bit cmpxchg implementation specifically
for the cmpxchg64_* macros, so that it can be later used by the lockref
code.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tim Bird expressed concern that this will have a bad effect on boot
time, and while simple tests have shown it to be okay with simple tree,
a device tree blob can potentially be quite large and
add_device_randomness() is not a fast function. Rather than do this for
all platforms unconditionally, I'm reverting this patch and would like
to see it revisited. Instead of feeding the entire tree into the random
pool, it would probably be appropriate to hash the tree and feed the
hash result into the pool. There really isn't a lot of randomness in a
device tree anyway. In the majority of cases only a handful of
properties are going to be different between machines with the same
baseboard.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Grant Likely [Thu, 3 Oct 2013 20:04:31 +0000 (21:04 +0100)]
of: fix unnecessary warning on missing /cpus node
Not all DT platforms have all the cpus collected under a /cpus node.
That just happens to be a details of FDT, ePAPR and PowerPC platforms.
Sparc does something different, but unfortunately the current code
complains with a warning if /cpus isn't there. This became a problem
with commit f86e4718, "driver/core cpu: initialize of_node in cpu's
device structure", which caused the function to get called for all
architectures.
This commit is a temporary fix to fail silently if the cpus node isn't
present. A proper fix will come later to allow arch code to provide a
custom mechanism for decoding the CPU hwid if the 'reg' property isn't
appropriate.
Signed-off-by: Grant Likely <grant.likely@linaro.org> Cc: David Miller <davem@davemloft.net> Cc: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com> Cc: Rob Herring <rob.herring@calxeda.com>
ALSA: hda - Fix inverted internal mic not indicated on some machines
The create_bind_cap_vol_ctl does not create any control indicating
that an inverted dmic is present. Therefore, create multiple
capture volumes in this scenario, so we always have some indication
that the internal mic is inverted.
This happens on the Lenovo Ideapad U310 as well as the Lenovo Yoga 13
(both are based on the CX20590 codec), but the fix is generic and
could be needed for other codecs/machines too.
Thanks to Szymon Acedański for the pointer and a draft patch.
Johannes Berg [Fri, 11 Oct 2013 13:47:06 +0000 (15:47 +0200)]
mac80211: fix crash if bitrate calculation goes wrong
If a frame's timestamp is calculated, and the bitrate
calculation goes wrong and returns zero, the system
will attempt to divide by zero and crash. Catch this
case and print the rate information that the driver
reported when this happens.
Cc: stable@vger.kernel.org Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 11 Oct 2013 12:47:05 +0000 (14:47 +0200)]
wireless: radiotap: fix parsing buffer overrun
When parsing an invalid radiotap header, the parser can overrun
the buffer that is passed in because it doesn't correctly check
1) the minimum radiotap header size
2) the space for extended bitmaps
The first issue doesn't affect any in-kernel user as they all
check the minimum size before calling the radiotap function.
The second issue could potentially affect the kernel if an skb
is passed in that consists only of the radiotap header with a
lot of extended bitmaps that extend past the SKB. In that case
a read-only buffer overrun by at most 4 bytes is possible.
Fix this by adding the appropriate checks to the parser.
Cc: stable@vger.kernel.org Reported-by: Evan Huus <eapache@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Linus Torvalds [Sun, 13 Oct 2013 18:41:26 +0000 (11:41 -0700)]
Merge git://www.linux-watchdog.org/linux-watchdog
Pull watchdog fixes from Wim Van Sebroeck:
"This will fix a deadlock on the ts72xx_wdt driver, fix bitmasks in the
kempld_wdt driver and fix a section mismatch in the sunxi_wdt driver"
* git://www.linux-watchdog.org/linux-watchdog:
watchdog: sunxi: Fix section mismatch
watchdog: kempld_wdt: Fix bit mask definition
watchdog: ts72xx_wdt: locking bug in ioctl
Maxime Ripard [Sat, 5 Oct 2013 14:20:17 +0000 (16:20 +0200)]
watchdog: sunxi: Fix section mismatch
This driver has a section mismatch, for probe and remove functions,
leading to the following warning during the compilation.
WARNING: drivers/watchdog/built-in.o(.data+0x24): Section mismatch in
reference from the variable sunxi_wdt_driver to the function
.init.text:sunxi_wdt_probe()
The variable sunxi_wdt_driver references
the function __init sunxi_wdt_probe()
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
AKASHI Takahiro [Wed, 9 Oct 2013 14:58:29 +0000 (15:58 +0100)]
ARM: 7851/1: check for number of arguments in syscall_get/set_arguments()
In ftrace_syscall_enter(),
syscall_get_arguments(..., 0, n, ...)
if (i == 0) { <handle ORIG_r0> ...; n--;}
memcpy(..., n * sizeof(args[0]));
If 'number of arguments(n)' is zero and 'argument index(i)' is also zero in
syscall_get_arguments(), none of arguments should be copied by memcpy().
Otherwise 'n--' can be a big positive number and unexpected amount of data
will be copied. Tracing system calls which take no argument, say sync(void),
may hit this case and eventually make the system corrupted.
This patch fixes the issue both in syscall_get_arguments() and
syscall_set_arguments().
Cc: <stable@vger.kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Mon, 1 Jul 2013 14:55:02 +0000 (15:55 +0100)]
DMA-API: firmware/google/gsmi.c: avoid direct access to DMA masks
This driver doesn't need to directly access DMA masks if it uses the
platform_device_register_full() API rather than
platform_device_register_simple() - the former function can initialize
the DMA mask appropriately.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Russell King [Thu, 27 Jun 2013 13:14:43 +0000 (14:14 +0100)]
DMA-API: dcdbas: update DMA mask handing
dcdbas was explicitly initializing DMA masks thusly:
dcdbas_pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
dcdbas_pdev->dev.dma_mask = &dcdbas_pdev->dev.coherent_dma_mask;
which bypasses the architecture check. Moreover, it is creating the
dcdbas_pdev device itself, and using the platform_device_register_full()
avoids some of this explicit initialization.
Convert the driver to use platform_device_register_full(), and as it
makes use of coherent DMA, also call dma_set_coherent_mask() to ensure
that the architecture gets to check the mask.
Tested-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>