remove_pagetable() gets start argument and passes the argument to
sync_global_pgds(). In this case, the argument must not be modified. If
the argument is modified and passed to sync_global_pgds(),
sync_global_pgds() does not correctly synchronize PGD to PGD entries of
all processes MM since synchronized range of memory [start, end] is wrong.
Unfortunately the start argument is modified in remove_pagetable(). So
this patch fixes the issue.
Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Carstens [Thu, 26 Jun 2014 00:42:18 +0000 (10:42 +1000)]
fs/seq_file: fallback to vmalloc allocation
There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.
E.g. for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation. In such
situations reading /proc/stat is not possible anymore.
Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.
For reference a call trace where reading from /proc/stat failed:
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Tested-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Heiko Carstens [Thu, 26 Jun 2014 00:42:18 +0000 (10:42 +1000)]
proc/stat: convert to single_open_size()
These two patches are supposed to "fix" failed order-4 memory allocations
which have been observed when reading /proc/stat. The problem has been
observed on s390 as well as on x86.
To address the problem change the seq_file memory allocations to fallback
to use vmalloc, so that allocations also work if memory is fragmented.
This approach seems to be simpler and less intrusive than changing
/proc/stat to use an interator. Also it "fixes" other users as well,
which use seq_file's single_open() interface.
This patch (of 2):
Use seq_file's single_open_size() to preallocate a buffer that is large
enough to hold the whole output, instead of open coding it. Also
calculate the requested size using the number of online cpus instead of
possible cpus, since the size of the output only depends on the number of
online cpus.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Ian Kent <raven@themaw.net> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Thorsten Diehl <thorsten.diehl@de.ibm.com> Cc: Andrea Righi <andrea@betterlinux.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stefan Bader <stefan.bader@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ian Kent [Thu, 26 Jun 2014 00:42:17 +0000 (10:42 +1000)]
autofs4: fix false positive compile error
On strict build environments we can see:
fs/autofs4/inode.c: In function 'autofs4_fill_super':
fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this
function
make[2]: *** [fs/autofs4/inode.o] Error 1
make[1]: *** [fs/autofs4] Error 2
make: *** [fs] Error 2
make: *** Waiting for unfinished jobs....
This is due to the use of pgrp_set being used to indicate pgrp has
has been set rather than initializing pgrp itself.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Joonsoo Kim [Thu, 26 Jun 2014 00:42:17 +0000 (10:42 +1000)]
slub: fix off by one in number of slab tests
min_partial means minimum number of slab cached in node partial list. So,
if nr_partial is less than it, we keep newly empty slab on node partial
list rather than freeing it. But if nr_partial is equal or greater than
it, it means that we have enough partial slabs so should free newly empty
slab. Current implementation missed the equal case so if we set
min_partial is 0, then, at least one slab could be cached. This is
critical problem to kmemcg destroying logic because it doesn't works
properly if some slabs is cached. This patch fixes this problem.
Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
immediately").
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This happens because init_cma_reserved_pageblock() calls __free_one_page()
with pageblock_order as page order but it is bigger than MAX_ORDER. This
in turn causes accesses past zone->free_list[].
Fix the problem by changing init_cma_reserved_pageblock() such that it
splits pageblock into individual MAX_ORDER pages if pageblock is bigger
than a MAX_ORDER page.
In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
architectures expect for ia64, powerpc and tile at the moment, the
“pageblock_order > MAX_ORDER” condition will be optimised out since
both sides of the operator are constants. In cases where pageblock size
is variable, the performance degradation should not be significant anyway
since init_cma_reserved_pageblock() is called only at boot time at most
MAX_CMA_AREAS times which by default is eight.
Signed-off-by: Michal Nazarewicz <mina86@mina86.com> Reported-by: Mark Salter <msalter@redhat.com> Tested-by: Mark Salter <msalter@redhat.com> Tested-by: Christopher Covington <cov@codeaurora.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Linus Torvalds [Wed, 25 Jun 2014 12:44:17 +0000 (05:44 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes and cleanups from Ben Herrenschmidt:
"Here are a handful or two of powerpc fixes and simple/trivial
cleanups. A bunch of them fix ftrace with the new ABI v2 in Little
Endian, the rest is a scattering of fairly simple things"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Don't skip ePAPR spin-table CPUs
powerpc/module: Fix TOC symbol CRC
powerpc/powernv: Remove OPAL v1 takeover
powerpc/kmemleak: Do not scan the DART table
selftests/powerpc: Use the test harness for the TM DSCR test
powerpc/cell: cbe_thermal.c: Cleaning up a variable is of the wrong type
powerpc/kprobes: Fix jprobes on ABI v2 (LE)
powerpc/ftrace: Use pr_fmt() to namespace error messages
powerpc/ftrace: Fix nop of modules on 64bit LE (ABIv2)
powerpc/ftrace: Fix inverted check of create_branch()
powerpc/ftrace: Fix typo in mask of opcode
powerpc: Add ppc_global_function_entry()
powerpc/macintosh/smu.c: Fix closing brace followed by if
powerpc: Remove __arch_swab*
powerpc: Remove ancient DEBUG_SIG code
powerpc/kerenl: Enable EEH for IO accessors
Linus Torvalds [Wed, 25 Jun 2014 12:30:20 +0000 (05:30 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost cleanups from Michael S Tsirkin:
"Two cleanup patches removing code duplication that got introduced by
changes in rc1. Not fixing crashes, but I'd rather not carry the
duplicate code until the next merge window"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost-scsi: don't open-code kvfree
vhost-net: don't open-code kvfree
Linus Torvalds [Wed, 25 Jun 2014 12:08:09 +0000 (05:08 -0700)]
Merge tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing cleanups and fixes from Steven Rostedt:
"This includes three patches from Oleg Nesterov. The first is a fix to
a race condition that happens between enabling/disabling syscall
tracepoints and new process creations (the check to go into the ptrace
path for a process can be set when it shouldn't, or not set when it
should). Not a major bug but one that should be fixed and even
applied to stable.
The other two patches are cleanup/fixes that are not that critical,
but for an -rc1 release would be nice to have. They both deal with
syscall tracepoints.
It also includes a patch to introduce a new macro for the
TRACE_EVENT() format called __field_struct(). Originally, __field()
was used to record any variable into a trace event, but with the
addition of setting the "is signed" attribute, the check causes
anything but a primitive variable to fail to compile. That is,
structs and unions can't be used as they once were. When the "is
signed" check was introduce there were only primitive variables being
recorded. But that will change soon and it was reported that
__field() causes build failures.
To solve the __field() issue, __field_struct() is introduced to allow
trace_events to be able to record complex types too"
* tag 'trace-fixes-v3.16-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add __field_struct macro for TRACE_EVENT()
tracing: syscall_regfunc() should not skip kernel threads
tracing: Change syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()
tracing: Fix syscall_*regfunc() vs copy_process() race
Scott Wood [Wed, 25 Jun 2014 01:15:51 +0000 (20:15 -0500)]
powerpc: Don't skip ePAPR spin-table CPUs
Commit 59a53afe70fd530040bdc69581f03d880157f15a "powerpc: Don't setup
CPUs with bad status" broke ePAPR SMP booting. ePAPR says that CPUs
that aren't presently running shall have status of disabled, with
enable-method being used to determine whether the CPU can be enabled.
Fix by checking for spin-table, which is currently the only supported
enable-method.
Signed-off-by: Scott Wood <scottwood@freescale.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Emil Medve <Emilian.Medve@Freescale.com> Cc: stable@vger.kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Laurent Dufour [Tue, 24 Jun 2014 08:53:59 +0000 (10:53 +0200)]
powerpc/module: Fix TOC symbol CRC
The commit 71ec7c55ed91 introduced the magic symbol ".TOC." for ELFv2 ABI.
This symbol is built manually and has no CRC value computed. A zero value
is put in the CRC section to avoid modpost complaining about a missing CRC.
Unfortunately, this breaks the kernel module loading when the kernel is
relocated (kdump case for instance) because of the relocation applied to
the kcrctab values.
This patch compute a CRC value for the TOC symbol which will match the one
compute by the kernel when it is relocated - aka '0 - relocate_start' done in
maybe_relocated called by check_version (module.c).
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Cc: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 24 Jun 2014 07:17:47 +0000 (17:17 +1000)]
powerpc/powernv: Remove OPAL v1 takeover
In commit 27f4488872d9 "Add OPAL takeover from PowerVM" we added support
for "takeover" on OPAL v1 machines.
This was a mode of operation where we would boot under pHyp, and query
for the presence of OPAL. If detected we would then do a special
sequence to take over the machine, and the kernel would end up running
in hypervisor mode.
OPAL v1 was never a supported product, and was never shipped outside
IBM. As far as we know no one is still using it.
Newer versions of OPAL do not use the takeover mechanism. Although the
query for OPAL should be harmless on machines with newer OPAL, we have
seen a machine where it causes a crash in Open Firmware.
The code in early_init_devtree() to copy boot_command_line into cmd_line
was added in commit 817c21ad9a1f "Get kernel command line accross OPAL
takeover", and AFAIK is only used by takeover, so should also be
removed.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Tue, 24 Jun 2014 21:00:13 +0000 (14:00 -0700)]
Merge git://git.kvack.org/~bcrl/aio-fixes
Pull aio fixes from Ben LaHaise:
"These fix a kernel memory disclosure issue (arbitrary kmap() &
copy_to_user()) revealed in CVE-2014-0206 by changes that were
introduced in v3.10"
* git://git.kvack.org/~bcrl/aio-fixes:
aio: fix kernel memory disclosure in io_getevents() introduced in v3.10
aio: fix aio request leak when events are reaped by userspace
Linus Torvalds [Tue, 24 Jun 2014 20:59:00 +0000 (13:59 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"A number of low impact fixes, the most noticable one is the thumb2
frame pointer fix. We also fix a regression caused during this merge
window with ARM925 CPUs running with caches disabled, and fix a number
of warnings"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: arm925: ensure assembly sets up writethrough mapping
ARM: perf: fix compiler warning with gcc 4.6.4 (and tidy code)
ARM: l2c: fix dependencies on PL310 errata symbols
ARM: 8069/1: Make thread_save_fp macro aware of THUMB2 mode
ARM: 8068/1: scoop: Remove unused variable
Benjamin LaHaise [Tue, 24 Jun 2014 17:32:51 +0000 (13:32 -0400)]
aio: fix kernel memory disclosure in io_getevents() introduced in v3.10
A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10
by commit a31ad380bed817aa25f8830ad23e1a0480fef797. The changes made to
aio_read_events_ring() failed to correctly limit the index into
ctx->ring_pages[], allowing an attacked to cause the subsequent kmap() of
an arbitrary page with a copy_to_user() to copy the contents into userspace.
This vulnerability has been assigned CVE-2014-0206. Thanks to Mateusz and
Petr for disclosing this issue.
This patch applies to v3.12+. A separate backport is needed for 3.10/3.11.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: stable@vger.kernel.org
Benjamin LaHaise [Tue, 24 Jun 2014 17:12:55 +0000 (13:12 -0400)]
aio: fix aio request leak when events are reaped by userspace
The aio cleanups and optimizations by kmo that were merged into the 3.10
tree added a regression for userspace event reaping. Specifically, the
reference counts are not decremented if the event is reaped in userspace,
leading to the application being unable to submit further aio requests.
This patch applies to 3.12+. A separate backport is required for 3.10/3.11.
This issue was uncovered as part of CVE-2014-0206.
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: stable@vger.kernel.org Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com>
Catalin Marinas [Fri, 13 Jun 2014 08:44:21 +0000 (09:44 +0100)]
powerpc/kmemleak: Do not scan the DART table
The DART table allocation is registered to kmemleak via the
memblock_alloc_base() call. However, the DART table is later unmapped
and dart_tablebase VA no longer accessible. This patch tells kmemleak
not to scan this block and avoid an unhandled paging request.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Mon, 23 Jun 2014 03:23:31 +0000 (13:23 +1000)]
powerpc/kprobes: Fix jprobes on ABI v2 (LE)
In commit 721aeaa9 "Build little endian ppc64 kernel with ABIv2", we
missed some updates required in the kprobes code to make jprobes work
when the kernel is built with ABI v2.
Firstly update arch_deref_entry_point() to do the right thing. Now that
we have added ppc_global_function_entry() we can just always use that, it
will do the right thing for 32 & 64 bit and ABI v1 & v2.
Secondly we need to update the code that sets up the register state before
calling the jprobe handler. On ABI v1 we setup r2 to hold the TOC, on ABI
v2 we need to populate r12 with the function entry point address.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 17 Jun 2014 06:15:36 +0000 (16:15 +1000)]
powerpc/ftrace: Use pr_fmt() to namespace error messages
The printks() in our ftrace code have no prefix, so they appear on the
console with very little context, eg:
Branch out of range
Use pr_fmt() & pr_err() to add a prefix. While we're at it, collapse a
few split lines that don't need to be, and add a missing newline to one
message.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 17 Jun 2014 06:15:34 +0000 (16:15 +1000)]
powerpc/ftrace: Fix inverted check of create_branch()
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that creates and patches the branch, and added a
thinko in the check of create_branch(). create_branch() returns the
instruction that was generated, so if we get zero then it succeeded.
The result is we can't ftrace modules:
Branch out of range
WARNING: at ../kernel/trace/ftrace.c:1638
ftrace failed to modify [<d000000004ba001c>] fuse_req_init_context+0x1c/0x90 [fuse]
We should probably fix patch_instruction() to do that check and make the
API saner, but that's a separate patch. For now just invert the test.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 17 Jun 2014 06:15:33 +0000 (16:15 +1000)]
powerpc/ftrace: Fix typo in mask of opcode
In commit 24a1bdc35, "Fix ABIv2 issues with __ftrace_make_call", Anton
changed the logic that checks for the expected code sequence when
patching a module.
We missed the typo in the mask, 0xffff00000 should be 0xffff0000, which
has the effect of making the test always true.
That makes it impossible to ftrace against modules, eg:
Unexpected call sequence: 48000008e8410018
WARNING: at ../kernel/trace/ftrace.c:1638
ftrace failed to modify [<d000000007cf001c>] rng_dev_open+0x1c/0x70 [rng_core]
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Tue, 17 Jun 2014 06:15:32 +0000 (16:15 +1000)]
powerpc: Add ppc_global_function_entry()
ABIv2 has the concept of a global and local entry point to a function.
In most cases we are interested in the local entry point, and so that is
what ppc_function_entry() returns.
However we have a case in the ftrace code where we want the global entry
point, and there may be other places we need it too. Rather than special
casing each, add an accessor.
For ABIv1 and 32-bit there is only a single entry point, so we return
that. That means it's safe for the caller to use this without also
checking the ABI version.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Michael Ellerman [Mon, 23 Jun 2014 04:17:47 +0000 (14:17 +1000)]
powerpc: Remove ancient DEBUG_SIG code
We have some compile-time disabled debug code in signal_xx.c. It's from
some ancient time BG, almost certainly part of the original port, given
the very similar code on other arches.
The show_unhandled_signal logic, added in d0c3d534a438 (2.6.24) is
cleaner and prints more useful information, so drop the debug code.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Tue, 24 Jun 2014 00:05:28 +0000 (17:05 -0700)]
Merge tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull compress bugfixes from Greg KH:
"Here are two bugfixes for some compression functions that resolve some
errors when uncompressing some pathalogical data. Both were found by
Don A Bailey"
* tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
lz4: ensure length does not wrap
lzo: properly check for overruns
Linus Torvalds [Mon, 23 Jun 2014 23:48:14 +0000 (16:48 -0700)]
Merge branch 'akpm' (patches from Andrew Morton)
Merge fixes from Andrew Morton:
"The nmi patch and watchdog patch aren't actually fixes - they're
features which needed a few last-minutes touchups.
Otherwise, a rather large batch of fixes - ocfs2 review takes a while
and I got distracted and missed last week's batch"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (31 commits)
ocfs2/dlm: do not purge lockres that is queued for assert master
ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
ocfs2: fix a tiny race when running dirop_fileop_racer
ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
ocfs2: refcount: take rw_lock in ocfs2_reflink
ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
ocfs2: fix deadlock when two nodes are converting same lock from PR to EX and idletimeout closes conn
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
mm: fix crashes from mbind() merging vmas
checkpatch: reduce false positives when checking void function return statements
ia64: arch/ia64/include/uapi/asm/fcntl.h needs personality.h
DMA, CMA: fix possible memory leak
slab: fix oops when reading /proc/slab_allocators
shmem: fix faulting into a hole while it's punched
mm: let mm_find_pmd fix buggy race with THP fault
mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
kernel/watchdog.c: print traces for all cpus on lockup detection
nmi: provide the option to issue an NMI back trace to every cpu but current
Documentation/accounting/getdelays.c: add missing null-terminate after strncpy call
...
Xue jiufei [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2/dlm: do not purge lockres that is queued for assert master
When workqueue is delayed, it may occur that a lockres is purged while it
is still queued for master assert. it may trigger BUG() as follows.
N1 N2
dlm_get_lockres()
->dlm_do_master_requery
is the master of lockres,
so queue assert_master work
dlm_thread() start running
and purge the lockres
dlm_assert_master_worker()
send assert master message
to other nodes
receiving the assert_master
message, set master to N2
dlmlock_remote() send create_lock message to N2, but receive DLM_IVLOCKID,
if it is RECOVERY lockres, it triggers the BUG().
Another BUG() is triggered when N3 become the new master and send
assert_master to N1, N1 will trigger the BUG() because owner doesn't
match. So we should not purge lockres when it is queued for assert
master.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: do not return DLM_MIGRATE_RESPONSE_MASTERY_REF to avoid endless,loop during umount
The following case may lead to endless loop during umount.
node A node B node C node D
umount volume,
migrate lockres1
to B
want to lock lockres1,
send
MASTER_REQUEST_MSG
to C
init block mle
send
MIGRATE_REQUEST_MSG
to C
find a block
mle, and then
return
DLM_MIGRATE_RESPONSE_MASTERY_REF
to B
set C in refmap
umount successfully
try to umount, endless
loop occurs when migrate
lockres1 since C is in
refmap
So we can fix this endless loop case by only returning
DLM_MIGRATE_RESPONSE_MASTERY_REF if it has a mastery mle when receiving
MIGRATE_REQUEST_MSG.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Xue jiufei <xuejiufei@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
jiangyiwen [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: manually do the iput once ocfs2_add_entry failed in ocfs2_symlink and ocfs2_mknod
When the call to ocfs2_add_entry() failed in ocfs2_symlink() and
ocfs2_mknod(), iput() will not be called during dput(dentry) because no
d_instantiate(), and this will lead to umount hung.
Signed-off-by: jiangyiwen <jiangyiwen@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yiwen Jiang [Mon, 23 Jun 2014 20:22:09 +0000 (13:22 -0700)]
ocfs2: fix a tiny race when running dirop_fileop_racer
When running dirop_fileop_racer we found a dead lock case.
2 nodes, say Node A and Node B, mount the same ocfs2 volume. Create
/race/16/1 in the filesystem, and let the inode number of dir 16 is less
than the inode number of dir race.
Node A Node B
mv /race/16/1 /race/
right after Node A has got the
EX mode of /race/16/, and tries to
get EX mode of /race
ls /race/16/
In this case, Node A has got the EX mode of /race/16/, and wants to get EX
mode of /race/. Node B has got the PR mode of /race/, and wants to get
the PR mode of /race/16/. Since EX and PR are mutually exclusive, dead
lock happens.
This patch fixes this case by locking in ancestor order before trying
inode number order.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xue jiufei [Mon, 23 Jun 2014 20:22:08 +0000 (13:22 -0700)]
ocfs2/dlm: fix misuse of list_move_tail() in dlm_run_purge_list()
When a lockres in purge list but is still in use, it should be moved to
the tail of purge list. dlm_thread will continue to check next lockres in
purge list. However, code list_move_tail(&dlm->purge_list,
&lockres->purge) will do *no* movements, so dlm_thread will purge the same
lockres in this loop again and again. If it is in use for a long time,
other lockres will not be processed.
Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com> Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It crashes at
BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks.
ocfs2_direct_IO_get_blocks is expecting the OCFS2_EXT_REFCOUNTED be removed in
ocfs2_prepare_inode_for_write() if it was there. But no cluster lock is taken
during the time before (or inside) ocfs2_prepare_inode_for_write() and after
ocfs2_direct_IO_get_blocks().
It can happen in this case:
Node A(which crashes) Node B
------------------------ ---------------------------
ocfs2_file_aio_write
ocfs2_prepare_inode_for_write
ocfs2_inode_lock
...
ocfs2_inode_unlock
#no refcount found
.... ocfs2_reflink
ocfs2_inode_lock
...
ocfs2_inode_unlock
#now, refcount flag set on extent
...
flush change to disk
ocfs2_direct_IO_get_blocks
ocfs2_get_clusters
#extent map miss
#buffer_head miss
read extents from disk
found refcount flag on extent
crash..
Fix:
Take rw_lock in ocfs2_reflink path
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xue jiufei [Mon, 23 Jun 2014 20:22:08 +0000 (13:22 -0700)]
ocfs2: revert "ocfs2: fix NULL pointer dereference when dismount and ocfs2rec simultaneously"
75f82eaa502c ("ocfs2: fix NULL pointer dereference when dismount and
ocfs2rec simultaneously") may cause umount hang while shutting down
truncate log.
The situation is as followes:
ocfs2_dismout_volume
-> ocfs2_recovery_exit
-> free osb->recovery_map
-> ocfs2_truncate_shutdown
-> lock global bitmap inode
-> ocfs2_wait_for_recovery
-> check whether osb->recovery_map->rm_used is zero
Because osb->recovery_map is already freed, rm_used can be any other
values, so it may yield umount hang.
Signed-off-by: joyce.xue <xuejiufei@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Two node cluster and both nodes hold a lock at PR level and both want to
convert to EX at the same time. Master node 1 has sent BAST and then
closes the connection due to idletime out. Node 0 receives BAST, sends
unlock req with cancel flag but gets error -ENOTCONN. The problem is
this error is ignored in dlm_send_remote_unlock_request() on the
**incorrect** assumption that the master is dead. See NOTE in comment
why it returns DLM_NORMAL. Upon getting DLM_NORMAL, node 0 proceeds to
sends convert (without cancel flg) which fails with -ENOTCONN. waits 5
sec and resends.
This time gets DLM_IVLOCKID from the master since lock not found in
grant, it had been moved to converting queue in response to conv PR->EX
req. No way out.
Node 1 (master) Node 0
============== ======
lock mode PR PR
convert PR -> EX
mv grant -> convert and que BAST
...
<-------- convert PR -> EX
convert que looks like this: ((node 1, PR -> EX) (node 0, PR -> EX))
...
BAST (want PR -> NL)
------------------>
...
idle timout, conn closed
...
In response to BAST,
sends unlock with cancel convert flag
gets -ENOTCONN. Ignores and
sends remote convert request
gets -ENOTCONN, waits 5 Sec, retries
...
reconnects
<----------------- convert req goes through on next try
does not find lock on grant que
status DLM_IVLOCKID
------------------>
...
No way out. Fix is to keep retrying unlock with cancel flag until it
succeeds or the master dies.
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alex chen [Mon, 23 Jun 2014 20:22:07 +0000 (13:22 -0700)]
ocfs2: should add inode into orphan dir after updating entry in ocfs2_rename()
There are two files a and b in dir /mnt/ocfs2.
node A node B
mv a b
In ocfs2_rename(), after calling
ocfs2_orphan_add(), the inode of
file b will be added into orphan
dir.
If ocfs2_update_entry() fails,
ocfs2_rename return error and mv
operation fails. But file b still
exists in the parent dir.
ocfs2_queue_orphan_scan
-> ocfs2_queue_recovery_completion
-> ocfs2_complete_recovery
-> ocfs2_recover_orphans
The inode of the file b will be
put with iput().
ocfs2_evict_inode
-> ocfs2_delete_inode
-> ocfs2_wipe_inode
-> ocfs2_remove_inode
OCFS2_VALID_FL in the inode
i_flags will be cleared.
The file b still can be accessed
on node B.
ls /mnt/ocfs2
When first read the file b with
ocfs2_read_inode_block(). It will
validate the inode using
ocfs2_validate_inode_block().
Because OCFS2_VALID_FL not set in
the inode i_flags, so the file
system will be readonly.
So we should add inode into orphan dir after updating entry in
ocfs2_rename().
Signed-off-by: alex.chen <alex.chen@huawei.com> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:07 +0000 (13:22 -0700)]
mm: fix crashes from mbind() merging vmas
In v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem")
introduced vma merging to mbind(), but it should have also changed the
convention of passing start vma from queue_pages_range() (formerly
check_range()) to new_vma_page(): vma merging may have already freed
that structure, resulting in BUG at mm/mempolicy.c:1738 and probably
worse crashes.
fs/notify/fanotify/fanotify_user.c: In function 'SYSC_fanotify_init':
fs/notify/fanotify/fanotify_user.c:726: error: implicit declaration of function 'personality'
fs/notify/fanotify/fanotify_user.c:726: error: 'PER_LINUX32' undeclared (first use in this function)
fs/notify/fanotify/fanotify_user.c:726: error: (Each undeclared identifier is reported only once
fs/notify/fanotify/fanotify_user.c:726: error: for each function it appears in.)
Joonsoo Kim [Mon, 23 Jun 2014 20:22:06 +0000 (13:22 -0700)]
slab: fix oops when reading /proc/slab_allocators
Commit b1cb0982bdd6 ("change the management method of free objects of
the slab") introduced a bug on slab leak detector
('/proc/slab_allocators'). This detector works like as following
decription.
1. traverse all objects on all the slabs.
2. determine whether it is active or not.
3. if active, print who allocate this object.
but that commit changed the way how to manage free objects, so the logic
determining whether it is active or not is also changed. In before, we
regard object in cpu caches as inactive one, but, with this commit, we
mistakenly regard object in cpu caches as active one.
This intoduces kernel oops if DEBUG_PAGEALLOC is enabled. If
DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who
corrupt free memory in the slab. It unmaps page table mapping if object
is free and map it if object is active. When slab leak detector check
object in cpu caches, it mistakenly think this object active so try to
access object memory to retrieve caller of allocation. At this point,
page table mapping to this object doesn't exist, so oops occurs.
Following is oops message reported from Dave.
It blew up when something tried to read /proc/slab_allocators
(Just cat it, and you should see the oops below)
To fix the problem, I introduce an object status buffer on each slab.
With this, we can track object status precisely, so slab leak detector
would not access active object and no kernel oops would occur. Memory
overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK
which is mainly used for debugging, so memory overhead isn't big
problem.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:06 +0000 (13:22 -0700)]
shmem: fix faulting into a hole while it's punched
Trinity finds that mmap access to a hole while it's punched from shmem
can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE)
from completing, until the reader chooses to stop; with the puncher's
hold on i_mutex locking out all other writers until it can complete.
It appears that the tmpfs fault path is too light in comparison with its
hole-punching path, lacking an i_data_sem to obstruct it; but we don't
want to slow down the common case.
Extend shmem_fallocate()'s existing range notification mechanism, so
shmem_fault() can refrain from faulting pages into the hole while it's
punched, waiting instead on i_mutex (when safe to sleep; or repeatedly
faulting when not).
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I believe this comes about because, whereas collapsing and splitting THP
functions take anon_vma lock in write mode (which excludes concurrent
rmap walks), faulting THP functions (write protection and misplaced
NUMA) do not - and mostly they do not need to.
But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
an instant (indeed, for a long instant, given the inter-CPU TLB flush in
there), leaves *pmd neither present not trans_huge.
Which can confuse a concurrent rmap walk, as when removing migration
ptes, seen in the dumped trace. Although that rmap walk has a 4k page
to insert, anon_vmas containing THPs are in no way segregated from
4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
that instant when a trans_huge pmd is temporarily absent.
I don't think we need strengthen the locking at the THP end: it's easily
handled with an ACCESS_ONCE() before testing both conditions.
And since mm_find_pmd() had only one caller who wanted a THP rather than
a pmd, let's slightly repurpose it to fail when it hits a THP or
non-present pmd, and open code split_huge_page_address() again.
Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Bob Liu <bob.liu@oracle.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Dave Jones <davej@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()
Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops
in copy_page_rep() called from copy_user_huge_page() called from
do_huge_pmd_wp_page().
I believe this is a DEBUG_PAGEALLOC false positive, due to the source
page being split, and a tail page freed, while copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will
prevent a miscopy from being made visible.
Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to
the usual get_page() and put_page() on head page in the usual config;
but get and put references to all of the tail pages when
DEBUG_PAGEALLOC.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hugh Dickins <hughd@google.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Cc: Dave Jones <davej@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Tomlin [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
kernel/watchdog.c: print traces for all cpus on lockup detection
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog
task, debug information (including a stack trace) is sent to the system
log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which
support NMI, this patch makes it possible to improve soft lockup
reporting. This is accomplished by issuing an NMI to each cpu to obtain
a stack trace.
If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.
[dzickus@redhat.com: add CONFIG_SMP in certain areas]
[akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
[mq@suse.cz: fix warning] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aaron Tomlin [Mon, 23 Jun 2014 20:22:05 +0000 (13:22 -0700)]
nmi: provide the option to issue an NMI back trace to every cpu but current
Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
routine when one wants to avoid capturing a back trace for current. For
instance if one was previously captured recently.
This patch provides a new routine namely
trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
an NMI to every cpu but current and capture a back trace accordingly.
Patch x86 and sparc to support new routine.
[dzickus@redhat.com: add stub in #else clause]
[dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
[sfr@canb.auug.org.au: undo C99ism] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
However, if the percpu_pagelist_fraction sysctl is set by the user, it
is also impossible to restore it to the kernel default since the user
cannot write 0 to the sysctl.
This patch allows the user to write 0 to restore the default behavior.
It still requires a fraction equal to or larger than 8, however, as
stated by the documentation for sanity. If a value in the range [1, 7]
is written, the sysctl will return EINVAL.
This successfully solves the divide by zero issue at the same time.
Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Oleg Drokin <green@linuxhacker.ru> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What happened is after updating the watchdog_thresh, the lockup detector
is restarted to utilize the new value. Part of this process involved
disabling preemption. Once preemption was disabled, perf tried to
allocate a new event (as part of the restart). This caused the above
BUG_ON as you can't sleep with preemption disabled.
The preemption restriction seemed agressive as we are not doing anything
on that particular cpu, but with all the online cpus (which are
protected by the get_online_cpus lock). Remove the restriction and the
BUG_ON goes away.
Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Reported-by: Peter Wu <peter@lekensteyn.nl> Tested-by: Peter Wu <peter@lekensteyn.nl> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.13+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove one inactive maintainer, add two new ones and update my email
address. Plus add Andrew. And fix the glob to include files like
mm/slab_common.c
Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entry
There's a race between fork() and hugepage migration, as a result we try
to "dereference" a swap entry as a normal pte, causing kernel panic.
The cause of the problem is that copy_hugetlb_page_range() can't handle
"swap entry" family (migration entry and hwpoisoned entry) so let's fix
it.
Hugh Dickins [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
tmpfs: ZERO_RANGE and COLLAPSE_RANGE not currently supported
I was well aware of FALLOC_FL_ZERO_RANGE and FALLOC_FL_COLLAPSE_RANGE
support being added to fallocate(); but didn't realize until now that I
had been too stupid to future-proof shmem_fallocate() against new
additions. -EOPNOTSUPP instead of going on to ordinary fallocation.
David Rientjes [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
mm, hotplug: probe interface is available on several platforms
Documentation/memory-hotplug.txt incorrectly states that the memory
driver "probe" interface is only supported on powerpc and is vague about
its application on x86. Clarify the platforms that make this interface
available if memory hotplug is enabled.
Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Mon, 23 Jun 2014 20:22:03 +0000 (13:22 -0700)]
kexec: save PG_head_mask in VMCOREINFO
To allow filtering of huge pages, makedumpfile must be able to identify
them in the dump. This can be done by checking the appropriate page
flag, so communicate its value to makedumpfile through the VMCOREINFO
interface.
There's only one small catch. Depending on how many page flags are
available on a given architecture, this bit can be called PG_head or
PG_compound.
I sent a similar patch back in 2012, but Eric Biederman did not like
using an #ifdef. So, this time I'm adding a common symbol
(PG_head_mask) instead.
See https://lkml.org/lkml/2012/11/28/91 for the previous version.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Shaohua Li <shli@kernel.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Srivatsa S. Bhat [Mon, 23 Jun 2014 20:22:02 +0000 (13:22 -0700)]
CPU hotplug, smp: flush any pending IPI callbacks before CPU offline
There is a race between the CPU offline code (within stop-machine) and
the smp-call-function code, which can lead to getting IPIs on the
outgoing CPU, *after* it has gone offline.
Specifically, this can happen when using
smp_call_function_single_async() to send the IPI, since this API allows
sending asynchronous IPIs from IRQ disabled contexts. The exact race
condition is described below.
During CPU offline, in stop-machine, we don't enforce any rule in the
_DISABLE_IRQ stage, regarding the order in which the outgoing CPU and
the other CPUs disable their local interrupts. Due to this, we can
encounter a situation in which an IPI is sent by one of the other CPUs
to the outgoing CPU (while it is *still* online), but the outgoing CPU
ends up noticing it only *after* it has gone offline.
CPU 1 CPU 2
(Online CPU) (CPU going offline)
Enter _PREPARE stage Enter _PREPARE stage
Enter _DISABLE_IRQ stage
=
Got a device interrupt, and | Didn't notice the IPI
the interrupt handler sent an | since interrupts were
IPI to CPU 2 using | disabled on this CPU.
smp_call_function_single_async() |
=
Enter _DISABLE_IRQ stage
Enter _RUN stage Enter _RUN stage
=
Busy loop with interrupts | Invoke take_cpu_down()
disabled. | and take CPU 2 offline
=
Enter _EXIT stage Enter _EXIT stage
Re-enable interrupts Re-enable interrupts
The pending IPI is noted
immediately, but alas,
the CPU is offline at
this point.
This of course, makes the smp-call-function IPI handler code running on
CPU 2 unhappy and it complains about "receiving an IPI on an offline
CPU".
One real example of the scenario on CPU 1 is the block layer's
complete-request call-path:
However, if we look closely, the block layer does check that the target
CPU is online before firing the IPI. So in this case, it is actually
the unfortunate ordering/timing of events in the stop-machine phase that
leads to receiving IPIs after the target CPU has gone offline.
In reality, getting a late IPI on an offline CPU is not too bad by
itself (this can happen even due to hardware latencies in IPI
send-receive). It is a bug only if the target CPU really went offline
without executing all the callbacks queued on its list. (Note that a
CPU is free to execute its pending smp-call-function callbacks in a
batch, without waiting for the corresponding IPIs to arrive for each one
of those callbacks).
So, fixing this issue can be broken up into two parts:
1. Ensure that a CPU goes offline only after executing all the
callbacks queued on it.
2. Modify the warning condition in the smp-call-function IPI handler
code such that it warns only if an offline CPU got an IPI *and* that
CPU had gone offline with callbacks still pending in its queue.
Achieving part 1 is straight-forward - just flush (execute) all the
queued callbacks on the outgoing CPU in the CPU_DYING stage[1],
including those callbacks for which the source CPU's IPIs might not have
been received on the outgoing CPU yet. Once we do this, an IPI that
arrives late on the CPU going offline (either due to the race mentioned
above, or due to hardware latencies) will be completely harmless, since
the outgoing CPU would have executed all the queued callbacks before
going offline.
Overall, this fix (parts 1 and 2 put together) additionally guarantees
that we will see a warning only when the *IPI-sender code* is buggy -
that is, if it queues the callback _after_ the target CPU has gone
offline.
[1]. The CPU_DYING part needs a little more explanation: by the time we
execute the CPU_DYING notifier callbacks, the CPU would have already
been marked offline. But we want to flush out the pending callbacks at
this stage, ignoring the fact that the CPU is offline. So restructure
the IPI handler code so that we can by-pass the "is-cpu-offline?" check
in this particular case. (Of course, the right solution here is to fix
CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING
notifiers, but this requires a lot of audit to ensure that this change
doesn't break any existing code; hence lets go with the solution
proposed above until that is done).
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Borislav Petkov <bp@suse.de> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sachin Kamat <sachin.kamat@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steven Miao [Mon, 23 Jun 2014 20:22:02 +0000 (13:22 -0700)]
mm: nommu: per-thread vma cache fix
mm could be removed from current task struct, using previous vma->vm_mm
It will crash on blackfin after updated to Linux 3.15. The commit "mm:
per-thread vma caching" caused the crash. mm could be removed from
current task struct before
Given some pathologically compressed data, lz4 could possibly decide to
wrap a few internal variables, causing unknown things to happen. Catch
this before the wrapping happens and abort the decompression.
The lzo decompressor can, if given some really crazy data, possibly
overrun some variable types. Modify the checking logic to properly
detect overruns before they happen.
Reported-by: "Don A. Bailey" <donb@securitymouse.com> Tested-by: "Don A. Bailey" <donb@securitymouse.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Romain Francoise [Thu, 12 Jun 2014 08:42:34 +0000 (10:42 +0200)]
vhost-net: don't open-code kvfree
Commit 23cc5a991c ("vhost-net: extend device allocation to vmalloc")
added another open-coded version of kvfree (which is available since
v3.15-rc5), nuke it.
Signed-off-by: Romain Francoise <romain@orebokech.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Linus Torvalds [Sun, 22 Jun 2014 05:01:15 +0000 (19:01 -1000)]
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c new drivers from Wolfram Sang:
"Here is a pull request from i2c hoping for the "new driver" rule.
Originally, I wanted to send this request during the merge window, but
code checkers with very recent additions complained, so a few fixups
were needed. So, some more time went by and I merged rc1 to get a
stable base"
So the "new driver" rule is really about drivers that people absolutely
need for the kernel to work on new hardware, which is not so much the
case for i2c. So I considered not pulling this, but eventually
relented.
Just for FYI: the whole (and only) point of "new drivers" is not that
new drivers cannot regress things (they can, and they have - by
triggering badly tested code on machines that never triggered that code
before), but because they can bring to life machines that otherwise
wouldn't be useful at all without the drivers.
So the new driver rule is for essential things that actual consumers
would care about, ie devices like networking or disk drivers that matter
to normal people (not server people - they run old kernels anyway, so
mainlining new drivers is irrelevant for them).
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: sun6-p2wi: fix call to snprintf
i2c: rk3x: add NULL entry to the end of_device_id array
i2c: sun6i-p2wi: use proper return value in probe
i2c: sunxi: add P2WI (Push/Pull 2 Wire Interface) controller support
i2c: sunxi: add P2WI DT bindings documentation
i2c: rk3x: add driver for Rockchip RK3xxx SoC I2C adapter
Linus Torvalds [Sun, 22 Jun 2014 02:40:30 +0000 (16:40 -1000)]
Merge tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux
Pull file locking fixes from Jeff Layton:
"File locking related bugfixes
Nothing too earth-shattering here. A fix for a potential regression
due to a patch in pile #1, and the addition of a memory barrier to
prevent a race condition between break_deleg and generic_add_lease"
* tag 'locks-v3.16-2' of git://git.samba.org/jlayton/linux:
locks: set fl_owner for leases back to current->files
locks: add missing memory barrier in break_deleg
Linus Torvalds [Sun, 22 Jun 2014 02:38:16 +0000 (16:38 -1000)]
Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fixes from Michal Marek:
"There are three fixes for regressions caused by the relative paths
series: deb-pkg, tar-pkg and *docs did not work with O=.
Plus, there is a fix for the linux-headers deb package and a fixed
typo. These are not regression fixes but are safe enough"
* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
kbuild: fix a typo in a kbuild document
builddeb: fix missing headers in linux-headers package
Documentation: Fix DocBook build with relative $(srctree)
kbuild: Fix tar-pkg with relative $(objtree)
deb-pkg: Fix for relative paths
Linus Torvalds [Sun, 22 Jun 2014 00:21:43 +0000 (14:21 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"This fixes some lockups in btrfs reported with rc1. It probably has
some performance impact because it is backing off our spinning locks
more often and switching to a blocking lock. I'll be able to nail
that down next week, but for now I want to get the lockups taken care
of.
Otherwise some more stack reduction and assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix wrong error handle when the device is missing or is not writeable
Btrfs: fix deadlock when mounting a degraded fs
Btrfs: use bio_endio_nodec instead of open code
Btrfs: fix NULL pointer crash when running balance and scrub concurrently
btrfs: Skip scrubbing removed chunks to avoid -ENOENT.
Btrfs: fix broken free space cache after the system crashed
Btrfs: make free space cache write out functions more readable
Btrfs: remove unused wait queue in struct extent_buffer
Btrfs: fix deadlocks with trylock on tree nodes
Linus Torvalds [Sun, 22 Jun 2014 00:20:38 +0000 (14:20 -1000)]
Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
"Fixes for a new regression from the xdr encoding rewrite, and a
delegation problem we've had for a while (made somewhat more annoying
by the vfs delegation support added in 3.13)"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: fix bug for readdir of pseudofs
NFSD: Don't hand out delegations for 30 seconds after recalling them.
Linus Torvalds [Sat, 21 Jun 2014 17:07:17 +0000 (07:07 -1000)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"This is larger than usual: the main reason are the ARM symbol lookup
speedups that came in late and were hard to resist.
There's also a kprobes fix and various tooling fixes, plus the minimal
re-enablement of the mmap2 support interface"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/kprobes: Fix build errors and blacklist context_track_user
perf tests: Add test for closing dso objects on EMFILE error
perf tests: Add test for caching dso file descriptors
perf tests: Allow reuse of test_file function
perf tests: Spawn child for each test
perf tools: Add dso__data_* interface descriptons
perf tools: Allow to close dso fd in case of open failure
perf tools: Add file size check and factor dso__data_read_offset
perf tools: Cache dso data file descriptor
perf tools: Add global count of opened dso objects
perf tools: Add global list of opened dso objects
perf tools: Add data_fd into dso object
perf tools: Separate dso data related variables
perf tools: Cache register accesses for unwind processing
perf record: Fix to honor user freq/interval properly
perf timechart: Reflow documentation
perf probe: Improve error messages in --line option
perf probe: Improve an error message of perf probe --vars mode
perf probe: Show error code and description in verbose mode
perf probe: Improve error message for unknown member of data structure
...
Linus Torvalds [Sat, 21 Jun 2014 17:06:02 +0000 (07:06 -1000)]
Merge branch 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull rtmutex fixes from Thomas Gleixner:
"Another three patches to make the rtmutex code more robust. That's
the last urgent fallout from the big futex/rtmutex investigation"
* 'locking-urgent-for-linus.patch' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Plug slow unlock race
rtmutex: Detect changes in the pi lock chain
rtmutex: Handle deadlock detection smarter
Linus Torvalds [Sat, 21 Jun 2014 16:47:01 +0000 (06:47 -1000)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
"A couple of bug fixes, a debug change for qdio, an update for the
default config, and one small extension.
The watchdog module based on diagnose 0x288 is converted to the
watchdog API and it now works under LPAR as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/ccwgroup: use ccwgroup_ungroup wrapper
s390/ccwgroup: fix an uninitialized return code
s390/ccwgroup: obtain extra reference for asynchronous processing
qdio: Keep device-specific dbf entries
s390/compat: correct ucontext layout for high gprs
s390/cio: set device name as early as possible
s390: update default configuration
s390: avoid format strings leaking into names
s390/airq: silence lockdep warning
s390/watchdog: add support for LPAR operation (diag288)
s390/watchdog: use watchdog API
s390/sclp_vt220: Enable ASCII console per default
s390/qdio: replace shift loop by ilog2
s390/cio: silence lockdep warning
s390/uaccess: always load the kernel ASCE after task switch
s390/ap_bus: Make modules parameters visible in sysfs
Linus Torvalds [Sat, 21 Jun 2014 16:45:54 +0000 (06:45 -1000)]
Merge tag 'for-linus' of git://github.com/gxt/linux
Pull UniCore32 bug fixes from Guan Xuetao:
"This includes bugfixes to make unicore32 successfully build under
defconfig, and some changes for allmodconfig (though not finished)"
* tag 'for-linus' of git://github.com/gxt/linux:
unicore32: Remove ARCH_HAS_CPUFREQ config option
UniCore32: Change git tree location information in MAINTAINERS
arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure
arch: unicore32: ksyms: export 'pm_power_off' to avoid compiling failure.
arch: unicore32: ksyms: export additional find_first_*() to avoid compiling failure
arch:unicore32:mm: add devmem_is_allowed() to support STRICT_DEVMEM
unicore32: include: asm: add missing ')' for PAGE_* macros in pgtable.h
arch/unicore32/kernel/setup.c: add generic 'screen_info' to avoid compiling failure
drivers: scsi: mvsas: fix compiling issue by adding 'MVS_' for "enum pci_interrupt_cause"
arch: unicore32: kernel: ksyms: remove 'bswapsi2' and 'muldi3' to avoid compiling failure
arch/unicore32/kernel/ksyms.c: remove 2 export symbols to avoid compiling failure
drivers/rtc/rtc-puv3.c: remove "&dev->" for typo issue MIME-Version: 1.0
drivers/rtc/rtc-puv3.c: use dev_dbg() instead of dev_debug() for typo issue
arch/unicore32/include/asm/io.h: add readl_relaxed() generic definition
arch/unicore32/include/asm/ptrace.h: add generic definition for profile_pc()
arch/unicore32/mm/alignment.c: include "asm/pgtable.h" to avoid compiling error
arch/unicore32/kernel/clock.c: add readl() and writel() for 'PM_' macros
arch/unicore32/kernel/module.c: use __vmalloc_node_range() instead of __vmalloc_area()
arch/unicore32/kernel/ksyms.c: remove several undefined exported symbols
Linus Torvalds [Sat, 21 Jun 2014 16:43:19 +0000 (06:43 -1000)]
Merge tag 'char-misc-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are 3 patches, one a revert of the UIO patch you objected to in
3.16-rc1 and that no one wanted to defend, a w1 driver bugfix, and a
MAINTAINERS update for the vmware balloon driver.
All of these, except for the MAINTAINERS update which just got added,
have been in linux-next just fine"
* tag 'char-misc-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
MAINTAINERS: add entry for VMware Balloon driver
w1: mxc_w1: Fix incorrect "presence" status
Revert "uio: fix vma io range check in mmap"
Linus Torvalds [Sat, 21 Jun 2014 16:42:40 +0000 (06:42 -1000)]
Merge tag 'staging-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are a few fixes for staging and iio drivers that resolve issues
reported in 3.16-rc1.
All have been in linux-next just fine"
* tag 'staging-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
imx-drm: parallel-display: Fix DPMS default state.
staging: android: timed_output: fix use after free of dev
staging: comedi: addi_apci_1564: add addi_watchdog dependency
staging: rtl8723au: Reference correct firmwarefiles with MODULE_FIRMWARE()
staging: rtl8723au: Request correct firmware file for A-cut parts
iio: adc: checking for NULL instead of IS_ERR() in probe
iio: adc: at91: signedness bug in at91_adc_get_trigger_value_by_name()
iio: mxs-lradc: fix divider
iio: Fix endianness issue in ak8975_read_axis()
staging/iio: IIO_SIMPLE_DUMMY_BUFFER neds IIO_BUFFER
twl4030-madc: Request processed values in twl4030_get_madc_conversion
staging: iio: tsl2x7x_core: fix proximity treshold
iio: Fix two mpl3115 issues in measurement conversion
iio: hid-sensors: Get feature report from sensor hub after changing power state
Linus Torvalds [Sat, 21 Jun 2014 16:41:42 +0000 (06:41 -1000)]
Merge tag 'tty-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial bugfixes from Greg KH:
"Here are some tty / serial driver bugfixes for 3.16-rc2 that resolve
some reported issues. The samsung driver build error itself has been
reported by a bunch of people, sorry about that one. The others are
all tiny and everyone seems to like them in linux-next so far"
* tag 'tty-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty/serial: fix 8250 early console option passing to regular console
tty: Correct INPCK handling
serial: Fix IGNBRK handling
serial: samsung: Fix build error
Linus Torvalds [Sat, 21 Jun 2014 16:41:07 +0000 (06:41 -1000)]
Merge tag 'usb-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some USB fixes for 3.16-rc2 that resolve some reported
issues. All of these have been in linux-next for a while with no
problems"
* tag 'usb-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: usbtest: add a timeout for scatter-gather tests
USB: EHCI: avoid BIOS handover on the HASEE E200
usb: fix hub-port pm_runtime_enable() vs runtime pm transitions
usb: quiet peer failure warning, disable poweroff
usb: improve "not suspended yet" message in hub_suspend()
xhci: Fix sleeping with IRQs disabled in xhci_stop_device()
usb: fix ->update_hub_device() vs hdev->maxchild
Steven Rostedt [Tue, 17 Jun 2014 12:59:16 +0000 (08:59 -0400)]
tracing: Add __field_struct macro for TRACE_EVENT()
Currently the __field() macro in TRACE_EVENT is only good for primitive
values, such as integers and pointers, but it fails on complex data types
such as structures or unions. This is because the __field() macro
determines if the variable is signed or not with the test of:
(((type)(-1)) < (type)1)
Unfortunately, that fails when type is a structure.
Since trace events should support structures as fields a new macro
is created for such a case called __field_struct() which acts exactly
the same as __field() does but it does not do the signed type check
and just uses a constant false for that answer.
Cc: Tony Luck <tony.luck@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
tracing: syscall_regfunc() should not skip kernel threads
syscall_regfunc() ignores the kernel threads because "it has no effect",
see cc3b13c1 "Don't trace kernel thread syscalls" which added this check.
However, this means that a user-space task spawned by call_usermodehelper()
will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0.
Remove this check. The unnecessary report from ret_from_fork path mentioned
by cc3b13c1 is no longer possible, see See commit fb45550d76bb5 "make sure
that kernel_thread() callbacks call do_exit() themselves".
A kernel_thread() callback can only return and take the int_ret_from_sys_call
path after do_execve() succeeds, otherwise the kernel will crash. But in this
case it is no longer a kernel thread and thus is needs TIF_SYSCALL_TRACEPOINT.
tracing: Fix syscall_*regfunc() vs copy_process() race
syscall_regfunc() and syscall_unregfunc() should set/clear
TIF_SYSCALL_TRACEPOINT system-wide, but do_each_thread() can race
with copy_process() and miss the new child which was not added to
the process/thread lists yet.
Change copy_process() to update the child's TIF_SYSCALL_TRACEPOINT
under tasklist.
Link: http://lkml.kernel.org/p/20140413185854.GB20668@redhat.com Cc: stable@vger.kernel.org # 2.6.33 Fixes: a871bd33a6c0 "tracing: Add syscall tracepoints" Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Russell King [Fri, 20 Jun 2014 10:23:02 +0000 (11:23 +0100)]
ARM: arm925: ensure assembly sets up writethrough mapping
Commit ca8f0b0a545f ("ARM: ensure C page table setup code follows
assembly code") did what it said on the tin, but some of the older
CPU code omitted the default cache policy from their files. This
results in the kernel running with the caches disabled. Fix this
for ARM925.
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Linus Torvalds [Fri, 20 Jun 2014 04:58:57 +0000 (18:58 -1000)]
Merge tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management fixes from Rafael Wysocki:
"These are fixes mostly (ia64 regression related to the ACPI
enumeration of devices, cpufreq regressions, fix for I2C controllers
included in Intel SoCs, mvebu cpuidle driver fix related to sysfs)
plus additional kernel command line arguments from Kees to make it
possible to build kernel images with hibernation and the kernel
address space randomization included simultaneously, a new ACPI
battery driver quirk for a system with a broken BIOS and a couple of
ACPI core cleanups.
Specifics:
- Fix for an ia64 regression introduced during the 3.11 cycle by a
commit that modified the hardware initialization ordering and made
device discovery fail on some systems.
- Fix for a build problem on systems where the cpufreq-cpu0 driver is
built-in and the cpu-thermal driver is modular from Arnd Bergmann.
- Fix for a recently introduced computational mistake in the
intel_pstate driver that leads to excessive rounding errors from
Doug Smythies.
- Fix for a failure code path in cpufreq_update_policy() that fails
to unlock the locks acquired previously from Aaron Plattner.
- Fix for the cpuidle mvebu driver to use shorter state names which
will prevent the sysfs interface from returning mangled strings.
From Gregory Clement.
- ACPI LPSS driver fix to make sure that the I2C controllers included
in BayTrail SoCs are not held in the reset state while they are
being probed from Mika Westerberg.
- New kernel command line arguments making it possible to build
kernel images with hibernation and kASLR included at the same time
and to select which of them will be used via the command line (they
are still functionally mutually exclusive, though). From Kees
Cook.
- ACPI battery driver quirk for Acer Aspire V5-573G that fails to
send battery status change notifications timely from Alexander
Mezin.
- Two ACPI core cleanups from Christoph Jaeger and Fabian Frederick"
* tag 'pm+acpi-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: mvebu: Fix the name of the states
cpufreq: unlock when failing cpufreq_update_policy()
intel_pstate: Correct rounding in busy calculation
ACPI: use kstrto*() instead of simple_strto*()
ACPI / processor replace __attribute__((packed)) by __packed
ACPI / battery: add quirk for Acer Aspire V5-573G
ACPI / battery: use callback for setting up quirks
ACPI / LPSS: Take I2C host controllers out of reset
x86, kaslr: boot-time selectable with hibernation
PM / hibernate: introduce "nohibernate" boot parameter
cpufreq: cpufreq-cpu0: fix CPU_THERMAL dependency
ACPI / ia64 / sba_iommu: Restore the working initialization ordering
Linus Torvalds [Fri, 20 Jun 2014 04:49:37 +0000 (18:49 -1000)]
Merge tag 'sound-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The significant part here is a few security fixes for ALSA core
control API by Lars. Besides that, there are a few fixes for ASoC
sigmadsp (again by Lars) for building properly, and small fixes for
ASoC rsnd, MMP, PXA and FSL, in addition to a fix for bogus WARNING in
i915/HD-audio binding"
* tag 'sound-3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: control: Make sure that id->index does not overflow
ALSA: control: Handle numid overflow
ALSA: control: Don't access controls outside of protected regions
ALSA: control: Fix replacing user controls
ALSA: control: Protect user controls against concurrent access
drm/i915, HD-audio: Don't continue probing when nomodeset is given
ASoC: fsl: Fix build problem
ASoC: rsnd: fixup index of src/dst mod when capture
ASoC: fsl_spdif: Fix integer overflow when calculating divisors
ASoC: fsl_spdif: Fix incorrect usage of regmap_read()
ASoC: dapm: Make sure register value is in sync with DAPM kcontrol state
ASoC: sigmadsp: Split regmap and I2C support into separate modules
ASoC: MMP audio needs sram support
ASoC: pxa: add I2C dependencies as needed
Linus Torvalds [Fri, 20 Jun 2014 04:40:36 +0000 (18:40 -1000)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"This looks bigger than it is, as one of the nouveau firmware fixes
("drm/gf100-/gr: report class data to host on fwmthd failure")
regenerates a bunch of the firmware files after changing the assembly
by a few lines, without that, its more of a
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (33 commits)
drm: fix uninitialized acquire_ctx fields (v2)
drm/radeon: Fix radeon_irq_kms_pflip_irq_get/put() imbalance
Revert "drm/radeon: remove drm_vblank_get|put from pflip handling"
drm/radeon: improve dvi_mode_valid
drm/radeon: update mode_valid testing for DP
drm/radeon: Use dce5/6 hdmi deep color clock setup also on dce8+
drm/nouveau/disp: fix oops in destructor with headless cards
drm/gf117/i2c: no aux channels on this chipset
drm/nouveau/doc: update the thermal documentation
drm/nouveau/pwr: fix typo in fifo wrap handling
drm/nv50/disp: fix a potential oops in supervisor handling
drm/nouveau/disp/dp: don't touch link config after success
drm/nouveau/kms: reference vblank for crtc during pageflip.
drm/gk104/fb/ram: fixups from an earlier search+replace
drm/nv50/gr: remove an unneeded write while initialising PGRAPH
drm/nv50/gr: fix overlap while zeroing zcull regions
drm/gf100-/gr: report class data to host on fwmthd failure
drm/gk104/ibus: increase various random timeouts
drm/gk104/clk: only touch divider for mode we'll be using
drm/radeon: Bypass hw lut's for > 8 bpc framebuffer scanout.
...
Linus Torvalds [Fri, 20 Jun 2014 03:56:43 +0000 (17:56 -1000)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A smaller collection of fixes for the block core that would be nice to
have in -rc2. This pull request contains:
- Fixes for races in the wait/wakeup logic used in blk-mq from
Alexander. No issues have been observed, but it is definitely a
bit flakey currently. Alternatively, we may drop the cyclic
wakeups going forward, but that needs more testing.
- Some cleanups from Christoph.
- Fix for an oops in null_blk if queue_mode=1 and softirq completions
are used. From me.
- A fix for a regression caused by the chunk size setting. It
inadvertently used max_hw_sectors instead of max_sectors, which is
incorrect, and causes hangs on btrfs multi-disk setups (where hw
sectors apparently isn't set). From me.
- Removal of WQ_POWER_EFFICIENT in the kblockd creation. This was a
recent addition as well, but it actually breaks blk-mq which relies
on strict scheduling. If the workqueue power_efficient mode is
turned on, this breaks blk-mq. From Matias.
- null_blk module parameter description fix from Mike"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: bitmap tag: fix races in bt_get() function
blk-mq: bitmap tag: fix race on blk_mq_bitmap_tags::wake_cnt
blk-mq: bitmap tag: fix races on shared ::wake_index fields
block: blk_max_size_offset() should check ->max_sectors
null_blk: fix softirq completions for queue_mode == 1
blk-mq: merge blk_mq_drain_queue and __blk_mq_drain_queue
blk-mq: properly drain stopped queues
block: remove WQ_POWER_EFFICIENT from kblockd
null_blk: fix name and description of 'queue_mode' module parameter
block: remove elv_abort_queue and blk_abort_flushes
Linus Torvalds [Fri, 20 Jun 2014 03:53:20 +0000 (17:53 -1000)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A first set of bug fixes that didn't make it for the merge window, and
two Kconfig cleanups that still make sense at this point.
Unfortunately, one of the two cleanups caused an unintended change in
the original version, so we had to revert one part of it and do some
more testing to ensure the rest is really fine. There was also a
last-minute rebase of the patches to remove another bad commit"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: use menuconfig for sub-arch menus
ARM: multi_v7_defconfig: re-enable SDHCI drivers
ARM: EXYNOS: Fix compilation warning
ARM: exynos: move sysram info to exynos.c
ARM: dts: Specify the NAND ECC scheme explicitly on Armada 385 DB board
ARM: dts: Specify the NAND ECC scheme explicitly on Armada 375 DB board
ARM: exynos: cleanup kconfig option display
misc: vexpress: fix error handling vexpress_syscfg_regmap_init()
ARM: Remove ARCH_HAS_CPUFREQ config option
ARM: integrator: fix section mismatch problem
ARM: mvebu: DT: fix OpenBlocks AX3-4 RAM size
ARM: samsung: make SAMSUNG_DMADEV optional
remoteproc: da8xx: don't select CMA on no-MMU
bus/arm-cci: add dependency on OF && CPU_V7
ARM: keystone requires ARM_PATCH_PHYS_VIRT
ARM: omap2: fix am43xx dependency on l2x0 cache
Stephen Boyd [Tue, 3 Jun 2014 18:30:03 +0000 (11:30 -0700)]
unicore32: Remove ARCH_HAS_CPUFREQ config option
This config exists entirely to hide the cpufreq menu from the
kernel configuration unless a platform has selected it. Nothing
is actually built if this config is 'Y' and it just leads to more
patches that add a select under a platform Kconfig so that some
other CPUfreq option can be chosen. Let's remove the option so
that we can always enable CPUfreq drivers on unicore32 platforms.
Guan Xuetao [Mon, 26 May 2014 23:53:10 +0000 (07:53 +0800)]
UniCore32: Change git tree location information in MAINTAINERS
UniCore32 git repo has moved to github.
Branch 'unicore32' is used for prepared patches, and automatically merged to linux-next.
Branch 'unicore32-working' is used for development.
Chen Gang [Tue, 27 May 2014 00:08:06 +0000 (08:08 +0800)]
arch: unicore32: ksyms: export '__cpuc_coherent_kern_range' to avoid compiling failure
flush_icache_range() is '__cpuc_coherent_kern_range' under unicore32,
and lkdtm.ko needs it. At present, '__cpuc_coherent_kern_range' is
still used by unicore32, so export it to avoid compiling failure.
The related error (with allmodconfig under unicore32):