Rik van Riel [Sat, 17 May 2014 13:19:25 +0000 (23:19 +1000)]
sysrq: rcu-ify __handle_sysrq
Echoing values into /proc/sysrq-trigger seems to be a popular way to get
information out of the kernel. However, dumping information about
thousands of processes, or hundreds of CPUs to serial console can result
in IRQs being blocked for minutes, resulting in various kinds of cascade
failures.
The most common failure is due to interrupts being blocked for a very long
time. This can lead to things like failed IO requests, and other things
the system cannot easily recover from.
This problem is easily fixable by making __handle_sysrq use RCU instead of
spin_lock_irqsave.
This leaves the warning that RCU grace periods have not elapsed for a long
time, but the system will come back from that automatically.
It also leaves sysrq-from-irq-context when the sysrq keys are pressed, but
that is probably desired since people want that to work in situations
where the system is already hosed.
The callers of register_sysrq_key and unregister_sysrq_key appear to be
capable of sleeping.
Signed-off-by: Rik van Riel <riel@redhat.com> Reported-by: Madper Xie <cxie@redhat.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Sat, 17 May 2014 13:19:24 +0000 (23:19 +1000)]
kernel/kprobes.c: convert printk to pr_foo()
Also fixes some checkpatch warnings
-Static initialization
-Lines over 80 characters
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andy Lutomirski [Sat, 17 May 2014 13:19:24 +0000 (23:19 +1000)]
x86,vdso: fix an OOPS accessing the hpet mapping w/o an hpet
The oops can be triggered in qemu using -no-hpet (but not nohpet) by
reading a couple of pages past the end of the vdso text. This should send
SIGBUS instead of OOPSing.
x86, vdso: Add 32 bit VDSO time support for 32 bit kernel
which is new in 3.15.
This will be fixed separately in 3.15, but that patch will not apply to
tip/x86/vdso. This is the equivalent fix for tip/x86/vdso and,
presumably, 3.16.
Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stefani Seibold <stefani@seibold.net> Cc: <stable@vger.kernel.org> [needs rework for 3.15 and earlier] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Sat, 17 May 2014 13:19:24 +0000 (23:19 +1000)]
rwsem: Support optimistic spinning
We have reached the point where our mutexes are quite fine tuned for a
number of situations. This includes the use of heuristics and optimistic
spinning, based on MCS locking techniques.
Exclusive ownership of read-write semaphores are, conceptually, just about
the same as mutexes, making them close cousins. To this end we need to
make them both perform similarly, and right now, rwsems are simply not up
to it. This was discovered by both reverting commit 4fc3f1d6 (mm/rmap,
migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable)
and similarly, converting some other mutexes (ie: i_mmap_mutex) to rwsems.
This creates a situation where users have to choose between a rwsem and
mutex taking into account this important performance difference.
Specifically, biggest difference between both locks is when we fail to
acquire a mutex in the fastpath, optimistic spinning comes in to play and
we can avoid a large amount of unnecessary sleeping and overhead of moving
tasks in and out of wait queue. Rwsems do not have such logic.
This patch, based on the work from Tim Chen and I, adds support for
write-side optimistic spinning when the lock is contended. It also
includes support for the recently added cancelable MCS locking for
adaptive spinning. Note that is is only applicable to the xadd method,
and the spinlock rwsem variant remains intact.
Allowing optimistic spinning before putting the writer on the wait queue
reduces wait queue contention and provided greater chance for the rwsem to
get acquired. With these changes, rwsem is on par with mutex. The
performance benefits can be seen on a number of workloads. For instance,
on a 8 socket, 80 core 64bit Westmere box, aim7 shows the following
improvements in throughput:
There was also improvement on smaller systems, such as a quad-core x86-64
laptop running a 30Gb PostgreSQL (pgbench) workload for up to +60% in
throughput for over 50 clients. Additionally, benefits were also noticed
in exim (mail server) workloads. When comparing against regular
non-blocking rw locks ([q]rwlock_t), this change proves that it can
outperform them, for instance when studying the popular anon-vma lock:
kernel/watchdog.c: In function `watchdog_timer_fn':
kernel/watchdog.c:368:4: warning: `smp_mb__after_clear_bit' is deprecated (declared at include/linux/bitops.h:48) [-Wdeprecated-declarations]
smp_mb__after_clear_bit();
That code was introduced in commit 90e6b763ca8a5eb739e59489f42d45e13431d157
("kernel/watchdog.c: print traces for all cpus on lockup detection") and then
merged with another branch containing commit febdbfe8a91ce0d11939d4940b592eb0dba8d663 ("arch: Prepare for
smp_mb__{before,after}_atomic()") which deprecates the
smp_mb__after_clear_bit() call in favour of smp_mb__after_atomic().
Signed-off-by: Jan Moskyto Matejka <mq@suse.cz> Acked-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aaron Tomlin <atomlin@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Don Zickus <dzickus@redhat.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Aaron Tomlin [Sat, 17 May 2014 13:19:22 +0000 (23:19 +1000)]
kernel/watchdog.c: print traces for all cpus on lockup detection
A 'softlockup' is defined as a bug that causes the kernel to loop in
kernel mode for more than a predefined period to time, without giving
other tasks a chance to run.
Currently, upon detection of this condition by the per-cpu watchdog task,
debug information (including a stack trace) is sent to the system log.
On some occasions, we have observed that the "victim" rather than the
actual "culprit" (i.e. the owner/holder of the contended resource) is
reported to the user. Often this information has proven to be
insufficient to assist debugging efforts.
To avoid loss of useful debug information, for architectures which support
NMI, this patch makes it possible to improve soft lockup reporting. This
is accomplished by issuing an NMI to each cpu to obtain a stack trace.
If NMI is not supported we just revert back to the old method. A sysctl
and boot-time parameter is available to toggle this feature.
[dzickus@redhat.com: add CONFIG_SMP in certain areas] Signed-off-by: Aaron Tomlin <atomlin@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Davidlohr Bueso [Sat, 17 May 2014 13:19:22 +0000 (23:19 +1000)]
blackfin/ptrace: call find_vma with the mmap_sem held
Performing vma lookups without taking the mm->mmap_sem is asking for
trouble. While doing the search, the vma in question can be modified or
even removed before returning to the caller. Take the lock (shared) in
order to avoid races while iterating through the vmacache and/or rbtree.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Miao <realmz6@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rtc: s5m: consolidate two device type switch statements
In probe the configuration of driver for different chipsets was done in
two switch (pdata->device_type) statements. Consolidate them into one
switch statement to increase code readability.
Additionally check the return value of regmap_irq_get_virq and exit probe
on error.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add support for S2MPS14 to the rtc-s5m driver. Differences in S2MPS14
(in comparison to S5M8767):
- Layout of registers;
- Lack of century support for time and alarms (7 registers used for
storing time/alarm);
- Two buffer control registers: WUDR and RUDR;
- No register for enabling writing time;
- RTC interrupts are reported in main PMIC I2C device;
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding support for S2MPS14 RTC device to the
rtc-s5m driver:
1. Add a map of registers used by the driver which differ between
the chipsets (S5M876X and S2MPS14).
2. Move code of checking for alarm pending to separate function.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Set the time needed for updating alarm and time registers to 0.45 ms.
The default is 7.32 ms which is too long and leads to warnings when
setting alarm or time:
s5m-rtc: waiting for UDR update, reached max number of retries
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
rtc: s5m: remove undocumented time init on first boot
Remove the code for initializing time if this is first boot.
The code for detecting first boot uses undocumented field RTC_TCON in
RTC_UDR_CON register. According to S5M8767's datasheet this field is
reserved. On S2MPS14 it is not documented at all. On device first boot
the registers will be initialized with reset value (2000-01-01 00:00:00).
The code might work on S5M8763 but still this does not look like a task
for RTC driver.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding support for S2MPS14 RTC device to the rtc-s5m driver:
1. Rename SEC* symbols to S5M.
2. Add S5M prefix to some of defines which are different between S5M876X
and S2MPS14.
This is only a rename-like patch, new code is not added.
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Lee Jones <lee.jones@linaro.org> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Sangbeom Kim <sbkim73@samsung.com> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Sat, 17 May 2014 13:19:15 +0000 (23:19 +1000)]
lib/test_bpf.c: don't use gcc union shortcut
Older gcc's (mine is gcc-4.4.4) make a mess of this.
lib/test_bpf.c:74: error: unknown field 'insns' specified in initializer
lib/test_bpf.c:75: warning: missing braces around initializer
lib/test_bpf.c:75: warning: (near initialization for 'tests[0].<anonymous>.insns[0]')
lib/test_bpf.c:76: error: extra brace group at end of initializer
lib/test_bpf.c:76: error: (near initialization for 'tests[0].<anonymous>')
lib/test_bpf.c:76: warning: excess elements in union initializer
lib/test_bpf.c:76: warning: (near initialization for 'tests[0].<anonymous>')
lib/test_bpf.c:77: error: extra brace group at end of initializer
Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Sat, 17 May 2014 13:19:15 +0000 (23:19 +1000)]
mm/page_io.c: work around gcc bug
gcc-4.4.4 (at least) screws up this initialization.
mm/page_io.c: In function '__swap_writepage':
mm/page_io.c:277: error: unknown field 'bvec' specified in initializer
mm/page_io.c:278: warning: excess elements in struct initializer
mm/page_io.c:278: warning: (near initialization for 'from')