Lai Jiangshan [Thu, 22 May 2014 00:44:09 +0000 (10:44 +1000)]
idr: don't need to shink the free list when idr_remove()
After the idr subsystem was made RCU-aware, a freed layer no longer goes
to the free list, so idr_remove() cannot fill the free list up. There is
therefore no need to shrink it there.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lai Jiangshan [Thu, 22 May 2014 00:44:08 +0000 (10:44 +1000)]
idr: fix idr_replace()'s returned error code
When a small id is not found, idr_replace() returns -ENOENT. But when the
id is large enough, idr_replace() returns -EINVAL, even though there is no
real difference between these two kinds of id.
Both are unallocated ids, so idr_replace() should return the same value
for them: -ENOENT.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lai Jiangshan [Thu, 22 May 2014 00:44:08 +0000 (10:44 +1000)]
idr: fix NULL pointer dereference when ida_remove(unallocated_id)
If the ida has at least one existing id and an unallocated ID that meets a
certain condition is passed to ida_remove(), the system crashes with a
NULL pointer dereference.
The condition is that the unallocated ID shares the same lowest idr layer
with the existing ID, but the idr slot would be different if the
unallocated ID were to be allocated.
In this case the matching idr slot for the unallocated_id is NULL, so
@bitmap is NULL, and the function dereferences it without checking,
crashing the kernel.
See the test code:
static void test3(void)
{
	int id;
	DEFINE_IDA(test_ida);

	printk(KERN_INFO "Start test3\n");
	if (ida_pre_get(&test_ida, GFP_KERNEL) < 0)
		return;
	if (ida_get_new(&test_ida, &id) < 0)
		return;
	ida_remove(&test_ida, 4000); /* bug: NULL dereference here */
	printk(KERN_INFO "End of test3\n");
}
It happens only when the caller tries to free an unallocated ID which is
the caller's fault. It is not a bug. But it is better to add the proper
check and complain rather than crashing the kernel.
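The kind of check described above can be sketched in userspace C. This is only an illustration, not the kernel's ida code: the toy_* names, the slot count, and the bitmap width are all hypothetical. The point is that a lookup in a sparse slot table must treat a NULL slot as "never allocated" instead of dereferencing it.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_SLOTS_PER_LAYER 8   /* hypothetical, not the kernel's value */
#define TOY_BITS_PER_SLOT   64  /* hypothetical bitmap width */

struct toy_bitmap { unsigned long long bits; };
struct toy_layer  { struct toy_bitmap *slot[TOY_SLOTS_PER_LAYER]; };

/* Returns 0 on success, -ENOENT if the id was never allocated. */
static int toy_remove(struct toy_layer *layer, unsigned int id)
{
	unsigned int idx = id / TOY_BITS_PER_SLOT;
	unsigned int off = id % TOY_BITS_PER_SLOT;
	struct toy_bitmap *bitmap;

	if (idx >= TOY_SLOTS_PER_LAYER)
		return -ENOENT;

	bitmap = layer->slot[idx];
	if (bitmap == NULL)             /* the check the crash was missing */
		return -ENOENT;
	if (!(bitmap->bits & (1ULL << off)))
		return -ENOENT;         /* slot exists, but this bit is free */

	bitmap->bits &= ~(1ULL << off);
	return 0;
}
```

With one bitmap installed in slot 0, removing an id that maps to slot 1 returns -ENOENT instead of crashing.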
[tj@kernel.org: updated patch description] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
It happens only when the caller tries to free an unallocated ID which is
the caller's fault. It is not a bug. But it is better to add the proper
check and complain rather than removing an existing_id silently.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lai Jiangshan [Thu, 22 May 2014 00:44:08 +0000 (10:44 +1000)]
idr: fix overflow bug during maximum ID calculation at maximum height
idr_replace() open-codes the logic to calculate the maximum valid ID given
the height of the idr tree; unfortunately, the open-coded logic doesn't
account for the fact that the top layer may have unused slots and
over-shifts the limit to zero when the tree is at its maximum height.
The following test code shows it fails to replace the value for
id=((1<<27)+42):
static void test5(void)
{
int id;
DEFINE_IDR(test_idr);
#define TEST5_START ((1<<27)+42) /* use the highest layer */
Fix the bug by using idr_max() which correctly takes into account the
maximum allowed shift.
sub_alloc() shares the same problem and may incorrectly fail with -EAGAIN;
however, this bug doesn't affect correct operation because
idr_get_empty_slot(), which already uses idr_max(), retries with the
increased @id in such cases.
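A minimal userspace sketch of the over-shift, assuming 8 bits per layer and a 32-bit int (both assumptions for illustration; the toy_* names are hypothetical). Shifting an int by its full width is undefined in C, so the naive limit is emulated here with 64-bit arithmetic truncated to 32 bits, which shows the limit collapsing to zero at four layers. The clamped variant mirrors the idea of capping the shift, as the description says idr_max() does.

```c
#include <limits.h>

#define TOY_IDR_BITS  8                                  /* assumed bits per layer */
#define TOY_MAX_SHIFT ((int)(sizeof(int) * CHAR_BIT) - 1) /* 31 with 32-bit int */

/*
 * Emulates the open-coded limit "1 << (layers * bits)" as it would look
 * after truncation to 32 bits. Valid here for layers <= 7.
 */
static unsigned int toy_naive_limit(int layers)
{
	return (unsigned int)(1ULL << (layers * TOY_IDR_BITS));
}

/* Clamp the shift first, so the result never wraps. */
static int toy_idr_max(int layers)
{
	int bits = layers * TOY_IDR_BITS;

	if (bits > TOY_MAX_SHIFT)
		bits = TOY_MAX_SHIFT;
	return (int)((1U << bits) - 1U);
}
```

Under these assumptions the id (1<<27)+42 needs four layers: the naive limit for four layers is 0, so every id looks out of range, while the clamped maximum is INT_MAX.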
[tj@kernel.org: Updated patch description.] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Petr Tesarik [Thu, 22 May 2014 00:44:07 +0000 (10:44 +1000)]
kexec: save PG_head_mask in VMCOREINFO
To allow filtering of huge pages, makedumpfile must be able to identify
them in the dump. This can be done by checking the appropriate page flag,
so communicate its value to makedumpfile through the VMCOREINFO interface.
There's only one small catch. Depending on how many page flags are
available on a given architecture, this bit can be called PG_head or
PG_compound.
I sent a similar patch back in 2012, but Eric Biederman did not like using
an #ifdef. So, this time I'm adding a common symbol (PG_head_mask)
instead.
See https://lkml.org/lkml/2012/11/28/91 for the previous version.
Signed-off-by: Petr Tesarik <ptesarik@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Shaohua Li <shli@kernel.org> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Masami Hiramatsu [Thu, 22 May 2014 00:44:07 +0000 (10:44 +1000)]
kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
Add a "crash_kexec_post_notifiers" boot option to run kdump after running
the panic notifiers and dumping kmsg. This can help in rare situations
where kdump fails because the crashed kernel is unstable, because of a
hardware failure (memory corruption of critical data/code), or because the
2nd kernel has already been broken by the 1st kernel (that is broken
behavior, but who can guarantee that the "crashed" kernel works
correctly?).
Usage: add "crash_kexec_post_notifiers" to the kernel boot options.
Note that this actually increases the risk of kdump failing.
Set this option only if you are more concerned about the rare case
where kdump fails than about maximizing its chance of success.
Srivatsa S. Bhat [Thu, 22 May 2014 00:44:06 +0000 (10:44 +1000)]
CPU hotplug, smp: Flush any pending IPI callbacks before CPU offline
During CPU offline, in the stop-machine loop, we use 2 separate stages to
disable interrupts, to ensure that the CPU going offline doesn't get any
new IPIs from the other CPUs after it has gone offline.
However, an IPI sent much earlier might arrive late on the target CPU
(possibly _after_ the CPU has gone offline) due to hardware latencies, and
due to this, the smp-call-function callbacks queued on the outgoing CPU
might not get noticed (and hence not executed) at all.
This is somewhat theoretical, but in any case, it makes sense to
explicitly loop through the call_single_queue and flush any pending
callbacks before the CPU goes completely offline. So, flush the queued
smp-call-function callbacks in the MULTI_STOP_DISABLE_IRQ_ACTIVE stage,
after disabling interrupts on the active CPU. This can be trivially
achieved by invoking the generic_smp_call_function_single_interrupt()
function itself (and since the outgoing CPU is still online at this point,
we won't trigger the "IPI to offline CPU" warning in this function; so we
are safe to call it here).
This way, we would have handled all the queued callbacks before going
offline, and also, no new IPIs can be sent by the other CPUs to the
outgoing CPU at that point, because they will all be executing the
stop-machine code with interrupts disabled.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Srivatsa S. Bhat [Thu, 22 May 2014 00:44:06 +0000 (10:44 +1000)]
CPU hotplug, stop-machine: plug race-window that leads to "IPI-to-offline-CPU"
During CPU offline, stop-machine is used to take control over all the
online CPUs (via the per-cpu stopper thread) and then run take_cpu_down()
on the CPU that is to be taken offline.
But stop-machine itself has several stages: _PREPARE, _DISABLE_IRQ, _RUN
etc. The important thing to note here is that the _DISABLE_IRQ stage
comes much later after starting stop-machine, and hence there is a large
window where other CPUs can send IPIs to the CPU going offline. As a
result, we can encounter a scenario as depicted below, which causes IPIs
to be sent to the CPU going offline, and that CPU notices them *after* it
has gone offline, triggering the "IPI-to-offline-CPU" warning from the
smp-call-function code.
       CPU 1                                  CPU 2
   (Online CPU)                        (CPU going offline)

   Enter _PREPARE stage                Enter _PREPARE stage

                                       Enter _DISABLE_IRQ stage

                                     =
   Got a device interrupt,           | Didn't notice the IPI
   and the interrupt handler         | since interrupts were
   called smp_call_function()        | disabled on this CPU.
   and sent an IPI to CPU 2.         |
                                     =

   Enter _DISABLE_IRQ stage

   Enter _RUN stage                    Enter _RUN stage

                              =
   Busy loop with interrupts  |        Invoke take_cpu_down()
   disabled.                  |        and take CPU 2 offline
                              =

   Enter _EXIT stage                   Enter _EXIT stage

   Re-enable interrupts                Re-enable interrupts

                                       The pending IPI is noted
                                       immediately, but alas,
                                       the CPU is offline at
                                       this point.
So, as we can observe from this scenario, the IPI was sent when CPU 2 was
still online, and hence it was perfectly legal. But unfortunately it was
noted only after CPU 2 went offline, resulting in the warning from the IPI
handling code. In other words, the fault was not at the sender, but at
the receiver side - and if we look closely, the real bug is in the
stop-machine sequence itself.
The problem here is that the CPU going offline disabled its local
interrupts (by entering _DISABLE_IRQ phase) *before* the other CPUs. And
that's the reason why it was not able to respond to the IPI before going
offline.
A simple solution to this problem is to ensure that the CPU going offline
disables its interrupts only *after* the other CPUs do the same thing. To
achieve this, split the _DISABLE_IRQ state into 2 parts:
1st part: MULTI_STOP_DISABLE_IRQ_INACTIVE, where only the non-active CPUs
(i.e., the "other" CPUs) disable their interrupts.
2nd part: MULTI_STOP_DISABLE_IRQ_ACTIVE, where the active CPU (i.e., the
CPU going offline) disables its interrupts.
With this in place, the CPU going offline will always be the last one to
disable interrupts. After this step, no further IPIs can be sent to the
outgoing CPU, since all the other CPUs would be executing the stop-machine
code with interrupts disabled. And by the time stop-machine ends, the CPU
would have gone offline and disappeared from the cpu_online_mask, and
hence future invocations of smp_call_function() and friends will
automatically prune that CPU out. Thus, we can guarantee that no CPU will
end up *inadvertently* sending IPIs to an offline CPU.
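The ordering guarantee of the split state can be sketched as a tiny state table. This is a userspace illustration only: the TOY_* names are hypothetical stand-ins for the MULTI_STOP_* states named above, and no interrupts are actually touched.

```c
#include <stdbool.h>

enum toy_stop_state {
	TOY_STOP_PREPARE,
	TOY_STOP_DISABLE_IRQ_INACTIVE,  /* the "other" CPUs disable IRQs */
	TOY_STOP_DISABLE_IRQ_ACTIVE,    /* the outgoing CPU disables IRQs */
	TOY_STOP_RUN,
	TOY_STOP_EXIT,
};

/* Does a CPU disable its interrupts on entering this state? */
static bool toy_disables_irq(enum toy_stop_state state, bool is_active)
{
	switch (state) {
	case TOY_STOP_DISABLE_IRQ_INACTIVE:
		return !is_active;
	case TOY_STOP_DISABLE_IRQ_ACTIVE:
		return is_active;
	default:
		return false;
	}
}

/* First state in which the given kind of CPU has interrupts disabled. */
static enum toy_stop_state toy_irq_off_state(bool is_active)
{
	enum toy_stop_state s;

	for (s = TOY_STOP_PREPARE; s < TOY_STOP_EXIT; s++)
		if (toy_disables_irq(s, is_active))
			return s;
	return TOY_STOP_EXIT;
}
```

Because the states are entered in order, toy_irq_off_state(false) always precedes toy_irq_off_state(true): the outgoing CPU is the last one to disable interrupts.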
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Srivatsa S. Bhat [Thu, 22 May 2014 00:44:05 +0000 (10:44 +1000)]
smp: print more useful debug info upon receiving IPI on an offline CPU
There is a longstanding problem related to CPU hotplug which causes IPIs
to be delivered to offline CPUs, and the smp-call-function IPI handler
code prints out a warning whenever this is detected. Every once in a
while this (usually harmless) warning gets reported on LKML, but so far it
has not been completely fixed. Usually the solution involves finding out
the IPI sender and fixing it by adding appropriate synchronization with
CPU hotplug.
However, while going through one such internal bug report, I found that
there is a significant bug in the receiver side itself (more specifically,
in stop-machine) that can lead to this problem even when the sender code
is perfectly fine. This patchset fixes that synchronization problem in
the CPU hotplug stop-machine code.
Patch 1 adds some additional debug code to the smp-call-function
framework, to help debug such issues easily.
Patch 2 modifies the stop-machine code to ensure that any IPIs that were
sent while the target CPU was online, would be noticed and handled by that
CPU without fail before it goes offline. Thus, this avoids scenarios
where IPIs are received on offline CPUs (as long as the sender uses proper
hotplug synchronization).
In fact, I debugged the problem by using Patch 1, and found that the
payload of the IPI was always the block layer's trigger_softirq()
function. But I was not able to find anything wrong with the block layer
code. That's when I started looking at the stop-machine code and realized
that there is a race-window which makes the IPI _receiver_ the culprit,
not the sender. Patch 2 fixes that race and hence this should put an end
to most of the hard-to-debug IPI-to-offline-CPU issues.
This patch (of 2):
Today the smp-call-function code just prints a warning if we get an IPI on
an offline CPU. This info is sufficient to let us know that something
went wrong, but often it is very hard to debug exactly who sent the IPI
and why, from this info alone.
In most cases, we get the warning about the IPI to an offline CPU,
immediately after the CPU going offline comes out of the stop-machine
phase and reenables interrupts. Since all online CPUs participate in
stop-machine, the information regarding the sender of the IPI is already
lost by the time we exit the stop-machine loop. So even if we dump the
stack on each CPU at this point, we won't find anything useful since all
of them will show the stack-trace of the stopper thread. So we need a
better way to figure out who sent the IPI and why.
To achieve this, when we detect an IPI targeted to an offline CPU, loop
through the call-single-data linked list and print out the payload (i.e.,
the name of the function which was supposed to be executed by the target
CPU). This would give us an insight as to who might have sent the IPI and
help us debug this further.
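The reporting loop can be sketched in userspace as a walk over a singly linked list of pending callbacks. Everything here is hypothetical (struct toy_csd, its fields, the message text). In particular, the stored name string is an assumption standing in for whatever symbol lookup the kernel uses to print a function's name.

```c
#include <stddef.h>
#include <stdio.h>

struct toy_csd {
	struct toy_csd *next;
	void (*func)(void *info);  /* the payload that was supposed to run */
	const char *name;          /* stand-in for a symbol-name lookup */
};

/* Print every queued payload; returns how many entries were pending. */
static int toy_report_pending(const struct toy_csd *head, FILE *out)
{
	int n = 0;

	for (; head != NULL; head = head->next) {
		fprintf(out, "IPI callback %s sent to offline CPU\n",
			head->name);
		n++;
	}
	return n;
}
```

Naming the payload is what points at the sender: each queued function belongs to a specific subsystem, even when every CPU's stack only shows the stopper thread.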
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <mgalbraith@suse.de> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:04 +0000 (10:44 +1000)]
signals: introduce kernel_sigaction()
Now that allow_signal() is really trivial we can unify it with
disallow_signal(). Add the new helper, kernel_sigaction(), and
reimplement allow_signal/disallow_signal as trivial wrappers.
This saves one EXPORT_SYMBOL() and the new helper can have more users.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:04 +0000 (10:44 +1000)]
signals: disallow_signal() should flush the potentially pending signal
disallow_signal() simply sets SIG_IGN. This is not enough, and
recalc_sigpending() is pointless because it can never change the
state of TIF_SIGPENDING.
If we ignore a signal, we also need to do flush_sigqueue_mask() for the
case when this signal is pending. This way recalc_sigpending() can
actually clear TIF_SIGPENDING and we do not "leak" the allocated
siginfo's.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:03 +0000 (10:44 +1000)]
signals: kill the obsolete sigdelset() and recalc_sigpending() in allow_signal()
allow_signal() does sigdelset(current->blocked) for historical reasons:
previously it could be called by a daemonize()'ed kthread, and daemonize()
played with current->blocked.
Now that daemonize() has gone away we can remove sigdelset() and
recalc_sigpending(). If a user really wants to unblock a signal, it must
use sigprocmask() or set_current_blocked() explicitly.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:03 +0000 (10:44 +1000)]
signals: jffs2: fix the wrong usage of disallow_signal()
jffs2_garbage_collect_thread() does disallow_signal(SIGHUP) around
jffs2_garbage_collect_pass() and the comment says "We don't want SIGHUP to
interrupt us".
But disallow_signal() can't ensure that jffs2_garbage_collect_pass() won't
be interrupted by SIGHUP. The problem is that SIGHUP can already be
pending when disallow_signal() is called, and in this case any
interruptible sleep won't block.
Note: this is in fact because disallow_signal() is buggy and should be
fixed, see the next changes.
But there is another reason why disallow_signal() is wrong: SIG_IGN set by
disallow_signal() silently discards any SIGHUP which can be sent before
the next allow_signal(SIGHUP).
Change this code to use sigprocmask(SIG_UNBLOCK/SIG_BLOCK, SIGHUP). This
even matches the old (and wrong) semantics allow/disallow had when this
logic was written.
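The difference between blocking and ignoring can be shown with a userspace analogy. This sketch uses the POSIX sigprocmask(), which shares its name with the in-kernel helper but not its signature, so it is not the jffs2 code. The helper name is hypothetical. A blocked signal is held as pending rather than thrown away, which is the property the change above relies on.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Block sig, raise it, and report whether it is still pending.
 * The pending instance is then discarded and the previous signal
 * state restored, so the caller is never interrupted.
 */
static bool toy_blocked_signal_stays_pending(int sig)
{
	sigset_t set, old, pending;
	struct sigaction ign, prev;
	bool was_pending;

	sigemptyset(&set);
	sigaddset(&set, sig);
	sigprocmask(SIG_BLOCK, &set, &old);

	raise(sig);                     /* held pending, neither run nor lost */
	sigpending(&pending);
	was_pending = sigismember(&pending, sig) == 1;

	/* Clean up: setting SIG_IGN discards the pending instance. */
	ign.sa_handler = SIG_IGN;
	ign.sa_flags = 0;
	sigemptyset(&ign.sa_mask);
	sigaction(sig, &ign, &prev);
	sigaction(sig, &prev, NULL);
	sigprocmask(SIG_SETMASK, &old, NULL);

	return was_pending;
}
```

The cleanup step also shows the other half of the argument: switching the disposition to SIG_IGN silently drops a signal that was already queued.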
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:03 +0000 (10:44 +1000)]
signals: mv {dis,}allow_signal() from sched.h/exit.c to signal.[ch]
Move the declaration/definition of allow_signal/disallow_signal to
signal.h/signal.c. The new place is more logical and allows the static
helpers in signal.c to be used (see the next changes).
While at it, make them return void and remove the valid_signal() check.
Nobody checks the returned value, and in-kernel users must not pass the
wrong signal number.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:03 +0000 (10:44 +1000)]
signals: cleanup the usage of t/current in do_sigaction()
The usage of "task_struct *t" and "current" in do_sigaction() looks really
annoying and chaotic. Initially "t" is used as a cached value of current
but not consistently, then it is reused as a loop variable and we have to
use "current" again.
Clean up this mess and also convert the code to use for_each_thread().
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:44:02 +0000 (10:44 +1000)]
signals: rename rm_from_queue_full() to flush_sigqueue_mask()
"rm_from_queue_full" looks ugly and misleading, especially now that
rm_from_queue() has gone away. Rename it to flush_sigqueue_mask(), this
matches flush_sigqueue() we already have.
Also remove the obsolete comment which explains the difference with
rm_from_queue() we already killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Matthew Dempsky [Thu, 22 May 2014 00:44:01 +0000 (10:44 +1000)]
ptrace: fix fork event messages across pid namespaces
When tracing a process in another pid namespace, it's important for fork
event messages to contain the child's pid as seen from the tracer's pid
namespace, not the parent's. Otherwise, the tracer won't be able to
correlate the fork event with later SIGTRAP signals it receives from the
child.
We still risk a race condition if a ptracer from a different pid namespace
attaches after we compute the pid_t value. However, sending a bogus fork
event message in this unlikely scenario is still a vast improvement over
the status quo where we always send bogus fork event messages to debuggers
in a different pid namespace than the forking process.
Signed-off-by: Matthew Dempsky <mdempsky@chromium.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Julien Tinnes <jln@chromium.org> Cc: Roland McGrath <mcgrathr@chromium.org> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Jacob Keller [Thu, 22 May 2014 00:44:01 +0000 (10:44 +1000)]
Documentation/SubmittingPatches: describe the Fixes: tag
Update the SubmittingPatches process to describe how to use the new
'Fixes:' tag when a patch fixes an issue in a previous commit (found by
git-bisect, for example).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/fat/inode.c: clean up string initializations (char[] instead of char *)
Initializations like 'char *foo = "bar"' will create two variables: a
static string and a pointer (foo) to that static string. Instead 'char
foo[] = "bar"' will declare a single variable and will end up in shorter
assembly (according to Jeff Garzik on the KernelJanitor's TODO list).
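A small sketch of the two declaration forms, with hypothetical names. It only shows the object-size difference that standard C guarantees. Whether the generated assembly is actually shorter depends on the compiler and is not demonstrated here.

```c
#include <stddef.h>

/* Two objects: a pointer variable, plus the string literal it points to. */
static const char *toy_ptr = "bar";

/* One object: the array itself holds the characters and the final NUL. */
static const char toy_arr[] = "bar";

static size_t toy_ptr_object_size(void)
{
	return sizeof(toy_ptr);         /* size of a pointer */
}

static size_t toy_arr_object_size(void)
{
	return sizeof(toy_arr);         /* 4: 'b', 'a', 'r', '\0' */
}
```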
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conrad Meyer [Thu, 22 May 2014 00:44:00 +0000 (10:44 +1000)]
fs/fat/: add support for DOS 1.x formatted volumes
Add structure for parsed BPB information, struct fat_bios_param_block, and
move all of the deserialization and validation logic from fat_fill_super()
into fat_read_bpb().
Add a 'dos1xfloppy' mount option to infer DOS 2.x BIOS Parameter Block
defaults from block device geometry for ancient floppies and floppy
images, as a fall-back from the default BPB parsing logic.
When fat_read_bpb() finds an invalid FAT filesystem and dos1xfloppy is
set, fall back to fat_read_static_bpb(). fat_read_static_bpb() validates
that the entire BPB is zero, and that the floppy has a DOS-style 8086 code
bootstrapping header. Then it fills in default BPB values from media size
and a table.[0]
Media size is assumed to be static for archaic FAT volumes. See also:
[1].
Sougata Santra [Thu, 22 May 2014 00:43:59 +0000 (10:43 +1000)]
hfsplus: fix longname handling
Long names are not handled correctly by the hfsplus driver. An attempt to
create a file/directory with a long name (>255) succeeds, creating a
file/directory with HFSPLUS_MAX_STRLEN and an incorrect catalog key, and
thus leaves the volume in an inconsistent state. This patch fixes this
issue.
Lookup is always called first to create a negative entry, so just
doing a check in lookup would probably fix this issue. I chose to
propagate the error to the other iops as well.
Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
hfsplus_cat_build_key, to avoid unnecessary branching.
mkdir $dir
cd $dir
touch $name255
rm -f $name255
touch $name256
ls -la
cd $cdir
rm -rf $dir
RESULT:
-------
[sougata@ultrabook tmp]$ cdir=`pwd`
[sougata@ultrabook tmp]$
name255="_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
> _123456789_123456789_123456789_123456789_123456789_1234"
[sougata@ultrabook tmp]$ name256="${name255}5"
[sougata@ultrabook tmp]$
[sougata@ultrabook tmp]$ mkdir $dir
[sougata@ultrabook tmp]$ cd $dir
[sougata@ultrabook TEST_DIR]$ touch $name255
[sougata@ultrabook TEST_DIR]$ rm -f $name255
[sougata@ultrabook TEST_DIR]$ touch $name256
[sougata@ultrabook TEST_DIR]$ ls -la
ls: cannot access
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
No such file or directory
total 0
drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
drwxrwxrwx 1 root root 6 Feb 20 19:56 ..
-????????? ? ? ? ? ?
_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
[sougata@ultrabook TEST_DIR]$ cd $cdir
[sougata@ultrabook tmp]$ rm -rf $dir
rm: cannot remove `TEST_DIR': Directory not empty
-ENAMETOOLONG returned from hfsplus_asc2uni was not propagated to iops.
This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
and incorrect keys, leaving the FS in an inconsistent state. This patch
fixes this issue.
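The propagation pattern can be sketched in userspace as follows. The toy_* names and the byte-based length check are assumptions for illustration. The real driver converts names to Unicode before building a catalog key, which this sketch does not attempt.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_STRLEN 255  /* name length limit quoted above */

struct toy_key {
	size_t len;
	char name[TOY_MAX_STRLEN + 1];
};

/* Fails instead of silently truncating an over-long name. */
static int toy_build_key(struct toy_key *key, const char *name)
{
	size_t len = strlen(name);

	if (len > TOY_MAX_STRLEN)
		return -ENAMETOOLONG;
	memcpy(key->name, name, len + 1);
	key->len = len;
	return 0;
}

/* A caller that propagates the error rather than ignoring it. */
static int toy_create(const char *name)
{
	struct toy_key key;
	int err = toy_build_key(&key, name);

	if (err)
		return err;
	/* ... the catalog record would be created with key here ... */
	return 0;
}
```

A 255-character name is accepted and a 256-character name is rejected up front, so no truncated key ever reaches the catalog.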
Signed-off-by: Sougata Santra <sougata@tuxera.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sergei Antonov [Thu, 22 May 2014 00:43:59 +0000 (10:43 +1000)]
hfsplus: fix "unused node is not erased" error
Zero newly allocated extents in the catalog tree if volume attributes tell
us to. Otherwise we risk getting the "unused node is not erased"
error. See kHFSUnusedNodeFix flag in Apple's source code for reference.
There was a previous commit clearing the node when it is freed:
commit 899bed05e9f6bbb21776f9ebd88f5631987f987a
Author: Vyacheslav Dubeyko <slava@dubeyko.com>
Date: Wed Feb 27 17:03:06 2013 -0800
hfsplus: fix issue with unzeroed unused b-tree nodes
It did not handle newly allocated extents (this patch fixes it). And it zeroed
nodes in all trees unconditionally, which is overkill. This patch adds a
condition and also switches to 'tree->node_size' as a simpler method of getting
the length to zero.
Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Kyle Laracey <kalaracey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sergei Antonov [Thu, 22 May 2014 00:43:58 +0000 (10:43 +1000)]
hfsplus: emit proper file type from readdir
hfsplus_readdir() incorrectly returned DT_REG for symbolic links and
special files. Return DT_REG, DT_LNK, DT_FIFO, DT_CHR, DT_BLK, DT_SOCK,
or DT_UNKNOWN according to mode field in catalog record. Programs relying
on information from readdir will now work correctly with HFS+.
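A userspace sketch of a mode-to-d_type mapping, using the standard S_IS*() macros and the DT_* constants from <dirent.h>. The function name is hypothetical, and it covers only the types listed above. It is not the driver's actual code.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE         /* exposes DT_* on glibc */
#endif
#include <dirent.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Map the file-type bits of a mode to a readdir d_type value. */
static unsigned char toy_mode_to_dtype(mode_t mode)
{
	if (S_ISREG(mode))
		return DT_REG;
	if (S_ISLNK(mode))
		return DT_LNK;
	if (S_ISFIFO(mode))
		return DT_FIFO;
	if (S_ISCHR(mode))
		return DT_CHR;
	if (S_ISBLK(mode))
		return DT_BLK;
	if (S_ISSOCK(mode))
		return DT_SOCK;
	return DT_UNKNOWN;              /* anything else: let callers stat() */
}
```

Returning DT_UNKNOWN for unrecognized modes is the safe fallback, since readdir consumers are expected to fall back to stat() in that case.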
Signed-off-by: Sergei Antonov <saproj@gmail.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The directory/file catalog b-tree equivalent, hfsplus_build_key_uni(), is
used by hfsplus_find_cat() for internal referencing between catalog
records. There is no corresponding usage for attributes - attribute
records do not refer to one another.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Sougata Santra <sougata@tuxera.com> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/hfsplus/xattr_security.c: In function 'hfsplus_security_getxattr':
fs/hfsplus/xattr_security.c:23: error: 'NLS_MAX_CHARSET_SIZE' undeclared (first use in this function)
fs/hfsplus/xattr_security.c:23: error: (Each undeclared identifier is reported o
Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Sougata Santra <sougata@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
fs/hfsplus/xattr_user.c: In function 'hfsplus_user_getxattr':
fs/hfsplus/xattr_user.c:21: error: 'NLS_MAX_CHARSET_SIZE' undeclared (first use in this function)
fs/hfsplus/xattr_user.c:21: error: (Each undeclared identifier is reported only once
Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Sougata Santra <sougata@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hin-Tak Leung [Thu, 22 May 2014 00:43:57 +0000 (10:43 +1000)]
hfsplus: correct usage of HFSPLUS_ATTR_MAX_STRLEN for non-English attributes
HFSPLUS_ATTR_MAX_STRLEN (=127) is the limit on attribute names, counted
in Unicode characters (UTF-16BE) as stored in the HFS+ file system.
Almost all the current usage of it is wrong in relation to NLS to on-disk
conversion.
Except for one use calling hfsplus_asc2uni (which should stay the same)
and its uses in calling hfsplus_uni2asc (which was corrected in the
earlier patch in this series concerning usage of hfsplus_uni2asc), all the
other uses are of the forms:
Conversion between on-disk unicode representation and NLS char strings (in
whichever direction) always needs to accommodate the worst-case NLS
conversion, so all char buffers of that size need to have a
NLS_MAX_CHARSET_SIZE x multiplier.
The bound checks are all wrong, since they compare nls_length derived from
strlen() to a unicode length limit.
It turns out that all the bound-checks do is to protect hfsplus_asc2uni(),
which can fail if the input is too large. There is only one usage of it
as far as attributes are concerned, in hfsplus_attr_build_key(). It is in
turn used by hfsplus_find_attr(), hfsplus_create_attr(),
hfsplus_delete_attr(). Thus making sure that errors from
hfsplus_asc2uni() are caught in hfsplus_attr_build_key() and propagated is
sufficient to replace all the bound checks.
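A minimal sketch of why a strlen()-based bound check misfires, and of
catching the conversion error instead. Everything here is illustrative:
sketch_asc2uni_units(), sketch_build_key() and old_bound_check() are
hypothetical stand-ins, not the kernel functions; the input is assumed to
be well-formed UTF-8; the error codes are chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define ATTR_MAX_UNITS 127 /* mirrors HFSPLUS_ATTR_MAX_STRLEN */

/*
 * Hypothetical stand-in for hfsplus_asc2uni(): counts the UTF-16 units a
 * UTF-8 string would need and fails once the limit is exceeded.
 * Assumes well-formed UTF-8 input.
 */
static int sketch_asc2uni_units(const char *utf8, size_t max_units)
{
	const unsigned char *p = (const unsigned char *)utf8;
	size_t units = 0;

	assert(utf8 != NULL);
	for (; *p; p++) {
		if ((*p & 0xC0) == 0x80)
			continue; /* continuation byte: no new character */
		units += (*p >= 0xF0) ? 2 : 1; /* 4-byte form needs a surrogate pair */
		if (units > max_units)
			return -ENAMETOOLONG;
	}
	return (int)units;
}

/* Hypothetical caller in the role of hfsplus_attr_build_key(). */
static int sketch_build_key(const char *name)
{
	int units = sketch_asc2uni_units(name, ATTR_MAX_UNITS);

	if (units < 0)
		return units; /* propagate the conversion error */
	return 0;
}

/* The old style of check: a byte count compared against a unit limit. */
static int old_bound_check(const char *name)
{
	return strlen(name) > ATTR_MAX_UNITS ? -ERANGE : 0;
}
```

For a name of 55 CJK characters (165 bytes in UTF-8), old_bound_check()
rejects the name although it needs only 55 units, while sketch_build_key()
accepts it and fails only once more than 127 units are required.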
Unpropagated errors from hfsplus_asc2uni() in the file catalog code were
addressed recently in an independent patch "hfsplus: fix longname handling"
by Sougata Santra.
Before this patch, trying to set a 55 CJK character (in a UTF-8
locale, > 127/3=42) attribute plus user prefix fails with:
(= "pointlessly long attribute for testing", elaborate Chinese in
UTF-8 encoding).
However, it is not possible to set double the size (110 + 5 is still
under 127) in a UTF-8 locale:
$ setfattr -n user.`cat testing-string testing-string` -v \
`cat testing-string testing-string` testing-string
setfattr: testing-string: Numerical result out of range
110 CJK char in UTF-8 is 330 bytes - the generic get/set attribute system
call code in linux/fs/xattr.c imposes a 255 byte limit. One can use a
combination of iconv to encode content, changing terminal locale for
viewing, and an nls=cp932/cp936/cp949/cp950 mount option to fully use
127-unicode attribute in a double-byte locale.
Also, as additional information, it is possible to (mis-)use unicode
half-width/full-width forms (U+FFxx) to write attributes which look like
English but are not actually ASCII.
Thanks Anton Altaparmakov for reviewing the earlier ideas behind this
change.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Sougata Santra <sougata@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Hin-Tak Leung [Thu, 22 May 2014 00:43:56 +0000 (10:43 +1000)]
hfsplus: fix worst-case unicode to char conversion of file names and attributes
This is a series of 3 patches which corrects issues in HFS+ concerning the
use of non-english file names and attributes. Names and attributes are
stored internally as UTF-16 units up to a fixed maximum size, and converted
to and from user-representation by NLS. The code incorrectly assumes that
NLS string lengths are equal to unicode lengths, which is only true for
English ascii usage.
This patch (of 3):
The HFS Plus Volume Format specification (TN1150) states that file names
are stored internally as a maximum of 255 unicode characters, as defined
by The Unicode Standard, Version 2.0 [Unicode, Inc. ISBN 0-201-48345-9].
File names are converted by the NLS system on Linux before being presented
to the user.
255 CJK characters convert to UTF-8 at up to 3 bytes per unicode character,
and to GB18030 at up to 4 bytes per unicode character. Thus,
trying in a UTF-8 locale to list files with names of more than 85 CJK
characters results in:
$ ls /mnt
ls: reading directory /mnt: File name too long
The receiving buffer to hfsplus_uni2asc() needs to be 255 x
NLS_MAX_CHARSET_SIZE bytes, not 255 bytes as the code has always assumed.
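The sizing argument can be sketched in user space as follows. This is an
illustration only: worst_case_bytes() and bmp_to_utf8() are hypothetical
helpers, MAX_CHARSET_SIZE merely mirrors the value of NLS_MAX_CHARSET_SIZE
quoted in this message, and the encoder handles BMP code units only (no
surrogate pairs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_MAX_UNITS   255 /* HFS+ file name limit in UTF-16 units */
#define MAX_CHARSET_SIZE 6   /* mirrors NLS_MAX_CHARSET_SIZE */

/* Worst-case number of output bytes for `units` UTF-16 units. */
static size_t worst_case_bytes(size_t units)
{
	return units * MAX_CHARSET_SIZE;
}

/*
 * Encode BMP code units (no surrogates) as UTF-8.
 * Returns the number of bytes written, or -1 if `out` is too small.
 */
static long bmp_to_utf8(const uint16_t *in, size_t units,
			unsigned char *out, size_t out_size)
{
	size_t n = 0, i;

	assert(in != NULL && out != NULL);
	for (i = 0; i < units; i++) {
		uint16_t c = in[i];
		size_t need = c < 0x80 ? 1 : c < 0x800 ? 2 : 3;

		if (n + need > out_size)
			return -1; /* the "File name too long" case */
		if (need == 1) {
			out[n++] = (unsigned char)c;
		} else if (need == 2) {
			out[n++] = (unsigned char)(0xC0 | (c >> 6));
			out[n++] = (unsigned char)(0x80 | (c & 0x3F));
		} else {
			out[n++] = (unsigned char)(0xE0 | (c >> 12));
			out[n++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
			out[n++] = (unsigned char)(0x80 | (c & 0x3F));
		}
	}
	return (long)n;
}
```

With a 255-byte buffer, 85 CJK characters (3 bytes each) just fit and the
86th overflows; a buffer of worst_case_bytes(255) = 1530 bytes holds any
255-unit name.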
Similar consideration applies to attributes, which are stored internally
as a maximum of 127 UTF-16BE units. See XNU source for an up-to-date
reference on attributes.
Strictly speaking, the maximum value of NLS_MAX_CHARSET_SIZE = 6 is not
attainable in the case of conversion to UTF-8, as going beyond 3 bytes
requires the use of surrogate pairs, i.e. consuming two input units.
Thanks Anton Altaparmakov for reviewing an earlier version of this change.
This patch fixes all callers of hfsplus_uni2asc(), and also enables the
use of long non-English file names in HFS+. The getting and setting, and
general usage of long non-English attributes requires further forthcoming
work, in the following patches of this series.
Signed-off-by: Hin-Tak Leung <htl10@users.sourceforge.net> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Sougata Santra <sougata@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Thu, 22 May 2014 00:43:56 +0000 (10:43 +1000)]
fs/coda: use __func__
Replace all function names by __func__ in pr_foo()
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Thu, 22 May 2014 00:43:56 +0000 (10:43 +1000)]
fs/coda: logging prefix uniformization
- Add pr_fmt based on module name.
- Remove Coda: coda: from pr_foo()
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Thu, 22 May 2014 00:43:55 +0000 (10:43 +1000)]
fs/coda: replace printk by pr_foo()
No level printk converted to pr_warn or pr_info
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Jan Harkes <jaharkes@cs.cmu.edu> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Fabian Frederick [Thu, 22 May 2014 00:43:55 +0000 (10:43 +1000)]
fs/isofs: logging clean-up
-All printk(KERN_foo converted to pr_foo()
-Default printk converted to pr_warn()
-Define DEBUG in pr_debug callsites to keep old printk(DEBUG behaviour
-Add DEBUG_FLAGS in Makefile for previous #ifdef DEBUG
-Coalesce format fragments.
-Separate format/arguments on lines > 80 characters.
-Add ISOFS, ISOFS Rock, zisofs pr_fmt
Signed-off-by: Fabian Frederick <fabf@skynet.be> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mike Frysinger [Thu, 22 May 2014 00:43:53 +0000 (10:43 +1000)]
drivers/rtc/rtc-bfin.c: do not abort when requesting irq fails
The RTC framework does not let you return an error once a call to
devm_rtc_device_register has succeeded. Avoid doing that when the IRQ
request fails as we can still support reading/writing the clock without
the IRQ.
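The control flow described above can be pictured with a small user-space
mock. None of this is the driver's code: struct sketch_dev,
sketch_request_irq() and sketch_probe() are hypothetical, and the
registration step is reduced to setting a flag.

```c
#include <assert.h>
#include <stdio.h>

struct sketch_dev {
	int irq_available; /* whether the simulated IRQ request succeeds */
	int registered;    /* set once the (simulated) RTC is registered */
	int has_irq;       /* set when the IRQ was obtained */
};

/* Simulated IRQ request: 0 on success, negative on failure. */
static int sketch_request_irq(const struct sketch_dev *dev)
{
	return dev->irq_available ? 0 : -1;
}

static int sketch_probe(struct sketch_dev *dev)
{
	assert(dev != NULL);

	/* Stands in for a successful devm_rtc_device_register(). */
	dev->registered = 1;

	/* From here on, probe must not fail: degrade instead of aborting. */
	if (sketch_request_irq(dev) != 0) {
		fprintf(stderr, "sketch: no IRQ, continuing without alarms\n");
		dev->has_irq = 0;
	} else {
		dev->has_irq = 1;
	}
	return 0;
}
```

A device without a usable IRQ still probes successfully here; only the
has_irq flag differs, so reading and writing the clock stay available.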
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Reported-by: Ales Novak <alnovak@suse.cz> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sekhar Nori [Thu, 22 May 2014 00:43:53 +0000 (10:43 +1000)]
drivers/rtc/rtc-omap.c: add support for enabling 32khz clock
Newer versions of OMAP RTC IP such as those found in AM335x and DRA7x need
an explicit enable of 32khz functional clock which ticks the RTC.
AM335x support was working so far because of settings done in U-Boot.
However, the DRA7x U-Boot does no such enable of 32khz clock and this
patch is needed to get the RTC to work on DRA7x at least. In general, it is
better to not depend on settings done in U-Boot.
Thanks to Lokesh Vutla for noticing this.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Sekhar Nori [Thu, 22 May 2014 00:43:53 +0000 (10:43 +1000)]
drivers/rtc/rtc-omap.c: remove multiple device id checks
Remove multiple superfluous device id checks. Since an id_table is
present in the driver, probe() should never encounter an empty device id
entry. In case of an OF style match, of_match_device() returns a matching
entry.
For paranoia's sake, check for a device id entry once and fail probe() if none
is found. This is much better than checking for it multiple times.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Lokesh Vutla <lokeshvutla@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Andrew Morton [Thu, 22 May 2014 00:43:52 +0000 (10:43 +1000)]
rtc-da9063-rtc-driver-fix
coding-style tweaks
Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Dajun Chen <david.chen@diasemi.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Mark Brown <broonie@linaro.org> Cc: Opensource [Steve Twiss] <stwiss.opensource@diasemi.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Opensource [Steve Twiss] <stwiss.opensource@diasemi.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Lee Jones <lee.jones@linaro.org> Cc: Mark Brown <broonie@linaro.org> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: David Dajun Chen <david.chen@diasemi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Setting the alarm to a time not on a minute boundary results in repeated
interrupts being generated by the DA9052/3 PMIC device until the kernel
RTC core sees that the alarm has rung. Sometimes the number and frequency
of interrupts can cause the kernel to disable the IRQ line used by the
DA9052/3 PMIC with disastrous consequences. This patch fixes the
problem.
Even though the DA9052/3 PMIC is capable of generating periodic interrupts,
i.e. TICKS, the method used to distinguish RTC_AF from RTC_PF events was
flawed and cannot work in conjunction with the regmap_irq kernel core.
Thus that flawed detection has also been removed from the DA9052/3 PMIC RTC
driver's irq handler, so that it no longer reports the wrong type of event
to the kernel RTC core.
The internal static functions within the DA9052/3 PMIC RTC driver have
been changed to pass the 'da9052_rtc' structure instead of the 'da9052'
because there is no backwards pointer from the 'da9052' structure.
This patch fixes the three issues described above. The first is serious
because using the RTC alarm set to a non-minute boundary will eventually
cause all component drivers that depend on the interrupt line to fail.
The solution adopted is to round up the alarm time to the next highest
minute.
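The rounding itself is simple arithmetic. A minimal sketch on a plain
seconds counter, assuming a non-negative time value; round_up_to_minute()
is a hypothetical helper and not the driver's function, which operates on
its own time representation:

```c
#include <assert.h>
#include <time.h>

/* Round a time in seconds up to the next minute boundary. */
static time_t round_up_to_minute(time_t t)
{
	time_t rem;

	assert(t >= 0);
	rem = t % 60;
	return rem ? t + (60 - rem) : t; /* already aligned: leave unchanged */
}
```

An alarm requested for 121 seconds is moved to 180, while one already on a
minute boundary such as 120 is left alone.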
The second bug, reporting an RTC_PF event instead of an RTC_AF event, turns
out not to matter with the current implementation of the kernel RTC core
as it seems to ignore the event type. However, should that change in the
future it is better to fix the issue now and not have 'problems waiting to
happen'.
The third set of changes are to make the da9052_rtc structure available to
all the local internal functions in the driver. This was done during
testing so that diagnostic data could be stored there. Should the
solution to the first issue be found not acceptable, then the alternative
of using the TICKS interrupt at the fixed one second interval in order to
step to the exact second of the requested alarm requires an extra (alarm
time) piece of data to be stored. In devices that use the alarm function
to wake up from sleep, accuracy to the second will result in the device
being awake for up to nearly a minute longer than expected.
Signed-off-by: Anthony Olech <anthony.olech.opensource@diasemi.com> Cc: David Dajun Chen <dchen@diasemi.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
drivers/rtc/rtc-cmos.c: drivers/char/rtc.c features for DECstation support
This brings in drivers/char/rtc.c functionality required for DECstation
and, should the maintainers decide to switch, Alpha systems to use
rtc-cmos.
Specifically these features are made available:
* RTC iomem rather than x86/PCI port I/O mapping, controlled with the
RTC_IOMAPPED macro as with the original driver. The DS1287A chip in all
DECstation systems is mapped in the host bus address space as a
contiguous block of 64 32-bit words of which the least significant byte
accesses the RTC chip for both reads and writes. All the address and
data window register accesses are made transparently by the chipset glue
logic so that the device appears directly mapped on the host bus.
* A way to set the size of the address space explicitly with the
newly-added `address_space' member of the platform part of the RTC
device structure. This avoids the unreliable heuristics that do not
work in a setup where the RTC is not explicitly accessed with the usual
address and data window register pair.
* The ability to use the RTC periodic interrupt as a system clock
device, which is implemented by arch/mips/kernel/cevt-ds1287.c for
DECstation systems and takes the RTC interrupt away from the RTC driver.
Eventually hooking back to the clock device's interrupt handler should
be possible for the purpose of the alarm clock and possibly also
update-in-progress interrupt, but this is not done by this change.
o To avoid interfering with the clock interrupt all the places where
the RTC interrupt mask is fiddled with are only executed if an IRQ
has been assigned to the RTC driver.
o To avoid changing the clock setup Register A is not fiddled with
if CMOS_RTC_FLAGS_NOFREQ is set in the newly-added `flags' member of
the platform part of the RTC device structure. Originally, in
drivers/char/rtc.c, this was keyed with the absence of the RTC
interrupt, just like the interrupt mask, but there only the periodic
interrupt frequency is set, whereas rtc-cmos also sets the divider
bits. Therefore a new flag is introduced so that systems where the
RTC interrupt is not usable rather than used as a system clock device
can fully initialise the RTC.
* A small clean-up is made to the IRQ assignment code that makes the IRQ
number hardcoded to -1 rather than arbitrary -ENXIO (or whatever error
happens to be returned by platform_get_irq) where no IRQ has been
assigned to the RTC driver (NO_IRQ might be another candidate, but it
looks like this macro has inconsistent or missing definitions and
limited use and might therefore be unsafe).
Verified to work correctly with a DECstation 5000/240 system.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Lee, Chun-Yi [Thu, 22 May 2014 00:43:48 +0000 (10:43 +1000)]
drivers/rtc/rtc-efi.c: avoid subtracting day twice when computing year days
Comparing the source code of rtc-lib.c::rtc_year_days() with
efirtc.c::rtc_year_days() shows that the code in rtc-efi decrements the
value of day twice when it computes year days. rtc-lib.c::rtc_year_days()
has already decremented day and returns the year days from 0 to 365.
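A sketch of a zero-based year-days computation shows the off-by-one. The
helper below is written for illustration, with a 0-based month and a
1-based day assumed; it is not copied from rtc-lib.c.

```c
#include <assert.h>

static int is_leap(unsigned int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Days elapsed before the start of each month, [0] normal, [1] leap year. */
static const unsigned short ydays[2][12] = {
	{ 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 },
	{ 0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335 },
};

/* month: 0 = January; day: 1-based. Result is in the range 0..365. */
static int year_days(unsigned int day, unsigned int month, unsigned int year)
{
	assert(month < 12 && day >= 1);
	return ydays[is_leap(year)][month] + (int)day - 1;
}
```

January 1st already yields 0, so a caller that subtracts 1 again gets -1
for that date and is one day short for every other date.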
Ales Novak [Thu, 22 May 2014 00:43:46 +0000 (10:43 +1000)]
drivers/rtc/interface.c: fix for fix of alarm initialization
Seems the previous patch "fix infinite loop in initializing the alarm"
did break the infinite loop in alarm initialization, but not in the right
way. The loop indeed should walk through the non-leap years and stop on
the leap one.
This patch does apply on top of the previous one.
Signed-off-by: Ales Novak <alnovak@suse.cz> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Ales Novak [Thu, 22 May 2014 00:43:46 +0000 (10:43 +1000)]
drivers/rtc/interface.c: fix infinite loop in initializing the alarm
In __rtc_read_alarm(), if the alarm time retrieved by
rtc_read_alarm_internal() from the device contains invalid values (e.g.
month=2,mday=31) and the year is not set (=-1), the initialization will loop
infinitely because the year-fixing loop expects the time to be invalid due
to leap year.
The fix reduces the loop to the leap years and adds a final validity check.
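A simplified user-space sketch of a bounded year search with a final
validity check. It is not the code in __rtc_read_alarm(): fix_year() and
its helpers are hypothetical, months are 1-based, and the search starts at
the given year rather than the following one.

```c
#include <assert.h>

static int is_leap(int year)
{
	return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

static int days_in_month(int month, int year)
{
	static const int dim[12] = {
		31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
	};

	assert(month >= 1 && month <= 12);
	return (month == 2 && is_leap(year)) ? 29 : dim[month - 1];
}

static int date_valid(int mday, int month, int year)
{
	if (month < 1 || month > 12 || mday < 1)
		return 0;
	return mday <= days_in_month(month, year);
}

/*
 * Find a year, starting at start_year, in which month/mday is valid.
 * The walk stops at a leap year at the latest: if the date is still
 * invalid there, no year can fix it. Returns the year, or -1.
 */
static int fix_year(int mday, int month, int start_year)
{
	int year = start_year;

	while (!date_valid(mday, month, year) && !is_leap(year))
		year++;

	/* Final validity check catches dates such as February 31st. */
	return date_valid(mday, month, year) ? year : -1;
}
```

February 29th starting from 2014 resolves to 2016, whereas February 31st
stops at the first leap year and is reported as invalid instead of looping
forever.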
Signed-off-by: Ales Novak <alnovak@suse.cz> Acked-by: Alessandro Zummo <a.zummo@towertech.it> Reported-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Oleg Nesterov [Thu, 22 May 2014 00:43:45 +0000 (10:43 +1000)]
kthreads: kill CLONE_KERNEL, change kernel_thread(kernel_init) to avoid CLONE_SIGHAND
1. Remove CLONE_KERNEL, it has no users and it is dangerous.
The (old) comment says "List of flags we want to share for kernel
threads" but this is not true, we do not want to share ->sighand by
default. This flag can only be used if the caller is sure that both
parent/child will never play with signals (say, allow_signal/etc).
2. Change rest_init() to clone kernel_init() without CLONE_SIGHAND.
In this case CLONE_SIGHAND does not really hurt, and it looks like
optimization because copy_sighand() can avoid kmem_cache_alloc().
But in fact this only adds a minor pessimization. kernel_init()
is going to exec the init process, and de_thread() will need to
unshare ->sighand and do kmem_cache_alloc(sighand_cachep) anyway,
but it needs to do more work and take tasklist_lock and siglock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a module is built into the kernel the module_init() function becomes
an initcall. Sometimes debugging through dynamic debug can help, however,
debugging built in kernel modules is typically done by changing the
.config, recompiling, and booting the new kernel in an effort to determine
exactly which module caused a problem.
This patchset can be useful stand-alone or combined with initcall_debug.
There are cases where some initcalls can hang the machine before the
console can be flushed, which can make initcall_debug output inaccurate.
Having the ability to skip initcalls can help further debugging of these
scenarios.
Usage: initcall_blacklist=<list of comma separated initcalls>
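The skip decision amounts to a membership test against that
comma-separated list. A user-space sketch, assuming exact name matches;
sketch_blacklisted() is a hypothetical helper, and how the kernel itself
stores and matches the entries is not described in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` is an entry of the comma-separated `list`, else 0. */
static int sketch_blacklisted(const char *list, const char *name)
{
	size_t name_len;
	const char *p = list;

	assert(list != NULL && name != NULL);
	name_len = strlen(name);

	while (*p) {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* Compare whole entries only, so prefixes do not match. */
		if (len == name_len && strncmp(p, name, len) == 0)
			return 1;
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

An init sequence could consult such a check before invoking each initcall
and skip the ones that match.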
ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
the log contains:
Andrew Morton [Thu, 22 May 2014 00:43:44 +0000 (10:43 +1000)]
init/main.c: don't use pr_debug()
Partially revert ea676e846a8171b8 ("init/main.c: convert to pr_foo()").
Unbeknownst to me, pr_debug() is different from the other pr_foo() levels:
pr_debug() is a no-op when DEBUG is not defined.
Happily, init/main.c does have a #define DEBUG so we didn't break
initcall_debug. But the functioning of initcall_debug should not be
dependent upon the presence of that #define DEBUG.
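The difference can be imitated in user space. In this sketch the macros
are hypothetical, SKETCH_DEBUG plays the role of DEBUG, and printf()
stands in for printk(); the kernel's real pr_debug() has further variants
(for example with dynamic debug) that are not modelled here.

```c
#include <assert.h>
#include <stdio.h>

static int sketch_emitted; /* counts messages that were actually printed */

/* Always prints, like the non-debug pr_foo() levels. */
#define sketch_pr_info(...) (sketch_emitted++, printf(__VA_ARGS__))

/* Compiles to nothing unless SKETCH_DEBUG is defined. */
#ifdef SKETCH_DEBUG
#define sketch_pr_debug(...) sketch_pr_info(__VA_ARGS__)
#else
#define sketch_pr_debug(...) ((void)0)
#endif

/* Emits one debug-level and one info-level message; returns the count printed. */
static int sketch_run(void)
{
	sketch_emitted = 0;
	sketch_pr_debug("calling %s\n", "example_init");
	sketch_pr_info("initcall %s returned\n", "example_init");
	return sketch_emitted;
}
```

Built without SKETCH_DEBUG, sketch_run() prints only the second message,
which is why output that must always appear should not go through the
debug-level macro.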
Reported-by: Russell King <rmk@arm.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We observed this problem has been occurring since 2.6.30 with
fs/binfmt_elf.c: create_elf_tables()->get_random_bytes(), introduced by f06295b44c296c8f ("ELF: implement AT_RANDOM for glibc PRNG seeding").
/*
* Generate 16 random bytes for userspace PRNG seeding.
*/
get_random_bytes(k_rand_bytes, sizeof(k_rand_bytes));
The patch introduces a wrapper around get_random_int() which has lower
overhead than calling get_random_bytes() directly.
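The shape of such a wrapper can be sketched as below. This is an
assumption-laden illustration, not the patch itself: sketch_fill_bytes()
is a hypothetical name, and sketch_random_int() is a deterministic
stand-in for get_random_int() that is NOT random.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static unsigned int sketch_calls; /* how many ints have been drawn */

/* Hypothetical stand-in for get_random_int(); deterministic, not random. */
static unsigned int sketch_random_int(void)
{
	return 0x9E3779B9u * ++sketch_calls;
}

/* Fill `buf` with `nbytes` bytes, drawing one unsigned int at a time. */
static void sketch_fill_bytes(unsigned char *buf, size_t nbytes)
{
	assert(buf != NULL || nbytes == 0);

	while (nbytes) {
		unsigned int v = sketch_random_int();
		size_t chunk = nbytes < sizeof(v) ? nbytes : sizeof(v);

		memcpy(buf, &v, chunk); /* the last chunk may be partial */
		buf += chunk;
		nbytes -= chunk;
	}
}
```

Filling the 16-byte k_rand_bytes buffer this way takes
16 / sizeof(unsigned int) draws, and a length that is not a multiple of
the int size is handled by copying only part of the final value.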
With this patch applied:
$ cat /proc/sys/kernel/random/entropy_avail
2731
$ cat /proc/sys/kernel/random/entropy_avail
2802
$ cat /proc/sys/kernel/random/entropy_avail
2878
Analyzed by John Sobecki.
This has been applied on a specific Oracle kernel and has been running on
the customer's production environment (the original bug reporter) for
several months; it has worked fine until now.
Signed-off-by: Jie Liu <jeff.liu@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <aedilger@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Arnd Bergmann <arnn@arndb.de> Cc: John Sobecki <john.sobecki@oracle.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Ted Ts'o <tytso@mit.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>