ptrace: __ptrace_unlink: use the ptrace_reparented() helper
Currently __ptrace_unlink() checks list_empty(->ptrace_list) to figure out
whether the child was reparented. Change the code to use ptrace_reparented()
to make this check more explicit and consistent.
No functional changes.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
exit.c has numerous "->exit_signal == -1" comparisons, this check is subtle
and deserves a helper. Imho makes the code more parseable for humans. At
least it's surely more greppable.
Also, a couple of whitespace cleanups. No functional changes.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 30 Apr 2008 07:53:10 +0000 (00:53 -0700)]
signals: x86 TS_RESTORE_SIGMASK
Replace TIF_RESTORE_SIGMASK with TS_RESTORE_SIGMASK and define our own
set_restore_sigmask() function. This saves the costly SMP-safe set_bit
operation, which we do not need for the sigmask flag since TIF_SIGPENDING
always has to be set too.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 30 Apr 2008 07:53:09 +0000 (00:53 -0700)]
signals: use HAVE_SET_RESTORE_SIGMASK
Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to
#ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic
set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 30 Apr 2008 07:53:08 +0000 (00:53 -0700)]
signals: s390: renumber TIF_RESTORE_SIGMASK
TIF_RESTORE_SIGMASK no longer needs to be in the _TIF_WORK_* masks. Those low
bits are scarce, and are all used up now. Renumber TIF_RESTORE_SIGMASK to
free one up.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 30 Apr 2008 07:53:07 +0000 (00:53 -0700)]
signals: set_restore_sigmask TIF_SIGPENDING
Set TIF_SIGPENDING in set_restore_sigmask. This lets arch code take
TIF_RESTORE_SIGMASK out of the set of bits that will be noticed on return to
user mode. On some machines those bits are scarce, and we can free this
unneeded one up for other uses.
It is probably the case that TIF_SIGPENDING is always set anyway everywhere
set_restore_sigmask() is used. But this is some cheap paranoia in case there
is an arcane case where it might not be.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roland McGrath [Wed, 30 Apr 2008 07:53:06 +0000 (00:53 -0700)]
signals: add set_restore_sigmask
This adds the set_restore_sigmask() inline in <linux/thread_info.h> and
replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No
change, but abstracts the details of the flag protocol from all the calls.
Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: allow the kernel to actually kill /sbin/init
Currently the buggy /sbin/init hangs if SIGSEGV/etc happens. The kernel sends
the signal, init dequeues it and ignores, returns from the exception, repeats
the faulting instruction, and so on forever.
Imho, such a behaviour is not good. I think that the explicit loud death of
the buggy /sbin/init is better than the silent hang.
Change force_sig_info() to clear SIGNAL_UNKILLABLE when the task should be
really killed.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: fix /sbin/init protection from unwanted signals
The global init has a lot of long standing problems with the unhandled fatal
signals.
- The "is_global_init(current)" check in get_signal_to_deliver()
protects only the main thread. Sub-thread can dequee the fatal
signal and shutdown the whole thread group except the main thread.
If it dequeues SIGSTOP /sbin/init will be stopped, this is not
right too. Note that we can't use is_global_init(->group_leader),
this breaks exec and this can't solve other problems we have.
- Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT
on delivery. This breaks exec, has other bad implications, and this
is just wrong.
Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps
to solve some other problems addressed by the subsequent patches.
Currently we use this flag for the global init only, but it could also be used
by kthreads and (perhaps) by the sub-namespace inits.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: check_kill_permission: check session under tasklist_lock
This wasn't documented, but as Atsushi Tsuji pointed out
check_kill_permission() needs tasklist_lock for task_session_nr(). I missed
this fact when removed tasklist from the callers.
Change check_kill_permission() to take tasklist_lock for the SIGCONT case.
Re-order security checks so that we take tasklist_lock only if/when it is
actually needed. This is a minimal fix for now, tasklist will be removed
later.
Also change the code to use task_session() instead of task_session_nr().
Also, remove the SIGCONT check from cap_task_kill(), it is bogus (and the
whole function is bogus. Serge, Eric, why it is still alive?).
signals: fold sig_ignored() into handle_stop_signal()
Rename handle_stop_signal() to prepare_signal(), make it return a boolean, and
move the callsites of sig_ignored() into it.
No functional changes for now. But it would be nice to factor out the "should
we drop this signal" checks as much as possible, before we try to fix the bugs
with the sub-namespace init's signals (actually the global /sbin/init has some
problems with signals too).
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: cleanup the usage of print_fatal_signal()
Move the callsite of print_fatal_signal() down, under "if
(sig_kernel_coredump(signr))", so we don't need to check signr != SIGKILL.
We are only interested in the sig_kernel_coredump() signals anyway, and due to
the previous changes we almost never can see other fatal signals here except
SIGKILL.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: handle_stop_signal: don't worry about SIGKILL
handle_stop_signal() clears SIGNAL_STOP_DEQUEUED when sig == SIGKILL. Remove
this nasty special case. It was needed to prevent the race with group stop
and exit caused by thread-specific SIGKILL. Now that we use complete_signal()
for private signals too this is not needed, complete_signal() will notice
SIGKILL and abort the soon-to-begin group stop.
Except: the target thread is dead (has PF_EXITING). But in that case we
should not just clear SIGNAL_STOP_DEQUEUED and nothing more. We should either
kill the whole thread group, or silently ignore the signal.
I suspect we are not right wrt zombie leaders, but this is another issue which
and should be fixed separately. Note that this check can't abort the group
stop if it was already started/finished, this check only adds a subtle side
effect if we race with the thread which has already dequeued sig_kernel_stop()
signal and temporary released ->siglock.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: join send_sigqueue() with send_group_sigqueue()
We export send_sigqueue() and send_group_sigqueue() for the only user,
posix_timer_event(). This is a bit silly, because both are just trivial
helpers on top of do_send_sigqueue() and because the we pass the unused
.si_signo parameter.
Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
send_sigqueue/send_group_sigqueue are only differ in how they lock ->siglock.
Unify them. send_group_sigqueue() uses spin_lock() because it knows the task
can't exit, but in that case lock_task_sighand() can't fail and doesn't hurt.
Note that the "sig" argument is ignored, it is always equal to ->si_signo.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: use __group_complete_signal() for the specific signals too
Based on Pavel Emelyanov's suggestion.
Rename __group_complete_signal() to complete_signal() and use it to process
the specific signals too. To do this we simply add the "int group" argument.
This allows us to greatly simply the signal-sending code and adds a useful
behaviour change. We can avoid the unneeded wakeups for the private signals
because wants_signal() is more clever than sigismember(blocked), but more
importantly we now take into account the fatal specific signals too.
The latter allows us to kill some subtle checks in handle_stop_signal() and
makes the specific/group signal's behaviour more consistent. For example,
currently sigtimedwait(FATAL_SIGNAL) behaves differently depending on was the
signal sent by kill() or tkill() if the signal was not blocked.
And. This allows us to tweak/fix the behaviour when the specific signal is
sent to the dying/dead ->group_leader.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: change send_signal/do_send_sigqueue to take "boolean group" parameter
send_signal() is used either with ->pending or with ->signal->shared_pending.
Change it to take "int group" instead, this argument will be re-used later.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The comment in send_sig_info() is wrong, tasklist_lock can't help.
The caller must ensure the task can't go away, otherwise ->sighand can be NULL
even before we take the lock.
p->sighand could be changed by exec(), but I can't imagine how it is possible
to prevent exit(), but not exec().
Since the things seem to work, I assume all callers are correct. However,
drm_vbl_send_signals() looks broken. block_all_signals() which is solely used
by drm is definitely broken.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert do_tkill() to use rcu_read_lock() + lock_task_sighand() to avoid
taking tasklist lock.
Note that we don't return an error if lock_task_sighand() fails, we pretend
the task dies after receiving the signal. Otherwise, we should fight with the
nasty races with mt-exec without having any advantage.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: move handle_stop_signal() into send_signal()
Move handle_stop_signal() into send_signal(). This factors out a couple of
callsites and allows us to do further unifications.
Also, with this change specific_send_sig_info() does handle_stop_signal().
Not that this is really important, we never send STOP/CONT via send_sig() and
friends, but still this looks more consistent.
The only (afaics) special case is get_signal_to_deliver(). If the traced task
dequeues SIGCONT, it can re-send it to itself after ptrace_stop() if the
signal was blocked by debugger. In that case handle_stop_signal() is
unnecessary, but hopefully not a problem.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_signal_to_deliver: use the cached ->signal/sighand values
Cache the values of current->signal/sighand. Shrinks .text a bit and makes
the code more readable. Also, remove "sigset_t *mask", it is pointless
because in fact we save the constant offset.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kill_pid_info: don't take now unneeded tasklist_lock
Previously handle_stop_signal(SIGCONT) could drop ->siglock. That is why
kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't
go away after unlock. Not needed now.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: re-assign CLD_CONTINUED notification from the sender to reciever
Based on discussion with Jiri and Roland.
In short: currently handle_stop_signal(SIGCONT, p) sends the notification to
p->parent, with this patch p itself notifies its parent when it becomes
running.
handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify
the parent with do_notify_parent_cldstop(). This leads to multiple problems:
- as Jiri Kosina pointed out, the stopped task can resume without
actually seeing SIGCONT which may have a handler.
- we race with another sig_kernel_stop() signal which may come in
that window.
- we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT
in that window.
- we can't avoid taking tasklist_lock() while sending SIGCONT.
With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED
flag in p->signal->flags and returns. The notification is sent by the first
task which returns from finish_stop() (there should be at least one) or any
other signalled thread from get_signal_to_deliver().
This is a user-visible change. Say, currently kill(SIGCONT, stopped_child)
can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed
unpredictably. Another difference is that if the child is ptraced by another
process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach()
while currently it always goes to the tracer which doesn't actually need this
notification. Hopefully not a problem.
The patch asks for the futher obvious cleanups, I'll send them separately.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Every implementation of ->task_kill() does nothing when the signal comes from
the kernel. This is correct, but means that check_kill_permission() should
call security_task_kill() only for SI_FROMUSER() case, and we can remove the
same check from ->task_kill() implementations.
(sadly, check_kill_permission() is the last user of signal->session/__session
but we can't s/task_session_nr/task_session/ here).
NOTE: Eric W. Biederman pointed out cap_task_kill() should die, and I think
he is very right.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Roland McGrath <roland@redhat.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: David Quigley <dpquigl@tycho.nsa.gov> Cc: Eric Paris <eparis@redhat.com> Cc: Harald Welte <laforge@gnumonks.org> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 30 Apr 2008 07:52:41 +0000 (00:52 -0700)]
signals: consolidate send_sigqueue and send_group_sigqueue
Both functions do the same thing after proper locking, but with
different sigpending structs, so move the common code into a helper.
After this we have 4 places that look very similar: send_sigqueue: calls
do_send_sigqueue and signal_wakeup send_group_sigqueue: calls
do_send_sigqueue and __group_complete_signal __group_send_sig_info:
calls send_signal and __group_complete_signal specific_send_sig_info:
calls send_signal and signal_wakeup
Besides, send_signal performs actions similar to do_send_sigqueue's
and __group_complete_signal - to signal_wakeup.
It looks like they can be consolidated gracefully.
Oleg said:
Personally, I think this change is very good. But send_sigqueue() and
send_group_sigqueue() have a very subtle difference which I was never able
to understand.
Let's suppose that sigqueue is already queued, and the signal is ignored
(the latter means we should re-schedule cpu timer or handle overrruns). In
that case send_sigqueue() returns 0, but send_group_sigqueue() returns 1.
I think this is not the problem (in fact, I think this patch makes the
behaviour more correct), but I hope Thomas can take a look and confirm.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Wed, 30 Apr 2008 07:52:39 +0000 (00:52 -0700)]
signals: consolidate checks for whether or not to ignore a signal
Both sig_ignored() and do_sigaction() check for signr to be explicitly or
implicitly ignored. Introduce a helper for them.
This patch is aimed to help handling signals by pid namespace's init, and was
derived from one of Oleg's patches
https://lists.linux-foundation.org/pipermail/containers/2007-December/009308.html
so, if he doesn't mind, he should be considered as an author.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most of the callers of lock_task_sighand() doesn't actually need rcu_lock().
lock_task_sighand() needs it only to safely play with tsk->sighand, it can
take the lock itself.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: do_group_exit(): use signal_group_exit() more consistently
do_group_exit() checks SIGNAL_GROUP_EXIT to avoid taking sighand->siglock.
Since ed5d2cac114202fe2978a9cbcab8f5032796d538 exec() doesn't set this
flag, we should use signal_group_exit().
This is not needed for correctness, but can speedup the multithreaded exec
and makes the code more consistent.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
signals: do_signal_stop(): use signal_group_exit()
do_signal_stop() needs signal_group_exit() but checks sig->group_exit_task.
This (optimization) is correct, SIGNAL_STOP_DEQUEUED and SIGNAL_GROUP_EXIT
are mutually exclusive, but looks confusing. Use signal_group_exit(), this
is not fastpath, the code clarity is more important.
Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 30 Apr 2008 07:52:33 +0000 (00:52 -0700)]
isofs: fix access to unallocated memory when reading corrupted filesystem
When a directory on isofs is corrupted, we did not check whether length of the
name in a directory entry and the length of the directory entry itself are
consistent. This could lead to possible access beyond the end of buffer when
the length of the name was too big. Add this sanity check to directory
reading code.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 30 Apr 2008 07:52:32 +0000 (00:52 -0700)]
md: support blocking writes to an array on device failure
Allows a userspace metadata handler to take action upon detecting a device
failure.
Based on an original patch by Neil Brown.
Changes:
-added blocked_wait waitqueue to rdev
-don't qualify Blocked with Faulty always let userspace block writes
-added md_wait_for_blocked_rdev to wait for the block device to be clear, if
userspace misses the notification another one is sent every 5 seconds
-set MD_RECOVERY_NEEDED after clearing "blocked"
-kill DoBlock flag, just test mddev->external
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 30 Apr 2008 07:52:32 +0000 (00:52 -0700)]
md: prevent duplicates in bind_rdev_to_array
Found when trying to reassemble an active externally managed array. Without
this check we hit the more noisy "sysfs duplicate" warning in the later call
to kobject_add.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 30 Apr 2008 07:52:31 +0000 (00:52 -0700)]
md: remove a stray command from a copy and paste error in resync_start_store
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
md: fix up switching md arrays between read-only and read-write
When setting an array to 'readonly' or to 'active' via sysfs, we must make the
appropriate set_disk_ro call too.
Also when switching to "read_auto" (which is like readonly, but blocks on the
first write so that metadata can be marked 'dirty') we need to be more careful
about what state we are changing from.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
md: fix 'safemode' handling for external metadata.
'safemode' relates to marking an array as 'clean' if there has been no write
traffic for a while (a couple of seconds), to reduce the chance of the array
being found dirty on reboot.
->safemode is set to '1' when there have been no write for a while, and it
gets set to '0' when the superblock is updates with the 'clean' flag set.
This requires a few fixes for 'external' metadata:
- When an array is set to 'clean' via sysfs, 'safemode' must be cleared.
- when we write to an array that has 'safemode' set (there must have been
some delay in updating the metadata), we need to clear safemode.
- Don't try to update external metadata in md_check_recovery for safemode
transitions - it won't work.
Also, don't try to support "immediate safe mode" (safemode==2) for external
metadata, it cannot really work (the safemode timeout can be set very low if
this is really needed).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I keep finding problems where an mddev gets reused and some fields has a value
from a previous usage that confuses the new usage. So clear all fields that
could possible need clearing when calling do_md_stop.
Also initialise the 'level' of a new array to LEVEL_NONE (which isn't 0).
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
md: skip all metadata update processing when using external metadata.
All the metadata update processing for external metadata is on in user-space
or through the sysfs interfaces, so make "md_update_sb" a no-op in that case.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Wed, 30 Apr 2008 07:52:28 +0000 (00:52 -0700)]
md: fix use after free when removing rdev via sysfs
rdev->mddev is no longer valid upon return from entry->store() when the
'remove' command is given.
Cc: <stable@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 30 Apr 2008 07:52:27 +0000 (00:52 -0700)]
pxafb: preliminary smart panel interface support (update)
FB_PXA_SMARTPANEL defaults to "n" and removed the cast to void *.
Signed-off-by: Daniel Mack <daniel@caiaq.de> Acked-by: Eric Miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 30 Apr 2008 07:52:26 +0000 (00:52 -0700)]
pxafb: preliminary smart panel interface support
Signed-off-by: Daniel Mack <daniel@caiaq.de> Signed-off-by: Eric Miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 30 Apr 2008 07:52:25 +0000 (00:52 -0700)]
pxafb: move parallel LCD timing setup into dedicate function
the new_regs stuff has been removed, and all the setup (modification to those
fbi->reg_*) is protected with IRQ disabled
* disable IRQ is too heavy here, provided that no IRQ context will
touch the fbi->reg_* and the only possible contending place is
in the CPUFREQ_POSTCHANGE (task context), a mutex will be better,
leave this for future improvement
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 30 Apr 2008 07:52:24 +0000 (00:52 -0700)]
pxafb: use completion for LCD disable wait code
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Miao [Wed, 30 Apr 2008 07:52:24 +0000 (00:52 -0700)]
pxafb: introduce lcd_{read,write}l() to wrap the __raw_{read,write}l()
using __raw_{read,write}l() everywhere looks messy, introduce
lcd_{read,write}l() to get this cleaned up a bit
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:23 +0000 (00:52 -0700)]
pxafb: make lubbock/mainstone/zylonite/littleton to use new LCD connection type
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:22 +0000 (00:52 -0700)]
pxafb: introduce register independent LCD connection type for pxafb
Reasons:
1. straight forward: the name "LCD_COLOR_DSTN_16BPP" is much better
than "LCCR0_Pas | LCCR0_Color | LCCR0_Dual"
2. by defining LCD connection types as constants, it allows only
valid possibilities
3. by removing the dependency of register bits definitions, those
can be later moved into the body of pxafb.c, instead of having
a regs-lcd.h around
Currently, only lubbock, mainstone, zylonite and littleton have been
modified to support these types (see coming patches after this).
Other platforms are encouraged to change their way describing the
LCD controller connections.
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:21 +0000 (00:52 -0700)]
pxafb: introduce "struct pxafb_dma_buff" for palette and dma descriptors
Use structure and array for palette buffer and dma descriptors to:
1. better organize code for future expansion like overlays
2. separate palette and dma descriptors from frame buffer
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:21 +0000 (00:52 -0700)]
pxafb: convert fb driver to use ioremap() and __raw_{readl, writel}
This is part of the effort moving peripheral registers outside of pxa-regs.h,
and using ioremap() make it possible the same IP can be re-used on different
processors with different registers space
As a result, the fixed mapping in pxa_map_io() is removed.
The regs-lcd.h can actually moved to where closer to pxafb.c but some of its
bit definitions are directly used by various platform code, though this is not
a good style.
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:20 +0000 (00:52 -0700)]
pxafb: sanitize the usage of #ifdef .. processing pxafb parameters
So to get a better coding style and centralize the pxafb parameters
handling code.
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:19 +0000 (00:52 -0700)]
pxafb: purge unnecessary pr_debug and comments from pxafb
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:19 +0000 (00:52 -0700)]
pxafb: fix various coding style issues for pxafb
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
eric miao [Wed, 30 Apr 2008 07:52:18 +0000 (00:52 -0700)]
pxafb: un-nest pxafb_parse_options() to cleanup the coding style issue
pxafb_parse_options() has very long lines exceeding far beyond 80 characters,
which makes the function looks bad. Un-nest it into smaller functions and use
a temporary string for only what has been overridden instead of the whole
dev_info() message to reduce the line a bit more.
Signed-off-by: eric miao <eric.miao@marvell.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kallsyms: nuke all ChangeLog, this should be logged by git
Pointed out by Paulo:
"When I wrote this initially, it was a mistake to add a Changelog in
the first place, but I didn't know better at the time.
If you're going to make changes to this file, please remove all the
Changelog, instead of adding more entries to it. The 'Changelog'
should be kept by the version control system, and not the source code
itself."
Cc: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Bryan Wu <cooloney@kernel.org> Acked-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/pagetypeinfo: fix output for memoryless nodes
on memoryless node, /proc/pagetypeinfo is displayed slightly funny output.
this patch fix it.
output example (header is outputed, but no data is outputed)
--------------------------------------------------------------
Page block order: 14
Pages per block: 16384
Free pages count per migrate type at order 0 1 2 3 4 5 \
6 7 8 9 10 11 12 13 14 15 16
Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
Page block order: 14
Pages per block: 16384
Free pages count per migrate type at order 0 1 2 3 4 5 \
6 7 8 9 10 11 12 13 14 15 16
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/v4l-dvb:
V4L/DVB (7798): tuners/Kconfig: Change config name and help to reflect dynamic load for tuners
V4L/DVB (7794): cx88: Fix a warning
V4L/DVB (7792): ivtv: correct misspelled "HIMEM4G" to "HIGHMEM4G" in error message
V4L/DVB (7791): ivtv: POLLHUP must be returned on eof
V4L/DVB (7789b): Fix merge conflicts
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (48 commits)
ext4: fix hot spins in mballoc after err_freebuddy and err_freemeta
ext4: fix test ext_generic_write_end() copied return value
ext3: fix test ext_generic_write_end() copied return value
ext4: Move mballoc headers/structures to a seperate header file mballoc.h
ext4: cleanup for compiling mballoc with verification and debugging #defines
ext4: don't use ext4_error in ext4_check_descriptors
ext4: mark inode dirty after initializing the extent tree
ext4: update ctime and mtime for truncate with extents.
ext4: Don't do GFP_NOFS allocations after taking ext4_lock_group
ext4: move headers out of include/linux
ext4: fix wrong gfp type under transaction
ext4: Fix hang on umount with quotas when journal is aborted
ext4: Fix update of mtime and ctime on rename
jdb2: replace remaining __FUNCTION__ occurrences
ext4: replace remaining __FUNCTION__ occurrences
jbd2: only create debugfs and stats entries if init is successful
jbd2: fix kernel-doc notation
jbd2: replace potentially false assertion with if block
jbd2: eliminate duplicated code in revocation table init/destroy functions
jbd2: tidy up revoke cache initialisation and destruction
...
Robert P. J. Day [Mon, 28 Apr 2008 23:16:20 +0000 (20:16 -0300)]
V4L/DVB (7792): ivtv: correct misspelled "HIMEM4G" to "HIGHMEM4G" in error message
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Stephen Rothwell [Wed, 30 Apr 2008 01:16:16 +0000 (11:16 +1000)]
pasemi_edac needs to include linux/edac.h
Commit c3c52bce6993c6d37af2c2de9b482a7013d646a7 ("edac: fix module
initialization on several modules 2nd time") added a call to opstate_init
but did not include linux/edac.h that declares it.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ext4: fix hot spins in mballoc after err_freebuddy and err_freemeta
In ext4_mb_init_backend() 'i' is of type ext4_group_t. Since unsigned, i
>= 0 is always true, so fix hot spins after err_freebuddy: and -meta:
and prevent decrements when zero.
It fixes:
Compilation errors:
fs/ext4/mballoc.c: In function '__mb_check_buddy':
fs/ext4/mballoc.c:605: error: 'struct ext4_prealloc_space' has no member named 'group_list'
fs/ext4/mballoc.c:606: error: 'struct ext4_prealloc_space' has no member named 'pstart'
fs/ext4/mballoc.c:608: error: 'struct ext4_prealloc_space' has no member named 'len'
Compilation warnings:
fs/ext4/mballoc.c: In function 'ext4_mb_normalize_group_request':
fs/ext4/mballoc.c:2863: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
fs/ext4/mballoc.c: In function 'ext4_mb_use_inode_pa':
fs/ext4/mballoc.c:3103: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'
Sparse check:
fs/ext4/mballoc.c:3818:2: warning: context imbalance in 'ext4_mb_show_ac' - different lock contexts for basic block
Josef Bacik [Wed, 30 Apr 2008 02:00:28 +0000 (22:00 -0400)]
ext4: don't use ext4_error in ext4_check_descriptors
Because ext4_check_descriptors is called at mount time you can't use ext4_error
as it calls ext4_commit_sb, which since the sb isn't all the way initialized
causes bad things to happen (ie a panic). This patch changes the ext4_error's
to printk's to keep this problem from happening. Thanks much,
Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
ext4: update ctime and mtime for truncate with extents.
The recently announced "Linux POSIX file system test suite"
caught a truncate issue when using extents:
mtime and ctime are not updated when truncate is successful.
This is the single issue caught with "default" ext4 (mkfs and mount
with minimal options).
The testsuite does not report failure with -o noextents.
With the following patch, all tests of the testsuite pass.