hawkes@sgi.com [Thu, 13 Oct 2005 19:01:18 +0000 (12:01 -0700)]
[IA64] another place to use for_each_cpu_mask() in arch/ia64
In arch/ia64 change the explicit use of a for-loop using NR_CPUS into the
general for_each_online_cpu() construct. This widens the scope of potential
future optimizations of the general constructs, as well as takes advantage
of the existing optimizations of first_cpu() and next_cpu(), which is
advantageous when the true CPU count is much smaller than NR_CPUS.
Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
hawkes@sgi.com [Mon, 10 Oct 2005 15:43:26 +0000 (08:43 -0700)]
[IA64] wider use of for_each_cpu_mask() in arch/ia64
In arch/ia64 change the explicit use of for-loops and NR_CPUS into the
general for_each_cpu() or for_each_online_cpu() constructs, as
appropriate. This widens the scope of potential future optimizations
of the general constructs, as well as takes advantage of the existing
optimizations of first_cpu() and next_cpu().
Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Andrew Morton [Tue, 25 Oct 2005 18:00:56 +0000 (11:00 -0700)]
[PATCH] qlogic lockup fix
If qla2x00_probe_one()'s call to qla2x00_iospace_config() fails, we call
qla2x00_free_device() to clean up. But because ha->dpc_pid hasn't been set
yet, qla2x00_free_device() tries to stop a kernel thread which hasn't started
yet. It does wait_for_completion() against an uninitialised completion struct
and the kernel hangs up.
Fix it by initialising ha->dpc_pid a bit earlier.
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Oleg Nesterov [Mon, 24 Oct 2005 14:29:58 +0000 (18:29 +0400)]
[PATCH] posix-timers: fix posix_cpu_timer_set() vs run_posix_cpu_timers() race
This might be harmless, but looks like a race from code inspection (I
was unable to trigger it). I must admit, I don't understand why we
can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)'
without doing bump_cpu_timer(), but this is what original code does.
We are probaly deleting the timer from run_posix_cpu_timers's 'firing'
local list_head while run_posix_cpu_timers() does list_for_each_safe.
Various bad things can happen, for example we can just delete this timer
so that list_for_each() will not notice it and run_posix_cpu_timers()
will not reset '->firing' flag. In that case,
....
if (timer->it.cpu.firing) {
read_unlock(&tasklist_lock);
timer->it.cpu.firing = -1;
return TIMER_RETRY;
}
sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again,
it returns TIMER_RETRY ...
Oleg Nesterov [Mon, 24 Oct 2005 10:34:03 +0000 (14:34 +0400)]
[PATCH] posix-timers: remove false BUG_ON() from run_posix_cpu_timers()
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu to attach the timer to exiting process after that.
After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule() local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4) interrupted task has ->signal == NULL.
At this moment exiting task has no pending cpu timers, they were cleaned
up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can just
return from irq.
Oleg Nesterov [Sun, 23 Oct 2005 16:25:39 +0000 (20:25 +0400)]
[PATCH] posix-timers: fix cleanup_timers() and run_posix_cpu_timers() races
1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks.
That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set()
lock_timer(timer);
if (timer->task == NULL)
return;
read_lock(tasklist);
put_task_struct(timer->task)
is racy. With this patch timer->task modified and accounted only under
timer->it_lock. Sadly, this means that dead task_struct won't be freed
until timer deleted or armed.
2. run_posix_cpu_timers() collects expired timers into local list under
tasklist + ->sighand again. That means that posix_cpu_timer_del()
should check timer->it.cpu.firing under these locks too.
Roland Dreier [Sun, 23 Oct 2005 19:57:19 +0000 (12:57 -0700)]
[PATCH] ib: mthca: Always re-arm EQs in mthca_tavor_interrupt()
We should always re-arm an event queue's interrupt in
mthca_tavor_interrupt() if the corresponding bit is set in the event cause
register (ECR), even if we didn't find any entries in the EQ. If we don't,
then there's a window where we miss an EQ entry and then get stuck because
we don't get another EQ event.
Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 23 Oct 2005 19:57:18 +0000 (12:57 -0700)]
[PATCH] inotify/idr leak fix
Fix a bug which was reported and diagnosed by
Stefan Jones <stefan.jones@churchillrandoms.co.uk>
IDR trees include a cache of idr_layer objects. There's no way to destroy
this cache, so when we discard an overall idr tree we end up leaking some
memory.
Add and use idr_destroy() for this. v9fs and infiniband also need to use
idr_destroy() to avoid leaks.
Or, we make the cache global, like radix_tree_preload(). Which is probably
better. Later.
Cc: Eric Van Hensbergen <ericvh@ericvh.myip.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Robert Love <rml@novell.com> Cc: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Davi Arnaut [Sun, 23 Oct 2005 19:57:16 +0000 (12:57 -0700)]
[PATCH] SELinux: handle sel_make_bools() failure in selinuxfs
This patch fixes error handling in sel_make_bools(), where currently we'd
get a memory leak via security_get_bools() and try to kfree() the wrong
pointer if called again.
Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Sun, 23 Oct 2005 19:57:15 +0000 (12:57 -0700)]
[PATCH] selinux: Fix NULL deref in policydb_destroy
This patch fixes a possible NULL dereference in policydb_destroy, where
p->type_attr_map can be NULL if policydb_destroy is called to clean up a
partially loaded policy upon an error during policy load. Please apply.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Sun, 23 Oct 2005 19:57:11 +0000 (12:57 -0700)]
[PATCH] kernel-parameters cleanup
Fix typos & trailing whitespace.
Add blank lines in a few places.
Remove "AM53C974=" option: driver does not exist.
Restrict to < 80 columns in most places (but don't split formatted
command-line arguments).
Add a few option arguments for completeness.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sun, 23 Oct 2005 23:31:16 +0000 (16:31 -0700)]
cardbus: limit IO windows to 256 bytes
That's what we've always historically done, and bigger windows seem to
confuse some cardbus bridges. Or something.
Alan reports that this makes the ThinkPad 600x series work properly
again: the 4kB IO window for some reason made IDE DMA not work, which
makes IDE painfully slow even if it works after DMA timeouts.
Linus Torvalds [Sun, 23 Oct 2005 17:02:50 +0000 (10:02 -0700)]
Posix timers: limit number of timers firing at once
Bursty timers aren't good for anybody, very much including latency for
other programs when we trigger lots of timers in interrupt context. So
set a random limit, after which we'll handle the rest on the next timer
tick.
Herbert Xu [Sun, 23 Oct 2005 07:18:00 +0000 (17:18 +1000)]
[NEIGH] Fix timer leak in neigh_changeaddr
neigh_changeaddr attempts to delete neighbour timers without setting
nud_state. This doesn't work because the timer may have already fired
when we acquire the write lock in neigh_changeaddr. The result is that
the timer may keep firing for quite a while until the entry reaches
NEIGH_FAILED.
It should be setting the nud_state straight away so that if the timer
has already fired it can simply exit once we relinquish the lock.
In fact, this whole function is simply duplicating the logic in
neigh_ifdown which in turn is already doing the right thing when
it comes to deleting timers and setting nud_state.
So all we have to do is take that code out and put it into a common
function and make both neigh_changeaddr and neigh_ifdown call it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Herbert Xu [Sun, 23 Oct 2005 06:37:48 +0000 (16:37 +1000)]
[NEIGH] Fix add_timer race in neigh_add_timer
neigh_add_timer cannot use add_timer unconditionally. The reason is that
by the time it has obtained the write lock someone else (e.g., neigh_update)
could have already added a new timer.
So it should only use mod_timer and deal with its return value accordingly.
This bug would have led to rare neighbour cache entry leaks.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Ralf Baechle [Fri, 14 Oct 2005 20:29:56 +0000 (21:29 +0100)]
[AX.25]: Fix signed char bug
On architectures where the char type defaults to unsigned some of the
arithmetic in the AX.25 stack to fail, resulting in some packets being dropped
on receive.
Credits for tracking this down and the original patch to
Bob Brose N0QBJ <linuxhams@n0qbj-11.ampr.org>.
Julian Anastasov [Sat, 22 Oct 2005 10:39:21 +0000 (13:39 +0300)]
[SK_BUFF]: ipvs_property field must be copied
IPVS used flag NFC_IPVS_PROPERTY in nfcache but as now nfcache was removed the
new flag 'ipvs_property' still needs to be copied. This patch should be
included in 2.6.14.
Further comments from Harald Welte:
Sorry, seems like the bug was introduced by me.
Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Chris Wright [Fri, 21 Oct 2005 23:56:08 +0000 (16:56 -0700)]
[PATCH] typo fix in last cpufreq powernow patch
Not sure how it slipped by, but here's a trivial typo fix for powernow.
Signed-off-by: Chris Wright <chrisw@osdl.org>
[ It's "nurter" backwards.. Maybe we have a hillbilly The Shining fan? ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roland McGrath [Fri, 21 Oct 2005 22:03:29 +0000 (15:03 -0700)]
[PATCH] Call exit_itimers from do_exit, not __exit_signal
When I originally moved exit_itimers into __exit_signal, that was the only
place where we could reliably know it was the last thread in the group
dying, without races. Since then we've gotten the signal_struct.live
counter, and do_exit can reliably do group-wide cleanup work.
This patch moves the call to do_exit, where it's made without locks. This
avoids the deadlock issues that the old __exit_signal code's comment talks
about, and the one that Oleg found recently with process CPU timers.
AMD recently discovered that on some hardware, there is a race condition
possible when a C-state change request goes onto the bus at the same
time as a P-state change request.
Both requests happen, but the southbridge hardware only acknowledges the
C-state change. The PowerNow! driver is then stuck in a loop, waiting
for the P-state change acknowledgement. The driver eventually times
out, but can no longer perform P-state changes.
It turns out the solution is to resend the P-state change, which the
southbridge will acknowledge normally.
Thanks to Johannes Winkelmann for reporting this and testing the fix.
Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
David Gibson [Fri, 21 Oct 2005 03:41:19 +0000 (13:41 +1000)]
[PATCH] ppc64: Fix typo bug in iSeries hash code
This fixes a stupid typo bug in the iSeries hash table code.
When we place a hash PTE in the secondary bucket, instead of setting the
SECONDARY flag bit, as we should, we (redundantly) set the VALID flag.
This was introduced with the patch abolishing bitfields from the hash
table code. Mea culpa, oops. It hasn't been noticed until now because
in practice we don't hit the secondary bucket terribly often.
Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
While working on 64K pages, I found this little buglet in our
update_mmu_cache() implementation.
The code calls __hash_page() passing it an "access" parameter (the type
of access that triggers the hash) containing the bits _PAGE_RW and
_PAGE_USER of the linux PTE. The latter is useless in this case and the
former is wrong. In fact, if we have a writeable PTE and we pass
_PAGE_RW to hash_page(), it will set _PAGE_DIRTY (since we track dirty
that way, by hash faulting !dirty) which is not what we want.
In fact, the correct fix is to always pass 0. That means that only
read-only or already dirty read write PTEs will be preloaded. The
(hopefully rare) case of a non dirty read write PTE can't be preloaded
this way, it will have to fault in hash_page on the actual access.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Fri, 21 Oct 2005 12:39:36 +0000 (22:39 +1000)]
[PATCH] ppc64: Fix typo in time calculations
This fixes a typo in the div128_by_32 function used in the timekeeping
calculations on ppc64. If you look at the code it's quite obvious
that we need (rb + c) rather than (rb + b). The "b" is clearly just a
typo.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Eric Moore [Fri, 21 Oct 2005 18:56:36 +0000 (20:56 +0200)]
[PATCH] mptsas: fix phy identifiers
This fixes handling of the phy identifiers in mptsas.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
[ split it a pre-2.6.14 portion from Eric's bigger patch ] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Guillaume Gourat <guillaume.gourat@nexvision.fr> Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Thu, 20 Oct 2005 22:21:19 +0000 (23:21 +0100)]
[ARM] 3027/1: BAST - reduce NAND timings slightly
Patch from Ben Dooks
The current Simtec BAST nand area timings are a little
too slow to be obtained by a 2410 running at 266MHz,
so reduce the timings slightly to bring them into the
acceptable range.
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ben Dooks [Thu, 20 Oct 2005 22:21:18 +0000 (23:21 +0100)]
[ARM] 3026/1: S3C2410 - avoid possible overflow in pll calculations
Patch from Ben Dooks
Avoid the possiblity that if the board is using
a 16.9334 or higher crystal with a high PLL
multiplier, then the pll value could overflow
the capability of an int.
Also fix the value types of the intermediate
variables to unsigned int.
Rewrite of patch from Guillaume Gourat
Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Herbert Xu [Thu, 20 Oct 2005 19:13:13 +0000 (17:13 -0200)]
[TCP] Allow len == skb->len in tcp_fragment
It is legitimate to call tcp_fragment with len == skb->len since
that is done for FIN packets and the FIN flag counts as one byte.
So we should only check for the len > skb->len case.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Herbert Xu [Tue, 18 Oct 2005 02:03:28 +0000 (12:03 +1000)]
[DCCP]: Clear the IPCB area
Turns out the problem has nothing to do with use-after-free or double-free.
It's just that we're not clearing the CB area and DCCP unlike TCP uses a CB
format that's incompatible with IP.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Herbert Xu [Sun, 16 Oct 2005 11:08:46 +0000 (21:08 +1000)]
[DCCP]: Make dccp_write_xmit always free the packet
icmp_send doesn't use skb->sk at all so even if skb->sk has already
been freed it can't cause crash there (it would've crashed somewhere
else first, e.g., ip_queue_xmit).
I found a double-free on an skb that could explain this though.
dccp_sendmsg and dccp_write_xmit are a little confused as to what
should free the packet when something goes wrong. Sometimes they
both go for the ball and end up in each other's way.
This patch makes dccp_write_xmit always free the packet no matter
what. This makes sense since dccp_transmit_skb which in turn comes
from the fact that ip_queue_xmit always frees the packet.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Herbert Xu [Fri, 14 Oct 2005 06:38:49 +0000 (16:38 +1000)]
[DCCP]: Use skb_set_owner_w in dccp_transmit_skb when skb->sk is NULL
David S. Miller <davem@davemloft.net> wrote:
> One thing you can probably do for this bug is to mark data packets
> explicitly somehow, perhaps in the SKB control block DCCP already
> uses for other data. Put some boolean in there, set it true for
> data packets. Then change the test in dccp_transmit_skb() as
> appropriate to test the boolean flag instead of "skb_cloned(skb)".
I agree. In fact we already have that flag, it's called skb->sk.
So here is patch to test that instead of skb_cloned().
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
Hugh Dickins [Thu, 20 Oct 2005 15:24:28 +0000 (16:24 +0100)]
[PATCH] Fix handling spurious page fault for hugetlb region
This reverts commit 3359b54c8c07338f3a863d1109b42eebccdcf379 and
replaces it with a cleaner version that is purely based on page table
operations, so that the synchronization between inode size and hugetlb
mappings becomes moot.
Al Viro [Tue, 18 Oct 2005 21:45:17 +0000 (22:45 +0100)]
[PATCH] build fix for uml/amd64
Missing half of the [PATCH] uml: Fix sysrq-r support for skas mode
We need to remove these (UPT_[DEFG]S) from the read side as well as the
write one - otherwise it simply won't build.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Yasunori Goto [Wed, 19 Oct 2005 22:52:18 +0000 (15:52 -0700)]
[PATCH] swiotlb: make sure initial DMA allocations really are in DMA memory
This introduces a limit parameter to the core bootmem allocator; The new
parameter indicates that physical memory allocated by the bootmem
allocator should be within the requested limit.
We also introduce alloc_bootmem_low_pages_limit, alloc_bootmem_node_limit,
alloc_bootmem_low_pages_node_limit apis, but alloc_bootmem_low_pages_limit
is the only api used for swiotlb.
The existing alloc_bootmem_low_pages() api could instead have been
changed and made to pass right limit to the core allocator. But that
would make the patch more intrusive for 2.6.14, as other arches use
alloc_bootmem_low_pages(). We may be done that post 2.6.14 as a
cleanup.
With this, swiotlb gets memory within 4G for both x86_64 and ia64
arches.
Peter Chubb [Thu, 20 Oct 2005 05:45:14 +0000 (22:45 -0700)]
[PATCH] `unaligned access' in acpi get_root_bridge_busnr()
In drivers/acpi/glue.c the address of an integer is cast to the address of
an unsigned long. This breaks on systems where a long is larger than an
int --- for a start the int can be misaligned; for a second the assignment
through the pointer will overwrite part of the next variable.
Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Acked-by: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Airlie [Thu, 20 Oct 2005 04:23:51 +0000 (21:23 -0700)]
[PATCH] fix MGA DRM regression before 2.6.14
I've gotten a report on lkml, of a possible regression in the MGA DRM in
2.6.14-rc4 (since -rc1), I haven't been able to reproduce it here, but I've
figured out some possible issues in the mga code that were definitely
wrong, some of these are from DRM CVS, the main fix is the agp enable bit
on the old code path still used by everyone.....
Signed-off-by: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Stern [Thu, 20 Oct 2005 04:23:51 +0000 (21:23 -0700)]
[PATCH] Threads shouldn't inherit PF_NOFREEZE
The PF_NOFREEZE process flag should not be inherited when a thread is
forked. This patch (as585) removes the flag from the child.
This problem is starting to show up more and more as drivers turn to the
kthread API instead of using kernel_thread(). As a result, their kernel
threads are now children of the kthread worker instead of modprobe, and
they inherit the PF_NOFREEZE flag. This can cause problems during system
suspend; the kernel threads are not getting frozen as they ought to be.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The implementation of __kernel_gettimeofday() in the 32 bits vDSO has a
small bug (a typo actually) that will cause it to lose 1 bit of precision.
Not terribly bad but worth fixing.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Thu, 20 Oct 2005 04:23:47 +0000 (21:23 -0700)]
[PATCH] Three one-liners in md.c
The main problem fixes is that in certain situations stopping md arrays may
take longer than you expect, or may require multiple attempts. This would
only happen when resync/recovery is happening.
This patch fixes three vaguely related bugs.
1/ The recent change to use kthreads got the setting of the
process name wrong. This fixes it.
2/ The recent change to use kthreads lost the ability for
md threads to be signalled with SIG_KILL. This restores that.
3/ There is a long standing bug in that if:
- An array needs recovery (onto a hot-spare) and
- The recovery is being blocked because some other array being
recovered shares a physical device and
- The recovery thread is killed with SIG_KILL
Then the recovery will appear to have completed with no IO being
done, which can cause data corruption.
This patch makes sure that incomplete recovery will be treated as
incomplete.
Note that any kernel affected by bug 2 will not suffer the problem of bug
3, as the signal can never be delivered. Thus the current 2.6.14-rc
kernels are not susceptible to data corruption. Note also that if arrays
are shutdown (with "mdadm -S" or "raidstop") then the problem doesn't
occur. It only happens if a SIGKILL is independently delivered as done by
'init' when shutting down.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Wingo [Thu, 20 Oct 2005 04:23:46 +0000 (21:23 -0700)]
[PATCH] raw1394: fix locking in the presence of SMP and interrupts
Changes all spinlocks that can be held during an irq handler to disable
interrupts while the lock is held. Changes spin_[un]lock_irq to use the
irqsave/irqrestore variants for robustness and readability.
In raw1394.c:handle_iso_listen(), don't grab host_info_lock at all -- we're
not accessing host_info_list or host_count, and holding this lock while
trying to tasklet_kill the iso tasklet this can cause an ABBA deadlock if
ohci:dma_rcv_tasklet is running and tries to grab host_info_lock in
raw1394.c:receive_iso. Test program attached reliably deadlocks all SMP
machines I have been able to test without this patch.
Signed-off-by: Andy Wingo <wingo@pobox.com> Acked-by: Ben Collins <bcollins@ubuntu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Hugh Dickins [Thu, 20 Oct 2005 04:23:43 +0000 (21:23 -0700)]
[PATCH] mm: hugetlb truncation fixes
hugetlbfs allows truncation of its files (should it?), but hugetlb.c often
forgets that: crashes and misaccounting ensue.
copy_hugetlb_page_range better grab the src page_table_lock since we don't
want to guess what happens if concurrently truncated. unmap_hugepage_range
rss accounting must not assume the full range was mapped. follow_hugetlb_page
must guard with page_table_lock and be prepared to exit early.
Restyle copy_hugetlb_page_range with a for loop like the others there.
Roland McGrath [Thu, 20 Oct 2005 05:21:23 +0000 (22:21 -0700)]
[PATCH] Fix cpu timers exit deadlock and races
Oleg Nesterov reported an SMP deadlock. If there is a running timer
tracking a different process's CPU time clock when the process owning
the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via
exit_itimers.
That code was using tasklist_lock to check for a race with __exit_signal
being called on the timer-target task and clearing its ->signal.
However, there is actually no such race. __exit_signal will have called
posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does
that. Those will clear those k_itimer's association with the dying
task, so posix_cpu_timer_del will return early and never reach the code
in question.
In addition, posix_cpu_timer_del called from exit_itimers during execve
or directly from timer_delete in the process owning the timer can race
with an exiting timer-target task to cause a double put on timer-target
task struct. Make sure we always access cpu_timers lists with sighand
lock held.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Seth, Rohit [Tue, 18 Oct 2005 21:15:12 +0000 (14:15 -0700)]
[PATCH] Handle spurious page fault for hugetlb region
The hugetlb pages are currently pre-faulted. At the time of mmap of
hugepages, we populate the new PTEs. It is possible that HW has already
cached some of the unused PTEs internally. These stale entries never
get a chance to be purged in existing control flow.
This patch extends the check in page fault code for hugepages. Check if
a faulted address falls with in size for the hugetlb file backing it.
We return VM_FAULT_MINOR for these cases (assuming that the arch
specific page-faulting code purges the stale entry for the archs that
need it).
Signed-off-by: Rohit Seth <rohit.seth@intel.com>
[ This is apparently arguably an ia64 port bug. But the code won't
hurt, and for now it fixes a real problem on some ia64 machines ]
Paul Schulz [Tue, 18 Oct 2005 18:40:32 +0000 (19:40 +0100)]
[ARM] 3023/1: pxa-regs: Typo in ARM pxa register definitions.
Patch from Paul Schulz
The following trivial patch is to fix what looks like a typo in the PXA register
definitions. The correction comes directly from the definition in the
Intel Documentation.
Neither 'UDCCS_IO_ROF' or 'UDCCS_IO_DME' are currently used elseware
in the main code (from grep of tree)... The current definitions have been
in the code since at lease 2.4.7.
Signed-off-by: Paul Schulz <paul@mawsonlakes.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] vesafb: Fix display corruption on display blank
Reported by: Bob Tracy <rct@gherkin.frus.com>
"...I've got a Toshiba notebook (730XCDT -- Pentium 150MMX) for which
I'm using the Vesa FB driver. When the machine has been idle for some
time and the driver attempts to powerdown the display, rather than the
display going blank, it goes gray with several strange lines. When I
hit the "shift" key or other-wise wake up the display, the old video
state is not fully restored..."
vesafb recently added a blank method which has only 2 states, powerup and
powerdown. The powerdown state is used for all blanking levels, but in his
case, powerdown does not work correctly for higher levels of display
powersaving. Thus, for intermediate power levels, use software blanking,
and use only hardware blanking for an explicit powerdown.
Linus Torvalds [Tue, 18 Oct 2005 15:26:15 +0000 (08:26 -0700)]
Add some basic .gitignore files
This still leaves driver and architecture-specific subdirectories alone,
but gets rid of the bulk of the "generic" generated files that we should
ignore.
Kenneth Tan [Tue, 18 Oct 2005 06:53:35 +0000 (07:53 +0100)]
[ARM] 3021/1: Interrupt 0 bug fix for ixp4xx
Patch from Kenneth Tan
The get_irqnr_and_base subroutine of ixp4xx does not take interrupt 0 condition into account properly. We should not perform "subs" here. The Z flag will be set when interrupt 0 occur, which resulting "movne r1, sp" in the caller routine (irq_handler) not being executed.
When interrupt 0 occur:
o if CONFIG_CPU_IXP46X is not set, "subs" will set the Z flag and return
o if CONFIG_CPU_IXP46X is set, codes in upper interrupt handling will be trigerred. But since this is not supper interrupt, the "cmp" in the upper interrupt handling portion will set the Z flag and return
Signed-off-by: Kenneth Tan <chong.yin.tan@intel.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Mark Rustad [Mon, 17 Oct 2005 23:43:34 +0000 (16:43 -0700)]
[PATCH] kbuild: Eliminate build error when KALLSYMS not defined
The following build error happens with 2.6.14-rc4 when CONFIG_KALLSYMS is
not defined. The error message in a fragment of the output was:
CC arch/i386/lib/usercopy.o
AR arch/i386/lib/lib.a
/bin/sh: line 1: +@: command not found
make[3]: warning: jobserver unavailable: using -j1. Add `+' to parent make rule.
CHK include/linux/compile.h
Signed-off-by: Mark Rustad <mrustad@mac.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zach Brown [Mon, 17 Oct 2005 23:43:33 +0000 (16:43 -0700)]
[PATCH] aio: revert lock_kiocb()
lock_kiocb() was introduced to serialize retrying and cancellation. In the
process of doing so it tried to sleep waiting for KIF_LOCKED while holding
the ctx_lock spinlock. Recent fixes have ensured that multiple concurrent
retries won't be attempted for a given iocb. Cancel has other problems and
has no significant in-tree users that have been complaining about it. So
for the immediate future we'll revert sleeping with the lock held and will
address proper cancellation and retry serialization in the future.
Signed-off-by: Zach Brown <zach.brown@oracle.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephan Brodkorb [Mon, 17 Oct 2005 23:43:30 +0000 (16:43 -0700)]
[PATCH] n_r3964 mod_timer() fix
Since Revision 1.10 was released the n_r3964 module wasn't able to receive any
data. The reason for that behavior is because there were some wrong calls of
mod_timer(...) in the function receive_char (...). This patch should fix this
problem and was successfully tested with talking to some kuka industrial
robots.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Trond Myklebust [Mon, 17 Oct 2005 10:02:00 +0000 (06:02 -0400)]
[PATCH] NFS: Fix cache consistency races
If the data cache has been marked as potentially invalid by nfs_refresh_inode,
we should invalidate it rather than assume that changes are due to our own
activity.
Also ensure that we always start with a valid cache before declaring it
to be protected by a delegation.
Christian Krause [Mon, 17 Oct 2005 21:30:48 +0000 (14:30 -0700)]
[PATCH] USB: fix bug in handling of highspeed usb HID devices
During the development of an USB device I found a bug in the handling of
Highspeed HID devices in the kernel.
What happened?
Highspeed HID devices are correctly recognized and enumerated by the
kernel. But even if usbhid kernel module is loaded, no HID reports are
received by the kernel.
The output of the hardware USB analyzer told me that the host doesn't
even poll for interrupt IN transfers (even the "interrupt in" USB
transfer are polled by the host).
After some debugging in hid-core.c I've found the reason.
In case of a highspeed device, the endpoint interval is re-calculated in
driver/usb/input/hid-core.c:
Because of this the value of urb->interval is sometimes negative and is
rejected in core/urb.c:
line 353:
/* too small? */
if (urb->interval <= 0)
return -EINVAL;
The conclusion is, that the recalculaton of the interval (which is
necessary for highspeed) should not be made twice, because this is
simply wrong. ;-)
Re-calculation in usb_fill_int_urb makes more sense, because it is the
most general approach. So it would make sense to remove it from
hid-core.c.
Because in hid-core.c the interval variable is only used for calling
usb_fill_int_urb, it is no problem to remove the highspeed
re-calculation in this file.
Olav Kongas [Mon, 17 Oct 2005 21:30:43 +0000 (14:30 -0700)]
[PATCH] isp116x-hcd: fix handling of short transfers
Increased use of scatter-gather by usb-storage driver after 2.6.13 has
exposed a buggy codepath in isp116x-hcd, which was probably never
visited before: bug happened only for those urbs, for which
URB_SHORT_NOT_OK was set AND short transfer occurred.
The fix attached was tested in 2 ways: (a) it fixed failing
initialization of a flash drive with an embedded hub; (b) the fix was
tested with 'usbtest' against a modified g_zero driver (on top of
net2280), which generated short bulk IN transfers of various lengths
including multiples and non-multiples of max_packet_length.
This patch from Eric fixes handling of the phy identifiers in mptsas.
I've split it up from his bigger patch as it should go into 2.6.14
still.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Linus Torvalds [Mon, 17 Oct 2005 16:10:15 +0000 (09:10 -0700)]
Increase default RCU batching sharply
Dipankar made RCU limit the batch size to improve latency, but that
approach is unworkable: it can cause the RCU queues to grow without
bounds, since the batch limiter ended up limiting the callbacks.
So make the limit much higher, and start planning on instead limiting
the batch size by doing RCU callbacks more often if the queue looks like
it might be growing too long.
Ronald S. Bultje [Mon, 17 Oct 2005 03:29:25 +0000 (20:29 -0700)]
[PATCH] fix black/white-only svideo input in vpx3220 decoder
Fix the fact that the svideo input will only give input in black/white in
some circumstances. Reason is that in the PCI controller driver (zr36067),
after setting input, we reset norm, which overwrites the input register
with the default. This patch makes it always set the correct value for the
input when changing norm.
Signed-off-by: Ronald S. Bultje <rbultje@ronald.bitfreak.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Ronald S. Bultje [Mon, 17 Oct 2005 03:29:24 +0000 (20:29 -0700)]
[PATCH] fix vpx3220 offset issue in SECAM
Fix bug #5404 in kernel bugzilla.
It basically updates the vpx3220 initialization tables with some newer
values that we've had in CVS for a while (and that, for some reason, never
ended up in the kernel... must've gotten lost). Those fix a ~16 pixels
noise at the top of the picture in at least SECAM, although (now that I
think about it) PAL was probably affected, also.
Signed-off-by: Ronald S. Bultje <rbultje@ronald.bitfreak.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Samuel Thibault [Mon, 17 Oct 2005 03:29:22 +0000 (20:29 -0700)]
[PATCH] SVGATextMode fix
Fix bug 5441.
I didn't know about messy programs like svgatextmode... Couldn't this be
integrated in some linux/drivers/video/console/svgacon.c ?... So because
of the existence of the svgatextmode program, the kernel is not supposed to
touch to CRT_OVERFLOW/SYNC_END/DISP/DISP_END/OFFSET ?
Disabling the check in vgacon_resize() might help indeed, but I'm really
not sure whether it will work for any chipset: in my patch, CRT registers
are set at each console switch, since stty rows/cols apply to consoles
separately...
The attached solution is to keep the test, but if it fails, we assume that
the caller knows what it does (i.e. it is svgatextmode) and then disable
any further call to vgacon_doresize. Svgatextmode is usually used to
_expand_ the display, not to shrink it. And it is harmless in the case of
a too big stty rows/cols: the display will just be cropped. I tested it on
my laptop, and it works fine with svgatextmode.
A better solution would be that svgatextmode explicitely tells the kernel
not to care about video timing, but for this an interface needs be defined
and svgatextmode be patched.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Without the missing barrier, the pos->next value may turn out to be stale.
In fact, if "do stuff" were also dereferencing pos and relying on
list_for_each_rcu to provide the barrier then it may also break.
So here is a patch to make sure that we have a barrier for the first
element in the list.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Mon, 17 Oct 2005 00:36:06 +0000 (17:36 -0700)]
Fix memory ordering bug in page reclaim
As noticed by Nick Piggin, we need to make sure that we check the page
count before we check for PageDirty, since the dirty check is only valid
if the count implies that we're the only possible ones holding the page.
We always did do this, but the code needs a read-memory-barrier to make
sure that the orderign is also honored by the CPU.
(The writer side is ordered due to the atomic decrement and test on the
page count, see the discussion on linux-kernel)
Alan Stern [Fri, 14 Oct 2005 15:23:27 +0000 (11:23 -0400)]
[SCSI] Fix leak of Scsi_Cmnds
When a request is deferred in scsi_init_io because the sg table could not
be allocated, the associated scsi_cmnd is not released and the request is
not marked with REQ_DONTPREP. When the command is retried, if
scsi_prep_fn decides to kill it then the scsi_cmnd will never be released.
This patch (as573) changes scsi_init_io so that it calls scsi_put_command
before deferring a request.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>