The test for MSI IRQ could have timing issues. The PCI write needs to be
pushed out before waiting, and the wait queue should be initialized before
the IRQ.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don't clear status IRQ until list has been read to avoid causing
status list wraparound. Clearing IRQ forces a Transmit Status update
if it is pending.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Daniel Kobras [Sun, 27 Aug 2006 08:23:24 +0000 (01:23 -0700)]
dm: Fix deadlock under high i/o load in raid1 setup.
On an nForce4-equipped machine with two SATA disks in raid1 setup using dmraid,
we experienced frequent deadlock of the system under high i/o load. 'cat
/dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
after a few GB, 'cp' would be left in 'D' state along with kjournald and
kmirrord. The functions cp and kjournald were blocked in did vary, but
kmirrord's wchan always pointed to 'mempool_alloc()'. We've seen this pattern
on 2.6.15 and 2.6.17 kernels. http://lkml.org/lkml/2005/4/20/142 indicates
that this problem has been around even before.
So much for the facts, here's my interpretation: mempool_alloc() first tries
to atomically allocate the requested memory, or falls back to hand out
preallocated chunks from the mempool. If both fail, it puts the calling
process (kmirrord in this case) on a private waitqueue until somebody refills
the pool. But the only 'somebody' is kmirrord itself, so we have a
deadlock.
I worked around this problem by falling back to a (blocking) kmalloc in the
case where kmirrord would previously have ended up on the waitqueue. This defeats part of
the benefits of using the mempool, but at least keeps the system running. And
it could be done with a two-line change. Note that mempool_alloc() clears the
GFP_NOIO flag internally, and only uses it to decide whether to wait or return
an error if immediate allocation fails, so the attached patch doesn't change
behaviour in the non-deadlocking case. Patch is against current git
(2.6.18-rc4), but should apply to earlier versions as well. I've tested on
2.6.15, where this patch makes the difference between random lockup and a
stable system.
Signed-off-by: Daniel Kobras <kobras@linux.de> Acked-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Cox [Wed, 30 Aug 2006 18:35:49 +0000 (11:35 -0700)]
Missing PCI id update for VIA IDE
The following change from -mm is important to 2.6.18 (actually to 2.6.17
but it's too late for that). This was contributed over three months ago
by VIA to Bartlomiej and nothing happened. As a result the new chipset
is now out and Linux won't run on it. By the time 2.6.18 is finalised
this will be the de facto standard VIA chipset so support would be a good
plan.
Tested in -mm for a while, it's essentially a PCI ident update but for
the bridge chip because VIA do things in weird ways.
Robin Holt [Fri, 1 Sep 2006 15:41:39 +0000 (10:41 -0500)]
Silent data corruption caused by XPC
Jack Steiner identified a problem where XPC can cause a silent
data corruption. On module load, the placement may cause the
xpc_remote_copy_buffer to span two physical pages. DMA transfers are
done to the start virtual address translated to physical.
This patch changes the buffer from a statically allocated buffer to a
kmalloc'd buffer. Dean Nelson reviewed this before posting. I have
tested it in the configuration that was showing the memory corruption
and verified it works. I also added a BUG_ON statement to help catch
this if a similar situation is encountered.
Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Dean Nelson <dcn@sgi.com> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
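The page-spanning condition described in the XPC entry above can be sketched in plain user-space C. This is only an illustration, not the code from the patch: the helper name spans_two_pages and the 4 KiB page size are assumptions made for the example. A check of this kind is the sort of condition a BUG_ON could guard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real page size is architecture dependent. */
#define SKETCH_PAGE_SIZE 4096UL

/* Returns 1 if [addr, addr + len) touches more than one page, else 0. */
static int spans_two_pages(uintptr_t addr, size_t len)
{
	uintptr_t first_page = addr / SKETCH_PAGE_SIZE;
	uintptr_t last_page = (addr + len - 1) / SKETCH_PAGE_SIZE;

	return len != 0 && first_page != last_page;
}
```

A buffer that fails this check is physically contiguous only by luck, which is why a DMA that translates just the start address can write into the wrong page.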
Alan Stern [Thu, 31 Aug 2006 18:18:39 +0000 (14:18 -0400)]
uhci-hcd: fix list access bug
When skipping to the last TD of an URB, go to the _last_ entry in the
list instead of the _first_ entry (as780). This fixes Bugzilla #6747 and
possibly others.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ernie Petrides [Sat, 26 Aug 2006 14:20:45 +0000 (10:20 -0400)]
binfmt_elf: fix checks for bad address
Fix check for bad address; use macro instead of open-coding two checks.
Taken from RHEL4 kernel update.
For background, the BAD_ADDR() macro should return TRUE if the address is
TASK_SIZE, because that's the lowest address that is *not* valid for
user-space mappings. The macro was correct in binfmt_aout.c but was wrong
for the "equal to" case in binfmt_elf.c. There were two in-line validations
of user-space addresses in binfmt_elf.c, which have been appropriately
converted to use the corrected BAD_ADDR() macro in the patch you posted
yesterday. Note that the size checks against TASK_SIZE are okay as coded.
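The "equal to" case described above can be sketched as follows. This is a user-space illustration under stated assumptions: SKETCH_TASK_SIZE is an arbitrary example value rather than any architecture's real TASK_SIZE, and SKETCH_BAD_ADDR is a stand-in written to match the behaviour the text describes, not a copy of the kernel macro.

```c
#include <assert.h>

/* Assumption: an illustrative 3 GiB limit, not a real architecture's TASK_SIZE. */
#define SKETCH_TASK_SIZE 0xC0000000UL

/* The limit itself is the first invalid address, hence >= rather than >. */
#define SKETCH_BAD_ADDR(x) ((unsigned long)(x) >= SKETCH_TASK_SIZE)
```

With a strict "greater than" comparison, an address exactly equal to the limit would be accepted, which is the error the entry describes.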
The additional changes that I propose are below. These are in the error
paths for bad ELF entry addresses once load_elf_binary() has already
committed to exec'ing the new image (following the tearing down of the
task's original address space).
The 1st hunk deals with the interp-side of the outer "if". There were two
problems here. The printk() should be removed because this path can be
triggered at will by a bogus interpreter image created and used by a
malicious user. Further, the error code should not be ENOEXEC, because that
causes the loop in search_binary_handler() to continue trying other exec
handlers (twice, in fact). But it's too late for this to work correctly,
because the user address space has already been torn down, and an exec()
failure cannot be returned to the user code because the code no longer
exists. The only recovery is to force a SIGSEGV, but it's best to terminate
the search loop immediately. I somewhat arbitrarily chose EINVAL as a
fallback error code, but any error returned by load_elf_interp() will
override that (but this value will never be seen by user-space).
The 2nd hunk deals with the non-interp-side of the outer "if". There were
two problems here as well. The SIGSEGV needs to be forced, because a prior
sigaction() syscall might have set the associated disposition to SIG_IGN.
And the ENOEXEC should be changed to EINVAL as described above.
static int unqueue_me(struct futex_q *q)
{
	int ret = 0;
	spinlock_t *lock_ptr;

	/* In the common case we don't take the spinlock, which is nice. */
 retry:
	lock_ptr = q->lock_ptr;
	if (lock_ptr != 0) {
		spin_lock(lock_ptr);
		/*
		 * q->lock_ptr can change between reading it and
		 * spin_lock(), causing us to take the wrong lock.  This
		 * corrects the race condition.
[...]
and my compiler (gcc 4.1.0) makes the following out of it:
00000000000003c8 <unqueue_me>:
3c8: eb bf f0 70 00 24 stmg %r11,%r15,112(%r15)
3ce: c0 d0 00 00 00 00 larl %r13,3ce <unqueue_me+0x6>
3d0: R_390_PC32DBL .rodata+0x2a
3d4: a7 f1 1e 00 tml %r15,7680
3d8: a7 84 00 01 je 3da <unqueue_me+0x12>
3dc: b9 04 00 ef lgr %r14,%r15
3e0: a7 fb ff d0 aghi %r15,-48
3e4: b9 04 00 b2 lgr %r11,%r2
3e8: e3 e0 f0 98 00 24 stg %r14,152(%r15)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
/* write q->lock_ptr in r12 */
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: a7 84 00 4b je 48e <unqueue_me+0xc6>
/* if r12 is zero then jump over the code.... */
3fc: e3 20 b0 28 00 04 lg %r2,40(%r11)
/* write q->lock_ptr in r2 */
402: c0 e5 00 00 00 00 brasl %r14,402 <unqueue_me+0x3a>
404: R_390_PC32DBL _spin_lock+0x2
/* use r2 as parameter for spin_lock */
So the code becomes more or less:
if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)
Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: b9 04 00 2c lgr %r2,%r12
3fc: a7 84 00 48 je 48c <unqueue_me+0xc4>
400: c0 e5 00 00 00 00 brasl %r14,400 <unqueue_me+0x38>
402: R_390_PC32DBL _spin_lock+0x2
As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee that the original value of
q->lock_ptr is always a spinlock (otherwise we overwrite kernel memory). We
know that q->lock_ptr can change. I don't know what happens with the original
spinlock, as I am not an expert with the futex code.
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@timesys.com> Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
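The effect of the barrier in the futex entry above can be sketched in user-space C. The names sketch_barrier, sketch_q and sketch_lock_if_queued are hypothetical, the barrier is written with the GCC/Clang inline-asm idiom, and its placement between the load and the test is an assumption of this sketch; an integer increment stands in for spin_lock().

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang compiler barrier: memory values may not be cached across it. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

struct sketch_q {
	int *lock_ptr;	/* another thread may change this in the real code */
};

/*
 * Snapshot q->lock_ptr once, then test and use only the snapshot.
 * Returns 1 if the "lock" was taken, 0 if there was none to take.
 */
static int sketch_lock_if_queued(struct sketch_q *q)
{
	int *lock_ptr = q->lock_ptr;

	sketch_barrier();	/* keep the compiler from re-reading q->lock_ptr */
	if (lock_ptr != NULL) {
		(*lock_ptr)++;	/* stand-in for spin_lock(lock_ptr) */
		return 1;
	}
	return 0;
}
```

Because the compiler must assume memory changed at the barrier, it cannot substitute a second load of q->lock_ptr for the local copy, so the value tested and the value locked are the same.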
Trond Myklebust [Tue, 29 Aug 2006 06:15:54 +0000 (02:15 -0400)]
fcntl(F_SETSIG) fix
fcntl(F_SETSIG) no longer works on leases because
lease_release_private_callback() gets called as the lease is copied in
order to initialise it.
The problem is that lease_alloc() performs an unnecessary initialisation,
which sets the lease_manager_ops. Avoid the problem by allocating the
target lease structure using locks_alloc_lock().
[IPV6]: Fix kernel OOPs when setting sticky socket options.
Bug noticed by Remi Denis-Courmont <rdenis@simphalempin.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
SCTP: Fix sctp_primitive_ABORT() call in sctp_close().
With the recent fix, the callers of sctp_primitive_ABORT()
need to create an ABORT chunk and pass it as an argument rather
than msghdr that was passed earlier.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike the hugetlb code paths, the normal fault code is not set up to
propagate PTE changes for large page sizes correctly like the ones we
make for I/O mappings in io_remap_pfn_range().
It is absolutely necessary to update all sub-ptes of a largepage
mapping on a fault. Adding special handling for this would add
considerable complexity to tlb_batch_add(). So let's just side-step
the issue and forcefully dirty any writable PTEs created by
io_remap_pfn_range().
The only other real option would be to disable the large PTE code of
io_remap_pfn_range() and we really don't want to do that.
Much thanks to Mikael Pettersson for tracking down this problem and
testing debug patches.
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 25 Aug 2006 21:54:13 +0000 (14:54 -0700)]
TG3: Disable TSO by default
Disable TSO by default on some chips due to hardware errata.
Enabling TSO can lead to tx timeouts in some cases when the TSO
header size exceeds 80 bytes on the affected chips. This limit
can be exceeded when the TCP header contains the timestamp option
plus 2 SACK blocks, for example. A more complete workaround is
available in the next 2.6.18 kernel.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
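The 80-byte limit in the TG3 entry above can be checked with a little arithmetic. The sketch below assumes that the "TSO header size" counts the Ethernet, IPv4 and TCP headers together and that each TCP option is padded to a 4-byte boundary with NOPs; both are assumptions of this example, and all names are hypothetical.

```c
#include <assert.h>

/* Standard header and option sizes in bytes. */
enum {
	SK_ETH_HLEN = 14,	/* Ethernet header, no VLAN tag */
	SK_IP_HLEN = 20,	/* IPv4 header without options */
	SK_TCP_HLEN = 20,	/* TCP header without options */
	SK_TS_OPT = 12		/* timestamp option (10) plus two NOPs */
};

/* SACK option for blocks >= 1: two NOPs, kind and length, 8 bytes per block. */
static int sketch_sack_opt_len(int blocks)
{
	return 2 + 2 + 8 * blocks;
}

/* Total header length with timestamps and the given number of SACK blocks. */
static int sketch_tso_hdr_len(int sack_blocks)
{
	return SK_ETH_HLEN + SK_IP_HLEN + SK_TCP_HLEN + SK_TS_OPT +
	       sketch_sack_opt_len(sack_blocks);
}
```

Under these assumptions one SACK block gives 78 bytes and two give 86, so the second block is what pushes the headers past 80.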
Neil Brown [Mon, 26 Jun 2006 07:27:26 +0000 (00:27 -0700)]
dm: mirror sector offset fix
The device-mapper core does not perform any remapping of bios before passing
them to the targets. If a particular mapping begins part-way into a device,
targets obtain the sector relative to the start of the mapping by subtracting
ti->begin.
The dm-raid1 target didn't do this everywhere: this patch fixes it, taking
care to subtract ti->begin exactly once for each bio.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
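The offset handling in the mirror entry above amounts to one subtraction per bio. A minimal user-space sketch, with sketch_target standing in for the dm target and all names hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_sector_t;

/* Hypothetical stand-in for the target's start offset (ti->begin). */
struct sketch_target {
	sketch_sector_t begin;
};

/* Sector relative to the start of the mapping; apply exactly once per bio. */
static sketch_sector_t sketch_map_sector(const struct sketch_target *ti,
					 sketch_sector_t bio_sector)
{
	return bio_sector - ti->begin;
}
```

Applying the subtraction zero times or twice on the same bio yields a sector that is off by ti->begin, which is why the entry stresses "exactly once".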
Jeff Mahoney [Mon, 26 Jun 2006 07:27:25 +0000 (00:27 -0700)]
dm: fix block device initialisation
In alloc_dev(), we register the device with the block layer and then continue
to initialize the device. But register_disk() makes the device available to
be opened before we have completed initialising it.
This patch moves the final bits of the initialization above the disk
registration.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:25 +0000 (00:27 -0700)]
dm: add module ref counting
The reference counting on dm-mod is zero if no mapped devices are open. This
is incorrect, and can lead to an oops if the module is unloaded while mapped
devices exist.
This patch claims a reference to the module whenever a device is created, and
drops it again when the device is freed.
Devices must be removed before dm-mod is unloaded.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:23 +0000 (00:27 -0700)]
dm: add DMF_FREEING
There is a chicken and egg problem between the block layer and dm in which the
gendisk associated with a mapping keeps a reference-less pointer to the
mapped_device.
This patch uses a new flag DMF_FREEING to indicate when the mapped_device is
no longer valid. This is checked to prevent any attempt to open the device
from succeeding while the device is being destroyed.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeff Mahoney [Mon, 26 Jun 2006 07:27:21 +0000 (00:27 -0700)]
dm: fix idr minor allocation
One part of the system can attempt to use a mapped device before another has
finished initialising it or while it is being freed.
This patch introduces a place holder value, MINOR_ALLOCED, to mark the minor
as allocated but in a state where it can't be used, such as mid-allocation or
mid-free. At the end of the initialization, it replaces the place holder with
the pointer to the mapped_device, making it available to the rest of the dm
subsystem.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
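The placeholder idea in the idr entry above can be sketched with a small fixed table instead of an idr. Everything here is hypothetical (names, table size, and the use of a static object's address as the sentinel), and the locking that the real code needs is left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MINORS 4

struct sketch_dev { int minor; };

/* Sentinel meaning "minor reserved, device not usable"; never dereferenced. */
static struct sketch_dev sketch_placeholder;
#define SKETCH_MINOR_ALLOCED (&sketch_placeholder)

static struct sketch_dev *sketch_table[SKETCH_MINORS];

/* Reserve a free minor with the placeholder; returns -1 if none is free. */
static int sketch_reserve_minor(void)
{
	for (int i = 0; i < SKETCH_MINORS; i++) {
		if (sketch_table[i] == NULL) {
			sketch_table[i] = SKETCH_MINOR_ALLOCED;
			return i;
		}
	}
	return -1;
}

/* Publish a fully initialised device in place of the placeholder. */
static void sketch_publish(struct sketch_dev *dev)
{
	sketch_table[dev->minor] = dev;
}

/* Lookups treat the placeholder like an empty slot. */
static struct sketch_dev *sketch_lookup(int minor)
{
	struct sketch_dev *dev = sketch_table[minor];

	return dev == SKETCH_MINOR_ALLOCED ? NULL : dev;
}
```

The minor stays reserved from the first step, so nobody else can claim it, yet a lookup cannot return a half-built device.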
Persistent snapshots currently store a private copy of the chunk size.
Userspace also supplies the chunk size when loading a snapshot. Ensure
consistency by only storing the chunk_size in one place instead of two.
Currently the two sizes will differ if the chunk size supplied by userspace
does not match the chunk size an existing snapshot actually uses. Amongst
other problems, this causes an incorrect 'percentage full' to be reported.
The patch ensures consistency by only storing the chunk_size in one place,
removing it from struct pstore. Some initialisation is delayed until the
correct chunk_size is known. If read_header() discovers that the wrong chunk
size was supplied, the 'area' buffer (which the header already got read into)
is reinitialised to the correct size.
[akpm: too late for 2.6.17 - suitable for 2.6.17.x after it has settled]
Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
There are black box devices out there, routers and firewalls and
whatnot, that simply cannot grok the TCP window scaling option
correctly.
People should and do bark at the site running the device causing
the problems, but in the mean time folks do want a way to deal
with the problem. We don't want them to turn off window scaling
completely as that hurts performance of connections that would run
just fine with window scaling enabled.
So give a way to do this on a per-route basis by limiting the
window scaling by the per-connection window clamp. Stephen's
changelog message explains how to do this using a route metric.
[TCP]: Limit window scaling if window is clamped.
This small change allows for easy per-route workarounds for broken hosts or
middleboxes that are not compliant with TCP standards for window scaling.
Rather than having to turn off window scaling globally, this patch allows
reducing or disabling window scaling if a window clamp is present.
Example: Mark Lord reported a problem with 2.6.17 kernel being unable to
access http://www.everymac.com
# ip route add 216.145.246.23/32 via 10.8.0.1 window 65535
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
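How a window clamp limits the scale factor can be sketched as below. This is an illustration of the idea rather than the kernel routine: the function name is hypothetical, and treating a clamp of 0 as "no clamp" is a convention chosen for this example. The value 65535 is the largest window expressible without scaling, and 14 is the maximum shift allowed by the window scale option.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pick a window scale factor for a receive space, limited by a clamp.
 * A clamp of 0 is taken to mean "no clamp" in this sketch.
 */
static unsigned int sketch_rcv_wscale(uint32_t space, uint32_t window_clamp)
{
	unsigned int wscale = 0;

	if (window_clamp != 0 && space > window_clamp)
		space = window_clamp;

	/* Halve the space until it fits in 16 bits or the shift limit is hit. */
	while (space > 65535 && wscale < 14) {
		space >>= 1;
		wscale++;
	}
	return wscale;
}
```

With a clamp of 65535, as in the ip route example above, the loop never runs and the scale factor is 0, which effectively disables window scaling for that route only.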
bridge-netfilter: don't overwrite memory outside of skb
The bridge netfilter code needs to check for space at the
front of the skb before overwriting; otherwise if skb from
device doesn't have headroom, then it will cause random
memory corruption.
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
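The headroom check described in the bridge-netfilter entry above can be sketched generically. The sketch_buf type and both helpers are hypothetical and much simpler than an skb; the point is only that the write in front of the payload happens after the space check, never before.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet buffer: 'data' points somewhere inside 'head'. */
struct sketch_buf {
	unsigned char head[64];
	unsigned char *data;
};

/* Bytes available between the start of the buffer and the payload. */
static size_t sketch_headroom(const struct sketch_buf *b)
{
	return (size_t)(b->data - b->head);
}

/* Copy 'len' bytes in front of the payload; fail rather than write out of bounds. */
static int sketch_push_header(struct sketch_buf *b, const void *hdr, size_t len)
{
	if (sketch_headroom(b) < len)
		return -1;
	b->data -= len;
	memcpy(b->data, hdr, len);
	return 0;
}
```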
* include/asm-ia64/mman.h
...
#ifdef __KERNEL__
#define arch_mmap_check ia64_map_check_rgn
int ia64_map_check_rgn(unsigned long addr, unsigned long len,
unsigned long flags);
#endif
...
Signed-off-by: Fernando Vazquez <fernando@intellilink.co.jp> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Tue, 22 Aug 2006 20:41:18 +0000 (13:41 -0700)]
Fix output fragmentation of paged-skbs
[INET]: Use pskb_trim_unique when trimming paged unique skbs
The IPv4/IPv6 datagram output path was using skb_trim to trim paged
packets because they know that the packet has not been cloned yet
(since the packet hasn't been given to anything else in the system).
This broke because skb_trim no longer allows paged packets to be
trimmed. Paged packets must be given to one of the pskb_trim functions
instead.
This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6
datagram output path scenario and replaces the corresponding skb_trim
calls with it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Rash [Tue, 22 Aug 2006 02:07:57 +0000 (04:07 +0200)]
TEXTSEARCH: Fix Boyer Moore initialization bug
[TEXTSEARCH]: Fix Boyer Moore initialization bug
The pattern is set after trying to compute the prefix table, which tries
to use it. Initialize it before calling compute_prefix_tbl, make
compute_prefix_tbl consistently use only the data from struct ts_bm
and remove the now unnecessary arguments.
Signed-off-by: Michael Rash <mbr@cipherdyne.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Danny Tholen [Fri, 18 Aug 2006 23:10:16 +0000 (16:10 -0700)]
1394: fix for recently added firewire patch that breaks things on ppc
Recently a patch was added for preliminary suspend/resume handling on
!PPC_PMAC. However, this broke both suspend and firewire on powerpc
because it saves the pci state after the device has already been disabled.
This moves the save state to before the pmac specific code.
Signed-off-by: Danny Tholen <obiwan@mailmij.org> Cc: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Collins <bcollins@ubuntu.com> Cc: Jody McIntyre <scjody@modernduck.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Mon, 21 Aug 2006 00:05:26 +0000 (10:05 +1000)]
MD: Fix a potential NULL dereference in md/raid1
At the point where this 'atomic_add' is, rdev could be NULL, as seen by
the fact that we test for this in the very next statement.
Further, it is really the wrong place for the add. We should add to the
count of corrected errors once we are sure it was corrected, not before
trying to correct it.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michal Miroslaw [Mon, 14 Aug 2006 06:24:20 +0000 (23:24 -0700)]
dm: BUG/OOPS fix
Fix BUG I tripped on while testing failover and multipathing.
BUG shows up on error path in multipath_ctr() when parse_priority_group()
fails after returning at least once without error. The fix is to
initialize m->ti early - just after alloc()ing it.
Alexey Kuznetsov [Fri, 18 Aug 2006 05:57:22 +0000 (22:57 -0700)]
Fix ipv4 routing locking bug
[IPV4]: severe locking bug in fib_semantics.c
Found in 2.4 by Yixin Pan <yxpan@hotmail.com>.
> When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock)
> is used in fib_release_info() instead of write_lock_bh(&fib_info_lock).
> Is the following case possible: a BH interrupts fib_release_info() while
> holding the write lock, and calls ip_check_fib_default() which calls
> read_lock(&fib_info_lock), and spin forever.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Daniel Ritz [Fri, 18 Aug 2006 14:50:40 +0000 (16:50 +0200)]
PCI: fix ICH6 quirks
- add the ICH6(R) LPC to the ICH6 ACPI quirks. currently only the ICH6-M is
handled. [ PCI_DEVICE_ID_INTEL_ICH6_1 is the ICH6-M LPC, ICH6_0 is the ICH6(R) ]
- remove the wrong quirk calling asus_hides_smbus_lpc() for ICH6. the register
modified in asus_hides_smbus_lpc() has a different meaning in ICH6.
Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kirill Korotaev [Mon, 14 Aug 2006 06:24:23 +0000 (23:24 -0700)]
sys_getppid oopses on debug kernel
sys_getppid() optimization can access freed memory. On kernels with
DEBUG_SLAB turned ON, this results in an Oops. As Dave Hansen noted, this
optimization is also unsafe for memory hotplug.
So this patch always takes the lock to be safe.
[oleg@tv-sign.ru: simplifications]
Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mark Huang [Sat, 12 Aug 2006 00:45:44 +0000 (02:45 +0200)]
ulog: fix panic on SMP kernels
[NETFILTER]: ulog: fix panic on SMP kernels
Fix kernel panic on various SMP machines. The culprit is a null
ub->skb in ulog_send(). If ulog_timer() has already been scheduled on
one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the
queue on another CPU by calling ulog_send() right before it exits,
there will be no skbuff when ulog_timer() acquires the lock and calls
ulog_send(). Cancelling the timer in ulog_send() doesn't help because
it has already been scheduled and is running on the first CPU.
Similar problem exists in ebt_ulog.c and nfnetlink_log.c.
Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Diego Calleja [Sat, 5 Aug 2006 19:14:55 +0000 (12:14 -0700)]
Fix BeFS slab corruption
In bugzilla #6941, Jens Kilian reported:
"The function befs_utf2nls (in fs/befs/linuxvfs.c) writes a 0 byte past the
end of a block of memory allocated via kmalloc(), leading to memory
corruption. This happens only for filenames which are pure ASCII and a
multiple of 4 bytes in length. [...]
Without DEBUG_SLAB, this leads to further corruption and hard lockups; I
believe this is the bug which has made kernels later than 2.6.8 unusable
for me. (This must be due to changes in memory management, the bug has
been in the BeFS driver since the time it was introduced (AFAICT).)
Steps to reproduce:
Create a directory (in BeOS, naturally :-) with files named, e.g.,
"1", "22", "333", "4444", ... Mount it in Linux and do an "ls" or "find""
This patch implements the suggested fix. Credits to Jens Kilian for
debugging the problem and finding the right fix.
Signed-off-by: Diego Calleja <diegocg@gmail.com> Cc: Jens Kilian <jjk@acm.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
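The class of bug in the BeFS entry above, a terminating 0 byte written one past the end of an allocation, can be sketched with a generic string copy. This is not the befs_utf2nls code; the helper name is hypothetical and malloc stands in for kmalloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy 'len' bytes of a name that is not NUL-terminated into a new C string.
 * The + 1 reserves room for the terminator; without it the final store
 * would land one byte past the allocation.
 */
static char *sketch_copy_name(const char *in, size_t len)
{
	char *out = malloc(len + 1);

	if (out == NULL)
		return NULL;
	memcpy(out, in, len);
	out[len] = '\0';
	return out;
}
```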
David Miller [Wed, 9 Aug 2006 09:33:28 +0000 (02:33 -0700)]
Fix IFLA_ADDRESS handling
[RTNETLINK]: Fix IFLA_ADDRESS handling.
The ->set_mac_address handlers expect a pointer to a
sockaddr which contains the MAC address, whereas
IFLA_ADDRESS provides just the MAC address itself.
So whip up a sockaddr to wrap around the netlink
attribute for the ->set_mac_address call.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
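Wrapping a bare hardware address in a sockaddr, as the IFLA_ADDRESS entry above describes, can be sketched in user space with the standard struct sockaddr. The helper name and its error convention are hypothetical, and the family value is whatever the caller passes in.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Wrap a raw hardware address in a struct sockaddr, as a handler that
 * expects a sockaddr would need. Returns -1 if the address does not fit.
 */
static int sketch_wrap_hwaddr(struct sockaddr *sa, unsigned short family,
			      const unsigned char *addr, size_t addr_len)
{
	if (addr_len > sizeof(sa->sa_data))
		return -1;
	memset(sa, 0, sizeof(*sa));
	sa->sa_family = family;
	memcpy(sa->sa_data, addr, addr_len);
	return 0;
}
```

Passing the raw attribute payload straight to such a handler would make it read the first two address bytes as the family field, which is the mismatch the entry fixes.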
Dmitry Mishin [Wed, 9 Aug 2006 09:36:33 +0000 (02:36 -0700)]
Fix timer race in dst GC code
[NET]: add_timer -> mod_timer() in dst_run_gc()
Patch from Dmitry Mishin <dim@openvz.org>:
Replace add_timer() by mod_timer() in dst_run_gc
in order to avoid BUG message.
CPU1                            CPU2
dst_run_gc() entered            dst_run_gc() entered
spin_lock(&dst_lock)            .....
del_timer(&dst_gc_timer)        fail to get lock
       ....                     mod_timer() <--- puts timer back
                                                 to the list
add_timer(&dst_gc_timer) <--- BUG because timer is in list already.
Found during OpenVZ internal testing.
At first we thought that it was OpenVZ specific as we
added dst_run_gc(0) call in dst_dev_event(),
but as Alexey pointed out to me it is possible to trigger
this condition in mainstream kernel.
For example, the timer has fired on CPU2, but the handler was preempted
by an irq before dst_lock is tried.
Meanwhile, someone on CPU1 adds an entry to gc list and
starts the timer.
If CPU2 was preempted long enough, this timer can expire
simultaneously with resuming timer handler on CPU1, arriving
exactly to the situation described.
Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
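The difference between the two timer calls in the dst GC entry above can be modelled with a toy timer. This is a single-threaded illustration with hypothetical names, not the kernel timer API; the misuse that would trip a BUG in the kernel is reported here as a return value of -1.

```c
#include <assert.h>

/* Toy timer: 'pending' says whether it is already queued. */
struct sketch_timer {
	int pending;
	unsigned long expires;
};

/* Add-style call: only legal on a timer that is not queued. */
static int sketch_add_timer(struct sketch_timer *t, unsigned long expires)
{
	if (t->pending)
		return -1;	/* the kernel would hit a BUG here */
	t->pending = 1;
	t->expires = expires;
	return 0;
}

/* Modify-style call: safe whether or not the timer is queued. */
static void sketch_mod_timer(struct sketch_timer *t, unsigned long expires)
{
	t->pending = 1;
	t->expires = expires;
}
```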
Kirill Korotaev [Wed, 9 Aug 2006 09:35:21 +0000 (02:35 -0700)]
Kill HASH_HIGHMEM from route cache hash sizing
[IPV4]: Limit rt cache size properly.
During OpenVZ stress testing we found that UDP traffic with random src
can cause excessive rt hash growth, eventually leading to OOM
and kernel panics.
It was found that for a 4GB i686 system (having 1048576 total pages and
225280 normal zone pages) the kernel allocates the following route hash:
syslog: IP route cache hash table entries: 262144 (order: 8, 1048576
bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is 4194304 * 256b = 1Gb of RAM > normal_zone
The attached patch removes the HASH_HIGHMEM flag from the
alloc_large_system_hash() call.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
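The sizes quoted in the route cache entry above can be checked with a few lines of arithmetic. The constants are the figures from the entry; the 4 KiB page size is an assumption (the usual i686 value), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the changelog entry. */
#define SK_HASH_BUCKETS  262144ULL
#define SK_MAX_ENTRIES   4194304ULL
#define SK_ENTRY_BYTES   256ULL
#define SK_NORMAL_PAGES  225280ULL
/* Assumption: 4 KiB pages. */
#define SK_PAGE_BYTES    4096ULL

/* Worst-case memory used by a full route cache. */
static uint64_t sketch_max_rt_bytes(void)
{
	return SK_MAX_ENTRIES * SK_ENTRY_BYTES;
}

/* Size of the normal zone that has to hold those entries. */
static uint64_t sketch_normal_zone_bytes(void)
{
	return SK_NORMAL_PAGES * SK_PAGE_BYTES;
}
```

The entry limit works out to 16 entries per hash bucket and 1 GiB in total, while the normal zone is 880 MiB, so a full cache cannot fit.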
On the 88E805X chipsets (used in laptops), the PHY was not getting powered
out of shutdown properly. The variable reg1 was getting reused incorrectly.
This is probably the cause of the bug.
http://bugzilla.kernel.org/show_bug.cgi?id=6471
Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
sctp_make_abort_user() now takes the msg_len along with the msg
so that we don't have to recalculate the bytes in iovec.
It also uses memcpy_fromiovec() so that we don't go beyond the
length allocated.
It is good to have this fix even if verify_iovec() is fixed to
return error on overflow.
Steven Rostedt [Thu, 3 Aug 2006 16:28:11 +0000 (12:28 -0400)]
Add stable branch to maintainers file
While helping someone to submit a patch to the stable branch, I noticed
that the stable branch is not listed in the MAINTAINERS file. This was
after I went there to look for the email addresses for the stable branch
list (stable@kernel.org).
This patch adds the stable branch to the maintainers file so that people
can find where to send patches when they have a fix for the stable team.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
futex_atomic_cmpxchg_inatomic has the same bug as the other
atomic futex operations: the operation needs to be done in the
user address space, not the kernel address space. Add the missing
sacf 256 & sacf 0.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently I am doing lots of refactoring work in the dvb tree. This
bugfix became necessary to fix 2.6.17 whilst I was in the middle of this
work. Unfortunately after I tested the original code for the patch, I
generated the diff against the wrong tree (I accidentally used a tree
with part of the refactoring code in it). This resulted in the reported
compile errors because that tree (a) was incomplete, and (b) used
features which are simply not in the mainline kernel yet.
Many apologies for the error and problems this has caused. :(
Signed-off-by: Andrew de Quincey <adq_dvb@lidskialf.net> Signed-off-by: Michael Krufky <mkrufky@linuxtv.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andrew Morton [Sat, 29 Jul 2006 02:52:09 +0000 (22:52 -0400)]
cond_resched() fix
Fix a bug identified by Zou Nan hai <nanhai.zou@intel.com>:
If the system is in state SYSTEM_BOOTING, and need_resched() is true,
cond_resched() returns true even though it didn't reschedule. Consequently
need_resched() remains true and JBD locks up.
Fix that by teaching cond_resched() to only return true if it really did call
schedule().
cond_resched_lock() and cond_resched_softirq() have a problem too. If we're
in SYSTEM_BOOTING state and need_resched() is true, these functions will drop
the lock and will then try to call schedule(), but the SYSTEM_BOOTING state
will prevent schedule() from being called. So on return, need_resched() will
still be true, but cond_resched_lock() has to return 1 to tell the caller that
the lock was dropped. The caller will probably lock up.
Bottom line: if these functions dropped the lock, they _must_ call schedule()
to clear need_resched(). Make it so.
Also, uninline __cond_resched(). It's largeish, and slowpath.
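The return-value rule in the cond_resched() entry above can be modelled with a toy scheduler. This is a single-threaded sketch with hypothetical names and covers only the plain cond_resched() case, not the lock-dropping variants.

```c
#include <assert.h>

/* Toy scheduler state, single-threaded and purely illustrative. */
static int sk_need_resched;
static int sk_booting;
static int sk_schedule_calls;

/* Stand-in for schedule(): running it clears the reschedule request. */
static void sk_schedule(void)
{
	sk_schedule_calls++;
	sk_need_resched = 0;
}

/* Returns 1 only if it actually rescheduled. */
static int sk_cond_resched(void)
{
	if (sk_need_resched && !sk_booting) {
		sk_schedule();
		return 1;
	}
	return 0;
}
```

A caller that loops while the function returns true makes progress here, because a true return always means the reschedule request was cleared.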
The Intel(R) PRO/1000 82572EI card is fully supported by 7.0.33-k2 and
onward. Add the device ID so this card works with 2.6.17.y onward. This
device ID was accidentally omitted.
Neil Brown [Sun, 30 Jul 2006 10:03:01 +0000 (03:03 -0700)]
ext3: avoid triggering ext3_error on bad NFS file handle
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.
So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.
[akpm@osdl.org: fix off-by-one error] Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jan Kara <jack@suse.cz> Cc: Marcel Holtmann <marcel@holtmann.org> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Eric Sandeen <esandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
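A range check of the kind the ext3 entry above describes might look like the sketch below. It is simplified and hypothetical: it assumes inode 2 is the root inode and 11 is the first ordinary inode, ignores other reserved inodes, and takes the inode count as a plain parameter instead of reading it from a superblock.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: root inode is 2, first ordinary inode is 11. */
#define SK_ROOT_INO  2u
#define SK_FIRST_INO 11u

/* Returns 1 if 'ino' is plausible for a filesystem with 'inodes_count' inodes. */
static int sk_valid_inum(uint32_t ino, uint32_t inodes_count)
{
	return ino == SK_ROOT_INO ||
	       (ino >= SK_FIRST_INO && ino <= inodes_count);
}
```

The upper bound is inclusive because inode numbers start at 1, so the last valid inode number equals the inode count.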
For files other than IFREG, nobh option doesn't make sense. Modifications
to them are journalled and need buffer heads to do that. Without this
patch, we get a kernel oops in page_buffers().
Neil Brown [Thu, 3 Aug 2006 00:20:12 +0000 (10:20 +1000)]
Fix race related problem when adding items to an svcrpc auth cache.
If we don't find the item we are lookng for, we allocate a new one,
and then grab the lock again and search to see if it has been added
while we did the alloc.
If it had been added we need to 'cache_put' the newly created item
that we are never going to use. But as it hasn't been initialised
properly, putting it can cause an oops.
So move the ->init call earlier so that it will always be fully
initialised if we have to put it.
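The ordering described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the struct, the function names, and the no-op lock placeholders are hypothetical and are not the sunrpc cache API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; the names are illustrative, not sunrpc's. */
struct item {
    int refcount;
    char *key;              /* owned by the item, freed on the final put */
    struct item *next;
};

static struct item *cache_head;

/* Placeholders for the real cache lock, which this sketch does not model. */
static void cache_lock(void) { }
static void cache_unlock(void) { }

/* Fully initialise a new item so that item_put() is always safe on it. */
static int item_init(struct item *it, const char *key)
{
    size_t len = strlen(key) + 1;

    it->key = malloc(len);
    if (!it->key)
        return -1;
    memcpy(it->key, key, len);
    it->refcount = 1;       /* the creator's reference */
    it->next = NULL;
    return 0;
}

static void item_put(struct item *it)
{
    if (--it->refcount == 0) {
        free(it->key);      /* would touch garbage if init had not run */
        free(it);
    }
}

/* Caller must hold the cache lock. Takes a reference on a hit. */
static struct item *find_locked(const char *key)
{
    struct item *it;

    for (it = cache_head; it; it = it->next) {
        if (strcmp(it->key, key) == 0) {
            it->refcount++;
            return it;
        }
    }
    return NULL;
}

static struct item *lookup_or_add(const char *key)
{
    struct item *it, *new;

    cache_lock();
    it = find_locked(key);
    cache_unlock();
    if (it)
        return it;

    new = malloc(sizeof(*new));
    if (!new)
        return NULL;
    if (item_init(new, key) != 0) {   /* initialise before re-taking the lock */
        free(new);
        return NULL;
    }

    cache_lock();
    it = find_locked(key);            /* did someone add it meanwhile? */
    if (!it) {
        new->refcount++;              /* reference held by the cache itself */
        new->next = cache_head;
        cache_head = new;
    }
    cache_unlock();

    if (it) {
        item_put(new);                /* safe: new is fully initialised */
        return it;
    }
    return new;
}
```

The point of the sketch is only the position of item_init(): because it runs before the second search, the item_put() on the losing path never sees a half-built item.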
Thanks to Philipp Matthias Hahn <pmhahn@svs.Informatik.Uni-Oldenburg.de>
for reporting the problem.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stefan Richter [Wed, 2 Aug 2006 17:40:06 +0000 (19:40 +0200)]
ieee1394: sbp2: enable auto spin-up for Maxtor disks
At least the Maxtor OneTouch III requires a "start stop unit" command after
auto spin-down before the next access can proceed. This patch activates
the responsible code in scsi_mod for all Maxtor SBP-2 disks.
https://bugzilla.novell.com/show_bug.cgi?id=183011
Maybe that should be done for all SBP-2 disks, but better be cautious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Fri, 28 Jul 2006 00:02:36 +0000 (17:02 -0700)]
Sparc64 quad-float emulation fix
[SPARC64]: Fix quad-float multiply emulation.
Something is wrong with the 3-multiply (vs. 4-multiply) optimized
version of _FP_MUL_MEAT_2_*(), so just use the slower version
which actually computes correct values.
Noticed by Rene Rebe
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stefan Rompf [Mon, 24 Jul 2006 20:54:15 +0000 (13:54 -0700)]
VLAN state handling fix
[VLAN]: Fix link state propagation
When the queue of the underlying device is stopped at initialization time
or the device is marked "not present", the state will be propagated to the
vlan device and never change. Based on an analysis by Patrick McHardy.
Signed-off-by: Stefan Rompf <stefan@loplof.de> ACKed-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Herbert Xu [Sun, 30 Jul 2006 22:50:37 +0000 (08:50 +1000)]
Update frag_list in pskb_trim
[NET]: Update frag_list in pskb_trim
When pskb_trim has to defer to ___pskb_trim to trim the frag_list part of
the packet, the frag_list is not updated to reflect the trimming. This
will usually work fine until you hit something that uses the packet length
or tail from the frag_list.
Examples include esp_output and ip_fragment.
Another problem caused by this is that you can end up with a linear packet
with a frag_list attached.
It is possible to get away with this if we audit everything to make sure
that they always consult skb->len before going down onto frag_list. In
fact we can do the same thing for the paged part as well to avoid copying
the data area of the skb. For now though, let's do the conservative fix
and update frag_list.
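As a rough model of what "update frag_list" means here, the following userspace sketch trims a packet made of a linear part plus a list of fragments. The structs and the function name are assumptions for illustration and are much simpler than struct sk_buff. The real code also has to deal with shared and cloned buffers, which this sketch ignores.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an skb and the buffers on its frag_list. */
struct frag {
    size_t len;
    struct frag *next;
};

struct packet {
    size_t len;             /* total: linear part plus all fragments */
    size_t linear_len;      /* bytes held in the linear data area */
    struct frag *frag_list; /* fragments, each allocated with malloc() */
};

/* Shrink the packet to new_len bytes and keep every fragment consistent. */
static void packet_trim(struct packet *p, size_t new_len)
{
    struct frag **fp = &p->frag_list;
    size_t offset;

    if (new_len >= p->len)
        return;                         /* nothing to trim */

    if (new_len < p->linear_len)
        p->linear_len = new_len;
    offset = p->linear_len;

    while (*fp) {
        struct frag *f = *fp;

        if (offset >= new_len) {        /* entirely past the new end */
            *fp = f->next;
            free(f);
            continue;
        }
        if (offset + f->len > new_len)  /* straddles the new end */
            f->len = new_len - offset;
        offset += f->len;
        fp = &f->next;
    }
    p->len = new_len;
}
```

After a trim that fits inside the linear part, frag_list is empty, so the "linear packet with a frag_list attached" state mentioned above cannot arise in this model.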
Many thanks to Marco Berizzi for helping me to track down this bug.
This 4-year-old bug took 3 months to track down. Marco was very patient
indeed :)
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Mon, 24 Jul 2006 16:06:55 +0000 (12:06 -0400)]
UHCI: Fix handling of short last packet
This patch (as753) fixes the way uhci-hcd handles a short packet when it
is the last packet of an URB. Right now the driver handles short packets
the same no matter when they occur. However, the controller stops
transferring packets when the short packet is not the last one (otherwise
it would be reading beyond the end of the device's data) and needs to be
restarted, whereas no such need occurs when the short packet is the last
one.
The result of the bug is that USB endpoint queues experience intermittent
hangs, a regression in 2.6.17 with respect to earlier kernels. The bug
was raised in Bugzilla #6752 and this patch fixed it.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mark M. Hoffman [Wed, 26 Jul 2006 19:53:13 +0000 (21:53 +0200)]
i2c: Fix 'ignore' module parameter handling in i2c-core
This patch fixes a bug in the handling of 'ignore' module parameters of I2C
client drivers.
Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jean Delvare [Wed, 26 Jul 2006 19:50:15 +0000 (21:50 +0200)]
scx200_acb: Fix the block transactions
The scx200_acb i2c bus driver pretends to support SMBus block
transactions, but in fact it implements the simpler I2C block
transactions. Additionally, it lacks sanity checks on the length
of the block transactions, which could lead to a buffer overrun.
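A length check of the sort that was missing might look like the sketch below. The function name and buffer layout are hypothetical; the limit of 32 bytes matches the kernel's I2C_SMBUS_BLOCK_MAX.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_MAX 32    /* same value as the kernel's I2C_SMBUS_BLOCK_MAX */

/*
 * Copy a block of len bytes into a fixed-size transfer buffer.
 * Returns 0 on success, -1 if len is zero or would overrun the buffer.
 */
static int copy_block(uint8_t dst[BLOCK_MAX], const uint8_t *src, size_t len)
{
    if (len == 0 || len > BLOCK_MAX)
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

Rejecting the request before any bus activity starts keeps a bad length from ever reaching the copy.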
This fixes an oops reported by Alexander Atanasov:
http://marc.theaimsgroup.com/?l=linux-kernel&m=114970382125094
Thanks to Ben Gardner for fixing my bugs :)
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This just turns off chmod() on the /proc/<pid>/ files, since there is no
good reason to allow it, and had we disallowed it originally, the nasty
/proc race exploit wouldn't have been possible.
The other patches already fixed the problem chmod() could cause, so this
is really just some final mop-up.
This particular version is based off a patch by Eugene and Marcel which
had much better naming than my original equivalent one.
Chuck Ebbert [Thu, 15 Jun 2006 08:41:52 +0000 (04:41 -0400)]
PCI: fix issues with extended conf space when MMCONFIG disabled because of e820
On 15 Jun 2006 03:45:10 +0200, Andi Kleen wrote:
> Anyways I would say that if the BIOS can't get MCFG right then
> it's likely not been validated on that board and shouldn't be used.
According to Petr Vandrovec:
... "What is important (and checked) is address of MMCONFIG reported by MCFG
table... Unfortunately code does not bother with printing that address :-(
"Another problem is that code has hardcoded that MMCONFIG area is 256MB large.
Unfortunately for the code PCI specification allows any power of two between 2MB
and 256MB if vendor knows that such amount of busses (from 2 to 128) will be
sufficient for system. With notebook it is quite possible that not full 8 bits
are implemented for MMCONFIG bus number."
So here is a patch. Unfortunately my system still fails the test because
it doesn't reserve any part of the MMCONFIG area, but this may fix others.
Booted on x86_64, only compiled on i386. x86_64 still remaps the max area
(256MB) even though only 2MB is checked... but 2.6.16 had no check at all
so it is still better.
PCI: reduce size of x86 MMCONFIG reserved area check
1. Print the address of the MMCONFIG area when the test for that area
being reserved fails.
2. Only check if the first 2MB is reserved, as that is the minimum.
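The arithmetic behind the 2MB minimum, and the shape of the reduced check, can be sketched as follows. This assumes the standard MMCONFIG layout of 4KB of configuration space per function, 8 functions per device, and 32 devices per bus, which gives 1MB per bus. The region struct and function names are hypothetical, and unlike a real e820 walk the sketch only accepts a range that sits inside a single reserved region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 32 devices * 8 functions * 4 KiB of config space = 1 MiB per bus. */
#define MMCONFIG_BYTES_PER_BUS (32u * 8u * 4096u)
/* Smallest window worth checking: two buses. */
#define MMCONFIG_MIN_CHECK     (2u * MMCONFIG_BYTES_PER_BUS)

/* Hypothetical firmware memory map entry; end is exclusive. */
struct mem_region {
    uint64_t start;
    uint64_t end;
    bool reserved;
};

static uint64_t mmconfig_size(unsigned int buses)
{
    return (uint64_t)buses * MMCONFIG_BYTES_PER_BUS;
}

/* True if [start, end) lies entirely inside one reserved region. */
static bool range_reserved(const struct mem_region *map, size_t n,
                           uint64_t start, uint64_t end)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (map[i].reserved && start >= map[i].start && end <= map[i].end)
            return true;
    }
    return false;
}

/* Reduced check: only the first 2 MiB of the window must be reserved. */
static bool mmconfig_usable(const struct mem_region *map, size_t n,
                            uint64_t base)
{
    return range_reserved(map, n, base, base + MMCONFIG_MIN_CHECK);
}
```

With a firmware map that reserves only 64MB at the MMCONFIG base, a check against the full 256MB window fails while the 2MB check passes, which is the situation the patch is meant to tolerate.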
Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Pavel Machek [Sat, 8 Jul 2006 15:37:31 +0000 (17:37 +0200)]
pdflush: handle resume wakeups
2.6.16 needs this. It was merged into 2.6.18-rc1 in
http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d616e09ab33aa4d013a93c9b393efd5cebf78521 .
pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that
it hasn't been given any work to do then this is considered an error.
That all broke when swsusp came along - because a timer-delivered
wakeup to a frozen pdflush thread will just get lost. This causes the
pdflush thread to get lost as well: the writeback timer is supposed to
be re-armed by pdflush in process context, but pdflush doesn't execute
the callout which does this.
Fix that up by ignoring the return value from try_to_freeze(): just
proceed, see if we have any work pending and only go back to sleep if
that is not the case.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Patrick McHardy [Sat, 8 Jul 2006 20:39:35 +0000 (13:39 -0700)]
Fix IPv4/DECnet routing rule dumping
When more rules are present than fit in a single skb, the remaining
rules are incorrectly skipped.
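The commit message does not spell out the fix, so the following is only a generic sketch of a resumable dump, with hypothetical names and plain integers standing in for rules. The caller keeps a cursor between calls, and each call continues from where the previous buffer filled up instead of skipping what did not fit.

```c
#include <stddef.h>

/*
 * Copy rules into out, starting at *cursor, until out is full or the
 * rules run out. *cursor is advanced so the next call resumes there.
 * Returns the number of entries written by this call.
 */
static size_t dump_rules(const int *rules, size_t n_rules, size_t *cursor,
                         int *out, size_t out_cap)
{
    size_t written = 0;

    while (*cursor < n_rules && written < out_cap) {
        out[written++] = rules[*cursor];
        (*cursor)++;
    }
    return written;
}
```

Calling this repeatedly with the same cursor until it returns 0 visits every rule exactly once, regardless of the buffer size.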
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When found, it is obvious. The nfds value calculated when allocating fdsets
is overwritten by the calculation of the fdtable size, and when we are
unlucky, we try to free fdsets of the wrong size.
Found due to OpenVZ resource management (User Beancounters).