[IPV6]: Fix kernel OOPs when setting sticky socket options.
Bug noticed by Remi Denis-Courmont <rdenis@simphalempin.com>.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Turned out to be a race-condition and NULL ptr deref, here's my fix:
Users of the idr code are supposed to call idr_pre_get without locking, so the
idr code must serialize itself with respect to layer allocations. However, it
fails to do so in an error path in idr_get_new_above_int(). I added the
missing locking to fix this.
Signed-off-by: Sonny Rao <sonny@burdell.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
static int unqueue_me(struct futex_q *q)
{
int ret = 0;
spinlock_t *lock_ptr;
/* In the common case we don't take the spinlock, which is nice. */
retry:
lock_ptr = q->lock_ptr;
if (lock_ptr != 0) {
spin_lock(lock_ptr);
/*
* q->lock_ptr can change between reading it and
* spin_lock(), causing us to take the wrong lock. This
* corrects the race condition.
[...]
and my compiler (gcc 4.1.0) makes the following out of it:
00000000000003c8 <unqueue_me>:
3c8: eb bf f0 70 00 24 stmg %r11,%r15,112(%r15)
3ce: c0 d0 00 00 00 00 larl %r13,3ce <unqueue_me+0x6>
3d0: R_390_PC32DBL .rodata+0x2a
3d4: a7 f1 1e 00 tml %r15,7680
3d8: a7 84 00 01 je 3da <unqueue_me+0x12>
3dc: b9 04 00 ef lgr %r14,%r15
3e0: a7 fb ff d0 aghi %r15,-48
3e4: b9 04 00 b2 lgr %r11,%r2
3e8: e3 e0 f0 98 00 24 stg %r14,152(%r15)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
/* write q->lock_ptr in r12 */
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: a7 84 00 4b je 48e <unqueue_me+0xc6>
/* if r12 is zero then jump over the code.... */
3fc: e3 20 b0 28 00 04 lg %r2,40(%r11)
/* write q->lock_ptr in r2 */
402: c0 e5 00 00 00 00 brasl %r14,402 <unqueue_me+0x3a>
404: R_390_PC32DBL _spin_lock+0x2
/* use r2 as parameter for spin_lock */
So the code becomes more or less:
if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)
Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
3ee: e3 c0 b0 28 00 04 lg %r12,40(%r11)
3f4: b9 02 00 cc ltgr %r12,%r12
3f8: b9 04 00 2c lgr %r2,%r12
3fc: a7 84 00 48 je 48c <unqueue_me+0xc4>
400: c0 e5 00 00 00 00 brasl %r14,400 <unqueue_me+0x38>
402: R_390_PC32DBL _spin_lock+0x2
As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q->lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q->lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.
Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com> Acked-by: Ingo Molnar <mingo@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
With the recent fix, the callers of sctp_primitive_ABORT()
need to create an ABORT chunk and pass it as an argument rather
than msghdr that was passed earlier.
Adrian Bunk:
Ported to 2.6.16.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Willy Tarreau [Thu, 31 Aug 2006 20:02:56 +0000 (22:02 +0200)]
ethtool: fix oops in ethtool_set_pauseparam()
The function pointers which were checked were for their get_* counterparts.
Typically a copy-paste typo.
Signed-off-by: Willy Tarreau <w@1wt.eu> Acked-by: Jeff Garzik <jeff@garzik.org> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Herbert Xu [Thu, 31 Aug 2006 19:59:19 +0000 (21:59 +0200)]
ETHTOOL: Fix UFO typo
The function ethtool_get_ufo was referring to ETHTOOL_GTSO instead of
ETHTOOL_GUFO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Neil Brown [Wed, 30 Aug 2006 15:58:44 +0000 (17:58 +0200)]
ext3: avoid triggering ext3_error on bad NFS file handle
The inode number out of an NFS file handle gets passed eventually to
ext3_get_inode_block() without any checking. If ext3_get_inode_block()
allows it to trigger an error, then bad filehandles can have unpleasant
effect - ext3_error() will usually cause a forced read-only remount, or a
panic if `errors=panic' was used.
So remove the call to ext3_error there and put a matching check in
ext3/namei.c where inode numbers are read off storage.
Andrew Morton fixed an off-by-one error.
Dann Frazier ported the patch to 2.6.16.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Alexey Dobriyan [Wed, 30 Aug 2006 15:31:15 +0000 (17:31 +0200)]
eicon: fix define conflict with ptrace
* MODE_MASK is unused in eicon driver.
* Conflicts with a ptrace stuff on arm.
drivers/isdn/hardware/eicon/divasync.h:259:1: warning: "MODE_MASK" redefined
include2/asm/ptrace.h:48:1: warning: this is the location of the previous definition
Hannes Reinecke [Wed, 30 Aug 2006 15:23:16 +0000 (17:23 +0200)]
aic79xx: use BIOS settings
This patch fixes the aic79xx driver to properly respond to BIOS
settings.
Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Mark Huang [Sat, 26 Aug 2006 15:35:49 +0000 (17:35 +0200)]
ulog: fix panic on SMP kernels
Fix kernel panic on various SMP machines. The culprit is a null
ub->skb in ulog_send(). If ulog_timer() has already been scheduled on
one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the
queue on another CPU by calling ulog_send() right before it exits,
there will be no skbuff when ulog_timer() acquires the lock and calls
ulog_send(). Cancelling the timer in ulog_send() doesn't help because
it has already been scheduled and is running on the first CPU.
Similar problem exists in ebt_ulog.c and nfnetlink_log.c.
Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Neil Brown [Sat, 26 Aug 2006 15:33:27 +0000 (17:33 +0200)]
Fix a potential NULL dereference in md/raid1
At the point where this 'atomic_add' is, rdev could be NULL,
as seen by the fact that we test for this in the very next
statement.
Further is it is really the wrong place of the add.
We could add to the count of corrected errors
once the are sure it was corrected, not before
trying to correct it.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Tsutomu Fujii [Sat, 26 Aug 2006 00:41:37 +0000 (02:41 +0200)]
SCTP: Send only 1 window update SACK per message.
Right now, every time we increase our rwnd by more then MTU bytes, we
trigger a SACK. When processing large messages, this will generate a
SACK for almost every other SCTP fragment. However since we are freeing
the entire message at the same time, we might as well collapse the SACK
generation to 1.
Signed-off-by: Tsutomu Fujii <t-fujii@nb.jp.nec.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Vlad Yasevich [Sat, 26 Aug 2006 00:41:12 +0000 (02:41 +0200)]
SCTP: Reset rtt_in_progress for the chunk when processing its sack.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Vlad Yasevich [Sat, 26 Aug 2006 00:40:29 +0000 (02:40 +0200)]
SCTP: Limit association max_retrans setting in setsockopt.
When using ASSOCINFO socket option, we need to limit the number of
maximum association retransmissions to be no greater than the sum
of all the path retransmissions. This is specified in Section 7.1.2
of the SCTP socket API draft.
However, we only do this if the association has multiple paths. If
there is only one path, the protocol stack will use the
assoc_max_retrans setting when trying to retransmit packets.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Neil Horman [Sat, 26 Aug 2006 00:39:48 +0000 (02:39 +0200)]
SCTP: Fix persistent slowdown in sctp when a gap ack consumes rx buffer.
In the event that our entire receive buffer is full with a series of
chunks that represent a single gap-ack, and then we accept a chunk
(or chunks) that fill in the gap between the ctsn and the first gap,
we renege chunks from the end of the buffer, which effectively does
nothing but move our gap to the end of our received tsn stream. This
does little but move our missing tsns down stream a little, and, if the
sender is sending sufficiently large retransmit frames, the result is a
perpetual slowdown which can never be recovered from, since the only
chunk that can be accepted to allow progress in the tsn stream necessitates
that a new gap be created to make room for it. This leads to a constant
need for retransmits, and subsequent receiver stalls. The fix I've come up
with is to deliver the frame without reneging if we have a full receive
buffer and the receiving sockets sk_receive_queue is empty(indicating that
the receive buffer is being blocked by a missing tsn).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Vlad Yasevich [Sat, 26 Aug 2006 00:39:03 +0000 (02:39 +0200)]
SCTP: Reject sctp packets with broadcast addresses.
Make SCTP handle broadcast properly
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Danny Tholen [Wed, 23 Aug 2006 17:28:41 +0000 (19:28 +0200)]
1394: fix for recently added firewire patch that breaks things on ppc
Recently a patch was added for preliminary suspend/resume handling on
!PPC_PMAC. However, this broke both suspend and firewire on powerpc
because it saves the pci state after the device has already been disabled.
This moves the save state to before the pmac specific code.
Signed-off-by: Danny Tholen <obiwan@mailmij.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
sctp_make_abort_user() now takes the msg_len along with the msg
so that we don't have to recalculate the bytes in iovec.
It also uses memcpy_fromiovec() so that we don't go beyond the
length allocated.
It is good to have this fix even if verify_iovec() is fixed to
return error on overflow.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Acked-by: David Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Al Boldi [Fri, 18 Aug 2006 19:30:53 +0000 (21:30 +0200)]
ide-io: increase timeout value to allow for slave wakeup
During an STR resume cycle, the ide master disk times-out when there is
also a slave present (especially CD). Increasing the timeout in ide-io
from 10,000 to 100,000 fixes this problem.
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
David S. Miller [Fri, 18 Aug 2006 19:28:03 +0000 (21:28 +0200)]
SPARC64: Fix quad-float multiply emulation.
Something is wrong with the 3-multiply (vs. 4-multiply) optimized
version of _FP_MUL_MEAT_2_*(), so just use the slower version
which actually computes correct values.
Noticed by Rene Rebe
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Bob Breuer [Fri, 18 Aug 2006 19:26:56 +0000 (21:26 +0200)]
SPARC32: Fix iommu_flush_iotlb end address
Fix the calculation of the end address when flushing iotlb entries to
ram. This bug has been a cause of esp dma errors, and it affects
HyperSPARC systems much worse than SuperSPARC systems.
Signed-off-by: Bob Breuer <breuerr@mc.net> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Adrian Bunk [Sat, 12 Aug 2006 16:59:17 +0000 (18:59 +0200)]
update the i386 defconfig
The i386 defconfig wasn't updated for ages.
Instead of running "make oldconfig" on the old defconfig and trying to
give reasonable answers at all new options, this patch replaces it with
the one I'm using in 2.6.16-rc1.
This way, it's a .config that is confirmed to work on at least one
computer in the world. ;-)
Stefan Richter [Fri, 11 Aug 2006 21:42:30 +0000 (23:42 +0200)]
ieee1394: sbp2: enable auto spin-up for Maxtor disks
At least Maxtor OneTouch III require a "start stop unit" command after
auto spin-down before the next access can proceed. This patch activates
the responsible code in scsi_mod for all Maxtor SBP-2 disks.
https://bugzilla.novell.com/show_bug.cgi?id=183011
Maybe that should be done for all SBP-2 disks, but better be cautious.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Robert Hancock [Fri, 11 Aug 2006 21:41:52 +0000 (23:41 +0200)]
Fix broken suspend/resume in ohci1394
I've been experimenting to track down the cause of suspend/resume problems
on my Compaq Presario X1050 laptop:
http://bugzilla.kernel.org/show_bug.cgi?id=6075
Essentially the ACPI Embedded Controller and keyboard controller would
get into a bizarre, confused state after resume.
I found that unloading the ohci1394 module before suspend and reloading it
after resume made the problem go away. Diffing the dmesg output from
resume, with and without the module loaded, I found that with the module
loaded I was missing these:
PM: Writing back config space on device 0000:02:00.0 at offset 1. (Was 2100080, writing 2100007)
PM: Writing back config space on device 0000:02:00.0 at offset 3. (Was 0, writing 8008)
PM: Writing back config space on device 0000:02:00.0 at offset 4. (Was 0, writing 90200000)
PM: Writing back config space on device 0000:02:00.0 at offset 5. (Was 1, writing 2401)
PM: Writing back config space on device 0000:02:00.0 at offset f. (Was 20000100, writing 2000010a)
The default PCI driver performs the pci_restore_state when no driver is
loaded for the device. When the ohci1394 driver is loaded, it is supposed
to do this, however it appears not to do so.
I created the patch below and tested it, and it appears to resolve the
suspend problems I was having with the module loaded. I only added in the
pci_save_state and pci_restore_state - however, though I know little of
this hardware, surely the driver should really be doing more than this when
suspending and resuming? Currently it does almost nothing, what if there
are commands in progress, etc?
Signed-off-by: Robert Hancock <hancockr@shaw.ca> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jens Axboe [Fri, 11 Aug 2006 20:43:42 +0000 (22:43 +0200)]
fix debugfs inode leak
Looking at the reiser4 crash, I found a leak in debugfs. In
debugfs_mknod(), we create the inode before checking if the dentry
already has one attached. We don't free it if that is the case.
Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jens Axboe [Fri, 11 Aug 2006 20:29:11 +0000 (22:29 +0200)]
Fix missing ret assignment in __bio_map_user() error path
If get_user_pages() returns less pages than what we asked for, we
jump to out_unmap which will return ERR_PTR(ret). But ret can contain
a positive number just smaller than local_nr_pages, so be sure to set
it to -EFAULT always.
Problem found and diagnosed by Damien Le Moal <damien@sdl.hitachi.co.jp>
Signed-off-by: Jens Axboe <axboe@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Yasunori Goto [Tue, 8 Aug 2006 15:35:33 +0000 (17:35 +0200)]
memory hotplug: solve config broken: undefined reference to `online_page'
Memory hotplug code of i386 adds memory to only highmem. So, if
CONFIG_HIGHMEM is not set, CONFIG_MEMORY_HOTPLUG shouldn't be set.
Otherwise, it causes compile error.
In addition, many architecture can't use memory hotplug feature yet. So, I
introduce CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG.
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Pavel Machek [Tue, 8 Aug 2006 09:58:01 +0000 (11:58 +0200)]
pdflush: handle resume wakeups
2.6.16 needs this. It was merged into 2.6.18-rc1.
pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that
it hasn't been given any work to do then this is considered an error.
That all broke when swsusp came along - because a timer-delivered
wakeup to a frozen pdflush thread will just get lost. This causes the
pdflush thread to get lost as well: the writeback timer is supposed to
be re-armed by pdflush in process context, but pdflush doesn't execute
the callout which does this.
Fix that up by ignoring the return value from try_to_freeze(): jsut
proceed, see if we have any work pending and only go back to sleep if
that is not the case.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Andi Kleen [Mon, 7 Aug 2006 19:26:18 +0000 (21:26 +0200)]
BLOCK: Fix bounce limit address check
This fixes some OOMs on 64bit systems with <4GB of RAM when accessing
the cdrom.
Do a safer check for when to enable DMA. Currently we enable ISA DMA
for cases that do not need it, resulting in OOM conditions when ZONE_DMA
runs out of space.
IB/mthca: restore missing PCI registers after reset
mthca does not restore the following PCI-X/PCI Express registers after reset:
PCI-X device: PCI-X command register
PCI-X bridge: upstream and downstream split transaction registers
PCI Express : PCI Express device control and link control registers
This causes instability and/or bad performance on systems where one of
these registers is set to a non-default value by BIOS.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Adrian Bunk [Thu, 3 Aug 2006 19:58:11 +0000 (21:58 +0200)]
fix the SND_FM801_TEA575X dependencies
CONFIG_SND_FM801=y, CONFIG_SND_FM801_TEA575X=m resulted in the following
compile error:
<-- snip -->
...
LD vmlinux
sound/built-in.o: In function 'snd_fm801_free':
fm801.c:(.text+0x3c15b): undefined reference to 'snd_tea575x_exit'
sound/built-in.o: In function 'snd_card_fm801_probe':
fm801.c:(.text+0x3cfde): undefined reference to 'snd_tea575x_init'
make: *** [vmlinux] Error 1
Ian Abbott [Mon, 26 Jun 2006 11:59:17 +0000 (12:59 +0100)]
[PATCH] USB serial ftdi_sio: Prevent userspace DoS (CVE-2006-2936)
This patch limits the amount of outstanding 'write' data that can be
queued up for the ftdi_sio driver, to prevent userspace DoS attacks (or
simple accidents) that use up all the system memory by writing lots of
data to the serial port.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] IPV6 ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
We need to update hiscore.rule even if we don't enable CONFIG_IPV6_PRIVACY,
because we have more less significant rule; longest match.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Two additional labels (RFC 3484, sec. 10.3) for IPv6 addreses
are defined to make a distinction between global unicast
addresses and Unique Local Addresses (fc00::/7, RFC 4193) and
Teredo (2001::/32, RFC 4380). It is necessary to avoid attempts
of connection that would either fail (eg. fec0:: to 2001:feed::)
or be sub-optimal (2001:0:: to 2001:feed::).
Signed-off-by: \e$,1 aukasz Stelmach <stlman@poczta.fm> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We have a bad interaction with both the kernel and user space being able
to change some of the /proc file status. This fixes the most obvious
part of it, but I expect we'll also make it harder for users to modify
even their "own" files in /proc.
fix prctl privilege escalation and suid_dumpable (CVE-2006-2451)
Based on a patch from Ernie Petrides
During security research, Red Hat discovered a behavioral flaw in core
dump handling. A local user could create a program that would cause a
core file to be dumped into a directory they would not normally have
permissions to write to. This could lead to a denial of service (disk
consumption), or allow the local user to gain root privileges.
Patrick McHardy [Fri, 30 Jun 2006 03:33:12 +0000 (05:33 +0200)]
[PATCH] NETFILTER: SCTP conntrack: fix crash triggered by packet without chunks [CVE-2006-2934]
When a packet without any chunks is received, the newconntrack variable
in sctp_packet contains an out of bounds value that is used to look up an
pointer from the array of timeouts, which is then dereferenced, resulting
in a crash. Make sure at least a single chunk is present.
Problem noticed by George A. Theall <theall@tenablesecurity.com>
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
[PATCH] NTFS: Critical bug fix (affects MIPS and possibly others)
It fixes a crash in NTFS on architectures where flush_dcache_page()
is a real function. I never noticed this as all my testing is done on
i386 where flush_dcache_page() is NULL.
http://bugzilla.kernel.org/show_bug.cgi?id=6700
Many thanks to Pauline Ng for the detailed bug report and analysis!
Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
James Bottomley [Thu, 8 Jun 2006 04:03:28 +0000 (00:03 -0400)]
[PATCH] scsi_lib.c: properly count the number of pages in scsi_req_map_sg()
The calculation of nr_pages in scsi_req_map_sg() doesn't account for
the fact that the first page could have an offset that pushes the end
of the buffer onto a new page.
Signed-off-by: Bryan Holty <lgeek@frontiernet.net> Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Kleikamp [Wed, 7 Jun 2006 02:54:44 +0000 (22:54 -0400)]
[PATCH] JFS: Fix multiple errors in metapage_releasepage
It looks like metapage_releasepage was making in invalid assumption that
the releasepage method would not be called on a dirty page. Instead of
issuing a warning and releasing the metapage, it should return 0, indicating
that the private data for the page cannot be released.
I also realized that metapage_releasepage had the return code all wrong. If
it is successful in releasing the private data, it should return 1, otherwise
it needs to return 0.
Lastly, there is no need to call wait_on_page_writeback, since
try_to_release_page will not call us with a page in writback state.
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Trond Myklebust [Tue, 6 Jun 2006 15:19:35 +0000 (11:19 -0400)]
[PATCH] fs/namei.c: Call to file_permission() under a spinlock in do_lookup_path()
We're presently running lock_kernel() under fs_lock via nfs's ->permission
handler. That's a ranking bug and sometimes a sleep-in-spinlock bug. This
problem was introduced in the openat() patchset.
We should not need to hold the current->fs->lock for a codepath that doesn't
use current->fs.
Robin H. Johnson [Tue, 13 Jun 2006 17:06:11 +0000 (18:06 +0100)]
[PATCH] tmpfs: time granularity fix for [acm]time going backwards
I noticed a strange behavior in a tmpfs file system the other day, while
building packages - occasionally, and seemingly at random, make decided to
rebuild a target. However, only on tmpfs.
A file would be created, and if checked, it had a sub-second timestamp.
However, after an utimes related call where sub-seconds should be set, they
were zeroed instead. In the case that a file was created, and utimes(...,NULL)
was used on it in the same second, the timestamp on the file moved backwards.
After some digging, I found that this was being caused by tmpfs not having a
time granularity set, thus inheriting the default 1 second granularity.
Hugh adds: yes, we missed tmpfs when the s_time_gran mods went into 2.6.11.
Unfortunately, the granularity of CURRENT_TIME, often used in filesystems,
does not match the default granularity set by alloc_super. A few more such
discrepancies have been found, but this is the most important to fix now.
Signed-off-by: Robin H. Johnson <robbat2@gentoo.org> Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oleg Drokin [Sat, 25 Mar 2006 11:06:54 +0000 (03:06 -0800)]
[PATCH] Missed error checking for intent's filp in open_namei().
It seems there is error check missing in open_namei for errors returned
through intent.open.file (from lookup_instantiate_filp).
If there is plain open performed, then such a check done inside
__path_lookup_intent_open called from path_lookup_open(), but when the open
is performed with O_CREAT flag set, then __path_lookup_intent_open is only
called with LOOKUP_PARENT set where no file opening can occur yet.
Later on lookup_hash is called where exact opening might take place and
intent.open.file may be filled. If it is filled with error value of some
sort, then we get kernel attempting to dereference this error value as
address (and corresponding oops) in nameidata_to_filp() called from
filp_open().
While this is relatively simple to workaround in ->lookup() method by just
checking lookup_instantiate_filp() return value and returning error as
needed, this is not so easy in ->d_revalidate(), where we can only return
"yes, dentry is valid" or "no, dentry is invalid, perform full lookup
again", and just returning 0 on error would cause extra lookup (with
potential extra costly RPCs).
So in short, I believe that there should be no difference in error handling
for opening a file and creating a file in open_namei() and propose this
simple patch as a solution.
David Miller [Mon, 5 Jun 2006 18:27:10 +0000 (11:27 -0700)]
[PATCH] SPARC64: Fix missing fold at end of checksums.
Both csum_partial() and the csum_partial_copy*() family of routines
forget to do a final fold on the computed checksum value on sparc64.
So do the standard Sparc "add + set condition codes, add carry"
sequence, then make sure the high 32-bits of the return value are
clear.
Based upon some excellent detective work and debugging done by
Richard Braun and Samuel Thibault.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Mon, 5 Jun 2006 03:41:00 +0000 (20:41 -0700)]
[PATCH] SPARC64: Respect gfp_t argument to dma_alloc_coherent().
Using asm-generic/dma-mapping.h does not work because pushing
the call down to pci_alloc_coherent() causes the gfp_t argument
of dma_alloc_coherent() to be ignored.
Fix this by implementing things directly, and adding a gfp_t
argument we can use in the internal call down to the PCI DMA
implementation of pci_alloc_coherent().
This fixes massive memory corruption when using the sound driver
layer, which passes things like __GFP_COMP down into these
routines and (correctly) expects that to work.
This is a disk eater when sound is used, so it's pretty critical.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Miller [Sat, 3 Jun 2006 01:30:58 +0000 (18:30 -0700)]
[PATCH] SPARC64: Fix D-cache corruption in mremap
If we move a mapping from one virtual address to another,
and this changes the virtual color of the mapping to those
pages, we can see corrupt data due to D-cache aliasing.
Check for and deal with this by overriding the move_pte()
macro. Set things up so that other platforms can cleanly
override the move_pte() macro too.
This long standing bug corrupts user memory, and in particular
has been notorious for corrupting Debian package database
files on sparc64 boxes.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Oleg Nesterov [Thu, 15 Jun 2006 16:11:43 +0000 (20:11 +0400)]
[PATCH] run_posix_cpu_timers: remove a bogus BUG_ON() (CVE-2006-2445)
do_exit() clears ->it_##clock##_expires, but nothing prevents
another cpu to attach the timer to exiting process after that.
arm_timer() tries to protect against this race, but the check
is racy.
After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and
before do_exit() calls 'schedule() local timer interrupt can find
tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu
does sys_wait4) interrupted task has ->signal == NULL.
At this moment exiting task has no pending cpu timers, they were
cleanuped in __exit_signal()->posix_cpu_timers_exit{,_group}(),
so we can just return from irq.
Oleg Nesterov [Thu, 15 Jun 2006 16:11:15 +0000 (20:11 +0400)]
[PATCH] check_process_timers: fix possible lockup
If the local timer interrupt happens just after do_exit() sets PF_EXITING
(and before it clears ->it_xxx_expires) run_posix_cpu_timers() will call
check_process_timers() with tasklist_lock + ->siglock held and
check_process_timers:
t = tsk;
do {
....
do {
t = next_thread(t);
} while (unlikely(t->flags & PF_EXITING));
} while (t != tsk);
the outer loop will never stop.
Actually, the window is bigger. Another process can attach the timer
after ->it_xxx_expires was cleared (see the next commit) and the 'if
(PF_EXITING)' check in arm_timer() is racy (see the one after that).
Paul Mackerras [Fri, 9 Jun 2006 03:02:59 +0000 (13:02 +1000)]
[PATCH] powerpc: Fix machine check problem on 32-bit kernels (CVE-2006-2448)
This fixes a bug found by Dave Jones that means that it is possible
for userspace to provoke a machine check on 32-bit kernels. This
also fixes a couple of other places where I found similar problems
by inspection.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Stefan Richter [Sat, 3 Jun 2006 00:00:33 +0000 (02:00 +0200)]
[PATCH] sbp2: fix check of return value of hpsb_allocate_and_register_addrspace
I added a failure check in patch "sbp2: variable status FIFO address
(fix login timeout)" --- alas for a wrong error value. This is a bug
since Linux 2.6.16. Leads to NULL pointer dereference if the call
failed, and bogus failure handling if call succeeded.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Stefan Richter [Fri, 2 Jun 2006 17:34:30 +0000 (19:34 +0200)]
[PATCH] sbp2: backport read_capacity workaround for iPod
There is a firmware bug in several Apple iPods which prevents access to
these iPods under certain conditions. The disk size reported by the iPod
is one sector too big. Once access to the end of the disk is attempted,
the iPod becomes inaccessible. This problem has been known for USB iPods
for some time and has recently been discovered to exist with
FireWire/USB combo iPods too.
This patch is derived from the fix in Linux 2.6.17, commit e9a1c52c7b19d10342226c12f170d7ab644427e2, to be applicable to 2.6.16.x
without prerequisite patches. It hard-wires a workaround for three known
affected model numbers (those of 4th generation iPod, iPod Photo, iPod
mini).
Note: This patch lacks Linux 2.6.17's ability to enable and disable the
workaround via a module parameter.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
o Start booting into the capture kernel after an Oops if system is in a
unrecoverable state. System will boot into the capture kernel, if one is
pre-loaded by the user, and capture the kernel core dump.
o One of the following conditions should be true to trigger the booting of
capture kernel.
- panic_on_oops is set.
- pid of current thread is 0
- pid of current thread is 1
- Oops happened inside interrupt context.
Zhu Yi [Wed, 1 Mar 2006 21:55:51 +0000 (05:55 +0800)]
[PATCH] ipw2200: Filter unsupported channels out in ad-hoc mode
Currently iwlist ethX freq[uency]/channel lists all the channels the card
supported for the current region, which includes some channels can only
be used in infrastructure mode. This patch filters these channels out if
the card is currently in ad-hoc mode.
Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Mark Lord [Sun, 28 May 2006 15:28:00 +0000 (11:28 -0400)]
[PATCH] the latest consensus libata resume fix
Okay, just to sum things up.
This forces libata to wait for up to 2 seconds for BUSY|DRQ to clear
on resume before continuing.
[jgarzik adds...] During testing we never saw DRQ asserted, but
nonetheless (a) this works and (b) testing for DRQ won't hurt.
Signed-off-by: Mark Lord <liml@rtr.ca> Acked-by: Jens Axboe <axboe@suse.de> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stefan Richter [Sat, 27 May 2006 12:11:18 +0000 (14:11 +0200)]
[PATCH] ohci1394, sbp2: fix "scsi_add_device failed" with PL-3507 based devices
Re-enable posted writes for status FIFO.
Besides bringing back a very minor bandwidth tweak from Linux 2.6.15.x
and older, this also fixes an interoperability regression since 2.6.16:
http://bugzilla.kernel.org/show_bug.cgi?id=6356
(sbp2: scsi_add_device failed. IEEE1394 HD is not working anymore.)
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Tested-by: Vanei Heidemann <linux@javanei.com.br> Tested-by: Martin Putzlocher <mputzi@gmx.de> (chip type unconfirmed) Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Johannes Berg [Fri, 26 May 2006 01:44:24 +0000 (18:44 -0700)]
[PATCH] PowerMac: force only suspend-to-disk to be valid
For a very long time, echoing 'standby' or 'mem' into /sys/power/state has
killed the machine on powerpc. This patch fixes that.
This patch adds the .valid callback to pm_ops on PowerMac so that only the
suspend to disk state can be entered. Note that just returning 0 would
suffice since the upper layers don't pass PM_SUSPEND_DISK down, but we
handle it there regardless just in case that changes.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Jackson [Tue, 23 May 2006 00:56:07 +0000 (17:56 -0700)]
[PATCH] Cpuset: might sleep checking zones allowed fix
Fix an infrequently encountered 'sleeping function called
from invalid context' in the cpuset hooks in __alloc_pages.
Could sleep while interrupts disabled.
The routine cpuset_zone_allowed() is called by code in
mm/page_alloc.c __alloc_pages() to determine if a zone is
allowed in the current tasks cpuset. This routine can sleep,
for certain GFP_KERNEL allocations, if the zone is on a memory
node not allowed in the current cpuset, but might be allowed
in a parent cpuset.
But we can't sleep in __alloc_pages() if in interrupt, nor
if called for a GFP_ATOMIC request (__GFP_WAIT not set in
gfp_flags).
The rule was intended to be:
Don't call cpuset_zone_allowed() if you can't sleep, unless you
pass in the __GFP_HARDWALL flag set in gfp_flag, which disables
the code that might scan up ancestor cpusets and sleep.
This rule was being violated due to a bogus change made (by myself,
pj) to __alloc_pages() as part of the November 2005 effort to
cleanup its logic.
The bogus change can be seen at:
http://linux.derkeiler.com/Mailing-Lists/Kernel/2005-11/4691.html
[PATCH 01/05] mm fix __alloc_pages cpuset ALLOC_* flags
This was first noticed on a tight memory system, in code that
was disabling interrupts and doing allocation requests with
__GFP_WAIT not set, which resulted in __might_sleep() writing
complaints to the log "Debug: sleeping function called ...",
when the code in cpuset_zone_allowed() tried to take the
callback_sem cpuset semaphore.
Special thanks to Dave Chinner, for figuring this out,
and a tip of the hat to Nick Piggin who warned me of this
back in Nov 2005, before I was ready to listen.
Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Pat Gefre [Mon, 1 May 2006 19:16:08 +0000 (12:16 -0700)]
[PATCH] Altix: correct ioc3 port order
Currently loading the ioc3 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Patrick Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Brent Casavant [Thu, 4 May 2006 02:55:10 +0000 (19:55 -0700)]
[PATCH] Altix: correct ioc4 port order
Currently loading the ioc4 as a module will cause the ports to be numbered
in reverse order. This mod maintains the proper order of cards for port
numbering.
Signed-off-by: Brent Casavant <bcasavan@sgi.com> Cc: Pat Gefre <pfg@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>