Herbert Xu [Wed, 14 Feb 2007 08:39:09 +0000 (09:39 +0100)]
[NETFILTER]: Clear GSO bits for TCP reset packet
The TCP reset packet is copied from the original. This
includes all the GSO bits which do not apply to the new
packet. So we should clear those bits.
Spotted by Patrick McHardy.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
I encountered a kernel panic with my test program, which is a very
simple IPv6 client-server program.
The server side sets IPV6_RECVPKTINFO on a listening socket, and the
client side just sends a message to the server. Then the kernel panic
occurs on the server. (If you need the test program, please let me
know. I can provide it.)
This problem happens because a skb is forcibly freed in
tcp_rcv_state_process().
When a socket in listening state (TCP_LISTEN) receives a SYN packet,
then tcp_v6_conn_request() will be called from
tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully
returns, the skb would be discarded by __kfree_skb().
However, in the case of a listening socket on which IPV6_RECVPKTINFO was
already set, the address of the skb will be stored in
treq->pktopts and the ref count of the skb will be incremented in
tcp_v6_conn_request(). But even though the skb is still in use, the skb
will be freed. Then someone still using the freed skb will cause the
kernel panic.
I suggest using kfree_skb() instead of __kfree_skb().
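The difference between the two calls can be modelled in user space. This is only a sketch, not kernel code: `struct buf`, `buf_get()`, `buf_put()` and `buf_force_free()` are hypothetical stand-ins for an skb with a use count, for taking a reference, for kfree_skb() and for __kfree_skb() respectively.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an skb: a use count plus a "freed" flag. */
struct buf {
    int users;
    bool freed;
};

/* Take an extra reference, as happens when the skb is stored in treq->pktopts. */
static void buf_get(struct buf *b)
{
    b->users++;
}

/* Models kfree_skb(): drop one reference, free only when the last one is gone. */
static void buf_put(struct buf *b)
{
    if (--b->users == 0)
        b->freed = true;
}

/* Models __kfree_skb(): free unconditionally, ignoring other holders. */
static void buf_force_free(struct buf *b)
{
    b->freed = true;
}
```

With one extra reference taken, buf_put() leaves the buffer alive for the remaining holder, while buf_force_free() marks it freed even though users is still nonzero.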
Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Baruch Even [Wed, 14 Feb 2007 08:29:14 +0000 (09:29 +0100)]
TCP: Fix sorting of SACK blocks.
The sorting of SACK blocks actually munges them rather than sorting them,
causing the TCP stack to ignore some SACK information and breaking the
assumption of ordered SACK blocks after sorting.
The sort takes the data from a second buffer which isn't moved, causing
subsequent data moves to occur from the wrong location. The fix is to
use a temporary buffer as a normal sort does.
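A minimal sketch of a sort in which every swap goes through a temporary copy, so both elements always come from the array being sorted. The struct layout, the ascending order and the plain integer comparison are assumptions made for illustration; the kernel compares sequence numbers with wraparound-aware helpers.

```c
#include <stdint.h>

/* Simplified SACK block: a range of sequence numbers. */
struct sack_block {
    uint32_t start_seq;
    uint32_t end_seq;
};

/* Bubble sort by start_seq; each swap uses a temporary copy of one block. */
static void sort_sack_blocks(struct sack_block *sp, int n)
{
    for (int i = n - 1; i > 0; i--) {
        for (int j = 0; j < i; j++) {
            if (sp[j].start_seq > sp[j + 1].start_seq) {
                struct sack_block tmp = sp[j];

                sp[j] = sp[j + 1];
                sp[j + 1] = tmp;
            }
        }
    }
}
```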
Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>
DECNET: Handle a failure in neigh_parms_alloc (take 2)
While enhancing the neighbour code to handle multiple network
namespaces I noticed that decnet is assuming neigh_parms_alloc
will always succeed, which is clearly wrong. So handle the
failure.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Hugh Dickins [Tue, 13 Feb 2007 12:10:20 +0000 (13:10 +0100)]
fix umask when noACL kernel meets extN tuned for ACLs
Fix insecure default behaviour reported by Tigran Aivazian: if an
ext2 or ext3 filesystem is tuned to mount with "acl", but mounted by
a kernel built without ACL support, then umask was ignored when creating
inodes - though root or user has umask 022, touch creates files as 0666,
and mkdir creates directories as 0777.
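The modes one would expect instead follow from masking the requested mode with the umask. A small sketch (the helper name `apply_umask` is made up for illustration):

```c
/* Mode actually given to a new inode once the umask is applied. */
static unsigned int apply_umask(unsigned int requested, unsigned int mask)
{
    return requested & ~mask;
}
```

With umask 022, a request for 0666 becomes 0644 and a request for 0777 becomes 0755.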
This appears to have worked right until 2.6.11, when a fix to the default
mode on symlinks (always 0777) assumed VFS applies umask: which it does,
unless the mount is marked for ACLs; but ext[23] set MS_POSIXACL in
s_flags according to s_mount_opt set according to def_mount_opts.
We could revert to the 2.6.10 ext[23]_init_acl (adding an S_ISLNK test);
but other filesystems only set MS_POSIXACL when ACLs are configured. We
could fix this at another level; but it seems most robust to avoid setting
the s_mount_opt flag in the first place (at the expense of more ifdefs).
Likewise don't set the XATTR_USER flag when built without XATTR support.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
reiserfs: avoid tail packing if an inode was ever mmapped
This patch fixes a confusion reiserfs has had for a long time.
On the release file operation, reiserfs used to try to pack file data stored in
the last incomplete page of some files into metadata blocks. After packing, the
page got cleared with clear_page_dirty. It did not take into account that
the page may be mmapped into another process's address space. The recent
replacement for clear_page_dirty, cancel_dirty_page, exposed the confusion with
its sanity check that the page must not be mapped.
The patch fixes the confusion by making reiserfs avoid tail packing if an
inode was ever mmapped. reiserfs_mmap and reiserfs_file_release are
serialized with a mutex in the reiserfs-specific inode. reiserfs_mmap locks the
mutex and sets a bit in the reiserfs-specific inode flags.
reiserfs_file_release checks the bit with the mutex held. If the bit is
set, tail packing is avoided. This eliminates the possibility that an mmapped
page gets cancel_dirty_page-ed.
Signed-off-by: Vladimir Saveliev <vs@namesys.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Neil Brown [Tue, 30 Jan 2007 23:53:52 +0000 (00:53 +0100)]
Make 'repair' actually work for raid1.
When 'repair' finds a block that is different on the various
parts of the mirror, it is meant to write a chosen good version
to the others. However it currently writes out the original data
to each. The memcpy to make all the data the same is missing.
Also correct a test so that 'repair' causes a repair, rather than
anything other than 'repair'.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
James Bottomley [Sat, 27 Jan 2007 23:54:39 +0000 (00:54 +0100)]
[SCSI] arcmsr: fix up sysfs values
The sysfs files in arcmsr are non-standard in that they aren't simple
filename value pairs; the values actually contain preceding text which
would have to be parsed. The idea of sysfs files is that the file name
is the description and the contents is a simple value.
Fix up arcmsr to conform to this standard.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Andrew Morton [Sat, 27 Jan 2007 23:53:31 +0000 (00:53 +0100)]
[SCSI] areca sysfs fix
Remove sysfs_remove_bin_file() return-value checking from the areca driver.
There's nothing a driver can do if sysfs file removal fails, so we'll soon be
changing sysfs_remove_bin_file() to internally print a diagnostic and to
return void.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Marcel Holtmann [Thu, 25 Jan 2007 19:54:35 +0000 (20:54 +0100)]
[Bluetooth] Fix deadlock in the L2CAP layer
The Bluetooth L2CAP layer has 2 locks that are used in softirq context,
(one spinlock and one rwlock, where the softirq usage is readlock) but
where not all usages of the lock were _bh safe. The patch below corrects
this.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Marcel Holtmann [Thu, 25 Jan 2007 19:34:48 +0000 (20:34 +0100)]
[Bluetooth] Fix compat ioctl for BNEP, CMTP and HIDP
There exists no attempt to deal with the fact that a structure with
a uint32_t followed by a pointer is going to be different for 32-bit
and 64-bit userspace. Any 32-bit process trying to use it will be
failing with -EFAULT if it's lucky; suffering from having data dumped
at a random address if it's not.
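The layout difference can be sketched with fixed-width types. The struct and function names below are hypothetical and not taken from the Bluetooth headers; the point is only that the 32-bit view places a 4-byte pointer directly after the uint32_t, so it has to be translated field by field.

```c
#include <stddef.h>
#include <stdint.h>

/* Native layout: pointer size and alignment follow the kernel's ABI. */
struct conn_list_req {
    uint32_t cnum;
    void *ci;
};

/* Layout as a 32-bit process sees it: the pointer is a 32-bit value. */
struct compat_conn_list_req {
    uint32_t cnum;
    uint32_t ci;
};

/* Translate the 32-bit layout into the native one, widening the pointer. */
static struct conn_list_req from_compat(struct compat_conn_list_req c)
{
    struct conn_list_req r;

    r.cnum = c.cnum;
    r.ci = (void *)(uintptr_t)c.ci;
    return r;
}
```

On a typical 64-bit ABI the native struct is 16 bytes with the pointer at offset 8, while the compat struct is 8 bytes with the pointer at offset 4, which is why reading one as the other picks up the wrong address.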
Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Marcel Holtmann [Thu, 25 Jan 2007 18:40:43 +0000 (19:40 +0100)]
[Bluetooth] Fix uninitialized return value for RFCOMM sendmsg()
When calling send() with a zero length parameter on a RFCOMM socket
it returns a positive value. In this rare case the variable err is
used uninitialized and unfortunately its value is returned.
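The pattern can be sketched as follows. This is not the RFCOMM code: `send_chunk()` and `sendmsg_sketch()` are hypothetical, and the sketch only shows why the result variable needs a defined value when the loop body never runs.

```c
#include <stddef.h>

/* Hypothetical stand-in for handing one chunk to the transport; returns bytes taken. */
static int send_chunk(size_t len)
{
    return (int)len;
}

/* Send len bytes in chunks of at most mtu bytes (mtu must be nonzero). */
static int sendmsg_sketch(size_t len, size_t mtu)
{
    int sent = 0;
    int err = 0;   /* initialized so a zero-length call has a defined result */

    while (len > 0) {
        size_t size = len < mtu ? len : mtu;

        err = send_chunk(size);
        if (err < 0)
            break;
        sent += err;
        len -= size;
    }
    return sent ? sent : err;
}
```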
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jan Andersson [Wed, 24 Jan 2007 23:10:10 +0000 (00:10 +0100)]
sparc32: add offset in pci_map_sg()
Add sg->offset to sg->dvma_address in pci_map_sg() on sparc32. Without the
offset, transfers to buffers that do not begin on a page boundary will not
work as expected.
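In other words, the address handed to the device is the mapped page address plus the offset of the buffer inside that page. A simplified sketch, in which `struct sg_entry` and its `page_dma` field are assumptions for illustration and not the sparc32 scatterlist:

```c
#include <stdint.h>

/* Hypothetical, simplified scatterlist entry. */
struct sg_entry {
    uint32_t page_dma;      /* bus address of the start of the page */
    uint32_t offset;        /* where the buffer begins inside that page */
    uint32_t dvma_address;  /* address the device will use */
};

/* Without the "+ offset" term the device would start at the page boundary. */
static void map_sg_entry(struct sg_entry *sg)
{
    sg->dvma_address = sg->page_dma + sg->offset;
}
```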
Signed-off-by: Jan Andersson <jan.andersson@ieee.org> Acked-By: David Miller <davem@davemloft.net>
Eric Sesterhenn [Wed, 24 Jan 2007 23:05:10 +0000 (00:05 +0100)]
V4L/DVB: Missing statement in drivers/media/dvb/frontends/cx22700.c
Stumbled over this because of coverity (id #492),
seems like we are missing a return statement here and fail
to do proper bounds checking. If this assumption is false
we should at least change the indentation to make it clear.
Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Hugh Dickins [Tue, 23 Jan 2007 15:46:22 +0000 (16:46 +0100)]
read_zero_pagealigned() locking fix
Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645. Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there. It's been wrong ever since 2.4.3.
The simple fix is to use down_write instead, but that would serialize reads
of /dev/zero more than at present: perhaps some app would be badly
affected. So instead let zeromap_page_range return the error instead of
BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
that case - there's no need to optimize for it.
Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
zeromap_page_range), though it really isn't interesting there. And since
mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
than -ENOMEM.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Alan Cox [Mon, 22 Jan 2007 19:39:00 +0000 (20:39 +0100)]
atiixp: hang fix
When the old IDE layer calls into methods in the driver during error
handling it is essentially random whether ide_lock is already held. This
causes a deadlock in the atiixp driver which also uses ide_lock internally
for locking.
Switch to a private lock instead.
[akpm@osdl.org: cleanup] Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jens Axboe [Mon, 22 Jan 2007 19:34:31 +0000 (20:34 +0100)]
cdrom: set default timeout to 7 seconds
It's a known fact that Windows times out commands after 7 seconds, so
drives generally try and respond if they can before that happens. We
default to 5 seconds, which sometimes is a bit too short.
Jeremy Higdon reported here:
http://lkml.org/lkml/2007/1/1/145
that his drive takes longer than 5 seconds for a "read track
information" command, later confirming that it is about 6.7 seconds.
So just do the sane thing and change the default command timeout to 7
seconds to avoid other surprises.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Jes Sorensen [Mon, 22 Jan 2007 19:20:21 +0000 (20:20 +0100)]
[SCSI] qla1280 command timeout
Original patch from Ian Dall in bugzilla. Set command timeout as
specified by the SCSI layer rather than hardcode it to 30 seconds. I
have received a couple of reports of people hitting this one with
various tape configurations and the patch looks obviously correct.
From http://bugzilla.kernel.org/show_bug.cgi?id=6275
Ian Dall <ian@beware.dropbear.id.au>:
The command sent to the card was using a 30second timeout regardless of the
timeout requested in the scsi command passed down from higher levels.
Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
James Bursa [Sat, 20 Jan 2007 21:58:51 +0000 (22:58 +0100)]
adfs: fix filename handling
Fix filenames on adfs discs being terminated at the first character greater
than 128 (adfs filenames are Latin 1). I saw this problem when using a
loopback adfs image on a 2.6.17-rc5 x86_64 machine, and the patch fixed it
there.
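One plausible way such a truncation arises is a signed comparison on filename bytes. This is an assumption, since the code is not shown here. In the sketch below, bytes of 0x80 and above are negative when viewed as signed char, so they look like terminators; the function names are hypothetical.

```c
#include <stddef.h>

/* Buggy variant: Latin 1 bytes >= 0x80 compare as negative and end the name. */
static size_t name_len_signed(const char *name, size_t maxlen)
{
    size_t i;

    for (i = 0; i < maxlen && (signed char)name[i] >= ' '; i++)
        ;
    return i;
}

/* Fixed variant: compare the bytes as unsigned values. */
static size_t name_len_unsigned(const char *name, size_t maxlen)
{
    size_t i;

    for (i = 0; i < maxlen && (unsigned char)name[i] >= ' '; i++)
        ;
    return i;
}
```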
Include connector config in the s390 arch Kconfig to get support for
connectors.
This also fixes the following Kconfig warning:
fs/Kconfig:1728:warning: 'select' used by config symbol 'CIFS_UPCALL' refer to undefined symbol 'CONNECTOR'
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Patrick McHardy [Sat, 20 Jan 2007 21:18:30 +0000 (22:18 +0100)]
NETFILTER: NAT: fix NOTRACK checksum handling
The whole idea with the NOTRACK netfilter target is that
you can force the netfilter code to avoid connection
tracking, and all costs associated with it, by making
traffic match a NOTRACK rule.
But this is totally broken by the fact that we do a checksum
calculation over the packet before we do the NOTRACK bypass
check, which is very expensive. People set up NOTRACK rules
explicitly to avoid all of these kinds of costs.
This patch from Patrick, already in Linus's tree, fixes the
bug.
Move the check for ip_conntrack_untracked before the call to
skb_checksum_help to fix NOTRACK exemptions from NAT. Pre-2.6.19
NAT code breaks TSO by invalidating hardware checksums for every
packet, even if explicitly excluded from NAT through NOTRACK.
2.6.19 includes a fix that makes NAT and TSO live in harmony,
but the performance degradation caused by this deserves making
at least the workaround work properly in -stable.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Arnaud Patard [Mon, 8 Jan 2007 22:09:59 +0000 (23:09 +0100)]
ALSA: emu10k1: Fix outl() in snd_emu10k1_resume_regs()
The emu10k1 driver saves the A_IOCFG and HCFG registers on suspend and restores
them on resume. Unfortunately, this doesn't work as the arguments to outl() are
reversed.
Jean Delvare [Mon, 8 Jan 2007 06:05:19 +0000 (07:05 +0100)]
V4L: cx88: Fix leadtek_eeprom tagging
reference to .init.text: from .text between 'cx88_card_setup'
(at offset 0x68c) and 'cx88_risc_field'
Caused by leadtek_eeprom() being declared __devinit and called from
a non-devinit context.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Phillip Lougher [Mon, 8 Jan 2007 06:02:45 +0000 (07:02 +0100)]
corrupted cramfs filesystems cause kernel oops (CVE-2006-5823)
Steve Grubb's fsfuzzer tool (http://people.redhat.com/sgrubb/files/fsfuzzer-0.6.tar.gz)
generates corrupt Cramfs filesystems which cause
Cramfs to kernel oops in cramfs_uncompress_block(). The cause of the oops
is an unchecked corrupted block length field read by cramfs_readpage().
This patch adds a sanity check to cramfs_readpage() which checks that the
block length field is sensible. The (PAGE_CACHE_SIZE << 1) size check is
intentional, even though the uncompressed data is not going to be larger
than PAGE_CACHE_SIZE, gzip sometimes generates compressed data larger than
the original source data. Mkcramfs checks that the compressed size is
always less than or equal to PAGE_CACHE_SIZE << 1. Of course Cramfs could
use the original uncompressed data in this case, but it doesn't.
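The bound described above can be sketched as a small predicate. The 4096-byte page size and the helper name `compr_len_is_sane` are assumptions for this sketch; the real check lives inside cramfs_readpage().

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_CACHE_SIZE 4096u   /* assumed page size for this sketch */

/* A compressed block may exceed one page, but never two pages. */
static bool compr_len_is_sane(size_t compr_len)
{
    return compr_len <= (PAGE_CACHE_SIZE << 1);
}
```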
Signed-off-by: Phillip Lougher <phillip@lougher.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de>
I've been using Steve Grubb's purely evil "fsfuzzer" tool, at
http://people.redhat.com/sgrubb/files/fsfuzzer-0.4.tar.gz
Basically it makes a filesystem, splats some random bits over it, then
tries to mount it and do some simple filesystem actions.
At best, the filesystem catches the corruption gracefully. At worst,
things spin out of control.
As you might guess, we found a couple places in ext3 where things spin out
of control :)
First, we had a corrupted directory that was never checked for
consistency... it was corrupt, and pointed to another bad "entry" of
length 0. The for() loop looped forever, since the length of
ext3_next_entry(de) was 0, and we kept looking at the same pointer over and
over and over and over... I modeled this check and subsequent action on
what is done for other directory types in ext3_readdir...
(adding this check adds some computational expense; I am testing a followup
patch to reduce the number of times we check and re-check these directory
entries, in all cases. Thanks for the idea, Andreas).
Next we had a root directory inode which had a corrupted size, claimed to
be > 200M on a 4M filesystem. There was only really 1 block in the
directory, but because the size was so large, readdir kept coming back for
more, spewing thousands of printk's along the way.
Per Andreas' suggestion, if we're in this read error condition and we're
trying to read an offset which is greater than i_blocks worth of bytes,
stop trying, and break out of the loop.
With these two changes fsfuzz test survives quite well on ext3.
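The first of the two checks can be sketched in user space. The entry layout below is a hypothetical simplification (only the record length is modelled) and `count_entries()` is not an ext3 function; it shows why a zero record length must be rejected before advancing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified directory entry header: only rec_len matters here. */
struct dirent_hdr {
    uint16_t rec_len;
};

/*
 * Count the entries in a directory block of `size` bytes.
 * Returns -1 when an entry has a zero or overlong rec_len, since
 * advancing by zero would examine the same entry forever.
 */
static int count_entries(const unsigned char *block, size_t size)
{
    size_t off = 0;
    int n = 0;

    while (off + sizeof(struct dirent_hdr) <= size) {
        struct dirent_hdr de;

        memcpy(&de, block + off, sizeof(de));
        if (de.rec_len == 0 || de.rec_len > size - off)
            return -1;
        off += de.rec_len;
        n++;
    }
    return n;
}
```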
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Eric Sandeen [Mon, 8 Jan 2007 05:59:28 +0000 (06:59 +0100)]
ext2: skip pages past number of blocks in ext2_find_entry (CVE-2006-6054)
This one was pointed out on the MOKB site:
http://kernelfun.blogspot.com/2006/11/mokb-09-11-2006-linux-26x-ext2checkpage.html
If a directory's i_size is corrupted, ext2_find_entry() will keep processing
pages until the i_size is reached, even if there are no more blocks associated
with the directory inode. This patch puts in some minimal sanity-checking
so that we don't keep checking pages (and issuing errors) if we know there
can be no more data to read, based on the block count of the directory inode.
This is somewhat similar in approach to the ext3 patch I sent earlier this
year.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>
and boom. Need to make sure the error cases return an error, I think.
[akpm@osdl.org: return -ENOMEM on oom] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Ran into BUG() while doing madvise(REMOVE) testing. If we are punching a
hole into shared memory segment using madvise(REMOVE) and the entire hole
is below the indirect blocks, we hit following assert.
BUG_ON(limit <= SHMEM_NR_DIRECT);
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Forwarded-by: Jordan Neumeyer Signed-off-by: Adrian Bunk <bunk@stusta.de>
John Heffner [Sat, 6 Jan 2007 21:31:44 +0000 (22:31 +0100)]
TCP: Fix and simplify microsecond rtt sampling
This changes the microsecond RTT sampling so that samples are taken in
the same way that RTT samples are taken for the RTO calculator: on the
last segment acknowledged, and only when the segment hasn't been
retransmitted.
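Reading the description literally, the sampling rule could be sketched like this. The struct and function are hypothetical and leave out everything else the TCP stack tracks per segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-segment state. */
struct acked_seg {
    uint32_t sent_us;      /* transmit timestamp in microseconds */
    bool retransmitted;
};

/*
 * Return an RTT sample in microseconds for one ACK covering segs[0..n-1],
 * or -1 when no sample should be taken.
 */
static int64_t rtt_sample_us(const struct acked_seg *segs, int n, uint32_t now_us)
{
    if (n <= 0)
        return -1;

    const struct acked_seg *last = &segs[n - 1];

    /* A retransmitted segment gives an ambiguous RTT, so skip it. */
    if (last->retransmitted)
        return -1;
    return (int64_t)(uint32_t)(now_us - last->sent_us);
}
```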
Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Makes UML compile on any possible processor choice. The two problems were:
*) x86 code, when 386 is selected, checks at runtime boot_cpuflags, which we do
not have.
*) 3Dnow support for memcpy() et al. does not compile currently and fixing it
is not trivial, so simply disable it; with this change, if one selects MK
UML compiles (while it did not).
Merged upstream.
Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Willy Tarreau [Sat, 6 Jan 2007 01:31:24 +0000 (02:31 +0100)]
rio: typo in bitwise AND expression.
The line:
hp->Mode &= !RIO_PCI_INT_ENABLE;
is obviously wrong as RIO_PCI_INT_ENABLE=0x04 and is used as a bitmask
2 lines before. Getting no IRQ would not disable RIO_PCI_INT_ENABLE
but rather RIO_PCI_BOOT_FROM_RAM which equals 0x01.
Obvious fix is to change ! for ~.
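The two operators can be compared side by side. In C, !0x04 evaluates to 0, so the buggy statement ANDs the mode with 0 and clears every bit, RIO_PCI_BOOT_FROM_RAM included. The helper names are made up for this sketch; the constants are the ones quoted above.

```c
#define RIO_PCI_BOOT_FROM_RAM 0x01
#define RIO_PCI_INT_ENABLE    0x04

/* Buggy: logical NOT of a nonzero constant is 0, so all bits are cleared. */
static unsigned char clear_int_buggy(unsigned char mode)
{
    return mode & !RIO_PCI_INT_ENABLE;
}

/* Fixed: bitwise NOT keeps every bit except RIO_PCI_INT_ENABLE. */
static unsigned char clear_int_fixed(unsigned char mode)
{
    return mode & ~RIO_PCI_INT_ENABLE;
}
```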
Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Adrian Bunk <bunk@stusta.de>
David Brownell [Sat, 6 Jan 2007 00:08:47 +0000 (01:08 +0100)]
SPI/MTD: mtd_dataflash oops prevention
Return a fault code if the Dataflash driver runs into a "no device present"
error when the MISO line has a pulldown (it currently expects a pullup), so
that rmmod won't oops.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Georg Chini [Sat, 6 Jan 2007 00:00:08 +0000 (01:00 +0100)]
[SOUND] Sparc CS4231: Fix IRQ return value and initialization.
SBUS: Change IRQ-handler return value from 0 to IRQ_HANDLED and
fix some initialisation problems.
Change period_bytes_min from 4096 to 256 to allow driver to work with
low latency (VOIP) applications. Hope this does not break EBUS.
Signed-off-by: Georg Chini <georg.chini@triaton-webhosting.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Andrew Morton [Thu, 4 Jan 2007 22:29:51 +0000 (23:29 +0100)]
ibmtr section fixes
WARNING: drivers/net/tokenring/ibmtr.o - Section mismatch: reference to .init.data:ibmtr_mem_base from .text between 'ibmtr_probe1' (at offset 0x6e6) and 'ibmtr_probe_card'
WARNING: drivers/net/tokenring/ibmtr.o - Section mismatch: reference to .init.data:ibmtr_mem_base from .text between 'ibmtr_probe1' (at offset 0x74a) and 'ibmtr_probe_card'
WARNING: drivers/net/tokenring/ibmtr.o - Section mismatch: reference to .init.data:ibmtr_mem_base from .text between 'ibmtr_probe1' (at offset 0x7fd) and 'ibmtr_probe_card'
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Andi Kleen [Mon, 8 Jan 2007 21:44:07 +0000 (22:44 +0100)]
x86_64: Don't leak NT bit into next task (CVE-2006-5755)
SYSENTER can cause a NT to be set which might cause crashes on the IRET
in the next task.
Following similar i386 patch from Linus.
Backport to 2.6.16 by Chuck Ebbert <76306.1226@compuserve.com>
[Changed 'set_debugreg' to the older 'set_debug' in setup64.c
and added raw_local_save_flags() from 2.6.19 to system.h]
Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Marcel Holtmann [Thu, 4 Jan 2007 21:57:52 +0000 (22:57 +0100)]
Bluetooth: Add packet size checks for CAPI messages (CVE-2006-6106)
With malformed packets it might be possible to overwrite internal
CMTP and CAPI data structures. This patch adds additional length
checks to prevent these kinds of remote attacks.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
If grow_buffers() is for some reason passed a block number which wants to lie
outside the maximum-addressable pagecache range (PAGE_SIZE * 4G bytes) then it
will accidentally truncate `index' and will then instantiate a page at the
wrong pagecache offset. This causes __getblk_slow() to go into an infinite
loop.
This can happen with corrupted disks, or with software errors elsewhere.
Detect that, and handle it.
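A sketch of how such a truncation can be detected, modelling the pagecache index as a 32-bit value. The function name and the `sizebits` parameter (log2 of the number of blocks per page) are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the pagecache index for `block`, modelling a 32-bit index type.
 * Returns false when the index does not fit and would be truncated.
 */
static bool block_to_index(uint64_t block, unsigned int sizebits, uint32_t *index)
{
    uint64_t wide = block >> sizebits;

    *index = (uint32_t)wide;
    return (uint64_t)*index == wide;
}
```

A caller would treat a false return as an error instead of looking up the truncated index again and again.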
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Linus Torvalds [Thu, 4 Jan 2007 22:23:27 +0000 (23:23 +0100)]
i386: save/restore eflags in context switch (CVE-2006-5173)
(And reset it on new thread creation)
It turns out that eflags is important to save and restore not just
because of iopl, but due to the magic bits like the NT bit, which we
don't want leaking between different threads.
Backported to 2.6.16 by Chuck Ebbert <76306.1226@compuserve.com>
[Backport consisted of removing the CFI annotations.]
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Linus Torvalds [Thu, 4 Jan 2007 00:44:45 +0000 (01:44 +0100)]
Fix incorrect user space access locking in mincore() (CVE-2006-4814)
Doug Chapman noticed that mincore() will do a "copy_to_user()" of the
result while holding the mmap semaphore for reading, which is a big
no-no. While a recursive read-lock on a semaphore in the case of a page
fault happens to work, we don't actually allow them due to deadlock
scenarios with writers due to fairness issues.
Doug and Marcel sent in a patch to fix it, but I decided to just rewrite
the mess instead - not just fixing the locking problem, but making the
code smaller and (imho) much easier to understand.
Also included are two fixes for the original patch including one
by Oleg Nesterov.
Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Miklos Szeredi [Thu, 4 Jan 2007 00:14:06 +0000 (01:14 +0100)]
fuse: fix hang on SMP
Fuse didn't always call i_size_write() with i_mutex held which caused
rare hangs on SMP/32bit. This bug has been present since fuse-2.2,
well before being merged into mainline.
The simplest solution is to protect i_size_write() with the
per-connection spinlock. Using i_mutex for this purpose would require
some restructuring of the code and I'm not even sure it's always safe
to acquire i_mutex in all places i_size needs to be set.
Since most of vmtruncate is already duplicated for other reasons,
duplicate the remaining part as well, making all i_size_write() calls
internal to fuse.
Using i_size_write() was unnecessary in fuse_init_inode(), since this
function is only called on a newly created locked inode.
Reported by a few people over the years, but special thanks to Dana
Henriksen who was persistent enough in helping me debug it.
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Adrian Bunk <bunk@stusta.de>
Dirk Eibach [Wed, 3 Jan 2007 23:42:01 +0000 (00:42 +0100)]
i2c: fix broken ds1337 initialization
On a custom board with ds1337 RTC I found that upgrade from 2.6.15 to
2.6.18 broke RTC support.
The main problem is the changes to ds1337_init_client().
When a ds1337 recognizes a problem (e.g. power or clock failure) bit 7
in the status register is set. This has to be reset by writing 0 to the status
register. But since there are only 16 bytes written to the chip and the
first byte is interpreted as an address, the status register (which is
the 16th) is never written.
The other problem is that initializing all registers to zero is not
valid for the day, date and month registers. Funnily enough this is checked by
ds1337_detect(), which depends on these values not being zero. So once
treated by ds1337_init_client() the ds1337 is not detected anymore,
whereas the failure bit in the status register is still set.
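The off-by-one in the first problem can be reproduced with a small model of the register file. `struct rtc_model` and `rtc_write()` are hypothetical, and the model assumes 16 registers with the status register last, as the description implies.

```c
#include <stddef.h>
#include <stdint.h>

#define DS1337_NREGS      16
#define DS1337_REG_STATUS 0x0f

/* Hypothetical model of the chip's register file. */
struct rtc_model {
    uint8_t regs[DS1337_NREGS];
};

/* Model of a bus write: buf[0] selects the start register, the rest is data. */
static void rtc_write(struct rtc_model *rtc, const uint8_t *buf, size_t len)
{
    if (len == 0)
        return;

    uint8_t reg = buf[0];

    for (size_t i = 1; i < len && reg < DS1337_NREGS; i++)
        rtc->regs[reg++] = buf[i];
}
```

Writing 16 bytes starting at address 0 therefore covers only registers 0 to 14. It takes 17 bytes (one address byte plus 16 data bytes) to reach the status register.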
Patrick McHardy [Wed, 3 Jan 2007 23:38:10 +0000 (00:38 +0100)]
NET_SCHED: Fix fallout from dev->qdisc RCU change
The move of qdisc destruction to a rcu callback broke locking in the
entire qdisc layer by invalidating previously valid assumptions about
the context in which changes to the qdisc tree occur.
The two assumptions were:
- since changes only happen in process context, read_lock doesn't need
bottom half protection. Now invalid since destruction of inner qdiscs,
classifiers, actions and estimators happens in the RCU callback unless
they're manually deleted, resulting in deadlocks when read_lock in
process context is interrupted by write_lock_bh in bottom half context.
- since changes only happen under the RTNL, no additional locking is
necessary for data not used during packet processing (f.e. u32_list).
Again, since destruction now happens in the RCU callback, this assumption
is not valid anymore, causing races while using this data, which can
result in corruption or use-after-free.
Instead of "fixing" this by disabling bottom halves everywhere and adding
new locks/refcounting, this patch makes these assumptions valid again by
moving destruction back to process context. Since only the dev->qdisc
pointer is protected by RCU, but ->enqueue and the qdisc tree are still
protected by dev->qdisc_lock, destruction of the tree can be performed
immediately and only the final free needs to happen in the rcu callback
to make sure dev_queue_xmit doesn't access already freed memory.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@stusta.de>