Al Viro [Mon, 14 Jul 2008 18:09:23 +0000 (21:09 +0300)]
fix SMP ordering hole in fcntl_setlk() (CVE-2008-1669)
fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.
As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).
Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@kernel.org>
David S. Miller [Mon, 14 Jul 2008 18:09:23 +0000 (21:09 +0300)]
[NETFILTER]: Fix warnings in ip_nat_snmp_basic.c
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'asn1_header_decode':
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'len' may be used unini
net/ipv4/netfilter/ip_nat_snmp_basic.c:248: warning: 'def' may be used unini
net/ipv4/netfilter/ip_nat_snmp_basic.c: In function 'snmp_translate':
net/ipv4/netfilter/ip_nat_snmp_basic.c:672: warning: 'l' may be used uniniti
net/ipv4/netfilter/ip_nat_snmp_basic.c:668: warning: 'type' may be used unin
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
When selecting a new window, tcp_select_window() tries not to shrink
the offered window by using the maximum of the remaining offered window
size and the newly calculated window size. The newly calculated window
size is always a multiple of the window scaling factor, the remaining
window size however might not be since it depends on rcv_wup/rcv_nxt.
This means we're effectively shrinking the window when scaling it down.
The dump below shows the problem (scaling factor 2^7):
- Window size of 557 (71296) is advertised, up to 3111907257:
IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
- New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
below the last end:
If the sender uses up the entire window before it is shrunk, this can have
chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
will notice that the window has been shrunk since tcp_wnd_end() is before
tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
This will fail the receivers checks in tcp_sequence() however since it
is before it's tp->rcv_wup, making it respond with a dupack.
If both sides are in this condition, this leads to a constant flood of
ACKs until the connection times out.
Make sure the window is never shrunk by aligning the remaining window to
the window scaling factor.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
x86: Replace NSC/Cyrix specific chipset access macros by inlined functions.
Due to index register access ordering problems, when using macros a line
like this fails (and does nothing):
setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
With inlined functions this line will work as expected.
Note about a side effect: Seems on Geode GX1 based systems the
"suspend on halt power saving feature" was never enabled due to this
wrong macro expansion. With inlined functions it will be enabled, but
this will stop the TSC when the CPU runs into a HLT instruction.
Kernel output something like this:
Clocksource tsc unstable (delta = -472746897 ns)
This is the 3rd version of this patch.
- Adding missed arch/i386/kernel/cpu/mtrr/state.c
Thanks to Andres Salomon
- Adding some big fat comments into the new header file
Suggested by Andi Kleen
AK: fixed x86-64 compilation
Adrian Bunk:
Added workaround for x86_64 compilation.
We got several false bug reports because of enabled
CONFIG_DETECT_SOFTLOCKUP. Disable soft lockup detection on s390, since it
doesn't work on a virtualized architecture.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Thomas Graf [Wed, 19 Mar 2008 21:14:34 +0000 (23:14 +0200)]
[DECNet] fib: Fix out of bound access of dn_fib_props[]
Fixes a typo which caused fib_props[] to have the wrong size
and makes sure the value used to index the array which is
provided by userspace via netlink is checked to avoid out of
bound access.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Adrian Bunk [Fri, 14 Mar 2008 20:05:58 +0000 (22:05 +0200)]
gcc >= 4.3 is not supported
Building kernel 2.6.16 with gcc 4.3 is completely untested, and
you might run into both kernel and gcc problems (as always with
new gcc versions).
For making this obvious the kernel build now #error's when trying
to build with gcc >= 4.3.
The kernel might work fine when compiled with gcc 4.3 and it's
therefore possible to remove the #error, but if someone really
longs for regressions he can as well try a more recent kernel
instead.
Trond Myklebust [Mon, 21 Jan 2008 19:04:16 +0000 (21:04 +0200)]
NFS: call nfs_wb_all() only on regular files
It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...
It isn't technically needed since the version of nfs_wb_all() that exists
on 2.6.16 should be safe to call on non-regular files (it will be a no-op).
However it is a useful optimisation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Al went through the ip_fast_csum callers and found this piece of code
that did not validate the IP header. While root crashing the machine
by sending bogus packets through raw or AF_PACKET sockets isn't that
serious, it is still nice to react gracefully.
This patch ensures that the skb has enough data for an IP header and
that the header length field is valid.
Adrian Bunk:
Backported to 2.6.16 following instructions by David Miller.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Here's proposed fix for RX checksum handling in cassini; it affects
little-endian working with half-duplex gigabit, but obviously needs
testing on big-endian too.
The problem is, we need to convert checksum to fixed-endian *before*
correcting for (unstripped) FCS. On big-endian it won't matter
(conversion is no-op), on little-endian it will, but only if FCS is
not stripped by hardware; i.e. in half-duplex gigabit mode when
->crc_size is set.
cassini.c part is that fix, cassini.h one consists of trivial
endianness annotations. With that applied the sucker is endian-clean,
according to sparse.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Jeff Moyer [Sun, 20 Jan 2008 19:31:32 +0000 (21:31 +0200)]
raw: don't allow the creation of a raw device with minor number 0
Minor number 0 (under the raw major) is reserved for the rawctl device
file, which is used to query, set, and unset raw device bindings. However,
the ioctl interface does not protect the user from specifying a raw device
with minor number 0:
$ sudo ./raw /dev/raw/raw0 /dev/VolGroup00/swap
/dev/raw/raw0: bound to major 253, minor 2
$ ls -l /dev/rawctl
ls: /dev/rawctl: No such file or directory
$ ls -l /dev/raw/raw0
crw------- 1 root root 162, 0 Jan 12 10:51 /dev/raw/raw0
$ sudo ./raw -qa
Cannot open master raw device '/dev/rawctl' (No such file or directory)
As you can see, this prevents any further raw operations from
succeeding. The fix (from Steve Fernandez) is quite simple - do not
allow the allocation of minor number 0.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
The original meaning of the old test (p->state > TASK_STOPPED) was
"not dead", since it was before TASK_TRACED existed and before the
state/exit_state split. It was a wrong correction in commit 14bf01bb0599c89fc7f426d20353b76e12555308 to make this test for
TASK_TRACED instead. It should have been changed when TASK_TRACED
was introducted and again when exit_state was introduced.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Eric Sandeen [Wed, 16 Jan 2008 21:36:44 +0000 (23:36 +0200)]
limit minixfs printks on corrupted dir i_size (CVE-2006-6058)
First reported at http://projects.info-pull.com/mokb/MOKB-17-11-2006.html
Essentially a corrupted minix dir inode reporting a very large
i_size will loop for a very long time in minix_readdir, minix_find_entry,
etc, because on EIO they just move on to try the next page. This is
under the BKL, printk-storming as well. This can lock up the machine
for a very long time. Simply ratelimiting the printks gets things back
under control. Make the message a bit more informative while we're here.
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Way back when (in commit 834f2a4a1554dc5b2598038b3fe8703defcbe467, aka
"VFS: Allow the filesystem to return a full file pointer on open intent"
to be exact), Trond changed the open logic to keep track of the original
flags to a file open, in order to pass down the the intent of a dentry
lookup to the low-level filesystem.
However, when doing that reorganization, it changed the meaning of
namei_flags, and thus inadvertently changed the test of access mode for
directories (and RO filesystem) to use the wrong flag. So fix those
test back to use access mode ("acc_mode") rather than the open flag
("flag").
Issue noticed by Bill Roman at Datalight.
Reported-and-tested-by: Bill Roman <bill.roman@datalight.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
The aalgos/ealgos fields are only 32 bits wide. However, af_key tries
to test them with the expression 1 << id where id can be as large as
253. This produces different behaviour on different architectures.
The following patch explicitly checks whether ID is greater than 31
and fails the check if that's the case.
We cannot easily extend the mask to be longer than 32 bits due to
exposure to user-space. Besides, this whole interface is obsolete
anyway in favour of the xfrm_user interface which doesn't use this
bit mask in templates (well not within the kernel anyway).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
When re-naming an interface, the previous secondary address
labels get lost e.g.
$> brctl addbr foo
$> ip addr add 192.168.0.1 dev foo
$> ip addr add 192.168.0.2 dev foo label foo:00
$> ip addr show dev foo | grep inet
inet 192.168.0.1/32 scope global foo
inet 192.168.0.2/32 scope global foo:00
$> ip link set foo name bar
$> ip addr show dev bar | grep inet
inet 192.168.0.1/32 scope global bar
inet 192.168.0.2/32 scope global bar:2
Turns out to be a simple thinko in inetdev_changename() - clearly we
want to look at the address label, rather than the device name, for
a suffix to retain.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
We currently check that iph->ihl is bounded by the real length and that
the real length is greater than the minimum IP header length. However,
we did not check the caes where iph->ihl is less than the minimum IP
header length.
This breaks because some ip_fast_csum implementations assume that which
is quite reasonable.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Avaid provided test application, so bug got fixed.
IPv6 addrconf removes ipv6 inner device from netdev each time cmu
changes and new value is less than IPV6_MIN_MTU (1280 bytes).
When mtu is changed and new value is greater than IPV6_MIN_MTU,
it does not add ipv6 addresses and inner device bac.
This patch fixes that.
Tested with Avaid's application, which works ok now.
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Adrian Bunk <bunk@kernel.org>
fix build failure with gcc-4.2.x: fix up casts in cia_io* routines to avoid
warnings ('discards qualifiers from pointer target type'), which are
failures, thanks to -Werror;
Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Alan Cox [Sun, 9 Dec 2007 18:07:00 +0000 (19:07 +0100)]
[SCSI] aacraid: fix security weakness
Actually there are several but one is trivially fixed
1. FSACTL_GET_NEXT_ADAPTER_FIB ioctl does not lock dev->fib_list
but needs to
2. Ditto for FSACTL_CLOSE_GET_ADAPTER_FIB
3. It is possible to construct an attack via the SRB ioctls where
the user obtains assorted elevated privileges. Various approaches are
possible, the trivial ones being things like writing to the raw media
via scsi commands and the swap image of other executing programs with
higher privileges.
So the ioctls should be CAP_SYS_RAWIO - at least all the FIB manipulating
ones. This is a bandaid fix for #3 but probably the ioctls should grow
their own capable checks. The other two bugs need someone competent in that
driver to fix them.
Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Jean Delvare [Sun, 9 Dec 2007 17:58:59 +0000 (18:58 +0100)]
hwmon/lm87: Fix a division by zero
Missing parentheses in the definition of FAN_FROM_REG cause a
division by zero for a specific register value.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Hans de Goede <j.w.r.degoede@hhs.nl> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Jean Delvare [Sun, 9 Dec 2007 17:57:37 +0000 (18:57 +0100)]
hwmon/lm87: Disable VID when it should be
A stupid bit shifting bug caused the VID value to be always exported
even when the hardware is configured for something different.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Mark M. Hoffman <mhoffman@lightlink.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Kernel needs to respond to an SADB_GET with the same message type to
conform to the RFC 2367 Section 3.1.5
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Adrian Bunk <bunk@kernel.org>
if you are lucky (unlucky?) enough to have shared interrupts, the
interrupt handler can be called before the tasklet and lock are ready
for use.
Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Adrian Bunk <bunk@kernel.org>
tmpfs was misconverted to __GFP_ZERO in 2.6.11. There's an unusual case in
which shmem_getpage receives the page from its caller instead of allocating.
We must cover this case by clear_highpage before SetPageUptodate, as before.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
The #ifdef's in arp_process() were not only a mess, they were also wrong
in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or
CONFIG_NETDEV_10000=y) cases.
Since they are not required this patch removes them.
Also removed are some #ifdef's around #include's that caused compile
errors after this change.
Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
by moving the schedule_timeout() call to a new function that doesn't
propagate the remaining timeout back to the caller. This means on each
retry we start with the full timeout again.
ipc/mqueue.c seems to actually want to wait indefinitely so this
behaviour is retained.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
This patch fixes a memory leak when a PPPoE socket is release()d after
it has been connect()ed, but before the PPPIOCGCHAN ioctl ever has been
called on it.
This is somewhat of a security problem, too, since PPPoE sockets can be
created by any user, so any user can easily allocate all the machine's
RAM to non-swappable address space and thus DoS the system.
Is there any specific reason for PPPoE sockets being available to any
unprivileged process, BTW? After all, you need a packet socket for the
discovery stage anyway, so it's unlikely that any unprivileged process
will ever need to create a PPPoE socket, no? Allocating all session IDs
for a known AC is a kind of DoS, too, after all - with Juniper ERXes,
this is really easy, actually, since they don't ever assign session ids
above 8000 ...
Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Radu Rendec [Tue, 13 Nov 2007 08:30:35 +0000 (09:30 +0100)]
[PKT_SCHED] CLS_U32: Fix endianness problem with u32 classifier hash masks.
While trying to implement u32 hashes in my shaping machine I ran into
a possible bug in the u32 hash/bucket computing algorithm
(net/sched/cls_u32.c).
The problem occurs only with hash masks that extend over the octet
boundary, on little endian machines (where htonl() actually does
something).
Let's say that I would like to use 0x3fc0 as the hash mask. This means
8 contiguous "1" bits starting at b6. With such a mask, the expected
(and logical) behavior is to hash any address in, for instance,
192.168.0.0/26 in bucket 0, then any address in 192.168.0.64/26 in
bucket 1, then 192.168.0.128/26 in bucket 2 and so on.
This is exactly what would happen on a big endian machine, but on
little endian machines, what would actually happen with current
implementation is 0x3fc0 being reversed (into 0xc03f0000) by htonl()
in the userspace tool and then applied to 192.168.x.x in the u32
classifier. When shifting right by 16 bits (rank of first "1" bit in
the reversed mask) and applying the divisor mask (0xff for divisor
256), what would actually remain is 0x3f applied on the "168" octet of
the address.
One could say is this can be easily worked around by taking endianness
into account in userspace and supplying an appropriate mask (0xfc03)
that would be turned into contiguous "1" bits when reversed
(0x03fc0000). But the actual problem is the network address (inside
the packet) not being converted to host order, but used as a
host-order value when computing the bucket.
Let's say the network address is written as n31 n30 ... n0, with n0
being the least significant bit. When used directly (without any
conversion) on a little endian machine, it becomes n7 ... n0 n8 ..n15
etc in the machine's registers. Thus bits n7 and n8 would no longer be
adjacent and 192.168.64.0/26 and 192.168.128.0/26 would no longer be
consecutive.
The fix is to apply ntohl() on the hmask before computing fshift,
and in u32_hash_fold() convert the packet data to host order before
shifting down by fshift.
With helpful feedback from Jamal Hadi Salim and Jarek Poplawski.
Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
tecl_reset() is called from deactivate and qdisc is set to noop already,
but subsequent teql_xmit does not know about it and dereference private
data as teql qdisc and thus oopses.
not catch it first :)
Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Peter Zijlstra [Tue, 13 Nov 2007 07:46:02 +0000 (08:46 +0100)]
i386: fixup TRACE_IRQ breakage
The TRACE_IRQS_ON function in iret_exc: calls a C function without
ensuring that the segments are set properly. Move the trace function and
the enabling of interrupt into the C stub.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Michal Schmidt [Tue, 13 Nov 2007 06:48:46 +0000 (07:48 +0100)]
[PPP_MPPE]: Don't put InterimKey on the stack
ppp_mppe puts a crypto key on the kernel stack, then passes the
address of that into the crypto layer. That doesn't work because the
crypto layer needs to be able to do virt_to_*() on the address which
does not universally work for the kernel stack on all platforms.
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Patrick McHardy [Mon, 12 Nov 2007 12:04:20 +0000 (13:04 +0100)]
[INET_DIAG]: Fix oops in netlink_rcv_skb
netlink_run_queue() doesn't handle multiple processes processing the
queue concurrently. Serialize queue processing in inet_diag to fix
a oops in netlink_rcv_skb caused by netlink_run_queue passing a
NULL for the skb.
[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Neil Brown [Fri, 2 Nov 2007 22:08:36 +0000 (23:08 +0100)]
knfsd: allow nfsd READDIR to return 64bit cookies
->readdir passes lofft_t offsets (used as nfs cookies) to
nfs3svc_encode_entry{,_plus}, but when they pass it on to encode_entry it
becomes an 'off_t', which isn't good.
So filesystems that returned 64bit offsets would lose.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Nick Piggin [Fri, 2 Nov 2007 22:07:14 +0000 (23:07 +0100)]
buffer: memorder fix
unlock_buffer(), like unlock_page(), must not clear the lock without
ensuring that the critical section is closed.
Mingming later sent the same patch, saying:
We are running SDET benchmark and saw double free issue for ext3 extended
attributes block, which complains the same xattr block already being freed (in
ext3_xattr_release_block()). The problem could also been triggered by
multiple threads loop untar/rm a kernel tree.
The race is caused by missing a memory barrier at unlock_buffer() before the
lock bit being cleared, resulting in possible concurrent h_refcounter update.
That causes a reference counter leak, then later leads to the double free that
we have seen.
Inside unlock_buffer(), there is a memory barrier is placed *after* the lock
bit is being cleared, however, there is no memory barrier *before* the bit is
cleared. On some arch the h_refcount update instruction and the clear bit
instruction could be reordered, thus leave the critical section re-entered.
The race is like this: For example, if the h_refcount is initialized as 1,
Adit Ranadive [Fri, 2 Nov 2007 22:05:27 +0000 (23:05 +0100)]
[PKTGEN]: srcmac fix
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Herbert Xu [Fri, 2 Nov 2007 21:53:44 +0000 (22:53 +0100)]
[SNAP]: Check packet length before reading
The snap_rcv code reads 5 bytes so we should make sure that
we have 5 bytes in the head before proceeding.
Based on diagnosis and fix by Evgeniy Polyakov, reported by
Alan J. Wylie.
Patch also kills the skb->sk assignment before kfree_skb
since it's redundant.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
-Fixes ABBA deadlock noted by Patrick McHardy <kaber@trash.net>:
> There is at least one ABBA deadlock, est_timer() does:
> read_lock(&est_lock)
> spin_lock(e->stats_lock) (which is dev->queue_lock)
>
> and qdisc_destroy calls htb_destroy under dev->queue_lock, which
> calls htb_destroy_class, then gen_kill_estimator and this
> write_locks est_lock.
To fix the ABBA deadlock the rate estimators are now kept on an rcu list.
-The est_lock changes the use from protecting the list to protecting
the update to the 'bstat' pointer in order to avoid NULL dereferencing.
-The 'interval' member of the gen_estimator structure removed as it is
not needed.
Signed-off-by: Ranko Zivojnovic <ranko@spidernet.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Patrick McHardy [Fri, 2 Nov 2007 21:42:48 +0000 (22:42 +0100)]
[ICMP]: Fix icmp_errors_use_inbound_ifaddr sysctl
Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is
sent after the packet passed through ip_output(), an address from the
outgoing interface is chosen as ICMP source address since skb->dev doesn't
point to the incoming interface anymore.
Fix this by doing an interface lookup on rt->dst.iif and using that device.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Ohad Ben-Cohen [Fri, 2 Nov 2007 03:41:26 +0000 (04:41 +0100)]
[Bluetooth] Fix NULL pointer dereference in HCI line discipline
Normally a serial Bluetooth device is opened, TIOSETD'ed to N_HCI line
discipline, HCIUARTSETPROTO'ed and finally closed. In case the device
fails to HCIUARTSETPROTO, closing it produces a NULL pointer dereference.
Alan Cox [Fri, 2 Nov 2007 02:41:27 +0000 (03:41 +0100)]
aacraid: fix security hole (CVE-2007-4308)
On the SCSI layer ioctl path there is no implicit permissions check for
ioctls (and indeed other drivers implement unprivileged ioctls). aacraid
however allows all sorts of very admin only things to be done so should
check.
Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Mark Salyzyn <mark_salyzyn@adaptec.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Steve French [Fri, 2 Nov 2007 02:30:35 +0000 (03:30 +0100)]
CIFS should honour umask (CVE-2007-3740)
This patch makes CIFS honour a process' umask like other filesystems.
Of course the server is still free to munge the permissions if it wants
to; but the client will send the "right" permissions to begin with.
A few caveats:
1) It only applies to filesystems that have CAP_UNIX (aka support unix
extensions)
2) It applies the correct mode to the follow up CIFSSMBUnixSetPerms()
after remote creation
When mode to CIFS/NTFS ACL mapping is complete we can do the
same thing for that case for servers which do not
support the Unix Extensions.
Signed-off-by: Matt Keenen <matt@opcode-solutions.com> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
[IEEE80211]: avoid integer underflow for runt rx frames (CVE-2007-4997)
Reported by Chris Evans <scarybeasts@gmail.com>:
> The summary is that an evil 80211 frame can crash out a victim's
> machine. It only applies to drivers using the 80211 wireless code, and
> only then to certain drivers (and even then depends on a card's
> firmware not dropping a dubious packet). I must confess I'm not
> keeping track of Linux wireless support, and the different protocol
> stacks etc.
>
> Details are as follows:
>
> ieee80211_rx() does not explicitly check that "skb->len >= hdrlen".
> There are other skb->len checks, but not enough to prevent a subtle
> off-by-two error if the frame has the IEEE80211_STYPE_QOS_DATA flag
> set.
>
> This leads to integer underflow and crash here:
>
> if (frag != 0)
> flen -= hdrlen;
>
> (flen is subsequently used as a memcpy length parameter).
How about this?
Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Oliver Neukum [Sat, 27 Oct 2007 21:36:46 +0000 (23:36 +0200)]
USB: fix DoS in pwc USB video driver (CVE-2007-5093)
The pwc driver has a disconnect method that waits for user space to
close the device. This opens up an opportunity for a DoS attack,
blocking the USB subsystem and making khubd's task busy wait in
kernel space. This patch shifts freeing resources to close if an opened
device is disconnected.
Adrian Bunk:
Backported to 2.6.16.
Signed-off-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Chris Wright [Wed, 24 Oct 2007 19:54:41 +0000 (21:54 +0200)]
[SPARC64] pass correct addr in get_fb_unmapped_area(MAP_FIXED)
Looks like the MAP_FIXED case is using the wrong address hint. I'd
expect the comment "don't mess with it" means pass the request
straight on through, not change the address requested to -ENOMEM.
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Hugh Dickins [Sun, 28 Oct 2007 21:32:04 +0000 (22:32 +0100)]
hugetlb: fix size=4G parsing
On 32-bit machines, mount -t hugetlbfs -o size=4G gave a 0GB filesystem,
size=5G gave a 1GB filesystem etc: there's no point in masking size with
HPAGE_MASK just before shifting its lower bits away, and since HPAGE_MASK is a
UL, that removed all the higher bits of the unsigned long long size.
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
David Gibson [Sun, 28 Oct 2007 21:20:34 +0000 (22:20 +0100)]
hugetlb: check for brk() entering a hugepage region
Unlike mmap(), the codepath for brk() creates a vma without first checking
that it doesn't touch a region exclusively reserved for hugepages. On
powerpc, this can allow it to create a normal page vma in a hugepage
region, causing oopses and other badness.
Add a test to prevent this. With this patch, brk() will simply fail if it
attempts to move the break into a hugepage reserved region.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Ken Chen [Sun, 28 Oct 2007 20:40:41 +0000 (21:40 +0100)]
[IA64] fix ia64 is_hugepage_only_range
fix is_hugepage_only_range() definition to be "overlaps"
instead of "within architectural restricted hugetlb address
range". Simplify the ia64 specific code that used to use
is_hugepage_only_range() to just check which region the
address is in.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Adam Litke [Fri, 19 Oct 2007 17:05:10 +0000 (19:05 +0200)]
Don't allow the stack to grow into hugetlb reserved regions (CVE-2007-3739)
When expanding the stack, we don't currently check if the VMA will cross
into an area of the address space that is reserved for hugetlb pages.
Subsequent faults on the expanded portion of such a VMA will confuse the
low-level MMU code, resulting in an OOPS. Check for this.
Signed-off-by: Adam Litke <agl@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Hugh Dickins [Fri, 19 Oct 2007 12:30:18 +0000 (14:30 +0200)]
hugetlb: fix prio_tree unit (CVE-2007-4133)
hugetlb_vmtruncate_list was misconverted to prio_tree: its prio_tree is in
units of PAGE_SIZE (PAGE_CACHE_SIZE) like any other, not HPAGE_SIZE (whereas
its radix_tree is kept in units of HPAGE_SIZE, otherwise slots would be
absurdly sparse).
At first I thought the error benign, just calling __unmap_hugepage_range on
more vmas than necessary; but on 32-bit machines, when the prio_tree is
searched correctly, it happens to ensure the v_offset calculation won't
overflow. As it stood, when truncating at or beyond 4GB, it was liable to
discard pages COWed from lower offsets; or even to clear pmd entries of
preceding vmas, triggering exit_mmap's BUG_ON(nr_ptes).
Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Arthur Othieno [Fri, 19 Oct 2007 00:04:58 +0000 (02:04 +0200)]
hugetlbfs: add Kconfig help text
In kernel bugzilla #6248 (http://bugzilla.kernel.org/show_bug.cgi?id=6248),
Adrian Bunk <bunk@stusta.de> notes that CONFIG_HUGETLBFS is missing Kconfig
help text.
Signed-off-by: Arthur Othieno <apgo@patchbomb.org> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Ken Chen [Thu, 18 Oct 2007 23:59:17 +0000 (01:59 +0200)]
x86: HUGETLBFS and DEBUG_PAGEALLOC are incompatible
DEBUG_PAGEALLOC is not compatible with hugetlb page support. That debug
option turns off PSE. Once it is turned off in CR4, the cpu will ignore
pse bit in the pmd and causing infinite page-not- present faults.
So disable DEBUG_PAGEALLOC if the user selected hugetlbfs.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Stephen Smalley [Thu, 18 Oct 2007 23:27:51 +0000 (01:27 +0200)]
SELinux: clear parent death signal on SID transitions
Clear parent death signal on SID transitions to prevent unauthorized
signaling between SIDs.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@parisplace.org> Signed-off-by: James Morris <jmorris@localhost.localdomain> Signed-off-by: Adrian Bunk <bunk@kernel.org>
Ulrich Drepper [Thu, 18 Oct 2007 21:46:58 +0000 (23:46 +0200)]
make UML compile (FC6/x86-64)
I need this patch to get a UML kernel to compile. This is with the
kernel headers in FC6 which are automatically generated from the kernel
tree. Some headers are missing but those files don't need them. At
least it appears so since the resuling kernel works fine.
Tested on x86-64.
Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>