Eric Dumazet [Tue, 9 Jun 2009 11:01:02 +0000 (04:01 -0700)]
r8169: fix crash when large packets are received
Michael Tokarev reported receiving a large packet could crash
a machine with RTL8169 NIC.
( original thread at http://lkml.org/lkml/2009/6/8/192 )
Problem is this driver tells that NIC frames up to 16383 bytes
can be received but provides skb to rx ring allocated with
smaller sizes (1536 bytes in case standard 1500 bytes MTU is used)
When a frame larger than what was allocated by driver is received,
dma transfert can occurs past the end of buffer and corrupt
kernel memory.
Fix is to tell to NIC what is the maximum size a frame can be.
This bug is very old, (before git introduction, linux-2.6.10), and
should be backported to stable versions.
Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Minoru Usui [Tue, 2 Jun 2009 09:17:34 +0000 (02:17 -0700)]
net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup
This patch fixes a bug which unconfigured struct tcf_proto keeps
chaining in tc_ctl_tfilter(), and avoids kernel panic in
cls_cgroup_classify() when we use cls_cgroup.
When we execute 'tc filter add', tcf_proto is allocated, initialized
by classifier's init(), and chained. After it's chained,
tc_ctl_tfilter() calls classifier's change(). When classifier's
change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto.
In addition, cls_cgroup is initialized in change() not in init(). It
accesses unconfigured struct tcf_proto which is chained before
change(), then hits Oops.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 2 Jun 2009 08:29:58 +0000 (01:29 -0700)]
e1000: add missing length check to e1000 receive routine
Patch to fix bad length checking in e1000. E1000 by default does two
things:
1) Spans rx descriptors for packets that don't fit into 1 skb on recieve
2) Strips the crc from a frame by subtracting 4 bytes from the length prior to
doing an skb_put
Since the e1000 driver isn't written to support receiving packets that span
multiple rx buffers, it checks the End of Packet bit of every frame, and
discards it if its not set. This places us in a situation where, if we have a
spanning packet, the first part is discarded, but the second part is not (since
it is the end of packet, and it passes the EOP bit test). If the second part of
the frame is small (4 bytes or less), we subtract 4 from it to remove its crc,
underflow the length, and wind up in skb_over_panic, when we try to skb_put a
huge number of bytes into the skb. This amounts to a remote DOS attack through
careful selection of frame size in relation to interface MTU. The fix for this
is already in the e1000e driver, as well as the e1000 sourceforge driver, but no
one ever pushed it to e1000. This is lifted straight from e1000e, and prevents
small frames from causing the underflow described above
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Tested-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ed Swierk [Tue, 2 Jun 2009 07:19:52 +0000 (00:19 -0700)]
forcedeth: add phy_power_down parameter, leave phy powered up by default (v2)
Add a phy_power_down parameter to forcedeth: set to 1 to power down the
phy and disable the link when an interface goes down; set to 0 to always
leave the phy powered up.
The phy power state persists across reboots; Windows, some BIOSes, and
older versions of Linux don't bother to power up the phy again, forcing
users to remove all power to get the interface working (see
http://bugzilla.kernel.org/show_bug.cgi?id=13072). Leaving the phy
powered on is the safest default behavior. Users accustomed to seeing
the link state reflect the interface state and/or wanting to minimize
power consumption can set phy_power_down=1 if compatibility with other
OSes is not an issue.
Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Several EISA device IDs for 3c509 family network cards are missing from
the driver, making the cards unusable in their EISA mode. Here's a fix to
add them based on the EISA configuration files distributed by 3Com and our
eisa.ids database.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Mon, 25 May 2009 20:57:21 +0000 (20:57 +0000)]
mlx4_en: Fix a kernel panic when waking tx queue
When the transmit queue gets full we enable interrupts for TX completions
There was a race that we handled the TX queue both from the interrupt context
and from the transmit function. Using "spin_trylock_irq()" ensures this
doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville [Thu, 28 May 2009 09:39:02 +0000 (11:39 +0200)]
at76c50x-usb: avoid mutex deadlock in at76_dwork_hw_scan
http://bugzilla.kernel.org/show_bug.cgi?id=13312
at76_dwork_hw_scan holds a mutex while calling ieee80211_scan_completed,
which then calls at76_config which needs the same mutex. This reworks
the ordering to not hold the lock while calling ieee80211_scan_completed.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Finn Thain [Thu, 28 May 2009 02:05:53 +0000 (02:05 +0000)]
mac8390: fix build with NET_POLL_CONTROLLER
Fix the build for CONFIG_NET_POLL_CONTROLLER that I broke with 217cbfa856dc1cbc2890781626c4032d9e3ec59f ("mac8390: fix regression
caused during net_device_ops conversion").
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
introduced a regression on platforms defining DECLARE_PCI_UNMAP_ADDR()
and related macros as no-ops.
Rx descriptors are fed with the a page buffer bus address + page chunk offset.
The page buffer bus address is set and retrieved through
pci_unamp_addr_set(), pci_unmap_addr().
These functions being meaningless on x86 (if CONFIG_DMA_API_DEBUG is not set).
The HW ends up with a bogus bus address.
This patch saves the page buffer bus address for all plaftorms.
Signed-off-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This problem was introduced in 72961ecf84d67d6359a1b30f9b2a8427f13e1e71
since no space was reserved for the new attributes NFULA_HWTYPE,
NFULA_HWLEN and NFULA_HWHEADER.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
The calls to flush_work() are pointless in a single thread workqueue
and they are actually causing a lockdep warning.
=============================================
[ INFO: possible recursive locking detected ] 2.6.30-rc6-02911-gbb803cf #16
---------------------------------------------
bluetooth/2518 is trying to acquire lock:
(bluetooth){+.+.+.}, at: [<c0130c14>] flush_work+0x28/0xb0
but task is already holding lock:
(bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
other info that might help us debug this:
2 locks held by bluetooth/2518:
#0: (bluetooth){+.+.+.}, at: [<c0130424>] worker_thread+0x149/0x25e
#1: (&conn->work_del){+.+...}, at: [<c0130424>] worker_thread+0x149/0x25e
Based on a report by Oliver Hartkopp <oliver@hartkopp.net>
Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Mike Frysinger [Wed, 27 May 2009 03:55:33 +0000 (20:55 -0700)]
bfin_mac: fix build error due to net_device_ops convert
The previous commit "convert to net_device_ops" broke the Blackfin MAC
driver as it declared the new structure before the function it used:
CC drivers/net/bfin_mac.o
drivers/net/bfin_mac.c:984: error: ‘bfin_mac_close’ undeclared here (not in a function)
make[1]: *** [drivers/net/bfin_mac.o] Error 1
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alex Chiang [Wed, 27 May 2009 03:50:12 +0000 (20:50 -0700)]
atlx: move modinfo data from atlx.h to atl1.c
Both atl1.c and atl2.c include atlx.h, which defines some modinfo
stuff. But atl2.c seems like it doesn't want the modinfo data
from atlx.h, as it defines its own.
Running modinfo on atl2.ko, we get conflicting information:
$ /sbin/modinfo drivers/net/atlx/atl2.ko | egrep "version|description|author"
version: 2.2.3
description: Atheros Fast Ethernet Network Driver
author: Atheros Corporation <xiong.huang@atheros.com>, Chris Snook <csnook@redhat.com>
version: 2.1.3
author: Xiong Huang <xiong.huang@atheros.com>, Chris Snook <csnook@redhat.com>, Jay Cliburn <jcliburn@gmail.com>
Move the modinfo data out of atlx.h and into atl1.c to eliminate
the confusion:
Reported-by: Scott Scriven <scott.scriven@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: Jay Cliburn <jcliburn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaotian Feng [Wed, 27 May 2009 03:48:04 +0000 (20:48 -0700)]
gianfar: fix babbling rx error event bug
Gianfar interrupt handler uses IEVENT_ERR_MASK to check and handle errors.
Babbling RX error (IEVENT_BABR) should be included in IEVENT_ERROR_MASK.
Otherwise if BABR is raised, it never gets handled nor cleared, and an
interrupt storm results. This has been observed to happen on sending a
burst of ethernet frames to a gianfar based board.
Signed-off-by: Xiaotian Feng <xiaotian.feng@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Menage [Wed, 27 May 2009 03:47:02 +0000 (20:47 -0700)]
cls_cgroup: read classid atomically in classifier
Avoid reading the unsynchronized value cs->classid multiple times,
since it could change concurrently from non-zero to zero; this would
result in the classifier returning a positive result with a bogus
(zero) classid.
Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Dillow [Fri, 22 May 2009 15:29:34 +0000 (15:29 +0000)]
r8169: avoid losing MSI interrupts
The 8169 chip only generates MSI interrupts when all enabled event
sources are quiescent and one or more sources transition to active. If
not all of the active events are acknowledged, or a new event becomes
active while the existing ones are cleared in the handler, we will not
see a new interrupt.
The current interrupt handler masks off the Rx and Tx events once the
NAPI handler has been scheduled, which opens a race window in which we
can get another Rx or Tx event and never ACK'ing it, stopping all
activity until the link is reset (ifconfig down/up). Fix this by always
ACK'ing all event sources, and loop in the handler until we have all
sources quiescent.
Signed-off-by: David Dillow <dave@thedillows.org> Tested-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Finn Thain [Tue, 26 May 2009 05:43:49 +0000 (22:43 -0700)]
mac8390: fix regression caused during net_device_ops conversion
Changeset ca17584bf2ad1b1e37a5c0e4386728cc5fc9dabc ("mac8390: update
to net_device_ops") broke mac8390 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in mac8390.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c. They seem to be of no value since
COMPAT_NET_DEV_OPS is going away soon.
Tested with a Kinetics EtherPort card.
Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert L Mathews discovered that some clients send evil TCP RST segments,
which are accepted by netfilter conntrack but discarded by the
destination. Thus the conntrack entry is destroyed but the destination
retransmits data until timeout.
The same technique, i.e. sending properly crafted RST segments, can easily
be used to bypass connlimit/connbytes based restrictions (the sample
script written by Robert can be found in the netfilter mailing list
archives).
The patch below adds a new flag and new field to struct ip_ct_tcp_state so
that checking RST segments can be made more strict and thus TCP conntrack
can catch the invalid ones: the RST segment is accepted only if its
sequence number higher than or equal to the highest ack we seen from the
other direction. (The last_ack field cannot be reused because it is used
to catch resent packets.)
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
gianfar: fix BUG under load after introduction of skb recycling
Since commit 0fd56bb5be6455d0d42241e65aed057244665e5e ("gianfar:
Add support for skb recycling"), gianfar puts skbuffs that are in
the rx ring back onto the recycle list as-is in case there was a
receive error, but this breaks the following invariant: that all
skbuffs on the recycle list have skb->data = skb->head + NET_SKB_PAD.
The RXBUF_ALIGNMENT realignment done in gfar_new_skb() will be done
twice on skbuffs recycled in this way, causing there not to be enough
room in the skb anymore to receive a full packet, eventually leading
to an skb_over_panic from gfar_clean_rx_ring() -> skb_put().
Resetting the skb->data pointer to skb->head + NET_SKB_PAD before
putting the skb back onto the recycle list restores the mentioned
invariant, and should fix this issue.
Reported-by: Michael Guntsche <mike@it-loops.com> Tested-by: Michael Guntsche <mike@it-loops.com> Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
wimax/i2400m: usb: fix device reset on autosuspend while not yet idle
When the i2400m is connected to a network, the host interface (USB)
cannot be suspended. For that to happen, the device has to have
negotiated with the basestation to put the link on IDLE state.
If the host tries to put the device in standby while it is connected
but not idle, the device resets, as the driver should not do that.
To avoid triggering that, when the USB susbsytem requires the driver
to autosuspend the device, the driver checks if the device is not yet
idle. If it is not, the request is rejected (will be retried again
later on after the autosuspend timeout). At some point the device will
enter idle and the request will succeed (unless of course, there is
network traffic, but at that point, there is no idle neither in the
link or the host interface).
Dan Carpenter [Thu, 21 May 2009 22:22:02 +0000 (15:22 -0700)]
RxRPC: Error handling for rxrpc_alloc_connection()
rxrpc_alloc_connection() doesn't return an error code on failure, it just
returns NULL. IS_ERR(NULL) is false.
Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Olsson [Thu, 21 May 2009 22:20:59 +0000 (15:20 -0700)]
ipv4: Fix oops with FIB_TRIE
It seems we can fix this by disabling preemption while we re-balance the
trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high
loads continuesly taking a full BGP table up/down via iproute -batch.
Note. fib_trie is not updated for CONFIG_PREEMPT_RCU
Reported-by: Andrei Popa Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 21 May 2009 22:07:12 +0000 (15:07 -0700)]
pktgen: do not access flows[] beyond its length
typo -- pkt_dev->nflows is for stats only, the number of concurrent
flows is stored in cflows.
Reported-By: Vladimir Ivashchenko <hazard@francoudi.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The use of unspecified protocol in IPv6 initial route prevents quagga to
install IPv6 default route:
# show ipv6 route
S ::/0 [1/0] via fe80::1, eth1_0
K>* ::/0 is directly connected, lo, rej
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0
# ip -6 route
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
unreachable default dev lo proto none metric -1 error -101 hoplimit 255
The attached patch ensures RTPROT_KERNEL to the default initial route
and fixes the problem for quagga.
This is similar to "ipv6: protocol for address routes" f410a1fba7afa79d2992620e874a343fdba28332.
# show ipv6 route
S>* ::/0 [1/0] via fe80::1, eth1_0
C>* ::1/128 is directly connected, lo
C>* fe80::/64 is directly connected, eth1_0
# ip -6 route
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
fe80::/64 dev eth1_0 proto kernel metric 256 mtu 1500 advmss 1440
hoplimit -1
ff00::/8 dev eth1_0 metric 256 mtu 1500 advmss 1440 hoplimit -1
default via fe80::1 dev eth1_0 proto zebra metric 1024 mtu 1500
advmss 1440 hoplimit -1
unreachable default dev lo proto kernel metric -1 error -101 hoplimit 255
Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 May 2009 20:14:28 +0000 (20:14 +0000)]
net: fix rtable leak in net/ipv4/route.c
Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
Quoted here because its a perfect one :
begin_of_quotation
2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
patch has at least one critical flaw, and another problem.
rt_intern_hash calculates rthi pointer, which is later used for new entry
insertion. The same loop calculates cand pointer which is used to clean the
list. If the pointers are the same, rtable leak occurs, as first the cand is
removed then the new entry is appended to it.
This leak leads to unregister_netdevice problem (usage count > 0).
Another problem of the patch is that it tries to insert the entries in certain
order, to facilitate counting of entries distinct by all but QoS parameters.
Unfortunately, referencing an existing rtable entry moves it to list beginning,
to speed up further lookups, so the carefully built order is destroyed.
For the first problem the simplest patch it to set rthi=0 when rthi==cand, but
it will also destroy the ordering.
end_of_quotation
Trying to keep dst_entries ordered is too complex and breaks the fact that
order should depend on the frequency of use for garbage collection.
A possible fix is to make rt_intern_hash() simpler, and only makes
rt_check_expire() a litle bit smarter, being able to cope with an arbitrary
entries order. The added loop is running on cache hot data, while cpu
is prefetching next object, so should be unnoticied.
Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 May 2009 18:54:22 +0000 (18:54 +0000)]
net: fix length computation in rt_check_expire()
rt_check_expire() computes average and standard deviation of chain lengths,
but not correclty reset length to 0 at beginning of each chain.
This probably gives overflows for sum2 (and sum) on loaded machines instead
of meaningful results.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jay Sternberg [Tue, 19 May 2009 21:56:36 +0000 (14:56 -0700)]
iwlwifi: update 5000 ucode support to version 2 of API
enable iwl driver to support 5000 ucode having version 2 of API
Signed-off-by: Jay Sternberg <jay.e.sternberg@linux.intel.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
cfg80211: fix race between core hint and driver's custom apply
Its possible for cfg80211 to have scheduled the work and for
the global workqueue to not have kicked in prior to a cfg80211
driver's regulatory hint or wiphy_apply_custom_regulatory().
Although this is very unlikely its possible and should fix
this race. When this race would happen you are expected to have
hit a null pointer dereference panic.
Cc: stable@kernel.org Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk> Signed-off-by: John W. Linville <linville@tuxdriver.com>
John W. Linville [Wed, 20 May 2009 14:51:41 +0000 (10:51 -0400)]
airo: fix airo_get_encode{,ext} buffer overflow like I mean it...
"airo: airo_get_encode{,ext} potential buffer overflow" was actually a
no-op, due to an unrecognized type overflow in an assignment. Oddly,
gcc only seems to tell me about it when using -Wextra...grrr...
Signed-off-by: John W. Linville <linville@tuxdriver.com>
When the EEPROM contains weird values for the power levels we have to
fix the interpolation process.
Signed-off-by: Fabio Rossi <rossi.f@inwind.it> Acked-by: Nick Kossifidis <mickflemm@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Reinette Chatre [Fri, 15 May 2009 23:13:46 +0000 (16:13 -0700)]
iwlwifi: do not cancel delayed work inside spin_lock_irqsave
Calling cancel_delayed_work() from inside
spin_lock_irqsave, introduces a potential deadlock.
As explained by Johannes Berg <johannes@sipsolutions.net>
A - lock
T - timer
phase CPU 1 CPU 2
---------------------------------------------
some place that calls
cancel_timer_sync()
(which is the | code)
lock-irq(A)
| "lock-irq"(T)
| "unlock"(T)
| wait(T)
unlock(A)
timer softirq
"lock"(T)
run(T)
"unlock"(T)
irq handler
lock(A)
unlock(A)
Now all that again, interleaved, leading to deadlock:
lock-irq(A)
"lock"(T)
run(T)
IRQ during or maybe
before run(T) --> lock(A)
"lock-irq"(T)
wait(T)
We fix this by moving the call to cancel_delayed_work() into workqueue.
There are cases where the work may not actually be queued or running
at the time we are trying to cancel it, but cancel_delayed_work() is
able to deal with this.
Also cleanup iwl_set_mode related to this call. This function
(iwl_set_mode) is only called when bringing interface up and there will
thus not be any scanning done. No need to try to cancel scanning.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=13224, which was also
reported at http://marc.info/?l=linux-wireless&m=124081921903223&w=2 .
Tested-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Acked-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Forrest Zhang [Wed, 13 May 2009 15:14:39 +0000 (11:14 -0400)]
ath5k: fix exp off-by-one when computing OFDM delta slope
Commit e8f055f0c3b ("ath5k: Update reset code") subtly changed the
code that computes floating point values for the PHY3_TIMING register
such that the exponent is off by a decimal point, which can cause
problems with OFDM channel operation.
get_bitmask_order() actually returns the highest bit set plus one,
whereas the previous code wanted the highest bit set. Instead, use
ilog2 which is what this code is really calculating. Also check
coef_scaled to handle the (invalid) case where we need log2(0).
Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Wed, 13 May 2009 10:04:30 +0000 (12:04 +0200)]
wext: verify buffer size for SIOCSIWENCODEEXT
Another design flaw in wireless extensions (is anybody
surprised?) in the way it handles the iw_encode_ext
structure: The structure is part of the 'extra' memory
but contains the key length explicitly, instead of it
just being the length of the extra buffer - size of
the struct and using the explicit key length only for
the get operation (which only writes it).
Now, all drivers I checked use ext->key_len without
checking that both key_len and the struct fit into the
extra buffer that has been copied from userspace. This
leads to a buffer overrun while reading that buffer,
depending on the driver it may be possible to specify
arbitrary key_len or it may need to be a proper length
for the key algorithm specified.
Thankfully, this is only exploitable by root, but root
can actually cause a segfault or use kernel memory as
a key (which you can even get back with siocgiwencode
or siocgiwencodeext from the key buffer).
Fix this by verifying that key_len fits into the buffer
along with struct iw_encode_ext.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Frans Pop [Tue, 19 May 2009 04:48:38 +0000 (21:48 -0700)]
ipv4: make default for INET_LRO consistent with help text
Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.")
changed this config from tristate to bool. Add default so that it is
consistent with the help text.
Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Chenault [Tue, 19 May 2009 04:43:27 +0000 (21:43 -0700)]
net: fix skb_seq_read returning wrong offset/length for page frag data
When called with a consumed value that is less than skb_headlen(skb)
bytes into a page frag, skb_seq_read() incorrectly returns an
offset/length relative to skb->data. Ensure that data which should come
from a page frag does.
Signed-off-by: Thomas Chenault <thomas_chenault@dell.com> Tested-by: Shyam Iyer <shyam_iyer@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 19 May 2009 02:26:37 +0000 (19:26 -0700)]
pkt_sched: gen_estimator: use 64 bit intermediate counters for bps
gen_estimator can overflow bps (bytes per second) with Gb links, while
it was designed with a u32 API, with a theorical limit of 34360Mbit
(2^32 bytes)
Using 64 bit intermediate avbps/brate counters can allow us to reach
this theorical limit.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilpo Järvinen [Sun, 10 May 2009 20:32:34 +0000 (20:32 +0000)]
tcp: fix MSG_PEEK race check
Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of
blocking behavior) lets the loop run longer than the race check
did previously expect, so we need to be more careful with this
check and consider the work we have been doing.
I tried my best to deal with urg hole madness too which happens
here:
if (!sock_flag(sk, SOCK_URGINLINE)) {
++*seq;
...
by using additional offset by one but I certainly have very
little interest in testing that part.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Frans Pop <elendil@planet.nl> Tested-by: Ian Zimmermann <itz@buug.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Gabriel Paubert [Mon, 18 May 2009 04:16:47 +0000 (21:16 -0700)]
mv643xx_eth: fix PPC DMA breakage
After 2.6.29, PPC no more admits passing NULL to the dev parameter of
the DMA API. The result is a BUG followed by solid lock-up when the
mv643xx_eth driver brings an interface up. The following patch makes
the driver work on my Pegasos again; it is mostly a search and replace
of NULL by mp->dev->dev.parent in dma allocation/freeing/mapping/unmapping
functions.
Signed-off-by: Gabriel Paubert <paubert@iram.es> Acked-by: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
One of the purposes of bonding is to allow for redundant links, and failover
correctly if the cable is pulled. If all the members of a bonded device have
no carrier present, the bonded device itself needs to report no carrier present
to user space so management tools (like routing daemons) can respond.
Bonding in 802.3ad mode does not work correctly for this because it incorrectly
chooses a link that is down as a possible aggregator.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.
This bug was introduced by a combination of earlier changes:
* forwarding database uses hold time of zero to indicate
user wants to always flood packets
* optimzation of the case of forwarding delay of 0 avoids the initial
timer tick
The fix is to just skip all the topology change detection code if
kernel STP is not being used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge catches all STP packets; even if STP is turned
off. This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.
With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.
Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ralf Baechle [Sat, 16 May 2009 01:21:58 +0000 (01:21 +0000)]
NET: Meth: Fix unsafe mix of irq and non-irq spinlocks.
Mixing of normal and irq spinlocks results in the following lockdep messages
on bootup on IP32:
[...]
Sending DHCP requests .
======================================================
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] 2.6.30-rc5-00164-g41baeef #30
------------------------------------------------------
swapper/1 [HC0[0]:SC0[1]:HE0:SE0] is trying to acquire:
(&priv->meth_lock){+.+...}, at: [<ffffffff8026388c>] meth_tx+0x48/0x43c
and this task is already holding:
(_xmit_ETHER#2){+.-...}, at: [<ffffffff802d3a00>] __qdisc_run+0x118/0x30c
which would create a new lock dependency:
(_xmit_ETHER#2){+.-...} -> (&priv->meth_lock){+.+...}
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Tested-by: Andrew Randrianasulu <randrik_a@yahoo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Mon, 18 May 2009 03:48:59 +0000 (20:48 -0700)]
mlx4_en: Fix not deleted napi structures
Napi structures are being created each time we open a port, but when
the port is closed the napi structure is only disabled but not removed.
This bug caused hang while removing the driver.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Friesen [Mon, 18 May 2009 03:39:33 +0000 (20:39 -0700)]
ipconfig: handle case of delayed DHCP server
If a DHCP server is delayed, it's possible for the client to receive the
DHCPOFFER after it has already sent out a new DHCPDISCOVER message from
a second interface. The client then sends out a DHCPREQUEST from the
second interface, but the server doesn't recognize the device and
rejects the request.
This patch simply tracks the current device being configured and throws
away the OFFER if it is not intended for the current device. A more
sophisticated approach would be to put the OFFER information into the
struct ic_device rather than storing it globally.
Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Emelyanov [Mon, 11 May 2009 00:36:35 +0000 (00:36 +0000)]
netpoll: don't dereference NULL dev from np
It looks like the dev in netpoll_poll can be NULL - at lease it's
checked at the function beginning. Thus the dev->netde_ops dereference
looks dangerous.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 May 2009 20:41:28 +0000 (13:41 -0700)]
Fix caller information for warn_slowpath_null
Ian Campbell noticed that since "Eliminate thousands of warnings with
gcc 3.2 build" (commit 57adc4d2dbf968fdbe516359688094eef4d46581) all
WARN_ON()'s currently appear to come from warn_slowpath_null(), eg:
WARNING: at kernel/softirq.c:143 warn_slowpath_null+0x1c/0x20()
because now that warn_slowpath_null() is in the call path, the
__builtin_return_address(0) returns that, rather than the place that
caused the warning.
Fix this by splitting up the warn_slowpath_null/fmt cases differently,
using a common helper function, and getting the return address in the
right place. This also happens to avoid the unnecessary stack usage for
the non-stdargs case, and just generally cleans things up.
Make the function name printout use %pS while at it.
Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 16 May 2009 19:47:11 +0000 (12:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
piix: The Sony TZ90 needs the cable type hardcoding
icside: register second channel of version 6 PCB
ide-tape: remove back-to-back REQUEST_SENSE detection
Alan Cox [Sat, 16 May 2009 17:03:36 +0000 (19:03 +0200)]
piix: The Sony TZ90 needs the cable type hardcoding
The Sony TZ90 needs the cable type hardcoding. See bug #12734
Signed-off-by: Alan Cox <alan@linux.intel.com> Reported-by: Jonathan E. Snow <jesnow@uh.edu>
[bart: port it from ata_piix to piix and give reporter the proper credit] Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sat, 16 May 2009 17:03:36 +0000 (19:03 +0200)]
icside: register second channel of version 6 PCB
The second IDE channel of version 6 PCB is not being registered anymore since
the commit 48c3c1072651922ed153bcf0a33ea82cf20df390 (ide: add struct ide_host
(take 3)).
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
ide_tape_issue_pc() assumed drive->pc isn't NULL on invocation when
checking for back-to-back request sense issues but drive->pc can be
NULL and even when it's not NULL, it's not safe to dereference it once
the previous command is complete because pc could have been freed or
was on stack. Kill back-to-back REQUEST_SENSE detection.
Len Brown [Fri, 15 May 2009 05:29:31 +0000 (01:29 -0400)]
ACPI: Idle C-states disabled by max_cstate should not disable the TSC
Processor idle power states C2 and C3 stop the TSC on many machines.
Linux recognizes this situation and marks the TSC as unstable:
Marking TSC unstable due to TSC halts in idle
But if those same machines are booted with "processor.max_cstate=1",
then there is no need to validate C2 and C3, and no need to
disable the TSC, which can be reliably used as a clocksource.
Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de>
Len Brown [Thu, 14 May 2009 21:27:38 +0000 (17:27 -0400)]
ACPI: idle: fix init-time TSC check regression
A previous 2.6.30 patch, a71e4917dc0ebbcb5a0ecb7ca3486643c1c9a6e2,
(ACPI: idle: mark_tsc_unstable() at init-time, not run-time)
erroneously disabled the TSC on systems that did not actually
have valid deep C-states.
Move the check after the deep-C-states are validated,
via new helper, tsc_check_state(), hich replaces tsc_halts_in_c().
Signed-off-by: Len Brown <len.brown@intel.com> Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Frans Pop <elendil@planet.nl>
But we discovered that some BIOS do not restore BM_RLD
after suspend, causing device errors on C3 and C4
after resume. So now the kernel restores BM_RLD.
Linus Torvalds [Fri, 15 May 2009 23:47:55 +0000 (16:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI MSI: Fix MSI-X with NIU cards
PCI: Fix pci-e port driver slot_reset bad default return value
Linus Torvalds [Fri, 15 May 2009 20:22:11 +0000 (13:22 -0700)]
Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Add new GET_PIPE_FROM_CRTC_ID ioctl.
drm/i915: Set HDMI hot plug interrupt enable for only the output in question.
drm/i915: Include 965GME pci ID in IS_I965GM(dev) to match UMS.
drm/i915: Use the GM45 VGA hotplug workaround on G45 as well.
drm/i915: ignore LVDS on intel graphics systems that lie about having it
drm/i915: sanity check IER at wait_request time
drm/i915: workaround IGD i2c bus issue in kernel side (v2)
drm/i915: Don't allow binding objects into the last page of the aperture.
drm/i915: save/restore fence registers across suspend/resume
drm/i915: x86 always has writeq. Add I915_READ64 for symmetry.
Linus Torvalds [Fri, 15 May 2009 19:04:37 +0000 (12:04 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: Media rotation rate and form factor heuristics
libata: Report disk alignment and physical block size
sata_fsl: Fix the command description of FSL SATA controller
sata_fsl: Fix compile warnings
[libata] sata_sx4: fixup interrupt handling
[libata] sata_sx4: convert to new exception handling methods
* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
iwlwifi: fix device id registration for 6000 series 2x2 devices
ath5k: update channel in sw state after stopping RX and TX
rtl8187: use DMA-aware buffers with usb_control_msg
mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
airo: airo_get_encode{,ext} potential buffer overflow
Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.
David - make us proud on the shuffle-board tournament!
libata: Media rotation rate and form factor heuristics
This patch provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENFITY words.
The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard. Only then are the
characteristics reported to SCSI via the VPD B1 page.
This seems like a reasonable compromise to me considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever. With no complaints so far.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
libata: Report disk alignment and physical block size
For disks with 4KB sectors, report the correct block size and alignment
when filling out the READ CAPACITY(16) response.
This patch is based upon code from Matthew Wilcox' 4KB ATA tree. I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Dave Liu [Thu, 14 May 2009 14:47:07 +0000 (09:47 -0500)]
sata_fsl: Fix the command description of FSL SATA controller
The bit 11 of command description is reserved bit in Freescale
SATA controller and needs to be set to '1'. This is needed to
make sure the last write from the controller to the buffer
descriptor is seen before an interrupt is raised.
Signed-off-by: Dave Liu <daveliu@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Kumar Gala [Thu, 14 May 2009 03:10:50 +0000 (22:10 -0500)]
sata_fsl: Fix compile warnings
We we build with dma_addr_t as a 64-bit quantity we get:
drivers/ata/sata_fsl.c: In function 'sata_fsl_fill_sg':
drivers/ata/sata_fsl.c:340: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'dma_addr_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Linus Torvalds [Fri, 15 May 2009 15:07:25 +0000 (08:07 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix race in ext4_inode_info.i_cached_extent
ext4: Clear the unwritten buffer_head flag after the extent is initialized
ext4: Use a fake block number for delayed new buffer_head
ext4: Fix sub-block zeroing for writes into preallocated extents
Linus Torvalds [Fri, 15 May 2009 15:06:45 +0000 (08:06 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: gdb documentation fix
kgdb,i386: use address that SP register points to in the exception frame
sysrq, intel_fb: fix sysrq g collision
Linus Torvalds [Fri, 15 May 2009 15:05:37 +0000 (08:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "mm: add /proc controls for pdflush threads"
viocd: needs to depend on BLOCK
block: fix the bio_vec array index out-of-bounds test
Linus Torvalds [Fri, 15 May 2009 15:05:02 +0000 (08:05 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix PCI ROM access
powerpc/pseries: Really fix the oprofile CPU type on pseries
serial/nwpserial: Fix wrong register read address and add interrupt acknowledge.
powerpc/cell: Make ptcal more reliable
powerpc: Allow mem=x cmdline to work with 4G+
powerpc/mpic: Fix incorrect allocation of interrupt rev-map
powerpc: Fix oprofile sampling of marked events on POWER7
powerpc/iseries: Fix pci breakage due to bad dma_data initialization
powerpc: Fix mktree build error on Mac OS X host
powerpc/virtex: Fix duplicate level irq events.
powerpc/virtex: Add uImage to the default images list
powerpc/boot: add simpleImage.* to clean-files list
powerpc/8xx: Update defconfigs
powerpc/embedded6xx: Update defconfigs
powerpc/86xx: Update defconfigs
powerpc/85xx: Update defconfigs
powerpc/83xx: Update defconfigs
powerpc/fsl_soc: Remove mpc83xx_wdt_init, again
devpts_get_sb() calls memset(0) to clear mount options and calls
parse_mount_options() if user specified any mount options.
The memset(0) is bogus since the 'mode' and 'ptmxmode' options are
non-zero by default. parse_mount_options() restores options to default
anyway and can properly deal with NULL mount options.
So in devpts_get_sb() remove memset(0) and call parse_mount_options() even
for NULL mount options.
Bug reported by Eric Paris: http://lkml.org/lkml/2009/5/7/448.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Reported-by: Eric Paris <eparis@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>