Michał Mirosław [Mon, 16 May 2011 19:14:21 +0000 (15:14 -0400)]
net: Change netdev_fix_features messages loglevel
Those reduced to DEBUG can possibly be triggered by unprivileged processes
and are nothing exceptional. Illegal checksum combinations can only be
caused by driver bug, so promote those messages to WARN.
Since GSO without SG will now only cause DEBUG message from
netdev_fix_features(), remove the workaround from register_netdevice().
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Jarosch [Mon, 16 May 2011 06:28:15 +0000 (06:28 +0000)]
vmxnet3: Fix inconsistent LRO state after initialization
During initialization of vmxnet3, the state of LRO
gets out of sync with netdev->features.
This leads to very poor TCP performance in a IP forwarding
setup and is hitting many VMware users.
Simplified call sequence:
1. vmxnet3_declare_features() initializes "adapter->lro" to true.
2. The kernel automatically disables LRO if IP forwarding is enabled,
so vmxnet3_set_flags() gets called. This also updates netdev->features.
3. Now vmxnet3_setup_driver_shared() is called. "adapter->lro" is still
set to true and LRO gets enabled again, even though
netdev->features shows it's disabled.
Fix it by updating "adapter->lro", too.
The private vmxnet3 adapter flags are scheduled for removal
in net-next, see commit a0d2730c9571aeba793cb5d3009094ee1d8fda35
"net: vmxnet3: convert to hw_features".
Patch applies to 2.6.37 / 2.6.38 and 2.6.39-rc6.
Please CC: comments.
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 16 May 2011 06:13:49 +0000 (06:13 +0000)]
sfc: Fix oops in register dump after mapping change
Commit 747df2258b1b9a2e25929ef496262c339c380009 ('sfc: Always map MCDI
shared memory as uncacheable') introduced a separate mapping for the
MCDI shared memory (MC_TREG_SMEM). This means we can no longer easily
include it in the register dump. Since it is not particularly useful
in debugging, substitute a recognisable dummy value.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Mon, 16 May 2011 12:45:39 +0000 (14:45 +0200)]
netfilter: nf_ct_sip: fix SDP parsing in TCP SIP messages for some Cisco phones
Some Cisco phones do not place the Content-Length field at the end of the
SIP message. This is valid, due to a misunderstanding of the specification
the parser expects the SDP body to start directly after the Content-Length
field. Fix the parser to scan for \r\n\r\n to locate the beginning of the
SDP body.
Reported-by: Teresa Kang <teresa_kang@gemtek.com.tw> Signed-off-by: Patrick McHardy <kaber@trash.net>
Patrick McHardy [Mon, 16 May 2011 12:42:26 +0000 (14:42 +0200)]
netfilter: nf_ct_sip: validate Content-Length in TCP SIP messages
Verify that the message length of a single SIP message, which is calculated
based on the Content-Length field contained in the SIP message, does not
exceed the packet boundaries.
Hans Schillstrom [Sun, 15 May 2011 15:20:29 +0000 (17:20 +0200)]
IPVS: fix netns if reading ip_vs_* procfs entries
Without this patch every access to ip_vs in procfs will increase
the netns count i.e. an unbalanced get_net()/put_net().
(ipvsadm commands also use procfs.)
The result is you can't exit a netns if reading ip_vs_* procfs entries.
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPV6 packets in bridge because it would
call bp_parse_ip_options with an IPV6 packet.
Reported-by: Noah Meyerhans <noahm@debian.org> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The commit 6b1e960fdbd75dcd9bcc3ba5ff8898ff1ad30b6e
bridge: Reset IPCB when entering IP stack on NF_FORWARD
broke forwarding of IPV6 packets in bridge because it would
call bp_parse_ip_options with an IPV6 packet.
Reported-by: Noah Meyerhans <noahm@debian.org> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding,llc: Fix structure sizeof incompatibility for some PDUs
With some combinations of arch/compiler (e.g. arm-linux-gcc) the sizeof
operator on structure returns value greater than expected. In cases when the
structure is used for mapping PDU fields it may lead to unexpected results
(such as holes and alignment problems in skb data). __packed prevents this
undesired behavior.
Signed-off-by: Vitalii Demianets <vitas@nppfactor.kiev.ua> Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit e9df2e8fd8fbc9 (Use appropriate sock tclass setting for
routing lookup) we lost ability to properly add ECN codemarks to ipv6
TCP frames.
It seems like TCP_ECN_send() calls INET_ECN_xmit(), which only sets the
ECN bit in the IPv4 ToS field (inet_sk(sk)->tos), but after the patch,
what's checked is inet6_sk(sk)->tclass, which is a completely different
field.
Close bug https://bugzilla.kernel.org/show_bug.cgi?id=34322
[Eric Dumazet] : added the INET_ECN_dontxmit() fix and replace macros
by inline functions for clarity.
Signed-off-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ne-h8300: Fix regression caused during net_device_ops conversion
Changeset dcd39c90290297f6e6ed8a04bb20da7ac2b043c5 ("ne-h8300: convert to
net_device_ops") broke ne-h8300 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in ne-h8300.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
hydra: Fix regression caused during net_device_ops conversion
Changeset 5618f0d1193d6b051da9b59b0e32ad24397f06a4 ("hydra: convert to
net_device_ops") broke hydra by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in hydra.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
zorro8390: Fix regression caused during net_device_ops conversion
Changeset b6114794a1c394534659f4a17420e48cf23aa922 ("zorro8390: convert to
net_device_ops") broke zorro8390 by adding 8390.o to the link. That
meant that lib8390.c was included twice, once in zorro8390.c and once in
8390.c, subject to different macros. This patch reverts that by
avoiding the wrappers in 8390.c.
Reported-by: Christian T. Steigies <cts@debian.org> Suggested-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Christian T. Steigies <cts@debian.org> Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 11 May 2011 16:41:18 +0000 (17:41 +0100)]
sfc: Always map MCDI shared memory as uncacheable
We enabled write-combining for memory-mapped registers in commit 65f0b417dee94f779ce9b77102b7d73c93723b39, but inhibited it for the
MCDI shared memory where this is not supported. However,
write-combining mappings also allow read-reordering, which may also
be a problem.
I found that when an SFC9000-family controller is connected to an
Intel 3000 chipset, and write-combining is enabled, the controller
stops responding to PCIe read requests during driver initialisation
while the driver is polling for completion of an MCDI command. This
results in an NMI and system hang. Adding read memory barriers
between all reads to the shared memory area appears to reduce but not
eliminate the probability of this.
We have not yet established whether this is a bug in our BIU or in the
PCIe bridge. For now, work around by mapping the shared memory area
separately.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
To fix this, initialise the waitqueues during port probe instead
of port open.
Signed-off-by: Anton Blanchard <anton@samba.org> Cc: stable@kernel.org Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Hartkopp [Tue, 10 May 2011 20:12:30 +0000 (13:12 -0700)]
slcan: fix ldisc->open retval
TTY layer expects 0 if the ldisc->open operation succeeded.
Reported-by: Matvejchikov Ilya <matvejchikov@gmail.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Williams [Mon, 9 May 2011 07:43:20 +0000 (07:43 +0000)]
net/usb: mark LG VL600 LTE modem ethernet interface as WWAN
Like other mobile broadband device ethernet interfaces, mark the LG
VL600 with the 'wwan' devtype so userspace knows it needs additional
configuration via the AT port before the interface can be used.
Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm: Don't allow esn with disabled anti replay detection
Unlike the standard case, disabled anti replay detection needs some
nontrivial extra treatment on ESN. RFC 4303 states:
Note: If a receiver chooses to not enable anti-replay for an SA, then
the receiver SHOULD NOT negotiate ESN in an SA management protocol.
Use of ESN creates a need for the receiver to manage the anti-replay
window (in order to determine the correct value for the high-order
bits of the ESN, which are employed in the ICV computation), which is
generally contrary to the notion of disabling anti-replay for an SA.
So return an error if an ESN state with disabled anti replay detection
is inserted for now and add the extra treatment later if we need it.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm: Assign the inner mode output function to the dst entry
As it is, we assign the outer modes output function to the dst entry
when we create the xfrm bundle. This leads to two problems on interfamily
scenarios. We might insert ipv4 packets into ip6_fragment when called
from xfrm6_output. The system crashes if we try to fragment an ipv4
packet with ip6_fragment. This issue was introduced with git commit ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
as needed). The second issue is, that we might insert ipv4 packets in
netfilter6 and vice versa on interfamily scenarios.
With this patch we assign the inner mode output function to the dst entry
when we create the xfrm bundle. So xfrm4_output/xfrm6_output from the inner
mode is used and the right fragmentation and netfilter functions are called.
We switch then to outer mode with the output_finish functions.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Frank Blaschka <blaschka@linux.vnet.ibm.com> Reported-by: Einar EL Lueck <ELELUECK@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We should call vlan_gvrp_request_leave() from unregister_vlan_dev(),
not from vlan_dev_stop(), because vlan_gvrp_uninit_applicant()
is called right after unregister_netdevice_queue(). In batch mode,
unregister_netdevice_queue() doesn’t immediately call vlan_dev_stop().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Fox [Mon, 9 May 2011 09:40:42 +0000 (10:40 +0100)]
libertas: fix cmdpendingq locking
We occasionally see list corruption using libertas.
While we haven't been able to diagnose this precisely, we have spotted
a possible cause: cmdpendingq is generally modified with driver_lock
held. However, there are a couple of points where this is not the case.
Fix up those operations to execute under the lock, it seems like
the correct thing to do and will hopefully improve the situation.
Signed-off-by: Paul Fox <pgf@laptop.org> Signed-off-by: Daniel Drake <dsd@laptop.org> Acked-by: Dan Williams <dcbw@redhat.com> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
That commit claim that mac80211 will not use non-IBSS channel in IBSS
mode, what definitely is not true. Bug probably should be fixed in
mac80211, but that will require more work, so better to apply that patch
temporally, and provide proper mac80211 fix latter.
ath9k: Fix a warning due to a queued work during S3 state
during suspend/S3 state drv_flush is called from mac80211 irrespective of
interface count. In ath9k we queue a work in ath9k_flush which we expect
to be cancelled in the drv_stop call back. during suspend process mac80211
calls drv_stop only when the interface count(local->count) is non-zero.
unfortunately when the network manager is enabled, drv_flush is called
while drv_stop is not called as local->count reaches '0'.
So fix this by simply checking for the device presence in the
drv_flush call back in the driver before queueing work or anything else.
this patch fixes the following WARNING
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com> Signed-off-by: Mohammed Shafi Shajakhan <mshajakhan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Luciano Coelho [Tue, 3 May 2011 18:40:08 +0000 (21:40 +0300)]
mac80211: don't start the dynamic ps timer if not associated
When we are disconnecting, we set PS off, but this happens before we
send the deauth/disassoc request. When the deauth/disassoc frames are
sent, we trigger the dynamic ps timer, which then times out and turns
PS back on. Thus, PS remains on after disconnecting, causing problems
when associating again.
This can be fixed by preventing the timer to start when we're not
associated anymore.
Signed-off-by: Luciano Coelho <coelho@ti.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Florian Wesphal says:
"... when the packet was sent from the local machine the skb
already has ->nfct attached, and -m conntrack seems to do
the right thing."
Acked-by: Jan Engelhardt <jengelh@medozas.de> Reported-by: Florian Wesphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: IPv6: initialize TOS field in REJECT target module
The IPv6 header is not zeroed out in alloc_skb so we must initialize
it properly unless we want to see IPv6 packets with random TOS fields
floating around. The current implementation resets the flow label
but this could be changed if deemed necessary.
We stumbled upon this issue when trying to apply a mangle rule to
the RST packet generated by the REJECT target module.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.
IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.
This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.
The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
IPVS: Change of socket usage to enable name space exit.
If the sync daemons run in a name space while it crashes
or get killed, there is no way to stop them except for a reboot.
When all patches are there, ip_vs_core will handle register_pernet_(),
i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed.
Kernel threads should not increment the use count of a socket.
By calling sk_change_net() after creating a socket this is avoided.
sock_release cant be used intead sk_release_kernel() should be used.
Thanks Eric W Biederman for your advices.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
[horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
netfilter: ebtables: only call xt_compat_add_offset once per rule
The optimizations in commit 255d0dc34068a976
(netfilter: x_table: speedup compat operations) assume that
xt_compat_add_offset is called once per rule.
ebtables however called it for each match/target found in a rule.
The match/watcher/target parser already returns the needed delta, so it
is sufficient to move the xt_compat_add_offset call to a more reasonable
location.
While at it, also get rid of the unused COMPAT iterator macros.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Dumazet [Thu, 21 Apr 2011 08:57:21 +0000 (10:57 +0200)]
netfilter: fix ebtables compat support
commit 255d0dc34068a976 (netfilter: x_table: speedup compat operations)
made ebtables not working anymore.
1) xt_compat_calc_jump() is not an exact match lookup
2) compat_table_info() has a typo in xt_compat_init_offsets() call
3) compat_do_replace() misses a xt_compat_init_offsets() call
Reported-by: dann frazier <dannf@dannf.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
Tomoya [Mon, 9 May 2011 01:19:37 +0000 (01:19 +0000)]
pch_gbe: support ML7223 IOH
Support new device OKI SEMICONDUCTOR ML7223 IOH(Input/Output Hub).
The ML7223 IOH is for MP(Media Phone) use.
The ML7223 is companion chip for Intel Atom E6xx series.
The ML7223 is completely compatible for Intel EG20T PCH.
Signed-off-by: Tomoya MORINAGA <tomoya-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiharu Okada [Fri, 6 May 2011 02:53:51 +0000 (02:53 +0000)]
PCH_GbE : Fixed the issue of collision detection
The collision detection setting was invalid.
When collision occurred, because data was not resent,
there was an issue to which a transmitting throughput falls.
This patch enables the collision detection.
Signed-off-by: Toshiharu Okada <toshiharu-linux@dsn.okisemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
TTY layer expects 0 if the ldisc->open operation succeeded.
Signed-off-by : Matvejchikov Ilya <matvejchikov@gmail.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Somnath Kotur [Wed, 4 May 2011 22:40:46 +0000 (22:40 +0000)]
be2net: Fixed bugs related to PVID.
Fixed bug to make sure 'pvid' retrieval will work on big endian hosts.
Fixed incorrect comparison between the Rx Completion's 16-bit VLAN TCI
and the PVID. Now comparing only the relevant 12 bits corresponding to
the VID.
Renamed 'vid' field under Rx Completion to 'vlan_tag' to reflect
accurate description.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently EHEA reports to ethtool as supporting 10M, 100M, 1G and
10G and connected to FIBRE independent of the hardware configuration.
However, when connected to FIBRE the only supported speed is 10G
full-duplex, and the other speeds and modes are only supported
when connected to twisted pair.
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Acked-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_cubic: limit delayed_ack ratio to prevent divide error
TCP Cubic keeps a metric that estimates the amount of delayed
acknowledgements to use in adjusting the window. If an abnormally
large number of packets are acknowledged at once, then the update
could wrap and reach zero. This kind of ACK could only
happen when there was a large window and huge number of
ACK's were lost.
This patch limits the value of delayed ack ratio. The choice of 32
is just a conservative value since normally it should be range of
1 to 4 packets.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 3 May 2011 07:49:25 +0000 (07:49 +0000)]
ipheth: Properly distinguish length and alignment in URBs and skbs
The USB protocol this driver implements appears to require 2 bytes of
padding in front of each received packet. This used to be equal to
the value of NET_IP_ALIGN on x86, so the driver abused that constant
and mostly worked, but this is no longer the case. The driver also
mixed up the URB and packet lengths, resulting in 2 bytes of junk at
the end of the skb.
Introduce a private constant for the 2 bytes of padding; fix this
confusion and check for the under-length case.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.
IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.
This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.
The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
IPVS: Change of socket usage to enable name space exit.
If the sync daemons run in a name space while it crashes
or get killed, there is no way to stop them except for a reboot.
When all patches are there, ip_vs_core will handle register_pernet_(),
i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed.
Kernel threads should not increment the use count of a socket.
By calling sk_change_net() after creating a socket this is avoided.
sock_release cant be used intead sk_release_kernel() should be used.
Thanks Eric W Biederman for your advices.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
[horms@verge.net.au: minor edit to changelog] Signed-off-by: Simon Horman <horms@verge.net.au>
Roland Dreier [Fri, 6 May 2011 08:32:53 +0000 (08:32 +0000)]
vmxnet3: Consistently disable irqs when taking adapter->cmd_lock
Using the vmxnet3 driver produces a lockdep warning because
vmxnet3_set_mc(), which is called with mc->mca_lock held, takes
adapter->cmd_lock. However, there are a couple of places where
adapter->cmd_lock is taken with softirqs enabled, lockdep warns that a
softirq that tries to take mc->mca_lock could happen while
adapter->cmd_lock is held, leading to an AB-BA deadlock.
I'm not sure if this is a real potential deadlock or not, but the
simplest and best fix seems to be simply to make sure we take cmd_lock
with spin_lock_irqsave() everywhere -- the places with plain spin_lock
just look like oversights.
The full enormous lockdep warning is:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.39-rc6+ #1
---------------------------------------------------------
ifconfig/567 just changed the state of lock:
(&(&mc->mca_lock)->rlock){+.-...}, at: [<ffffffff81531e9f>] mld_ifc_timer_expire+0xff/0x280
but this lock took another, SOFTIRQ-unsafe lock in the past:
(&(&adapter->cmd_lock)->rlock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
4 locks held by ifconfig/567:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff8147d547>] rtnl_lock+0x17/0x20
#1: ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff810896cf>] __blocking_notifier_call_chain+0x5f/0xb0
#2: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8106f21b>] run_timer_softirq+0xeb/0x3f0
#3: (&ndev->lock){++.-..}, at: [<ffffffff81531dd2>] mld_ifc_timer_expire+0x32/0x280
Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: Shreyas N Bhatewara <sbhatewara@vmware.com> Signed-off-by: Scott J. Goldman <scottjg@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Rosenberg [Fri, 6 May 2011 03:27:18 +0000 (03:27 +0000)]
dccp: handle invalid feature options length
A length of zero (after subtracting two for the type and len fields) for
the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
the subtraction. The subsequent code may read past the end of the
options value buffer when parsing. I'm unsure of what the consequences
of this might be, but it's probably not good.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Kurt Van Dijck [Mon, 2 May 2011 04:50:48 +0000 (04:50 +0000)]
can: fix SJA1000 dlc for RTR packets
RTR frames do have a valid data length code on CAN.
The driver for SJA1000 did not handle that situation properly.
Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lei [Thu, 28 Apr 2011 22:37:09 +0000 (22:37 +0000)]
usbnet: runtime pm: fix out of memory
This patch makes use of the EVENT_DEV_OPEN flag introduced recently to
fix one out of memory issue, which can be reproduced on omap3/4 based
pandaboard/beagle XM easily with steps below:
- enable runtime pm
echo auto > /sys/devices/platform/usbhs-omap.0/ehci-omap.0/usb1/1-1/1-1.1/power/control
- ifconfig eth0 up
- then out of memroy happened, see [1] for kernel message.
Follows my analysis:
- 'ifconfig eth0 up' brings eth0 out of suspend, and usbnet_resume
is called to schedule dev->bh, then rx urbs are submited to prepare for
recieving data;
- some usbnet devices will produce garbage rx packets flood if
info->reset is not called in usbnet_open.
- so there is no enough chances for usbnet_bh to handle and release
recieved skb buffers since many rx interrupts consumes cpu, so out of memory
for atomic allocation in rx_submit happened.
This patch fixes the issue by simply not allowing schedule of usbnet_bh until device
is opened.
Eric Dumazet [Wed, 4 May 2011 10:02:26 +0000 (10:02 +0000)]
net: ip_expire() must revalidate route
Commit 4a94445c9a5c (net: Use ip_route_input_noref() in input path)
added a bug in IP defragmentation handling, in case timeout is fired.
When a frame is defragmented, we use last skb dst field when building
final skb. Its dst is valid, since we are in rcu read section.
But if a timeout occurs, we take first queued fragment to build one ICMP
TIME EXCEEDED message. Problem is all queued skb have weak dst pointers,
since we escaped RCU critical section after their queueing. icmp_send()
might dereference a now freed (and possibly reused) part of memory.
Calling skb_dst_drop() and ip_route_input_noref() to revalidate route is
the only possible choice.
Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Revert: veth: remove unneeded ifname code from veth_newlink()
84c49d8c3e4abefb0a41a77b25aa37ebe8d6b743 ("veth: remove unneeded
ifname code from veth_newlink()") caused regression on veth
creation. This patch reverts the original one.
Reported-by: Michał Mirosław <mirqus@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rabin Vincent [Sat, 30 Apr 2011 08:29:27 +0000 (08:29 +0000)]
smsc95xx: fix reset check
The reset loop check should check the MII_BMCR register value for
BMCR_RESET rather than for MII_BMCR (the register address, which also
happens to be zero).
Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: David S. Miller <davem@davemloft.net>
tg3: Fix failure to enable WoL by default when possible
tg3 is supposed to enable WoL by default on adapters which support
that, but it fails to do so unless the adapter's
/sys/devices/.../power/wakeup file contains 'enabled' during the
initialization of the adapter. Fix that by making tg3 use
device_set_wakeup_enable() to enable wakeup automatically whenever
WoL should be enabled by default.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix: Only allow recv on connected seqpacket sockets.
This fixes the following oops discovered by Dan Aloni:
> Anyway, the following is the output of the Oops that I got on the
> Ubuntu kernel on which I first detected the problem
> (2.6.37-12-generic). The Oops that followed will be more useful, I
> guess.
The bug was that unix domain sockets use a pseduo packet for
connecting and accept uses that psudo packet to get the socket.
In the buggy seqpacket case we were allowing unconnected
sockets to call recvmsg and try to receive the pseudo packet.
That is always wrong and as of commit 7361c36c5 the pseudo
packet had become enough different from a normal packet
that the kernel started oopsing.
Do for seqpacket_recv what was done for seqpacket_send in 2.5
and only allow it on connected seqpacket sockets.
Cc: stable@kernel.org Tested-by: Dan Aloni <dan@aloni.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add support of pause frames advertise in mii_get_an. This provides all drivers
that use mii_ethtool_gset to represent their own and Link partner flow control
abilities in ethtool.
Signed-off-by: Artem Polyakov <artpol84@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adam Jaremko [Thu, 28 Apr 2011 07:41:18 +0000 (07:41 +0000)]
net: ftmac100: fix scheduling while atomic during PHY link status change
Signed-off-by: Adam Jaremko <adam.jaremko@gmail.com> Acked-by: Po-Yu Chuang <ratbert@faraday-tech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It occured during some stress testing, in which the reporter was repeatedly
removing and modprobing the bnx2 module while doing various other random
operations on the bnx2 registered net device. Noting that this error occured on
a serdes based device, we noted that there were a few ethtool operations (most
notably self_test and set_phys_id) that have execution paths that lead into
bnx2_setup_serdes_phy. This function is notable because it executes a mod_timer
call, which starts the bp->timer running. Currently bnx2 is setup to assume
that this timer only nees to be stopped when bnx2_close or bnx2_suspend is
called. Since the above ethtool operations are not gated on the net device
having been opened however, that assumption is incorrect, and can lead to the
timer still running after the module has been removed, leading to the oops above
(as well as other simmilar oopses).
Fix the problem by ensuring that the timer is stopped when pci_device_unregister
is called.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Hushan Jia <hjia@redhat.com> CC: Michael Chan <mchan@broadcom.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patch partially resolve:
https://bugzilla.kernel.org/show_bug.cgi?id=16691
However, there are still 11n performance problems on 4965 and 5xxx
devices that need to be investigated.
Cc: stable@kernel.org # 2.6.35+ Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Peter Korsgaard [Tue, 26 Apr 2011 01:45:41 +0000 (01:45 +0000)]
dsa/mv88e6131: fix unknown multicast/broadcast forwarding on mv88e6085
The 88e6085 has a few differences from the other devices in the port
control registers, causing unknown multicast/broadcast packets to get
dropped when using the standard port setup.
At the same time update kconfig to clarify that the mv88e6085 is now
supported.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Stewart [Thu, 28 Apr 2011 05:43:37 +0000 (05:43 +0000)]
usbnet: Resubmit interrupt URB if device is open
Resubmit interrupt URB if device is open. Use a flag set in
usbnet_open() to determine this state. Also kill and free
interrupt URB in usbnet_disconnect().
[Rebased off git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git]
Signed-off-by: Paul Stewart <pstew@chromium.org> Signed-off-by: David S. Miller <davem@davemloft.net>
WARNING: at drivers/net/wireless/iwlegacy/iwl-4965.c:1128 \
iwl4965_send_tx_power+0x61/0x102 [iwl4965]() Hardware name: [...]
TX Power requested while scanning!
Reported-and-tested-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
xfrm: Check for the new replay implementation if an esn state is inserted
IPsec extended sequence numbers can be used only with the new
anti-replay window implementation. So check if the new implementation
is used if an esn state is inserted and return an error if it is not.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
When we use IPsec extended sequence numbers, we may overwrite
the last scatterlist of the associated data by the scatterlist
for the skb. This patch fixes this by placing the scatterlist
for the skb right behind the last scatterlist of the associated
data. esp4 does it already like that.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
xfrm: Fix replay window size calculation on initialization
On replay initialization, we compute the size of the replay
buffer to see if the replay window fits into the buffer.
This computation lacks a mutliplication by 8 because we need
the size in bit, not in byte. So we might return an error
even though the replay window would fit into the buffer.
This patch fixes this issue.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
The oops happens in dst_metrics_write_ptr()
include/net/dst.h:124: return dst->ops->cow_metrics(dst, p);
dst->ops->cow_metrics is NULL and causes the oops.
Provide cow_metrics() methods, like we did in commit 214f45c91bb
(net: provide default_advmss() methods to blackhole dst_ops)
Signed-off-by: Held Bernhard <berny156@gmx.de> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The default maximum transmit length for NCM USB frames should be so
that a short packet happens at the end if the device supports a length
greater than the defined maximum. This is achieved by adding 4 bytes
to the maximum length so that the existing logic can fit a short
packet there.
Signed-off-by: Hans Petter Selasky <hselasky@c2i.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
François Romieu [Sun, 24 Apr 2011 15:38:48 +0000 (17:38 +0200)]
r8169: don't request firmware when there's no userspace.
The firmware is cached during the first successfull call to open() and
released once the network device is unregistered. The driver uses the
cached firmware between open() and unregister_netdev().
So far the firmware is optional : a failure to load the firmware does
not prevent open() to success. It is thus necessary to 1) unregister
all 816x / 810[23] devices and 2) force a driver probe to issue a new
firmware load.
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Fixed-by: Ciprian Docan <docan@eden.rutgers.edu> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Fixed packets parameters for FW in UDP checksum offload flow.
Do not dereference TCP headers on non TCP frames. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Fri, 22 Apr 2011 08:10:59 +0000 (08:10 +0000)]
netconsole: fix deadlock when removing net driver that netconsole is using (v2)
A deadlock was reported to me recently that occured when netconsole was being
used in a virtual guest. If the virtio_net driver was removed while netconsole
was setup to use an interface that was driven by that driver, the guest
deadlocked. No backtrace was provided because netconsole was the only console
configured, but it became clear pretty quickly what the problem was. In
netconsole_netdev_event, if we get an unregister event, we call
__netpoll_cleanup with the target_list_lock held and irqs disabled.
__netpoll_cleanup can, if pending netpoll packets are waiting call
cancel_delayed_work_sync, which is a sleeping path. the might_sleep call in
that path gets triggered, causing a console warning to be issued. The
netconsole write handler of course tries to take the target_list_lock again,
which we already hold, causing deadlock.
The fix is pretty striaghtforward. Simply drop the target_list_lock and
re-enable irqs prior to calling __netpoll_cleanup, the re-acquire the lock, and
restart the loop. Confirmed by myself to fix the problem reported.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The commit was meant to support transport bridging, and specifically
virtual machines bridged to an ethernet interface connected to a
switch port wiht 802.1x enabled.
But this isn't the way to do it, it breaks too many other things.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Gardner [Wed, 20 Apr 2011 09:00:49 +0000 (09:00 +0000)]
atl1c: Fix work event interrupt/task races
The mechanism used to initiate work events from the interrupt
handler has a classic read/modify/write race between the interrupt
handler that sets the condition, and the worker task that reads and
clears the condition. Close these races by using atomic
bit fields.
Cc: stable@kernel.org Cc: Jie Yang <jie.yang@atheros.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 21 Apr 2011 00:20:04 +0000 (00:20 +0000)]
be2net: increment work_counter in be_worker
The commit 609ff3b ("be2net: add code to display temperature of ASIC")
adds support to display temperature of ASIC but there is missing
increment of work_counter in be_worker. Because of this 1) the
function be_cmd_get_die_temperature is called every 1 second instead
of every 32 seconds 2) be_cmd_get_die_temperature is called, although
it is not supported. This patch fixes this bug.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Egerer [Wed, 20 Apr 2011 22:56:02 +0000 (22:56 +0000)]
ipv6: Remove hoplimit initialization to -1
The changes introduced with git-commit a02e4b7d ("ipv6: Demark default
hoplimit as zero.") missed to remove the hoplimit initialization. As a
result, ipv6_get_mtu interprets the return value of dst_metric_raw
(-1) as 255 and answers ping6 with this hoplimit. This patche removes
the line such that ping6 is answered with the hoplimit value
configured via sysctl.
Signed-off-by: Thomas Egerer <thomas.egerer@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Galbusera [Thu, 21 Apr 2011 02:21:21 +0000 (02:21 +0000)]
powerpc: Fix multicast problem in fs_enet driver
mac-fec.c was setting individual UDP address registers instead of multicast
group address registers when joining a multicast group.
This prevented from correctly receiving UDP multicast packets.
According to datasheet, replaced hash_table_high and hash_table_low
with grp_hash_table_high and grp_hash_table_low respectively.
Also renamed hash_table_* with grp_hash_table_* in struct fec declaration
for 8xx: these registers are used only for multicast there.
Tested on a MPC5121 based board.
Build tested also against mpc866_ads_defconfig.
Signed-off-by: Andrea Galbusera <gizero@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
At this point, skb->data points to skb_transport_header.
So, headroom check is wrong.
For some case:bridge(UFO is on) + eth device(UFO is off),
there is no enough headroom for IPv6 frag head.
But headroom check is always false.
This will bring about data be moved to there prior to skb->head,
when adding IPv6 frag header to skb.
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: ebtables: only call xt_compat_add_offset once per rule
The optimizations in commit 255d0dc34068a976
(netfilter: x_table: speedup compat operations) assume that
xt_compat_add_offset is called once per rule.
ebtables however called it for each match/target found in a rule.
The match/watcher/target parser already returns the needed delta, so it
is sufficient to move the xt_compat_add_offset call to a more reasonable
location.
While at it, also get rid of the unused COMPAT iterator macros.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
Eric Dumazet [Thu, 21 Apr 2011 08:57:21 +0000 (10:57 +0200)]
netfilter: fix ebtables compat support
commit 255d0dc34068a976 (netfilter: x_table: speedup compat operations)
made ebtables not working anymore.
1) xt_compat_calc_jump() is not an exact match lookup
2) compat_table_info() has a typo in xt_compat_init_offsets() call
3) compat_do_replace() misses a xt_compat_init_offsets() call
Reported-by: dann frazier <dannf@dannf.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
iwlwifi: sanity check before counting number of tfds can be free
we use skb->data after calling ieee80211_tx_status_irqsafe(), which
could free skb instantly.
On current kernels I do not observe practical problems related with
bug, but on 2.6.35.y it cause random system hangs when stressing
wireless link, making bisection of other problems impossible.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
Johannes Berg [Tue, 19 Apr 2011 18:44:04 +0000 (20:44 +0200)]
mac80211: fix SMPS debugfs locking
The locking with SMPS requests means that the
debugs file should lock the mgd mutex, not the
iflist mutex. Calls to __ieee80211_request_smps()
need to hold that mutex, so add an assertion.
This has always been wrong, but for some reason
never been noticed, probably because the locking
error only happens while unassociated.
Cc: stable@kernel.org [2.6.34+] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>