Yuchung Cheng [Mon, 3 Mar 2014 20:31:36 +0000 (12:31 -0800)]
tcp: snmp stats for Fast Open, SYN rtx, and data pkts
Add the following snmp stats:
TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
the remote does not accept it or the attempts timed out.
TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
retransmissions into SYN, fast-retransmits, timeout retransmits, etc.
TCPOrigDataSent: number of outgoing packets with original data (excluding
retransmission but including data-in-SYN). This counter is different from
TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
more useful to track the TCP retransmission rate.
Change TCPFastOpenActive to track only successful Fast Opens to be symmetric to
TCPFastOpenPassive.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Lawrence Brakmo <brakmo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
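As a rough illustration of why TCPOrigDataSent helps, the sketch below parses counters in the header-line/value-line layout used by /proc/net/netstat and divides a retransmit count by the original-data count. The sample text and the retransmit count of 50 are made up, and taking the retransmit count from the Tcp RetransSegs counter in /proc/net/snmp is an assumption of this sketch, not something the commit prescribes.

```python
def parse_netstat(text, prefix="TcpExt:"):
    """Parse one header/value line pair in /proc/net/netstat layout."""
    rows = [line.split() for line in text.splitlines() if line.startswith(prefix)]
    if len(rows) < 2:
        return {}
    names, values = rows[0][1:], rows[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def retrans_rate(orig_data_sent, retrans_segs):
    """Retransmitted segments per original data packet (0.0 if nothing was sent)."""
    if orig_data_sent == 0:
        return 0.0
    return retrans_segs / orig_data_sent


# Made-up sample; a real system would read /proc/net/netstat instead.
SAMPLE = (
    "TcpExt: TCPSynRetrans TCPOrigDataSent TCPFastOpenActiveFail\n"
    "TcpExt: 3 2000 1\n"
)

stats = parse_netstat(SAMPLE)
# retrans_segs=50 is a hypothetical value standing in for Tcp RetransSegs.
rate = retrans_rate(stats["TCPOrigDataSent"], retrans_segs=50)
```

Dividing by TcpOutSegs instead would understate the rate, because that counter also includes pure ACKs.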
Hiroaki SHIMODA [Sun, 2 Mar 2014 08:30:26 +0000 (17:30 +0900)]
sch_tbf: Remove holes in struct tbf_sched_data.
On x86_64 we have 3 holes in struct tbf_sched_data.
The member peak_present can be replaced with peak.rate_bytes_ps,
because peak.rate_bytes_ps is set only when peak is specified in
tbf_change(). tbf_peak_present() is introduced to test
peak.rate_bytes_ps.
The member max_size is moved to fill 32bit hole.
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
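To make the notion of holes concrete, here is a small layout calculator that applies the usual natural-alignment rule (each member starts at a multiple of its alignment). The member names, sizes and order are hypothetical and only mimic the shape of the problem; they are not the real layout of struct tbf_sched_data.

```python
def layout(fields):
    """fields: list of (name, size, align). Returns (total_size, holes).

    holes is a list of (offset, length) padding gaps between members.
    """
    offset = 0
    holes = []
    max_align = 1
    for _name, size, align in fields:
        pad = (-offset) % align          # bytes needed to reach the next aligned offset
        if pad:
            holes.append((offset, pad))
        offset += pad + size
        max_align = max(max_align, align)
    offset += (-offset) % max_align      # tail padding up to the struct alignment
    return offset, holes


# Hypothetical "before" layout: a 1-byte flag and two 32-bit members
# are each followed by an 8-byte-aligned member.
before = [
    ("limit", 4, 4), ("buffer", 8, 8), ("present", 1, 1),
    ("tokens", 8, 8), ("max_size", 4, 4), ("timer", 8, 8),
]
# Hypothetical "after" layout: the flag is gone (replaced by testing
# another member) and max_size fills the 32-bit gap after limit.
after = [
    ("limit", 4, 4), ("max_size", 4, 4), ("buffer", 8, 8),
    ("tokens", 8, 8), ("timer", 8, 8),
]

size_before, holes_before = layout(before)
size_after, holes_after = layout(after)
```

On a real kernel build, a tool such as pahole reports the actual holes; this sketch only shows the arithmetic behind them.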
Jean Sacren [Sat, 1 Mar 2014 22:54:36 +0000 (15:54 -0700)]
ieee802154: fix at86rf212_set_txpower() exit path
The commit 9b2777d6089bc ("ieee802154: add TX power control to
wpan_phy") introduced the new function at86rf212_set_txpower() with
the questionable check of the return of __at86rf230_write() in the
exit path:
1) Both at86rf212_set_txpower() and __at86rf230_write() have the
same return type.
2) Whatever __at86rf230_write() returns becomes the return value of
at86rf212_set_txpower().
Thus, fix the exit path by getting rid of that check entirely.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset contains fixes for small, trivial bugs and for
compilation/syntactic parser warnings.
Patchset was applied and tested over commit 750f679 "Merge branch '6lowpan'"
Changes from V1:
-patch 5/9: Replace mlx4_en_mac_to_u64() with mlx4_mac_to_u64()
- Remove unnecessary define of ETH_ALEN
Changes from V0:
-patch 3/9: net/mlx4_en: Pad ethernet packets smaller than 17 bytes
- Make condition more efficient
- Didn't use canonical function to pad buffer since using bounce buffer
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Sun, 2 Mar 2014 08:25:04 +0000 (10:25 +0200)]
net/mlx4_en: Use union for BlueFlame WQE
When BlueFlame is turned on, control segment of the TX WQE is changed,
and the second line of it is used for QPN.
Changed code to use a union in the mlx4_wqe_ctrl_seg instead of casting.
This makes the code clearer and solves the static checker warning:
Eyal Perry [Sun, 2 Mar 2014 08:25:03 +0000 (10:25 +0200)]
net/mlx4_core: Fix sparse warning
This patch forces a conversion to u32 to fix the following sparse warning:
drivers/net/ethernet/mellanox/mlx4/fw.c:1822:53: warning: restricted __be32
degrades to integer
Casting to u32 is safe here, because token will be returned as is
from the hardware without any modification.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4: Replace mlx4_en_mac_to_u64() with mlx4_mac_to_u64()
Currently, the EN driver uses a private static function
mlx4_en_mac_to_u64(). Move it to a common include file (driver.h)
for mlx4_en and mlx4_ib for further use.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
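For readers unfamiliar with the helper, the conversion it performs can be sketched as packing the six MAC bytes, most significant first, into one integer. This is an illustrative Python analogue written for this log, not the driver code.

```python
def mac_to_u64(addr: bytes) -> int:
    """Pack a 6-byte MAC address into an integer, first byte most significant."""
    if len(addr) != 6:
        raise ValueError("a MAC address must be exactly 6 bytes")
    mac = 0
    for byte in addr:
        mac = (mac << 8) | byte   # shift in one byte at a time
    return mac


example = mac_to_u64(bytes.fromhex("001122334455"))
```

The loop is equivalent to int.from_bytes(addr, "big"); it is spelled out here to show the shift-and-or pattern.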
net/mlx4_en: Move queue stopped/waked counters to be per ring
Gives accurate counters and avoids cache misses when several rings
update the stop/wake queue counters.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4_en: Pad ethernet packets smaller than 17 bytes
The hardware can't accept packets smaller than 17 bytes, so shorter
packets need to be padded with zeros.
Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
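The padding rule itself is simple; a minimal sketch follows. The constant name MIN_TX_LEN is made up for this example, and the driver, which according to the cover letter works through a bounce buffer, does not look like this.

```python
MIN_TX_LEN = 17  # hypothetical name; the 17-byte limit comes from the commit message


def pad_frame(frame: bytes, min_len: int = MIN_TX_LEN) -> bytes:
    """Return the frame unchanged if long enough, else zero-padded to min_len."""
    if len(frame) >= min_len:
        return frame
    return frame + bytes(min_len - len(frame))   # bytes(n) is n zero bytes


short = pad_frame(b"\x01" * 14)
```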
Verify mlx4_en module parameters.
In case they are out of range - reset to default values.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Sun, 2 Mar 2014 08:24:57 +0000 (10:24 +0200)]
net/mlx4_en: Fix UP limit in ieee_ets->prio_tc
User priority limit has to be less than MLX4_EN_NUM_UP.
Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Mar 2014 00:20:45 +0000 (19:20 -0500)]
Merge branch '6lowpan'
Alexander Aring says:
====================
6lowpan: fix issues with byte ordering types
I got some mail from a "kbuild test robot" and it detected some byte
ordering issues with the tag and datagram size value of 6LoWPAN IEEE
802.15.4 fragmentation header.
This patch series should fix the issues with the byte ordering.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Mar 2014 00:06:49 +0000 (19:06 -0500)]
Merge branch 'intel-next'
Aaron Brown says:
====================
Mark updates ixgbe for LER / adapter removal. He restores the HW
address in the recovery path so the device is not perpetually removed,
fixes up some removed state ethtool results and adds checks related to
config space access.
Jacob adds support for the new SIOCGHWTSTAMP ioctl.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Fri, 28 Feb 2014 23:48:58 +0000 (15:48 -0800)]
ixgbe: implement SIOCGHWTSTAMP ioctl
This patch adds support for the new SIOCGHWTSTAMP ioctl, which enables a
process to determine the current timestamp configuration. In order to
implement this, store a copy of the timestamp configuration. In
addition, we can remove the 'int cmd' parameter as the new set_ts_config
function doesn't use it. I also fixed a typo in the function
description.
-v2
* Only save the settings after validating them
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 28 Feb 2014 23:48:57 +0000 (15:48 -0800)]
ixgbe: Check config reads for removal
Configuration space reads should also be checked for removal. So
add some checks related to config space accesses.
v2:
* Fixed indent
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 28 Feb 2014 23:48:56 +0000 (15:48 -0800)]
ixgbe: Fix up some ethtool results when adapter is removed
Some ethtool tests returned apparently good results when the
adapter was in a removed state. Fix that by checking for removal.
This also fixes two paths that could return uninitialized memory
in data[4].
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 28 Feb 2014 23:48:55 +0000 (15:48 -0800)]
ixgbe: Restore hw_addr in LER recovery paths
The hw_addr needs to be restored in the pcie recovery path or
else the device will be perpetually removed. Also restore the
value in the resume path.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 28 Feb 2014 11:39:19 +0000 (12:39 +0100)]
bonding: send arp requests even if there's no route to them
Currently we're only sending arp requests if we have a route to the target
(and, thus, can find out the source ip address).
There are some use cases, however, where we don't want/need to set an ip
address (or set up a specific route) for bonding to use arp monitoring *for
traffic generation*. We can easily send arp probes (arp requests with src
ip == 0) to generate arp broadcast responses from the target ip and use
them for determining if the target is up.
This, obviously, won't work with arp validation - because we don't have the
ip address set and, thus, will filter out the responses. So in that case -
print a warning.
CC: François CACHEREUL <f.cachereul@alphalink.fr> CC: Zhenjie Chen <zhchen@redhat.com> CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
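An ARP probe as described above is an ordinary ARP request whose sender IP field is all zeros. The sketch below builds only the 28-byte ARP payload (no Ethernet header) for IPv4 over Ethernet; the MAC and target address are example values, and nothing here is taken from the bonding code.

```python
import socket
import struct

ARP_HTYPE_ETHERNET = 1
ARP_PTYPE_IPV4 = 0x0800
ARP_OP_REQUEST = 1


def build_arp_probe(src_mac: bytes, target_ip: str) -> bytes:
    """Build an ARP request with sender IP 0.0.0.0 (an ARP probe)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        ARP_HTYPE_ETHERNET,
        ARP_PTYPE_IPV4,
        6,                       # hardware address length
        4,                       # protocol address length
        ARP_OP_REQUEST,
        src_mac,                 # sender hardware address
        b"\x00" * 4,             # sender IP == 0: this is what makes it a probe
        b"\x00" * 6,             # target hardware address is unknown
        socket.inet_aton(target_ip),
    )


probe = build_arp_probe(bytes.fromhex("020000000001"), "192.0.2.1")
```

Because the sender IP is zero, a validator that filters replies by the bond's own address has nothing to match against, which is why the commit only warns when arp validation is enabled.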
David S. Miller [Fri, 28 Feb 2014 22:05:32 +0000 (17:05 -0500)]
Merge branch '6lowpan'
Alexander Aring says:
====================
6lowpan: reimplementation of fragmentation handling
This patch series reimplements the fragmentation handling of 6lowpan
according to rfc4944 [1].
The first big note is that the current fragmentation behaviour isn't rfc
compliant. The main issue is a wrong datagram_size value which needs to be:
datagram_size = ipv6_payload + ipv6 header + (maybe compressed transport header,
currently only udp is supported)
but the current datagram_size value is calculated as:
datagram_size = ipv6_payload
Fragmentation works in a linux<->linux communication only.
Why reimplementation?
I reimplemented the reassembly side only. The current behaviour is to allocate a
skb with the reassembled size and hold all fragments in a list, protected by a
spinlock. After we have received all fragments (detected by the sum of all
fragment sizes), it begins to place all fragments into the allocated skb.
This reassembly implementation has some race conditions. Additionally, I make it
more rfc compliant. The current implementation matches on the tag value inside
the frag header only, but rfc4944 says we need to match on dst addr(mac),
src addr(mac), tag value, datagram_size value. [2]
The new reassembly handling uses the inet_frag api (I mean the callback interface
of ipv6 and ipv4 reassembly). I looked into ipv6 and wanted to see how ipv6 is
dealing with reassembly, so I based my code on this implementation.
On the sending side, to generate the fragments, I improved the current code to
use the nearest payload size that is a multiple of 8. (We can do that because
the mac layer has a dynamic size, so how big we can make the payload depends on
the mac_header.)
Of course I also fix the reassembly/sending side to be rfc compliant now.
changes since v2:
- rework checkpatch code style issue patch.
Merge two pr_debugs into one pr_debug.
changes since v3:
- rename 6lowpan.ko to 6lowpan_rtnl.c in commit msg of patch 5/8.
changes since v4:
- Add a new patch 2/8 to introduce lowpan_uncompress_size function. Also
improving this function a little bit.
- Add a new patch 4/8 to change tag value to __be16.
- use skb_header_reset function on FRAG1 only, which should have the
lowpan header. See lowpan_get_frag_info function. (slightly improving
of fragmentation header parsing).
- changes types of variables to u16 in lowpan_skb_fragmentation.
- use lowpan_uncompress_size instead of storing necessary information
in skb control block, this can be destroyed after dev_queue_xmit call.
Thanks David for this hint.
- remove Tested-by: Martin Townsend <martin.townsend@xsilon.com>, because
too many functionality changes.
changes since v5:
- handle lowpan_addr_mode_size with lookup table.
changes since v6:
- remove unnecessary parameter in lowpan_frag_queue.
- fix commit message in patch 8/8 which included a description of adding the
lowpan_uncompress_size function. This was split into a separate patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:50 +0000 (07:32 +0100)]
6lowpan: handling 6lowpan fragmentation via inet_frag api
This patch drops the current way of 6lowpan fragmentation on the receiving
side and replaces it with an implementation which uses the inet_frag api.
The old fragmentation handling has some race conditions and isn't
rfc4944 compatible. Also add support to match fragments on
destination address, source address, tag value and datagram_size,
which is missing in the current implementation.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
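The matching rule can be illustrated with a toy reassembler that keys its queues on the full tuple rather than on the tag alone. This is a deliberately simplified sketch: it assumes payloads are already uncompressed, ignores overlaps, timeouts and locking, and the function name is made up; the kernel code uses the inet_frag infrastructure instead.

```python
def add_fragment(queues, src, dst, tag, datagram_size, offset, payload):
    """Store one fragment; return the reassembled datagram once complete, else None.

    queues maps (src, dst, tag, datagram_size) -> {byte offset: payload}.
    """
    key = (src, dst, tag, datagram_size)
    frags = queues.setdefault(key, {})
    frags[offset] = payload
    if sum(len(p) for p in frags.values()) < datagram_size:
        return None
    del queues[key]
    return b"".join(frags[o] for o in sorted(frags))


queues = {}
# Two senders happen to use the same tag; the full key keeps them apart.
r1 = add_fragment(queues, "mac-a", "mac-z", 7, 12, 8, b"B" * 4)
r2 = add_fragment(queues, "mac-b", "mac-z", 7, 12, 0, b"X" * 8)
r3 = add_fragment(queues, "mac-a", "mac-z", 7, 12, 0, b"A" * 8)
```

With a key of just the tag, the fragment from mac-b would have been merged into the datagram from mac-a.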
Alexander Aring [Fri, 28 Feb 2014 06:32:49 +0000 (07:32 +0100)]
net: ns: add ieee802154_6lowpan namespace
This patch adds the necessary ieee802154 6lowpan namespace to provide the
inet_frag information. This is initial support for handling 6lowpan
fragmentation with the inet_frag api.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:47 +0000 (07:32 +0100)]
6lowpan: move 6lowpan.c to 6lowpan_rtnl.c
We have a 6lowpan.c file and a 6lowpan.ko file. To avoid confusion we
should move 6lowpan.c to 6lowpan_rtnl.c. Then we can support multiple
source files for the 6lowpan module.
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Aring [Fri, 28 Feb 2014 06:32:45 +0000 (07:32 +0100)]
6lowpan: fix fragmentation on sending side
This patch fixes the fragmentation on the sending side according to rfc4944.
It also adds an improvement to use the full payload of a PDU by calculating
the nearest multiple-of-8 payload length for the fragmentation datagram
size attribute.
The main issue is that the datagram size of the fragmentation header uses the
ipv6 payload length, but rfc4944 says it's the ipv6 payload length including
the network header size (and transport header size if compressed).
Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
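The two rules above can be sketched numerically: datagram_size covers the IPv6 header as well as the payload, and every fragment except the last carries a multiple of 8 bytes because rfc4944 expresses offsets in units of 8 octets. The sketch counts uncompressed datagram bytes only and ignores that FRAG1 carries a compressed header; the function names and the 102-byte frame room are assumptions for illustration.

```python
IPV6_HDR_LEN = 40
FRAG1_HDR_LEN = 4   # rfc4944 first-fragment header
FRAGN_HDR_LEN = 5   # rfc4944 subsequent-fragment header (adds the offset byte)


def datagram_size(ipv6_payload_len, transport_hdr_len=0):
    """ipv6 payload + ipv6 header (+ transport header if it was compressed)."""
    return ipv6_payload_len + IPV6_HDR_LEN + transport_hdr_len


def plan_fragments(size, room):
    """Return [(offset in 8-byte units, length in bytes), ...].

    room is the space left in one frame for fragment header plus data.
    """
    plan = []
    offset = 0
    while offset < size:
        hdr = FRAG1_HDR_LEN if offset == 0 else FRAGN_HDR_LEN
        chunk = (room - hdr) // 8 * 8        # round down to a multiple of 8
        if chunk <= 0:
            raise ValueError("frame too small to carry any fragment data")
        chunk = min(chunk, size - offset)    # the last fragment may be shorter
        plan.append((offset // 8, chunk))
        offset += chunk
    return plan


size = datagram_size(100)             # 100-byte ipv6 payload
plan = plan_fragments(size, room=102)  # hypothetical room per frame
```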
Daniel Borkmann [Fri, 28 Feb 2014 01:22:06 +0000 (02:22 +0100)]
packet: allow to transmit +4 byte in TX_RING slot for VLAN case
Commit 57f89bfa2140 ("network: Allow af_packet to transmit +4 bytes
for VLAN packets.") added the possibility for non-mmaped frames to
send extra 4 byte for VLAN header so the MTU increases from 1500 to
1504 byte, for example.
Commit cbd89acb9eb2 ("af_packet: fix for sending VLAN frames via
packet_mmap") attempted to fix that for the mmap part but was
reverted as it caused regressions while using eth_type_trans()
on output path.
Let's just act analogously to 57f89bfa2140 and add similar logic
to TX_RING. We presume size_max as overcharged with +4 bytes and
later on after skb has been built by tpacket_fill_skb() check
for ETH_P_8021Q header on packets larger than normal MTU. Can
be easily reproduced with a slightly modified trafgen in mmap(2)
mode, test cases:
Note that we need to do the test right after tpacket_fill_skb()
as sockets can have PACKET_LOSS set where we would not fail but
instead just continue to traverse the ring.
Reported-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Ben Greear <greearb@candelatech.com> Cc: Phil Sutter <phil@nwl.cc> Tested-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
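The length check described above can be sketched as follows: a frame may exceed the normal limit by at most 4 bytes, and only if its EtherType field says it carries an 802.1Q tag. This is an illustrative user-space model on raw frame bytes; the function name is made up and the kernel performs the check on the skb built by tpacket_fill_skb().

```python
ETH_HLEN = 14
VLAN_HLEN = 4
ETH_P_8021Q = 0x8100


def tx_len_ok(frame: bytes, mtu: int) -> bool:
    """Accept normal-sized frames, and frames up to 4 bytes larger if VLAN tagged."""
    limit = mtu + ETH_HLEN
    if len(frame) <= limit:
        return True
    if len(frame) > limit + VLAN_HLEN:
        return False
    ethertype = int.from_bytes(frame[12:14], "big")   # follows dst and src MAC
    return ethertype == ETH_P_8021Q


def make_frame(length: int, ethertype: int) -> bytes:
    """Helper for the example: zeroed frame with only the EtherType set."""
    return bytes(12) + ethertype.to_bytes(2, "big") + bytes(length - ETH_HLEN)


vlan_ok = tx_len_ok(make_frame(1518, ETH_P_8021Q), mtu=1500)
plain_too_big = tx_len_ok(make_frame(1518, 0x0800), mtu=1500)
```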
David S. Miller [Fri, 28 Feb 2014 17:42:13 +0000 (12:42 -0500)]
Merge branch 'intel-next'
Aaron Brown says:
====================
This series contains updates to ixgbe and ixgbevf.
Don provides an update to change a hard coded timeout interval to
a system-wide timeout one, collects AUTOC register functions into
one place and fixes some firmware bit handling.
Emil resolves a tx handling error introduced in a recent commit and
adds check for CHECKSUM_PARTIAL to avoid an skb_is_gso check
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 28 Feb 2014 04:32:45 +0000 (20:32 -0800)]
ixgbevf: add check for CHECKSUM_PARTIAL when doing TSO
This patch adds check for CHECKSUM_PARTIAL to avoid the skb_is_gso check
in ixgbevf_tso(). It should reduce overhead for workloads that are not using
TSO or checksum offloads. It is the same as in ixgbe.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 28 Feb 2014 04:32:44 +0000 (20:32 -0800)]
ixgbevf: fix handling of tx checksumming
This patch resolves an issue introduced by:
commit 7ad1a093519e37fb673579819bf6af122641c397
ixgbevf: make the first tx_buffer a repository for most of the skb info
Incorrect check for the result of ixgbevf_tso() can lead to calling
ixgbevf_tx_csum() which can spawn 2 context descriptors and result in
performance degradation and/or corrupted packets.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:43 +0000 (20:32 -0800)]
ixgbe: Add check for FW veto bit
The driver will now honor the MNG FW veto bit in blocking link resets.
This patch will affect x520 and x540 systems.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:42 +0000 (20:32 -0800)]
ixgbe: fix bit toggled for 82599 reset fix.
The current code doesn't toggle the correct bit to reset the data pipeline
on Restart_AN assertion. This patch corrects that.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:41 +0000 (20:32 -0800)]
ixgbe: collect all 82599 AUTOC code in one function
When reading or writing to the AUTOC register on 82599 devices we need to
perform various operations that aren't needed for other MAC types. This
patch will collect all of that code into one place to minimize MAC checks
in common code paths.
While doing this I also clean up some cases where we weren't holding the
SW/FW semaphore during a read/modify/write of AUTOC.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Fri, 28 Feb 2014 04:32:40 +0000 (20:32 -0800)]
ixgbe: fix to use correct timeout interval for memory read completion
Until now we always polled for a hard-coded 80 ms and did not
respect the system-wide timeout interval. Since all devices have so far
been tested with this 80 ms value, we continue to use it
as a hard minimum.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Feb 2014 21:31:54 +0000 (16:31 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
This is the rework of the IPsec virtual tunnel interface
for ipv4 to support inter address family tunneling and
namespace crossing. The only change to the last RFC version
is a compile fix for an odd configuration where CONFIG_XFRM
is set but CONFIG_INET is not set.
1) Add and use an IPsec protocol multiplexer.
2) Add xfrm_tunnel_skb_cb to the skb common buffer
to store a receive callback there.
3) Make vti work with i_key set by not including the i_key
when computing the hash for the tunnel lookup in case of
vti tunnels.
4) Update ip_vti to use its own receive hook.
5) Remove xfrm_tunnel_notifier, this is replaced by the IPsec
protocol multiplexer.
6) We need to be protocol family independent, so use the dst_entry
returned by xfrm_lookup instead of the ipv4 rtable in vti_tunnel_xmit().
7) Add support for inter address family tunneling.
8) Check if the tunnel endpoints of the xfrm state and the vti interface
are matching and return an error otherwise.
9) Enable namespace crossing for vti devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While working on extending some functionality I felt restricted
with the amount of documentation I could add. Part of this is that
the existing style of the header files doesn't let me be verbose.
This starts addressing that by using kdoc for the net_device
flags, and as Ben noted, the priv_flags can be moved out from
UAPI.
Luis R. Rodriguez (2):
net: kdoc struct net_device flags and priv_flags
net: move net_device priv_flags out from UAPI
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
These are private to userspace, and they're unstable
anyway and can be shuffled at will (see 080e4130b1fb)
so any userspace application relying on them is on crack.
Test compiled with allyesconfig.
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ make allyesconfig
mcgrof@drvbp1 /pub/mem/mcgrof/net-next (git::master)$ time make -j 20
...
BUILD arch/x86/boot/bzImage
Setup is 16992 bytes (padded to 17408 bytes).
System is 56153 kB
CRC 721d2751
Kernel: arch/x86/boot/bzImage is ready (#1)
real 19m35.744s
user 280m37.984s
sys 27m54.104s
Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We have documentation for these flags but they're scattered
all over the place. #defines don't allow documentation to be
written easily, so to start bringing some documentation
together, use the enum kdoc practice but keep the defines so that
userspace can still #ifdef them.
I've verified the same values are assigned before and after
with a simple userspace test program [0] and checksumming the
output.
Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We are trying to finally kill off interruptible_sleep_on_timeout.
The two uses in the nicstar driver can be trivially replaced
with wait_event_interruptible_lock_irq_timeout, which prevents the
wake-up race and is able to check the buffer state with scq->lock
held.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 26 Feb 2014 22:02:48 +0000 (14:02 -0800)]
tcp: switch rtt estimations to usec resolution
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.
FQ/pacing in DC environments also require this change for finer control
and removal of bimodal behavior due to the current hack in
tcp_update_pacing_rate() for 'small rtt'
TCP_CONG_RTT_STAMP is no longer needed.
As Julian Anastasov pointed out, we need to keep user compatibility :
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.
In this example, the ss command displays a srtt of 32 usecs (10Gbit link)
lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source
10.246.11.51
With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Larry Brakmo <brakmo@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
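To see why millisecond resolution is too coarse, consider a smoothed RTT estimator kept in integer microseconds. The sketch follows the RFC 6298 update rules (alpha = 1/8, beta = 1/4) with integer arithmetic; it is not the kernel's implementation, and the sample values are made up around the 32 usec figure mentioned above.

```python
def update_rtt(srtt_us, rttvar_us, sample_us):
    """One RFC 6298 style update, all values in integer microseconds."""
    if srtt_us is None:
        # First measurement: SRTT = R, RTTVAR = R / 2.
        return sample_us, sample_us // 2
    # RTTVAR is updated first, using the old SRTT.
    rttvar_us = (3 * rttvar_us + abs(srtt_us - sample_us)) // 4
    srtt_us = (7 * srtt_us + sample_us) // 8
    return srtt_us, rttvar_us


def to_msec(us):
    """What a millisecond-resolution export would report."""
    return us // 1000


srtt, rttvar = None, None
for sample in (32, 40, 24):          # made-up RTT samples in usec
    srtt, rttvar = update_rtt(srtt, rttvar, sample)

coarse = to_msec(srtt)
```

In microseconds the estimator tracks the samples; truncated to milliseconds the same RTT reads as 0, which is the information loss the commit removes.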
Note : local_clock() might have a (bounded) drift between cpus.
Do not use this infra in place of ktime_get() without understanding the
issues.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Larry Brakmo <brakmo@google.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Wed, 26 Feb 2014 11:48:00 +0000 (11:48 +0000)]
phy: micrel: add of configuration for LED mode
Add support for the led-mode property for the following PHYs
which have a single LED mode configuration value.
KSZ8001 and KSZ8041 which both use register 0x1e bits 15,14 and
KSZ8021, KSZ8031 and KSZ8051 which use register 0x1f bits 5,4
to control the LED configuration.
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
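Setting a two-bit LED mode field inside a 16-bit PHY register is a read-modify-write on the right bit positions. The helper below is a generic sketch with a made-up name; the shifts 14 and 4 correspond to the bit positions quoted above, and the mode values are arbitrary examples.

```python
def set_field(reg: int, value: int, shift: int, width: int = 2) -> int:
    """Return reg with the width-bit field at shift replaced by value (16-bit register)."""
    if value >> width:
        raise ValueError("value does not fit in the field")
    mask = ((1 << width) - 1) << shift
    return ((reg & ~mask) | (value << shift)) & 0xFFFF


# Register 0x1e, bits 15:14 (KSZ8001 / KSZ8041 style placement).
reg_1e = set_field(0x0000, 0b10, shift=14)
# Register 0x1f, bits 5:4 (KSZ8021 / KSZ8031 / KSZ8051 style placement).
reg_1f = set_field(0xFFFF, 0b01, shift=4)
```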
Arnd Bergmann [Wed, 26 Feb 2014 11:01:55 +0000 (12:01 +0100)]
isdn: fix multiple sleep_on races
The isdn core code uses a couple of wait queues with
interruptible_sleep_on, which is racy and about to get
removed from the kernel. Fortunately, we know for each case
what we are waiting for, so they can all be converted to
the better wait_event_interruptible interface.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
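The race being fixed is the classic lost wake-up: if the event fires before the sleeper goes to sleep, a bare sleep never returns. Waiting on an explicit condition avoids this because the condition is checked before sleeping. The following is a user-space analogy using Python's threading.Condition, with a made-up Mailbox class; it is not kernel code.

```python
import threading


class Mailbox:
    """Waits on a condition (the ready flag), not on a bare wake-up."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False

    def post(self):
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def wait(self, timeout=None):
        with self._cond:
            # The predicate is evaluated under the lock before sleeping,
            # so an event that already happened is never missed.
            return self._cond.wait_for(lambda: self._ready, timeout=timeout)


box = Mailbox()
box.post()                       # the event fires before anyone waits
got_it = box.wait(timeout=1.0)   # returns at once instead of sleeping

timed_out = Mailbox().wait(timeout=0.01)   # nothing posted: times out
```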
These two drivers use identical code for their procfs status
file handling, which contains a small race against status
data becoming available while reading the file.
This uses wait_event_interruptible instead to fix this
particular race and eventually get rid of all sleep_on
instances. There seems to be another race involving
multiple concurrent readers of the same procfs file, which
I don't try to fix here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:53 +0000 (12:01 +0100)]
isdn: hisax/elsa: fix sleep_on race in elsa FSM
The state machine code in the elsa driver uses interruptible_sleep_on
to wait for state changes, which is racy. A closer look at the possible
states reveals that it is always used to wait for getting back into
ARCOFI_NOP, so we can use wait_event_interruptible instead.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:52 +0000 (12:01 +0100)]
isdn: pcbit: fix interruptible_sleep_on race
interruptible_sleep_on is racy and going away. In case of pcbit,
the driver would run into a timeout if the card is initialized
before we start waiting for it. This uses wait_event to fix the
race. In order to do this, the state machine handling for the
timeout case has to get trivially reorganized so we actually know
whether the timeout has occurred or not.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Karsten Keil <isdn@linux-pingi.de> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 26 Feb 2014 11:01:51 +0000 (12:01 +0100)]
atm: firestream: fix interruptible_sleep_on race
interruptible_sleep_on is racy and going away. This replaces the one use
in the firestream driver with the appropriate wait_event_interruptible
variant.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Chas Williams <chas@cmf.nrl.navy.mil> Cc: linux-atm-general@lists.sourceforge.net Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Feb 2014 20:55:53 +0000 (15:55 -0500)]
Merge branch 'intel-next'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates
This series contains updates to ixgbe, igb and documentation. The
first four have been sent up as part of other series where 1 or more
in the series were rejected and either dropped or still being worked
on for reasons unrelated to these patches.
Don makes recovery from a HW ECC error just schedule a reset as it turns
out the previous behaviour of forcing the user to reload is not necessary.
Mark adds WoL support to port 0 of a new device. Jacob removes a magic
number from the ptp_caps.name and updates the SubmittingPatches
documentation with details on the Fixed: tag. And Carolyn updates igb
files to remove the FSF physical mail address.
[ DaveM Note: SubmittingPatches change omitted, will go via LKML ]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Wed, 26 Feb 2014 01:58:57 +0000 (17:58 -0800)]
igb: Update license text to remove FSF address and update copyright.
This patch updates the license text to remove address of Free Software
Foundation and refer users to www.gnu.org instead. This patch also updates
the copyright dates in appropriate igb driver files.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher [Wed, 26 Feb 2014 01:58:56 +0000 (17:58 -0800)]
igb: make local functions static and remove dead code
Based on Stephen Hemminger's original patch.
Make local functions static, and remove unused functions.
Reported-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 26 Feb 2014 01:58:55 +0000 (17:58 -0800)]
ixgbe: Add WoL support for a new device
Add WoL support for port 0 of a new 82599-based device.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Wed, 26 Feb 2014 01:58:54 +0000 (17:58 -0800)]
ixgbe: don't use magic size number to assign ptp_caps.name
Rather than using a magic size number, just use sizeof since that will
work and is more robust to future changes.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Wed, 26 Feb 2014 01:58:53 +0000 (17:58 -0800)]
ixgbe: modify behavior on receiving a HW ECC error.
Currently, when we notice a HW ECC error, we request that the user reload
the driver to force a reset of the part. This was done due to the mistaken
belief that a normal reset would not be sufficient. It turns out that it
is, so now we just schedule a reset upon seeing the ECC error.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: yet another new IPV6_MTU_DISCOVER option IPV6_PMTUDISC_OMIT
This option has the same semantics as IP_PMTUDISC_OMIT for IPv4, which
was recently introduced. It doesn't honor the path mtu discovered by the
host but, in contrast to IPV6_PMTUDISC_INTERFACE, allows the generation of
fragments if the packet size exceeds the MTU of the outgoing
interface.
Fixes: 93b36cf3425b9b ("ipv6: support IPV6_PMTU_INTERFACE on sockets") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: yet another new IP_MTU_DISCOVER option IP_PMTUDISC_OMIT
IP_PMTUDISC_INTERFACE has a design error: because it does not allow the
generation of fragments if the interface mtu is exceeded, it is very
hard to make use of this option in already deployed name server software
for which I introduced this option.
This patch adds yet another new IP_MTU_DISCOVER option that neither honors
any path mtu information nor accepts new icmp notifications destined for
the socket this option is enabled on. But we allow outgoing fragmentation
in case the packet size exceeds the outgoing interface mtu.
As such this new option can be used as a drop-in replacement for
IP_PMTUDISC_DONT, which is currently in use by most name server software
making the adoption of this option very smooth and easy.
The original advantage of IP_PMTUDISC_INTERFACE is still maintained:
ignoring incoming path MTU updates and not honoring discovered path MTUs
in the output path.
Fixes: 482fc6094afad5 ("ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE") Cc: Florian Weimer <fweimer@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
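As a rough userspace sketch, a name server could opt in with a single setsockopt() call. The helper name set_pmtudisc_omit() is hypothetical. The fallback value 5 is the one used by the Linux UAPI header and is only needed when the libc headers predate the option.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Older libc headers may not know the option yet; Linux defines it as 5. */
#ifndef IP_PMTUDISC_OMIT
#define IP_PMTUDISC_OMIT 5
#endif

/*
 * Request IP_PMTUDISC_OMIT on an existing IPv4 socket.
 * Returns 0 on success, -1 on failure (for example on a kernel that
 * does not implement the option).
 */
static int set_pmtudisc_omit(int fd)
{
	int mode = IP_PMTUDISC_OMIT;

	assert(fd >= 0);
	return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}
```

Software that currently passes IP_PMTUDISC_DONT to the same call would only need to change the value and, presumably, fall back to the old value if the call fails.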
ipv4: use ip_skb_dst_mtu to determine mtu in ip_fragment
ip_skb_dst_mtu mostly falls back to ip_dst_mtu_maybe_forward if no socket
is attached to the skb (in case of forwarding) or determines the mtu like
we do in ip_finish_output, which actually checks if we should branch to
ip_fragment. Thus use the same function to determine the mtu here, too.
This is important for the introduction of IP_PMTUDISC_OMIT, where we
want packets to be cut into pieces the size of the outgoing
interface mtu. IPv6 already does this correctly.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Timo Teräs [Wed, 26 Feb 2014 09:43:04 +0000 (11:43 +0200)]
neigh: probe application via netlink in NUD_PROBE
iproute2 arpd seems to expect this as there's code and comments
to handle netlink probes with NUD_PROBE set. It is used to flush
the arpd cached mappings.
opennhrp instead turns off unicast probes (so it can handle all
neighbour discovery). Without this change it will not see NUD_PROBE
probes and cannot reconfirm the mapping. Thus, currently the neigh entry
will just fail, which can cause a few packets to be dropped until broadcast
discovery is restarted.
Earlier discussion on the subject:
http://marc.info/?t=139305877100001&r=1&w=2
Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Wed, 26 Feb 2014 05:38:29 +0000 (22:38 -0700)]
ieee802154: fix new function declaration
The commit 8fad346f366a7 ("ieee802154: add basic support for RF212 to
at86rf230 driver") introduced the new function is_rf212() with some
minor issues in declaration:
1) Fix the function type by changing it to bool as the function
definition returns a boolean value. Additionally both callers of
is_rf212() are expected to return a boolean value.
2) Fix the function specifier by deleting the inline keyword as the
compiler takes care of that.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Tue, 25 Feb 2014 20:11:02 +0000 (21:11 +0100)]
ipv6: log src and dst along with "udp checksum is 0"
These info messages are rather pointless without any means to identify
the source of the bogus packets. Logging the src and dst addresses and
ports may help a bit.
Cc: Joe Perches <joe@perches.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 26 Feb 2014 20:38:18 +0000 (15:38 -0500)]
Merge branch 'mlx4'
Amir Vadai says:
====================
net, net/mlx4: Add sysfs file for port number
Modern distros use biosdevname to rename interfaces to a name based on
slot/port number.
biosdevname can't get the port number of devices that have multiple ports that
share the same PCI function.
This patch adds a sysfs file under: /sys/devices/.../net/<interface>/dev_port,
that contains the port number (0 based) - to be used by biosdevname.
Also, dev_id was wrongly used in the mlx4_en driver - added a patch that fixes it.
This patch was tested and applied over commit 51adfcc "net: bcmgenet: remove
unused bh_lock member"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Amir Vadai [Tue, 25 Feb 2014 16:17:50 +0000 (18:17 +0200)]
net: Add sysfs file for port number
Add a sysfs file to enable user space to query the device
port number used by a netdevice instance. This is needed for
devices that have multiple ports on the same PCI function.
Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
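A user-space consumer could read the new attribute roughly as follows. This is a sketch, not biosdevname code: read_dev_port() is a hypothetical helper, and it assumes the interface is reachable through the usual /sys/class/net/<interface> symlink rather than the full /sys/devices/... path.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read the 0-based port number of a network interface from sysfs.
 * Returns 0 and stores the value in *port on success, -1 otherwise
 * (no such interface, attribute missing on older kernels, bad content).
 */
static int read_dev_port(const char *ifname, unsigned long *port)
{
	char path[256];
	char buf[32];
	char *end;
	FILE *f;
	int n;

	assert(ifname && port);

	n = snprintf(path, sizeof(path), "/sys/class/net/%s/dev_port", ifname);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	errno = 0;
	*port = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -1;
	return 0;
}
```

Treating a missing file as an error lets the caller fall back to whatever heuristic it used before the attribute existed.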
David S. Miller [Wed, 26 Feb 2014 20:28:08 +0000 (15:28 -0500)]
Merge branch 'bnx2x'
Michal Schmidt says:
====================
bnx2x: minimize RAM usage in kdump
kdump kernels usually have only a small amount of memory reserved.
bnx2x can be memory-hungry. Let's minimize its memory usage when
running in kdump.
I detect kdump by looking at the "reset_devices" flag. A couple of
storage drivers (cciss, hpsa) use it for the same purpose. I am not sure
this is the best way to solve the problem, but it works.
Should it be made more generic by, say, looking at the total amount
of lowmem instead? Not using TPA by default when lowmem is small and/or
defaulting to fewer queues would help 32bit systems where a driver for
a multi-function multi-queue NIC can consume a significant amount
of available memory. Or do we want no such heuristics?
Is this something to consider doing for other network drivers too?
====================
Acked-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 25 Feb 2014 15:04:25 +0000 (16:04 +0100)]
bnx2x: save RAM in kdump kernel by using a single queue
When running in a kdump kernel, make sure to use only a single ethernet
queue even if a num_queues option in /etc/modprobe.d/*.conf would specify
otherwise. This saves memory, which tends to be scarce in kdump.
This saves about 40 MB in the kdump environment on a setup with
num_queues=8 in the config file.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Tue, 25 Feb 2014 15:04:24 +0000 (16:04 +0100)]
bnx2x: clamp num_queues to prevent passing a negative value
Use the clamp() macro to make the calculation of the number of queues
slightly easier to understand. It also avoids a crash when someone
accidentally passes a negative value in num_queues= module parameter.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
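The effect of clamping can be illustrated outside the kernel. The sketch below does not use the kernel's clamp() macro. clamp_int() and pick_num_queues() are hypothetical names, and the lower bound of 1 and the hw_max parameter are assumptions made for the example.

```c
#include <assert.h>

/* Constrain val to the inclusive range [lo, hi]. */
static int clamp_int(int val, int lo, int hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Turn a user-supplied queue count into a usable one: never fewer than
 * one queue, never more than the (assumed) hardware maximum.
 */
static int pick_num_queues(int requested, int hw_max)
{
	return clamp_int(requested, 1, hw_max);
}
```

A negative request such as -3 then yields 1 instead of propagating into later size calculations.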
Florian Westphal [Tue, 25 Feb 2014 13:34:32 +0000 (14:34 +0100)]
net: tcp: add mib counters to track zero window transitions
Three counters are added:
- one to track when we went from non-zero to zero window
- one to track the reverse
- one counter incremented when we want to announce zero window,
but can't because we would shrink current window.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Jerram [Tue, 25 Feb 2014 11:17:25 +0000 (11:17 +0000)]
net: order MPLS ethertypes numerically
All ethertypes other than ETH_P_MPLS_UC, ETH_P_MPLS_MC and
ETH_P_ATMMPOA were already ordered numerically. This commit moves
those three ETH_P_... values into correct numerical order too.
Signed-off-by: Neil Jerram <Neil.Jerram@metaswitch.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Fri, 21 Feb 2014 07:41:11 +0000 (08:41 +0100)]
vti4: Check the tunnel endpoints of the xfrm state and the vti interface
The tunnel endpoints of the xfrm_state we got from the xfrm_lookup
must match the tunnel endpoints of the vti interface. This patch
ensures this matching.
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti4: Support inter address family tunneling.
With this patch we can tunnel ipv6 traffic via a vti4
interface. A vti4 interface can now have an ipv6 address
and ipv6 traffic can be routed via a vti4 interface.
The resulting traffic is xfrm transformed and tunneled
through ipv4 if matching IPsec policies and states are
present.
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti4: Use the dst_entry returned by xfrm_lookup directly
We need to be protocol family independent to support
inter address family tunneling with vti. So use a
dst_entry instead of the ipv4 rtable in vti_tunnel_xmit.
Steffen Klassert [Fri, 21 Feb 2014 07:41:10 +0000 (08:41 +0100)]
vti: Update the ipv4 side to use its own receive hook.
With this patch, vti uses the IPsec protocol multiplexer to
register its own receive side hooks for ESP, AH and IPCOMP.
Vti now does the following on receive side:
1. Do an input policy check for the IPsec packet we received.
This is required because this packet could already have been
processed by IPsec, so an inbound policy check is needed.
2. Mark the packet with the i_key. The policy and the state
must match this key now. Policy and state belong to the outer
namespace and policy enforcement is done at the further layers.
3. Call the generic xfrm layer to do decryption and decapsulation.
4. Wait for a callback from the xfrm layer to properly clean the
skb to not leak information on namespace and to update the
device statistics.
On transmit side:
1. Mark the packet with the o_key. The policy and the state
must match this key now.
2. Do a xfrm_lookup on the original packet with the mark applied.
3. Check if we got an IPsec route.
4. Clean the skb to not leak information on namespace
transitions.
5. Attach the dst_entry we got from the xfrm_lookup to the skb.
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
ip_tunnel: Make vti work with i_key set
Vti uses the o_key to mark packets that were transmitted or received
by a vti interface. Unfortunately we can't apply different marks
to in and outbound packets with only one key available. Vti interfaces
typically use wildcard selectors for vti IPsec policies. On forwarding,
the same output policy will match for both directions. This generates
a loop between the IPsec gateways until the ttl of the packet is
exceeded.
The gre i_key/o_key are usually there to find the right gre tunnel
during a lookup. When vti uses the i_key to mark packets, the tunnel
lookup does not work any more because vti does not use the gre keys
as a hash key for the lookup.
This patch works around this by not including the i_key when computing
the hash for the tunnel lookup in case of vti tunnels.
With this we have separate keys available for the transmitting and
receiving side of the vti interface.
Steffen Klassert [Fri, 21 Feb 2014 07:41:09 +0000 (08:41 +0100)]
xfrm: Add xfrm_tunnel_skb_cb to the skb common buffer
IPsec vti_rcv needs to remember the tunnel pointer to
check it later at the vti_rcv_cb callback. So add
this pointer to the IPsec common buffer, initialize
it and check it to avoid transport state matching of
a tunneled packet.
Steffen Klassert [Fri, 21 Feb 2014 07:41:08 +0000 (08:41 +0100)]
xfrm4: Add IPsec protocol multiplexer
This patch adds an IPsec protocol multiplexer. With this
it is possible to add alternative protocol handlers as
needed for IPsec virtual tunnel interfaces.
Florian Fainelli [Tue, 25 Feb 2014 00:56:11 +0000 (16:56 -0800)]
net: bcmgenet: drop checks on priv->phydev
Drop all the checks on priv->phydev since we will refuse probing the
driver if we cannot attach to a PHY device. This also fixes some
smatch issues reported by Dan Carpenter.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 25 Feb 2014 00:38:53 +0000 (19:38 -0500)]
Merge branch 'gianfar'
Claudiu Manoil says:
====================
gianfar: Device reset and reconfig fixes
These patches end up fixing some notable device reset & reconfig
related problems. One issue is on-the-fly (Rx/Tx on) programming
of interrupt coalescing (IC) registers on the processing path,
against HW recommendation. This is an old issue that became visible
after BQL introduction, as under certain conditions (low traffic)
one TX interrupt gets lost and BQL fires Tx timeout as a result.
Another notable issue is a race on the Tx path (xmit, clean_tx)
during device reset (i.e. during Tx timeout watchdog firing)
that leads to NULL access.
Fixing the problematic on-the-fly register writes (i.e. the IC regs)
required the implementation of a MAC soft reset procedure.
The race leading to NULL access was addressed by fixing the
stop_gfar()/startup_gfar() pair (disable/enable napi a.s.o.)
and adding the device state DOWN to sync with the TX path.
v2: Refactored if() clauses from gfar_set_features(), PATCH 2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:46 +0000 (12:13 +0200)]
gianfar: Fix Tx int miss, dont write IC on-the-fly
Programming the interrupt coalescing (IC) registers while
the controller/DMA is on may incur the loss of one Tx
confirmation interrupt, under certain conditions. This is
a subtle hw race because it does not occur during a burst
of Tx packets. It has been observed on p2020 devices that,
if just one packet is being xmit'ed, the Tx confirmation
doesn't trigger and BQL eventually blocks the Tx queues,
followed by Tx timeout and an un-responsive device.
This issue was not apparent prior to introducing BQL
support, as a late Tx confirmation was not an issue back then
and the next burst of Tx frames would have triggered the
Tx confirmation/ Tx ring cleanup anyway.
Bottom line, the hw specifications state that the IC registers
should not be programmed while the Rx/Tx blocks (the DMA) are
enabled. Furthermore, these registers are currently re-written
with the same values on the processing path, over and over again.
To fix this, rewriting the IC registers has been removed from
the processing path (napi poll). A complete MAC reset procedure
has been implemented for the ethtool -c option instead, to
reliably update these registers while the controller is stopped.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Mon, 24 Feb 2014 10:13:45 +0000 (12:13 +0200)]
gianfar: Fix device reset races (oops) for Tx
The device reset procedure, stop_gfar()/startup_gfar(), has
concurrency issues.
"Kernel access of bad area" oopses show up during Tx timeout
device reset or other reset cases (like changing MTU) that
happen while the interface still has traffic. The oopses
happen in start_xmit and clean_tx_ring when accessing tx_queue->
tx_skbuff which is NULL. The race comes from de-allocating the
tx_skbuff while transmission and napi processing are still
active. Though the Tx queues get temporarily stopped when Tx
timeout occurs, they get re-enabled as a result of Tx congestion
handling inside the napi context (see clean_tx_ring()). Not
disabling the napi during reset is also a bug, because
clean_tx_ring() will try to access tx_skbuff while it is being
de-alloc'ed and re-alloc'ed.
To fix this, stop_gfar() needs to disable napi processing
after stopping the Tx queues. However, in order to prevent
clean_tx_ring() from re-enabling the Tx queue before the napi
gets disabled, the device state DOWN has been introduced.
It prevents the Tx congestion management from re-enabling the
de-congested Tx queue while the device is brought down.
An additional locking state, RESETTING, has been introduced
to prevent simultaneous resets or to prevent configuring the
device while it is resetting.
The bogus 'rxlock's (for each Rx queue) have been removed since
their purpose is not justified, as they neither prevent nor are
suited to prevent device reset/reconfig races (such as this one).
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
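The interplay of the two states can be modelled in a few lines of userspace C11. This is a simplified sketch under stated assumptions, not the gianfar implementation: all demo_* names are hypothetical, napi and the real Tx ring are reduced to a single "queue stopped" flag, and C11 atomics stand in for the kernel's bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits mirroring the DOWN and RESETTING states. */
enum { DEMO_DOWN = 1 << 0, DEMO_RESETTING = 1 << 1 };

struct demo_dev {
	atomic_uint state;
	atomic_bool tx_queue_stopped;
};

/* Try to become the only resetter; false means a reset is already running. */
static bool demo_reset_begin(struct demo_dev *d)
{
	assert(d);
	return !(atomic_fetch_or(&d->state, DEMO_RESETTING) & DEMO_RESETTING);
}

static void demo_reset_end(struct demo_dev *d)
{
	assert(d);
	atomic_fetch_and(&d->state, ~(unsigned)DEMO_RESETTING);
}

/* Bring the device down: mark DOWN first, then stop the Tx queue. */
static void demo_stop(struct demo_dev *d)
{
	assert(d);
	atomic_fetch_or(&d->state, DEMO_DOWN);
	atomic_store(&d->tx_queue_stopped, true);
}

/* Bring the device back up: clear DOWN and let the Tx queue run again. */
static void demo_start(struct demo_dev *d)
{
	assert(d);
	atomic_fetch_and(&d->state, ~(unsigned)DEMO_DOWN);
	atomic_store(&d->tx_queue_stopped, false);
}

/* Tx completion path: wake a stopped queue only while the device is up. */
static void demo_clean_tx(struct demo_dev *d)
{
	assert(d);
	if (atomic_load(&d->tx_queue_stopped) &&
	    !(atomic_load(&d->state) & DEMO_DOWN))
		atomic_store(&d->tx_queue_stopped, false);
}
```

The point of the DOWN check in demo_clean_tx() is that a completion running between "queue stopped" and "polling disabled" cannot undo the stop, which is the window the commit message describes.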