The motivation for an additional notifier in batched netdevice
notification (rt_do_flush) only needs to be called once per batch not
once per namespace.
For further batching improvements I need a guarantee that the
netdevices are unregistered in order allowing me to unregister an all
of the network devices in a network namespace at the same time with
the guarantee that the loopback device is really and truly
unregistered last.
Additionally it appears that we moved the route cache flush after
the final synchronize_net, which seems wrong and there was no
explanation. So I have restored the original location of the final
synchronize_net.
Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 29 Nov 2009 15:15:41 +0000 (15:15 +0000)]
sfc: Add support for SFC9000 family (2)
This integrates support for the SFC9000 family of 10G Ethernet
controllers and LAN-on-motherboard chips, starting with the SFL9021
'Siena' and SFC9020 'Bethpage'.
Credit for this code is largely due to my colleagues at Solarflare:
Guido Barzini
Steve Hodgson
Kieran Mansley
Matthew Slattery
Neil Turton
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 29 Nov 2009 15:15:25 +0000 (15:15 +0000)]
sfc: Add support for SFC9000 family (1)
This adds support for the SFC9000 family of 10G Ethernet controllers
and LAN-on-motherboard chips, starting with the SFL9021 'Siena' and
SFC9020 'Bethpage'.
The SFC9000 family is based on the SFC4000 'Falcon' architecture, but
with some significant changes:
- Two ports are associated with two independent PCI functions
(except SFC9010)
- Integrated 10GBASE-T PHY(s) (SFL9021/9022)
- MAC, PHY and board peripherals are managed by firmware
- Driver does not require board-specific code
- Firmware supports wake-on-LAN and lights-out management through NC-SI
- IPv6 checksum offload and RSS
- Filtering by MAC address and VLAN (not included in this code)
- PCI SR-IOV (not included in this code)
Credit for this code is largely due to my colleagues at Solarflare:
Guido Barzini
Steve Hodgson
Kieran Mansley
Matthew Slattery
Neil Turton
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 29 Nov 2009 15:10:44 +0000 (15:10 +0000)]
sfc: Extend MTD driver for use with new NICs
In new NICs flash is managed by firmware and we will use high-level
operations on partitions rather than direct SPI commands. Add support
for multiple MTD partitions per flash device and remove the direct
link between MTD and SPI devices. Maintain a list of MTD partitions
in struct efx_nic.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 29 Nov 2009 15:08:55 +0000 (15:08 +0000)]
sfc: Remove static PHY data and enumerations
New NICs have firmware managing the PHY, and we will discover the PHY
capabilities at run-time. Replace the static data with probe() and
test_name() operations.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We have not held the link settings in software (aside from flow
control), and have relied on asking the hardware what they are. This
is a problem because in some cases the hardware may no longer be in a
state to tell us. In particular, if an entire multi-port board is
reset through one port, the driver bindings to other ports have no
chance to save settings before recovering.
We only actually need to keep track of the autonegotiation settings,
so add an ethtool advertising mask to struct efx_nic, initialise it
in PHY init and update it as necessary.
Remove now-unneeded uses of efx_phy_op::{get,set}_settings() and
struct ethtool_cmd.
Much of this was done by Steve Hodgson <shodgson@solarflare.com>.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Sun, 29 Nov 2009 03:42:18 +0000 (03:42 +0000)]
sfc: Turn pause frame generation on and off at the MAC, not the RX FIFO
Pause frame generation is gated by both RX_XOFF_MAC_EN and an enable
bit in each MAC. RX_XOFF_MAC_EN bit always reads back as 0 so we need
to set it correctly every time we modify RX_CFG_REG. Simplify this by
always setting it to 1 and only changing the enable bits in the MACs.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PJ Waskiewicz [Wed, 25 Nov 2009 00:11:54 +0000 (00:11 +0000)]
ixgbe: Display currently attached PHY through ethtool
This patch extends the ethtool interface to display what PHY
is currently connected to a NIC. The results can be viewed in
ethtool ethX output.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PJ Waskiewicz [Wed, 25 Nov 2009 00:11:30 +0000 (00:11 +0000)]
ethtool: Add Direct Attach support to connector port reporting
This patch allows a base driver to specify Direct Attach as the
type of port through the ethtool interface.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 24 Nov 2009 18:52:10 +0000 (18:52 +0000)]
ixgbe: Fix Receive Address Register (RAR) cleaning and accounting
This fixes an issue when clearing out the RAR entries. If RAR[0]
is the only address in use, don't clear the others.
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Don Skidmore [Tue, 24 Nov 2009 18:51:48 +0000 (18:51 +0000)]
ixgbe: LINKS2 is not a valid register for 82598
82598 shouldn't try and access LINKS2 while configuring
link and flow control. This is an 82599-only register.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
PJ Waskiewicz [Tue, 24 Nov 2009 18:51:28 +0000 (18:51 +0000)]
ixgbe: Disable Flow Control for certain devices
Flow Control autoneg should be disabled for certain adapters
that don't support autonegotiation of Flow Control at 10 gigabit.
These interfaces are the 10GBASE-T devices, CX4, and SFP+, all
running at 10 gigabit only. 1 gigabit is fine.
Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 24 Nov 2009 18:51:06 +0000 (18:51 +0000)]
ixgbe: handle parameters for tx and rx EITR, no div0
The driver was doing a divide by zero when adjusting tx-usecs.
This patch removes the divide by zero code and changes the logic slightly
to ignore tx-usecs in the case of shared TxRx vectors.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
andrew hendry [Tue, 24 Nov 2009 15:16:05 +0000 (15:16 +0000)]
X25: Fix oops and refcnt problems from x25_dev_get
Calls to x25_dev_get check for dev = NULL which was not set.
It allowed x25 to set routes and ioctls on down interfaces.
This caused oopses and refcnt problems on device_unregister.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
andrew hendry [Tue, 24 Nov 2009 15:15:26 +0000 (15:15 +0000)]
X25: Move SYSCTL ifdefs into header
Moves the CONFIG_SYSCTL ifdefs in x25_init into header.
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kevin Hilman [Tue, 24 Nov 2009 12:57:47 +0000 (12:57 +0000)]
NET: smc91x: convert to dev_pm_ops
Convert smc91x driver from legacy PM hooks over to using dev_pm_ops.
Tested on OMAP3 platform.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: on T3_RTX retransmit all the in-flight chunks
When retransmitting due to T3 timeout, retransmit all the
in-flight chunks for the corresponding transport/path, including
chunks sent less then 1 rto ago.
This is the correct behaviour according to rfc4960 section 6.3.3
E3 and
"Note: Any DATA chunks that were sent to the address for which the
T3-rtx timer expired but did not fit in one MTU (rule E3 above)
should be marked for retransmission and sent as soon as cwnd
allows (normally, when a SACK arrives). ".
This fixes problems when more then one path is present and the T3
retransmission of the first chunk that timeouts stops the T3 timer
for the initial active path, leaving all the other in-flight
chunks waiting forever or until a new chunk is transmitted on the
same path and timeouts (and this will happen only if the cwnd
allows sending new chunks, but since cwnd was dropped to MTU by
the timeout => it will wait until the first heartbeat).
Example: 10 packets in flight, sent at 0.1 s intervals on the
primary path. The primary path is down and the first packet
timeouts. The first packet is retransmitted on another path, the
T3 timer for the primary path is stopped and cwnd is set to MTU.
All the other 9 in-flight packets will not be retransmitted
(unless more new packets are sent on the primary path which depend
on cwnd allowing it, and even in this case the 9 packets will be
retransmitted only after a new packet timeouts which even in the
best case would be more then RTO).
p.s The problem is not only when multiple paths are there. It
can happen in a single homed environment. If the application
stops sending data, it possible to have a hung association.
Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Hodgson [Sat, 28 Nov 2009 05:34:29 +0000 (05:34 +0000)]
sfc: QT202x: Reset before reading PHY id
Reading standard registers on the QT2025C before its firmware has
booted may cause the boot process to fail. Therefore, follow the
recommended reset sequence before reading its id registers. Either
order works for the QT2022C2, so don't differentiate.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Hodgson [Sat, 28 Nov 2009 05:34:05 +0000 (05:34 +0000)]
sfc: Simplify PHY polling
Falcon can generate events for LASI interrupts from the PHY, but in
practice we have never implemented this in reference designs. Instead
we have polled, inserted the appropriate events, and then handled the
events later. This is a waste of time and code.
Instead, make PHY poll functions update the link state synchronously
and report whether it changed. We can still make use of the LASI
registers as a shortcut on the SFT9001.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 25 Nov 2009 07:54:54 +0000 (07:54 +0000)]
vlan: support "loose binding" to the underlying network device
Currently the UP/DOWN state of VLANs is synchronized to the state of the
underlying device, meaning all VLANs are set down once the underlying
device is set down. This causes all routes to the VLAN devices to vanish.
Add a flag to specify a "loose binding" mode, in which only the operstate
is transfered, but the VLAN device state is independant.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:12:16 +0000 (16:12 +0000)]
sfc: Change MAC promiscuity and multicast hash at the same time
From: Steve Hodgson <shodgson@solarflare.com>
Currently we can set multicast hash immediately (in atomic context)
but must delay setting MAC promiscuity. There is not that much
point in deferring one but not the other, and setting the multicast
hash on Siena will involve a firmware request. So process them
both in efx_mac_work().
Also, set the broadcast bit in the multicast hash in
efx_set_multicast_list(), since this is required for both Falcon and
Siena.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:12:01 +0000 (16:12 +0000)]
sfc: Simplify XMAC link polling
From: Steve Hodgson <shodgson@solarflare.com>
Only the XMAC on Falcon needs help from the driver to poll and reset
the MAC-PHY link (XAUI); GMII is a simple parallel bus and on later
NICs firmware takes care of the XAUI link. Also, an XMAC interrupt
currently schedules a work item which simply clears a flag
(efx_nic::mac_up) to be checked by the regular monitor (or the next
link reconfiguration, if that is sooner).
Rename the flag to xmac_poll_required, changing its sense. Remove the
needless indirection and just set the flag immediately. Call
falcon_xmac_poll() directly where required.
Add a new generic operation mac_op::check_fault to check the link
outside of regular monitoring, as required during self-tests.
(Note that this leaves us with an unused work item, but we will
immediately have another use for it.)
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:11:35 +0000 (16:11 +0000)]
sfc: Split MAC stats DMA initiation and completion
From: Steve Hodgson <shodgson@solarflare.com>
Currently we initiate MAC stats DMA and busy-wait for completion when
stats are requested. We can improve on this with a periodic timer to
initiate and poll for stats, and opportunistically poll when stats are
requested.
Since efx_nic::stats_disable_count and efx_stats_{disable,enable}()
are Falcon-specific, rename them and move them accordingly.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:11:19 +0000 (16:11 +0000)]
sfc: Hold MAC lock for longer in efx_init_port()
Although efx_init_port() is only called at probe time and so cannot
race with port reconfiguration, most of the functions it calls can
expect to be called with the MAC lock held.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:09:55 +0000 (16:09 +0000)]
sfc: Fix bugs in RX queue flushing
Avoid overrunning the hardware limit of 4 concurrent RX queue flushes.
Expand the queue flush state to support this. Make similar changes to
TX flushing to keep the code symmetric.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Wed, 25 Nov 2009 16:08:41 +0000 (16:08 +0000)]
sfc: Treat all MAC registers as 128-bit
Although all the defined fields in these registers are within 32 bits,
they are architecturally defined as 128-bit like most other Falcon
registers. In particular, we must use efx_reado() to ensure proper
locking when reading MD_STAT_REG.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 26 Nov 2009 06:07:11 +0000 (06:07 +0000)]
macvlan: export macvlan mode through netlink
In order to support all three modes of macvlan at
runtime, extend the existing netlink protocol
to allow choosing the mode per macvlan slave
interface.
This depends on a matching patch to iproute2
in order to become accessible in user land.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Thu, 26 Nov 2009 06:07:10 +0000 (06:07 +0000)]
macvlan: implement bridge, VEPA and private mode
This allows each macvlan slave device to be in one
of three modes, depending on the use case:
MACVLAN_PRIVATE:
The device never communicates with any other device
on the same upper_dev. This even includes frames
coming back from a reflective relay, where supported
by the adjacent bridge.
MACVLAN_VEPA:
The new Virtual Ethernet Port Aggregator (VEPA) mode,
we assume that the adjacent bridge returns all frames
where both source and destination are local to the
macvlan port, i.e. the bridge is set up as a reflective
relay.
Broadcast frames coming in from the upper_dev get
flooded to all macvlan interfaces in VEPA mode.
We never deliver any frames locally.
MACVLAN_BRIDGE:
We provide the behavior of a simple bridge between
different macvlan interfaces on the same port. Frames
from one interface to another one get delivered directly
and are not sent out externally. Broadcast frames get
flooded to all other bridge ports and to the external
interface, but when they come back from a reflective
relay, we don't deliver them again.
Since we know all the MAC addresses, the macvlan bridge
mode does not require learning or STP like the bridge
module does.
Based on an earlier patch "macvlan: Reflect macvlan packets
meant for other macvlan devices" by Eric Biederman.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Patrick McHardy <kaber@trash.net> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Kagstrom [Wed, 25 Nov 2009 22:10:34 +0000 (22:10 +0000)]
via-velocity: Change DMA_LENGTH_DEF (from the VIA driver)
The VIA driver has changed the default for the DMA_LENGTH_DEF parameter.
Together with adaptive interrupt supression and NAPI support, this
improves performance quite a bit
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Kagstrom [Wed, 25 Nov 2009 22:10:26 +0000 (22:10 +0000)]
via-velocity: Implement NAPI support
This patch adds NAPI support for VIA velocity. The new velocity_poll
function also pairs tx/rx handling twice which improves perforamance on
some workloads (e.g., netperf UDP_STREAM) significantly (that part is
from the VIA driver).
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Kagstrom [Wed, 25 Nov 2009 22:10:12 +0000 (22:10 +0000)]
via-velocity: Add ethtool interrupt coalescing support
(Partially from the upstream VIA driver). Tweaking the number of
frames-per-interrupt and timer-until-interrupt can reduce the amount of
CPU work quite a lot.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Willi [Wed, 25 Nov 2009 00:58:39 +0000 (00:58 +0000)]
xfrm: Add SHA384 and SHA512 HMAC authentication algorithms to XFRM
These algorithms use a truncation of 192/256 bits, as specified
in RFC4868.
Signed-off-by: Martin Willi <martin@strongswan.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Willi [Wed, 25 Nov 2009 00:29:52 +0000 (00:29 +0000)]
xfrm: Store aalg in xfrm_state with a user specified truncation length
Adding a xfrm_state requires an authentication algorithm specified
either as xfrm_algo or as xfrm_algo_auth with a specific truncation
length. For compatibility, both attributes are dumped to userspace,
and we also accept both attributes, but prefer the new syntax.
If no truncation length is specified, or the authentication algorithm
is specified using xfrm_algo, the truncation length from the algorithm
description in the kernel is used.
Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Willi [Wed, 25 Nov 2009 00:29:51 +0000 (00:29 +0000)]
xfrm: Define new XFRM netlink auth attribute with specified truncation bits
The new XFRMA_ALG_AUTH_TRUNC attribute taking a xfrm_algo_auth as
argument allows the installation of authentication algorithms with
a truncation length specified in userspace, i.e. SHA256 with 128 bit
instead of 96 bit truncation.
Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Wed, 25 Nov 2009 23:40:35 +0000 (15:40 -0800)]
net: convert /proc/net/rt_acct to seq_file
Rewrite statistics accumulation to be in terms of structure fields,
not raw u32 additions. Keep them in same order, though.
This is the last user of create_proc_read_entry() in net/,
please NAK all new ones as well as all new ->write_proc, ->read_proc and
create_proc_entry() users. Cc me if there are problems. :-)
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 23 Nov 2009 16:07:30 +0000 (16:07 +0000)]
sfc: Combine high-level header files
All files that include ethtool.h, rx.h or tx.h are also including
efx.h, and there is no good reason to separate out the few
declarations they contain. Therefore fold them into efx.h.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>