Daniel Borkmann [Mon, 13 Jan 2014 17:41:20 +0000 (18:41 +0100)]
net: vxlan: properly cleanup devs on module unload
We should use vxlan_dellink() handler in vxlan_exit_net(), since
i) we're not in fast-path and we should be consistent in dismantle
just as we would remove a device through rtnl ops, and more
importantly, ii) in case future code will kfree() memory in
vxlan_dellink(), we would leak it right here unnoticed. Therefore,
do not only half of the cleanup work, but make it properly.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 13 Jan 2014 17:41:19 +0000 (18:41 +0100)]
net: vxlan: when lower dev unregisters remove vxlan dev as well
We can create a vxlan device with an explicit underlying carrier.
In that case, when the carrier link is being deleted from the
system (e.g. due to module unload) we should also clean up all
created vxlan devices on top of it since otherwise we're in an
inconsistent state in vxlan device. In that case, the user needs
to remove all such devices, while in case of other virtual devs
that sit on top of physical ones, it is usually the case that
these devices do unregister automatically as well and do not
leave the burden on the user.
This work is not necessary when vxlan device was not created with
a real underlying device, as connections can resume in that case
when driver is plugged again. But at least for the other cases,
we should go ahead and do the cleanup on removal.
We don't register the notifier during vxlan_newlink() here since
I consider this event rather rare, and therefore we should not
bloat vxlan's core structure unecessary. Also, we can simply make
use of unregister_netdevice_many() to batch that. fdb is flushed
upon ndo_stop().
E.g. `ip -d link show vxlan13` after carrier removal before
this patch:
5: vxlan13: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default
link/ether 1e:47:da:6d:4d:99 brd ff:ff:ff:ff:ff:ff promiscuity 0
vxlan id 13 group 239.0.0.10 dev 2 port 32768 61000 ageing 300
^^^^^ Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2014 02:59:33 +0000 (18:59 -0800)]
Merge branch 'intel-next'
Aaron Brown says:
====================
Intel Wired LAN Driver Updates, ixgbe: Add LER support
The following patches add Live Error Recovery (LER) support to the
ixgbe driver. This support also improves behavior in Thunderbolt
environments. This involves checking all register reads for a
value of all ones and when that is seen, to read the status
register, which should never properly return all ones, to
confirm whether the received value was correct. When this detects
a removal, the hw_addr field is cleared to indicate the removal.
This then blocks subsequent access to the device registers.
All register access macros have been changed to static inline
functions and all register accesses now use them.ยท Macro versions
are temporarily provided.
The __IXGBE_DOWN bit is no longer overloaded to also mean that
device removal has been initiated. Now the bit can be used to
protect ixgbe_down from multiple entry via test_and_set_bit. A
needed smp_mb__before_clear_bit was also added.
V2 Changes:
- Use ACCESS_ONCE where needed, thanks to Ben Hutchings
- Fix crash on module removal
- Use boolean values for boolean returns instead of 0 and 1
- Reword Kconfig help text
V3 Changes:
- Drop config option, per David Miller
- Drop tail register write checks, per Alexander Duyck
- Change writeq implementation to a static inline, thanks to Joe Perches
V4 Changes:
- Change __IXGBE_REMOVE to __IXGBE_REMOVING, per Scott Feldman's comment
- Add #define writeq writeq, per Alexander Duyck
- Change static inline functions to lower case, per David Miller
- Use new lower case names in added and modified register accesses
- Provide temporary upper case macros for register access functions
- Change IXGBE_REMOVED from macro to static inline and change references
- Correct IXGBE_WRITE_FLUSH to properly enclose parameter expansion
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:17 +0000 (18:53 -0800)]
ixgbe: Additional adapter removal checks
Additional checks are needed for a detected removal not to cause
problems. Some involve simply avoiding a lot of stuff that can't
do anything good, and also cases where the phony return value can
cause problems. In addition, down the adapter when the removal is
sensed.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:16 +0000 (18:53 -0800)]
ixgbe: Check for adapter removal on register writes
Prevent writes to an adapter that has been detected as removed
by a previous failing read. This also fixes some include file
ordering confusion that this patch revealed.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:15 +0000 (18:53 -0800)]
ixgbe: Check register reads for adapter removal
Check all register reads for adapter removal by checking the status
register after any register read that returns 0xFFFFFFFF. Since the
status register will never return 0xFFFFFFFF unless the adapter is
removed, such a value from a status register read confirms the
removal.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:14 +0000 (18:53 -0800)]
ixgbe: Make ethtool register test use accessors
Make the ethtool register test use the normal register accessor
functions. Also eliminate macros used for calling register test
functions to make error exits clearer. Use boolean values for
boolean returns instead of 0 and 1.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:13 +0000 (18:53 -0800)]
ixgbe: Use static inlines instead of macros
Kernel coding standard prefers static inline functions instead
of macros, so use them for register accessors. This is to prepare
for adding LER, Live Error Recovery, checks to those accessors.
Temporarily provide macros for calling the new static inline
accessors until all references are changed.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:12 +0000 (18:53 -0800)]
ixbge: Protect ixgbe_down with __IXGBE_DOWN bit
The ixgbe_down function can now prevent multiple executions by
doing test_and_set_bit on __IXGBE_DOWN. This did not work before
introduction of the __IXGBE_REMOVING bit, because of overloading
of __IXGBE_DOWN. Also add smp_mb__before_clear_bit call before
clearing the __IXGBE_DOWN bit.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Wed, 15 Jan 2014 02:53:11 +0000 (18:53 -0800)]
ixgbe: Indicate removal state explicitly
Add a bit, __IXGBE_REMOVING, to indicate that the module is being
removed. The __IXGBE_DOWN bit had been overloaded for this purpose,
but that leads to trouble. A few places now check both __IXGBE_DOWN
and __IXGBE_REMOVE. Notably, setting either bit will prevent service
task execution.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 14 Jan 2014 08:49:53 +0000 (00:49 -0800)]
i40e: trivial cleanup
Remove some un-necessary parenthesis.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Brandeburg [Tue, 14 Jan 2014 08:49:52 +0000 (00:49 -0800)]
i40e: whitespace fixes
Fix some whitespace and comment issues.
Change-ID: I1587599e50ce66fd389965720e86f9e331d86643 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch Williams [Tue, 14 Jan 2014 08:49:51 +0000 (00:49 -0800)]
i40e: make message meaningful
Make this message mean something, rather than just spitting out a VSI id
without any context whatsoever.
Change-ID: Iafb906c6db46d4b5dcbe84adc9ed44730d08bd42 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 14 Jan 2014 08:49:50 +0000 (00:49 -0800)]
i40e: associate VMDq queue with VM type
Fix a bug where the queue was not associated with the right set-up
within the hardware. The fix is to use the right QTX_CTL VSI type
when associating it to the VSI.
Change-ID: I65ef6c5a8205601c640a6593e4b7e78d6ba45545 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch Williams [Tue, 14 Jan 2014 08:49:49 +0000 (00:49 -0800)]
i40e: remove extra register write
This write done at the end of VF reset and should not be performed here.
Change-ID: I4d89813b68c6173184293868a6f26cf559bc2405 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 15 Jan 2014 02:52:57 +0000 (18:52 -0800)]
Merge branch 'dev_get_by_index'
Ying Xue says:
====================
use appropriate APIs to get interfaces
Under rtnl_lock protection, we should use __dev_get_name/index()
rather than dev_get_name()/index() to find interface handlers
because the former interfaces can help us avoid to change interface
reference counter.
v2 changes:
- Change return value of nl80211_set_wiphy() to 0 in patch #10
by johannes's suggestion.
- Add 'Acked-by' into several patches which were acknowledged by
corresponding maintainers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:45 +0000 (10:23 +0800)]
net: nl80211: __dev_get_by_index instead of dev_get_by_index to find interface
As __cfg80211_rdev_from_attrs(), nl80211_dump_wiphy_parse() and
nl80211_set_wiphy() are all under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used
to find interface handler in them allowing us to avoid to change
interface reference counter.
Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:44 +0000 (10:23 +0800)]
can: use __dev_get_by_index instead of dev_get_by_index to find interface
As cgw_create_job() is always under rtnl_lock protection,
__dev_get_by_index() instead of dev_get_by_index() should be used to
find interface handler in it having us avoid to change interface
reference counter.
Cc: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:43 +0000 (10:23 +0800)]
caif: __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that chnl_net_open() is under
rtnl_lock protection as __dev_open() is protected by rtnl_lock.
So if __dev_get_by_index() instead of dev_get_by_index() is used
to find interface handler in it, this would help us avoid to change
interface reference counter.
__dev_open()
chnl_net_open()
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:42 +0000 (10:23 +0800)]
batman-adv: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that batadv_is_on_batman_iface()
is always under rtnl_lock protection as call_netdevice_notifier()
is protected by rtnl_lock. So if __dev_get_by_index() rather than
dev_get_by_index() is used to find interface handler in it, this
would help us avoid to change interface reference counter.
Cc: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Antonio Quartulli <antonio@meshcoding.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:41 +0000 (10:23 +0800)]
vxlan: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chains indicate that vxlan_fdb_parse() is
under rtnl_lock protection. So if we use __dev_get_by_index()
instead of dev_get_by_index() to find interface handler in it,
this would help us avoid to change interface reference counter.
Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Ying Xue <ying.xue@windriver.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Wed, 15 Jan 2014 02:23:40 +0000 (10:23 +0800)]
decnet: use __dev_get_by_index instead of dev_get_by_index to find interface
The following call chain we can identify that dn_cache_getroute() is
protected under rtnl_lock. So if we use __dev_get_by_index() instead
of dev_get_by_index() to find interface handlers in it, this would help
us avoid to change interface reference counter.
Ying Xue [Wed, 15 Jan 2014 02:23:39 +0000 (10:23 +0800)]
dcb: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that dcb_doit() is protected
under rtnl_lock. So if we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handlers in it, this would
help us avoid to change interface reference counter.
Ying Xue [Wed, 15 Jan 2014 02:23:38 +0000 (10:23 +0800)]
eql: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that eql_ioctl(), eql_enslave(),
eql_emancipate(), eql_g_slave_cfg() and eql_s_slave_cfg() are
protected under rtnl_lock. So if we use __dev_get_by_name() instead
of dev_get_by_name() to find interface handlers in them, this would
help us avoid to change interface reference counters.
Ying Xue [Wed, 15 Jan 2014 02:23:37 +0000 (10:23 +0800)]
bonding: use __dev_get_by_name instead of dev_get_by_name to find interface
The following call chain indicates that bond_do_ioctl() is protected
under rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handler in it, this would
help us avoid to change reference counter of interface once.
Ying Xue [Wed, 15 Jan 2014 02:23:36 +0000 (10:23 +0800)]
Drivers: Staging: cxt1e1: use __dev_get_name instead of dev_get_name to find interfaces
The following call chain denotes that both do_reset() and do_del_chan()
are protected under rtnl_lock. If we use __dev_get_by_name() instead of
dev_get_by_name() to find interface handlers in them, this would help
us avoid to change interface reference counter.
hayeswang [Wed, 15 Jan 2014 02:42:16 +0000 (10:42 +0800)]
r8152: ecm and vendor modes coexist
Remove the limitation that the ecm and r8152 drivers couldn't coexist.
- Remove the devices from the blacklist of relative drivers.
- Remove usb_driver_set_configuration() from r8152 driver.
- Modify the id_table of the r8152 driver for the vendor mode only.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Wed, 15 Jan 2014 02:42:15 +0000 (10:42 +0800)]
r8152: fix the warnings and a error from checkpatch.pl
Fix the following warnings and error:
- WARNING: usb_free_urb(NULL) is safe this check is probably not required
- WARNING: kfree(NULL) is safe this check is probably not required
- ERROR: do not use C99 // comments
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 13 Jan 2014 03:05:55 +0000 (19:05 -0800)]
bgmac: propagate error codes in bgmac_probe()
bgmac_mii_register() and register_netdev() both return appropriate error
codes for the failures they would encounter, propagate this error code
instead of overriding the value with -ENOTSUPP which is not the correct
error code to return.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Rafaล Miลecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
FX Le Bail [Mon, 13 Jan 2014 14:59:01 +0000 (15:59 +0100)]
IPv6: move the anycast_src_echo_reply sysctl to netns_sysctl_ipv6
This change move anycast_src_echo_reply sysctl with other ipv6 sysctls.
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Mon, 13 Jan 2014 13:46:08 +0000 (16:46 +0300)]
sctp: remove a redundant NULL check
It confuses Smatch when we check "sinit" for NULL and then non-NULL and
that causes a false positive warning later.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Following Ben Hutchings's advice on how to fill net_stats in alx [1],
this patch modifies the other atheros ethernet drivers
similarly. Minor whitespace/empty line changes in atl1c and atl1e to
make the code completely consistent between atl1c, atl1e, and alx.
I don't have this hardware, so these patches have only been
compile-tested.
v2 (changes only in atl1):
- don't set soft_stats.rx_missed_errors (Ben)
- add errors to soft_stats.{rx,tx}_packets (Ben)
- add soft_stats.rx_dropped field and update soft_stats.rx_dropped
instead of netdev->stats (overwritten) outside of the stats
update function
Sabrina Dubroca [Sun, 12 Jan 2014 17:50:40 +0000 (18:50 +0100)]
atl1: update statistics code
As Ben Hutchings pointed out for the stats in alx, some
hardware-specific stats aren't matched to the right net_device_stats
field. Also fix the collision field and include errors in the total
number of RX/TX packets. Add a rx_dropped field and use it where
netdev->stats was modified directly out of the stats update function.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Sun, 12 Jan 2014 17:50:39 +0000 (18:50 +0100)]
atl1e: update statistics code
As Ben Hutchings pointed out for the stats in alx, some
hardware-specific stats aren't matched to the right net_device_stats
field. Also fix the collision field and include errors in the total
number of RX/TX packets.
Minor whitespace fixes to match the style in alx.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Sun, 12 Jan 2014 17:50:38 +0000 (18:50 +0100)]
atl1c: update statistics code
As Ben Hutchings pointed out for the stats in alx, some
hardware-specific stats aren't matched to the right net_device_stats
field. Also fix the collision field and include errors in the total
number of RX/TX packets.
Minor whitespace fixes to match the style in alx.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 12 Jan 2014 12:37:59 +0000 (14:37 +0200)]
bnx2x: Correct default Tx switching behaviour
With this patch bnx2x will configure the PF to perform Tx switching on
out-going traffic as soon as SR-IOV is dynamically enabled and de-activate
it when it is disabled.
This will allow VFs to communicate with their parent PFs.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jan 2014 23:29:25 +0000 (15:29 -0800)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
this is a pull request of three patches for net-next/master.
Oleg Moroz added support for a new PCI card to the generic SJA1000 PCI
driver, Guenter Roeck's patch limits the flexcan driver to little
endian arm (and powerpc) and I fixed a sparse warning found by the
kbuild robot in the ti_hecc driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: replace macros net_random and net_srandom with direct calls to prandom
This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: copy traffic class from ping request to reply
Suggested-by: Simon Schneider <simon-schneider@gmx.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sat, 11 Jan 2014 00:09:45 +0000 (16:09 -0800)]
ipv4: register igmp_notifier even when !CONFIG_PROC_FS
We still need this notifier even when we don't config
PROC_FS.
It should be rare to have a kernel without PROC_FS,
so just for completeness.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jan 2014 22:46:13 +0000 (14:46 -0800)]
Merge branch 'netdev_tracing'
Ben Hutchings says:
====================
Improve tracing at the driver/core boundary
These patches add static tracpeoints at the driver/core boundary which
record various skb fields likely to be useful for datapath debugging.
On the TX side the boundary is where the core calls ndo_start_xmit, and
on the RX side it is where any of the various exported receive functions
is called.
The set of skb fields is mostly based on what I thought would be
interesting for sfc.
These patches are basically the same as what I sent as an RFC in
November, but rebased. They now depend on 'net: core: explicitly select
a txq before doing l2 forwarding', so please merge net into net-next
before trying to apply them. The first patch fixes a code formatting
error left behind after that fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 10 Jan 2014 22:17:24 +0000 (22:17 +0000)]
net: Add trace events for all receive entry points, exposing more skb fields
The existing net/netif_rx and net/netif_receive_skb trace events
provide little information about the skb, nor do they indicate how it
entered the stack.
Add trace events at entry of each of the exported functions, including
most fields that are likely to be interesting for debugging driver
datapath behaviour. Split netif_rx() and netif_receive_skb() so that
internal calls are not traced.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Fri, 10 Jan 2014 22:17:03 +0000 (22:17 +0000)]
net: Add net_dev_start_xmit trace event, exposing more skb fields
The existing net/net_dev_xmit trace event provides little information
about the skb that has been passed to the driver, and it is not
simple to add more since the skb may already have been freed at
the point the event is emitted.
Add a separate trace event before the skb is passed to the driver,
including most fields that are likely to be interesting for debugging
driver datapath behaviour.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jan 2014 22:24:25 +0000 (14:24 -0800)]
Merge branch 'skb_checksum_help'
Paul Durrant says:
====================
make skb_checksum_setup generally available
Both xen-netfront and xen-netback need to be able to set up the partial
checksum offset of an skb and may also need to recalculate the pseudo-
header checksum in the process. This functionality is currently private
and duplicated between the two drivers.
Patch #1 of this series moves the implementation into the core network code
as there is nothing xen-specific about it and it is potentially useful to
any network driver.
Patch #2 removes the private implementation from netback.
Patch #3 removes the private implementation from netfront.
v2:
- Put skb_checksum_setup in skbuff.c rather than dev.c
- remove inline
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 9 Jan 2014 10:02:48 +0000 (10:02 +0000)]
xen-netfront: use new skb_checksum_setup function
Use skb_checksum_setup to set up partial checksum offsets rather
then a private implementation.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 9 Jan 2014 10:02:47 +0000 (10:02 +0000)]
xen-netback: use new skb_checksum_setup function
Use skb_checksum_setup to set up partial checksum offsets rather
then a private implementation.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Thu, 9 Jan 2014 10:02:46 +0000 (10:02 +0000)]
net: add skb_checksum_setup
This patch adds a function to set up the partial checksum offset for IP
packets (and optionally re-calculate the pseudo-header checksum) into the
core network code.
The implementation was previously private and duplicated between xen-netback
and xen-netfront, however it is not xen-specific and is potentially useful
to any network driver.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 21:58:47 +0000 (13:58 -0800)]
bridge: move br_net_exit() to br.c
And it can become static.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjรธrn Mork [Fri, 10 Jan 2014 22:10:17 +0000 (23:10 +0100)]
net: usbnet: fix SG initialisation
Commit 60e453a940ac ("USBNET: fix handling padding packet")
added an extra SG entry in case padding is necessary, but
failed to update the initialisation of the list. This can
cause list traversal to fall off the end of the list,
resulting in an oops.
Fixes: 60e453a940ac ("USBNET: fix handling padding packet") Reported-by: Thomas Kear <thomas@kear.co.nz> Cc: Ming Lei <ming.lei@canonical.com> Signed-off-by: Bjรธrn Mork <bjorn@mork.no> Tested-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Sat, 11 Jan 2014 08:23:37 +0000 (16:23 +0800)]
net: 3com: fix warning for incorrect type in argument
The commit c466a9b2b329f7d9982c14eedc83a923d3bc711c
(net: 3com: slight optimization of addr compare)
cause a warning: "passing argument 1 of 'ether_addr_equal'
from incompatible pointer type", so fix it.
I think julia will convert ether_addr_equal to ether_addr_equal_64bits later.
Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Sat, 11 Jan 2014 08:23:35 +0000 (16:23 +0800)]
net: qlcnic: fix warning for incorrect type in argument
The commit 6878f79a8b71e8c7b0587a1185584f54fd31f185
(net: qlcnic: slight optimization of addr compare)
cause a warning "sparse: incorrect type in argument 2
(different type sizes)", so fix it.
I think julia will convert ether_addr_equal to ether_addr_equal_64bits later.
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com> Cc: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 10 Jan 2014 23:41:49 +0000 (02:41 +0300)]
sh_eth: fix garbled TX error message
sh_eth_error() in case of a TX error tries to print a message using 2 dev_err()
calls with the first string not finished by '\n', so that the resulting message
would inevitably come out garbled, with something like "3net eth0: " inserted
in the middle. Avoid that by merging 2 calls into one.
While at it, insert an empty line after the nearby declaration.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jan 2014 07:14:25 +0000 (23:14 -0800)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Conflicts:
net/xfrm/xfrm_policy.c
Steffen Klassert says:
====================
This pull request has a merge conflict between commits be7928d20bab
("net: xfrm: xfrm_policy: fix inline not at beginning of declaration") and da7c224b1baa ("net: xfrm: xfrm_policy: silence compiler warning") from
the net-next tree and commit 2f3ea9a95c58 ("xfrm: checkpatch erros with
inline keyword position") from the ipsec-next tree.
The version from net-next can be used, like it is done in linux-next.
1) Checkpatch cleanups, from Weilong Chen.
2) Fix lockdep complaints when pktgen is used with IPsec,
from Fan Du.
3) Update pktgen to allow any combination of IPsec transport/tunnel mode
and AH/ESP/IPcomp type, from Fan Du.
4) Make pktgen_dst_metrics static, Fengguang Wu.
5) Compile fix for pktgen when CONFIG_XFRM is not set,
from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Fri, 10 Jan 2014 20:34:45 +0000 (15:34 -0500)]
inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets
Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
(not just TIME_WAIT), and for such sockets the tw_substate field holds
the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.
This brings the inet_diag state-matching code in line with the field
it uses to populate idiag_state. This is also analogous to the info
exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
and get_timewait4_sock() exports tw->tw_substate.
Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
state time-wait" would also return sockets in state TCP_FIN_WAIT2.
This is an old bug that predates 05dbc7b ("tcp/dccp: remove twchain").
Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
While digging through bond_3ad.c I've found that the RCU usage there is
just wrong - it's used as a kind of mutex/spinlock instead of RCU.
v3->v4: remove useless goto and wrap __get_first_agg() in proper RCU.
v2->v3: make bond_3ad_set_carrier() use RCU read lock for the whole
function, so that all other functions will be protected by RCU as well.
This way we can use _rcu variants everywhere.
v1->v2: use generic primitives instead of _rcu ones cause we can hold RTNL
lock without RCU one, which is still safe.
This patchset is on top of bond_3ad.c cleanup:
http://www.spinics.net/lists/netdev/msg265447.html
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 10 Jan 2014 10:59:45 +0000 (11:59 +0100)]
bonding: fix __get_active_agg() RCU logic
Currently, the implementation is meaningless - once again, we take the
slave structure and use it after we've exited RCU critical section.
Fix this by removing the rcu_read_lock() from __get_active_agg(), and
ensuring that all its callers are holding RCU.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()") CC: dingtianhong@huawei.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 10 Jan 2014 10:59:44 +0000 (11:59 +0100)]
bonding: fix __get_first_agg RCU usage
Currently, the RCU read lock usage is just wrong - it gets the slave struct
under RCU and continues to use it when RCU lock is released.
However, it's still safe to do this cause we didn't need the
rcu_read_lock() initially - all of the __get_first_agg() callers are either
holding RCU read lock or the RTNL lock, so that we can't sync while in it.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()") CC: dingtianhong@huawei.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 10 Jan 2014 10:59:43 +0000 (11:59 +0100)]
bonding: fix bond_3ad_set_carrier() RCU usage
Currently, its usage is just plainly wrong. It first gets a slave under
RCU, and, after releasing the RCU lock, continues to use it - whilst it can
be freed.
Fix this by ensuring that bond_3ad_set_carrier() holds RCU till it uses its
slave (or its agg).
Fixes: be79bd048ab ("bonding: add RCU for bond_3ad_state_machine_handler()") CC: dingtianhong@huawei.com CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 14 Jan 2014 05:50:27 +0000 (21:50 -0800)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Included changes:
- drop dependency against CRC16
- move to new release version
- add size check at compile time for packet structs
- update copyright years in every file
- implement new bonding/interface alternation feature
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 13 Jan 2014 23:31:42 +0000 (15:31 -0800)]
Merge branch 'qlcnic'
Shahed Shaikh says:
====================
This series includes following changes:
o SRIOV and VLAN filtering related enhancements which includes
- Do MAC learning for PF
- Restrict VF from configuring any VLAN mode
- Enable flooding on PF
- Turn on promiscuous mode for PF
o Bug fix in qlcnic_sriov_cleanup() introduced by commit 154d0c81("qlcnic: VLAN enhancement for 84XX adapters")
o Beaconing support for 83xx and 84xx series adapters
o Allow 82xx adapter to perform IPv6 LRO even if destination IP address is not
programmed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Himanshu Madhani [Fri, 10 Jan 2014 16:48:56 +0000 (11:48 -0500)]
qlcnic: Enable beaconing for 83xx/84xx Series adapter.
o Refactored code to handle beaconing test for all adapters.
o Use GET_LED_CONFIG mailbox command for 83xx/84xx series adapter
to detect current beaconing state of the adapter.
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
o MAC learning will be done for SRIOV PF to help program VLAN filters
onto adapter. This will help VNIC traffic to flow through without
flooding traffic.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic: Restrict VF from configuring any VLAN mode.
o Adapter should allow vlan traffic only for vlans configured on a VF.
On configuring any vlan mode from VF, adapter will allow any vlan
traffic to pass for that VF. Do not allow VF to configure this mode.
Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Fri, 10 Jan 2014 15:56:25 +0000 (16:56 +0100)]
net: make dev_set_mtu() honor notification return code
Currently, after changing the MTU for a device, dev_set_mtu() calls
NETDEV_CHANGEMTU notification, however doesn't verify it's return code -
which can be NOTIFY_BAD - i.e. some of the net notifier blocks refused this
change, and continues nevertheless.
To fix this, verify the return code, and if it's an error - then revert the
MTU to the original one, notify again and pass the error code.
CC: Jiri Pirko <jiri@resnulli.us> CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
packet: doc: describe PACKET_MMAP with one packet socket for rx and tx
Document how to use one AF_PACKET mmap socket for RX and TX.
Signed-off-by: Norbert van Bolhuis <nvbolhuis@aimvalley.nl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 10 Jan 2014 06:28:11 +0000 (14:28 +0800)]
phylib: Add of_phy_attach
10G PHYs don't currently support running the state machine, which
is implicitly setup via of_phy_connect(). Therefore, it is necessary
to implement an OF version of phy_attach(), which does everything
except start the state machine.
Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 10 Jan 2014 06:27:54 +0000 (14:27 +0800)]
phylib: Support attaching to generic 10g driver
phy_attach_direct() may now attach to a generic 10G driver. It can
also be used exactly as phy_connect_direct(), which will be useful
when using of_mdio, as phy_connect (and therefore of_phy_connect)
start the PHY state machine, which is currently irrelevant for 10G
PHYs.
Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 10 Jan 2014 06:27:37 +0000 (14:27 +0800)]
phylib: Add generic 10G driver
Very incomplete, but will allow for binding an ethernet controller
to it.
Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 10 Jan 2014 06:26:46 +0000 (14:26 +0800)]
phylib: introduce PHY_INTERFACE_MODE_XGMII for 10G PHY
Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 10 Jan 2014 06:25:09 +0000 (14:25 +0800)]
phylib: Add Clause 45 read/write functions
Need an extra parameter to read or write Clause 45 PHYs, so
need a different API with the extra parameter.
Signed-off-by: Andy Fleming <afleming@gmail.com> Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a bunch of whole lot of namespace issues with the Broadcom bnx2x driver
found by running 'make namespacecheck'
* global variables must be prefixed with bnx2x_
naming a variable int_mode, or num_queue is invitation to disaster
* make local functions static
* move some inline's used in one file out of header
(this driver has a bad case of inline-itis)
* remove resulting dead code fallout
bnx2x_pfc_statistic,
bnx2x_emac_get_pfc_stat
bnx2x_init_vlan_mac_obj,
Looks like vlan mac support in this driver was a botch from day one
either never worked, or not implemented or missing support functions
Compile tested only.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Pankaj Dubey [Fri, 10 Jan 2014 03:04:06 +0000 (12:04 +0900)]
drivers: net: silence compiler warning in smc91x.c
If used 64 bit compiler GCC warns that:
drivers/net/ethernet/smsc/smc91x.c:1897:7:
warning: cast from pointer to integer of different
size [-Wpointer-to-int-cast]
This patch fixes this by changing typecast from "unsigned int" to "unsigned long"
CC: "David S. Miller" <davem@davemloft.net> CC: Jingoo Han <jg1.han@samsung.com> CC: netdev@vger.kernel.org Signed-off-by: Pankaj Dubey <pankaj.dubey@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Fri, 10 Jan 2014 01:47:17 +0000 (20:47 -0500)]
gre_offload: simplify GRE header length calculation in gre_gso_segment()
Simplify the GRE header length calculation in gre_gso_segment().
Switch to an approach that is simpler, faster, and more general. The
new approach will continue to be correct even if we add support for
the optional variable-length routing info that may be present in a GRE
header.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:14:05 +0000 (16:14 -0800)]
net_sched: act: remove struct tcf_act_hdr
It is not necessary at all.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:14:03 +0000 (16:14 -0800)]
net_sched: avoid casting void pointer
tp->root is a void* pointer, no need to cast it.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:14:02 +0000 (16:14 -0800)]
net_sched: optimize tcf_match_indev()
tcf_match_indev() is called in fast path, it is not wise to
search for a netdev by ifindex and then compare by its name,
just compare the ifindex.
Also, dev->name could be changed by user-space, therefore
the match would be always fail, but dev->ifindex could
be consistent.
BTW, this will also save some bytes from the core struct of u32.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:14:01 +0000 (16:14 -0800)]
net_sched: add struct net pointer to tcf_proto_ops->dump
It will be needed by the next patch.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:14:00 +0000 (16:14 -0800)]
net_sched: act: clean up notification functions
Refactor tcf_add_notify() and factor out tcf_del_notify().
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Fri, 10 Jan 2014 00:13:59 +0000 (16:13 -0800)]
net_sched: act: move idx_gen into struct tcf_hashinfo
There is no need to store the index separatedly
since tcf_hashinfo is allocated statically too.
Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 9 Jan 2014 22:12:19 +0000 (14:12 -0800)]
net: gro: change GRO overflow strategy
GRO layer has a limit of 8 flows being held in GRO list,
for performance reason.
When a packet comes for a flow not yet in the list,
and list is full, we immediately give it to upper
stacks, lowering aggregation performance.
With TSO auto sizing and FQ packet scheduler, this situation
happens more often.
This patch changes strategy to simply evict the oldest flow of
the list. This works better because of the nature of packet
trains for which GRO is efficient. This also has the effect
of lowering the GRO latency if many flows are competing.
Tested :
Used a 40Gbps NIC, with 4 RX queues, and 200 concurrent TCP_STREAM
netperf.
Before patch, aggregate rate is 11Gbps (while a single flow can reach
30Gbps)
After patch, line rate is reached.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jerry Chu <hkchu@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>