git.karo-electronics.de Git - linux-beck.git/log
Veaceslav Falico [Wed, 28 Aug 2013 21:25:12 +0000 (23:25 +0200)]
bonding: use vlan_uses_dev() in __bond_release_one()

We always hold the rtnl_lock() in __bond_release_one(), so use
vlan_uses_dev() instead of bond_vlan_used().

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:11 +0000 (23:25 +0200)]
bonding: convert bond_has_this_ip() to use upper devices

Currently, bond_has_this_ip() is aware only of vlan upper devices, so it
returns false if the address is associated with an upper bridge or any other
device, which breaks the arp logic.

Fix this by using the upper device list. For every upper device we verify
if the address associated with it is our address, and if yes - return true.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:10 +0000 (23:25 +0200)]
bonding: make bond_arp_send_all use upper device list

Currently, bond_arp_send_all() is aware only of vlans, which breaks
configurations like bond <- bridge (or any other 'upper' device) with IP
(which is quite a common scenario for virt setups).

To fix this, convert bond_arp_send_all() to first check whether the rt
device is the bond itself. If it is not, go through the bond's list of upper
vlans and their respective upper devices (if the vlan's upper device matches,
tag the packet). If there is still no match, go through all of our upper
devices to see if any of them matches the route device for the target. If the
match is a vlan device, we also save its vlan_id and tag the packet in
bond_arp_send().

Also, clean up the function a bit to make it more readable.

CC: Vlad Yasevich <vyasevic@redhat.com>
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:09 +0000 (23:25 +0200)]
bonding: use netdev_upper list in bond_vlan_used

Convert bond_vlan_used() to traverse the upper device list to see if we
have any vlans above us. It's protected by rcu, and in case we are holding
rtnl_lock we should call vlan_uses_dev() instead - it's faster.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:08 +0000 (23:25 +0200)]
net: add netdev_for_each_upper_dev_rcu()

The new macro netdev_for_each_upper_dev_rcu(dev, upper, iter) iterates
through the dev->upper_dev_list starting from the first element, using
the netdev_upper_get_next_dev_rcu(dev, &iter).

Must be called under RCU read lock.
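
As an illustration only, the iteration pattern described above can be sketched
in userspace C. The names below (struct dev, struct adjacent,
upper_get_next_dev(), for_each_upper_dev()) are hypothetical stand-ins and not
the kernel's definitions, and the sketch leaves out RCU entirely.

```c
#include <stddef.h>

/* Minimal circular list, standing in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

/* Hypothetical stand-ins for net_device and netdev_adjacent. */
struct dev;
struct adjacent {
    struct dev *dev;
    struct list_head list;
};
struct dev {
    const char *name;
    struct list_head upper_dev_list;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Return the device after position *iter and advance *iter.
 * Returns NULL once the list head is reached again. */
static struct dev *upper_get_next_dev(struct dev *dev, struct list_head **iter)
{
    struct list_head *next = (*iter)->next;
    struct adjacent *adj;

    if (next == &dev->upper_dev_list)
        return NULL;
    *iter = next;
    adj = (struct adjacent *)((char *)next - offsetof(struct adjacent, list));
    return adj->dev;
}

/* Start at the list head and walk until the getter returns NULL. */
#define for_each_upper_dev(d, upper, iter)                 \
    for ((iter) = &(d)->upper_dev_list,                    \
         (upper) = upper_get_next_dev((d), &(iter));       \
         (upper);                                          \
         (upper) = upper_get_next_dev((d), &(iter)))

/* Example user of the macro: count the upper devices. */
static int count_upper(struct dev *d)
{
    struct list_head *iter;
    struct dev *upper;
    int n = 0;

    for_each_upper_dev(d, upper, iter)
        n++;
    return n;
}
```

The iterator is just a list position owned by the caller, so the getter needs
no state of its own. In the kernel the walk would have to be bracketed by
rcu_read_lock()/rcu_read_unlock().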

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:07 +0000 (23:25 +0200)]
net: add netdev_upper_get_next_dev_rcu(dev, iter)

This function returns the next dev in the dev->upper_dev_list after the
struct list_head **iter position, and updates *iter accordingly. Returns
NULL if there are no devices left.

Caller must hold RCU read lock.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:06 +0000 (23:25 +0200)]
net: remove search_list from netdev_adjacent

We no longer need it because we already see every upper/lower device in the
list.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:05 +0000 (23:25 +0200)]
net: add lower_dev_list to net_device and make a full mesh

This patch adds lower_dev_list list_head to net_device, which is the same
as upper_dev_list, only for lower devices, and begins to use it in the same
way as the upper list.

It also changes the way the whole adjacent device lists work - now they
contain *all* of upper/lower devices, not only the first level. The first
level devices are distinguished by the bool neighbour field in
netdev_adjacent, also added by this patch.

There are cases when a device can be added several times to the adjacent
list, the simplest would be:

     /---- eth0.10 ---\
eth0-        --- bond0
     \---- eth0.20 ---/

where both bond0 and eth0 'see' each other in the adjacent lists two times.
To avoid duplication of netdev_adjacent structures ref_nr is being kept as
the number of times the device was added to the list.
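
A rough sketch of the ref_nr idea, under simplifying assumptions: a fixed-size
array replaces the kernel's linked list, and adj_list, adj_insert() and
adj_remove() are made-up names for illustration, not the functions this patch
adds.

```c
#include <stddef.h>

#define MAX_ADJ 8   /* arbitrary capacity for the sketch */

struct adj_entry {
    const void *dev;   /* adjacent device, opaque here */
    int ref_nr;        /* number of paths that lead to this device */
};

struct adj_list {
    struct adj_entry entries[MAX_ADJ];
    size_t count;
};

static struct adj_entry *adj_find(struct adj_list *l, const void *dev)
{
    for (size_t i = 0; i < l->count; i++)
        if (l->entries[i].dev == dev)
            return &l->entries[i];
    return NULL;
}

/* Add dev, or bump ref_nr if it is already listed. Returns -1 when full. */
static int adj_insert(struct adj_list *l, const void *dev)
{
    struct adj_entry *e = adj_find(l, dev);

    if (e) {
        e->ref_nr++;
        return 0;
    }
    if (l->count == MAX_ADJ)
        return -1;
    l->entries[l->count].dev = dev;
    l->entries[l->count].ref_nr = 1;
    l->count++;
    return 0;
}

/* Drop one reference; the entry goes away only when ref_nr reaches zero. */
static void adj_remove(struct adj_list *l, const void *dev)
{
    struct adj_entry *e = adj_find(l, dev);

    if (!e)
        return;
    if (--e->ref_nr > 0)
        return;
    *e = l->entries[--l->count];   /* move the last entry into the hole */
}
```

With the topology drawn above, eth0 would insert bond0 twice (once through
eth0.10, once through eth0.20) and end up with one entry whose ref_nr is 2.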

The 'full view' is achieved by adding, on link creation, all of the
upper_dev's upper_dev_list devices as upper devices to all of the
lower_dev's lower_dev_list devices (and to the lower_dev itself), and vice
versa. On unlink they are removed using the same logic.

I've tested it with thousands of vlans/bonds/bridges; everything works and
there are no observable lags even with a huge number of interfaces.

The memory footprint for 128 devices interconnected with each other via both
upper and lower lists (which is impossible, but serves as a comparison) would be:

128*128*2*sizeof(netdev_adjacent) = 1.5MB

but in the real world we usually have at most several devices with slaves
and a lot of vlans, so the footprint will be much lower.
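
The worst-case arithmetic can be checked with a few lines of C. Note the
assumption: the 48-byte entry size is not stated in the commit message, it is
only what the quoted 1.5MB total implies (1.5MB / (128*128*2)), and
full_mesh_footprint() is a name made up for this sketch.

```c
#include <stddef.h>

/* Assumption: 48 bytes per netdev_adjacent, derived from the 1.5MB figure. */
#define ASSUMED_ADJACENT_SIZE 48u

/* Every device lists every device once as upper and once as lower. */
static size_t full_mesh_footprint(size_t ndevs, size_t entry_size)
{
    return ndevs * ndevs * 2 * entry_size;
}
```

For 128 devices and the assumed entry size this gives 1572864 bytes, which is
1.5 * 1024 * 1024.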

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 28 Aug 2013 21:25:04 +0000 (23:25 +0200)]
net: rename netdev_upper to netdev_adjacent

Rename the structure to reflect the upcoming addition of lower_dev_list.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Cong Wang <amwang@redhat.com>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Aug 2013 20:13:32 +0000 (16:13 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
This series contains updates to ixgbe.

Jacob provides a fix for 82599 devices where it can potentially keep link
lights up when the adapter has gone down.

Mark provides a fix to resolve the possible use of uninitialized memory
by checking the return value on EEPROM reads.

Don provides 2 patches, one to fix an issue where we were traversing the
Tx ring with the value of IXGBE_NUM_RX_QUEUES, which currently happens
to have the correct value, but this is misleading.  A later change could
easily make this no longer correct, so when traversing the Tx ring, use
netdev->num_tx_queues.  His second patch does some minor cleanups of log
messages.

Emil provides the remaining ixgbe patches.  First he fixes the link test
where forcing the laser before the link check can lead to inconsistent
results because it does not guarantee that the link will be negotiated
correctly.  Then he initializes the message buffer array to 0 in order
to avoid using random numbers from the memory as a MAC address for the
VF.  Emil also fixes the read loop for the I2C data to account for the
offset for SFP+ modules.  Lastly, Emil provides several patches to add
support for QSFP modules where 1Gbps support is added as well as support
for older QSFP active direct attach cables which pre-date SFF-8436 v3.6.

v2: Fixed patch 4 description and added blank line based on feedback from
    Sergei Shtylyov
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Tue, 27 Aug 2013 20:35:08 +0000 (17:35 -0300)]
fec: Use NAPI_POLL_WEIGHT

Instead of using a custom 'FEC_NAPI_WEIGHT', just use the generic
'NAPI_POLL_WEIGHT' definition.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 27 Aug 2013 14:53:52 +0000 (16:53 +0200)]
net: sctp: sctp_verify_init: clean up mandatory checks and add comment

Add a comment related to RFC4960 explaining why we do not check for initial
TSN, and while at it, remove yoda notation checks and clean up code from
checks of mandatory conditions. That's probably just really minor, but makes
reviewing easier.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 27 Aug 2013 12:46:32 +0000 (05:46 -0700)]
tcp: TSO packets automatic sizing

After hearing many people over past years complaining against TSO being
bursty or even buggy, we are proud to present automatic sizing of TSO
packets.

One part of the problem is that tcp_tso_should_defer() uses a heuristic
relying on upcoming ACKs instead of a timer, but more generally, having
big TSO packets makes little sense for low rates, as it tends to create
micro bursts on the network, and general consensus is to reduce the
buffering amount.

This patch introduces a per socket sk_pacing_rate, that approximates
the current sending rate, and allows us to size the TSO packets so
that we try to send one packet every ms.

This field could be set by other transports.

Patch has no impact for high speed flows, where having large TSO packets
makes sense to reach line rate.

For other flows, this helps better packet scheduling and ACK clocking.

This patch increases performance of TCP flows in lossy environments.

A new sysctl (tcp_min_tso_segs) is added, to specify the
minimal size of a TSO packet (default being 2).

A follow-up patch will provide a new packet scheduler (FQ), using
sk_pacing_rate as an input to perform optional per flow pacing.

This explains why we chose to set sk_pacing_rate to twice the current
rate, allowing 'slow start' ramp up.

sk_pacing_rate = 2 * cwnd * mss / srtt
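
To make the formula concrete, here is a small userspace sketch. It is not the
kernel code: the function names, the microsecond unit for srtt, and the way
the one-packet-per-ms goal is turned into a segment count are assumptions made
for illustration. Only the formula itself and the minimum of 2 segments come
from the text above.

```c
#include <stdint.h>

/* Pacing rate in bytes per second: 2 * cwnd * mss / srtt.
 * Assumes srtt is given in microseconds and inputs are realistic in size. */
static uint64_t pacing_rate_bytes_per_sec(uint32_t cwnd, uint32_t mss,
                                          uint32_t srtt_us)
{
    if (srtt_us == 0)
        return 0;
    return 2ull * cwnd * mss * 1000000ull / srtt_us;
}

/* Segments that fit into one millisecond at the given rate, never below
 * min_segs (the commit's tcp_min_tso_segs, default 2). A real stack would
 * also cap this at the device's GSO limit; the sketch does not. */
static uint32_t tso_segs_per_ms(uint64_t rate, uint32_t mss, uint32_t min_segs)
{
    uint64_t segs = mss ? rate / 1000 / mss : 0;

    return segs < min_segs ? min_segs : (uint32_t)segs;
}
```

With cwnd = 10, mss = 1448 and srtt = 100 ms, the rate is 289600 bytes/s, far
less than one full segment per millisecond, so the size falls back to the
2-segment minimum. A fast flow (cwnd = 1000, srtt = 1 ms) keeps large TSO
packets.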

v2: Neal Cardwell reported a suspect deferring of last two segments on
initial write of 10 MSS, I had to change tcp_tso_should_defer() to take
into account tp->xmit_size_goal_segs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Tom Herbert <therbert@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hannes Frederic Sowa [Mon, 26 Aug 2013 23:36:51 +0000 (01:36 +0200)]
ipv6: drop fragmented ndisc packets by default (RFC 6980)

This patch implements RFC6980: Drop fragmented ndisc packets by
default. If a fragmented ndisc packet is received the user is informed
that it is possible to disable the check.

Cc: Fernando Gont <fernando@gont.com.ar>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris BREZILLON [Tue, 27 Aug 2013 12:41:53 +0000 (14:41 +0200)]
ARM: at91/dt: fix phy address in sama5xmb to match the reg property

Fix phy0 address to match the reg property defined in phy0 node.

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Boris BREZILLON [Tue, 27 Aug 2013 12:36:14 +0000 (14:36 +0200)]
net/cadence/macb: fix invalid 0 return if no phy is discovered on mii init

Replace the misleading -1 (-EPERM) with a more appropriate return code
(-ENXIO) in the macb_mii_probe function.
Save the macb_mii_probe return value before branching to err_out_unregister
to avoid an erroneous 0 return.
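
The pattern being fixed can be sketched as follows. This is a generic
illustration with hypothetical function names (mii_probe(), mii_init()), not
the macb driver code: the point is only that the error code must be stored
before jumping to the cleanup label that returns it.

```c
#include <errno.h>

/* Hypothetical probe: fails with -ENXIO when no PHY is found. */
static int mii_probe(int phy_present)
{
    return phy_present ? 0 : -ENXIO;
}

static int mii_init(int phy_present)
{
    int err;

    /* Store the result first. Testing the call directly and jumping
     * away would leave err at whatever it held before (often 0). */
    err = mii_probe(phy_present);
    if (err)
        goto err_out;

    return 0;

err_out:
    /* cleanup (unregistering the bus, freeing memory) would go here */
    return err;
}
```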

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Tue, 27 Aug 2013 11:03:53 +0000 (12:03 +0100)]
bridge: inherit slave devices needed_headroom

Some slave devices may have set a dev->needed_headroom value which is
different than the default one, most likely in order to prepend a
hardware descriptor in front of the Ethernet frame to send. Whenever a
new slave is added to a bridge, ensure that we update the
needed_headroom value accordingly to account for the slave
needed_headroom value.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 26 Aug 2013 14:34:00 +0000 (16:34 +0200)]
net: sctp: reorder sctp_globals to reduce cacheline usage

Reduce cacheline usage from 2 to 1 cacheline for sctp_globals structure. By
reordering elements, we can close gaps and simply achieve the following:

Current situation:
  /* size: 80, cachelines: 2, members: 10 */
  /* sum members: 57, holes: 4, sum holes: 16 */
  /* padding: 7 */
  /* last cacheline: 16 bytes */

Afterwards:
  /* size: 64, cachelines: 1, members: 10 */
  /* padding: 7 */
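
The effect of reordering can be reproduced with a toy structure. The two
structs below are invented for this sketch and have nothing to do with the
real layout of sctp_globals; they only show how interleaving small and
pointer-sized members creates holes that grouping removes.

```c
#include <stddef.h>

/* Small members between pointers force padding before each pointer. */
struct with_holes {
    char a;
    void *b;
    char c;
    void *d;
    char e;
};

/* Same members, largest first: the chars share one padded tail. */
struct reordered {
    void *b;
    void *d;
    char a;
    char c;
    char e;
};

/* Bytes saved by the reordering on the current ABI. */
static size_t bytes_saved(void)
{
    return sizeof(struct with_holes) - sizeof(struct reordered);
}
```

On a typical LP64 ABI the first layout takes 40 bytes and the second 24, but
the exact numbers depend on the platform's alignment rules.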

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jisheng Zhang [Mon, 26 Aug 2013 13:11:57 +0000 (21:11 +0800)]
net: mdio-sun4i: Convert to devm_* api

Use devm_ioremap_resource() instead of of_iomap() and devm_kzalloc()
instead of kmalloc() to make cleanup paths simpler. This patch also
fixes the resource leak caused by the missing iounmap() corresponding
to the of_iomap().

Signed-off-by: Jisheng Zhang <jszhang@marvell.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Emil Tantilov [Fri, 16 Aug 2013 23:11:14 +0000 (23:11 +0000)]
ixgbe: add support for older QSFP active DA cables

This patch adds support for QSFP active direct attach (DA) cables which
pre-date SFF-8436 v3.6.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 14 Aug 2013 07:12:27 +0000 (07:12 +0000)]
ixgbe: include QSFP PHY types in ixgbe_is_sfp()

This patch makes sure that QSFP+ modules use the SFP+ code path for
setting up link.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 13 Aug 2013 07:22:16 +0000 (07:22 +0000)]
ixgbe: add 1Gbps support for QSFP+

This patch adds 1Gbps speed support for QSFP+ modules.
Autonegotiation is not supported with QSFP+. The user will have to set
the desired speed on both link partners using the ethtool advertise setting.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Tue, 13 Aug 2013 04:59:29 +0000 (04:59 +0000)]
ixgbe: fix SFF data dumps of SFP+ modules from an offset

This patch fixes the read loop for the I2C data to account for the offset.
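
The commit message does not show the loop itself, so the following is only a
guess at the general shape of such a fix: a dump of len bytes starting at
offset has to run to offset + len, not to len. All names here (read_byte_fn,
fake_read(), dump_range()) are hypothetical and not taken from ixgbe.

```c
#include <stdint.h>

/* Hypothetical byte reader standing in for an I2C EEPROM access. */
typedef int (*read_byte_fn)(uint32_t addr, uint8_t *val);

/* Fake device for the sketch: each address holds its own low byte. */
static int fake_read(uint32_t addr, uint8_t *val)
{
    *val = (uint8_t)addr;
    return 0;
}

/* Copy len bytes starting at offset into out[0..len-1]. */
static int dump_range(read_byte_fn rd, uint32_t offset, uint32_t len,
                      uint8_t *out)
{
    /* The bound is offset + len; bounding by len alone would read too
     * few bytes (or none) whenever offset is non-zero. */
    for (uint32_t i = offset; i < offset + len; i++) {
        int err = rd(i, &out[i - offset]);

        if (err)
            return err;
    }
    return 0;
}
```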

Also includes a whitespace cleanup and removes ret_val as it is not needed.

CC: Ben Hutchings <bhutchings@solarflare.com>
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 31 Jul 2013 05:27:04 +0000 (05:27 +0000)]
ixgbe: cleanup some log messages

Some minor log message cleanups: change the level at which one message is
logged, add a bit of detail to another, and put all the text on one line.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Fri, 26 Jul 2013 07:34:54 +0000 (07:34 +0000)]
ixgbe: zero out mailbox buffer on init

This patch initializes the msgbuf array to 0 in order to avoid using random
numbers from memory as a MAC address for the VF.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 10 Jul 2013 02:47:24 +0000 (02:47 +0000)]
ixgbe: fix link test when connected to 1Gbps link partner

This patch is a partial revert of:
commit dfcc4615f09c33454bc553567f7c7506cae60cb9
Author: Jacob Keller <jacob.e.keller@intel.com>
Date: Thu Nov 8 07:07:08 2012 +0000

  ixgbe: ethtool ixgbe_diag_test cleanup

Specifically forcing the laser before the link check can lead to
inconsistent results because it does not guarantee that the link will be
negotiated correctly. Such is the case when a dual speed SFP+ module is
connected to a gigabit link partner.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 28 Jun 2013 05:35:50 +0000 (05:35 +0000)]
ixgbe: fix incorrect limit value in ring transverse

We were traversing the tx_ring with IXGBE_NUM_RX_QUEUES.  This define
currently happens to have the correct value, but this is misleading and a
later change could easily make it no longer true.  I updated it to
netdev->num_tx_queues, as we use in ixgbe_get_strings().

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mark Rustad [Fri, 24 May 2013 07:31:09 +0000 (07:31 +0000)]
ixgbe: Check return value on eeprom reads

This patch fixes the possible use of uninitialized memory by checking the
return value on eeprom reads. These issues were identified by static
analysis. In many cases error messages will be produced so that corrupted
eeprom issues will be more visible.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Jun 2013 07:59:23 +0000 (07:59 +0000)]
ixgbe: disable link when adapter goes down

This patch fixes an issue with the 82599 adapter where it can potentially keep
link lights up when the adapter has gone down. The patch adds a function which
ensures link is disabled, and calls this function when the adapter transitions
to a down state.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 29 Aug 2013 05:56:01 +0000 (01:56 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Further cleanup and refactoring in preparation for EF10.
2. Remove ethtool stats that are always zero on Falcon boards.
3. Add an ethtool stat for merged TX completions.
4. Prepare to support merged RX completions.
5. Prepare to support more hwmon sensors.
6. Add support for new events that are generated by EF10 firmware.
7. Update MC reboot detection for EF10.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Aug 2013 05:44:24 +0000 (01:44 -0400)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Included changes:
- set the protocol field in the skb structure according to the encapsulated
  payload
- make the gateway component send a uevent in case of "gw client mode"
  de-selection
- increment version number
- minor code rearrangement

Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 27 Aug 2013 01:16:22 +0000 (04:16 +0300)]
qlcnic: underflow in qlcnic_validate_max_tx_rings()

This function checks the upper bound but it doesn't check for negative
numbers:

if (txq > QLCNIC_MAX_TX_RINGS) {

I've solved this by making "txq" a u32 type.  I chose that because
->tx_count in the ethtool_channels struct is a __u32.
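
The difference between the signed and unsigned check can be shown in
isolation. This is a sketch, not the qlcnic code: MAX_TX_RINGS is an arbitrary
value chosen for the example, and both helper functions are hypothetical.

```c
#include <stdint.h>

#define MAX_TX_RINGS 8   /* arbitrary limit for the sketch */

/* Signed parameter: only the upper bound is checked, so a negative
 * value slips through as "valid". */
static int valid_signed(int txq)
{
    return !(txq > MAX_TX_RINGS);
}

/* Unsigned parameter: a negative input wraps to a huge number and is
 * rejected by the same upper-bound comparison. */
static int valid_unsigned(uint32_t txq)
{
    return !(txq > MAX_TX_RINGS);
}
```

Passing -1 to valid_signed() returns 1, while valid_unsigned() sees the same
bits as 4294967295 and returns 0.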

This bug was added in aa4a1f7df7 ('qlcnic: Enable Tx queue changes using
ethtool for 82xx Series adapter.').

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Aug 2013 05:18:52 +0000 (01:18 -0400)]
Merge branch 'xen-netback'

Wei Liu says:

====================
xen-netback: switch to NAPI + kthread 1:1 model

This series implements NAPI + kthread 1:1 model for Xen netback.

This model
 - provides better scheduling fairness among vifs
 - is a prerequisite for implementing multiqueue for Xen network driver

The second patch has the real meat:
 - make use of NAPI to mitigate interrupt
 - kthreads are not bound to CPUs any more, so that we can take
   advantage of backend scheduler and trust it to do the right thing

Benchmark is done on a Dell T3400 workstation with 4 cores, running 4
DomUs. Netserver runs in Dom0. DomUs do netperf to Dom0 with
following command: /root/netperf -H Dom0 -fm -l120

IRQs are distributed to 4 cores by hand in the new model, while in the
old model vifs are automatically distributed to 4 kthreads.

* New model
%Cpu0  :  0.5 us, 20.3 sy,  0.0 ni, 28.9 id,  0.0 wa,  0.0 hi, 24.4 si, 25.9 st
%Cpu1  :  0.5 us, 17.8 sy,  0.0 ni, 28.8 id,  0.0 wa,  0.0 hi, 27.7 si, 25.1 st
%Cpu2  :  0.5 us, 18.8 sy,  0.0 ni, 30.7 id,  0.0 wa,  0.0 hi, 22.9 si, 27.1 st
%Cpu3  :  0.0 us, 20.1 sy,  0.0 ni, 30.4 id,  0.0 wa,  0.0 hi, 22.7 si, 26.8 st
Throughputs: 2027.89 2025.95 2018.57 2016.23 aggregated: 8088.64

* Old model
%Cpu0  :  0.5 us, 68.8 sy,  0.0 ni, 16.1 id,  0.5 wa,  0.0 hi,  2.8 si, 11.5 st
%Cpu1  :  0.4 us, 45.1 sy,  0.0 ni, 31.1 id,  0.4 wa,  0.0 hi,  2.1 si, 20.9 st
%Cpu2  :  0.9 us, 44.8 sy,  0.0 ni, 30.9 id,  0.0 wa,  0.0 hi,  1.3 si, 22.2 st
%Cpu3  :  0.8 us, 46.4 sy,  0.0 ni, 28.3 id,  1.3 wa,  0.0 hi,  2.1 si, 21.1 st
Throughputs: 1899.14 2280.43 1963.33 1893.47 aggregated: 8036.37

We can see that the impact is mainly on CPU usage. The new model moves
processing from kthread to NAPI (software interrupt).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Mon, 26 Aug 2013 11:59:39 +0000 (12:59 +0100)]
xen-netback: rename functions

As we move to the 1:1 model and melt xen_netbk and xenvif together, it would
be better to use a single prefix for all functions in xen-netback.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Mon, 26 Aug 2013 11:59:38 +0000 (12:59 +0100)]
xen-netback: switch to NAPI + kthread 1:1 model

This patch implements the 1:1 model for netback. NAPI and kthread are used
to do the heavy lifting:

- NAPI is used for guest side TX (host side RX)
- kthread is used for guest side RX (host side TX)

Xenvif and xen_netbk are made into one structure to reduce code size.

This model provides better scheduling fairness among vifs. It is also a
prerequisite for implementing multiqueue for Xen netback.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Mon, 26 Aug 2013 11:59:37 +0000 (12:59 +0100)]
xen-netback: remove page tracking facility

The data flow from DomU to DomU on the same host in current copying
scheme with tracking facility:

       copy
DomU --------> Dom0          DomU
 |                            ^
 |____________________________|
             copy

The page in Dom0 is a page with a valid MFN, so we can always copy from
the Dom0 page, which removes the need for a tracking facility.

       copy           copy
DomU --------> Dom0 -------> DomU

Simple iperf test shows no performance regression (obviously we copy
twice either way):

  W/  tracking: ~5.3Gb/s
  W/o tracking: ~5.4Gb/s

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Matt Wilson <msw@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Antonio Quartulli [Fri, 12 Jul 2013 22:06:00 +0000 (00:06 +0200)]
batman-adv: send GW_DEL event when the gw client mode is deselected

Whenever the GW client mode is deselected, a DEL event has
to be sent in order to tell userspace that the current
gateway has been lost. Send the uevent on state change only
if a gateway was currently selected.

Reported-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Simon Wunderlich [Sun, 21 Jul 2013 21:03:15 +0000 (23:03 +0200)]
batman-adv: Start new development cycle

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Antonio Quartulli [Sat, 17 Aug 2013 10:44:44 +0000 (12:44 +0200)]
batman-adv: move enum definition at the top of the file

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Simon Wunderlich [Mon, 29 Jul 2013 15:56:44 +0000 (17:56 +0200)]
batman-adv: set skb priority according to content

The skb priority field may help the wireless driver to choose the right
queue (e.g. WMM queues). This should be set in batman-adv, as this
information is only available here.

This patch adds support for IPv4/IPv6 DS fields and VLAN PCP. Note that
only VLAN PCP is used if a VLAN header is present. Also initially set
TC_PRIO_CONTROL only for self-generated packets, and keep the priority
set by higher layers.
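
A minimal sketch of the selection rule described above, under assumptions: the
commit message does not say how the DS field is reduced to a priority, so
taking its top three bits is a guess, and all function names are hypothetical.
The PCP position (top three bits of the 16-bit VLAN TCI) follows IEEE 802.1Q.

```c
#include <stdint.h>

/* PCP occupies the top 3 bits of the 16-bit VLAN TCI. */
static uint32_t prio_from_vlan_tci(uint16_t tci)
{
    return (tci >> 13) & 0x7;
}

/* Assumption: use the top 3 bits of the DS field (IPv4 TOS byte or
 * IPv6 traffic class) as the priority class. */
static uint32_t prio_from_dsfield(uint8_t dsfield)
{
    return (dsfield & 0xfc) >> 5;
}

/* If a VLAN header is present, only its PCP is used; otherwise fall
 * back to the IP DS field. */
static uint32_t pick_priority(int has_vlan, uint16_t tci, uint8_t dsfield)
{
    return has_vlan ? prio_from_vlan_tci(tci) : prio_from_dsfield(dsfield);
}
```

For example, a DS field of 0xb8 (DSCP 46) maps to class 5 here, but a packet
carrying a VLAN tag with PCP 1 gets class 1 regardless of its DS field.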

Signed-off-by: Simon Wunderlich <simon@open-mesh.com>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
David S. Miller [Wed, 28 Aug 2013 02:11:18 +0000 (22:11 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch

Jesse Gross says:

====================
A number of significant new features and optimizations for net-next/3.12.
Highlights are:
 * "Megaflows", an optimization that allows userspace to specify which
   flow fields were used to compute the results of the flow lookup.
   This allows for a major reduction in flow setups (the major
   performance bottleneck in Open vSwitch) without reducing flexibility.
 * Converting netlink dump operations to use RCU, allowing for
   additional parallelism in userspace.
 * Matching and modifying SCTP protocol fields.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Aug 2013 02:07:02 +0000 (22:07 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter updates for your net-next tree,
they are:

* The new SYNPROXY target for iptables, including IPv4 and IPv6 support,
  from Patrick McHardy.

* nf_defrag_ipv6.o should be only linked to nf_defrag_ipv6.ko, from
  Nathan Hintz.

* Fix an old bug in REJECT, which replies with wrong MAC source address
  from the bridge, by Phil Oester.

* Fix uninitialized helper variable in the expectation support over
  nfnetlink_queue, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 28 Aug 2013 01:56:22 +0000 (21:56 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
More refactoring and cleanup, particularly around filter management.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Tue, 27 Aug 2013 09:47:26 +0000 (11:47 +0200)]
netfilter: ctnetlink: fix uninitialized variable

net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_nfqueue_attach_expect':
'helper' may be used uninitialized in this function

It was only initialized if the CTA_EXPECT_HELP_NAME attribute was
present; it must be NULL otherwise.

Problem added recently in bd077937
(netfilter: nfnetlink_queue: allow to attach expectations to conntracks).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Tue, 27 Aug 2013 06:50:16 +0000 (08:50 +0200)]
netfilter: add IPv6 SYNPROXY target

Add an IPv6 version of the SYNPROXY target. The main differences from the
IPv4 version are routing and IP header construction.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonet: syncookies: export cookie_v6_init_sequence/cookie_v6_check
Patrick McHardy [Tue, 27 Aug 2013 06:50:15 +0000 (08:50 +0200)]
net: syncookies: export cookie_v6_init_sequence/cookie_v6_check

Extract the local TCP stack independent parts of tcp_v6_init_sequence()
and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: add SYNPROXY core/target
Patrick McHardy [Tue, 27 Aug 2013 06:50:14 +0000 (08:50 +0200)]
netfilter: add SYNPROXY core/target

Add a SYNPROXY for netfilter. The code is split into two parts, the synproxy
core with common functions and an address family specific target.

The SYNPROXY receives the connection request from the client, responds with
a SYN/ACK containing a SYN cookie and announcing a zero window and checks
whether the final ACK from the client contains a valid cookie.

It then establishes a connection to the original destination and, if
successful, sends a window update to the client with the window size
announced by the server.

Support for timestamps, SACK, window scaling and MSS options can be
statically configured as target parameters if the features of the server
are known. If timestamps are used, the timestamp value sent back to
the client in the SYN/ACK will be different from the real timestamp of
the server. In order not to break PAWS, the timestamps are translated in
the direction server->client.
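
A minimal sketch of such a timestamp translation, assuming a single
per-connection offset; the struct and function names are hypothetical and
this is not the synproxy code. The server->client direction is the one
described above. Undoing the offset on the client's echo field is an
assumption added here for completeness.

```c
#include <stdint.h>

/* Hypothetical per-connection state: offset between the timestamp the
 * proxy sent in its SYN/ACK and the server's real timestamp. */
struct ts_xlate {
    uint32_t tsoff;
};

/* Record the offset once the server's first timestamp is known. */
void ts_xlate_init(struct ts_xlate *x, uint32_t proxy_tsval,
                   uint32_t server_tsval)
{
    x->tsoff = proxy_tsval - server_tsval;   /* wraps modulo 2^32 */
}

/* server->client: shift TSval so the client sees one monotonic clock. */
uint32_t ts_server_to_client(const struct ts_xlate *x, uint32_t server_tsval)
{
    return server_tsval + x->tsoff;
}

/* Assumption: the client's echoed value (TSecr) is shifted back before it
 * reaches the server. */
uint32_t ts_client_echo_to_server(const struct ts_xlate *x,
                                  uint32_t client_tsecr)
{
    return client_tsecr - x->tsoff;
}
```

Unsigned 32-bit arithmetic keeps the translation consistent across
timestamp wraparound.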

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonet: syncookies: export cookie_v4_init_sequence/cookie_v4_check
Patrick McHardy [Tue, 27 Aug 2013 06:50:13 +0000 (08:50 +0200)]
net: syncookies: export cookie_v4_init_sequence/cookie_v4_check

Extract the local TCP stack independent parts of tcp_v4_init_sequence()
and cookie_v4_check() and export them for use by the upcoming SYNPROXY
target.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nf_conntrack: make sequence number adjustments usable without NAT
Patrick McHardy [Tue, 27 Aug 2013 06:50:12 +0000 (08:50 +0200)]
netfilter: nf_conntrack: make sequence number adjustments usable without NAT

Split out sequence number adjustments from NAT and move them to the conntrack
core to make them usable for SYN proxying. The sequence number adjustment
information is moved to a separate extension. The extension is added to new
conntracks when a NAT mapping is set up for a connection using a helper.

As a side effect, this saves 24 bytes per connection with NAT in the common
case that a connection does not have a helper assigned.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Martin Topholm <mph@one.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: nf_defrag_ipv6.o included twice
Nathan Hintz [Fri, 23 Aug 2013 05:09:12 +0000 (22:09 -0700)]
netfilter: nf_defrag_ipv6.o included twice

'nf_defrag_ipv6' is built as a separate module; it shouldn't be
included in the 'nf_conntrack_ipv6' module as well.

Signed-off-by: Nathan Hintz <nlhintz@hotmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agonetfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged
Phil Oester [Wed, 26 Jun 2013 21:16:28 +0000 (17:16 -0400)]
netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridged

As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT
with the tcp-reset option sends out reset packets with the src MAC address
of the local bridge interface, instead of the MAC address of the intended
destination.  This causes some routers/firewalls to drop the reset packet
as it appears to be spoofed.  Fix this by bypassing ip[6]_local_out and
setting the MAC of the sender in the tcp reset packet.

This closes netfilter bugzilla #531.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
11 years agosfc: Use extended MC_CMD_SENSOR_INFO and MC_CMD_READ_SENSORS
Ben Hutchings [Thu, 8 Aug 2013 10:14:20 +0000 (11:14 +0100)]
sfc: Use extended MC_CMD_SENSOR_INFO and MC_CMD_READ_SENSORS

We need to use extended requests to read and get metadata for sensors
numbered > 31.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Return an error code when a sensor is busy.
Alexandre Rames [Wed, 3 Jul 2013 08:47:34 +0000 (09:47 +0100)]
sfc: Return an error code when a sensor is busy.

[bwh: Also name this new state, though we don't expect to see it in an event]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Add support for reading packet length from prefix
Ben Hutchings [Sat, 27 Apr 2013 00:55:18 +0000 (01:55 +0100)]
sfc: Add support for reading packet length from prefix

Define a flag for struct efx_rx_buffer and efx_rx_packet() that
indicates packet length must be read from the prefix.  If this
is set, read the length in __efx_rx_packet() (when the prefix
should have arrived in cache).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Add TX merged completion counter
Ben Hutchings [Sat, 27 Apr 2013 00:55:21 +0000 (01:55 +0100)]
sfc: Add TX merged completion counter

Add a counter for TX merged completion events.

This is implemented in the common TX path, because the NIC event
handlers only know how many descriptors were completed, not how many
packets.
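
A rough sketch of how such a counter could sit in a common completion path.
The ring layout, field names and tx_complete() are hypothetical, not the sfc
driver's. The event reports only the last completed descriptor index, so the
common code walks the ring, counts packet boundaries, and treats more than
one packet per event as a merged completion.

```c
#include <stdbool.h>

#define RING_SIZE 8u   /* hypothetical ring size, power of two */

struct tx_queue {
    bool last_of_packet[RING_SIZE];  /* set on a packet's final descriptor */
    unsigned int read_count;         /* next descriptor to be completed */
    unsigned long pkts_completed;
    unsigned long merge_events;      /* events that completed > 1 packet */
};

/* Handle one completion event covering descriptors up to 'index'. */
void tx_complete(struct tx_queue *q, unsigned int index)
{
    unsigned int stop = (index + 1) & (RING_SIZE - 1);
    unsigned int pkts = 0;

    while ((q->read_count & (RING_SIZE - 1)) != stop) {
        if (q->last_of_packet[q->read_count & (RING_SIZE - 1)])
            pkts++;
        q->read_count++;
    }

    q->pkts_completed += pkts;
    if (pkts > 1)
        q->merge_events++;
}
```

For simplicity the sketch does not handle an event that completes the
entire ring at once.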

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Generalise packet hash lookup to support EF10 RX prefix
Jon Cooper [Thu, 18 Oct 2012 14:49:54 +0000 (15:49 +0100)]
sfc: Generalise packet hash lookup to support EF10 RX prefix

EF10 uses an entirely different RX prefix format from Falcon-arch.
Extend struct efx_nic_type to describe this.

[bwh: Also replace the magic numbers used for the Falcon-arch RX prefix]
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Rename EFX_PAGE_BLOCK_SIZE to EFX_VI_PAGE_SIZE and adjust comments
Ben Hutchings [Fri, 28 Jun 2013 19:14:46 +0000 (20:14 +0100)]
sfc: Rename EFX_PAGE_BLOCK_SIZE to EFX_VI_PAGE_SIZE and adjust comments

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Remove early call to efx_nic_type::reconfigure_mac in efx_reset_up()
Ben Hutchings [Wed, 26 Jun 2013 23:13:07 +0000 (00:13 +0100)]
sfc: Remove early call to efx_nic_type::reconfigure_mac in efx_reset_up()

efx_reset_up() calls efx_nic_type::reconfigure_mac once directly,
then again through efx_start_all() -> efx_start_port() ->
efx->type->reconfigure_mac().

This first call is also made too early to work properly on EF10.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: use MCDI epoch flag to improve MC reboot detection in the driver
Daniel Pieczko [Thu, 20 Jun 2013 10:40:07 +0000 (11:40 +0100)]
sfc: use MCDI epoch flag to improve MC reboot detection in the driver

The Huntington MC will reject all MCDI requests after an MC reboot until it sees
one with the NOT_EPOCH flag clear.  This flag is set by default for all requests,
and then cleared on the first request after we detect that an MC reboot has
occurred.

The old MCDI_STATUS_DELAY_COUNT gave a timeout of 10ms, which was not long enough
for the driver to detect that a reboot had occurred based on the warm boot count
while calling efx_mcdi_poll_reboot() from the loop in efx_mcdi_ev_death().
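
The flag handling described in the first paragraph might be sketched as
follows. The header layout, bit position and function names are
hypothetical. NOT_EPOCH is set on every request except the first one sent
after a reboot has been detected.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_NOT_EPOCH (1u << 0)   /* hypothetical bit position */

struct mcdi_state {
    bool new_epoch;   /* true once a reboot was seen and no request sent yet */
};

/* Called when the driver notices that the MC has rebooted. */
void mcdi_reboot_detected(struct mcdi_state *m)
{
    m->new_epoch = true;
}

/* Build a request header; the command number is placed above the flag bits. */
uint32_t mcdi_build_header(struct mcdi_state *m, uint32_t cmd)
{
    uint32_t hdr = cmd << 8;

    if (m->new_epoch)
        m->new_epoch = false;      /* first request of the new epoch: clear */
    else
        hdr |= HDR_NOT_EPOCH;      /* default: flag set */
    return hdr;
}
```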

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Add EF10 support for TX/RX DMA error events handling.
Alexandre Rames [Thu, 13 Jun 2013 10:36:15 +0000 (11:36 +0100)]
sfc: Add EF10 support for TX/RX DMA error events handling.

Also, since we handle all DMA errors in the same way, merge
RESET_TYPE_(RX|TX)_DESC_FETCH into RESET_TYPE_DMA_ERROR.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Add a function pointer to abstract write of host time into NIC shared memory
Laurence Evans [Thu, 7 Mar 2013 11:46:58 +0000 (11:46 +0000)]
sfc: Add a function pointer to abstract write of host time into NIC shared memory

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: PTP MCDI requests need to initialise periph ID field
Laurence Evans [Wed, 6 Mar 2013 15:33:17 +0000 (15:33 +0000)]
sfc: PTP MCDI requests need to initialise periph ID field

This field is ignored by Siena firmware but is significant to EF10 firmware.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Delegate MAC/NIC statistic description to efx_nic_type
Ben Hutchings [Fri, 14 Dec 2012 21:52:56 +0000 (21:52 +0000)]
sfc: Delegate MAC/NIC statistic description to efx_nic_type

Various hardware statistics that are available for Siena are
unavailable or meaningless for Falcon.  Huntington adds further to the
NIC-type-specific statistics, as it has different MAC blocks from
Falcon/Siena.

All NIC types still provide most statistics by DMA, and use
little-endian byte order.

Therefore:
1. Add some general utility functions for reporting hardware statistics,
   efx_nic_describe_stats() and efx_nic_update_stats().
2. Add an efx_nic_type::describe_stats operation to get the number and
   names of statistics, implemented using efx_nic_describe_stats().
3. Change efx_nic_type::update_stats to store the core statistics
   (struct rtnl_link_stats64) or full statistics (array of u64) in a
   caller-provided buffer.  Use efx_nic_update_stats() to aid in the
   implementation.
4. Rename struct efx_ethtool_stat to struct efx_sw_stat_desc and
   EFX_ETHTOOL_NUM_STATS to EFX_ETHTOOL_SW_STAT_COUNT.
5. Remove efx_nic::mac_stats and struct efx_mac_stats.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Remove driver-local struct ethtool_string
Ben Hutchings [Fri, 14 Dec 2012 22:18:55 +0000 (22:18 +0000)]
sfc: Remove driver-local struct ethtool_string

It's not really helpful to pretend ethtool string arrays are
structured.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Remove more left-overs from Falcon GMAC support
Ben Hutchings [Fri, 14 Dec 2012 21:52:56 +0000 (21:52 +0000)]
sfc: Remove more left-overs from Falcon GMAC support

We only ever used the XMAC (10G link speed) in production.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agosfc: Move MTD operations into efx_nic_type
Ben Hutchings [Wed, 28 Nov 2012 04:38:14 +0000 (04:38 +0000)]
sfc: Move MTD operations into efx_nic_type

Merge the per-NIC-type MTD probe selection and struct efx_mtd_ops into
struct efx_nic_type.  Move the implementations into the appropriate
source files.

Several NVRAM functions are now only called from MTD operations which
are now implemented in the same file (falcon.c or mcdi.c).  There is no
need for them to be extern, or to be defined at all if CONFIG_SFC_MTD
is not enabled, so move them into the #ifdef CONFIG_SFC_MTD sections
in those files.

Most of the SPI-related definitions are also only used in falcon.c,
so move them there.  Put the remainder of spi.h into nic.h (which
previously included it).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
11 years agoopenvswitch: optimize flow compare and mask functions
Andy Zhou [Tue, 27 Aug 2013 20:02:21 +0000 (13:02 -0700)]
openvswitch: optimize flow compare and mask functions

Make sure the sw_flow_key structure and valid mask boundaries are always
machine word aligned. Optimize the flow compare and mask operations
using machine word size operations. This patch improves throughput on
average by 15% when CPU is the bottleneck of forwarding packets.

This patch is inspired by ideas and code from a patch submitted by Peter
Klausler titled "replace memcmp() with specialized comparator".
However, the original patch only optimizes for architectures that
support unaligned machine word access. This patch optimizes for all
architectures.
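
The idea can be illustrated with a user-space sketch. The key layout and
function names below are hypothetical, not the sw_flow_key definition. The
key is aligned to a machine word and its size is a multiple of the word
size, so a comparison over [key_start, key_end) can proceed one word at a
time. memcpy() is used for the loads to stay within ISO C aliasing rules; a
kernel implementation would typically dereference word pointers directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flow key, forced to machine word alignment. */
struct flow_key {
    _Alignas(long) uint32_t in_port;
    uint32_t ipv4_src;
    uint32_t ipv4_dst;
    uint16_t tp_src;
    uint16_t tp_dst;
};

_Static_assert(sizeof(struct flow_key) % sizeof(long) == 0,
               "key size must be a multiple of the machine word size");

static unsigned long load_word(const unsigned char *p)
{
    unsigned long w;

    memcpy(&w, p, sizeof(w));
    return w;
}

/* Compare two keys over the word-aligned byte range [key_start, key_end). */
bool flow_key_equal(const struct flow_key *a, const struct flow_key *b,
                    size_t key_start, size_t key_end)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;
    unsigned long diffs = 0;
    size_t off;

    assert(key_start % sizeof(unsigned long) == 0);
    assert(key_end % sizeof(unsigned long) == 0);
    assert(key_end <= sizeof(struct flow_key));

    /* Accumulate differences instead of branching on every word. */
    for (off = key_start; off < key_end; off += sizeof(unsigned long))
        diffs |= load_word(pa + off) ^ load_word(pb + off);

    return diffs == 0;
}
```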

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoe1000e: balance semaphore put/get for 82573
Steven La [Sat, 24 Aug 2013 00:19:37 +0000 (17:19 -0700)]
e1000e: balance semaphore put/get for 82573

Steven (cc-ed) noticed an imbalance in semaphore put/get for
82573-based NICs. Don't we need something like the following
(untested) patch?

Signed-off-by: Steven La <sla@riverbed.com>
Acked-by: Arthur Kepner <akepner@riverbed.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoDocumentation/networking/: Update Intel wired LAN driver documentation
Jeff Kirsher [Sat, 24 Aug 2013 00:19:23 +0000 (17:19 -0700)]
Documentation/networking/: Update Intel wired LAN driver documentation

Updates the documentation to the Intel wired LAN drivers.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobna: firmware update to 3.2.1.1
Rasesh Mody [Fri, 23 Aug 2013 21:31:30 +0000 (14:31 -0700)]
bna: firmware update to 3.2.1.1

This patch updates the firmware to address the thermal notification issue.

Signed-off-by: Rasesh Mody <rmody@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoVMXNET3: Add support for virtual IOMMU
Andy King [Fri, 23 Aug 2013 16:33:49 +0000 (09:33 -0700)]
VMXNET3: Add support for virtual IOMMU

This patch adds support for virtual IOMMU to the vmxnet3 module.  We
switch to DMA consistent mappings for anything we pass to the device.
There were a few places where we already did this, but using pci_blah();
these have been fixed to use dma_blah(), along with all new occurrences
where we've replaced kmalloc() and friends.

Also fix two small bugs:
1) use after free of rq->buf_info in vmxnet3_rq_destroy()
2) a cpu_to_le32() that should have been a cpu_to_le64()

Acked-by: George Zhang <georgezhang@vmware.com>
Acked-by: Aditya Sarwade <asarwade@vmware.com>
Signed-off-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: implement ethtool set/get_channel hooks
Sathya Perla [Tue, 27 Aug 2013 11:27:35 +0000 (16:57 +0530)]
be2net: implement ethtool set/get_channel hooks

Support is provided only for combined channels. When SR-IOV is not
enabled, BE3 supports up to 16 channels and Lancer-R/SH-R support up to
32 channels.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: refactor be_setup() to consolidate queue creation routines
Sathya Perla [Tue, 27 Aug 2013 11:27:34 +0000 (16:57 +0530)]
be2net: refactor be_setup() to consolidate queue creation routines

1) Move be_cmd_if_create() above queue create routines to allow
   TXQ creation (that requires if_handle) to be clubbed with TX-CQ creation.
2) Consolidate all queue create routines into be_setup_queues()

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: Fix be_cmd_if_create() to use MBOX if MCCQ is not created
Sathya Perla [Tue, 27 Aug 2013 11:27:33 +0000 (16:57 +0530)]
be2net: Fix be_cmd_if_create() to use MBOX if MCCQ is not created

Currently the IF_CREATE FW cmd is issued only *after* MCCQ is created as
it was coded to only use MCCQ. By fixing this, cmd_if_create() can be
called before MCCQ is created and the same routine for VF provisioning
can be called after.
This allows for consolidating all the queue create routines by moving
the be_cmd_if_create() call above all queue create calls in be_setup().

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: refactor be_get_resources() code
Sathya Perla [Tue, 27 Aug 2013 11:27:32 +0000 (16:57 +0530)]
be2net: refactor be_get_resources() code

1) use be_resources{} struct to query/store HW resource limits
2) The HW queue/resource limits for BE2/BE3 chips are mostly called out
   in driver as constants.  Code to handle this is scattered across various
   places in be_setup(). Consolidate this code into BEx_get_resources().
   For Lancer-R, Skyhawk-R, these limits are queried from FW.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: Fixup profile management routines
Vasundhara Volam [Tue, 27 Aug 2013 11:27:31 +0000 (16:57 +0530)]
be2net: Fixup profile management routines

1) Parse PCIe descriptor for max-VFs supported by HW
2) Cleanup NIC descriptor parsing in get_func/profile_config() routines
3) Use common struct definitions for v0 and v1 versions of GET_FUNC_CONFIG

Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobe2net: use EQ_CREATEv2 for SH-R
Sathya Perla [Tue, 27 Aug 2013 11:27:30 +0000 (16:57 +0530)]
be2net: use EQ_CREATEv2 for SH-R

EQ_CREATEv2 explicitly returns the msix-index associated with an EQ.
For SH-R this is needed if EQs need to be deleted and re-created without
resetting a function.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: tcp_probe: allow more advanced ingress filtering by mark
Daniel Borkmann [Fri, 23 Aug 2013 14:16:33 +0000 (16:16 +0200)]
net: tcp_probe: allow more advanced ingress filtering by mark

Currently, the tcp_probe snooper can either filter packets by a given
port (handed to the module via module parameter e.g. port=80) or lets
all TCP traffic pass (port=0, default). When a port is specified, the
port number is tested against the sk's source/destination port. Thus,
if one of them matches, the information will be further processed for
the log.

As this is quite limited, allow for more advanced filtering possibilities
which can facilitate debugging/analysis with the help of the tcp_probe
snooper. Therefore, similar to what was added to the BPF machine in commit
7e75f93e ("pkt_sched: ingress socket filter by mark"), add the possibility
to use skb->mark as a filter.

If the mark is not being used otherwise, this allows ingress filtering
by flow (e.g. in order to track updates from only a single flow, or a
subset of all flows for a given port) and other things such as dynamic
logging and reconfiguration without removing/re-inserting the tcp_probe
module, etc. Simple example:

  insmod net/ipv4/tcp_probe.ko fwmark=8888 full=1
  ...
  iptables -A INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888
  [... sampling interval ...]
  iptables -D INPUT -i eth4 -t mangle -p tcp --dport 22 \
           --sport 60952 -j MARK --set-mark 8888

The current option to filter by a given port is still being preserved. A
similar approach could be done for the sctp_probe module as a follow-up.
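
The combined filter can be sketched as a plain predicate. The function name
and parameters are hypothetical, and this is an illustration of the
behaviour described above rather than the module's code: with neither port
nor fwmark set everything passes, otherwise a packet is logged if either
filter matches.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * port and fwmark mirror the module parameters (0 means "not set").
 * sport/dport are the socket's ports, mark is the packet's skb->mark.
 */
bool probe_wants(uint16_t port, uint32_t fwmark,
                 uint16_t sport, uint16_t dport, uint32_t mark)
{
    if (port == 0 && fwmark == 0)
        return true;                      /* no filter: log all TCP traffic */
    if (port != 0 && (sport == port || dport == port))
        return true;                      /* existing port filter */
    return fwmark != 0 && mark == fwmark; /* new mark filter */
}
```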

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: Update version to 5.3.49.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:29 +0000 (13:38 -0400)]
qlcnic: Update version to 5.3.49.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Add support for CEE Netlink interface.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:28 +0000 (13:38 -0400)]
qlcnic: dcb: Add support for CEE Netlink interface.

o Adapter and driver support only CEE dcbnl ops. Only GET callbacks
  within dcbnl ops are supported currently.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Register DCB AEN handler.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:27 +0000 (13:38 -0400)]
qlcnic: dcb: Register DCB AEN handler.

o Adapter sends Asynchronous Event Notifications to the driver when
  there are changes in the switch or adapter DCBX configuration.
  AEN handler updates the driver DCBX parameters.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Get DCB parameters from the adapter.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:26 +0000 (13:38 -0400)]
qlcnic: dcb: Get DCB parameters from the adapter.

o Populate driver data structures with local, operational, and peer
  DCB parameters.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoqlcnic: dcb: Query adapter DCB capabilities.
Sucheta Chakraborty [Fri, 23 Aug 2013 17:38:25 +0000 (13:38 -0400)]
qlcnic: dcb: Query adapter DCB capabilities.

o Query adapter DCB capabilities and populate local data structures
  with relevant information.

o Add QLCNIC_DCB to Kconfig for enabling/disabling DCB.

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Tue, 27 Aug 2013 16:16:20 +0000 (12:16 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Ben Hutchings says:

====================
1. Refactoring and cleanup in preparation for new hardware support.
2. Some bug fixes for firmware completion handling.  (They're not known
to cause real problems, otherwise I'd be submitting these for net and
stable.)
3. Update to the firmware protocol (MCDI) definitions.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoopenvswitch: Rename key_len to key_end
Andy Zhou [Thu, 22 Aug 2013 19:12:57 +0000 (12:12 -0700)]
openvswitch: Rename key_len to key_end

Key_end is a better name describing the ending boundary than key_len.
Rename those variables to make it less confusing.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Add SCTP support
Joe Stringer [Thu, 22 Aug 2013 19:30:48 +0000 (12:30 -0700)]
openvswitch: Add SCTP support

This patch adds support for rewriting SCTP src,dst ports similar to the
functionality already available for TCP/UDP.

Rewriting SCTP ports is expensive due to double-recalculation of the
SCTP checksums; this is performed to ensure that packets traversing OVS
with invalid checksums will continue to the destination with any
checksum corruption intact.
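
A sketch of the double recalculation, under simplifying assumptions: the
packet model is hypothetical and keeps the checksum outside the bytes it
covers, and crc32c() is a plain bitwise CRC-32C, not the kernel helper. The
checksum is computed once before and once after the rewrite, and the stored
value is updated with old ^ old_correct ^ new_correct, so a valid checksum
stays valid and a corrupted one stays corrupted by the same bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical packet: bytes[0..1] src port, bytes[2..3] dst port, then
 * payload.  The checksum is stored separately for simplicity. */
struct sctp_pkt {
    uint8_t bytes[8];
    uint32_t csum;
};

void rewrite_src_port(struct sctp_pkt *p, uint16_t port)
{
    uint32_t old_correct = crc32c(p->bytes, sizeof(p->bytes)); /* 1st pass */
    uint32_t old_csum = p->csum;
    uint32_t new_correct;

    p->bytes[0] = (uint8_t)(port >> 8);
    p->bytes[1] = (uint8_t)(port & 0xFFu);

    new_correct = crc32c(p->bytes, sizeof(p->bytes));          /* 2nd pass */

    /* Carry any pre-existing corruption over to the new checksum. */
    p->csum = old_csum ^ old_correct ^ new_correct;
}
```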

Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Mon, 26 Aug 2013 20:37:08 +0000 (16:37 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/wireless/iwlwifi/pcie/trans.c
include/linux/inetdevice.h

The inetdevice.h conflict involves moving the IPV4_DEVCONF values
into a UAPI header, overlapping additions of some new entries.

The iwlwifi conflict is a context overlap.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'cadence'
David S. Miller [Mon, 26 Aug 2013 20:04:26 +0000 (16:04 -0400)]
Merge branch 'cadence'

Boris BREZILLON says:

====================
net/cadence/macb: add support for dt phy definition

This patch series adds support for ethernet phy definition using device
tree.

This may help in moving some at91 boards to dt (some of them define an
interrupt pin).

Tested on samad31ek.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoARM: at91/dt: define phy available on sama5d3 mother board
Boris BREZILLON [Thu, 22 Aug 2013 15:58:29 +0000 (17:58 +0200)]
ARM: at91/dt: define phy available on sama5d3 mother board

This patch describes the phy used on atmel sama5d3 mother board:
 - phy address
 - phy interrupt pin

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet/cadence/macb: add support for dt phy definition
Boris BREZILLON [Thu, 22 Aug 2013 15:57:28 +0000 (17:57 +0200)]
net/cadence/macb: add support for dt phy definition

The macb driver only handles PHY description through platform_data
(macb_platform_data).
Thus, when using dt you cannot define phy properties like phy address or
phy irq pin.

This patch makes use of the of_mdiobus_register to add support for
phy device definition using dt.
A fallback to the autoscan procedure is added in case there are no phy
devices defined in dt.

Signed-off-by: Boris BREZILLON <b.brezillon@overkiz.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoipip: potential race in ip_tunnel_init_net()
Dan Carpenter [Fri, 23 Aug 2013 08:15:37 +0000 (11:15 +0300)]
ipip: potential race in ip_tunnel_init_net()

Eric Dumazet says that my previous fix for an ERR_PTR dereference
(ea857f28ab 'ipip: dereferencing an ERR_PTR in ip_tunnel_init_net()')
could be racy and suggests the following fix instead.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agobonding: fix error return code in bond_enslave()
Wei Yongjun [Fri, 23 Aug 2013 02:45:07 +0000 (10:45 +0800)]
bonding: fix error return code in bond_enslave()

Fix to return a negative error code in the add bond vlan ids error
handling case instead of 0, as done elsewhere in this function.

Introduced by commit 1ff412ad7714f6952f76ffd77f0a7f2f563288a1.
(bonding: change the bond's vlan syncing functions with the standard ones)
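
The error-handling pattern can be sketched as below. The function names are
hypothetical, and the sketch assumes the bug was a failure path that jumped
to cleanup without storing the failing call's return value. The fix stores
the value in the result variable before taking the error path, so the
caller sees a negative error code.

```c
#include <errno.h>

/* Hypothetical sub-steps used to exercise both paths. */
int step_ok(void)   { return 0; }
int step_fail(void) { return -ENOMEM; }

/*
 * Sketch of an enslave-style function with a shared error label.
 * add_vlan_ids stands in for the step whose failure used to be reported
 * as success.
 */
int enslave_sketch(int (*add_vlan_ids)(void))
{
    int res = 0;

    res = add_vlan_ids();   /* keep the error code ... */
    if (res)
        goto err;           /* ... so the error path can return it */

    return 0;

err:
    /* undo earlier setup here */
    return res;
}
```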

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agoMerge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc...
David S. Miller [Sun, 25 Aug 2013 22:30:27 +0000 (18:30 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next

Merge SFC driver changes from Ben Hutchings.

Signed-off-by: David S. Miller <davem@davemloft.net>
11 years agonet: Add NEXTHDR_SCTP to ipv6.h
Joe Stringer [Tue, 23 Jul 2013 04:37:45 +0000 (13:37 +0900)]
net: Add NEXTHDR_SCTP to ipv6.h

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Mega flow implementation
Andy Zhou [Thu, 8 Aug 2013 03:01:00 +0000 (20:01 -0700)]
openvswitch: Mega flow implementation

Add wildcarded flow support in kernel datapath.

Wildcarded flows can improve OVS flow set up performance by avoiding
sending matching new flows to the user space program. The exact performance
boost will largely depend on the wildcarded flow hit rate.

In case all new flows hit wildcard flows, the flow set up rate is
within 5% of that of the Linux bridge module.
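
A minimal sketch of wildcarded matching; the key layout and names are
hypothetical, not the datapath structures. Each flow carries a mask and a
pre-masked key, and a packet matches when its key ANDed with the mask
equals the stored key. Bits cleared in the mask are "don't care", so one
flow entry can cover many exact-match flows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_WORDS 4

struct key {
    uint32_t w[KEY_WORDS];
};

/* Hypothetical wildcarded flow: mask plus the key with the mask applied. */
struct wild_flow {
    struct key mask;
    struct key masked_key;
};

bool wild_flow_matches(const struct wild_flow *f, const struct key *pkt)
{
    size_t i;

    for (i = 0; i < KEY_WORDS; i++) {
        if ((pkt->w[i] & f->mask.w[i]) != f->masked_key.w[i])
            return false;
    }
    return true;
}
```

For example, a mask of 0xFFFFFF00 on the word holding the destination
address matches a whole /24, whatever the remaining fields contain.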

Pravin has made significant contributions to this patch, including API
cleanups and bug fixes.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: check CONFIG_OPENVSWITCH_GRE in makefile
Cong Wang [Tue, 20 Aug 2013 17:48:15 +0000 (10:48 -0700)]
openvswitch: check CONFIG_OPENVSWITCH_GRE in makefile

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Fix argument descriptions in vport.c.
Justin Pettit [Tue, 20 Aug 2013 00:49:29 +0000 (17:49 -0700)]
openvswitch: Fix argument descriptions in vport.c.

Signed-off-by: Justin Pettit <jpettit@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: link upper device for port devices
Jiri Pirko [Fri, 26 Jul 2013 12:01:54 +0000 (14:01 +0200)]
openvswitch: link upper device for port devices

Link upper device properly. That will make IFLA_MASTER get filled in.
Set the master to port 0 of the datapath to which the port belongs.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Use non rcu hlist_del() flow table entry.
Pravin B Shelar [Tue, 30 Jul 2013 22:45:59 +0000 (15:45 -0700)]
openvswitch: Use non rcu hlist_del() flow table entry.

Flow table destroy is done in rcu call-back context.  Therefore
there is no need to use rcu variant of hlist_del().

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
11 years agoopenvswitch: Use RCU lock for dp dump operation.
Pravin B Shelar [Tue, 30 Jul 2013 22:42:19 +0000 (15:42 -0700)]
openvswitch: Use RCU lock for dp dump operation.

RCUfy dp-dump operation which is already read-only. This
makes all ovs dump operations lockless.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>