git.karo-electronics.de Git - linux-beck.git/log

net: Make enabling of zero UDP6 csums more restrictive

RFC 6935 permits zero checksums to be used in IPv6 however this is
recommended only for certain tunnel protocols, it does not make
checksums completely optional like they are in IPv4.

This patch restricts the use of IPv6 zero checksums that was previously
intoduced. no_check6_tx and no_check6_rx have been added to control
the use of checksums in UDP6 RX and TX path. The normal
sk_no_check_{rx,tx} settings are not used (this avoids ambiguity when
dealing with a dual stack socket).

A helper function has been added (udp_set_no_check6) which can be
called by tunnel impelmentations to all zero checksums (send on the
socket, and accept them as valid).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Split sk_no_check into sk_no_check_{rx,tx}

Define separate fields in the sock structure for configuring disabling
checksums in both TX and RX-- sk_no_check_tx and sk_no_check_rx.
The SO_NO_CHECK socket option only affects sk_no_check_tx. Also,
removed UDP_CSUM_* defines since they are no longer necessary.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Eliminate no_check from protosw

It doesn't seem like an protocols are setting anything other
than the default, and allowing to arbitrarily disable checksums
for a whole protocol seems dangerous. This can be done on a per
socket basis.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sunrpc: Remove sk_no_check setting

Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to igb, igbvf, ixgbe, i40e and i40evf.

Jacob provides eight patches to cleanup the ixgbe driver to resolve various
checkpatch.pl warnings/errors as well as minor coding style issues.

Stephen Hemminger and I provide simple cleanups of void functions which
had useless return statements at the end of the function which are not
needed.

v2: Dropped Emil's patch "ixgbe: fix the detection of SFP+ capable interfaces"
while I wait for his updated patch to be validated.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvneta-next'

Ezequiel Garcia says:

====================
net: ethernet: marvell: Assorted fixes

New round for this assorted fixes and clean-up series. There is more room for
clean-ups, and I'll start preparing more patches once these are accepted.

This series consists of cleanups and minor improvements on mvneta, mv643xx_eth
and mvmdio drivers. None of the patches imply any functionality change, except
for the patch six "Change the number of default rx queues to one".

This patch reduces the driver's allocated resources and makes the multiqueue
path in the poll function not get taken. The previous patchset contains more
details:

  http://permalink.gmane.org/gmane.linux.network/315015

As usual, any feedback on this will be well received!

Changes from v2:

  * Rebased on today's net-next and dropped patch
    "net: mvneta: Factorize feature setting", merged in the recent
    TSO series.

  * As per Sergei suggestion, used devm_kcalloc or devm_kmalloc_array
    when suitable.

Changes from v1:

  * Added two more clean-up patches to the series.

  * Added Sebastian's Acked-by's.

  * Fixed extra empty line in "net: mv643xx_eth: Simplify
    mv643xx_eth_adjust_link()" as pointed out by David Miller.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Remove unneeded 'weigth' field

The 'weight' field is only used to pass the weigth to napi initialization
function. This commit removes the field, and instead uses a fixed value to
initialize the napi context.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvmdio: Use devm_* API to simplify the code

This commit makes use of devm_kmalloc_array() for memory allocation and the
recently introduced devm_mdiobus_alloc() API to simplify driver's code.
While here, remove a redundant out of memory error message.

Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Change the number of default rx queues to one

The driver does not support multiple rx queues, and so it's a waste
of resources to have a default number larger than one (1).

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Use prepare/commit API to simplify MAC address setting

Use eth_prepare_mac_addr_change and eth_commit_mac_addr_change, instead
of manually checking and storing the MAC address, which makes the
code slightly more robust. This fixes the lack of valid MAC address check
in the driver's .ndo_set_mac_address hook.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Clean-up mvneta_init()

This commit cleans-up mvneta_init(), which initializes the hardware
and allocates the rx/qx queues. The queue allocation is simplified
by using devm_kcalloc instead of kzalloc. The unused phy_addr parameter
is removed. While here, the 'hal' references in the comments are removed.
This commit makes no functionality change.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Check tx queue setup error in mvneta_change_mtu()

This commit checks the return code of mvneta_setup_txq() call
in mvneta_change_mtu(). Also, use the netdevice pointer directly
instead of dereferencing the port structure. While here, let's
fix a tiny comment typo.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Clean-up mvneta_tx_frag_process()

A tiny clean-up to improve readability. This commit makes no functionality
change.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Simplify mv643xx_eth_adjust_link()

Currently, mv643xx_eth_adjust_link() is only used to call mv643xx_adjust_pscr().
This commit renames the latter to the former, and therefore removes the extra
and useless function.

Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lib/test_bpf.c: don't use gcc union shortcut

Older gcc's (mine is gcc-4.4.4) make a mess of this.

lib/test_bpf.c:74: error: unknown field 'insns' specified in initializer
lib/test_bpf.c:75: warning: missing braces around initializer
lib/test_bpf.c:75: warning: (near initialization for 'tests[0].<anonymous>.insns[0]')
lib/test_bpf.c:76: error: extra brace group at end of initializer
lib/test_bpf.c:76: error: (near initialization for 'tests[0].<anonymous>')
lib/test_bpf.c:76: warning: excess elements in union initializer
lib/test_bpf.c:76: warning: (near initialization for 'tests[0].<anonymous>')
lib/test_bpf.c:77: error: extra brace group at end of initializer

Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next:v4: Add support to configure SR-IOV VF minimum and maximum Tx rate through ip tool.

o min_tx_rate puts lower limit on the VF bandwidth. VF is guaranteed
  to have a bandwidth of at least this value.
  max_tx_rate puts cap on the VF bandwidth. VF can have a bandwidth
  of up to this value.

o A new handler set_vf_rate for attr IFLA_VF_RATE has been introduced
  which takes 4 arguments:
  netdev, VF number, min_tx_rate, max_tx_rate

o ndo_set_vf_rate replaces ndo_set_vf_tx_rate handler.

o Drivers that currently implement ndo_set_vf_tx_rate should now call
  ndo_set_vf_rate instead and reject attempt to set a minimum bandwidth
  greater than 0 for IFLA_VF_TX_RATE when IFLA_VF_RATE is not yet
  implemented by driver.

o If user enters only one of either min_tx_rate or max_tx_rate, then,
  userland should read back the other value from driver and set both
  for IFLA_VF_RATE.
  Drivers that have not yet implemented IFLA_VF_RATE should always
  return min_tx_rate as 0 when read from ip tool.

o If both IFLA_VF_TX_RATE and IFLA_VF_RATE options are specified, then
  IFLA_VF_RATE should override.

o Idea is to have consistent display of rate values to user.

o Usage example: -

  ./ip link set p4p1 vf 0 rate 900

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 900 (Mbps), max_tx_rate 900Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a

  ./ip link set p4p1 vf 0 max_tx_rate 300 min_tx_rate 200

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f0 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5a, tx rate 300 (Mbps), max_tx_rate 300Mbps,
    min_tx_rate 200Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a

  ./ip link set p4p1 vf 0 max_tx_rate 600 rate 300

  ./ip link show p4p1
  32: p4p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
  DEFAULT qlen 1000
    link/ether 00:0e:1e:08:b0:f brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 3e:a0:ca:bd:ae:5, tx rate 600 (Mbps), max_tx_rate 600Mbps,
    min_tx_rate 200Mbps
    vf 1 MAC f6:c6:7c:3f:3d:6c
    vf 2 MAC 56:32:43:98:d7:71
    vf 3 MAC d6:be:c3:b5:85:ff
    vf 4 MAC ee:a9:9a:1e:19:14
    vf 5 MAC 4a:d0:4c:07:52:18
    vf 6 MAC 3a:76:44:93:62:f9
    vf 7 MAC 82:e9:e7:e3:15:1a

Signed-off-by: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hyperv: Add hash value into RNDIS Per-packet info

It passes the hash value as the RNDIS Per-packet info to the Hyper-V host,
so that the send completion notices can be spread across multiple channels.
MS-TFS: 140273

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pshelar/openvswitch

Pravin B Shelar says:

====================
Open vSwitch

A set of OVS changes for net-next/3.16.

Most of change are related to improving performance of flow setup by
minimizing critical sections.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e,igb,ixgbe: remove usless return statements

Remove cases where useless bare return is left at end of function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb/ixgbe: remove return statements for void functions

Remove useless return statements for void functions which do not need
it.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>

ixgbe: add /* fallthrough */ comment to case statements

This semicomplex switch-case has various fallthrough portions, that were
not indicated by a /* fallthrough */ comment.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add space between operands to &

This patch cleans up a checkpatch.pl style warning in the ixgbe code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: don't check NULL for debugfs_remove_recursive

The debugfs_remove_recursive function is NULL-safe, so we don't need to
check here ourselves.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: add braces around else block

This commit fixes a checkpatch.pl warning for style, by adding braces
around the else block, since the if block requires braces.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix several concatenated strings to single line

This patch fixes various log strings that are split over multiple lines
in the ixgbe driver. This cleans up checkpatch.pl warnings, and makes it
easier to search the code for warning strings displayed to the kernel
log.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix checkpatch style of blank line after declaration

This patch fixes checkpatch warnings in ixgbe, by adding a blank line
between declaration and code blocks.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix function-like macro, remove semicolon

This patch removes the semicolon from the end of the do-while(0)
construct in two function-like macros.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: clean up checkpatch warnings about CODE_INDENT and LEADING_SPACE

The contents of this patch were originally generated by
"scripts/checkpatch.pl --fix-inplace --types CODE_INDENT,LEADING_SPACE
drivers/net/ethernet/ixgbe/*.[ch]", and then hand verified for
consistency.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

openvswitch: Simplify genetlink code.

Following patch get rid of struct genl_family_and_ops which is
redundant due to changes to struct genl_family.

Signed-off-by: Kyle Mestery <mestery@noironetworks.com>
Acked-by: Kyle Mestery <mestery@noironetworks.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Minimize ovs_flow_cmd_new|set critical sections.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Split ovs_flow_cmd_new_or_set().

Following patch will be easier to reason about with separate
ovs_flow_cmd_new() and ovs_flow_cmd_set() functions.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Minimize ovs_flow_cmd_del critical section.

ovs_flow_cmd_del() now allocates reply (if needed) after the flow has
already been removed from the flow table. If the reply allocation
fails, a netlink error is signaled with netlink_set_err(), as is
already done in ovs_flow_cmd_new_or_set() in the similar situation.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Reduce locking requirements.

Reduce and clarify locking requirements for ovs_flow_cmd_alloc_info(),
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info().

A datapath pointer is available only when holding a lock.  Change
ovs_flow_cmd_fill_info() and ovs_flow_cmd_build_info() to take a
dp_ifindex directly, rather than a datapath pointer that is then
(only) used to get the dp_ifindex.  This is useful, since the
dp_ifindex is available even when the datapath pointer is not, both
before and after taking a lock, which makes further critical section
reduction possible.

Make ovs_flow_cmd_alloc_info() take an 'acts' argument instead a
'flow' pointer.  This allows some future patches to do the allocation
before acquiring the flow pointer.

The locking requirements after this patch are:

ovs_flow_cmd_alloc_info(): May be called without locking, must not be
called while holding the RCU read lock (due to memory allocation).
If 'acts' belong to a flow in the flow table, however, then the
caller must hold ovs_mutex.

ovs_flow_cmd_fill_info(): Either ovs_mutex or RCU read lock must be held.

ovs_flow_cmd_build_info(): This calls both of the above, so the caller
must hold ovs_mutex.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Fix ovs_flow_stats_get/clear RCU dereference.

For ovs_flow_stats_get() using ovsl_dereference() was wrong, since
flow dumps call this with RCU read lock.

ovs_flow_stats_clear() is always called with ovs_mutex, so can use
ovsl_dereference().

Also, make the ovs_flow_stats_get() 'flow' argument const to make
later patches cleaner.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Fix typo.

Incorrect struct name was confusing, even though otherwise
inconsequental.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Minimize dp and vport critical sections.

Move most memory allocations away from the ovs_mutex critical
sections. vport allocations still happen while the lock is taken, as
changing that would require major refactoring. Also, vports are
created very rarely so it should not matter.

Change ovs_dp_cmd_get() now only takes the rcu_read_lock(), rather
than ovs_lock(), as nothing need to be changed. This was done by
ovs_vport_cmd_get() already.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Make flow mask removal symmetric.

Masks are inserted when flows are inserted to the table, so it is
logical to correspondingly remove masks when flows are removed from
the table, in ovs_flow_table_remove().

This allows ovs_flow_free() to be called without locking, which will
be used by later patches.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Build flow cmd netlink reply only if needed.

Use netlink_has_listeners() and NLM_F_ECHO flag to determine if a
reply is needed or not for OVS_FLOW_CMD_NEW, OVS_FLOW_CMD_SET, or
OVS_FLOW_CMD_DEL. Currently, OVS userspace does not request a reply
for OVS_FLOW_CMD_NEW, but usually does for OVS_FLOW_CMD_DEL, as stats
may have changed.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Clarify locking.

Remove unnecessary locking from functions that are always called with
appropriate locking.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Avoid assigning a NULL pointer to flow actions.

Flow SET can accept an empty set of actions, with the intended
semantics of leaving existing actions unmodified. This seems to have
been brokin after OVS 1.7, as we have assigned the flow's actions
pointer to NULL in this case, but we never check for the NULL pointer
later on. This patch restores the intended behavior and documents it
in the include/linux/openvswitch.h.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

openvswitch: Compact sw_flow_key.

Minimize padding in sw_flow_key and move 'tp' top the main struct.
These changes simplify code when accessing the transport port numbers
and the tcp flags, and makes the sw_flow_key 8 bytes smaller on 64-bit
systems (128->120 bytes).  These changes also make the keys for IPv4
packets to fit in one cache line.

There is a valid concern for safety of packing the struct
ovs_key_ipv4_tunnel, as it would be possible to take the address of
the tun_id member as a __be64 * which could result in unaligned access
in some systems. However:

- sw_flow_key itself is 64-bit aligned, so the tun_id within is
  always
  64-bit aligned.
- We never make arrays of ovs_key_ipv4_tunnel (which would force
  every
  second tun_key to be misaligned).
- We never take the address of the tun_id in to a __be64 *.
- Whereever we use struct ovs_key_ipv4_tunnel outside the
  sw_flow_key,
  it is in stack (on tunnel input functions), where compiler has full
  control of the alignment.

Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>

Merge branch 'mlx4-next'

Amir Vadai says:

====================
net/mlx4_core: Deprecate module parameter use_prio

This small patchset deprecates the mlx4_core module paramater 'use_prio', as
suggested by Carol Soto from IBM in [1].
Also, replaced some calls to the prefered pr_warn/info/devel macro's.

Patchset was applied and tested on commit b6052af: "Merge tag
'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge"

[1] - http://marc.info/?l=linux-netdev&m=139871350103432&w=2
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Replace pr_warning() with pr_warn()

As checkpatch suggests. Also changed some printk's into pr_*

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Deprecate use_prio module parameter

use_prio was added as part of an infrastructure for running FCoE in A0 mode.
FCoE didn't get into Mellanox Upstream driver, and when it will, it won't be
using A0 steering mode.

Therefore we can safely deprecate this module parameter without hurting any
existing user.

CC: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2014-05-22

This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?

I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.

1) Simplify the xfrm audit handling, from Tetsuo Handa.

2) Codingstyle cleanup for xfrm_output, from abian Frederick.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: missing put_dev() on error

We should call put_dev() on the error path here.

Fixes: 3e9c156e2c21 ('ieee802154: add netlink interfaces for llsec')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cdc_ncm: fix typo in test for supported formats

There is a typo here where we test for USB_CDC_NCM_NTH32_SIGN instead
of USB_CDC_NCM_NTB32_SUPPORTED. The test probably still works as
written because 0x686D636E has (1 << 1) set and doesn't have (1 << 0)
set.

Fixes: f8afb73da375 ('net: cdc_ncm: factor out one-time device initialization')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xilinx: Use time_before_eq()

To be future-proof and for better readability the time comparisons are modified
to use time_before_eq() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

micrel: Use time_before_eq()

To be future-proof and for better readability the time comparisons are modified
to use time_before_eq() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Use mod_timer()

The code for resetting the timer can be simplified if mod_timer() is used
instead of del_timer() followed by add_timer().

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Use time_before()

To be future-proof and for better readability the time comparisons are modified
to use time_before() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlogic: Use time_before()

To be future-proof and for better readability the time comparisons are modified
to use time_before() instead of plain, error-prone math.

Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bonding'

Veaceslav Falico says:

====================
bonding: fix enslaving a dev without mtu setting support

With the introduction of bond_free_slave() we need to have slave->bond
populated before calling it, however if the dev_mtu_set(slave, mtu) fails,
we call bond_free_slave() before actually setting slave->bond, and thus
we'll panic.

Fix this by populating slave->bond (and ->dev, it seems appropriate) as
early as possible.

Also, remove a harmful check for NULL in bond_get_bond_by_slave(), as it's
only hiding the real problem and making it harder to debug.
====================

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove NULL verification from bond_get_bond_by_slave()

Every caller relies on the result being the actual bond, so this
verification just masks the real problem.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: populate essential new_slave->bond/dev early

The new bond_free_slave() needs new_slave->bond to verify if additional
structures were allocated, so populate it early so that, in case of failure
in bond_enslave(), we would be able to get it.

Also populate the new_slave->dev field, as it's too one of the most needed
things to assign early.

CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'genphy'

Sascha Hauer says:

====================
make of_set_phy_supported work with genphy driver

The mdio phys recently gained support for specifying the phy speed
via devicetree. This currently only works with the hardware specific
phy drivers but not with the genphy driver. This series fixes this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: make of_set_phy_supported work with genphy driver

of_set_phy_supported allows overwiting hardware capabilities of
a phy with values from the devicetree. of_set_phy_supported is
called right after phy_device_register in the assumption that
phy_probe is called from phy_device_register and the features
of the phy are already initialized. For the genphy driver this
is not true, here phy_probe is called later during phy_connect
time. phy_probe will then overwrite all settings done from
of_set_phy_supported
Fix this by moving of_set_phy_supported to the core phy code
and calling it from phy_probe.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: genphy: Allow overwriting features

of_set_phy_supported allows overwiting hardware capabilities of
a phy with values from the devicetree. This does not work with
the genphy driver though because the genphys config_init function
will overwrite all values adjusted by of_set_phy_supported. Fix
this by initialising the genphy features in the phy_driver struct
and in config_init just limit the features to the ones the hardware
can actually support. The resulting features are a subset of the
devicetree specified features and the hardware features.

Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-next-for-3.16-20140521' of git://gitorious.org/linux-can/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2014-05-19

this is a pull request of a single patch for net-next/master. It fixes a
use after free(), which slipped into to gs_usb driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: make br_device_notifier static

Merge net/bridge/br_notify.c into net/bridge/br.c,
since it has only br_device_event() and br.c is small.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dccp/timer.c: use 'u64' instead of 's64' to avoid compiler's warning

'dccp_timestamp_seed' is initialized once by ktime_get_real() in
dccp_timestamping_init(). It is always less than ktime_get_real()
in dccp_timestamp().

Then, ktime_us_delta() in dccp_timestamp() will always return positive
number. So can use manual type cast to let compiler and do_div() know
about it to avoid warning.

The related warning (with allmodconfig under unicore32):

    CC [M]  net/dccp/timer.o
  net/dccp/timer.c: In function ‘dccp_timestamp’:
  net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: llsec: correctly lookup implicit-indexed keys

Key id comparison for type 1 keys (implicit source, with index) should
return true if mode and id are equal, not false.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mac80211'

Phoebe Buckheister says:

====================
mac802154: llsec oversights

Fixes an unlock operation not matching a previous lock operation in an
unlikely error path and removes a redundant check.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: llsec: fold useless return value check

llsec_do_encrypt will never return a positive value, so the restriction
to 0-or-negative on return is useless.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: llsec: fix incorrect lock pairing

In encrypt, sec->lock is taken with read_lock_bh, so in the error path,
we must read_unlock_bh.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: davinci_emac: fix oops caused by uninitialized ndev->dev

Commit e194312854edc22a2faf1931b3c0608fe20cb969 (drivers: net:
davinci_cpdma: Convert kzalloc() to devm_kzalloc()) triggered
a bug in emac_probe() wherein dev member of net_device is used
for devres allocations even before it is initialized.

This patch fixes that by using the struct device in platform_device
instead.

While at it, use &pdev->dev consistently for console messages instead
of using ndev->dev for just one case and remove an unnecessary line
continuation.

Reported-by: Kevin Hilman <khilman@linaro.org>
Helped-by: George Cherian <george.cherian@ti.com>
Signed-off-by: Sekhar Nori <nsekhar@ti.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fixed_phy'

Florian Fainelli says:

====================
net: of_phy_connect_fixed_link removal

This patch set removes of_phy_connect_fixed_link() from the tree now that
we have a better solution for dealing with fixed PHY (emulated PHY) devices
for drivers that require them.

First two patches update the 'fixed-link' Device Tree binding and drivers to
refere to it.

Patches 3 to 7 update the in-tree network drivers that use
of_phy_connect_fixed_link()

Patch 8 removes of_phy_connect_fixed_link

Patch 9 removes the PowerPC code that parsed the 'fixed-link' property.

Patch 9 can be merged via the net-next tree if the PowerPC folks ack it,
but it really has to be merged after the first 8 patches in order to avoid
breakage.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

powerpc/fsl: fsl_soc: remove 'fixed-link' parsing code

Parsing and registration of fixed PHY devices was needed with the use of
of_phy_connect_fixed_link() because this function was using the
designated PHY address identifier (first cell of the property) as the
address to bind the PHY on the emulated bus.

Since commit 3be2a49e5c08d268f8af0dd4fe89a24ea8cdc339 ("of: provide a
binding for fixed link PHYs") a new pair of functions has been
introduced which allows for dynamic address allocation of these fixed
PHY devices, but also parses the old 'fixed-link' 5-digit property.

Registration of fixed PHY early in platform code was needed because we
could not issue a fixed MDIO bus re-scan within network drivers. The
fixed PHYs had to be registered before the network drivers would call
of_phy_connect_fixed_link(). All of these caveats are solved now, such
that we can safely remove of_add_fixed_phys() now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of: mdio: remove of_phy_connect_fixed_link

All in-tree drivers have been converted to use the new pair of
functions: of_is_fixed_phy_link() plus of_phy_register_fixed_link(), we
can now safely remove of_phy_connect_fixed_link.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ucc_geth: use the new fixed PHY helpers

of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: use the new fixed PHY helpers

of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fs_enet: use the new fixed PHY helpers

of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: use the new fixed PHY helpers

of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: use the new fixed PHY helpers

of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: devicetree: net: refer to fixed-link.txt

Update the Freescale TSEC PHY, Broadcom GENET & SYSTEMPORT Device Tree
binding documentation to refer to the fixed-link Device Tree binding in
fixed-link.txt.

Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: devicetree: add old and deprecated 'fixed-link'

Update the fixed-link Device Tree binding documentation to contain
information about the old and deprecated 5-digit 'fixed-link' property.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vlan_features'

Michal Kubecek says:

====================
net: device features handling fixes

> I think we need to think more about what exact behavior is desired in
> mixed csum feature set cases. Probably whatever csum offload type is
> the most prevelant should be the one advertised by the master. It
> means we will do that software checksum fixup for the oddball slaves,
> but it seems the best we can do.

I've been thinking about it some more and I believe what I proposed is
correct: features that are or-ed (ONE_FOR_ALL) should start with 0,
those which are and-ed (ALL_FOR_ALL) with 1.

But once we do that, we have to fix another problem in vlan module:
features of a vlan device are mostly computed as bitwise AND of features
and vlan_features of its underlying device. The problem is checksumming
features don't really behave like independent flags as their main
purpose is to be used in can_checksum_protocol() whose logic is that
HW_CSUM (GEN_CSUM) means "can checksum everything" so that it kind of
contains IP(V6)_CSUM functionality. Therefore intersection of HW_CSUM
and IP(V6)_CSUM should result in the latter.

An alternative approach would be to always set IP(V6)_CSUM once HW_CSUM
(any of GEN_CSUM) is set but as for long time people were taught not to
do that and we even have a warning for it, switching to the exact
opposite could cause a lot of problems.

> Also, like Jay, I agree that whatever we do in bonding needs to be
> done in team too and I'd like the bug fixed at the same time in both
> places.

Agreed. Added a similar patch for teaming driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

teaming: fix vlan_features computing

__team_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the team.
As netdev_increment_features() only adds most features and we
start with TEAM_VLAN_FEATURES, we can end up with features none
of the slaves provided.

Initialize vlan_features only with the flags which are both in
TEAM_VLAN_FEATURES and NETIF_F_ALL_FOR_ALL. Right now there is
no such feature so that we actually initialize vlan_features
with zero but stating it explicitely will make the code more
future proof.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix vlan_features computing

bond_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the bond.
As netdev_increment_features() only adds most features and we
start with BOND_VLAN_FEATURES, we can end up with features none
of the slaves provided.

If there is at least one slave, initialize vlan_features only
with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none
in BOND_VLAN_FEATURES but stating it explicitely will make the
code more future proof.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: more careful checksum features handling

When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.

Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: correct the MDIO clock source

Since imx serials FEC/ENET MDIO clock source is internal ipg clock,
and "ahb" clock is defined as FEC/ENET bus clock, so the patch just
correct the fec driver MDIO clock source.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Frank Li <frank.li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: optimize the clock management to save power

Add below clock management to save fec power:
- After probe, disable all clocks incluing ipg, ahb, enet_out, ptp clock.
- Open ethx interface enable necessary clocks.
Close ethx interface disable all clocks.

The patch also encapsulates the all enet clocks enable/disable to
.fec_enet_clk_enable(), which can reduce the repetitional code in
driver.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Acked-by: Frank Li <Frank.li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sw_tso'

Ezequiel Garcia says:

====================
net: Introduce a software TSO helper API

Here's a first proposal for a generic software TSO helper API, following
David's suggestion.

There are at least two drivers that currently implement some form of software
TSO: Solarflare network driver (drivers/net/ethernet/sfc) and Tilera GX
network driver (drivers/net/ethernet/tile/tilegx.c).

The rationale behind adding a generic API is to provide a boiler plate with the
segmentation and other common tasks, making this support easier to add in other
drivers.

When designing the API, I've considered mainly two design choices:

  1. Implement a series of callbacks that each driver would implement
     and the net core code would call to fill in descriptors and egress
     that data.

  2. Implement an API for drivers to use in a driver's specific tx_tso
     function. This functions would exhaust a sk_buff payload, and use the
     API as helper for building the headers and segmented data.

I've chosen (2), to avoid function pointers (which was Willy's concern) and
because it seemed less fragile. Of course, this is argueable.

The API is by no means complete, and lacks some features, however it allows
to support TSO in mv643xx_eth and mvneta network drivers with some very
good performance results. I've added this support as an example of the API
in action.

In particular the following needs some revisiting:

  1. IPv6 support is lacking.

  2. The required descriptor counting needs some verification. The current
     implementation might be too "sketchy". The tilegx one can be a good
     starting point.

  3. The implemenation assumes the hardware can compute the TCP and IP
     checksums for the built headers. However, some controllers may need
     some initial calculation (such as tilegx, for instance).

Despite this, I hope this proposal is good enough to trigger some discussion
and to check if I'm on the right track. Feedback is much appreciated!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Implement software TSO

Now that the TSO helper API has been introduced, this commit makes use
of it to add support for software TSO in this driver.

This feature allows to improve outbound throughput performance significantly.
Running iperf tests shows a 30% improvement, tested on a Kirkwood Openblocks
A6 board.

$ ethtool -K eth0 tso off
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.159 port 46389 connected with 192.168.0.45 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   217 MBytes   607 Mbits/sec

$ ethtool -K eth0 tso on
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[  3] local 192.168.0.159 port 46390 connected with 192.168.0.45 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 3.0 sec   336 MBytes   938 Mbits/sec

This commit is just an example of the usage of the TSO API, it works fine
but needs some more work. In particular, the descriptor unmapping path must
avoid unmapping the TSO headers.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Use dma_map_single() to map the skb fragments

Using dma_map_single() instead of skb_frag_dma_map() allows to unmap
all the descriptors using dma_unmap_single(). This change allows
to introduce software TSO in a less intrusive way.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Factorize feature setting

In order to ease the addition of new features, let's factorize the
feature list.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Avoid setting the initial TCP checksum

As specified in the datasheet, the driver can set the "L4Chk_Mode" flag
(bit 10) in the Tx descriptor command/status to specify that a frame is not
IP fragmented and that the controller is in charge of generating the TCP/IP
checksum. This must be used together with the "GL4chk" flag (bit 17).

These two flags allow to avoid setting the initial TCP checksum in the l4i_chk
field of the Tx descriptor, which is needed to support software TSO.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Factorize initial checksum and command preparation

Make the code more readable by moving the initial checksum setup
and the command/status preparation to its own function.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Implement software TSO

Now that the TSO helper API has been introduced, this commit makes use
of it to implement the TSO in this driver.

Using iperf to test and vmstat to check the CPU usage, shows a substantial
CPU usage drop when TSO is on (~15% vs. ~25%). HTTP-based tests performed
by Willy Tarreau have shown performance improvements.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Clean mvneta_tx() sk_buff handling

Rework mvneta_tx() so that the code that performs the final handling
before a sk_buff is transmitted is done only if the numbers of fragments
processed if positive.

This is preparation work to add the support for software TSO.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Factorize feature setting

In order to ease the addition of new features, let's factorize the
feature list.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add a software TSO helper API

Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables

Pablo Neira Ayuso says:

====================
Netfilter/nftables updates for net-next

The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:

1) Add set element update notification via netlink, from Arturo Borrero.

2) Put all object updates in one single message batch that is sent to
   kernel-space. Before this patch only rules where included in the batch.
   This series also introduces the generic transaction infrastructure so
   updates to all objects (tables, chains, rules and sets) are applied in
   an all-or-nothing fashion, these series from me.

3) Defer release of objects via call_rcu to reduce the time required to
   commit changes. The assumption is that all objects are destroyed in
   reverse order to ensure that dependencies betweem them are fulfilled
   (ie. rules and sets are destroyed first, then chains, and finally
   tables).

4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
   include two patches to prepare this new feature.

5) Implement the proper set selection based on the characteristics of the
   data. The new infrastructure also allows you to specify your preferences
   in terms of memory and computational complexity so the underlying set
   type is also selected according to your needs, from Patrick McHardy.

6) Several cleanup patches for nft expressions, including one minor possible
   compilation breakage due to missing mark support, also from Patrick.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and i40evf.

Shannon makes minor changes to the AdminQ interface to bring it up to
date.  Removes the hard coding of stats struct size in ethtool, in prep
for adding data fields which are configuration dependent.

Catherine removes some unused and unneeded PCI bus defines.

Jesse fixes the copyright headers and finishes up the removal of the PTP
Tx work functionality which allows us to rely on the Tx timesync interrupt.

Mitch provides a number of fixes and cleanups for i40e/i40evf based on
suggestions from Ben Hutchings.  First is to use a macro parameter for
ethtool stats instead of just assuming that a valid netdev variable
exists.  Second is not to tell ethtool that the VF can do 10GbaseT, when
it really has no idea what its link speed is, so set the supported value
to 0 instead.  Make the ethtool_ops structure constant since it is
extremely unlikely to change at runtime.  Ethtool consistently reports
0 values for our ITR settings because we never actually use them, so
fix this by setting the default values to the specified default values.

Greg avoids a compile error by wrapping the call to i40e_alloc_vfs() in
CONFIG_PCI_IOV because the function itself is wrapped in the same
conditional compile block.

Alexander Gordeev updates the driver to use the new pci_enable_msi_range()
and pci_enable_msix_range() or pci_enable_msi_exact() and
pci_enable_msix_exact().

Jean Sacren provides a fix where the wrong error code was being passed to
i40e_open().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: make cwnd-limited checks measurement-based, and gentler

Experience with the recent e114a710aa50 ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.

The main problems seemed to be:

(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.

(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).

This commit improves both of these, by:

(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.

(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wimax/i2400m: make return of 0 explicit

Delete unnecessary local variable whose value is always 0 and that hides
the fact that the result is always 0.

A simplified version of the semantic patch that fixes this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r exists@
local idexpression ret;
expression e;
position p;
@@

-ret = 0;
... when != ret = e
return
- ret
+ 0
;
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: cleanup invocation of internal BPF

Kernel API for classic BPF socket filters is:

sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter

Cleanup internal BPF kernel API as following:

sk_filter_select_runtime() - final step of internal BPF creation.
  Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program

Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.

Example of internal BPF create, run, destroy:

  struct sk_filter *fp;

  fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
  memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
  fp->len = prog_len;

  sk_filter_select_runtime(fp);

  SK_RUN_FILTER(fp, ctx);

  sk_filter_free(fp);

Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'enic-next'

Govindarajulu Varadarajan says:

====================
enic: Add adaptive coalescing interrupt support

This series add support for adaptive coalescing interrupt and updates
enic Maintainers.

v1->v2:
* Add commit log
* do vnic_intr_coalescing_timer_set only while enabling intr
* use ktime_get instead of hrtimer
* make enic_set_rx_coal_setting return type void
* change func name enic_apply_int_moderation to enic_calc_int_moderation
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: Update enic maintainers

Cc: Sujith Sankar <ssujith@cisco.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: Add support for adaptive interrupt coalescing

This patch adds support for adaptive interrupt coalescing.

For small pkts with low pkt rate, we can decrease the coalescing interrupt
dynamically which decreases the latency. This however increases the cpu
utilization. Based on testing with different coal intr and pkt rate we came up
with a table(mod_table) with rx_rate and coalescing interrupt value where we
get low latency without significant increase in cpu. mod_table table stores
the coalescing timer percentage value for different throughputs.

Function enic_calc_int_moderation() calculates the desired coalescing intr timer
value. This function is called in driver rx napi_poll. The actual value is set
by enic_set_int_moderation() which is called when napi_poll is complete. i.e
when we unmask the rx intr.

Adaptive coal intr is support only when driver is using msix intr. Because
intr is not shared.

Struct mod_range is used to store only the default adaptive coalescing intr
value.

Adaptive coal intr calue is calculated by

timer = range_start + ((rx_coal->range_end - range_start) *
mod_table[index].range_percent / 100);

rx_coal->range_end is the rx-usecs-high value set using ethtool.
range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is
greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its
rx-usecs-low + 3.

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>