use_prio was added as part of an infrastructure for running FCoE in A0 mode.
FCoE didn't get into Mellanox Upstream driver, and when it will, it won't be
using A0 steering mode.
Therefore we can safely deprecate this module parameter without hurting any
existing user.
CC: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is the last ipsec pull request before I leave for
a three weeks vacation tomorrow. David, can you please
take urgent ipsec patches directly into net/net-next
during this time?
I'll continue to run the ipsec/ipsec-next trees as soon
as I'm back.
1) Simplify the xfrm audit handling, from Tetsuo Handa.
2) Codingstyle cleanup for xfrm_output, from abian Frederick.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 22 May 2014 07:53:06 +0000 (10:53 +0300)]
ieee802154: missing put_dev() on error
We should call put_dev() on the error path here.
Fixes: 3e9c156e2c21 ('ieee802154: add netlink interfaces for llsec') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 22 May 2014 07:52:29 +0000 (10:52 +0300)]
net: cdc_ncm: fix typo in test for supported formats
There is a typo here where we test for USB_CDC_NCM_NTH32_SIGN instead
of USB_CDC_NCM_NTB32_SUPPORTED. The test probably still works as
written because 0x686D636E has (1 << 1) set and doesn't have (1 << 0)
set.
Fixes: f8afb73da375 ('net: cdc_ncm: factor out one-time device initialization') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 May 2014 19:46:38 +0000 (15:46 -0400)]
Merge branch 'bonding'
Veaceslav Falico says:
====================
bonding: fix enslaving a dev without mtu setting support
With the introduction of bond_free_slave() we need to have slave->bond
populated before calling it, however if the dev_mtu_set(slave, mtu) fails,
we call bond_free_slave() before actually setting slave->bond, and thus
we'll panic.
Fix this by populating slave->bond (and ->dev, it seems appropriate) as
early as possible.
Also, remove a harmful check for NULL in bond_get_bond_by_slave(), as it's
only hiding the real problem and making it harder to debug.
====================
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 21 May 2014 15:42:01 +0000 (17:42 +0200)]
bonding: remove NULL verification from bond_get_bond_by_slave()
Every caller relies on the result being the actual bond, so this
verification just masks the real problem.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Wed, 21 May 2014 15:42:00 +0000 (17:42 +0200)]
bonding: populate essential new_slave->bond/dev early
The new bond_free_slave() needs new_slave->bond to verify if additional
structures were allocated, so populate it early so that, in case of failure
in bond_enslave(), we would be able to get it.
Also populate the new_slave->dev field, as it's too one of the most needed
things to assign early.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@gmail.com> Acked-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 May 2014 19:43:27 +0000 (15:43 -0400)]
Merge branch 'genphy'
Sascha Hauer says:
====================
make of_set_phy_supported work with genphy driver
The mdio phys recently gained support for specifying the phy speed
via devicetree. This currently only works with the hardware specific
phy drivers but not with the genphy driver. This series fixes this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Wed, 21 May 2014 13:29:45 +0000 (15:29 +0200)]
net: phy: make of_set_phy_supported work with genphy driver
of_set_phy_supported allows overwiting hardware capabilities of
a phy with values from the devicetree. of_set_phy_supported is
called right after phy_device_register in the assumption that
phy_probe is called from phy_device_register and the features
of the phy are already initialized. For the genphy driver this
is not true, here phy_probe is called later during phy_connect
time. phy_probe will then overwrite all settings done from
of_set_phy_supported
Fix this by moving of_set_phy_supported to the core phy code
and calling it from phy_probe.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Sascha Hauer [Wed, 21 May 2014 13:29:44 +0000 (15:29 +0200)]
net: phy: genphy: Allow overwriting features
of_set_phy_supported allows overwiting hardware capabilities of
a phy with values from the devicetree. This does not work with
the genphy driver though because the genphys config_init function
will overwrite all values adjusted by of_set_phy_supported. Fix
this by initialising the genphy features in the phy_driver struct
and in config_init just limit the features to the ones the hardware
can actually support. The resulting features are a subset of the
devicetree specified features and the hardware features.
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 21 May 2014 00:30:00 +0000 (17:30 -0700)]
bridge: make br_device_notifier static
Merge net/bridge/br_notify.c into net/bridge/br.c,
since it has only br_device_event() and br.c is small.
Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen Gang [Wed, 21 May 2014 00:19:34 +0000 (08:19 +0800)]
net/dccp/timer.c: use 'u64' instead of 's64' to avoid compiler's warning
'dccp_timestamp_seed' is initialized once by ktime_get_real() in
dccp_timestamping_init(). It is always less than ktime_get_real()
in dccp_timestamp().
Then, ktime_us_delta() in dccp_timestamp() will always return positive
number. So can use manual type cast to let compiler and do_div() know
about it to avoid warning.
The related warning (with allmodconfig under unicore32):
CC [M] net/dccp/timer.o
net/dccp/timer.c: In function ‘dccp_timestamp’:
net/dccp/timer.c:285: warning: comparison of distinct pointer types lacks a cast
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In encrypt, sec->lock is taken with read_lock_bh, so in the error path,
we must read_unlock_bh.
Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sekhar Nori [Tue, 20 May 2014 10:11:37 +0000 (15:41 +0530)]
net: davinci_emac: fix oops caused by uninitialized ndev->dev
Commit e194312854edc22a2faf1931b3c0608fe20cb969 (drivers: net:
davinci_cpdma: Convert kzalloc() to devm_kzalloc()) triggered
a bug in emac_probe() wherein dev member of net_device is used
for devres allocations even before it is initialized.
This patch fixes that by using the struct device in platform_device
instead.
While at it, use &pdev->dev consistently for console messages instead
of using ndev->dev for just one case and remove an unnecessary line
continuation.
Reported-by: Kevin Hilman <khilman@linaro.org> Helped-by: George Cherian <george.cherian@ti.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Acked-by: Mugunthan V N <mugunthanvnm@ti.com> Tested-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch set removes of_phy_connect_fixed_link() from the tree now that
we have a better solution for dealing with fixed PHY (emulated PHY) devices
for drivers that require them.
First two patches update the 'fixed-link' Device Tree binding and drivers to
refere to it.
Patches 3 to 7 update the in-tree network drivers that use
of_phy_connect_fixed_link()
Patch 8 removes of_phy_connect_fixed_link
Patch 9 removes the PowerPC code that parsed the 'fixed-link' property.
Patch 9 can be merged via the net-next tree if the PowerPC folks ack it,
but it really has to be merged after the first 8 patches in order to avoid
breakage.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Parsing and registration of fixed PHY devices was needed with the use of
of_phy_connect_fixed_link() because this function was using the
designated PHY address identifier (first cell of the property) as the
address to bind the PHY on the emulated bus.
Since commit 3be2a49e5c08d268f8af0dd4fe89a24ea8cdc339 ("of: provide a
binding for fixed link PHYs") a new pair of functions has been
introduced which allows for dynamic address allocation of these fixed
PHY devices, but also parses the old 'fixed-link' 5-digit property.
Registration of fixed PHY early in platform code was needed because we
could not issue a fixed MDIO bus re-scan within network drivers. The
fixed PHYs had to be registered before the network drivers would call
of_phy_connect_fixed_link(). All of these caveats are solved now, such
that we can safely remove of_add_fixed_phys() now.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:50 +0000 (09:47 -0700)]
of: mdio: remove of_phy_connect_fixed_link
All in-tree drivers have been converted to use the new pair of
functions: of_is_fixed_phy_link() plus of_phy_register_fixed_link(), we
can now safely remove of_phy_connect_fixed_link.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:49 +0000 (09:47 -0700)]
ucc_geth: use the new fixed PHY helpers
of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:48 +0000 (09:47 -0700)]
gianfar: use the new fixed PHY helpers
of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:47 +0000 (09:47 -0700)]
fs_enet: use the new fixed PHY helpers
of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:46 +0000 (09:47 -0700)]
net: systemport: use the new fixed PHY helpers
of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:45 +0000 (09:47 -0700)]
net: bcmgenet: use the new fixed PHY helpers
of_phy_connect_fixed_link() is becoming obsolete, and also required
platform code to register the fixed PHYs at the specified addresses for
those to be usable. Get rid of it and use the new of_phy_is_fixed_link()
plus of_phy_register_fixed_link() helpers to transition over the new
scheme.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 22 May 2014 16:47:44 +0000 (09:47 -0700)]
Documentation: devicetree: net: refer to fixed-link.txt
Update the Freescale TSEC PHY, Broadcom GENET & SYSTEMPORT Device Tree
binding documentation to refer to the fixed-link Device Tree binding in
fixed-link.txt.
Reviewed-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 May 2014 19:07:31 +0000 (15:07 -0400)]
Merge branch 'vlan_features'
Michal Kubecek says:
====================
net: device features handling fixes
> I think we need to think more about what exact behavior is desired in
> mixed csum feature set cases. Probably whatever csum offload type is
> the most prevelant should be the one advertised by the master. It
> means we will do that software checksum fixup for the oddball slaves,
> but it seems the best we can do.
I've been thinking about it some more and I believe what I proposed is
correct: features that are or-ed (ONE_FOR_ALL) should start with 0,
those which are and-ed (ALL_FOR_ALL) with 1.
But once we do that, we have to fix another problem in vlan module:
features of a vlan device are mostly computed as bitwise AND of features
and vlan_features of its underlying device. The problem is checksumming
features don't really behave like independent flags as their main
purpose is to be used in can_checksum_protocol() whose logic is that
HW_CSUM (GEN_CSUM) means "can checksum everything" so that it kind of
contains IP(V6)_CSUM functionality. Therefore intersection of HW_CSUM
and IP(V6)_CSUM should result in the latter.
An alternative approach would be to always set IP(V6)_CSUM once HW_CSUM
(any of GEN_CSUM) is set but as for long time people were taught not to
do that and we even have a warning for it, switching to the exact
opposite could cause a lot of problems.
> Also, like Jay, I agree that whatever we do in bonding needs to be
> done in team too and I'd like the bug fixed at the same time in both
> places.
Agreed. Added a similar patch for teaming driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Tue, 20 May 2014 06:29:45 +0000 (08:29 +0200)]
teaming: fix vlan_features computing
__team_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the team.
As netdev_increment_features() only adds most features and we
start with TEAM_VLAN_FEATURES, we can end up with features none
of the slaves provided.
Initialize vlan_features only with the flags which are both in
TEAM_VLAN_FEATURES and NETIF_F_ALL_FOR_ALL. Right now there is
no such feature so that we actually initialize vlan_features
with zero but stating it explicitely will make the code more
future proof.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Tue, 20 May 2014 06:29:35 +0000 (08:29 +0200)]
bonding: fix vlan_features computing
bond_compute_features() uses netdev_increment_features() to
combine vlan_features of slaves into vlan_features of the bond.
As netdev_increment_features() only adds most features and we
start with BOND_VLAN_FEATURES, we can end up with features none
of the slaves provided.
If there is at least one slave, initialize vlan_features only
with the flags in NETIF_F_ALL_FOR_ALL. Right now there is none
in BOND_VLAN_FEATURES but stating it explicitely will make the
code more future proof.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Tue, 20 May 2014 06:29:25 +0000 (08:29 +0200)]
vlan: more careful checksum features handling
When combining real_dev's features and vlan_features, simple
bitwise AND is used. This doesn't work well for checksum
offloading features as if one set has NETIF_F_HW_CSUM and the
other NETIF_F_IP_CSUM and/or NETIF_F_IPV6_CSUM, we end up with
no checksum offloading. However, from the logical point of view
(how can_checksum_protocol() works), NETIF_F_HW_CSUM contains
the functionality of NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM so
that the result should be IP/IPV6.
Add helper function netdev_intersect_features() implementing
this logic and use it in vlan_dev_fix_features().
Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
Nimrod Andy [Tue, 20 May 2014 05:23:09 +0000 (13:23 +0800)]
net: fec: correct the MDIO clock source
Since imx serials FEC/ENET MDIO clock source is internal ipg clock,
and "ahb" clock is defined as FEC/ENET bus clock, so the patch just
correct the fec driver MDIO clock source.
Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Frank Li <frank.li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nimrod Andy [Tue, 20 May 2014 05:22:51 +0000 (13:22 +0800)]
net: fec: optimize the clock management to save power
Add below clock management to save fec power:
- After probe, disable all clocks incluing ipg, ahb, enet_out, ptp clock.
- Open ethx interface enable necessary clocks.
Close ethx interface disable all clocks.
The patch also encapsulates the all enet clocks enable/disable to
.fec_enet_clk_enable(), which can reduce the repetitional code in
driver.
Signed-off-by: Fugang Duan <B38611@freescale.com> Acked-by: Frank Li <Frank.li@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 May 2014 18:57:40 +0000 (14:57 -0400)]
Merge branch 'sw_tso'
Ezequiel Garcia says:
====================
net: Introduce a software TSO helper API
Here's a first proposal for a generic software TSO helper API, following
David's suggestion.
There are at least two drivers that currently implement some form of software
TSO: Solarflare network driver (drivers/net/ethernet/sfc) and Tilera GX
network driver (drivers/net/ethernet/tile/tilegx.c).
The rationale behind adding a generic API is to provide a boiler plate with the
segmentation and other common tasks, making this support easier to add in other
drivers.
When designing the API, I've considered mainly two design choices:
1. Implement a series of callbacks that each driver would implement
and the net core code would call to fill in descriptors and egress
that data.
2. Implement an API for drivers to use in a driver's specific tx_tso
function. This functions would exhaust a sk_buff payload, and use the
API as helper for building the headers and segmented data.
I've chosen (2), to avoid function pointers (which was Willy's concern) and
because it seemed less fragile. Of course, this is argueable.
The API is by no means complete, and lacks some features, however it allows
to support TSO in mv643xx_eth and mvneta network drivers with some very
good performance results. I've added this support as an example of the API
in action.
In particular the following needs some revisiting:
1. IPv6 support is lacking.
2. The required descriptor counting needs some verification. The current
implementation might be too "sketchy". The tilegx one can be a good
starting point.
3. The implemenation assumes the hardware can compute the TCP and IP
checksums for the built headers. However, some controllers may need
some initial calculation (such as tilegx, for instance).
Despite this, I hope this proposal is good enough to trigger some discussion
and to check if I'm on the right track. Feedback is much appreciated!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 17:00:00 +0000 (14:00 -0300)]
net: mv643xx_eth: Implement software TSO
Now that the TSO helper API has been introduced, this commit makes use
of it to add support for software TSO in this driver.
This feature allows to improve outbound throughput performance significantly.
Running iperf tests shows a 30% improvement, tested on a Kirkwood Openblocks
A6 board.
$ ethtool -K eth0 tso off
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.159 port 46389 connected with 192.168.0.45 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 3.0 sec 217 MBytes 607 Mbits/sec
$ ethtool -K eth0 tso on
$ iperf -c 192.168.0.45 -t 3
------------------------------------------------------------
Client connecting to 192.168.0.45, TCP port 5001
TCP window size: 43.8 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.0.159 port 46390 connected with 192.168.0.45 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 3.0 sec 336 MBytes 938 Mbits/sec
This commit is just an example of the usage of the TSO API, it works fine
but needs some more work. In particular, the descriptor unmapping path must
avoid unmapping the TSO headers.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 16:59:59 +0000 (13:59 -0300)]
net: mv643xx_eth: Use dma_map_single() to map the skb fragments
Using dma_map_single() instead of skb_frag_dma_map() allows to unmap
all the descriptors using dma_unmap_single(). This change allows
to introduce software TSO in a less intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 16:59:57 +0000 (13:59 -0300)]
net: mv643xx_eth: Avoid setting the initial TCP checksum
As specified in the datasheet, the driver can set the "L4Chk_Mode" flag
(bit 10) in the Tx descriptor command/status to specify that a frame is not
IP fragmented and that the controller is in charge of generating the TCP/IP
checksum. This must be used together with the "GL4chk" flag (bit 17).
These two flags allow to avoid setting the initial TCP checksum in the l4i_chk
field of the Tx descriptor, which is needed to support software TSO.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 16:59:55 +0000 (13:59 -0300)]
net: mvneta: Implement software TSO
Now that the TSO helper API has been introduced, this commit makes use
of it to implement the TSO in this driver.
Using iperf to test and vmstat to check the CPU usage, shows a substantial
CPU usage drop when TSO is on (~15% vs. ~25%). HTTP-based tests performed
by Willy Tarreau have shown performance improvements.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 16:59:54 +0000 (13:59 -0300)]
net: mvneta: Clean mvneta_tx() sk_buff handling
Rework mvneta_tx() so that the code that performs the final handling
before a sk_buff is transmitted is done only if the numbers of fragments
processed if positive.
This is preparation work to add the support for software TSO.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Mon, 19 May 2014 16:59:52 +0000 (13:59 -0300)]
net: Add a software TSO helper API
Although the implementation probably needs a lot of work, this initial API
allows to implement software TSO in mvneta and mv643xx_eth drivers in a not
so intrusive way.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/nftables updates for net-next
The following patchset contains Netfilter/nftables updates for net-next,
most relevantly they are:
1) Add set element update notification via netlink, from Arturo Borrero.
2) Put all object updates in one single message batch that is sent to
kernel-space. Before this patch only rules where included in the batch.
This series also introduces the generic transaction infrastructure so
updates to all objects (tables, chains, rules and sets) are applied in
an all-or-nothing fashion, these series from me.
3) Defer release of objects via call_rcu to reduce the time required to
commit changes. The assumption is that all objects are destroyed in
reverse order to ensure that dependencies betweem them are fulfilled
(ie. rules and sets are destroyed first, then chains, and finally
tables).
4) Allow to match by bridge port name, from Tomasz Bursztyka. This series
include two patches to prepare this new feature.
5) Implement the proper set selection based on the characteristics of the
data. The new infrastructure also allows you to specify your preferences
in terms of memory and computational complexity so the underlying set
type is also selected according to your needs, from Patrick McHardy.
6) Several cleanup patches for nft expressions, including one minor possible
compilation breakage due to missing mark support, also from Patrick.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 22 May 2014 16:05:01 +0000 (12:05 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and i40evf.
Shannon makes minor changes to the AdminQ interface to bring it up to
date. Removes the hard coding of stats struct size in ethtool, in prep
for adding data fields which are configuration dependent.
Catherine removes some unused and unneeded PCI bus defines.
Jesse fixes the copyright headers and finishes up the removal of the PTP
Tx work functionality which allows us to rely on the Tx timesync interrupt.
Mitch provides a number of fixes and cleanups for i40e/i40evf based on
suggestions from Ben Hutchings. First is to use a macro parameter for
ethtool stats instead of just assuming that a valid netdev variable
exists. Second is not to tell ethtool that the VF can do 10GbaseT, when
it really has no idea what its link speed is, so set the supported value
to 0 instead. Make the ethtool_ops structure constant since it is
extremely unlikely to change at runtime. Ethtool consistently reports
0 values for our ITR settings because we never actually use them, so
fix this by setting the default values to the specified default values.
Greg avoids a compile error by wrapping the call to i40e_alloc_vfs() in
CONFIG_PCI_IOV because the function itself is wrapped in the same
conditional compile block.
Alexander Gordeev updates the driver to use the new pci_enable_msi_range()
and pci_enable_msix_range() or pci_enable_msi_exact() and
pci_enable_msix_exact().
Jean Sacren provides a fix where the wrong error code was being passed to
i40e_open().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neal Cardwell [Thu, 22 May 2014 14:41:08 +0000 (10:41 -0400)]
tcp: make cwnd-limited checks measurement-based, and gentler
Experience with the recent e114a710aa50 ("tcp: fix cwnd limited
checking to improve congestion control") has shown that there are
common cases where that commit can cause cwnd to be much larger than
necessary. This leads to TSO autosizing cooking skbs that are too
large, among other things.
The main problems seemed to be:
(1) That commit attempted to predict the future behavior of the
connection by looking at the write queue (if TSO or TSQ limit
sending). That prediction sometimes overestimated future outstanding
packets.
(2) That commit always allowed cwnd to grow to twice the number of
outstanding packets (even in congestion avoidance, where this is not
needed).
This commit improves both of these, by:
(1) Switching to a measurement-based approach where we explicitly
track the largest number of packets in flight during the past window
("max_packets_out"), and remember whether we were cwnd-limited at the
moment we finished sending that flight.
(2) Only allowing cwnd to grow to twice the number of outstanding
packets ("max_packets_out") in slow start. In congestion avoidance
mode we now only allow cwnd to grow if it was fully utilized.
Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sk_unattached_filter_create() - validate classic BPF, convert, JIT
SK_RUN_FILTER() - run it
sk_unattached_filter_destroy() - destroy socket filter
Cleanup internal BPF kernel API as following:
sk_filter_select_runtime() - final step of internal BPF creation.
Try to JIT internal BPF program, if JIT is not available select interpreter
SK_RUN_FILTER() - run it
sk_filter_free() - free internal BPF program
Disallow direct calls to BPF interpreter. Execution of the BPF program should
be done with SK_RUN_FILTER() macro.
Sockets, seccomp, testsuite, tracing are using different ways to populate
sk_filter, so first steps of program creation are not common.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 21 May 2014 21:04:42 +0000 (17:04 -0400)]
Merge branch 'enic-next'
Govindarajulu Varadarajan says:
====================
enic: Add adaptive coalescing interrupt support
This series add support for adaptive coalescing interrupt and updates
enic Maintainers.
v1->v2:
* Add commit log
* do vnic_intr_coalescing_timer_set only while enabling intr
* use ktime_get instead of hrtimer
* make enic_set_rx_coal_setting return type void
* change func name enic_apply_int_moderation to enic_calc_int_moderation
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sujith Sankar [Mon, 19 May 2014 21:44:05 +0000 (03:14 +0530)]
enic: Add support for adaptive interrupt coalescing
This patch adds support for adaptive interrupt coalescing.
For small pkts with low pkt rate, we can decrease the coalescing interrupt
dynamically which decreases the latency. This however increases the cpu
utilization. Based on testing with different coal intr and pkt rate we came up
with a table(mod_table) with rx_rate and coalescing interrupt value where we
get low latency without significant increase in cpu. mod_table table stores
the coalescing timer percentage value for different throughputs.
Function enic_calc_int_moderation() calculates the desired coalescing intr timer
value. This function is called in driver rx napi_poll. The actual value is set
by enic_set_int_moderation() which is called when napi_poll is complete. i.e
when we unmask the rx intr.
Adaptive coal intr is support only when driver is using msix intr. Because
intr is not shared.
Struct mod_range is used to store only the default adaptive coalescing intr
value.
rx_coal->range_end is the rx-usecs-high value set using ethtool.
range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is
greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its
rx-usecs-low + 3.
Cc: Christian Benvenuti <benve@cisco.com> Cc: Neel Patel <neepatel@cisco.com> Signed-off-by: Sujith Sankar <ssujith@cisco.com> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Himangi Saraogi [Mon, 19 May 2014 16:55:11 +0000 (22:25 +0530)]
ieee802154: Introduce the use of the managed version of kzalloc
This patch moves data allocated using kzalloc to managed data allocated
using devm_kzalloc and cleans now unnecessary kfrees in probe and remove
functions. An explicit linux/device.h include is added to make sure
the devm_*() routine declarations are unambiguously available.
The following Coccinelle semantic patch was used for making the change:
Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Acked-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove double checks, convert printk to pr_warn, and move the call to
pr_warn to the first check. The simplified version of the coccinelle
semantic patch that find this issue is as follows:
Xi Wang [Fri, 16 May 2014 22:11:48 +0000 (15:11 -0700)]
net-tun: restructure tun_do_read for better sleep/wakeup efficiency
tun_do_read always adds current thread to wait queue, even if a packet
is ready to read. This is inefficient because both sleeper and waker
want to acquire the wait queue spin lock when packet rate is high.
We restructure the read function and use common kernel networking
routines to handle receive, sleep and wakeup. With the change
available packets are checked first before the reading thread is added
to the wait queue.
Ran performance tests with the following configuration:
- my packet generator -> tap1 -> br0 -> tap0 -> my packet consumer
- sender pinned to one core and receiver pinned to another core
- sender send small UDP packets (64 bytes total) as fast as it can
- sandy bridge cores
- throughput are receiver side goodput numbers
The results are
baseline: 731k pkts/sec, cpu utilization at 1.50 cpus
changed: 783k pkts/sec, cpu utilization at 1.53 cpus
The performance difference is largely determined by packet rate and
inter-cpu communication cost. For example, if the sender and
receiver are pinned to different cpu sockets, the results are
baseline: 558k pkts/sec, cpu utilization at 1.71 cpus
changed: 690k pkts/sec, cpu utilization at 1.67 cpus
Co-authored-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Xi Wang <xii@google.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Thu, 1 May 2014 14:31:18 +0000 (14:31 +0000)]
i40e: fix passing wrong error code to i40e_open()
The commit 6c167f582ea9 ("i40e: Refactor and cleanup i40e_open(),
adding i40e_vsi_open()") introduced a new function i40e_vsi_open()
with the regression by a typo. Due to the commit, the wrong error
code would be passed to i40e_open(). Fix this error in
i40e_vsi_open() by turning the macro into a negative value so that
i40e_open() could return the pertinent error code correctly.
Fixes: 6c167f582ea9 ("i40e: Refactor and cleanup i40e_open(), adding i40e_vsi_open()") Signed-off-by: Jean Sacren <sakiwit@gmail.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40evf: Use pci_enable_msix_range() instead of pci_enable_msix()
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The previous removal of the PTP Tx work functionality was
incomplete as noted by Jake Keller. This removal allows
us to rely on the Tx timesync interrupt.
CC: Jacob Keller <jacob.e.keller@intel.com>
Change-ID: Id4faaf275a3688053ebbf07bef08072f9fd11aa9 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:14 +0000 (04:43 +0000)]
i40e: Don't disable SR-IOV when VFs are assigned
When VFs are assigned to active VMs and we disable SR-IOV out from under them,
bad things happen. Currently, the VM does not crash, but the VFs lose all
resources and have no way to get them back.
Add an additional check for when the user is disabling through sysfs, and add a
comment to clarify why we check twice.
Change-ID: Icad78eef516e4e1e4a87874d59132bc3baa058d4 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Fri, 4 Apr 2014 04:43:12 +0000 (04:43 +0000)]
i40e: remove hardcode of stats struct size in ethtool
Base the queue stats length on the queue stats struct rather than
assuming it is 2 fields. This is in prep for adding data fields
which are configuration dependent.
Change-ID: I937f471f389d2e0f8cec733960c5d9a06b14f3ec Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:10 +0000 (04:43 +0000)]
i40e/i40evf: control auto ITR through ethtool
For all of our supported kernels, ethtool allows us to directly control
adaptive ITR instead of just faking it with an ITR value. Support this
capability so that user knows explicitly when ITR is being controlled
dynamically. Suggested by Ben Hutchings.
CC: Ben Hutchings <ben@decadent.org.uk>
Change-ID: Iae6b79c5db767a63d22ecd9a9c24acaff02a096e Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:07 +0000 (04:43 +0000)]
i40e/i40evf: set proper default for ITR registers
Ethtool consistently reports 0 values for our ITR settings because
we never actually set them. Fix this by setting the default values
to the specified default values.
Change-ID: I2832406a66f7140f2b1230945d6ff6cbf77467c8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:11 +0000 (04:43 +0000)]
i40evf: make ethtool_ops const
Const-ify the ethtool_ops structure, as it is extremely unlikely to
change at runtime. Suggested by Ben Hutchings.
CC: Ben Hutchings <ben@decadent.org.uk>
Change-ID: I1ccb1b7c3ea801cc934447599a35910e7c93d321 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:09 +0000 (04:43 +0000)]
i40evf: don't lie to ethtool
Don't tell ethtool that the VF can do 10GbaseT, when it really has no
idea what its link speed is. Set the supported values to 0 instead.
Suggested by Ben Hutchings.
CC: Ben Hutchings <ben@decadent.org.uk>
Change-ID: Iceb0d8af68fe5d8dc13224366979ba701ba89c39 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Fri, 4 Apr 2014 04:43:08 +0000 (04:43 +0000)]
i40evf: Use macro param for ethtool stats
Use a macro parameter for ethtool stats instead of just assuming
that a valid netdev variable exists. Suggested by Ben Hutchings.
CC: Ben Hutchings <ben@decadent.org.uk>
Change-ID: I66681698573c1549f95fdea310149d8a7e96a60f Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Fri, 4 Apr 2014 04:43:03 +0000 (04:43 +0000)]
i40evf: Update AdminQ interface
Minor changes to the AdminQ interface to bring it up-to-date.
Change-ID: Ie31a4cc4911b2d9d3b7f9af2e56fb0ae674f6345 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
this is a pull request of 13 patches for net-next/master.
A patch by Dan Carpenter fixes a coccinelle warning in the mcp251x
driver. Jean Delvare contributes three patches to tightening the
Kconfig dependencies for some drivers. Then come three patches by Pavel
Machek that improve the c_can driver support on the socfpga platform.
Sergei Shtylyov's patch brings support for the CAN hardware found on
Renesas R-Car CAN controllers. Four patches by Oliver Hartkopp, the
first cleans up the guard macros in the CAN headers the other three
improve the EFF frame filtering. Maximilian Schneider's patch adds
support for the GS_USB CAN devices.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Mon, 19 May 2014 07:21:09 +0000 (09:21 +0200)]
net: cdc_ncm: fix 64bit division build error
The upper timer_interval limit is arbitrary and much higher
than anything usable in the real world. Reducing it from 15s
to ~4s to make the timer_interval fit in an u32 does not make
much difference. The limit is still outside the practical
bounds.
This eliminates the need for a 64bit timer_interval, fixing a
build error related to 64bit division:
drivers/built-in.o: In function `cdc_ncm_get_coalesce':
ak8975.c:(.text+0x1ac994): undefined reference to `__aeabi_uldivmod'
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nf_tables: defer all object release via rcu
Now that all objects are released in the reverse order via the
transaction infrastructure, we can enqueue the release via
call_rcu to save one synchronize_rcu. For small rule-sets loaded
via nft -f, it now takes around 50ms less here.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: remove skb and nlh from context structure
Instead of caching the original skbuff that contains the netlink
messages, this stores the netlink message sequence number, the
netlink portID and the report flag. This helps to prepare the
introduction of the object release via call_rcu.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that all these function are called from the commit path, we can
pass the context structure to reduce the amount of parameters in all
of the nf_tables_*_notify functions. This patch also removes unneeded
branches to check for skb, nlh and net that should be always set in
the context structure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: use new transaction infrastructure to handle table
This patch speeds up rule-set updates and it also provides a way
to revert updates and leave things in consistent state in case that
the batch needs to be aborted.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: use new transaction infrastructure to handle chain
This patch speeds up rule-set updates and it also introduces a way to
revert chain updates if the batch is aborted. The idea is to store the
changes in the transaction to apply that in the commit step.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: use new transaction infrastructure to handle sets
This patch reworks the nf_tables API so set updates are included in
the same batch that contains rule updates. This speeds up rule-set
updates since we skip a dialog of four messages between kernel and
user-space (two on each direction), from:
1) create the set and send netlink message to the kernel
2) process the response from the kernel that contains the allocated name.
3) add the set elements and send netlink message to the kernel.
4) process the response from the kernel (to check for errors).
To:
1) add the set to the batch.
2) add the set elements to the batch.
3) add the rule that points to the set.
4) send batch to the kernel.
This also introduces an internal set ID (NFTA_SET_ID) that is unique
in the batch so set elements and rules can refer to new sets.
Backward compatibility has been only retained in userspace, this
means that new nft versions can talk to the kernel both in the new
and the old fashion.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: add message type to transactions
The patch adds message type to the transaction to simplify the
commit the and abort routines. Yet another step forward in the
generalisation of the transaction infrastructure.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: relocate commit and abort routines in the source file
Move the commit and abort routines to the bottom of the source code
file. This change is required by the follow up patches that add the
set, chain and table transaction support.
This patch is just a cleanup to access several functions without
having to declare their prototypes.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This patch generalises the existing rule transaction infrastructure
so it can be used to handle set, table and chain object transactions
as well. The transaction provides a data area that stores private
information depending on the transaction type.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: deconstify table and chain in context structure
The new transaction infrastructure updates the family, table and chain
objects in the context structure, so let's deconstify them. While at it,
move the context structure initialization routine to the top of the
source file as it will be also used from the table and chain routines.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
can: gs_usb: Added support for the GS_USB CAN devices
The Geschwister Schneider Family of devices are galvanically isolated USB2.0 to
CAN2.0A/B adapters. Currently two form factors are available, a tethered dongle
in a rugged enclosure, and mini-pci-e card.
Signed-off-by: Maximilian Schneider <max@schneidersoft.net> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:27 +0000 (20:25 +0200)]
can: add documentation for CAN filter usage optimisation
To benefit from special filters for single SFF or single EFF CAN identifier
subscriptions the CAN_EFF_FLAG bit and the CAN_RTR_FLAG bit has to be set
together with the CAN_(SFF|EFF)_MASK in can_filter.mask.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:26 +0000 (20:25 +0200)]
can: add hash based access to single EFF frame filters
In contrast to the direct access to the single SFF frame filters (which are
indexed by the SFF CAN ID itself) the single EFF frame filters are arranged
in a single linked hlist. To reduce the hlist traversal in the case of many
filter subscriptions a hash based access is introduced for single EFF filters.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Oliver Hartkopp [Wed, 2 Apr 2014 18:25:25 +0000 (20:25 +0200)]
can: proc: make array printing function indenpendent from sff frames
The can_rcvlist_sff_proc_show_one() function which prints the array of filters
for the single SFF CAN identifiers is prepared to be used by a second caller.
Therefore it is also renamed to properly describe its future functionality.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>