David Lebrun [Tue, 8 Nov 2016 13:57:41 +0000 (14:57 +0100)]
ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
This patch creates a new type of interfaceless lightweight tunnel (SEG6),
enabling the encapsulation and injection of SRH within locally emitted
packets and forwarded packets.
>From a configuration viewpoint, a seg6 tunnel would be configured as follows:
ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0
Any packet whose destination address is fc00::1 would thus be encapsulated
within an outer IPv6 header containing the SRH with three segments, and would
actually be routed to the first segment of the list. If `mode inline' was
specified instead of `mode encap', then the SRH would be directly inserted
after the IPv6 header without outer encapsulation.
The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is enabled. This
feature was made configurable because direct header insertion may break
several mechanisms such as PMTUD or IPSec AH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:40 +0000 (14:57 +0100)]
ipv6: sr: add code base for control plane support of SR-IPv6
This patch adds the necessary hooks and structures to provide support
for SR-IPv6 control plane, essentially the Generic Netlink commands
that will be used for userspace control over the Segment Routing
kernel structures.
The genetlink commands provide control over two different structures:
tunnel source and HMAC data. The tunnel source is the source address
that will be used by default when encapsulating packets into an
outer IPv6 header + SRH. If the tunnel source is set to :: then an
address of the outgoing interface will be selected as the source.
The HMAC commands currently just return ENOTSUPP and will be implemented
in a future patch.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 8 Nov 2016 13:57:39 +0000 (14:57 +0100)]
ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)
Implement minimal support for processing of SR-enabled packets
as described in
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-02.
This patch implements the following operations:
- Intermediate segment endpoint: incrementation of active segment and rerouting.
- Egress for SR-encapsulated packets: decapsulation of outer IPv6 header + SRH
and routing of inner packet.
- Cleanup flag support for SR-inlined packets: removal of SRH if we are the
penultimate segment endpoint.
A per-interface sysctl seg6_enabled is provided, to accept/deny SR-enabled
packets. Default is deny.
This patch does not provide support for HMAC-signed packets.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 8 Nov 2016 13:31:38 +0000 (14:31 +0100)]
net: mii: report 0 for unknown lp_advertising
The newly introduced mii_ethtool_get_link_ksettings function sets
lp_advertising to an uninitialized value when BMCR_ANENABLE is not
set:
drivers/net/mii.c: In function 'mii_ethtool_get_link_ksettings':
drivers/net/mii.c:224:2: error: 'lp_advertising' may be used uninitialized in this function [-Werror=maybe-uninitialized]
As documented in include/uapi/linux/ethtool.h, the value is
expected to be zero when we don't know it, so let's initialize
it to that.
Fixes: bc8ee596afe8 ("net: mii: add generic function to support ksetting support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Beulich [Tue, 8 Nov 2016 07:45:53 +0000 (00:45 -0700)]
xen-netback: prefer xenbus_scanf() over xenbus_gather()
For single items being collected this should be preferred as being more
typesafe (as the compiler can check format string and to-be-written-to
variable match) and more efficient (requiring one less parameter to be
passed).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: l2tp: netlink: l2tp_nl_tunnel_send: set UDP6 checksum flags
This patch causes the proper attribute flags to be set,
in the case that IPv6 UDP checksums are disabled, so that
userspace ie. `ip l2tp show tunnel` knows about it.
Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>
The attributes L2TP_ATTR_UDP_ZERO_CSUM6_RX and
L2TP_ATTR_UDP_ZERO_CSUM6_TX are used as flags,
but is defined as a u8 in a comment.
This patch redocuments them as flags.
Adding nla_policy entries would break API, so not doing that.
CC: Tom Herbert <therbert@google.com> Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 7 Nov 2016 19:12:27 +0000 (11:12 -0800)]
net-gro: avoid reorders
Receiving a GSO packet in dev_gro_receive() is not uncommon
in stacked devices, or devices partially implementing LRO/GRO
like bnx2x. GRO is implementing the aggregation the device
was not able to do itself.
Current code causes reorders, like in following case :
For a given flow where sender sent 3 packets P1,P2,P3,P4
Receiver might receive P1 as a single packet, stored in GRO engine.
Then P2-P4 are received as a single GSO packet, immediately given to
upper stack, while P1 is held in GRO engine.
This patch will make sure P1 is given to upper stack, then P2-P4
immediately after.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
EF10 based NICs have configurable RSS hash fields, and can be made to take the
ports into the hash on UDP (they already do so for TCP). This patch series
enables this, in order to improve spreading of UDP traffic.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This series further enhances the SRIOV TC offloads of mlx5 to handle the
TC tunnel_key release and set actions.
This serves a common use-case in virtualization systems where the virtual
switch encapsulate packets (tunnel_key set action) sent from VMs with
outer headers corresponding to the local/remote host IPs and de-capsulate
(tunnel_key release) outer headers before the packets are received by the
VM.
We use the new E-Switch switchdev mode and TC tunnel_key set/release
action to achieve that also in SW defined SRIOV environments by
offloading TC rules that contain these actions along with forwarding
(TC mirred/redirect action) the packets.
The first six patches are adding the needed support in flow dissector,
flower and tc for offloading tunnel_key actions:
- The first three patches are adding the needed help functions
and enums
- The next three patches in the series are adding UDP port attribute
to tunnel_key release and set actions.
The addition of UDP ports would allow the HW driver to make sure they are
given (say) a VXLAN tunnel to offload (mlx5e uses that).
Patches 7-10 are mlx5 preparations for tunnel_key actions offloads support.
Patch #11 adds mlx5e support to offload tunnel_key release action, and the
last two patches (#12-13) add mlx5e support to tc tunnel_key set action.
Currently in order to offload tc tunnel_key release action, the tc rule
should be placed on top of the mlx5e offloading (uplink) interface instead
of the shared tunnel interface. The resolution between the tunnel interface
to the HW netdevice will be implemented in a follow up series.
This series was generated against commit 94edc86bf13f ("Merge branch 'dwmac-sti-refactor-cleanup'")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:48 +0000 (15:14 +0200)]
net/mlx5e: Add basic TC tunnel set action for SRIOV offloads
In mlx5 HW, encapsulation is offloaded by the steering rule having
index into an encapsulation table containing the entire set of headers
to be added by the HW. The driver sets these headers in a buffer when we
are offloading the action.
The code maintains mlx5_encap_entry for each encap header it has
encountered when attempted to offload TC tunnel set action.
This entry maintains a linked list of all the flows sharing the same
encap header, when the last flow is removed from the list the encap
entry is removed.
The actual encap_header is allocated by the driver in the hardware only
if we have layer two neighbour info when the encap entry is created.
While the flow is in the driver, the driver holds a reference on the
neighbour.
When a new flow with encap action is inserted, the code first checks if
the required encap entry exists according to the tunnel set parameters.
If it does the encap is shared, otherwise a new mlx5_encap_entry is
created.
TC action parsing implementation in the driver assumes that tunnel set
action is provided in the same order set by the user, e.g before the
mirred_redirect action.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:47 +0000 (15:14 +0200)]
net/mlx5e: Add ndo_udp_tunnel_add to VF representors
By implementing this ndo, the host stack will set the vxlan udp port
also to VF representor netdevices. This will allow the TC offload code
in the driver when it gets a tunnel key set action to identify the UDP
port as vxlan, and hence the rule will be a candidate for offloading.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:46 +0000 (15:14 +0200)]
net/mlx5e: Add TC tunnel release action for SRIOV offloads
Enhance the parsing of offloaded TC rules to set HW matching on outer
(encapsulation) headers.
Parse TC tunnel release action and set it as mlx5 decap action when the
required capabilities are supported.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:45 +0000 (15:14 +0200)]
net/mlx5: Support encap id when setting new steering entry
In order to support steering rules which add encapsulation headers,
encap_id parameter is needed.
Add new mlx5_flow_act struct which holds action related parameter:
action, flow_tag and encap_id. Use mlx5_flow_act struct when adding a new
steering rule.
This patch doesn't change any functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:44 +0000 (15:14 +0200)]
net/mlx5: Add creation flags when adding new flow table
When creating flow tables, allow the caller to specify creation flags.
Currently no flags are used and as such this patch doesn't add any new
functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:43 +0000 (15:14 +0200)]
net/mlx5: Check max encap header size capability
Instead of comparing to a const value, check the value of max encap
header size capability as reported by the Firmware.
Fixes: 575ddf5888ea ('net/mlx5: Introduce alloc_encap and dealloc_encap commands') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:42 +0000 (15:14 +0200)]
net/mlx5: Move alloc/dealloc encap commands declarations to common header file
The alloc and dealloc encap commands will be used in the mlx5e driver,
as such, declare them in a common header file.
Also, rename the functions: mlx5_cmd_{de}alloc_encap is replaced with
mlx5_encap_{de}alloc.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:41 +0000 (15:14 +0200)]
net/sched: act_tunnel_key: Add UDP dst port option
The current tunnel set action supports only IP addresses and key
options. Add UDP dst port option.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:40 +0000 (15:14 +0200)]
net/dst: Add dst port to dst_metadata utility functions
Add dst port parameter to __ip_tun_set_dst and __ipv6_tun_set_dst
utility functions.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:39 +0000 (15:14 +0200)]
net/sched: cls_flower: Add UDP port to tunnel parameters
The current IP tunneling classification supports only IP addresses and key.
Enhance UDP based IP tunneling classification parameters by adding UDP
src and dst port.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:38 +0000 (15:14 +0200)]
net/sched: cls_flower: Allow setting encapsulation fields as used key
When encapsulation field is set, mark it as used key for the flow
dissector. This will be used by offloading drivers.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:37 +0000 (15:14 +0200)]
flow_dissector: Add enums for encapsulation keys
New encapsulation keys were added to the flower classifier, which allow
classification according to outer (encapsulation) headers attributes
such as key and IP addresses.
In order to expose those attributes outside flower, add
corresponding enums in the flow dissector.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Mon, 7 Nov 2016 13:14:36 +0000 (15:14 +0200)]
net/sched: act_tunnel_key: add helper inlines to access tcf_tunnel_key
Needed for drivers to pick the relevant action when offloading tunnel
key act.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tested: https://android-review.googlesource.com/#/c/299980/ Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 6 Nov 2016 15:12:27 +0000 (17:12 +0200)]
qed: Prevent stack corruption on MFW interaction
Driver uses a union for copying data to & from management firmware
when interacting with it.
Problem is that the function always copies sizeof(union) while commit 2edbff8dcb5d ("qed: Learn resources from management firmware") is casting
a union elements which is of smaller size [24-byte instead of 88-bytes].
Also, the union contains some inappropriate elements which increase its
size [should have been 32-bytes]. While this shouldn't corrupt other
PF messages to the MFW [as management firmware enforces permissions so
that each PF is allowed to write only to its own mailbox] we fix this
here as well.
Fixes: 2edbff8dcb5d ("qed: Learn resources from management firmware") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When moving from typhoon_get_settings to typhoon_getlink_ksettings
in the commit f7a5537cd2a5 ("net: 3com: typhoon: use new api
ethtool_{get|set}_link_ksettings"), we use a local variable supported
but we forgot to update the struct ethtool_link_ksettings with
this value.
We also initialize advertising to zero, because otherwise it may
be uninitialized if no case of the switch (tp->xcvr_select) is used.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sun, 6 Nov 2016 13:57:04 +0000 (14:57 +0100)]
net: xgbe: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Philippe Reynes [Sat, 5 Nov 2016 15:17:54 +0000 (16:17 +0100)]
net: alteon: acenic: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Acked-by: Jes Sorensen <Jes.Sorensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 18:21:25 +0000 (13:21 -0500)]
Merge branch 'stmmac-dwmac-rk-PM'
Joachim Eastwood says:
====================
stmmac: dwmac-rk: convert to standard PM/remove functions
This patch set aims to remove the init/exit callbacks from the
dwmac-rk driver and instead use standard PM callbacks. Eventually
the init/exit callbacks will be deprecated and removed from all
drivers dwmac-* except for dwmac-generic. Drivers will be refactored
to use standard PM and remove callbacks.
This conversion was pretty straight forward, but it would really nice
if some chromium people could test suspend/resume with this patch set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Revert "net: stmmac: allow to split suspend/resume from init/exit callbacks"
Instead of adding hooks inside stmmac_platform it is better to just use
the standard PM callbacks within the specific dwmac-driver. This only
used by the dwmac-rk driver.
This reverts commit cecbc5563a02 ("stmmac: allow to split suspend/resume
from init/exit callbacks").
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since the rk_gmac_init() only calls another function move this
function call into probe so rk_gmac_init() can be removed.
Since commit cecbc5563a02 ("stmmac: allow to split suspend/resume
from init/exit callbacks") the init hook is no longer used in
dwmac-rk so this can be removed.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-rk: turn resume/suspend into standard PM callbacks
Use standard PM resume/suspend callbacks instead of the hooks in
stmmac_platform. This gives the driver more control and flexibility
when implementing PM functionality. The hooks in stmmac_platform
also doesn't buy us anything extra.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This short series prepares tcp_get_info() for more detailed infos.
In order to not slow down fast path, our goal is to use the normal
socket spinlock instead of custom synchronization.
All we need to ensure is that tcp_get_info() is not called with
ehash lock, which might dead lock, since packet processing would acquire
the spinlocks in reverse way.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Nov 2016 18:54:32 +0000 (11:54 -0700)]
tcp: no longer hold ehash lock while calling tcp_get_info()
We had various problems in the past in tcp_get_info() and used
specific synchronization to avoid deadlocks.
We would like to add more instrumentation points for TCP, and
avoiding grabing socket lock in tcp_getinfo() was too costly.
Being able to lock the socket allows to provide consistent set
of fields.
inet_diag_dump_icsk() can make sure ehash locks are not
held any more when tcp_get_info() is called.
We can remove syncp added in commit d654976cbf85
("tcp: fix a potential deadlock in tcp_get_info()"), but we need
to use lock_sock_fast() instead of spin_lock_bh() since TCP input
path can now be run from process context.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 4 Nov 2016 18:54:31 +0000 (11:54 -0700)]
tcp: shortcut listeners in tcp_get_info()
Being lockless in tcp_get_info() is hard, because we need to add
specific synchronization in TCP fast path, like seqcount.
Following patch will change inet_diag_dump_icsk() to no longer
hold any lock for non listeners, so that we can properly acquire
socket lock in get_tcp_info() and let it return more consistent counters.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 9 Nov 2016 17:50:56 +0000 (12:50 -0500)]
Merge branch 'Meson-GXL-internal-phy'
Neil Armstrong says:
====================
ARM64: Add Internal PHY support for Meson GXL
The Amlogic Meson GXL SoCs have an internal RMII PHY that is muxed with the
external RGMII pins.
In order to support switching between the two PHYs links, extended registers
size for mdio-mux-mmioreg must be added.
The DT related patches submitted as RFC in [3] will be sent in a separate
patchset due to multiple patchsets and DTSI migrations.
Changes since v2 RFC patchset at : [3]
- Change phy Kconfig/Makefile alphabetic order
- GXL dtsi cleanup
Changes since original RFC patchset at : [2]
- Remove meson8b experimental phy switching
- Switch to mdio-mux-mmioreg with extennded size support
- Add internal phy support for S905x and p231
- Add external PHY support for p230
Neil Armstrong [Fri, 4 Nov 2016 15:51:23 +0000 (16:51 +0100)]
net: phy: Add Meson GXL Internal PHY driver
Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs.
This PHY seems to only implement some standard registers and need some
workarounds to provide autoneg values from vendor registers.
Some magic values are currently used to configure the PHY, and this a
temporary setup until clarification about these registers names and
registers fields are provided by Amlogic.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Fri, 4 Nov 2016 15:51:22 +0000 (16:51 +0100)]
net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes
In order to support PHY switching on Amlogic GXL SoCs, add support for
16bit and 32bit registers sizes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The for() loop in rds_tcp_accept_one() assumes that the 0'th
rds_tcp_conn_path is UP and starts multipath accepts at index 1.
But this assumption may not always be true: if the 0'th path
has failed (ERROR or DOWN state) an incoming connection request
should be used to resurrect this path.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RDS: TCP: report addr/port info based on TCP socket in rds-info
The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
so it is incorrect to report the address port info based on
rds_getname() as part of TCP state report.
Invoke inet_getname() for the t_sock associated with the
rds_tcp_connection instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not set sk_err when dequeuing errors from the error queue.
Doing so results in:
a) Bugs: By overwriting existing sk_err values, it possibly
hides legitimate errors. It is also incorrect when local
errors are queued with ip_local_error. That happens in the
context of a system call, which already returns the error
code.
b) Inconsistent behavior: When there are pending errors on
the error queue, sk_err is sometimes 0 (e.g., for
the first timestamp on the error queue) and sometimes
set to an error code (after dequeuing the first
timestamp).
c) Suboptimality: Setting sk_err to ENOMSG on simple
TX timestamps can abort parallel reads and writes.
Removing this line doesn't break userspace. This is because
userspace code cannot rely on sk_err for detecting whether
there is something on the error queue. Except for ICMP messages
received for UDP and RAW, sk_err is not set at enqueue time,
and as a result sk_err can be 0 while there are plenty of
errors on the error queue.
For ICMP packets in UDP and RAW, sk_err is set when they are
enqueued on the error queue, but that does not result in aborting
reads and writes. For such cases, sk_err is only readable via
getsockopt(SO_ERROR) which will reset the value of sk_err on
its own. More importantly, prior to this patch,
recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
sk_err is set by sock_dequeue_err_skb without atomic ops or
locks) which can store 0 in sk_err even when we have ICMP
messages pending. Removing this line from sock_dequeue_err_skb
eliminates that race.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 8 Nov 2016 01:15:56 +0000 (20:15 -0500)]
Merge branch 'IFF_NO_QUEUE-semantics'
Jesper Dangaard Brouer says:
====================
qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
This patchset is a cleanup for IFF_NO_QUEUE devices. It will
hopefully help userspace get a more consistent behavior when attaching
qdisc to such virtual devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.
Why should the kernel catch such a misconfiguration? Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device. The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).
This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device. The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc. The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].
Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc. Let's
catch the misconfiguration on the kernel side.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
The flag IFF_NO_QUEUE marks virtual device drivers that doesn't need a
default qdisc attached, given they will be backed by physical device,
that already have a qdisc attached for pushback.
It is still supported to attach a qdisc to a IFF_NO_QUEUE device, as
this can be useful for difference policy reasons (e.g. bandwidth
limiting containers). For this to work, the tx_queue_len need to have
a sane value, because some qdiscs inherit/copy the tx_queue_len
(namely, pfifo, bfifo, gred, htb, plug and sfb).
Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
ether_setup()") caught situations where some drivers didn't initialize
tx_queue_len. The problem with the commit was choosing 1 as the
fallback value.
A qdisc queue length of 1 causes more harm than good, because it
creates hard to debug situations for userspace. It gives userspace a
false sense of a working config after attaching a qdisc. As low
volume traffic (that doesn't activate the qdisc policy) works,
like ping, while traffic that e.g. needs shaping cannot reach the
configured policy levels, given the queue length is too small.
This patch change the value to DEFAULT_TX_QUEUE_LEN, given other
IFF_NO_QUEUE devices (that call ether_setup()) also use this value.
Fixes: a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: make default TX queue length a defined constant
The default TX queue length of Ethernet devices have been a magic
constant of 1000, ever since the initial git import.
Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s, because the
link speed increased x10 the queue size were also adjusted. This
value later caused much heartache for the bufferbloat community.
This patch merely moves the value into a defined constant.
David S. Miller [Mon, 7 Nov 2016 18:24:42 +0000 (13:24 -0500)]
Merge branch 'udp-fwd-mem-sched-on-dequeue'
Paolo Abeni says:
====================
udp: do fwd memory scheduling on dequeue
After commit 850cbaddb52d ("udp: use it's own memory accounting schema"),
the udp code needs to acquire twice the receive queue spinlock on dequeue.
This patch series remove the need for the second lock at skb free time,
moving the udp memory scheduling inside the dequeue operation; the skb
destructor field is not used anymore and an additional sk argument is added
to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue.
Many thanks to Eric Dumazed for suggesting pretty all much the above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 4 Nov 2016 10:28:59 +0000 (11:28 +0100)]
udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb->desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.
Overall, this allows acquiring only once the receive queue
lock on dequeue.
Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.
v1 -> v2:
- do rmem and allocated memory scheduling under the receive lock
- do bulk scheduling in first_packet_length() and in udp_destruct_sock()
- avoid the typdef for the dequeue callback
Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 4 Nov 2016 10:28:58 +0000 (11:28 +0100)]
net/sock: add an explicit sk argument for ip_cmsg_recv_offset()
So that we can use it even after orphaining the skbuff.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 18:11:23 +0000 (13:11 -0500)]
Merge branch 'ns2-amac'
Jon Mason says:
====================
add NS2 support to bgmac
Changes in v6:
* Use a common bgmac_phy_connect_direct (per Rafal Milecki)
* Rebased on latest net-next
* Added Reviewed-by to the relevant patches
Changes in v5:
* Change a pr_err to netdev_err (per Scott Branden)
* Reword the lane swap binding documentation (per Andrew Lunn)
Changes in v4:
* Actually send out the lane swap binding doc patch (Per Scott Branden)
* Remove unused #define (Per Andrew Lunn)
Changes in v3:
* Clean-up the bgmac DT binding doc (per Rob Herring)
* Document the lane swap binding and make it generic (Per Andrew Lunn)
Changes in v2:
* Remove the PHY power-on (per Andrew Lunn)
* Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
This results on none of the original PHY code from Vikas being
present. So, I'm removing him as an author and giving him
"Inspired-by" credit.
* Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
Fainelli)
* Remove bgmac sleep (per Florian Fainelli)
* Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
* Rebased on latest net-next
* Added patch for bcm54xx_auxctl_read, which is used in the BCM54810
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:11:01 +0000 (01:11 -0400)]
net: ethernet: bgmac: add NS2 support
Add support for the variant of amac hardware present in the Broadcom
Northstar2 based SoCs. Northstar2 requires an additional register to be
configured with the port speed/duplexity (NICPM). This can be added to
the link callback to hide it from the instances that do not use this.
Also, clearing of the pending interrupts on init is required due to
observed issues on some platforms.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:59 +0000 (01:10 -0400)]
Documentation: devicetree: net: add NS2 bindings to amac
Clean-up the documentation to the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:58 +0000 (01:10 -0400)]
net: phy: broadcom: Add BCM54810 PHY entry
The BCM54810 PHY requires some semi-unique configuration, which results
in some additional configuration in addition to the standard config.
Also, some users of the BCM54810 require the PHY lanes to be swapped.
Since there is no way to detect this, add a device tree query to see if
it is applicable.
Inspired-by: Vikas Soni <vsoni@broadcom.com> Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:57 +0000 (01:10 -0400)]
Documentation: devicetree: add PHY lane swap binding
Add the documentation for PHY lane swapping. This is a boolean entry to
notify the phy device drivers that the TX/RX lanes need to be swapped.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:56 +0000 (01:10 -0400)]
net: phy: broadcom: add bcm54xx_auxctl_read
Add a helper function to read the AUXCTL register for the BCM54xx. This
mirrors the bcm54xx_auxctl_write function already present in the code.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to cleanup the driver.
Eventually the init/exit callbacks will be deprecated and removed
from all drivers dwmac-* except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev member of struct sti_dwmac is not used anywhere in the driver
so lets just remove it.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: clean up and rename sti_dwmac_init
Rename sti_dwmac_init to sti_dwmac_set_mode which is a better
description for what it really does.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
Add clock error handling to probe and in the process move clock enabling
out of sti_dwmac_init() to make this easier.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: move st, gmac_en parsing to sti_dwmac_parse_data
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties doesn't change between suspend/resume cycles move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PM callbacks and driver remove in the driver instead
of relying on the init/exit hooks in stmmac_platform. This gives
the driver more flexibility in how the code is organized.
Eventually the init/exit callbacks will be deprecated in favor
of the standard PM callbacks and driver remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if not clock was
provided in DT and NULL is a valid clock there is no need to check for
NULL before using this clock.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since dwmac-sti is a DT only driver checking for OF node is not necessary.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 02:42:34 +0000 (21:42 -0500)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-11-04
This series contains updates to ixgbe and ixgbevf only.
Don does cleanup and configuration for our X553 devices, related to LED,
auto-negotiation, flow control and SFP+ setup and config. Adds the
(not secret) sauce for B0 hardware for X553 hardware.
Emil provides several fixes, first replaces the driver specific MDIO
defines for the more preferred equivalent kernel ones. Provides a fix
for auto-negotiaion status, by reading a PHY register twice. Introduces
ixgbe_link_operations structure to allow X550EM_a to override the
methods for MDIO access while X550EM_x provides methods to use I2C
combined access.
Mark fixes an issue where the driver was crashing when msix_entires
were not there because they were freed by a previous suspend or remove.
Sowmini Varadhan fixes an issue where an incorrect check for IPPROTO_UDP
in ixgbe_atr(). Then makes sure that the network and transport headers
in the paged data are available in the headlen bytes to calculate the
l4_proto.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 28 Oct 2016 17:46:39 +0000 (10:46 -0700)]
ixgbevf: Handle previously-freed msix_entries
The msix_entries memory can be freed by a previous suspend or
remove, so don't crash on close when it isn't there. Also only
clear the interrupts when the interface is up, because there
aren't any when it is not up.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sowmini Varadhan [Mon, 24 Oct 2016 22:36:39 +0000 (15:36 -0700)]
ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers
For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
passed down an sk_buff that has the network and transport
header in the paged data, so it needs to make sure these
headers are available in the headlen bytes to calculate the
l4_proto.
This patch expect that network and transport headers are
already available in the non-paged header dat. The assumption
is that the caller has set this up if l4_proto based Tx
steering is desired.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sowmini Varadhan [Mon, 24 Oct 2016 22:36:38 +0000 (15:36 -0700)]
ixgbe: ixgbe_atr() should access udp_hdr(skb) only for UDP packets
Commit 9f12df906cd8 ("ixgbe: Store VXLAN port number in network order")
incorrectly checks for hdr.ipv4->protocol != IPPROTO_UDP
in ixgbe_atr(). This check should be for "==" instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 4 Nov 2016 20:46:16 +0000 (16:46 -0400)]
ixgbe: Correct X550 phy ID
We were using an old Alpha version of the X550 phy ID. This was leading
to unnecessary queries of the PHY. I removed the old ID (which shouldn't
be on any HW) and add the two that are.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 4 Nov 2016 01:01:37 +0000 (21:01 -0400)]
ixgbe: Add X553 FW ALEF support
This patch add X553 FW ALEF support for B0. ALEF is the new unified
FW. This contains updated register defines for ALEF speed
configuration. Likewise it also removes the AN_CNTL_8 usage from
the native SFI flow as it is no longer supported by FW.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This will allow X550EM_a to override these methods for MDIO access
while X550EM_x provides methods to use I2C combined access. This
also adds a new structure, ixgbe_link_info, to hold information
about the link. Initially this is just method pointers and a bus
address.
The functions involved in combined I2C accesses were moved from
ixgbe_phy.c to ixgbe_x550.c. The underlying functions that carry
out the combined I2C accesses were left in ixgbe_phy.c because
they share some functions with other I2C methods.
v2 - set hw->link.ops in probe.
v3 - check ii->link_ops before setting it since we don't have it
for all devices.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Tue, 27 Sep 2016 18:31:12 +0000 (14:31 -0400)]
ixgbe: Add X553 PHY FC autoneg support
This patch adds X553 flow control auto negotiation for fiber and
backplain. To enable this new function pointers were added as well
as creating a function to dynamically set function pointer we can't
define only on MAC type.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Sat, 22 Oct 2016 01:10:54 +0000 (21:10 -0400)]
ixgbe: Update setup PHY link to unset all speeds
This patch updates ixgbe_setup_phy_link_generic to set/unset
auto-negotiation for all speeds. This ensures that unsupported
speeds are unset. This is necessary since the PHY NVM may
advertise unsupported speeds.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 21 Oct 2016 01:42:00 +0000 (21:42 -0400)]
ixgbe: Add support to retrieve and store LED link active
This patch adds support to get the LED link active via the LEDCTL
register. If the LEDCTL register does not have LED link active
(LED mode field = 0x0100) set then default LED link active returned.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Thu, 22 Sep 2016 00:21:52 +0000 (20:21 -0400)]
ixgbe: Add X552 iXFI configuration helper function
X553 doesn't need all the initialization that X552 did for iXFI. This
patch will allow native SPI SFP+ to work with X553 devices. Future
patches will add additional configuration as needed.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Fri, 4 Nov 2016 18:56:17 +0000 (14:56 -0400)]
Merge branch 'nfp-ring-reconfig-and-xdp-support'
Jakub Kicinski says:
====================
ring reconfiguration and XDP support
This set adds support for ethtool channel API and XDP.
I kick off with ethtool get_channels() implementation.
set_channels() needs some preparations to get right. I follow
the prepare/commit paradigm and allocate all resources before
stopping the device. It has already been done for ndo_change_mtu
and ethtool set_ringparam(), it makes sense now to consolidate all
the required logic in one place.
XDP support requires splitting TX rings into two classes -
for the stack and for XDP. The ring structures are identical.
The differences are in how they are connected to IRQ vector
structs and how the completion/cleanup works. When XDP is enabled
I switch from the frag allocator to page-per-packet and map buffers
BIDIRECTIONALly.
Last but not least XDP offload is added (the patch just takes
care of the small formal differences between cls_bpf and XDP).
There is a tiny & trivial DebugFS patch in the mix, I hope it can
be taken via net-next provided we have the right Acks.
Resending with improved commit message and CCing more people on patch 10.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:08 +0000 (17:12 +0000)]
nfp: remove unnecessary parameters from nfp_net_bpf_offload()
nfp_net_bpf_offload() takes all .setup_tc() parameters but it
doesn't use them at the moment. Remove unnecessary ones to make
it possible for XDP to reuse this function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:07 +0000 (17:12 +0000)]
nfp: add XDP support in the driver
Add XDP support. Separate stack's and XDP's TX rings logically.
Add functions for handling XDP_TX and cleanup of XDP's TX rings.
For XDP allocate all RX buffers as separate pages and map them
with DMA_BIDIRECTIONAL.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:06 +0000 (17:12 +0000)]
debugfs: constify argument to debugfs_real_fops()
seq_file users can only access const version of file pointer,
because the ->file member of struct seq_operations is marked
as such. Make parameter to debugfs_real_fops() const.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Nicolai Stange <nicstange@gmail.com> CC: Christian Lamparter <chunkeey@gmail.com> CC: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:05 +0000 (17:12 +0000)]
nfp: reorganize nfp_net_rx() to get packet offsets early
Calculate packet offsets early in nfp_net_rx() so that we will be
able to use them in upcoming XDP handler. While at it move relevant
variables into the loop scope.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:04 +0000 (17:12 +0000)]
nfp: add support for ethtool .set_channels
Allow changing the number of rings via ethtool .set_channels API.
Runtime reconfig needs to be extended to handle number of rings.
We need to be able to activate interrupt vectors before rings are
assigned to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>