Neil Armstrong [Fri, 4 Nov 2016 15:51:22 +0000 (16:51 +0100)]
net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes
In order to support PHY switching on Amlogic GXL SoCs, add support for
16bit and 32bit register sizes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
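A rough user-space sketch of what handling several register widths in a memory-mapped mux could look like. The helper name mux_switch(), its parameters and the in-memory register are assumptions made for illustration; they are not taken from the driver:

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: update the masked field of a register that is
 * `size` bytes wide (1, 2 or 4). `reg` stands in for the ioremap()ed
 * address; a real driver would use ioread8/16/32() and iowrite8/16/32().
 * Returns 0 on success, -1 for an unsupported size.
 */
static int mux_switch(void *reg, unsigned int size, uint32_t mask, uint32_t val)
{
	uint8_t x8;
	uint16_t x16;
	uint32_t x32;

	switch (size) {
	case 1:
		memcpy(&x8, reg, sizeof(x8));
		x8 = (uint8_t)((x8 & ~mask) | (val & mask));
		memcpy(reg, &x8, sizeof(x8));
		return 0;
	case 2:
		memcpy(&x16, reg, sizeof(x16));
		x16 = (uint16_t)((x16 & ~mask) | (val & mask));
		memcpy(reg, &x16, sizeof(x16));
		return 0;
	case 4:
		memcpy(&x32, reg, sizeof(x32));
		x32 = (x32 & ~mask) | (val & mask);
		memcpy(reg, &x32, sizeof(x32));
		return 0;
	default:
		return -1;	/* width not supported in this sketch */
	}
}
```

Only the bits selected by the mask change; the rest of the register is preserved regardless of its width.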
The for() loop in rds_tcp_accept_one() assumes that the 0'th
rds_tcp_conn_path is UP and starts multipath accepts at index 1.
But this assumption may not always be true: if the 0'th path
has failed (ERROR or DOWN state) an incoming connection request
should be used to resurrect this path.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
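The idea can be illustrated with a simplified model. The enum, the function name pick_accept_path() and the state names are hypothetical stand-ins, not the RDS data structures:

```c
#include <stddef.h>

/* Simplified stand-ins for the connection path states. */
enum path_state { PATH_DOWN, PATH_ERROR, PATH_CONNECTING, PATH_UP };

/*
 * Pick the path that should take over an incoming connection.
 * The scan starts at index 0, so a failed 0th path is a candidate
 * for resurrection just like any other path.
 * Returns the chosen index, or -1 if no path needs a connection.
 */
static int pick_accept_path(const enum path_state *paths, size_t npaths)
{
	for (size_t i = 0; i < npaths; i++) {
		if (paths[i] == PATH_DOWN || paths[i] == PATH_ERROR)
			return (int)i;
	}
	return -1;
}
```

A loop starting at index 1 would never return 0 here, which is the situation the message describes.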
RDS: TCP: report addr/port info based on TCP socket in rds-info
The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
so it is incorrect to report the address/port info based on
rds_getname() as part of TCP state report.
Invoke inet_getname() for the t_sock associated with the
rds_tcp_connection instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Do not set sk_err when dequeuing errors from the error queue.
Doing so results in:
a) Bugs: By overwriting existing sk_err values, it possibly
hides legitimate errors. It is also incorrect when local
errors are queued with ip_local_error. That happens in the
context of a system call, which already returns the error
code.
b) Inconsistent behavior: When there are pending errors on
the error queue, sk_err is sometimes 0 (e.g., for
the first timestamp on the error queue) and sometimes
set to an error code (after dequeuing the first
timestamp).
c) Suboptimality: Setting sk_err to ENOMSG on simple
TX timestamps can abort parallel reads and writes.
Removing this line doesn't break userspace. This is because
userspace code cannot rely on sk_err for detecting whether
there is something on the error queue. Except for ICMP messages
received for UDP and RAW, sk_err is not set at enqueue time,
and as a result sk_err can be 0 while there are plenty of
errors on the error queue.
For ICMP packets in UDP and RAW, sk_err is set when they are
enqueued on the error queue, but that does not result in aborting
reads and writes. For such cases, sk_err is only readable via
getsockopt(SO_ERROR) which will reset the value of sk_err on
its own. More importantly, prior to this patch,
recvmsg(MSG_ERRQUEUE) has a race on setting sk_err (i.e.,
sk_err is set by sock_dequeue_err_skb without atomic ops or
locks) which can store 0 in sk_err even when we have ICMP
messages pending. Removing this line from sock_dequeue_err_skb
eliminates that race.
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
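For userspace, the practical consequence is that the error queue should be drained by calling recvmsg(MSG_ERRQUEUE) until it reports an empty queue, rather than by polling SO_ERROR. A minimal Linux-only sketch follows; the function name drain_errqueue() and the buffer sizes are arbitrary choices for illustration:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Drain everything currently on the socket error queue.
 * Returns the number of messages read, or -1 on an unexpected error.
 * It never consults SO_ERROR: an empty queue is detected by
 * recvmsg() failing with EAGAIN/EWOULDBLOCK.
 */
static int drain_errqueue(int fd)
{
	int count = 0;

	for (;;) {
		char data[256];
		char control[512];
		struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
		struct msghdr msg = {
			.msg_iov = &iov,
			.msg_iovlen = 1,
			.msg_control = control,
			.msg_controllen = sizeof(control),
		};
		ssize_t n = recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);

		if (n < 0) {
			if (errno == EAGAIN || errno == EWOULDBLOCK)
				return count;	/* queue is empty */
			return -1;
		}
		/* A real caller would walk the control messages here. */
		count++;
	}
}
```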
David S. Miller [Tue, 8 Nov 2016 01:15:56 +0000 (20:15 -0500)]
Merge branch 'IFF_NO_QUEUE-semantics'
Jesper Dangaard Brouer says:
====================
qdisc and tx_queue_len cleanups for IFF_NO_QUEUE devices
This patchset is a cleanup for IFF_NO_QUEUE devices. It will
hopefully help userspace get more consistent behavior when attaching
a qdisc to such virtual devices.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdiscs (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.
Why should the kernel catch such a misconfiguration? Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device. The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant; simply
setting tx_queue_len=0 doesn't trigger the loophole).
This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device. The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc. The OpenShift project didn't remember this warning
and attached a qdisc; this was caught and fixed in [2].
Instead of fixing every userspace program that used this loophole and
forgot to reset the tx_queue_len prior to attaching a qdisc, let's
catch the misconfiguration on the kernel side.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
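A sketch of one possible reaction at qdisc attach time. The message above does not spell out what the kernel does once the misconfiguration is detected, so restoring the default length is an assumption, as are the struct and function names; the value 1000 is the Ethernet default mentioned elsewhere in this series:

```c
#include <stdio.h>

#define DEFAULT_TX_QUEUE_LEN	1000u

/* Hypothetical, heavily reduced stand-in for struct net_device. */
struct fake_netdev {
	unsigned int tx_queue_len;
};

/*
 * Called when a qdisc is attached. If the device was configured with
 * tx_queue_len == 0, restore a usable length so that qdiscs copying
 * the value do not end up with an empty queue.
 * Returns 1 if a misconfiguration was caught and corrected, else 0.
 */
static int qdisc_attach_fixup(struct fake_netdev *dev)
{
	if (dev->tx_queue_len != 0)
		return 0;

	dev->tx_queue_len = DEFAULT_TX_QUEUE_LEN;
	fprintf(stderr, "caught tx_queue_len zero misconfig\n");
	return 1;
}
```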
net/qdisc: IFF_NO_QUEUE drivers should use consistent TX queue len
The flag IFF_NO_QUEUE marks virtual device drivers that don't need a
default qdisc attached, given they will be backed by a physical device
that already has a qdisc attached for pushback.
It is still supported to attach a qdisc to an IFF_NO_QUEUE device, as
this can be useful for different policy reasons (e.g. bandwidth
limiting containers). For this to work, the tx_queue_len needs to have
a sane value, because some qdiscs inherit/copy the tx_queue_len
(namely, pfifo, bfifo, gred, htb, plug and sfb).
Commit a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling
ether_setup()") caught situations where some drivers didn't initialize
tx_queue_len. The problem with the commit was choosing 1 as the
fallback value.
A qdisc queue length of 1 causes more harm than good, because it
creates hard-to-debug situations for userspace. It gives userspace a
false sense of a working config after attaching a qdisc: low
volume traffic (that doesn't activate the qdisc policy), like ping,
works, while traffic that e.g. needs shaping cannot reach the
configured policy levels, given the queue length is too small.
This patch changes the value to DEFAULT_TX_QUEUE_LEN, given other
IFF_NO_QUEUE devices (that call ether_setup()) also use this value.
Fixes: a813104d9233 ("IFF_NO_QUEUE: Fix for drivers not calling ether_setup()") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: make default TX queue length a defined constant
The default TX queue length of Ethernet devices has been a magic
constant of 1000, ever since the initial git import.
Looking back in historical trees[1][2] the value used to be 100,
with the same comment "Ethernet wants good queues". The commit[3]
that changed this from 100 to 1000 didn't describe why, but from
conversations with Robert Olsson it seems that it was changed
when Ethernet devices went from 100Mbit/s to 1Gbit/s: because the
link speed increased 10x, the queue size was adjusted to match. This
value later caused much heartache for the bufferbloat community.
This patch merely moves the value into a defined constant.
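The x10 scaling can be checked with a little arithmetic: a full queue takes the same time to drain in both cases. The sketch below assumes full-size 1500-byte packets, and queue_drain_us() is a name invented for this illustration:

```c
#include <stdint.h>

/*
 * Time in microseconds needed to drain a full queue of `pkts` packets
 * of `bytes` bytes each over a link of `mbit_per_s` Mbit/s.
 * Dividing bits by Mbit/s yields microseconds directly.
 */
static uint64_t queue_drain_us(uint64_t pkts, uint64_t bytes,
			       uint64_t mbit_per_s)
{
	return pkts * bytes * 8 / mbit_per_s;
}
```

With these assumptions, 100 packets at 100Mbit/s and 1000 packets at 1Gbit/s both drain in 12000 microseconds (12 ms). The same 1000-packet queue on a 10Mbit/s link holds 1.2 seconds of traffic, which illustrates the bufferbloat concern.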
David S. Miller [Mon, 7 Nov 2016 18:24:42 +0000 (13:24 -0500)]
Merge branch 'udp-fwd-mem-sched-on-dequeue'
Paolo Abeni says:
====================
udp: do fwd memory scheduling on dequeue
After commit 850cbaddb52d ("udp: use it's own memory accounting schema"),
the udp code needs to acquire the receive queue spinlock twice on dequeue.
This patch series removes the need for the second lock at skb free time,
moving the udp memory scheduling inside the dequeue operation; the skb
destructor field is not used anymore and an additional sk argument is added
to ip_cmsg_recv_offset() to cope with null skb->sk after dequeue.
Many thanks to Eric Dumazet for suggesting pretty much all of the above.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 4 Nov 2016 10:28:59 +0000 (11:28 +0100)]
udp: do fwd memory scheduling on dequeue
A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses this argument to perform memory
reclaiming on dequeue, so that the UDP protocol no longer
sets skb->destructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
In-kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.
Overall, this allows acquiring the receive queue lock only
once on dequeue.
Tested using pktgen with random src port, 64-byte packets,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.
v1 -> v2:
- do rmem and allocated memory scheduling under the receive lock
- do bulk scheduling in first_packet_length() and in udp_destruct_sock()
- avoid the typedef for the dequeue callback
Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
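The locking pattern can be modelled in plain C. This is a toy, single-threaded sketch: the structures, the lock_taken counter (standing in for the spinlock) and all names are hypothetical, not the kernel's skb or socket types:

```c
#include <stddef.h>

struct pkt {
	struct pkt *next;
	size_t truesize;
};

struct rx_queue {
	struct pkt *head;
	size_t rmem_alloc;		/* memory charged to the socket */
	unsigned int lock_taken;	/* counts "lock" acquisitions */
};

/* Memory reclaim hook, run while the queue lock is held. */
static void reclaim_on_dequeue(struct rx_queue *q, struct pkt *p)
{
	q->rmem_alloc -= p->truesize;
}

/*
 * Pop one packet. The destructor runs inside the same critical
 * section, so no second lock round trip is needed when the packet
 * is freed later.
 */
static struct pkt *dequeue(struct rx_queue *q,
			   void (*destructor)(struct rx_queue *, struct pkt *))
{
	struct pkt *p;

	q->lock_taken++;		/* spin_lock() in real code */
	p = q->head;
	if (p) {
		q->head = p->next;
		p->next = NULL;
		if (destructor)
			destructor(q, p);
	}
	/* spin_unlock() in real code */
	return p;
}
```

Each dequeue costs exactly one lock acquisition, and the accounting is already settled by the time the caller sees the packet.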
Paolo Abeni [Fri, 4 Nov 2016 10:28:58 +0000 (11:28 +0100)]
net/sock: add an explicit sk argument for ip_cmsg_recv_offset()
So that we can use it even after orphaning the skbuff.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 18:11:23 +0000 (13:11 -0500)]
Merge branch 'ns2-amac'
Jon Mason says:
====================
add NS2 support to bgmac
Changes in v6:
* Use a common bgmac_phy_connect_direct (per Rafal Milecki)
* Rebased on latest net-next
* Added Reviewed-by to the relevant patches
Changes in v5:
* Change a pr_err to netdev_err (per Scott Branden)
* Reword the lane swap binding documentation (per Andrew Lunn)
Changes in v4:
* Actually send out the lane swap binding doc patch (Per Scott Branden)
* Remove unused #define (Per Andrew Lunn)
Changes in v3:
* Clean-up the bgmac DT binding doc (per Rob Herring)
* Document the lane swap binding and make it generic (Per Andrew Lunn)
Changes in v2:
* Remove the PHY power-on (per Andrew Lunn)
* Misc PHY clean-ups regarding comments and #defines (per Andrew Lunn)
This results in none of the original PHY code from Vikas being
present. So, I'm removing him as an author and giving him
"Inspired-by" credit.
* Move PHY lane swapping to PHY driver (per Andrew Lunn and Florian
Fainelli)
* Remove bgmac sleep (per Florian Fainelli)
* Re-add bgmac chip reset (per Florian Fainelli and Ray Jui)
* Rebased on latest net-next
* Added patch for bcm54xx_auxctl_read, which is used in the BCM54810
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:11:01 +0000 (01:11 -0400)]
net: ethernet: bgmac: add NS2 support
Add support for the variant of amac hardware present in the Broadcom
Northstar2 based SoCs. Northstar2 requires an additional register to be
configured with the port speed/duplexity (NICPM). This can be added to
the link callback to hide it from the instances that do not use this.
Also, clearing of the pending interrupts on init is required due to
observed issues on some platforms.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:59 +0000 (01:10 -0400)]
Documentation: devicetree: net: add NS2 bindings to amac
Clean up the documentation for the bgmac-amac driver, per suggestion by
Rob Herring, and add details for NS2 support.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:58 +0000 (01:10 -0400)]
net: phy: broadcom: Add BCM54810 PHY entry
The BCM54810 PHY requires some semi-unique configuration on top of
the standard config.
Also, some users of the BCM54810 require the PHY lanes to be swapped.
Since there is no way to detect this, add a device tree query to see if
it is applicable.
Inspired-by: Vikas Soni <vsoni@broadcom.com> Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:57 +0000 (01:10 -0400)]
Documentation: devicetree: add PHY lane swap binding
Add the documentation for PHY lane swapping. This is a boolean entry to
notify the phy device drivers that the TX/RX lanes need to be swapped.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Mason [Fri, 4 Nov 2016 05:10:56 +0000 (01:10 -0400)]
net: phy: broadcom: add bcm54xx_auxctl_read
Add a helper function to read the AUXCTL register for the BCM54xx. This
mirrors the bcm54xx_auxctl_write function already present in the code.
Signed-off-by: Jon Mason <jon.mason@broadcom.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to clean up the driver.
Eventually the init/exit callbacks will be deprecated and removed
from all dwmac-* drivers except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev member of struct sti_dwmac is not used anywhere in the driver
so let's just remove it.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: clean up and rename sti_dwmac_init
Rename sti_dwmac_init to sti_dwmac_set_mode which is a better
description for what it really does.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
Add clock error handling to probe and in the process move clock enabling
out of sti_dwmac_init() to make this easier.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
The sti_dwmac_init() function is called both from probe and resume.
Since DT properties don't change between suspend/resume cycles move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement PM callbacks and driver remove in the driver instead
of relying on the init/exit hooks in stmmac_platform. This gives
the driver more flexibility in how the code is organized.
Eventually the init/exit callbacks will be deprecated in favor
of the standard PM callbacks and driver remove function.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since sti_dwmac_parse_data() sets dwmac->clk to NULL if no clock was
provided in DT, and NULL is a valid clock, there is no need to check for
NULL before using this clock.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since dwmac-sti is a DT-only driver, checking for an OF node is not necessary.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Tested-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Nov 2016 02:42:34 +0000 (21:42 -0500)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-11-04
This series contains updates to ixgbe and ixgbevf only.
Don does cleanup and configuration for our X553 devices, related to LED,
auto-negotiation, flow control and SFP+ setup and config. Adds the
(not secret) sauce for B0 hardware for X553 hardware.
Emil provides several fixes: first he replaces the driver-specific MDIO
defines with the preferred equivalent kernel ones. Provides a fix
for auto-negotiation status, by reading a PHY register twice. Introduces
ixgbe_link_operations structure to allow X550EM_a to override the
methods for MDIO access while X550EM_x provides methods to use I2C
combined access.
Mark fixes an issue where the driver was crashing when msix_entries
were not there because they were freed by a previous suspend or remove.
Sowmini Varadhan fixes an incorrect check for IPPROTO_UDP
in ixgbe_atr(). Then makes sure that the network and transport headers
in the paged data are available in the headlen bytes to calculate the
l4_proto.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mark Rustad [Fri, 28 Oct 2016 17:46:39 +0000 (10:46 -0700)]
ixgbevf: Handle previously-freed msix_entries
The msix_entries memory can be freed by a previous suspend or
remove, so don't crash on close when it isn't there. Also only
clear the interrupts when the interface is up, because there
aren't any when it is not up.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sowmini Varadhan [Mon, 24 Oct 2016 22:36:39 +0000 (15:36 -0700)]
ixgbe: ixgbe_atr() compute l4_proto only if non-paged data has network/transport headers
For some Tx paths (e.g., tpacket_snd()), ixgbe_atr may be
passed down an sk_buff that has the network and transport
header in the paged data, so it needs to make sure these
headers are available in the headlen bytes to calculate the
l4_proto.
This patch expects that network and transport headers are
already available in the non-paged header data. The assumption
is that the caller has set this up if l4_proto based Tx
steering is desired.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sowmini Varadhan [Mon, 24 Oct 2016 22:36:38 +0000 (15:36 -0700)]
ixgbe: ixgbe_atr() should access udp_hdr(skb) only for UDP packets
Commit 9f12df906cd8 ("ixgbe: Store VXLAN port number in network order")
incorrectly checks for hdr.ipv4->protocol != IPPROTO_UDP
in ixgbe_atr(). This check should be for "==" instead.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
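The corrected condition can be sketched as a stand-alone predicate. The function name is_vxlan_udp() and its flat parameters are hypothetical; the driver works on skb headers instead:

```c
#include <stdint.h>
#include <netinet/in.h>

/*
 * Returns 1 when the packet is UDP and its destination port equals
 * the configured VXLAN port (both in network byte order), else 0.
 * `udp_dport_be` is only meaningful when `ip_proto` is IPPROTO_UDP,
 * which is why the protocol test comes first and must be "==".
 */
static int is_vxlan_udp(uint8_t ip_proto, uint16_t udp_dport_be,
			uint16_t vxlan_port_be)
{
	if (ip_proto == IPPROTO_UDP)
		return udp_dport_be == vxlan_port_be;
	return 0;
}
```

With "!=" in place of "==", the port comparison would run only for non-UDP packets, i.e. exactly when there is no UDP header to read.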
Don Skidmore [Fri, 4 Nov 2016 20:46:16 +0000 (16:46 -0400)]
ixgbe: Correct X550 phy ID
We were using an old Alpha version of the X550 phy ID. This was leading
to unnecessary queries of the PHY. I removed the old ID (which shouldn't
be on any HW) and added the two that are.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 4 Nov 2016 01:01:37 +0000 (21:01 -0400)]
ixgbe: Add X553 FW ALEF support
This patch adds X553 FW ALEF support for B0. ALEF is the new unified
FW. This contains updated register defines for ALEF speed
configuration. Likewise it also removes the AN_CNTL_8 usage from
the native SFI flow as it is no longer supported by FW.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This will allow X550EM_a to override these methods for MDIO access
while X550EM_x provides methods to use I2C combined access. This
also adds a new structure, ixgbe_link_info, to hold information
about the link. Initially this is just method pointers and a bus
address.
The functions involved in combined I2C accesses were moved from
ixgbe_phy.c to ixgbe_x550.c. The underlying functions that carry
out the combined I2C accesses were left in ixgbe_phy.c because
they share some functions with other I2C methods.
v2 - set hw->link.ops in probe.
v3 - check ii->link_ops before setting it since we don't have it
for all devices.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Tue, 27 Sep 2016 18:31:12 +0000 (14:31 -0400)]
ixgbe: Add X553 PHY FC autoneg support
This patch adds X553 flow control auto negotiation for fiber and
backplane. To enable this, new function pointers were added, along
with a function to dynamically set the function pointers that can't
be determined from the MAC type alone.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Sat, 22 Oct 2016 01:10:54 +0000 (21:10 -0400)]
ixgbe: Update setup PHY link to unset all speeds
This patch updates ixgbe_setup_phy_link_generic to set/unset
auto-negotiation for all speeds. This ensures that unsupported
speeds are unset. This is necessary since the PHY NVM may
advertise unsupported speeds.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Fri, 21 Oct 2016 01:42:00 +0000 (21:42 -0400)]
ixgbe: Add support to retrieve and store LED link active
This patch adds support to get the LED link active via the LEDCTL
register. If the LEDCTL register does not have LED link active
(LED mode field = 0x0100) set then the default LED link active is returned.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
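A sketch of such a lookup with a fallback. The register layout used here (four LEDs, one byte per LED, a 4-bit mode field) and the reading of the mode value above as the bit pattern 0100 are assumptions for illustration, as are all macro and function names:

```c
#include <stdint.h>

#define LED_COUNT		4
#define LED_MODE_SHIFT(i)	(8 * (i))	/* assumed: one byte per LED */
#define LED_MODE_MASK		0xFu		/* assumed: 4-bit mode field */
#define LED_LINK_ACTIVE		0x4u		/* assumed: binary 0100 */

/*
 * Return the index of the first LED whose mode field is set to
 * "link active", or `default_led` when none is configured that way.
 */
static int led_link_active(uint32_t ledctl, int default_led)
{
	for (int i = 0; i < LED_COUNT; i++) {
		uint32_t mode = (ledctl >> LED_MODE_SHIFT(i)) & LED_MODE_MASK;

		if (mode == LED_LINK_ACTIVE)
			return i;
	}
	return default_led;
}
```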
Don Skidmore [Thu, 22 Sep 2016 00:21:52 +0000 (20:21 -0400)]
ixgbe: Add X552 iXFI configuration helper function
X553 doesn't need all the initialization that X552 did for iXFI. This
patch will allow native SFI SFP+ to work with X553 devices. Future
patches will add additional configuration as needed.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Fri, 4 Nov 2016 18:56:17 +0000 (14:56 -0400)]
Merge branch 'nfp-ring-reconfig-and-xdp-support'
Jakub Kicinski says:
====================
ring reconfiguration and XDP support
This set adds support for ethtool channel API and XDP.
I kick off with ethtool get_channels() implementation.
set_channels() needs some preparations to get right. I follow
the prepare/commit paradigm and allocate all resources before
stopping the device. It has already been done for ndo_change_mtu
and ethtool set_ringparam(), it makes sense now to consolidate all
the required logic in one place.
XDP support requires splitting TX rings into two classes -
for the stack and for XDP. The ring structures are identical.
The differences are in how they are connected to IRQ vector
structs and how the completion/cleanup works. When XDP is enabled
I switch from the frag allocator to page-per-packet and map buffers
BIDIRECTIONALly.
Last but not least XDP offload is added (the patch just takes
care of the small formal differences between cls_bpf and XDP).
There is a tiny & trivial DebugFS patch in the mix, I hope it can
be taken via net-next provided we have the right Acks.
Resending with improved commit message and CCing more people on patch 10.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:08 +0000 (17:12 +0000)]
nfp: remove unnecessary parameters from nfp_net_bpf_offload()
nfp_net_bpf_offload() takes all .setup_tc() parameters but it
doesn't use them at the moment. Remove unnecessary ones to make
it possible for XDP to reuse this function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:07 +0000 (17:12 +0000)]
nfp: add XDP support in the driver
Add XDP support. Separate stack's and XDP's TX rings logically.
Add functions for handling XDP_TX and cleanup of XDP's TX rings.
For XDP allocate all RX buffers as separate pages and map them
with DMA_BIDIRECTIONAL.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:06 +0000 (17:12 +0000)]
debugfs: constify argument to debugfs_real_fops()
seq_file users can only access const version of file pointer,
because the ->file member of struct seq_file is marked
as such. Make parameter to debugfs_real_fops() const.
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Nicolai Stange <nicstange@gmail.com> CC: Christian Lamparter <chunkeey@gmail.com> CC: LKML <linux-kernel@vger.kernel.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:05 +0000 (17:12 +0000)]
nfp: reorganize nfp_net_rx() to get packet offsets early
Calculate packet offsets early in nfp_net_rx() so that we will be
able to use them in upcoming XDP handler. While at it move relevant
variables into the loop scope.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:12:04 +0000 (17:12 +0000)]
nfp: add support for ethtool .set_channels
Allow changing the number of rings via ethtool .set_channels API.
Runtime reconfig needs to be extended to handle number of rings.
We need to be able to activate interrupt vectors before rings are
assigned to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:11:59 +0000 (17:11 +0000)]
nfp: rename ring allocation helpers
"Shadow" in ring helpers used to mean that the helper will allocate
rings without touching existing configuration; this was used for
reconfiguration while the device was running. We will soon use
the same helpers for .ndo_open() path, so replace "shadow" with
"ring_set".
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Thu, 3 Nov 2016 17:11:58 +0000 (17:11 +0000)]
nfp: centralize runtime reconfiguration logic
All functions which need to reallocate ring resources at runtime
look very similar. Centralize that logic into a separate function.
Encapsulate configuration parameters in a structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
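The prepare/commit flow behind this centralization can be sketched in user space. All types and names below are hypothetical and much simpler than the driver's; the int array merely stands in for real ring resources:

```c
#include <stdlib.h>

struct ring_set {
	unsigned int n_rings;
	int *rings;		/* stand-in for real ring resources */
};

struct dev_state {
	struct ring_set cur;
	int running;
};

/* Prepare: allocate the new set without touching the live config. */
static int ring_set_prepare(struct ring_set *rs, unsigned int n_rings)
{
	rs->rings = calloc(n_rings, sizeof(*rs->rings));
	if (!rs->rings)
		return -1;
	rs->n_rings = n_rings;
	return 0;
}

/* Commit: the only phase during which the device is stopped. */
static int ring_reconfig(struct dev_state *dev, unsigned int n_rings)
{
	struct ring_set next, old;

	if (n_rings == 0 || ring_set_prepare(&next, n_rings))
		return -1;	/* device untouched, still running */

	dev->running = 0;	/* stop */
	old = dev->cur;
	dev->cur = next;	/* swap in the prepared resources */
	dev->running = 1;	/* restart */

	free(old.rings);
	return 0;
}
```

Because every allocation happens before the device is stopped, a failed request leaves the old configuration running instead of a half-configured device.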
This patch series is targeted at preparing the driver for a new PCI version
of the hardware. After this series is applied, a follow-on series will
introduce the support for the PCI version of the hardware.
The following updates and fixes are included in this driver update series:
- Fix formatting of PCS debug register dump
- Prepare for priority-based FIFO allocation
- Implement priority-based FIFO allocation
- Prepare for working with more than one type of PCS/PHY
- Prepare for the introduction of clause 37 auto-negotiation
- Add support for clause 37 auto-negotiation
- Prepare for supporting a new PCS register access method
- Add support for 64-bit management counter registers
- Update DMA channel status determination
- Prepare for supporting PCI devices in addition to platform devices
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the driver framework to separate out platform/ACPI specific code
from general code during device initialization. This will allow for the
introduction of PCI device support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Update how to determine DMA channel status
Tx and Rx DMA channel status determination is different depending on the
version of the hardware. Update the channel status processing code to
account for the change. Also, reduce the timeout value used when stopping
the channels.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Support for 64-bit management counter registers
Add support for reading all management counter registers as 64-bit
values. Whether to read the high 32 bits to form a 64-bit value
is indicated in the version data.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
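A sketch of a version-dependent counter read. The flag name mmc_64bit, the struct, and the assumption that the high word sits in the register immediately after the low word are illustrative only:

```c
#include <stdint.h>

/* Hypothetical per-version data. */
struct hw_version_data {
	int mmc_64bit;		/* non-zero: counters are 64 bits wide */
};

/*
 * Read counter `idx` from a register array. The low 32 bits are
 * always read; the high 32 bits only when the version data says so.
 */
static uint64_t read_mmc_counter(const struct hw_version_data *vdata,
				 const uint32_t *regs, unsigned int idx)
{
	uint64_t val = regs[idx];			/* low 32 bits */

	if (vdata->mmc_64bit)
		val |= (uint64_t)regs[idx + 1] << 32;	/* high 32 bits */
	return val;
}
```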
amd-xgbe: Prepare for a new PCS register access method
Prepare the code to be able to support accessing of the PCS registers
in a new way, while maintaining the current access method. Provide a
version specific field that indicates the method to use.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Prepare for introduction of clause 37 autoneg
Prepare for the future introduction of clause 37 auto-negotiation by
updating the current auto-negotiation related functions to identify
them as clause 73 functions. Move interrupt enablement to the
enable/disable auto-negotiation functions. Update what will be common
routines to check for the current type of AN and process accordingly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Prepare for working with more than one type of phy
Prepare the code to be able to work with more than one type of phy by
adding additional callable functions into the phy interface and removing
phy specific settings/functions from non-phy related files.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate the FIFO across the hardware Rx queues based on the priority
of the queues, giving more FIFO resources to queues with a higher
priority. If PFC is active but not enabled for a queue, then fewer
resources can be allocated to the queue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Prepare for priority-based FIFO allocation
Currently, the Rx and Tx fifos are evenly allocated between the hardware
queues of the device. As more queues are instantiated, the fifo memory
needs to be able to be allocated based on queue priority. This allows for
higher priority queues to have more fifo memory than lower priority
queues. Prepare for this by modifying the current fifo calculation to
assign the fifo queue allocation in an array that is then used to program
the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
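As a rough illustration of the priority-based FIFO allocation described in the two amd-xgbe commits above, the following Python sketch fills a per-queue array from a total FIFO size. The function name, the unit granularity, and the weighting rule (share proportional to priority + 1) are assumptions made for illustration; the driver's actual calculation is not shown in this log.

```python
def allocate_fifo(total_units, priorities):
    """Split total_units of FIFO memory across queues by priority.

    priorities[i] is a hypothetical priority for queue i (higher means
    more important). Returns a list with one allocation per queue.
    """
    if not priorities:
        return []
    weights = [p + 1 for p in priorities]   # assumed weighting rule
    weight_sum = sum(weights)
    # Integer share per queue, rounded down.
    fifo = [total_units * w // weight_sum for w in weights]
    # Hand any rounding remainder to the highest-priority queues first.
    leftover = total_units - sum(fifo)
    order = sorted(range(len(fifo)), key=lambda i: -weights[i])
    for i in order[:leftover]:
        fifo[i] += 1
    return fifo
```

With equal priorities this degenerates to the even split that the "Prepare for priority-based FIFO allocation" commit describes as the current behaviour; the resulting array is what would then be programmed into the hardware.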
David S. Miller [Fri, 4 Nov 2016 18:45:24 +0000 (14:45 -0400)]
Merge branch 'uid-routing'
Lorenzo Colitti says:
====================
net: inet: Support UID-based routing
This patchset adds support for per-UID routing. It allows the
administrator to configure rules such as:
ip rule add uidrange 100-200 lookup 123
This functionality has been in use by all Android devices since
5.0. It is primarily used to impose per-app routing policies (on
Android, every app has its own UID) without having to resort to
rerouting packets in iptables, which breaks getsockname() and
MTU/MSS calculation, and generally disrupts end-to-end
connectivity.
This patch series is similar to the code currently used on
Android, but has better correctness and performance because
it stores the UID in the socket instead of calling sock_i_uid.
This avoids contention on sk->sk_callback_lock, and makes it
possible to correctly route a socket on which userspace has
called close(), for which sock_i_uid will return 0.
Changes from v1:
- Don't set the UID in sk_clone_lock, it's already set by
sock_copy.
- For packets originated by kernel sockets, don't use the socket
UID. This is the UID that created the namespace, but it might
not be mapped in the namespace at all. Instead, use UID 0 in
the namespace, which is less surprising and consistent with
what happens in the root namespace.
- Fix UID routing of IPv4 and IPv6 SYN_RECV sockets.
- Fix UID routing of received IPv6 redirects.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
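To make the effect of a rule such as "ip rule add uidrange 100-200 lookup 123" concrete, here is a minimal Python model of the lookup. The UidRule type and select_table function are hypothetical names, and the range is assumed to be inclusive at both ends; this is a sketch of the idea, not the kernel's FIB rule code.

```python
from typing import NamedTuple, Optional


class UidRule(NamedTuple):
    start: int   # first UID in the range (assumed inclusive)
    end: int     # last UID in the range (assumed inclusive)
    table: int   # routing table to consult on a match


def select_table(rules, uid) -> Optional[int]:
    """Return the table of the first rule whose UID range contains uid."""
    for rule in rules:
        if rule.start <= uid <= rule.end:
            return rule.table
    return None  # no UID rule matched; other rules would apply


# Model of: ip rule add uidrange 100-200 lookup 123
rules = [UidRule(start=100, end=200, table=123)]
```

A socket owned by UID 150 would be routed using table 123 in this model, while a socket owned by UID 201 falls through to whatever rules come next.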
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:43 +0000 (02:23 +0900)]
net: inet: Support UID-based routing in IP protocols.
- Use the UID in routing lookups made by protocol connect() and
sendmsg() functions.
- Make sure that routing lookups triggered by incoming packets
(e.g., Path MTU discovery) take the UID of the socket into
account.
- For packets not associated with a userspace socket (e.g., ping
replies), use UID 0 inside the user namespace corresponding to
the network namespace the socket belongs to. This allows
all namespaces to apply routing and iptables rules to
kernel-originated traffic in that namespace by matching UID 0.
This is better than using the UID of the kernel socket that is
sending the traffic, because the UID of kernel sockets created
at namespace creation time (e.g., the per-processor ICMP and
TCP sockets) is the UID of the user that created the socket,
which might not be mapped in the namespace.
Tested: compiles allnoconfig, allyesconfig, allmodconfig
Tested: https://android-review.googlesource.com/253302 Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:42 +0000 (02:23 +0900)]
net: core: add UID to flows, rules, and routes
- Define a new FIB rule attribute, FRA_UID_RANGE, to describe a
range of UIDs.
- Define a RTA_UID attribute for per-UID route lookups and dumps.
- Support passing these attributes to and from userspace via
rtnetlink. The value INVALID_UID indicates no UID was
specified.
- Add a UID field to the flow structures.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Thu, 3 Nov 2016 17:23:41 +0000 (02:23 +0900)]
net: core: Add a UID field to struct sock.
Protocol sockets (struct sock) don't have UIDs, but most of the
time, they map 1:1 to userspace sockets (struct socket) which do.
Various operations such as the iptables xt_owner match need
access to the "UID of a socket", and do so by following the
backpointer to the struct socket. This involves taking
sk_callback_lock and doesn't work when there is no socket
because userspace has already called close().
Simplify this by adding a sk_uid field to struct sock whose value
matches the UID of the corresponding struct socket. The semantics
are as follows:
1. Whenever sk_socket is non-null: sk_uid is the same as the UID
in sk_socket, i.e., matches the return value of sock_i_uid.
Specifically, the UID is set when userspace calls socket(),
fchown(), or accept().
2. When sk_socket is NULL, sk_uid is defined as follows:
- For a socket that no longer has a sk_socket because
userspace has called close(): the previous UID.
- For a cloned socket (e.g., an incoming connection that is
established but on which userspace has not yet called
accept): the UID of the socket it was cloned from.
- For a socket that has never had an sk_socket: UID 0 inside
the user namespace corresponding to the network namespace
the socket belongs to.
Kernel sockets created by sock_create_kern are a special case
of #1 and sk_uid is the user that created them. For kernel
sockets created at network namespace creation time, such as the
per-processor ICMP and TCP sockets, this is the user that created
the network namespace.
Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
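The sk_uid semantics listed above can be summarised with a small Python model. ToySock and its methods are hypothetical stand-ins for struct sock and the socket(), fchown(), close() and clone paths; ROOT_UID stands in for "UID 0 inside the user namespace corresponding to the network namespace". This is a sketch of the stated rules only.

```python
ROOT_UID = 0  # stand-in for UID 0 in the owning user namespace


class ToySock:
    """Toy model of struct sock that tracks only sk_socket and sk_uid."""

    def __init__(self):
        self.sk_socket = None    # has never had a userspace socket
        self.sk_uid = ROOT_UID   # rule 2, third bullet

    def attach(self, uid):
        """socket() or accept(): sk_uid matches the socket's UID (rule 1)."""
        self.sk_socket = {"uid": uid}
        self.sk_uid = uid

    def fchown(self, uid):
        """fchown() on the socket: sk_uid follows the new owner (rule 1)."""
        self.sk_socket["uid"] = uid
        self.sk_uid = uid

    def close(self):
        """close(): sk_socket goes away, the previous UID is kept (rule 2)."""
        self.sk_socket = None

    def clone(self):
        """Not-yet-accepted connection: inherit the parent's UID (rule 2)."""
        child = ToySock()
        child.sk_uid = self.sk_uid
        return child
```

The point of the design is visible in close(): the UID survives without any backpointer, so no lock on the struct socket is needed to read it.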
====================
net: dsa: mv88e6xxx: refine port operations
The Marvell chips have one internal SMI device per port, containing a
set of registers used to configure a port's link, STP state, default
VLAN or addresses database, etc.
This patchset creates port files to implement the port operations as
described in datasheets, and extends the chip ops structure with them.
Patches 1 to 6 implement accessors for port's STP state, port based VLAN
map, default FID, default VID, and 802.1Q mode.
Patches 7 to 11 implement the port's MAC setup of link state, duplex
mode, RGMII delay and speed, all accessed through port's register 0x01.
The new port's MAC setup code is used to re-implement the adjust_link
code and correctly force the link down before changing any of the MAC
settings, as requested by the datasheets.
The port's MAC accessors use values compatible with struct phy_device
(e.g. DUPLEX_FULL) and extend them when needed (e.g. SPEED_MAX).
Changes in v2:
- Strictly use new _UNFORCED values instead of re-using _UNKNOWN ones.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Nov 2016 02:23:36 +0000 (03:23 +0100)]
net: dsa: mv88e6xxx: setup port's MAC
Now that we have setters to configure the port's MAC, use them to
refactor the port setup and adjust_link code.
Note that port's MAC speed, duplex or RGMII delay must not be changed
unless the port's link is forced down. So wrap all that in a
mv88e6xxx_port_setup_mac function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
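The ordering constraint described above (force the link down, change the MAC settings, then restore the link) can be sketched as follows. The Port class, the link-state names, and the restore-on-error behaviour are assumptions for illustration; only the "link must be forced down first" rule comes from the commit message.

```python
LINK_FORCED_DOWN = "forced-down"   # hypothetical link-state names
LINK_UNFORCED = "unforced"


class Port:
    """Toy port that refuses MAC changes unless the link is forced down."""

    def __init__(self):
        self.link = LINK_UNFORCED
        self.speed = None
        self.duplex = None
        self.log = []              # order of writes, for inspection

    def set_link(self, link):
        self.link = link
        self.log.append(("link", link))

    def _require_link_down(self):
        if self.link != LINK_FORCED_DOWN:
            raise RuntimeError("MAC setting changed while link not forced down")

    def set_speed(self, speed):
        self._require_link_down()
        self.speed = speed
        self.log.append(("speed", speed))

    def set_duplex(self, duplex):
        self._require_link_down()
        self.duplex = duplex
        self.log.append(("duplex", duplex))


def port_setup_mac(port, link, speed, duplex):
    """Wrap all MAC changes between a forced link-down and a link restore."""
    port.set_link(LINK_FORCED_DOWN)
    try:
        port.set_speed(speed)
        port.set_duplex(duplex)
    finally:
        port.set_link(link)        # assumed: restore even if a setter fails
```

Keeping the sequence in a single wrapper means callers such as the port setup and adjust_link paths cannot forget the link-down step.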
Vivien Didelot [Fri, 4 Nov 2016 02:23:35 +0000 (03:23 +0100)]
net: dsa: mv88e6xxx: add port's MAC speed setter
While the two bits for link, duplex or RGMII delays are used the same
way on chips supporting the said feature, the two bits for speed have
different meanings for most of the chips out there.
Speed value is stored in bits 1:0, 0x3 means unforced (normal detection).
Some chips reuse values for alternative speeds when bit 12 is set.
Newer chips with speed > 1Gbps reuse value 0x3 thus need a new bit 13.
Here are the values to write in register 0x1 to (un)force speed:
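The per-chip value table is not included in this log. As a rough sketch of the bit layout only, the Python snippet below updates the speed field of a 16-bit register value: bits 1:0 carry the speed value, 0x3 means unforced, and bit 12 selects an alternative speed on chips that support it. The function and constant names are hypothetical, the caller supplies the chip-specific speed value, and the bit 13 handling for newer chips is deliberately not modelled.

```python
SPEED_MASK = 0x3          # bits 1:0 hold the speed value
SPEED_UNFORCED = 0x3      # 0x3 means unforced (normal detection)
ALT_SPEED_BIT = 1 << 12   # alternative-speed selector on some chips


def set_speed_field(reg, value, alt=False):
    """Return reg (16-bit) with the speed field replaced by value.

    value is the chip-specific 2-bit encoding chosen by the caller.
    """
    if value & ~SPEED_MASK:
        raise ValueError("speed value must fit in bits 1:0")
    reg &= 0xFFFF & ~(SPEED_MASK | ALT_SPEED_BIT)  # clear the old fields
    reg |= value
    if alt:
        reg |= ALT_SPEED_BIT
    return reg
```

A read-modify-write like this leaves the link, duplex and RGMII delay bits of the same register untouched.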
Some chips such as 88E6352 and 88E6390 can be programmed to add delays
to RXCLK for IND inputs or to GTXCLK for OUTD outputs when port is in
RGMII mode.
Add a port function to program such delays according to the provided PHY
interface mode.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Nov 2016 02:23:32 +0000 (03:23 +0100)]
net: dsa: mv88e6xxx: add port link setter
Most chips have port register control bits to force the port's link
up or down, or to let normal link detection occur.
Implement such an operation to use it later when setting duplex, etc.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 4 Nov 2016 02:23:26 +0000 (03:23 +0100)]
net: dsa: mv88e6xxx: add port files
The Marvell switches contain one internal SMI device per port, called
"Port Registers". Depending on the model, the addresses of these devices
start from 0x0, 0x8 or 0x10.
Start moving Port Registers specific code to their own files.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Nov 2016 19:41:12 +0000 (15:41 -0400)]
Merge branch 'ip-recvfragsize-cmsg'
Willem de Bruijn says:
====================
ip: add RECVFRAGSIZE cmsg
On IP datagrams and raw sockets, when packets arrive fragmented,
expose the largest received fragment size through a new cmsg.
Protocols implemented on top of these sockets may use this, for
instance, to inform peers to lower MSS on platforms that silently
allow send calls to exceed PMTU and cause fragmentation.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IP6CB and IPCB have a frag_max_size field. In IPv6 this field is
filled in when packets are reassembled by the connection tracking
code. Also fill in when reassembling in the input path, to expose
it through cmsg IPV6_RECVFRAGSIZE in all cases.
Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When reading a datagram or raw packet that arrived fragmented, expose
the maximum fragment size, if recorded, to allow applications to
estimate receive path MTU.
At this point, the field is only recorded when ipv6 connection
tracking is enabled. A follow-up patch will record this field also
in the ipv6 input path.
Tested using the test for IP_RECVFRAGSIZE plus
ip netns exec to ip addr add dev veth1 fc07::1/64
ip netns exec from ip addr add dev veth0 fc07::2/64
ip netns exec to ./recv_cmsg_recvfragsize -6 -u -p 6000 &
ip netns exec from nc -q 1 -u fc07::1 6000 < payload
Both with and without enabling connection tracking
ip6tables -A INPUT -m state --state NEW -p udp -j LOG
Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The IP stack records the largest fragment of a reassembled packet
in IPCB(skb)->frag_max_size. When reading a datagram or raw packet
that arrived fragmented, expose the value to allow applications to
estimate receive path MTU.
Tested:
Sent data over a veth pair of which the source has a small mtu.
Sent data using netcat, received using a dedicated process.
Verified that the cmsg IP_RECVFRAGSIZE is returned only when
data arrives fragmented, and in that case matches the veth mtu.
ip link add veth0 type veth peer name veth1
ip netns add from
ip netns add to
ip link set dev veth1 netns to
ip netns exec to ip addr add dev veth1 192.168.10.1/24
ip netns exec to ip link set dev veth1 up
ip link set dev veth0 netns from
ip netns exec from ip addr add dev veth0 192.168.10.2/24
ip netns exec from ip link set dev veth0 up
ip netns exec from ip link set dev veth0 mtu 1300
ip netns exec from ethtool -K veth0 ufo off
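On the receiving side, an application might read the new cmsg roughly as sketched below. The numeric value of IP_RECVFRAGSIZE is assumed from the Linux UAPI headers (Python's socket module may not export it), the payload is assumed to be a native int, and the helper names are hypothetical; treat this as an illustration rather than the dedicated test program mentioned above.

```python
import socket
import struct

# Assumed value from the Linux UAPI headers; defined here because the
# socket module may not provide it.
IP_RECVFRAGSIZE = 25


def enable_recvfragsize(sock):
    """Ask the kernel to attach the largest fragment size as a cmsg."""
    sock.setsockopt(socket.IPPROTO_IP, IP_RECVFRAGSIZE, 1)


def largest_fragment(ancdata):
    """Return the fragment size from recvmsg() ancillary data, or None.

    None means the cmsg was absent, i.e. the datagram was not fragmented.
    """
    int_size = struct.calcsize("i")
    for level, ctype, data in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_RECVFRAGSIZE:
            return struct.unpack("i", data[:int_size])[0]
    return None
```

A receiver would call enable_recvfragsize() on a bound UDP socket, then pass the second element of the tuple returned by sock.recvmsg(65535, socket.CMSG_SPACE(struct.calcsize("i"))) to largest_fragment(). With the veth setup above, a fragmented datagram would be expected to report the source MTU of 1300.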
This patchset adds support for the Synopsys DWMAC Gigabit Ethernet
controller Glue layer of the Oxford Semiconductor OX820 SoC.
Changes since v2 at http://lkml.kernel.org/r/20161031105345.16711-1-narmstrong@baylibre.com :
- Disable/Unprepare clock if regmap read fails in oxnas_dwmac_init
Changes since v1 at https://patchwork.kernel.org/patch/9388231/ :
- Split dt-bindings in a separate patch
- Add IP version in the dt-bindings compatible
- Check return of clk_prepare_enable()
- use get_stmmac_bsp_priv() helper
- hardwire setup values in oxnas_dwmac_init()
Changes since RFC at https://patchwork.kernel.org/patch/9387257 :
- Drop init/exit callbacks
- Implement proper remove and PM callback
- Call init from probe
- Disable/Unprepare clock if stmmac probe fails
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Wed, 2 Nov 2016 14:02:36 +0000 (15:02 +0100)]
net: stmmac: Add OXNAS Glue Driver
Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820.
Acked-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 3 Nov 2016 19:25:27 +0000 (15:25 -0400)]
Merge branch 'diag-raw-fixes'
Cyrill Gorcunov says:
====================
net: Fixes for raw diag sockets handling
Hi! Here are a few fixes for raw-diag sockets handling: a missing
sock_put call and a jump to exit from a nested loop. I made
patches for iproute2 as well and will send them out soon.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cyrill Gorcunov [Wed, 2 Nov 2016 12:36:32 +0000 (15:36 +0300)]
net: ip, raw_diag -- Use jump for exiting from nested loop
I managed to miss that sk_for_each is called inside a "for"
loop, so a goto is needed here to return the matching socket.
CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
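The class of bug fixed here can be modelled in a few lines of Python. Python has no goto, so the fixed version returns directly from the inner loop, which has the same effect as jumping past both loops. The bucket layout and function names are hypothetical, and the buggy variant only approximates how a C iterator variable is overwritten by later iterations.

```python
def find_socket_buggy(buckets, match):
    """A break leaves only the inner loop, so later buckets clobber sk."""
    sk = None
    for bucket in buckets:
        sk = None                 # iterator restarts for every bucket
        for candidate in bucket:
            if match(candidate):
                sk = candidate
                break             # exits the inner loop only
    return sk


def find_socket(buckets, match):
    """Leave both loops as soon as a match is found."""
    for bucket in buckets:
        for candidate in bucket:
            if match(candidate):
                return candidate  # plays the role of the goto
    return None
```

With buckets [[1, 2], [3]] and a match on 2, the buggy version scans the second bucket after the break and ends up returning None, while the fixed version returns 2.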
Cyrill Gorcunov [Wed, 2 Nov 2016 12:36:31 +0000 (15:36 +0300)]
net: ip, raw_diag -- Fix socket leaking for destroy request
In raw_diag_destroy, the helper raw_sock_get returns the socket
with a reference taken by sock_hold, so we have to put it afterwards.
CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
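The reference-counting rule behind this fix is that a lookup which returns with a held reference must be paired with a put on every path. A minimal Python model, with a hypothetical RefSock class and a dictionary standing in for the socket table:

```python
class RefSock:
    """Toy socket with a reference count; starts with one owner."""

    def __init__(self):
        self.refcnt = 1
        self.destroyed = False


def sock_hold(sk):
    sk.refcnt += 1


def sock_put(sk):
    sk.refcnt -= 1


def raw_sock_get(table, key):
    """Look up a socket and return it with an extra reference held."""
    sk = table.get(key)
    if sk is not None:
        sock_hold(sk)
    return sk


def raw_diag_destroy(table, key):
    """Destroy the socket for key, dropping the lookup reference."""
    sk = raw_sock_get(table, key)
    if sk is None:
        return False
    try:
        sk.destroyed = True       # stand-in for the real destroy work
    finally:
        sock_put(sk)              # without this, the reference leaks
    return True
```

After raw_diag_destroy() returns in this model, the count is back to its value before the lookup; omitting the sock_put() call would leave it one too high.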
The driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
which is a bit mask. This is wrong: the hardware actually provides an enum.
Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set the l3 and l4 hash type.
Fixes: bf751ba802fe ("driver/net: enic: record q_number and rss_hash for skb") Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
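Why testing an enum with bit-mask constants goes wrong can be shown with a small Python sketch. All constant names and values below are hypothetical and chosen only to demonstrate the mismatch; they are not taken from the enic headers.

```python
# Hypothetical configuration bit mask (one bit per hash type).
HASH_CFG_IPV4 = 1 << 1
HASH_CFG_TCP_IPV4 = 1 << 2

# Hypothetical descriptor enum (consecutive values, not bits).
DESC_TYPE_NONE = 0
DESC_TYPE_IPV4 = 1
DESC_TYPE_TCP_IPV4 = 2


def hash_level_wrong(rss_type):
    """Treats the enum as a bit mask, which misclassifies packets."""
    if rss_type & HASH_CFG_TCP_IPV4:
        return "l4"
    if rss_type & HASH_CFG_IPV4:
        return "l3"
    return None


def hash_level(rss_type):
    """Compares against the enum values the hardware reports."""
    if rss_type == DESC_TYPE_TCP_IPV4:
        return "l4"
    if rss_type == DESC_TYPE_IPV4:
        return "l3"
    return None
```

In this model a TCP/IPv4 descriptor (enum value 2) is reported as an l3 hash by the mask-based check, and a plain IPv4 descriptor (enum value 1) gets no hash type at all.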