Jesse Brandeburg [Thu, 14 Jan 2016 00:51:38 +0000 (16:51 -0800)]
i40e: move sync_vsi_filters up in service_task
The sync_vsi_filters function is moved up in the service_task because
it may need to request a reset, and we don't want to wait another round
of service task time.
NOTE: Filters will be replayed by sync_vsi_filters including broadcast
and promiscuous settings.
Also, added some error handling in this space in case any of these
fail the driver will retry correctly.
Also update copyright year.
Change-ID: I23f3d552100baecea69466339f738f27614efd47 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 17 Feb 2016 21:15:47 +0000 (16:15 -0500)]
Merge branch 'inet_lro-remove'
Ben Hutchings says:
====================
Remove the inet_lro library
The old inet_lro library has been deprecated ever since GRO was
introduced, but there are still a few drivers using it. Convert
them to GRO and remove it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Mon, 15 Feb 2016 18:22:35 +0000 (13:22 -0500)]
qed/qede: use 8.7.3.0 FW.
This patch moves the qed* driver into utilizing the 8.7.3.0 FW.
This new FW is required for a lot of new SW features, including:
- Vlan filtering offload
- Encapsulation offload support
- HW ingress aggregations
As well as paving the way for the possibility of adding storage protocols
in the future.
V2:
- Fix kbuild test robot error/warnings.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@qlogic.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Feb 2016 06:28:05 +0000 (14:28 +0800)]
sctp: remove the unused sctp_datamsg_free()
Since commit 8b570dc9f7b6 ("sctp: only drop the reference on the datamsg
after sending a msg") used sctp_datamsg_put in sctp_sendmsg, instead of
sctp_datamsg_free, this function has no use in sctp.
So we will remove it.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 15 Feb 2016 06:28:03 +0000 (14:28 +0800)]
sctp: move rcu_read_lock from __sctp_lookup_association to sctp_lookup_association
__sctp_lookup_association() is only invoked by sctp_v4_err() and
sctp_rcv(), both which run on the rx BH, and it has been protected
by rcu_read_lock [see ip_local_deliver_finish() / ipv6_rcv()].
So we can move it to sctp_lookup_association, only let
sctp_lookup_association use rcu_read_lock.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rosen, Rami [Mon, 15 Feb 2016 00:39:43 +0000 (02:39 +0200)]
core: remove unneded headers for net cgroup controllers.
commit 3ed80a6 (cgroup: drop module support) made including
module.h redundant in the net cgroup controllers,
netclassid_cgroup.c and netprio_cgroup.c. This patch
removes them.
Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 14 Feb 2016 19:56:33 +0000 (22:56 +0300)]
sh_eth: kill useless *switch* defaults
The driver often has the *default* cases doing nothing in the *switch*
statements with the integer expressions -- remove them.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 14 Feb 2016 19:56:03 +0000 (22:56 +0300)]
ravb: kill useless *switch* defaults
The driver has the *default* case doing nothing in the *switch* statement
with an integer expression -- remove it.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Feb 2016 20:23:53 +0000 (15:23 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-02-17
This series contains updates to i40e/i40evf only.
Jesse cleans up a duplicate declaration in probe. Then fixes a bug where the
driver was using a offset based off a DMA handle while mapping and unmapping
using sync_single_range_for_[cpu|device], when it should be using DMA handle
(returned from alloc_coherent) and the offset of the memory to be sync'd.
Fixed an issue where sync_vsi_filter() was allocating memory in a way that
could sleep (GFP_KERNEL) which was causing a problem when called by the
team driver under rcu_read_lock().
Shannon adds a check to ensure we do not attempt to do TSO when
skb->ip_summed is not set to CHECKSUM_PARTIAL.
Kiran add functions related to port mirroring features such as add/delete
mirror rule.
Jacob assigns the i40e_pf structure directly instead of using a large
memcpy, to avoid a sparse warning and lets the compiler optimize the copy
since it know the size of the structure in advance.
Anjali enables GENEVE capability for XL710/X710 devices with firmware API
version higher than 1.4. Added flag for automatic rule eviction feature
for X722, which is disabled by default.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following updates and fixes are included in this driver update series:
- Disable VLAN filtering in promiscuous mode
- Change from using napi_complete to napi_complete_done
- Use __napi_schedule_irqoff when running in interrupt context
- Verify ethtool speed setting is valid for the selected speedset
- Enable PFC based on the pfc_en setting
- Fix the mapping of priorities to traffic classes
- Do traffic class setup when DCB nl callbacks are invoked
- Check Rx queue fifos before stopping Rx queue DMA
- Switch from disable_irq to masking interrupts for auto-negotiation
This patch series is based on net-next.
Changes from v1:
- Removed #ifndef and #define of CRCPOLY_LE as part of the patch to
disable VLAN filtering in promiscuous mode
- Reworked changes to xgbe_setup_tc to resolve conflicts with commit 16e5cc647173 (net: rework setup_tc ndo op to consume general tc operand)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:49:28 +0000 (11:49 -0600)]
amd-xgbe: Mask auto-negotiation interrupts in ISR
Currently the auto-negotiation interrupt handling disables the irq
instead of masking off the interrupts. This was done because the phy
library was originally used to read and write the PCS registers, which
could not be performed in interrupt context. Now that the phy library is
no longer used to read and write the PCS registers the interrupts can be
masked off in the interrupt service routine eliminating the need to call
disable_irq/enable_irq. This also requires changing the protection mutex
to a spinlock.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:49:08 +0000 (11:49 -0600)]
amd-xgbe: Do traffic class setup when called through dcbnl
Currently the netdev traffic class setup is only performed when invoked
through the ndo_setup_tc interface. However, the same setup should be
performed when the dcbnl interface (ieee_setets) is invoked. Rework the
netdev traffic class setup to be invokable through either interface and
also provide the priority to traffic class mapping if available.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:48:57 +0000 (11:48 -0600)]
amd-xgbe: Fix the mapping of priorities to traffic classes
The driver is checking the pfc_en field of the ieee_pfc structure to
determine whether to associate a priority with a traffic class. This is
incorrect since the pfc_en field is for determining if PFC is enabled
for a traffic class.
The association of priority to traffic class does not depend on whether
the traffic class is enabled for PFC, so remove that check. Also, the
mapping of priorities to traffic classes should be done when configuring
the traffic classes and not the PFC support so move the priority to traffic
class association from xgbe_config_dcb_pfc to xgbe_config_dcb_tc.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:48:48 +0000 (11:48 -0600)]
amd-xgbe: Enable/disable PFC per traffic class
Currently the PFC flow control is enabled on all traffic classes if
one or more traffic classes request it. The PFC enable setting of the
traffic class should be used to determine whether to enable or disable
flow control for the traffic class.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:48:29 +0000 (11:48 -0600)]
amd-xgbe: Use __napi_schedule_irqoff
Change from calling __napi_schedule to __napi_schedule_irqoff when running
in interrupt context or when called by netpoll with interrupts already
disabled. The Tx timer function will continue to use __napi_schedule.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Wed, 17 Feb 2016 17:48:08 +0000 (11:48 -0600)]
amd-xgbe: Disable VLAN filtering when in promiscuous mode
When the hardware is placed in promiscuous mode it will still perform
VLAN filtering and therefore may not pass all packets to the driver.
Disable all VLAN filtering when entering promiscuous mode and restore
VLAN filtering upon exit from promiscuous mode. In order to avoid adding
forward declarations, move the VLAN related functions earlier in the
file.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Wed, 13 Jan 2016 03:32:31 +0000 (19:32 -0800)]
i40e: use eth_platform_get_mac_address()
This commit converts commit b499ffb0a22c ("i40e: Look up MAC address in
Open Firmware or IDPROM") to use eth_platform_get_mac_address()
added by commit c7f5d105495a ("net: Add eth_platform_get_mac_address()
helper.")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Anjali Singhai [Wed, 6 Jan 2016 19:49:28 +0000 (11:49 -0800)]
i40e: Enable Geneve offload for FW API ver > 1.4 for XL710/X710 devices
This patch makes sure we check the GENEVE offload capable flag before
we attempt offload.
It also enables the Capability for XL710/X710 devices with FW API
version higher than 1.4
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Mon, 4 Jan 2016 18:33:11 +0000 (10:33 -0800)]
i40e: avoid large memcpy by assigning struct
Assign the i40e_pf structure directly instead of using a large memcpy,
which avoids a sparse warning and lets the compiler optimize the copy
since it knows the size of the structure in advance.
Change-ID: I17604e23be2616521eb760290befcb767b52b3f7 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Driver already counted allocation errors, so print
them as part of the ethtool -S output. Useful for debugging
if your system is having trouble making memory available for
the driver.
Change-ID: I83839fa86e81e6d80f03b917c88dd3ef9a64dde0 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 4 Jan 2016 18:33:08 +0000 (10:33 -0800)]
i40e: negate PHY int mask bits
The PHY interrupt mask bits mask out the events we don't want,
so we need to negate the bitmask of events we want.
Change-ID: I273244da5a8d285b6abc84fd68a90f1e6fa0393e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kiran Patil [Mon, 4 Jan 2016 18:33:07 +0000 (10:33 -0800)]
i40e: APIs to Add/remove port mirroring rules
This patch implements necessary functions related to port
mirroring features such as add/delete mirror rule, function
to set promiscuous VLAN mode for VSI if mirror rule_type is
"VLAN Mirroring".
Change-ID: Iaf513fd5f188f99dcb977b48f99e73185dfddc40 Signed-off-by: Kiran Patil <kiran.patil@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver was being called by VLAN, bonding, teaming operations
that expected to be able to hold locks like rcu_read_lock().
This causes the driver to be held to the requirement to not sleep,
and was found by the kernel debug options for checking sleep
inside critical section, and the locking validator.
Change-ID: Ibc68c835f5ffa8ffe0638ffe910a66fc5649a7f7 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Nelson, Shannon <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The sync_vsi_filter function was allocating memory in such
a way that it could sleep (GFP_KERNEL) which was causing a problem
when called by the team driver under rcu_read_lock(), which cannot
be held while sleeping. Found with lockdep.
Change-ID: I4e59053cb5eedcf3d0ca151715be3dc42a94bdd5 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 4 Jan 2016 18:33:04 +0000 (10:33 -0800)]
i40e: do TSO only if CHECKSUM_PARTIAL is set
Don't bother trying to set up a TSO if the skb->ip_summed is not
set to CHECKSUM_PARTIAL.
Change-ID: I6495b3568e404907a2965b48cf3e2effa7c9ab55 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Driver was using an offset based off a DMA handle while mapping and
unmapping using sync_single_range_for[cpu|device], where it should
be using DMA handle (returned from alloc_coherent) and the offset of the
memory to be sync'd.
Change-ID: I208256565b1595ff0e9171ab852de06b997917c6 Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Nelson, Shannon <shannon.nelson@intel.com> Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 17 Feb 2016 14:48:50 +0000 (09:48 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-02-16
This series contains updates to i40e/i40evf only.
Shannon adds flags to MAC allocation requests to signify that the MAC VLAN
filters should come from the shared resource pool. Added a new "set switch
config" admin queue command and the new Cisco VXLAN-GPE cloud tunnel type
for the admin queue commands. Added more detail to the NVM update debug
message in order to see the full ethtool request data. Also added a few
more bits of netdev data into the debugfs output for dump VSI.
Pandi fixes the width of two datatypes which were being declared a different
size from what they are assigned.
Anjali fixes an issue where we were not doing write-back on interrupt
throttle for legacy case in x722.
Mitch adds a counter for ARQ overflows since sometimes an ever-growing
number indicates that something bad is happening. Also added 20G speed for
Tx bandwidth calculations.
Jesse refactors the DCB function based on a community suggestion to change
the multi-level if statement into a switch statement. Cleans up VF device
IDs in the PF, since it does not need to know them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Feb 2016 14:47:37 +0000 (09:47 -0500)]
Merge branch 'tc-hw-offload'
John Fastabend says:
====================
tc offload for cls_u32 on ixgbe
This extends the setup_tc framework so it can support more than just
the mqprio offload and push other classifiers and qdiscs into the
hardware. The series here targets the u32 classifier and ixgbe
driver. I worked out the u32 classifier because it is protocol
oblivious and aligns with multiple hardware devices I have access
to. I did an initial implementation on ixgbe because (a) I have one
in my box (b) its a stable driver and (c) it is relatively simple
compared to the other devices I have here but still has enough
flexibility to exercise the features of cls_u32.
I intentionally limited the scope of this series to the basic
feature set. Specifically this uses a 'big hammer' feature bit
to do the offload or not. If the bit is set you get offloaded rules
if it is not then rules will not be offloaded. If we can agree on
this patch series there are some more patches on my queue we can
talk about to make the offload decision per rule using flags similar
to how we do l2 mac updates. Additionally the error strategy can
be improved to be hard aborting, log and continue, etc. I think
these are nice to have improvements but shouldn't block this series.
Also by adding get_parse_graph and set_parse_graph attributes as
in my previous flow_api work we can build programmable devices
and programmatically learn when rules can or can not be loaded
into the hardware. Again future work.
... v3 includes a couple style fixups suggested by Jiri and a
quick fix to ixgbe to check features flag and not dev_features
flag the latter being always set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:19:19 +0000 (21:19 -0800)]
net: ixgbe: abort with cls u32 divisor groups greater than 1
This patch ensures ixgbe will not try to offload hash tables from the
u32 module. The device class does not currently support this so until
it is enabled just abort on these tables.
Interestingly the more flexible your hardware is the less code you
need to implement to guard against these cases.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:18:53 +0000 (21:18 -0800)]
net: ixgbe: add support for tc_u32 offload
This adds initial support for offloading the u32 tc classifier. This
initial implementation only implements a few base matches and actions
to illustrate the use of the infrastructure patches.
However it is an interesting subset because it handles the u32 next
hdr logic to correctly map tcp packets from ip headers using the ihl
and protocol fields. After this is accepted we can extend the match
and action fields easily by updating the model header file.
Also only the drop action is supported initially.
Here is a short test script,
#tc qdisc add dev eth4 ingress
#tc filter add dev eth4 parent ffff: protocol ip \
u32 ht 800: order 1 \
match ip dst 15.0.0.1/32 match ip src 15.0.0.2/32 action drop
<-- hardware has dst/src ip match rule installed -->
#tc filter del dev eth4 parent ffff: prio 49152
#tc filter add dev eth4 parent ffff: protocol ip prio 99 \
handle 1: u32 divisor 1
#tc filter add dev eth4 protocol ip parent ffff: prio 99 \
u32 ht 800: order 1 link 1: \
offset at 0 mask 0f00 shift 6 plus 0 eat match ip protocol 6 ff
#tc filter add dev eth4 parent ffff: protocol ip \
u32 ht 1: order 3 match tcp src 23 ffff action drop
<-- hardware has tcp src port rule installed -->
#tc qdisc del dev eth4 parent ffff:
<-- hardware cleaned up -->
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:18:28 +0000 (21:18 -0800)]
net: ixgbe: add minimal parser details for ixgbe
This adds an ixgbe data structure that is used to determine what
headers:fields can be matched and in what order they are supported.
For hardware devices this can be a bit tricky because typically
only pre-programmed (firmware, ucode, rtl) parse graphs will be
supported and we don't yet have an interface to change these from
the OS. So its sort of a you get whatever your friendly vendor
provides affair at the moment.
In the future we can add the get routines and set routines to
update this data structure. One interesting thing to note here
is the data structure here identifies ethernet, ip, and tcp
fields without having to hardcode them as enumerations or use
other identifiers.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:18:03 +0000 (21:18 -0800)]
net: tc: helper functions to query action types
This is a helper function drivers can use to learn if the
action type is a drop action.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:17:37 +0000 (21:17 -0800)]
net: add tc offload feature flag
Its useful to turn off the qdisc offload feature at a per device
level. This gives us a big hammer to enable/disable offloading.
More fine grained control (i.e. per rule) may be supported later.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:17:09 +0000 (21:17 -0800)]
net: sched: add cls_u32 offload hooks for netdevs
This patch allows netdev drivers to consume cls_u32 offloads via
the ndo_setup_tc ndo op.
This works aligns with how network drivers have been doing qdisc
offloads for mqprio.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:16:43 +0000 (21:16 -0800)]
net: rework setup_tc ndo op to consume general tc operand
This patch updates setup_tc so we can pass additional parameters into
the ndo op in a generic way. To do this we provide structured union
and type flag.
This lets each classifier and qdisc provide its own set of attributes
without having to add new ndo ops or grow the signature of the
callback.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Fastabend [Wed, 17 Feb 2016 05:16:15 +0000 (21:16 -0800)]
net: rework ndo tc op to consume additional qdisc handle parameter
The ndo_setup_tc() op was added to support drivers offloading tx
qdiscs however only support for mqprio was ever added. So we
only ever added support for passing the number of traffic classes
to the driver.
This patch generalizes the ndo_setup_tc op so that a handle can
be provided to indicate if the offload is for ingress or egress
or potentially even child qdiscs.
CC: Murali Karicheri <m-karicheri2@ti.com> CC: Shradha Shah <sshah@solarflare.com> CC: Or Gerlitz <ogerlitz@mellanox.com> CC: Ariel Elior <ariel.elior@qlogic.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Bruce Allan <bruce.w.allan@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 23 Dec 2015 20:05:53 +0000 (12:05 -0800)]
i40e: add netdev info to VSI dump
Add a few more bits of netdev data into the debugfs output for dump VSI.
For now, we'll add the features, hw_features, vlan_features, and flags
bitflags and the state. More could be added later if needed.
Also, tweak a couple nearby output lines for output readability.
Change-ID: I9fb5a9da75c9ad7679498ce9ac3ba24d065ddd2e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Reviewed-by: Brandeburg, Jesse <jesse.brandeburg@intel.com> Reviewed-by: Wyborny, Carolyn <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 23 Dec 2015 20:05:52 +0000 (12:05 -0800)]
i40evf: enable bus master after reset
If the VF is reset via VFLR, the device will be knocked out of bus
master mode, and the driver will fail to recover from the reset. Fix
this by enabling bus mastering after every reset. In a non-VFLR case,
the bus master bit will not be disabled, and this call will have no effect.
Change-ID: Id515859ac7a691db478222228add6d149e96801a Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 23 Dec 2015 20:05:51 +0000 (12:05 -0800)]
i40e: add a little more to an NVM update debug message
Add a little more detail to an NVM update debug message in order to
see the full ethtool request data.
Change-ID: Iab10437cb32d6fddc67ee347e7c0b42511e152cd Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 23 Dec 2015 20:05:49 +0000 (12:05 -0800)]
i40e: add 20G speed for Tx bandwidth calculations
When calculating TX bandwidth for VFs, we need to know the link speed to
make sure we don't allocate more bandwidth than is available. Add 20G
link speed to the switch statement so we can support devices that link
at that speed.
Change-ID: I5409f6139d549e5832777db9c22ca0664e0c5f8b Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Wed, 23 Dec 2015 20:05:48 +0000 (12:05 -0800)]
i40e: add counter for arq overflows
Sometimes, ARQ overflows are a big deal and tell us that the
firmware/hardware/driver/something is having problems. But normally
they're no big deal. To assist in assessing this, add a counter to
our Ethtool stats. A handful of ARQ overflows during VF init is no
problem. A large, ever-growing number indicates that Something Bad is
happening.
Change-ID: Ie5348bfbc8a54a890559cb00279c28d976a55096 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: fix write-back-on-itr to work with legacy itr
We were not doing write-back on interrupt throttle for Legacy case in X722.
This patch fixes that, so we do WB_ON_ITR for Legacy as well. Plus the issue
that we should still be setting NO_ITR if we are touching the DYN_CTLN register
since we do not want to change ITR setting here.
Change-ID: I5db8491ee1544118a389db839cecc93e1bbc480e Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pandi Maharajan [Wed, 23 Dec 2015 20:05:46 +0000 (12:05 -0800)]
i40e: Store lan_vsi_idx and lan_vsi_id in the right size
lan_vsi_idx and lan_vsi_id are assigned to u16 data sized variables but
declared in u8. This patch fixes the width of the datatype.
Change-ID: If4bcbcc7d32f2b287c51cb33d17879691258dce2 Signed-off-by: Pandi Kumar Maharajan <pandi.maharajan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 23 Dec 2015 20:05:45 +0000 (12:05 -0800)]
i40e: Bump AQ minor version to 1.5 for new FW features
Bump AQ minor version to 1.5 for new FW features.
Change-ID: I5a790f7f519a2a8921aaa1c5663727dd1897ffec Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 23 Dec 2015 20:05:44 +0000 (12:05 -0800)]
i40e: AQ thermal sensor control struct
Add the new AQ command and struct for managing a thermal sensor.
Change-ID: I6f5631839a0f3dca352a6c222f1269a960e2310a Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 17 Feb 2016 01:42:55 +0000 (20:42 -0500)]
Merge branch 'ip-sysctl-namespaceify'
Nikolay Borisov says:
====================
Namespacify various ip sysctl knobs
This series continues namespacifying more net related knobs.
The focus here is on ip options. Patches 1,3,4,5 namespacify
the respective sysctl knobs. Patch 2 moves some igmp code to the
correct file (and function) and also adds some #ifdef guards to
silence compilation warnings.
Finally, patch 5 exposes the ip fragmentation related sysctls
since all of the knobs are namespaced.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Borisov [Mon, 15 Feb 2016 10:11:28 +0000 (12:11 +0200)]
igmp: net: Move igmp namespace init to correct file
When igmp related sysctl were namespacified their initializatin was
erroneously put into the tcp socket namespace constructor. This
patch moves the relevant code into the igmp namespace constructor to
keep things consistent.
Also sprinkle some #ifdefs to silence warnings
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Feb 2016 01:38:29 +0000 (20:38 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-02-12' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
wl12xx
* add device tree support for SPI
mwifiex
* add debugfs file to read chip information
* add MSIx support for newer pcie chipsets (8997 onwards)
* add schedule scan support
* add WoWLAN net-detect support
* firmware dump support for w8997 chipset
iwlwifi
* continue the work on multiple Rx queues
* add support for beacon storing used in low power states
* use the regular firmware image of WoWLAN
* fix 8000 devices for Big Endian machines
* more firmware debug hooks
* add support for P2P Client snoozing
* make the beacon filtering for AP mode configurable
* fix transmit queues overflow with LSO
libertas
* add support for setting power save via cfg80211
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 12 Feb 2016 10:42:34 +0000 (11:42 +0100)]
net: phy: spi_ks8995: include linux/gpio/consumer.h
The ks8995 phy driver just started using gpiod_* functions, which are
declared in linux/gpio/consumer.h, not linux/gpio.h, resulting in a
build failure in randconfig builds that do not have CONFIG_GPIOLIB
enabled:
drivers/net/phy/spi_ks8995.c: In function 'ks8995_probe':
drivers/net/phy/spi_ks8995.c:477:3: error: implicit declaration of function 'gpiod_set_value' [-Werror=implicit-function-declaration]
This changes the header inclusion so it builds in all configurations.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: cd6f288cbaab ("net: phy: spi_ks8995: add support for resetting switch using GPIO") Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 23 Dec 2015 20:05:43 +0000 (12:05 -0800)]
i40e: AQ Add VXLAN-GPE tunnel type
Add the new Cisco VXLAN-GPE cloud tunnel type for the Add Cloud Filter
and UDP tunnel AQ commands.
Change-ID: I2c093c7d79726c7fca08a36e5c63581a905da3d2 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 17 Feb 2016 01:21:49 +0000 (20:21 -0500)]
Merge branch 'unified-tunnel-dst-caching'
Paolo Abeni says:
====================
net: unify dst caching for tunnel devices
This patch series try to unify the dst cache implementations currently
present in the kernel, namely in ip_tunnel.c and ip6_tunnel.c, introducing a
new generic implementation, replacing the existing ones, and then using
the new implementation in other tunnel devices which currently lack it.
The new dst implementation is compiled, as built-in, only if any device using
it is enabled.
Caching the dst for the tunnel remote address gives small, but measurable,
performance improvement when tunneling over ipv4 (in the 2%-4% range) and
significant ones when tunneling over ipv6 (roughly 60% when no
fragmentation/segmentation take place and the tunnel local address
is not specified).
v2:
- move the vxlan dst_cache usage inside the device lookup functions
- fix usage after free for lwt tunnel moving the dst cache storage inside
the dst_metadata,
- sparse codying style cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:59 +0000 (15:43 +0100)]
net/ipv4: add dst cache support for gre lwtunnels
In case of UDP traffic with datagram length below MTU this
gives about 4% performance increase
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:58 +0000 (15:43 +0100)]
geneve: add dst caching support
use generic dst implementation for both plain geneve devices and
lwtunnels.
In case of UDP traffic with datagram length below MTU this give
about 2% performance increase for plain geneve tunnel over ipv4,
about 65% performance increase for ipv6 tunnel.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:57 +0000 (15:43 +0100)]
net: add dst_cache to ovs vxlan lwtunnel
In case of UDP traffic with datagram length
below MTU this give about 2% performance increase
when tunneling over ipv4 and about 60% when tunneling
over ipv6
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:56 +0000 (15:43 +0100)]
net: use dst_cache for vxlan device
In case of UDP traffic with datagram length
below MTU this give about 3% performance increase
when tunneling over ipv4 and about 70% when
tunneling over ipv6.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:55 +0000 (15:43 +0100)]
ip_tunnel: replace dst_cache with generic implementation
The current ip_tunnel cache implementation is prone to a race
that will cause the wrong dst to be cached on cuncurrent dst cache
miss and ip tunnel update via netlink.
Replacing with the generic implementation fix the issue.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:54 +0000 (15:43 +0100)]
net: replace dst_cache ip6_tunnel implementation with the generic one
This also fix a potential race into the existing tunnel code, which
could lead to the wrong dst to be permanenty cached:
CPU1: CPU2:
<xmit on ip6_tunnel>
<cache lookup fails>
dst = ip6_route_output(...)
<tunnel params are changed via nl>
dst_cache_reset() // no effect,
// the cache is empty
dst_cache_set() // the wrong dst
// is permanenty stored
// into the cache
With the new dst implementation the above race is not possible
since the first cache lookup after dst_cache_reset will fail due
to the timestamp check
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 12 Feb 2016 14:43:53 +0000 (15:43 +0100)]
net: add dst_cache support
This patch add a generic, lockless dst cache implementation.
The need for lock is avoided updating the dst cache fields
only in per cpu scope, and requiring that the cache manipulation
functions are invoked with the local bh disabled.
The refresh_ts and reset_ts fields are used to ensure the cache
consistency in case of cuncurrent cache update (dst_cache_set*) and
reset operation (dst_cache_reset).
Consider the following scenario:
CPU1: CPU2:
<cache lookup with emtpy cache: it fails>
<get dst via uncached route lookup>
<related configuration changes>
dst_cache_reset()
dst_cache_set()
The dst entry set passed to dst_cache_set() should not be used
for later dst cache lookup, because it's obtained using old
configuration values.
Since the refresh_ts is updated only on dst_cache lookup, the
cached value in the above scenario will be discarded on the next
lookup.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Suggested-and-acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Feb 2016 01:12:17 +0000 (20:12 -0500)]
Merge branch 'bnx2x-next'
Yuval Mintz says:
====================
bnx2x: driver updates
This series contains several changes - the biggest change is the
addition of Geneve NDO support [allows device to perform RSS according
to inner-headers of encapsulated packet, similar to what it does for
vxlan]. It also extends dcbx support, as well as introducing some minor
changes.
Dave,
Please consider applying this series to `net-next'.
[Do notice patch #3 fails checkpatch due to consistency with existing
HSI]
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 16 Feb 2016 16:08:01 +0000 (18:08 +0200)]
bnx2x: Warn about grc timeouts in register dump
There are several scenarios where taking a register dump from a device
might log benign GRC timeout attentions to system logs.
Most common of those is when taking the dump from a 2-port device.
Sadly, there's no easy way to mask the problematic attentions during the
flow - Changing this behvaior would require a firmware update.
For now, simply warn users to ignore the warnings.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 16 Feb 2016 16:08:00 +0000 (18:08 +0200)]
bnx2x: extend DCBx support
This adds support for default application priority.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 16 Feb 2016 16:07:59 +0000 (18:07 +0200)]
bnx2x: Add support for single-port DCBx
Driver is currently looking at shared information for determining whether
DCBx can be supported for a given port.
On 4-port devices, up-to-date management firmware can support DCBx on
each port of a given engine independently - but that would cause bnx2x to
misinterpert the support and assume DCBx is supported on both.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 16 Feb 2016 16:07:58 +0000 (18:07 +0200)]
bnx2x: Add Geneve inner-RSS support
This adds the ability to perform RSS hashing based on encapsulated
headers for a geneve-encapsulated packet.
This also changes the Vxlan implementation in bnx2x to be uniform
for both vxlan and geneve [from configuration perspective].
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Tue, 16 Feb 2016 16:07:57 +0000 (18:07 +0200)]
bnx2x: Remove unneccessary EXPORT_SYMBOL
bnx2x_schedule_sp_rtnl is exported by bnx2x, although no other module
uses it.
Reported-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Wed, 23 Dec 2015 20:05:42 +0000 (12:05 -0800)]
i40e: AQ Add set_switch_config
Add the new Set Switch Config AdminQ command, and mark the L2 Filter
bit as deprecated in the Add VEB command.
Change-ID: I5b24790f14c56f0ddf3f70df1e486844146b039f Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Wed, 23 Dec 2015 20:05:41 +0000 (12:05 -0800)]
i40e: AQ Shared resource flags
Add flags to MAC allocation requests to signify that the MAC VLAN filters
should come from the shared resource pool rather than the dedicated PF
resource pools.
Change-ID: I4c2da64c01856edcb0982bc4aab75c5a91047a7a Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Acked-by: Kevin Scott <kevin.c.scott@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Richard Alpe [Thu, 11 Feb 2016 09:43:15 +0000 (10:43 +0100)]
tipc: refactor node xmit and fix memory leaks
Refactor tipc_node_xmit() to fail fast and fail early. Fix several
potential memory leaks in unexpected error paths.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change has been made with the goal that kernel functions should
return something more descriptive than -1 on failure.
A variable `err` has been introduced for storing error codes.
The return value of kzalloc on failure should return a -1 and not a
-ENOMEM. This was found using Coccinelle. A simplified version of
the semantic patch used is:
//<smpl>
@@
expression *e;
identifier l1;
@@
e = kzalloc(...);
if (e == NULL) {
...
goto l1;
}
l1:
...
return -1
+ -ENOMEM
;
//</smpl
Furthermore, set `err` to -ENOMEM on failure of alloc_netdev(), and to
-ENODEV on failure of register_netdev() and probe_irq_off().
The single call site only checks that the return value is not 0,
hence no change is required at the call site.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Feb 2016 20:30:29 +0000 (15:30 -0500)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2016-02-15
This series contains updates to igb only.
Shota Suzuki cleans up unnecessary flag setting for 82576 in
igb_set_flag_queue_pairs() since the default block already sets
IGB_FLAG_QUEUE_PAIRS to the correct value anyways, so the e1000_82576
code block is not necessary and we can simply fall through. Then fixes
an issue where IGB_FLAG_QUEUE_PAIRS can now be set by using "ethtool -L"
option but is never cleared unless the driver is reloaded, so clear the
queue pairing if the pairing becomes unnecessary as a result of "ethtool
-L".
Mitch fixes the igbvf from giving up if it fails to get the hardware
mailbox lock. This can happen when the PF-VF communication channel is
heavily loaded and causes complete communications failure between the
PF and VF drivers, so add a counter and a delay so that the driver will
now retry ten times before giving up on getting the mailbox lock.
The remaining patches in the series are from Alex Duyck, starting with the
cleaning up code that sets the MAC address. Then refactors the VFTA and
VLVF configuration, to simplify and update to similar setups in the ixgbe
driver. Fixed an issue were VLANs headers size was being added to the
value programmed into the RLPML registers, yet these registers already
take into account the size of the VLAN headers when determining the
maximum packet length, so we can drop the code that adds the size to
the RLPML registers. Cleaned up the configuration of the VF port based
VLAN configuration. Also fixed the igb driver so that we can fully
support SR-IOV or the recently added NTUPLE filtering while allowing
support for VLAN promiscuous mode. Also added the ability to use the
bridge utility to add a FDB entry for the PF to an igb port.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 16 Feb 2016 20:19:50 +0000 (15:19 -0500)]
Merge branch 'ethtool-channels-rxfh-conflict'
Jacob Keller says:
====================
ethtool: correct {GS}CHANNELS and {GS}RXFH conflict
This patch series fixes up ethtool_set_channels operation which
allowed modifying the RXFH table indirectly by reducing the number of
queues below the current max queue used by the Rx flow table. Most
drivers incorrectly allowed this to destroy the Rx flow table and
would then start by reinitializing it to default settings. However,
drivers are not able to correctly handle the conflict since there was
no way to differentiate between the default settings and the user
requested explicit settings.
To fix this, implement a new netdev private flag which we use to
indicate whether the RXFH has been user configured. If someone has
a better alternative of how to store this information, let me know.
I am not sure that priv_flags is the best solution but I have not had
any better idea.
Secondly, we add a function which just calls the driver's get_rxfh
callback to determine the current indirection table. Loop through this
and we can determine the current highest queue that will be used by
RSS.
Now, modify ethtool_set_channels to add a check ensuring that if (a)
we have had rxfh configured by user, (b) we can get the maximum RSS
queue currently used, then we ensure that the newly requested Rx count
(or combined count) is at least as high as this maximum RSS queue. The
reasoning here is that we can always safely increase the number of
queues. If we decrease the queues we must ensure that the decrease
does not go lower than the highest in-use queue for the Rx flow table.
Drivers may still need to be patched if they currently overwrite the
Rx flow table during channel configuration. If the driver currently
always resets Rx flow table when increasing number of queues it must
be patched to only do this when netif_is_rxfh_configured returns
false.
The second patch simply adds a check to ensure that all provided
channel counts fit within driver defined maximums.
The third patch fixes fm10k to correctly reconfigure the RSS reta
table whenever it is still unconfigured. This means that the default
state will provide RSS to every queue. Once the user has configured
RXFH, then we should maintain it. In addition, since the case where we
must reconfigure the RSS table in this case should now no longer
occur, add a dev_err message to indicate the user that we did so.
I have also supplied an ethtool patch to enable setting the default Rx
flow indirection table. Without this, current ethtool does not support
sending an indir_size of 0, and thus does not correctly support
configuring back to the default.
Changes in v2:
* fixed compile error
* fixed incorrect comparison with max_rx_in_use
* adjusted looping over dev_size
* removed inline on function
* dropped patch about separating combined vs asymmetric channels
* verified behavior using fm10k driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Keller, Jacob E [Tue, 9 Feb 2016 00:05:05 +0000 (16:05 -0800)]
fm10k: don't reinitialize RSS flow table when RXFH configured
Also print an error message incase we do have to reconfigure as this
should no longer happen anymore due to ethtool changes. If it somehow
does occur, user should be made aware of it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Keller, Jacob E [Tue, 9 Feb 2016 00:05:03 +0000 (16:05 -0800)]
ethtool: correctly ensure {GS}CHANNELS doesn't conflict with GS{RXFH}
Ethernet drivers implementing both {GS}RXFH and {GS}CHANNELS ethtool ops
incorrectly allow SCHANNELS when it would conflict with the settings
from SRXFH. This occurs because it is not possible for drivers to
understand whether their Rx flow indirection table has been configured
or is in the default state. In addition, drivers currently behave in
various ways when increasing the number of Rx channels.
Some drivers will always destroy the Rx flow indirection table when this
occurs, whether it has been set by the user or not. Other drivers will
attempt to preserve the table even if the user has never modified it
from the default driver settings. Neither of these situation is
desirable because it leads to unexpected behavior or loss of user
configuration.
The correct behavior is to simply return -EINVAL when SCHANNELS would
conflict with the current Rx flow table settings. However, it should
only do so if the current settings were modified by the user. If we
required that the new settings never conflict with the current (default)
Rx flow settings, we would force users to first reduce their Rx flow
settings and then reduce the number of Rx channels.
This patch proposes a solution implemented in net/core/ethtool.c which
ensures that all drivers behave correctly. It checks whether the RXFH
table has been configured to non-default settings, and stores this
information in a private netdev flag. When the number of channels is
requested to change, it first ensures that the current Rx flow table is
not going to assign flows to now disabled channels.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>