Proportional Integral controller Enhanced (PIE) is a scheduler to address the
bufferbloat problem.
>From the IETF draft below:
" Bufferbloat is a phenomenon where excess buffers in the network cause high
latency and jitter. As more and more interactive applications (e.g. voice over
IP, real time video streaming and financial transactions) run in the Internet,
high latency and jitter degrade application performance. There is a pressing
need to design intelligent queue management schemes that can control latency and
jitter; and hence provide desirable quality of service to users.
We present here a lightweight design, PIE(Proportional Integral controller
Enhanced) that can effectively control the average queueing latency to a target
value. Simulation results, theoretical analysis and Linux testbed results have
shown that PIE can ensure low latency and achieve high link utilization under
various congestion situations. The design does not require per-packet
timestamp, so it incurs very small overhead and is simple enough to implement
in both hardware and software. "
Many thanks to Dave Taht for extensive feedback, reviews, testing and
suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and
suggestions. Naeem Khademi and Dave Taht independently contributed to ECN
support.
For more information, please see technical paper about PIE in the IEEE
Conference on High Performance Switching and Routing 2013. A copy of the paper
can be found at ftp://ftpeng.cisco.com/pie/.
Please also refer to the IETF draft submission at
http://tools.ietf.org/html/draft-pan-tsvwg-pie-00
All relevant code, documents and test scripts and results can be found at
ftp://ftpeng.cisco.com/pie/.
For problems with the iproute2/tc or Linux kernel code, please contact Vijay
Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu
(mysuryan@cisco.com)
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:36:06 +0000 (13:36 -0500)]
netfilter: Fix build failure in nfnetlink_queue_core.c.
net/netfilter/nfnetlink_queue_core.c: In function 'nfqnl_put_sk_uidgid':
net/netfilter/nfnetlink_queue_core.c:304:35: error: 'TCP_TIME_WAIT' undeclared (first use in this function)
net/netfilter/nfnetlink_queue_core.c:304:35: note: each undeclared identifier is reported only once for each function it appears in
make[3]: *** [net/netfilter/nfnetlink_queue_core.o] Error 1
Just a missing include of net/tcp_states.h
Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:29:30 +0000 (13:29 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables
Pablo Neira Ayuso says: <pablo@netfilter.org>
====================
nftables updates for net-next
The following patchset contains nftables updates for your net-next tree,
they are:
* Add set operation to the meta expression by means of the select_ops()
infrastructure, this allows us to set the packet mark among other things.
From Arturo Borrero Gonzalez.
* Fix wrong format in sscanf in nf_tables_set_alloc_name(), from Daniel
Borkmann.
* Add new queue expression to nf_tables. These comes with two previous patches
to prepare this new feature, one to add mask in nf_tables_core to
evaluate the queue verdict appropriately and another to refactor common
code with xt_NFQUEUE, from Eric Leblond.
* Do not hide nftables from Kconfig if nfnetlink is not enabled, also from
Eric Leblond.
* Add the reject expression to nf_tables, this adds the missing TCP RST
support. It comes with an initial patch to refactor common code with
xt_NFQUEUE, again from Eric Leblond.
* Remove an unused variable assignment in nf_tables_dump_set(), from Michal
Nazarewicz.
* Remove the nft_meta_target code, now that Arturo added the set operation
to the meta expression, from me.
* Add help information for nf_tables to Kconfig, also from me.
* Allow to dump all sets by specifying NFPROTO_UNSPEC, similar feature is
available to other nf_tables objects, requested by Arturo, from me.
* Expose the table usage counter, so we can know how many chains are using
this table without dumping the list of chains, from Tomasz Bursztyka.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 18:25:58 +0000 (13:25 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e only.
Majority of this series contains patches from Greg and Mitch to fix
up or add functionality to the PF/VF driver interactions. Notably,
a fix for SR-IOV VF port VLAN which resolved the problem of port VLAN
configurations not being persistent across VF driver loads and unloads
and enable/disable of the feature. Also do not enable the default port
on the VEB, which is designed only to bridge the PF to an Open vSwitch
or bridge. Another fix to resolve a possible memory corruption
condition where ARQ messages are written to random memory locations.
Fix a problem where the 'ip link show' command would display stale
link address information after the link address was set via the 'ip
link set' command.
Anjali provides several patches, one which saves information that can
be used while cleaning the Tx ring and useful in detecting Tx hangs.
Then provides a fixes to the admin queue shutdown function to ensure
we are shutting down the queue in the shutdown path and ensure ASQ is
alive before issuing the admin queue command.
Shannon provides a fix for get/update vsi params where the incorrect
struct was being used.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Always call the AQ call to shutdown the queue in the shutdown path.
Check ASQ is alive before issuing the AQ command since we might be
resetting to recover from a bad state in which case we should not
issue the AQ command.
Use the register variable for length so it can be used by PF, VF
and GL AQ commands.
Greg Rose [Thu, 28 Nov 2013 06:39:44 +0000 (06:39 +0000)]
i40e: Hide the Port VLAN VLAN ID
The VF VSI Port VLAN settings still allow the user to view VLAN tag in
the descriptor. Fix the settings to hide the VLAN ID from the VF. The
VF is not supposed to be aware it is on a VLAN in the Port VLAN
scenario.
Change-Id: I976f2bacb455dbb750f8c53a781c689f02cb8907 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 28 Nov 2013 06:39:43 +0000 (06:39 +0000)]
i40e: use correct struct for get and update vsi params
The get_vsi_params and update_vsi_params functions were using a
different command struct that just happened to have an seid element in
the right place and so worked correctly anyway. This patch fixes the
functions to use the right data struct.
There is no actual logic change.
Change-Id: I513b5e1dc293dfd5b2ba4fa443cbdbfa608d9d19 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:39:42 +0000 (06:39 +0000)]
i40e: Fix VF driver MAC address configuration
Fix a problem where the 'ip link show' command would display stale
link address information after the link address was set via the 'ip
link set' command. In addition, fix problem with the user being
allowed to overwrite the administratively set VF MAC address.
Change-Id: I669ed14e55f2b633ef7b456b713632b08468671c Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:41 +0000 (06:39 +0000)]
i40e: support VFs on PFs other than 0
When communicating with VF devices over the AQ, the FW refers to the
VF by its global VF ID, not local the VF ID with reference to its
parent PF. Since the global and local VF IDs are identical for PF 0,
the code worked correctly on PF 0.
However, we cannot just use global IDs throughout the code as most of
the other references to the VF (VSI setup, register offsets, etc.)
require the local VF ID. Instead, we just add or subtract our base VF
ID when sending and receiving AQ messages.
Change-Id: I92f4332b4876bc68b2f9af9ebf48761f63b6bd97 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:40 +0000 (06:39 +0000)]
i40e: acknowledge VFLR when disabling SR-IOV
When SR-IOV is disabled, the (now nonexistent) virtual function
devices undergo a VFLR event. We don't need to handle this event
because the VFs are gone, but we do need to tell the HW that they are
complete. This fixes an issue with a phantom VFLR and broken VFs when
SR-IOV is re-enabled.
Change-Id: I7580b49ded0158172a85b14661ec212af77000c8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:39 +0000 (06:39 +0000)]
i40e: don't allocate zero size
Shockingly, the compiler didn't flag this uninitialized variable. This
fixes a potential memory corruption condition where ARQ messages are
written to random memory locations.
Change-Id: Iac82f4562d2bf3f42df3f3b2163d9cbed2160135 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:39:37 +0000 (06:39 +0000)]
i40e: Do not enable default port on the VEB
Enabling the default port on the VEB causes all outgoing traffic from
virtual functions to be copied to the physical function. The default
port is only supposed to be used if you wish to bridge the physical
function to a SW switch such as Open vSwitch or the Linux bridge. That
allows the SW switch to route traffic to VMs that are not using a
virtual function.
Eventually we'll want to implement the ndo_fdb_add, ndo_fdb_del, and
ndo_fdb_dump functions. The ndo_fdb_add function would set the
default port on the VEB in those cases where the MAC/VLAN address
filters have overflowed. Normally we would not want to use it.
Change-Id: I3990f0384fff2840c4e43bc0955dd0b701380852 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Thu, 28 Nov 2013 06:39:34 +0000 (06:39 +0000)]
i40e: Fix SR-IOV VF port VLAN
This patch fixes two different problems.
1) The port VLAN configuration was not persistent across VF driver
loads and unloads.
2) The port VLAN configuration was only correct the first time it was
set. Switching the port VLAN on and off would cause subsequent VLAN
configurations to be corrupted in the VSI. Ensure that the correct
bits are being set for the VSI port VLAN configuration.
Change-Id: I7ebf5329f77eb8d73ccd3324eb346b3abeea737d Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eyal Perry [Sun, 5 Jan 2014 15:41:08 +0000 (17:41 +0200)]
net/mlx4_core: Warn if device doesn't have enough PCI bandwidth
Check if the device get enough bandwidth from the entire PCI chain to satisfy
its capabilities. This patch determines the PCIe device's bandwidth capabilities
by reading its PCIe Link Capabilities registers and then call the
pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
which is capable of providing the necessary bandwidth capabilities.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 6 Jan 2014 01:31:01 +0000 (20:31 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e only.
Anjali provides two cleanups to remove unnecessary code and a fix
to resolve debugfs dumping only half the NVM. Then provides a fix
to ethtool NVM reads where shadow RAM was used instead of actual
NVM reads.
Jesse provides a couple of fixes, one removes custom i40e functions
which duplicate existing kernel functionality. Second fixes constant
cast issues by replacing __constant_htons with htons.
Mitch provides a couple of fixes for the VF interfaces in i40e. First
provides a fix to guard against VF message races with can cause a panic.
Second fix reinitializes the buffer size each time we clean the ARQ,
because subsequent messages can be truncated. Lastly adds functionality
to enable/disable ICR 0 dynamically.
Vasu adds a simple guard against multiple includes of the i40e_txrx.h
file.
Shannon provides a couple of fixes, first fix swaps a couple of lines
around in the error handling if the allocation for the VSI array fails.
Second fixes an issue where we try to free the q_vector that has not
been setup which can panic the kernel.
David provides a patch to save off the point to memory and the length
of 2 structs used in the admin queue in order to store all info about
allocated kernel memory.
Neerav fixes ring allocation where allocation and clearing of rings
for a VSI should be using the alloc_queue_pairs and not num_queue_pairs.
Then removes the unused define for multi-queue enabled.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
8021q: make vlan_pcpu_stats visible without CONFIG_VLAN_8021Q
macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise a very long compiler error happens.
Fixes: cdf3e274cf1b36 ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats") Cc: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-By: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 6 Jan 2014 00:20:11 +0000 (01:20 +0100)]
net: netdev_kobject_init: annotate with __init
netdev_kobject_init() is only being called from __init context,
that is, net_dev_init(), so annotate it with __init as well, thus
the kernel can take this as a hint that the function is used only
during the initialization phase and free up used memory resources
after its invocation.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for your net-next tree,
they are:
* Add full port randomization support. Some crazy researchers found a way
to reconstruct the secure ephemeral ports that are allocated in random mode
by sending off-path bursts of UDP packets to overrun the socket buffer of
the DNS resolver to trigger retransmissions, then if the timing for the
DNS resolution done by a client is larger than usual, then they conclude
that the port that received the burst of UDP packets is the one that was
opened. It seems a bit aggressive method to me but it seems to work for
them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
new NAT mode to fully randomize ports using prandom.
* Add a new classifier to x_tables based on the socket net_cls set via
cgroups. These includes two patches to prepare the field as requested by
Zefan Li. Also from Daniel Borkmann.
* Use prandom instead of get_random_bytes in several locations of the
netfilter code, from Florian Westphal.
* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
mark, also from Florian Westphal.
* Fix compilation warning due to unused variable in IPVS, from Geert
Uytterhoeven.
* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.
* Add IPComp extension to x_tables, from Fan Du.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neerav Parikh [Thu, 28 Nov 2013 06:39:37 +0000 (06:39 +0000)]
i40e: Fix ring allocation
The allocation and clearing of rings for a VSI should be
using the alloc_queue_pairs and not num_queue_pairs.
The alloc_queue_pairs per VSI is a pre-allocated number
of queues assigned to a VSI; based on number of TCs enabled
only certain number of queues may be used from that. This
is mainly valid only for the LAN VSI case as that is the
only VSI that may be enabled with multiple traffic classes.
In the future the number of TCs may change based on DCBX
configuration.
The actual number of queues that are enabled/configured is
based on the number of TCs enabled for a given VSI and that
is stored in num_queue_pairs.
With this change num_[tr]x_queues is unused so remove them.
David Cassard [Thu, 28 Nov 2013 06:39:35 +0000 (06:39 +0000)]
i40e: keep allocated memory in structs
Save both a pointer to memory and the length in order to store all
info about allocated kernel memory. This patch changes some adminq
allocations to preserve the full i40e_dma_mem/i40e_virt_mem structs
for every allocation.
Change-Id: Ibcf96159aba4ba61f839d16d87d19478df28e630 Signed-off-by: David Cassard <david.g.cassard@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 28 Nov 2013 06:39:34 +0000 (06:39 +0000)]
i40e: fix error handling when alloc of vsi array fails
Swap a couple lines around in the error handling if the kzalloc() for
the pf->vsi array fails. This was causing a kernel BUG because the
call to i40e_clear_interrupt_scheme() was assuming the pf->vsi[] array
existed. In this fix it is possible that i40e_reset_interrupt_capability()
will get called twice, but this is a safe action.
Change-Id: I939163ccaa89baac7511556d36bc873864c35ae1 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:33 +0000 (06:39 +0000)]
i40e: reinit buffer size each time
When cleaning the ARQ, we must reinitialize the buffer size each time we
go through the loop, because i40e_clean_arq_element returns the message
length in the same field. Without this change, subsequent messages can
be truncated to the length of the previous message.
Change-Id: Ic9c32ff843faf0fc3196d21351a1c3a60c6158eb Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:30 +0000 (06:39 +0000)]
i40e: guard against vf message races
When disabling and enabling VFs on a live system with the VF driver
loaded, it's possible to receive an admin queue message from the VF
driver at an inconvenient time, e.g. when the associated data structures
aren't present or configured. This causes a rather inconvenient panic.
To guard against this, we change the order of when we set num_alloc_vfs
when turning off SR-IOV, and then gate processing of any VF messages
based upon that value. Likewise, when enabling VFs, we shut off the
relevant interrupt until configuration is complete.
Change-Id: I0c172c056616c2bebd78bbc807ab446eb484deea Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Sun, 5 Jan 2014 01:25:55 +0000 (20:25 -0500)]
Merge branch 'bgmac'
bgmac: add initial support for core rev 4 on ARM BCM47xx
====================
This adds support for core rev 4 and ARM BCM47XX.
With an other fix to the platform code I am now getting over 200 MBit/s
with this Ethernet driver, the DMA problems are solved are unrelated
to bgmac.
v3:
- moved flags calculation for bcma_core_enable() into if block
- remove hard coding of phy address to BGMAC_PHY_NOREGS
v2: add changed suggested by Rafał
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sun, 5 Jan 2014 00:10:47 +0000 (01:10 +0100)]
bgmac: add support for Northstar SoC (BCM4707, BCM53018)
This adds support for the Northstar SoC. This SoC does not have a PMU in
bcma and no register on it should be called. In addition it support 2.5
GBit/s Ethernet to the PHY.
This GMAC core is not fully working there are still problems with the
DMA controller.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sun, 5 Jan 2014 00:10:46 +0000 (01:10 +0100)]
bgmac: reset all cores on Northstar SoC
On the Northstar SoC (BCM4707 and BCM53018) we have to enable all GMAC
cores when we just want to use on. We iterate over all the cores and
activate them.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sun, 5 Jan 2014 00:10:45 +0000 (01:10 +0100)]
bgmac: add support for new BGMAC_CMDCFG_SR position on core rev >= 4
The BGMAC_CMDCFG_SR register is at a different position on core rev >= 4
We do not know where this register is on a rev 5 or higher core, I have
newer seen such a core.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hauke Mehrtens [Sun, 5 Jan 2014 00:10:44 +0000 (01:10 +0100)]
bgmac: initialize the DMA controller of core rev >= 4
The DMA controller used in the device supported by GMAC with core rev
>= 4 has some new options which are now set to the default values used
in the Broadcom SDK.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Acked-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Sat, 4 Jan 2014 05:57:59 +0000 (13:57 +0800)]
net: unify the pcpu_tstats and br_cpu_netstats as one
They are same, so unify them as one, pcpu_sw_netstats.
Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h
Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Jan 2014 00:50:35 +0000 (19:50 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to i40e and pci_regs.h.
Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.
Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.
Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().
Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.
Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware. Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions. Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled. Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.
Jesse refactors the LED function to flash LED lights correctly.
v2:
- removed patch 5 "i40e: add set settings and pauseparam" based on
feedback from Ben Hutchings, will re-work that patch for later
submission
- Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
address Or Gerlitz's questions and concerns. This patch adds the
implementation for the VXLAN ndo's and allows the hardware to do
receive checksum offload for inner packets on the UDP ports that
VXLAN notifies us about.
- Added patch "i40e: using for_each_set_bit to simplify the code"
from Wei Yongjun. This patch uses for_each_set_bit() to simply
the code.
v3:
- fixed indentation issue in patch 11 based on feedback from
Sergei Shtylyov.
Sorry for the delayed release of v4, it was delayed to the holidays.
v4:
- Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
while holding a spin lock in patch 6 by executing the AQ commands from
a subtask.
- Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
option for i40e and wrapped appropriate code with the config option in
patch 6.
- Updated patch 7 based on the changes made in patch 6 in the above two
bullets.
v5:
- Added the patch to pci_regs.h based on David Miller's feedback to add
PCI defines for speed and width
- Updated patch 3 description to better explain the changes based on
feedback from David Miller
- Updated patch 4 to use the newly added defines to pci_regs.h instead
of local defines
- Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
from David Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 5 Jan 2014 00:28:27 +0000 (03:28 +0300)]
phylib: make phy_scan_fixups() static
phy_scan_fixups() isn't and shouldn't be called by the drivers directly, so
unexport it. And since Florian Fainelli's recent patches, the function is only
called locally, so we can make it static as well.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 5 Jan 2014 00:27:17 +0000 (03:27 +0300)]
phylib: remove unused adjust_state() callback
Remove adjust_state() callback from 'struct phy_device' since it seems to have
never been really used from the inception: phy_start_machine() has been always
called with 2nd argument equal to NULL.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch Williams [Thu, 28 Nov 2013 06:39:34 +0000 (06:39 +0000)]
i40e: report VF MAC addresses correctly
If the user does not assign a VF MAC address, then just report it as
zero. Attempting to guess the correct primary MAC address of the VF is a
futile and heartbreaking endeavour.
Change-Id: I2673577a160afb6fc55094c890467b44e60c7584 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:29 +0000 (06:39 +0000)]
i40e: remove chatty log messages
Don't complain when we disable queues that are already disable, or
enable them when they're already enabled. This removes a bunch of bogus
log messages that we see at every VF reset.
Change-Id: Ia127be572abdccc48a53d8c43f8a07b8bb920de1 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:28 +0000 (06:39 +0000)]
i40e: remove redundant code
Don't keep separate functions to enable and disable queues for the VFs.
Just call the existing function that everybody else uses. Remove the
unused functions.
Change-Id: I15db9aad64a59e502bfe1e0fdab9b347ab85c12c Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Sibai Li <sibai.li@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:27 +0000 (06:39 +0000)]
i40e: refactor VF reset flow
Fix the VF reset flow so that it works on real hardware. After
discussions with the HW team, the reset flow has been changed
somewhat.
- Change the i40e_reset_vf function to a void type, and fix
up the callers to reflect this.
- Move the MSI-X disable code to i40e_free_vf_res since it must
be done every time the VF is freed, regardless of whether or
not it is reset.
- Ensure that the PCIe bus is quiet before polling the reset bit.
- Don't clear the VFGEN_RSTAT1 register at the beginning as it is
cleared by the reset.
- Poll longer for the reset to be done.
- Disable the queues using an existing function rather than
rolling our own.
- Free and reallocate the VSI after reset to avoid rx hang.
Change-Id: I11e2590431cb73e8663714d1cc5b23d59b809033 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Thu, 28 Nov 2013 06:39:26 +0000 (06:39 +0000)]
i40e: move i40e_reset_vf
The VF reset code will be refactored in future patches. Part of that
refactor required it to call i40e_alloc_vf_res and i40e_free_vf_res, so
the function must be moved. In order to make the future patches more
readable, we perform the function move here, with no other changes.
Change-Id: If6567c9c0bada6caafb2ee0227e0d9d50d05f27f Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Sat, 28 Dec 2013 07:32:18 +0000 (07:32 +0000)]
i40e: Implementation of VXLAN ndo's
This adds the implementation for the VXLAN ndo's. This allows the
hardware to do RX checksum offload for inner packets on the UDP ports
that VXLAN notifies us about.
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Thu, 28 Nov 2013 06:39:22 +0000 (06:39 +0000)]
i40e: add wake-on-lan support
Wake on LAN is disabled by default and will remain that way for most
platforms, but there is an NVM setting that allows vendors to enable it
for a port if they think they've provided the right power environment
for the device. This patch adds code to check the NVM setting and enable
Magic Packet use if WoL is enabled for the port.
Since only Magic Packet is supported, there's not a lot of HW configuration
needed.
Change-Id: I44e904a7b15695e34683009f487064cd86ea59b0 Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Suppress HMC error to Interrupt message level
The HMC error interrupt would generate an un-necessary message
"unhandled interrupt", causing extra log spam, in addition to causing
a reset that was not necessary. Prevent this issue by handling the
HMC error case explicitly, and only reset if the interrupt was from
some of the other causes.
David S. Miller [Sat, 4 Jan 2014 02:03:27 +0000 (21:03 -0500)]
Merge branch 'bonding'
Scott Feldman says:
====================
bonding: final set of netlink patches
v2:
- per Jiri's comment, fix ad_select checking against parm table by
spliting bond_parse_parm() into several funcs. Go ahead and apply
same technique to all parameters using parm table.
- fix netlink msg size to including missing nest attr
- drop the last patch for active_slaves. This patch needs to be
reworked per Jiri's comments and shouldn't hold up finalizing
the conversion of the existing parameter to netlink attributes.
Ding, assuming this patch set goes in, you should have all you
need to start converting module parameter setting/checking over to
funcs in *_options.c.
I'll send iproute2 patch for bonding netlink support once this patch
set is accepted.
v1:
The following series implements the last set of bonding netlink attributes
for 802.3ad mode:
The last patch adds an additional netlink attribute, active_slaves, which
is a nested list of ifindices for current active slaves. We're using this
list to enable/disable hashing of ports in a hardware LAG implementation.
In the same way bonding driver includes/excludes ports for 802.3ad egress
hashing, hardware ports are included/excluded from egress hashing by
hardware based on port active status. Yes, data path offloaded to
hardware, control path remains in kernel via bonding driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Nayak Sujir (3):
tg3: Set the MAC clock to the fastest speed during boot code load
tg3: Poll cpmu link state on APE + ASF enabled devices
tg3: Update version to 3.136
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Fri, 3 Jan 2014 18:09:15 +0000 (10:09 -0800)]
tg3: Update version to 3.136
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Fri, 3 Jan 2014 18:09:14 +0000 (10:09 -0800)]
tg3: Poll cpmu link state on APE + ASF enabled devices
On ASF enabled devices where the mgmt firmware runs on the application
processing engine, there is a race between the tg3 driver processing a
link change event and the ASF firmware clearing the link changed bit in
the EMAC status register. This leads to link notifications to the driver
sometimes getting lost.
Poll the CPMU link state as a backup for the normal interrupt path
update if ASF is enabled.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Sujir [Fri, 3 Jan 2014 18:09:13 +0000 (10:09 -0800)]
tg3: Set the MAC clock to the fastest speed during boot code load
On the 5717, 5718 and 5719 devices, the bootcode runs slower when any
port doesn't have a link due to clock speed slowing down as part of the
link-aware feature. This leads to the driver timing out waiting for the
bootcode signature.
This patch overrides the clock policy to the highest frequency just before
reset and restores it after the bootcode is up.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 3 Jan 2014 18:09:12 +0000 (10:09 -0800)]
tg3: Add unicast filtering support.
Up to 3 additional unicast addresses can be added to the perfect match
filter table.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 3 Jan 2014 18:09:11 +0000 (10:09 -0800)]
tg3: Refactor __tg3_set_mac_addr()
so that individual MAC address filter entries can be set.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hayeswang [Fri, 3 Jan 2014 03:21:56 +0000 (11:21 +0800)]
r8152: fix the wrong return value
The return value should be the boolean value, not the error code.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Spotted-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 4 Jan 2014 00:41:39 +0000 (19:41 -0500)]
Merge branch 'tunnel_dst_caching'
Tom Herbert says:
====================
ipv4: Cache dst in tunnels
Version 3 of caching routes in tunnels.
Addressed some comments from Eric in this series.
There are two patches (variants) in the series:
1) One dst cached for each tunnel.
2) Percpu dst cache per tunnel to avoid false sharing
Testing with GRE tunnels on a 32 CPU host with bnx2x (RSS support
for GRE) shows a modest improvement in CPU utilization with these
patches running 200 TCP_RR netperf clients.
Without patches
71.22% CPU utilization
138/180/244 90/95/99% latencies
1.30465e+06 CPU/tps
18318 CPU/tps
With patches
69.84%
142/186/249 90/95/99% latencies
1.30827e+06
18732 CPU/tps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>