git.karo-electronics.de Git - karo-tx-linux.git/log

net/mlx4_core: Warn if device doesn't have enough PCI bandwidth

Check if the device get enough bandwidth from the entire PCI chain to satisfy
its capabilities. This patch determines the PCIe device's bandwidth capabilities
by reading its PCIe Link Capabilities registers and then call the
pcie_get_minimum_link function to ensure that the adapter is hooked into a slot
which is capable of providing the necessary bandwidth capabilities.

Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e only.

Anjali provides two cleanups to remove unnecessary code and a fix
to resolve debugfs dumping only half the NVM.  Then provides a fix
to ethtool NVM reads where shadow RAM was used instead of actual
NVM reads.

Jesse provides a couple of fixes, one removes custom i40e functions
which duplicate existing kernel functionality.  Second fixes constant
cast issues by replacing __constant_htons with htons.

Mitch provides a couple of fixes for the VF interfaces in i40e.  First
provides a fix to guard against VF message races with can cause a panic.
Second fix reinitializes the buffer size each time we clean the ARQ,
because subsequent messages can be truncated. Lastly adds functionality
to enable/disable ICR 0 dynamically.

Vasu adds a simple guard against multiple includes of the i40e_txrx.h
file.

Shannon provides a couple of fixes, first fix swaps a couple of lines
around in the error handling if the allocation for the VSI array fails.
Second fixes an issue where we try to free the q_vector that has not
been setup which can panic the kernel.

David provides a patch to save off the point to memory and the length
of 2 structs used in the admin queue in order to store all info about
allocated kernel memory.

Neerav fixes ring allocation where allocation and clearing of rings
for a VSI should be using the alloc_queue_pairs and not num_queue_pairs.
Then removes the unused define for multi-queue enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

8021q: make vlan_pcpu_stats visible without CONFIG_VLAN_8021Q

macvlan needs vlan_pcpu_stats so make it visible even if compiling
without VLAN_8021Q support. Otherwise a very long compiler error happens.

Fixes: cdf3e274cf1b36 ("macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-By: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netdev_kobject_init: annotate with __init

netdev_kobject_init() is only being called from __init context,
that is, net_dev_init(), so annotate it with __init as well, thus
the kernel can take this as a hint that the function is used only
during the initialization phase and free up used memory resources
after its invocation.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for your net-next tree,
they are:

* Add full port randomization support. Some crazy researchers found a way
  to reconstruct the secure ephemeral ports that are allocated in random mode
  by sending off-path bursts of UDP packets to overrun the socket buffer of
  the DNS resolver to trigger retransmissions, then if the timing for the
  DNS resolution done by a client is larger than usual, then they conclude
  that the port that received the burst of UDP packets is the one that was
  opened. It seems a bit aggressive method to me but it seems to work for
  them. As a result, Daniel Borkmann and Hannes Frederic Sowa came up with a
  new NAT mode to fully randomize ports using prandom.

* Add a new classifier to x_tables based on the socket net_cls set via
  cgroups. These includes two patches to prepare the field as requested by
  Zefan Li. Also from Daniel Borkmann.

* Use prandom instead of get_random_bytes in several locations of the
  netfilter code, from Florian Westphal.

* Allow to use the CTA_MARK_MASK in ctnetlink when mangling the conntrack
  mark, also from Florian Westphal.

* Fix compilation warning due to unused variable in IPVS, from Geert
  Uytterhoeven.

* Add support for UID/GID via nfnetlink_queue, from Valentina Giusti.

* Add IPComp extension to x_tables, from Fan Du.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: remove un-necessary io-write

Driver needs to clean PBA only when interrupts are turned off and we
are polling instead.

Change-Id: Ic0c1da761bd3abe7f73b1cc8bcddf8e3a232fd0f
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Remove unnecessary prototypes

These functions don't need a prototype as they are defined
in the file before they are called.

Change-Id: Ie17ffad4a29a9c0df434c4ebc4681128a6095c65
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: I40E_FLAG_MQ_ENABLED is not used

Remove references to I40E_FLAG_MQ_ENABLED from the code
as it doesn't seem to be used anywhere.

Change-Id: I4c89fb65b2cdd26fbb0c58fccbbb4b03f0e5f1b3
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix ring allocation

The allocation and clearing of rings for a VSI should be
using the alloc_queue_pairs and not num_queue_pairs.

The alloc_queue_pairs per VSI is a pre-allocated number
of queues assigned to a VSI; based on number of TCs enabled
only certain number of queues may be used from that. This
is mainly valid only for the LAN VSI case as that is the
only VSI that may be enabled with multiple traffic classes.
In the future the number of TCs may change based on DCBX
configuration.

The actual number of queues that are enabled/configured is
based on the number of TCs enabled for a given VSI and that
is stored in num_queue_pairs.

With this change num_[tr]x_queues is unused so remove them.

Change-Id: I9c2f84778bb25f7313c630e9b002a0caa883ce29
Signed-off-by: Neerav Parikh <Neerav.Parikh@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: catch unset q_vector

Don't try to free a q_vector that hasn't been set up as it can
panic the kernel.

Change-Id: I0650cc6c441d0779788c522c790293c276d14fbc
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: keep allocated memory in structs

Save both a pointer to memory and the length in order to store all
info about allocated kernel memory. This patch changes some adminq
allocations to preserve the full i40e_dma_mem/i40e_virt_mem structs
for every allocation.

Change-Id: Ibcf96159aba4ba61f839d16d87d19478df28e630
Signed-off-by: David Cassard <david.g.cassard@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix error handling when alloc of vsi array fails

Swap a couple lines around in the error handling if the kzalloc() for
the pf->vsi array fails. This was causing a kernel BUG because the
call to i40e_clear_interrupt_scheme() was assuming the pf->vsi[] array
existed. In this fix it is possible that i40e_reset_interrupt_capability()
will get called twice, but this is a safe action.

Change-Id: I939163ccaa89baac7511556d36bc873864c35ae1
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: reinit buffer size each time

When cleaning the ARQ, we must reinitialize the buffer size each time we
go through the loop, because i40e_clean_arq_element returns the message
length in the same field. Without this change, subsequent messages can
be truncated to the length of the previous message.

Change-Id: Ic9c32ff843faf0fc3196d21351a1c3a60c6158eb
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use functions to enable and disable icr 0

Introduce i40e_irq_dynamic_disable_icr0 and use it and its previously-
extant counterpart when appropriate.

Change-Id: Ieb4037874fba2e96fc2354b34a97a3cb8f6490f3
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add header file flag _I40E_TXRX_H_

Add an include header guard to guard against multiple includes

Change-Id: I73efa03efc912d2047edab903c7caed05b444da2
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: guard against vf message races

When disabling and enabling VFs on a live system with the VF driver
loaded, it's possible to receive an admin queue message from the VF
driver at an inconvenient time, e.g. when the associated data structures
aren't present or configured. This causes a rather inconvenient panic.

To guard against this, we change the order of when we set num_alloc_vfs
when turning off SR-IOV, and then gate processing of any VF messages
based upon that value. Likewise, when enabling VFs, we shut off the
relevant interrupt until configuration is complete.

Change-Id: I0c172c056616c2bebd78bbc807ab446eb484deea
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix constant cast issues

replace __constant_htons with htons

Change-Id: I123a5318bae34c8b004c71db07c56f137c685849
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Change the ethtool NVM read method to use AQ

Earlier we were reading Shadow RAM (copy of the NVM) which can differ
from the actual NVM. Use AQ instead to read the actual NVM.

Change-Id: Ia0f2773b722db77d093f738c068af872be69bbd4
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix mac address checking

Remove custom i40e functions around ethernet addresses that are
duplicating already existing kernel functionality.

Also ends up fixing a bug with multicast addresses.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Dump the whole NVM, not half

Debugfs was reading exactly half the number of words, fix it.

Change-Id: Ieb217f3c6dca455d44e50a0dc61a6664c0cb2265
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'bgmac'

bgmac: add initial support for core rev 4 on ARM BCM47xx

====================
This adds support for core rev 4 and ARM BCM47XX.
With an other fix to the platform code I am now getting over 200 MBit/s
with this Ethernet driver, the DMA problems are solved are unrelated
to bgmac.

v3:
- moved flags calculation for bcma_core_enable() into if block
- remove hard coding of phy address to BGMAC_PHY_NOREGS

v2: add changed suggested by Rafał
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: add support for Northstar SoC (BCM4707, BCM53018)

This adds support for the Northstar SoC. This SoC does not have a PMU in
bcma and no register on it should be called. In addition it support 2.5
GBit/s Ethernet to the PHY.

This GMAC core is not fully working there are still problems with the
DMA controller.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: reset all cores on Northstar SoC

On the Northstar SoC (BCM4707 and BCM53018) we have to enable all GMAC
cores when we just want to use on. We iterate over all the cores and
activate them.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: add support for new BGMAC_CMDCFG_SR position on core rev >= 4

The BGMAC_CMDCFG_SR register is at a different position on core rev >= 4
We do not know where this register is on a rev 5 or higher core, I have
newer seen such a core.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bgmac: initialize the DMA controller of core rev >= 4

The DMA controller used in the device supported by GMAC with core rev
>= 4 has some new options which are now set to the default values used
in the Broadcom SDK.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bcma: export bcma_find_core_unit()

This function is used to get a specific core when there is more than
one core of that specific type. This is used in bgmac to reset all GMAC
cores.

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Rafał Miłecki <zajec5@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: remove unused code

Remove dead code;
tipc_bearer_find_interface
tipc_node_redundant_links

This may break out of tree version of TIPC if there still is one.
But that maybe a good thing :-)

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: make local function static

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: make local variable static

Make DCCP module config variable static, only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: remove obsolete code

This function is defined but not used.
Remove it now, can be resurrected if ever needed.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: unify macvlan_pcpu_stats and vlan_pcpu_stats

They are same, so unify them as one; since macvlan is a kind of vlan,
vlan_pcpu_stats should be a proper name for vlan and macvlan.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: unify the pcpu_tstats and br_cpu_netstats as one

They are same, so unify them as one, pcpu_sw_netstats.

Define pcpu_sw_netstat in netdevice.h, remove pcpu_tstats
from if_tunnel and remove br_cpu_netstats from br_private.h

Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e and pci_regs.h.

Anjali provides a patch to prevent messages from stray HMC events, except
at interrupt message level, and refactors the HMC error handling.

Catherine adds routines in probe to populate/check PCI bus speed and width,
then verify we are in a 8GT/s x8 PCIe slot and warn when we are not.

Shannon adds Wake-on-LAN support for i40e, fixes curly brace use as well as
return type for i40e_vsi_clear_rings().

Joseph implements receive offload for VXLAN for i40e, where the hardware
supports checksum offload/verification of the inner/outer header.

Mitch provides the bulk of the changes, where he refactors the VF reset
code so that it works on real hardware.  Then does code cleanup by
calling existing functions to enable and disable queues for VFs and
remove unused functions.  Removes a unnecessary log messages that are
seen at every VF reset, for example complaining about disabling queues
that are already disabled.  Fixes an error return when the VF asks to
add an invalid MAC address and if the VF sends a bad message, make it
more informative about what is actually going on.

Jesse refactors the LED function to flash LED lights correctly.

v2:
- removed patch 5 "i40e: add set settings and pauseparam" based on
   feedback from Ben Hutchings, will re-work that patch for later
   submission
- Added patch "i40e: Implementation of vxlan ndo's" from Joseph to
   address Or Gerlitz's questions and concerns.  This patch adds the
   implementation for the VXLAN ndo's and allows the hardware to do
   receive checksum offload for inner packets on the UDP ports that
   VXLAN notifies us about.
- Added patch "i40e: using for_each_set_bit to simplify the code"
   from Wei Yongjun.  This patch uses for_each_set_bit() to simply
   the code.

v3:
- fixed indentation issue in patch 11 based on feedback from
   Sergei Shtylyov.

Sorry for the delayed release of v4, it was delayed to the holidays.

v4:
- Addressed Or Gerlitz's concerns about trying to get a hold of a mutex
   while holding a spin lock in patch 6 by executing the AQ commands from
   a subtask.
- Addressed David Miller's Kconfig concerns by creating a Kconfig VXLAN
   option for i40e and wrapped appropriate code with the config option in
   patch 6.
- Updated patch 7 based on the changes made in patch 6 in the above two
   bullets.

v5:
- Added the patch to pci_regs.h based on David Miller's feedback to add
   PCI defines for speed and width
- Updated patch 3 description to better explain the changes based on
   feedback from David Miller
- Updated patch 4 to use the newly added defines to pci_regs.h instead
   of local defines
- Updated patch 7 to use <net/vxlan.h> in the #include based on feedback
   from David Miller
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: make phy_scan_fixups() static

phy_scan_fixups() isn't and shouldn't be called by the drivers directly, so
unexport it. And since Florian Fainelli's recent patches, the function is only
called locally, so we can make it static as well.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: remove unused adjust_state() callback

Remove adjust_state() callback from 'struct phy_device' since it seems to have
never been really used from the inception: phy_start_machine() has been always
called with 2nd argument equal to NULL.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: kill excess empty lines

Remove excess empty lines such as those between a function call and its result
check and just duplicate ones between functions.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: kill excess code

Remove some excess code:

- convert assignments to initializers;

- kill useless assignments before *return*.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: kill useless local variables

A number of functions (especially in phy.c) has local variables that were hardly
needed in the first place -- remove them.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

<linux/phy.h>: coding style fixes

Running 'checkpatch.pl' gives some errors and warnings:

- no spaces around =;

- * separated by space from the function name;

- { in function definition not on a separate line;

- line over 80 characters.

While fixing these, also fix the following style issues:

- file name in the heading comment;

- alignment not matching open paren.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio_bus: coding style fixes

The recent patch from Florian Fainelli fixed all 'checkpatch.pl' errors but left
some warnings like:

- including <asm/io.h> instead of <linux/io.h>;

- including <asm/uaccess.h> instead of <linux/uaccess.h>;

- block comments using empty /* line;

- 'struct dev_pm_ops' variable not being *const*.

While fixing these, also fix the following style issues (some of which were
found running 'checkpatch.pl --strict'):

- alignment not matching open paren;

- file name in the heading comment.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: coding style fixes

The recent patch from Florian Fainelli fixed all 'checkpatch.pl' errors but left
the numerous warnings:

- including <asm/io.h> instead of <linux/io.h>;

- including <asm/uaccess.h> instead of <linux/uaccess.h>;

- *extern* declaration in .c file;

- block comments using empty /* line;

- block comments not starting with * on the middle lines;

- block comments not having trailing */ on a separate line;

- EXPORT_SYMBOL() not immediately following its function;

- unnecessary {} for signle statement block;

- spaces before tabs.

While fixing these, also fix the following style issues (some of which were
found running 'checkpatch.pl --strict'):

- alignment not matching open paren;

- missing {} on one of the *if* arms where another has them;

- use of sizeof(struct structure) instead of sizeof(*variable);

- multiple assignments on one line;

- empty line before };

- file names in the heading comments;

- missing spaces around operators;

- no {} around multi-line *if* operator's arm;

- unneeded () around subexpressions;

- incomplete kernel-doc comment style;

- comment line exceeding 80 characters;

- missing empty line after declarations.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: report VF MAC addresses correctly

If the user does not assign a VF MAC address, then just report it as
zero. Attempting to guess the correct primary MAC address of the VF is a
futile and heartbreaking endeavour.

Change-Id: I2673577a160afb6fc55094c890467b44e60c7584
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: update led set args

Add an argument to led function and refactor code to flash LED lights
correctly.

Change-Id: I00b21607ced53aaa057159503875708871946259
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: make a define from a large constant

Make a define used in the header file by both VF and PF drivers.

Change-Id: Ie9e35adcc021cd6a8f7513934984eb4ed55774f5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: be more informative

If the VF sends a bad message, be more informative about what it
actually is.

Change-Id: I89e06d2db416a1d05aeea016dd6e8b7870cae99a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix error return

If the VF asks to add an invalid MAC address, tell it that instead of
just using a generic return code.

Change-Id: I366aff5449fa5874ad51e2734cac2a71783ab14b
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove chatty log messages

Don't complain when we disable queues that are already disable, or
enable them when they're already enabled. This removes a bunch of bogus
log messages that we see at every VF reset.

Change-Id: Ia127be572abdccc48a53d8c43f8a07b8bb920de1
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: remove redundant code

Don't keep separate functions to enable and disable queues for the VFs.
Just call the existing function that everybody else uses. Remove the
unused functions.

Change-Id: I15db9aad64a59e502bfe1e0fdab9b347ab85c12c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: refactor VF reset flow

Fix the VF reset flow so that it works on real hardware. After
discussions with the HW team, the reset flow has been changed
somewhat.

- Change the i40e_reset_vf function to a void type, and fix
  up the callers to reflect this.
- Move the MSI-X disable code to i40e_free_vf_res since it must
  be done every time the VF is freed, regardless of whether or
  not it is reset.
- Ensure that the PCIe bus is quiet before polling the reset bit.
- Don't clear the VFGEN_RSTAT1 register at the beginning as it is
  cleared by the reset.
- Poll longer for the reset to be done.
- Disable the queues using an existing function rather than
  rolling our own.
- Free and reallocate the VSI after reset to avoid rx hang.

Change-Id: I11e2590431cb73e8663714d1cc5b23d59b809033
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: move i40e_reset_vf

The VF reset code will be refactored in future patches. Part of that
refactor required it to call i40e_alloc_vf_res and i40e_free_vf_res, so
the function must be moved. In order to make the future patches more
readable, we perform the function move here, with no other changes.

Change-Id: If6567c9c0bada6caafb2ee0227e0d9d50d05f27f
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Rx checksum offload for VXLAN

This implements receive offload for VXLAN for i40e. The hardware
supports checksum offload/verification of the inner/outer header.

Change-Id: I450db300af6713f2044fef1191a0d1d294c13369
Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Implementation of VXLAN ndo's

This adds the implementation for the VXLAN ndo's. This allows the
hardware to do RX checksum offload for inner packets on the UDP ports
that VXLAN notifies us about.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix curly brace use and return type

Add curly-braces on a multi-line function. While we're here we
also change to return void in i40e_vsi_clear_rings() since no
caller cares.

Change-Id: I261fcef20e2a39e18d83ec08fdd14456131dee91
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add wake-on-lan support

Wake on LAN is disabled by default and will remain that way for most
platforms, but there is an NVM setting that allows vendors to enable it
for a port if they think they've provided the right power environment
for the device. This patch adds code to check the NVM setting and enable
Magic Packet use if WoL is enabled for the port.

Since only Magic Packet is supported, there's not a lot of HW configuration
needed.

Change-Id: I44e904a7b15695e34683009f487064cd86ea59b0
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Populate and check pci bus speed and width

Call i40e_set_pci_config_data from probe, then check that
we are in a 8GT/s x8 PCIe slot and send a warning if we are not.

Change-Id: I62815c574cee50d2787c50bbe956dde7a7a75a11
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Suppress HMC error to Interrupt message level

The HMC error interrupt would generate an un-necessary message
"unhandled interrupt", causing extra log spam, in addition to causing
a reset that was not necessary. Prevent this issue by handling the
HMC error case explicitly, and only reset if the interrupt was from
some of the other causes.

Change-Id: Iabd203ba1dfc26a136b638597f3e9991acfa29f3
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: using for_each_set_bit to simplify the code

Using for_each_set_bit() to simplify the code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

pci_regs.h: Add PCI bus link speed and width defines

Add missing PCI bus link speed 8.0 GT/s and bus link widths of
x1, x2, x4 and x8.

CC: <linux-kernel@vger.kernel.org>
CC: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>

Merge branch 'bonding'

Scott Feldman says:

====================
bonding: final set of netlink patches

v2:

- per Jiri's comment, fix ad_select checking against parm table by
   spliting bond_parse_parm() into several funcs.  Go ahead and apply
   same technique to all parameters using parm table.

- fix netlink msg size to including missing nest attr

- drop the last patch for active_slaves.  This patch needs to be
   reworked per Jiri's comments and shouldn't hold up finalizing
   the conversion of the existing parameter to netlink attributes.

Ding, assuming this patch set goes in, you should have all you
need to start converting module parameter setting/checking over to
funcs in *_options.c.

I'll send iproute2 patch for bonding netlink support once this patch
set is accepted.

v1:

The following series implements the last set of bonding netlink attributes
for 802.3ad mode:

lacp_rate
ad_select
ad_info, nest of:
ad_aggregator
ad_num_ports
ad_actor_key
ad_partner_key
ad_partner_mac

The last patch adds an additional netlink attribute, active_slaves, which
is a nested list of ifindices for current active slaves.  We're using this
list to enable/disable hashing of ports in a hardware LAG implementation.
In the same way bonding driver includes/excludes ports for 802.3ad egress
hashing, hardware ports are included/excluded from egress hashing by
hardware based on port active status.  Yes, data path offloaded to
hardware, control path remains in kernel via bonding driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add bounds checking for tbl params

Add bounds checking for params defined with parm tbl.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix netlink msg size

Add missing space for IFLA_BOND_ARP_IP_TARGET nest header.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add ad_info attribute netlink support

Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add ad_select attribute netlink support

Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
ad_select via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add lacp_rate attribute netlink support

Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
lacp_rate via netlink.

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tg3'

Nithin Nayak Sujir says:

====================
tg3: Unicast filter support and misc fixes

Michael Chan (2):
  tg3: Refactor __tg3_set_mac_addr()
  tg3: Add unicast filtering support.

Nithin Nayak Sujir (3):
  tg3: Set the MAC clock to the fastest speed during boot code load
  tg3: Poll cpmu link state on APE + ASF enabled devices
  tg3: Update version to 3.136
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Update version to 3.136

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Poll cpmu link state on APE + ASF enabled devices

On ASF enabled devices where the mgmt firmware runs on the application
processing engine, there is a race between the tg3 driver processing a
link change event and the ASF firmware clearing the link changed bit in
the EMAC status register. This leads to link notifications to the driver
sometimes getting lost.

Poll the CPMU link state as a backup for the normal interrupt path
update if ASF is enabled.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Set the MAC clock to the fastest speed during boot code load

On the 5717, 5718 and 5719 devices, the bootcode runs slower when any
port doesn't have a link due to clock speed slowing down as part of the
link-aware feature. This leads to the driver timing out waiting for the
bootcode signature.

This patch overrides the clock policy to the highest frequency just before
reset and restores it after the bootcode is up.

Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Add unicast filtering support.

Up to 3 additional unicast addresses can be added to the perfect match
filter table.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Refactor __tg3_set_mac_addr()

so that individual MAC address filter entries can be set.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

llc: make lock static

The llc_sap_list_lock does not need to be global, only acquired
in core.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

socket: cleanups

Namespace related cleaning

* make cred_to_ucred static
* remove unused sock_rmalloc function

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: coding style fixes

Running 'scripts/checkpatch.pl' on the driver files gives numerous warnings:

- block comments using empty /* line;

- unneeded \ at end of lines;

- message string split across lines;

- use of __attribute__((aligned(n))) instead of __aligned(n) macro;

- use of __attribute__((packed)) instead of __packed macro.

Additionally, running 'scripts/checkpatch.pl --strict' gives more complaints:

- including the paragraph about writing to FSF into the heading comment;

- alignment not matching open paren;

- multiple assignments on one line;

- use of CamelCase names;

- missing {} on one of the *if* arms where another has them;

- spinlock definition without a comment.

While fixing these, also do some more style cleanups:

- remove useless () around expressions;

- add {} around multi-line *if* operator's arm;

- remove space before comma;

- add spaces after /* and before */;

- properly align continuation lines of broken up expressions;

- realign comments to the structure fields.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: fix the wrong return value

The return value should be the boolean value, not the error code.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Spotted-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc9420: use named constants for pci_power_t values

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression e1,e2;
@@

pci_enable_wake(e1,
- 0
+ PCI_D0
,e2)
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tunnel_dst_caching'

Tom Herbert says:

====================
ipv4: Cache dst in tunnels

Version 3 of caching routes in tunnels.

Addressed some comments from Eric in this series.

There are two patches (variants) in the series:
1) One dst cached for each tunnel.
2) Percpu dst cache per tunnel to avoid false sharing

Testing with GRE tunnels on a 32 CPU host with bnx2x (RSS support
for GRE) shows a modest improvement in CPU utilization with these
patches running 200 TCP_RR netperf clients.

Without patches
71.22% CPU utilization
138/180/244 90/95/99% latencies
1.30465e+06 CPU/tps
18318 CPU/tps

With patches
69.84%
142/186/249 90/95/99% latencies
1.30827e+06
18732 CPU/tps
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use percpu Cache route in IP tunnels

percpu route cache eliminates share of dst refcnt between CPUs.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Cache dst in tunnels

Avoid doing a route lookup on every packet being tunneled.

In ip_tunnel.c cache the route returned from ip_route_output if
the tunnel is "connected" so that all the rouitng parameters are
taken from tunnel parms for a packet. Specifically, not NBMA tunnel
and tos is from tunnel parms (not inner packet).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Add process name and pid to deprecation warnings

Recently I updated the sctp socket option deprecation warnings to be both a bit
more clear and ratelimited to prevent user processes from spamming the log file.
Ben Hutchings suggested that I add the process name and pid to these warnings so
that users can tell who is responsible for using the deprecated apis. This
patch accomplishes that.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tulip: delete useless tests on netdev_priv

Netdev_priv performs an addition, not a pointer dereference, so it seems
quite unlikely that its result would ever be NULL.

A semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
statement S;
@@

- if (!netdev_priv(...)) S
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: x_tables: lightweight process control group matching

It would be useful e.g. in a server or desktop environment to have
a facility in the notion of fine-grained "per application" or "per
application group" firewall policies. Probably, users in the mobile,
embedded area (e.g. Android based) with different security policy
requirements for application groups could have great benefit from
that as well. For example, with a little bit of configuration effort,
an admin could whitelist well-known applications, and thus block
otherwise unwanted "hard-to-track" applications like [1] from a
user's machine. Blocking is just one example, but it is not limited
to that, meaning we can have much different scenarios/policies that
netfilter allows us than just blocking, e.g. fine grained settings
where applications are allowed to connect/send traffic to, application
traffic marking/conntracking, application-specific packet mangling,
and so on.

Implementation of PID-based matching would not be appropriate
as they frequently change, and child tracking would make that
even more complex and ugly. Cgroups would be a perfect candidate
for accomplishing that as they associate a set of tasks with a
set of parameters for one or more subsystems, in our case the
netfilter subsystem, which, of course, can be combined with other
cgroup subsystems into something more complex if needed.

As mentioned, to overcome this constraint, such processes could
be placed into one or multiple cgroups where different fine-grained
rules can be defined depending on the application scenario, while
e.g. everything else that is not part of that could be dropped (or
vice versa), thus making life harder for unwanted processes to
communicate to the outside world. So, we make use of cgroups here
to track jobs and limit their resources in terms of iptables
policies; in other words, limiting, tracking, etc what they are
allowed to communicate.

In our case we're working on outgoing traffic based on which local
socket that originated from. Also, one doesn't even need to have
an a-prio knowledge of the application internals regarding their
particular use of ports or protocols. Matching is *extremly*
lightweight as we just test for the sk_classid marker of sockets,
originating from net_cls. net_cls and netfilter do not contradict
each other; in fact, each construct can live as standalone or they
can be used in combination with each other, which is perfectly fine,
plus it serves Tejun's requirement to not introduce a new cgroups
subsystem. Through this, we result in a very minimal and efficient
module, and don't add anything except netfilter code.

One possible, minimal usage example (many other iptables options
can be applied obviously):

1) Configuring cgroups if not already done, e.g.:

  mkdir /sys/fs/cgroup/net_cls
  mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
  mkdir /sys/fs/cgroup/net_cls/0
  echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
  (resp. a real flow handle id for tc)

2) Configuring netfilter (iptables-nftables), e.g.:

  iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP

3) Running applications, e.g.:

  ping 208.67.222.222  <pid:1799>
  echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
  [...]
  ping 208.67.220.220  <pid:1804>
  ping: sendmsg: Operation not permitted
  [...]
  echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
  64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
  [...]

Of course, real-world deployments would make use of cgroups user
space toolsuite, or own custom policy daemons dynamically moving
applications from/to various cgroups.

  [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: netprio: rename config to be more consistent with cgroup configs

While we're at it and introduced CGROUP_NET_CLASSID, lets also make
NETPRIO_CGROUP more consistent with the rest of cgroups and rename it
into CONFIG_CGROUP_NET_PRIO so that for networking, we now have
CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
option consistent among networking cgroups, but also among cgroups
CONFIG conventions in general as the vast majority has a prefix of
CONFIG_CGROUP_<SUBSYS>.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: net_cls: move cgroupfs classid handling into core

Zefan Li requested [1] to perform the following cleanup/refactoring:

- Split cgroupfs classid handling into net core to better express a
  possible more generic use.

- Disable module support for cgroupfs bits as the majority of other
  cgroupfs subsystems do not have that, and seems to be not wished
  from cgroup side. Zefan probably might want to follow-up for netprio
  later on.

- By this, code can be further reduced which previously took care of
  functionality built when compiled as module.

cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
that we are consistent with {netclassid,netprio}_cgroup naming that is
under net/core/ as suggested by Zefan.

No change in functionality, but only code refactoring that is being
done here.

[1] http://patchwork.ozlabs.org/patch/304825/

Suggested-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: cgroups@vger.kernel.org
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_CT: fix error value in xt_ct_tg_check()

If setting event mask fails then we were returning 0 for success.
This patch updates return code to -EINVAL in case of problem.

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: remove dead code

The following code is not used in current upstream code.
Some of this seems to be old hooks, other might be used by some
out of tree module (which I don't care about breaking), and
the need_ipv4_conntrack was used by old NAT code but no longer
called.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: ipset: remove unused code

Function never used in current upstream code.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat: add full port randomization support

We currently use prandom_u32() for allocation of ports in tcp bind(0)
and udp code. In case of plain SNAT we try to keep the ports as is
or increment on collision.

SNAT --random mode does use per-destination incrementing port
allocation. As a recent paper pointed out in [1] that this mode of
port allocation makes it possible to an attacker to find the randomly
allocated ports through a timing side-channel in a socket overloading
attack conducted through an off-path attacker.

So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
in regard to the attack described in this paper. As we need to keep
compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
selection algorithm with a simple prandom_u32() in order to mitigate
this attack vector. Note that the lfsr113's internal state is
periodically reseeded by the kernel through a local secure entropy
source.

More details can be found in [1], the basic idea is to send bursts
of packets to a socket to overflow its receive queue and measure
the latency to detect a possible retransmit when the port is found.
Because of increasing ports to given destination and port, further
allocations can be predicted. This information could then be used by
an attacker for e.g. for cache-poisoning, NS pinning, and degradation
of service attacks against DNS servers [1]:

  The best defense against the poisoning attacks is to properly
  deploy and validate DNSSEC; DNSSEC provides security not only
  against off-path attacker but even against MitM attacker. We hope
  that our results will help motivate administrators to adopt DNSSEC.
  However, full DNSSEC deployment make take significant time, and
  until that happens, we recommend short-term, non-cryptographic
  defenses. We recommend to support full port randomisation,
  according to practices recommended in [2], and to avoid
  per-destination sequential port allocation, which we show may be
  vulnerable to derandomisation attacks.

Joint work between Hannes Frederic Sowa and Daniel Borkmann.

[1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
[2] http://arxiv.org/pdf/1205.5190v1.pdf

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: revert "sched classifier: make cgroup table local"

This reverts commit de6fb288b1246a5c4e00c0cdbfe3a838a360b3f4.
Otherwise we got:

net/sched/cls_cgroup.c:106:29: error: static declaration of ‘net_cls_subsys’ follows non-static declaration
static struct cgroup_subsys net_cls_subsys = {
                             ^
In file included from include/linux/cgroup.h:654:0,
                 from net/sched/cls_cgroup.c:18:
include/linux/cgroup_subsys.h:35:29: note: previous declaration of ‘net_cls_subsys’ was here
SUBSYS(net_cls)
                             ^
make[2]: *** [net/sched/cls_cgroup.o] Error 1

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched classifier: make cgroup table local

Doesn't need to be global.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sched action: make local function static

No need to export functions only used in one file.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Only cycle port if HW timestamp config changes

If the hwtstamp_config matches what is currently set for the device then
simply return. Without this change any program that tries to enable
hardware timestamps will cause the link to cycle even if hardware
timstamps were already enabled.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-By: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Add PTP hardware clock

This adds a PHC to the mlx4_en driver. We use reader/writer spinlocks to
protect the timecounter since every packet received needs to call
timecounter_cycle2time() when timestamping is enabled. This can become
a performance bottleneck with RSS and multiple receive queues if normal
spinlocks are used.

This driver has been tested with both Documentation/ptp/testptp and the
linuxptp project (http://linuxptp.sourceforge.net/) on a Mellanox
ConnectX-3 card.

Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-By: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: switch and case should be at the same indent

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: spaces required around that '='

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Cleanup in eth-netx.h

Commit 2960ed346877 ("ARM: netx: move platform_data definitions")
moved the file to the current location but forgot to remove the pointer
to its previous location. Clean it up. While at it also change the header
file protection macros appropriately.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

hamradio: 6pack: fix error return code

Set the return variable to an error code as done elsewhere in the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix error return code

Set the return variable to propagate any error code as done elsewhere in
the function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
    when != &ret
*if(...)
{
  ... when != ret = e2
      when forall
return ret;
}

// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove prune parameter for fib6_clean_all

since the prune parameter for fib6_clean_all always is 0, remove it.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: make the code look more readable

In commit 3b8401fe9d ("tipc: kill unnecessary goto's") didn't make
the code look most readable, so fix it. This patch is cosmetic
and does not change the operation of TIPC in any way.

Suggested-by: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun, rfs: fix the incorrect hash value

The code incorrectly save the queue index as the hash, so this patch
is fixing it with the hash received in the stack receive path.

Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>