]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
7 years agotcp: remove poll() flakes when receiving RST
Eric Dumazet [Tue, 18 Apr 2017 16:45:51 +0000 (09:45 -0700)]
tcp: remove poll() flakes when receiving RST

When a RST packet is processed, we send two wakeup events to interested
polling users.

First one by a sk->sk_error_report(sk) from tcp_reset(),
followed by a sk->sk_state_change(sk) from tcp_done().

Depending on machine load and luck, poll() can either return POLLERR,
or POLLIN|POLLOUT|POLLERR|POLLHUP (this happens on 99 % of the cases)

This is probably fine, but we can avoid the confusion by reordering
things so that we have more TCP fields updated before the first wakeup.

This might even allow us to remove some barriers we added in the past.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'mlxsw-flow-based-forwarding-OVS'
David S. Miller [Thu, 20 Apr 2017 19:32:32 +0000 (15:32 -0400)]
Merge branch 'mlxsw-flow-based-forwarding-OVS'

Jiri Pirko says:

====================
mlxsw: Allow flow based forwarding in OVS

This patchset does some fixes so the HW is setup correctly to do
flow-based (ACL based) forwarding for OVS-enslaved port.

The first patch is just trivial fix spotted on the way.

Patches 2-4 take care of proper FID setup which HW needs in order to
for ACL based forwarding.

The 7th patch (with dependency of patch 5 and 6) takes care of proper setup
of ports that are enslaved in OVS.

The last patch implements new FID miss trap that is used to push
packets belonging to unknown flows to kernel and userspace.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add FID miss trap
Jiri Pirko [Tue, 18 Apr 2017 14:55:38 +0000 (16:55 +0200)]
mlxsw: spectrum: Add FID miss trap

When there is no FID set for a specific packet, the HW will drop it.
However, by default these packets are useful to be delivered to CPU as
it can inspect them and program HW accordingly. So add this trap.

This would only ever happen when port is enslaved to an OVS master.
Otherwise, packets would be dropped during VLAN / STP filtering,
before FID classification.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Allow ports to work under OVS master
Jiri Pirko [Tue, 18 Apr 2017 14:55:37 +0000 (16:55 +0200)]
mlxsw: spectrum: Allow ports to work under OVS master

>From now on, a port can become a slave of OVS master. All vlans
are enabled, STP state is set to "forwarding". It is up to the OVS
userspace daemon to setup the flows either in kernel or in HW using TC
flower offload.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: add netif_is_ovs_port helper
Jiri Pirko [Tue, 18 Apr 2017 14:55:36 +0000 (16:55 +0200)]
net: add netif_is_ovs_port helper

To find out if a netdev is an OVS port.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan range
Jiri Pirko [Tue, 18 Apr 2017 14:55:35 +0000 (16:55 +0200)]
mlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan range

So far, mlxsw_sp_port_vlan_set range is limited by
MLXSW_REG_SPVM_REC_MAX_COUNT. In spectrum_switchdev code this is
wrapped up by a helper function which actually does multiple calls
to FW for bigger ranges. Move the code into mlxsw_sp_port_vlan_set
and use it always. That allows caller not to care about the range.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum_flower: Set dummy FID before forward action
Jiri Pirko [Tue, 18 Apr 2017 14:55:34 +0000 (16:55 +0200)]
mlxsw: spectrum_flower: Set dummy FID before forward action

HW requires the FID to be valid in order for the forward action to work.
So regardless of the current FID validity, just set the dummy FID which
would do the trick.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Add dummy FID initialization
Jiri Pirko [Tue, 18 Apr 2017 14:55:33 +0000 (16:55 +0200)]
mlxsw: spectrum: Add dummy FID initialization

For forwarding using ACL action, HW needs a valid FID to be setup. It
does not actually use it, so it can be any valid FID. So create a dummy
FID only for this purpose.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Implement action to set FID
Jiri Pirko [Tue, 18 Apr 2017 14:55:32 +0000 (16:55 +0200)]
mlxsw: spectrum: Implement action to set FID

Implement part of multipurpose Virtual Router and Forwarding Domain
Action that takes care of setting up FID. We need to use it to be able
to forward packets using ACL action when no FID is associated on RX.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agomlxsw: spectrum: Fix indent in mlxsw_sp_netdevice_port_upper_event
Jiri Pirko [Tue, 18 Apr 2017 14:55:31 +0000 (16:55 +0200)]
mlxsw: spectrum: Fix indent in mlxsw_sp_netdevice_port_upper_event

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobindings: net: stmmac: add missing note about LPI interrupt
Niklas Cassel [Tue, 18 Apr 2017 12:39:53 +0000 (14:39 +0200)]
bindings: net: stmmac: add missing note about LPI interrupt

The hardware has a LPI interrupt.
There is already code in the stmmac driver to parse and handle the
interrupt. However, this information was missing from the DT binding.

At the same time, improve the description of the existing interrupts.

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Acked-By: Joao Pinto <jpinto@synopsys.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: remove reference to sock_filter_ext from kerneldoc comment
Tobias Klauser [Tue, 18 Apr 2017 09:27:00 +0000 (11:27 +0200)]
bpf: remove reference to sock_filter_ext from kerneldoc comment

struct sock_filter_ext didn't make it into the tree and is now called
struct bpf_insn. Reword the kerneldoc comment for bpf_convert_filter()
accordingly.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'mac80211-next-for-davem-2017-04-18' of git://git.kernel.org/pub/scm/linux...
David S. Miller [Thu, 20 Apr 2017 17:54:40 +0000 (13:54 -0400)]
Merge tag 'mac80211-next-for-davem-2017-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
My last pull request has been a while, we now have:
 * connection quality monitoring with multiple thresholds
 * support for FILS shared key authentication offload
 * pre-CAC regulatory compliance - only ETSI allows this
 * sanity check for some rate confusion that hit ChromeOS
   (but nobody else uses it, evidently)
 * some documentation updates
 * lots of cleanups
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'dsa-LAN9303'
David S. Miller [Thu, 20 Apr 2017 17:48:55 +0000 (13:48 -0400)]
Merge branch 'dsa-LAN9303'

Juergen Borleis says:

====================
net: dsa: add SMSC/Microchip LAN9303 three port ethernet switch driver

The LAN9303 is a three port 10/100 ethernet switch with integrated phys
for the two external ethernet ports. The third port is an RMII/MII
interface to a host master network interface (e.g. fixed link).

While the LAN9303 device itself supports offload packet processing, this
driver does not make use of it yet. This driver just configures the device
to provide two separate network interfaces (which is the default state of
a DSA device).

Please note: the "MDIO managed mode" driver part isn't tested yet. I have
used and tested the "I2C managed mode" only.

Changes in v6:

- fix support to use the driver as a module (core, i2c and mdio)
- license info added in all parts of the driver (for module support)

Changes in v5:

- add missing include file to 'net/dsa/tag_lan9303.c'

Changes in v4:

- rebased on net-next, 'net/dsa/tag_lan9303.c' adapted to changed API

Changes in v3:

- 'ds_to_lan9303()' removed
- special PHY reg MII_LAN911X_SPECIAL_CONTROL_STATUS mapping removed
- compatible strings for I2C and MDIO are now different
- MDIO-managed-mode devicetree binding added (still compile time tested only)

Changes in v2:

- code moved to 'drivers/net/dsa'
- timeouts in completion wait loops
- macros instead of various magic numbers
- development code removed
- devicetree property names changed
- devicetree example adapted
- tried to avoid to mix 'switching' and 'forwarding'...

Comments are welcome.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: LAN9303: add MDIO managed mode support
Juergen Beisert [Tue, 18 Apr 2017 08:48:27 +0000 (10:48 +0200)]
net: dsa: LAN9303: add MDIO managed mode support

When the LAN9303 device is in MDIO manged mode, all register accesses must
be done via MDIO.

Please note: this code is compile time tested only due to the absence of such
configured hardware. It is based on a patch from Stefan Roese from 2014.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
CC: devicetree@vger.kernel.org
CC: robh+dt@kernel.org
CC: mark.rutland@arm.com
CC: sr@denx.de
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: LAN9303: add I2C managed mode support
Juergen Beisert [Tue, 18 Apr 2017 08:48:26 +0000 (10:48 +0200)]
net: dsa: LAN9303: add I2C managed mode support

In this mode the switch device and the internal phys will be managed via
I2C interface. The MDIO interface is still supported, but for the
(emulated) CPU port only.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
CC: devicetree@vger.kernel.org
CC: robh+dt@kernel.org
CC: mark.rutland@arm.com
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: add new DSA switch driver for the SMSC-LAN9303
Juergen Beisert [Tue, 18 Apr 2017 08:48:25 +0000 (10:48 +0200)]
net: dsa: add new DSA switch driver for the SMSC-LAN9303

The SMSC/Microchip LAN9303 is an ethernet switch device with one CPU port
and two external ethernet ports with built-in phys.

This driver uses the DSA framework, but is currently only capable of
separating the two external ports. There is no offload support yet.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: dsa: add support for the SMSC-LAN9303 tagging format
Juergen Beisert [Tue, 18 Apr 2017 08:48:24 +0000 (10:48 +0200)]
net: dsa: add support for the SMSC-LAN9303 tagging format

To define the outgoing port and to discover the incoming port a regular
VLAN tag is used by the LAN9303. But its VID meaning is 'special'.

This tag handler/filter depends on some hardware features which must be
enabled in the device to provide and make use of this special VLAN tag
to control the destination and the source of an ethernet packet.

Signed-off-by: Juergen Borleis <jbe@pengutronix.de>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet/mlx4: suppress 'may be used uninitialized' warning
Greg Thelen [Tue, 18 Apr 2017 06:21:35 +0000 (23:21 -0700)]
net/mlx4: suppress 'may be used uninitialized' warning

gcc 4.8.4 complains that mlx4_SW2HW_MPT_wrapper() uses an uninitialized
'mpt' variable:
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_SW2HW_MPT_wrapper':
  drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2802:12: warning: 'mpt' may be used uninitialized in this function [-Wmaybe-uninitialized]
     mpt->mtt = mtt;

I think this warning is a false complaint.  mpt is only used when
mr_res_start_move_to() return zero, and in all such cases it initializes
mpt.  But apparently gcc cannot see that.

Initialize mpt to avoid the warning.

Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Thu, 20 Apr 2017 14:35:33 +0000 (10:35 -0400)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

A function in kernel/bpf/syscall.c which got a bug fix in 'net'
was moved to kernel/bpf/verifier.c in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosh_eth: unmap DMA buffers when freeing rings
Sergei Shtylyov [Mon, 17 Apr 2017 12:55:22 +0000 (15:55 +0300)]
sh_eth: unmap DMA buffers when freeing rings

The DMA API debugging (when enabled) causes:

WARNING: CPU: 0 PID: 1445 at lib/dma-debug.c:519 add_dma_entry+0xe0/0x12c
DMA-API: exceeded 7 overlapping mappings of cacheline 0x01b2974d

to be  printed after repeated initialization of the Ether device, e.g.
suspend/resume or 'ifconfig' up/down. This is because DMA buffers mapped
using dma_map_single() in sh_eth_ring_format() and sh_eth_start_xmit() are
never unmapped. Resolve this problem by unmapping the buffers when freeing
the descriptor  rings;  in order  to do it right, we'd have to add an extra
parameter to sh_eth_txfree() (we rename this function to sh_eth_tx_free(),
while at it).

Based on the commit a47b70ea86bd ("ravb: unmap descriptors when freeing
rings").

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Tue, 18 Apr 2017 20:56:51 +0000 (13:56 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "Two Sparc bug fixes from Daniel Jordan and Nitin Gupta"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix hugepage page table free
  sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL

7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 18 Apr 2017 20:24:42 +0000 (13:24 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) BPF tail call handling bug fixes from Daniel Borkmann.

 2) Fix allowance of too many rx queues in sfc driver, from Bert
    Kenward.

 3) Non-loopback ipv6 packets claiming src of ::1 should be dropped,
    from Florian Westphal.

 4) Statistics requests on KSZ9031 can crash, fix from Grygorii
    Strashko.

 5) TX ring handling fixes in mediatek driver, from Sean Wang.

 6) ip_ra_control can deadlock, fix lock acquisition ordering to fix,
    from Cong WANG.

 7) Fix use after free in ip_recv_error(), from Willem de Buijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  bpf: fix checking xdp_adjust_head on tail calls
  bpf: fix cb access in socket filter programs on tail calls
  ipv6: drop non loopback packets claiming to originate from ::1
  net: ethernet: mediatek: fix inconsistency of port number carried in TXD
  net: ethernet: mediatek: fix inconsistency between TXD and the used buffer
  net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
  net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
  net: thunderx: Fix set_max_bgx_per_node for 81xx rgx
  net-timestamp: avoid use-after-free in ip_recv_error
  ipv4: fix a deadlock in ip_ra_control
  sfc: limit the number of receive queues

7 years agosparc64: Fix hugepage page table free
Nitin Gupta [Mon, 17 Apr 2017 22:46:41 +0000 (15:46 -0700)]
sparc64: Fix hugepage page table free

Make sure the start adderess is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL
Daniel Jordan [Mon, 10 Apr 2017 15:50:52 +0000 (11:50 -0400)]
sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL

CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.

A 4.10 kernel fails to boot with the console output

    Kernel: Using 8 locked TLB entries for main kernel image.
    hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
    Program terminated

with these config options

    CONFIG_LOCKDEP=y
    CONFIG_LOCK_STAT=y
    CONFIG_PROVE_LOCKING=n

To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.

Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'.  When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.

Fixes: e6b5f1be7afe ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc")
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: cx89x0: move attribute declaration before struct keyword
Stefan Agner [Mon, 17 Apr 2017 20:54:34 +0000 (13:54 -0700)]
net: cx89x0: move attribute declaration before struct keyword

The attribute declaration is typically before the definition. Move
the __maybe_unused attribute declaration before the struct keyword.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobe2net: VxLAN offload should be re-enabled when only 1 UDP port is left
Sriharsha Basavapatna [Mon, 17 Apr 2017 16:03:13 +0000 (21:33 +0530)]
be2net: VxLAN offload should be re-enabled when only 1 UDP port is left

We disable VxLAN offload when more than 1 UDP port is added to the driver,
since Skyhawk doesn't support offload with multiple ports. The existing
driver design expects the user to delete all port configurations and create
a configuration with a single UDP port for VxLAN offload to be re-enabled.
Remove this restriction by tracking the ports added and re-enabling offload
when ports get deleted and only 1 port is left.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agodrivers: net: xgene-v2: Extend ethtool statistics
Iyappan Subramanian [Mon, 17 Apr 2017 23:47:55 +0000 (16:47 -0700)]
drivers: net: xgene-v2: Extend ethtool statistics

This patch adds extended statistics reporting to ethtool.

In summary, this patch,

   - adds ethtool.h with the statistics register definitions
   - adds 'struct xge_gstrings_extd_stats' to gather extended stats
   - modifies xge_get_strings(), get_sset_count() and
     get_ethtool_stats() accordingly
   - moves 'struct xge_gstrings_stats' to ethtool.h

Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'ftgmac100-batch5-features'
David S. Miller [Tue, 18 Apr 2017 18:11:10 +0000 (14:11 -0400)]
Merge branch 'ftgmac100-batch5-features'

Benjamin Herrenschmidt says:

====================
ftgmac100: Rework batch 5 - Features

This is the third spin of the fifth and last batch of
updates to the ftgmac100 driver.

This contains a few additional "features" such as:

 - Support for ethtool n-way reset
 - Multicast filtering & promisc support
 - Vlan offload
 - netpoll

And a couple of misc bits. This also adds the device-tree binding
documentation.

v2. - Addresses review comments and adds a new patch fixing a
      theorical ordering issue in my new NAPI poll implementation
    - Add a bug fix (Patch 8/9) for a potential ordering issue
      in the new NAPI poll code.

v3. - Rebase on net-next (fix conflict with an unrelated #include
      change series)
    - Update DT bindings better describing accepted phy-mode values
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Document device-tree binding
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:06 +0000 (08:37 +1000)]
ftgmac100: Document device-tree binding

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Fix potential ordering issue in NAPI poll
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:05 +0000 (08:37 +1000)]
ftgmac100: Fix potential ordering issue in NAPI poll

We need to ensure the loads from the descriptor are done after the
MMIO store clearing the interrupts has completed, otherwise we
might still miss work.

A read back from the MMIO register will "push" the posted store and
ioread32 has a barrier on weakly aordered architectures that will
order subsequent accesses.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Display the discovered PHY device info
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:04 +0000 (08:37 +1000)]
ftgmac100: Display the discovered PHY device info

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Allow configuration of phy interface via device-tree
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:03 +0000 (08:37 +1000)]
ftgmac100: Allow configuration of phy interface via device-tree

This uses the standard phy-mode property

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Add netpoll support
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:02 +0000 (08:37 +1000)]
ftgmac100: Add netpoll support

Just call the interrupt handler with interrupts locally disabled

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Add vlan HW offload
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:01 +0000 (08:37 +1000)]
ftgmac100: Add vlan HW offload

The chip supports HW vlan tag insertion and extraction. Add support
for it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Add ndo_set_rx_mode() and support for multicast & promisc
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:37:00 +0000 (08:37 +1000)]
ftgmac100: Add ndo_set_rx_mode() and support for multicast & promisc

This adds the ndo_set_rx_mode() callback to configure the
multicast filters, promisc and allmulti options.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Add pause frames configuration and support
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:36:59 +0000 (08:36 +1000)]
ftgmac100: Add pause frames configuration and support

Hopefully my understanding of how the hardware works is correct,
as the documentation isn't completely clear. So far I have seen
no obvious issue. Pause seem to also work with NC-SI.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoftgmac100: Add ethtool n-way reset call
Benjamin Herrenschmidt [Mon, 17 Apr 2017 22:36:58 +0000 (08:36 +1000)]
ftgmac100: Add ethtool n-way reset call

A non-wired up implementation accidentally made its way in
a previous patch (Make ring sizes configurable via ethtool).

This removes it and wires up the generic phy_ethtool_nway_reset
instead.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'marvell-static-code-analysis'
David S. Miller [Tue, 18 Apr 2017 17:55:11 +0000 (13:55 -0400)]
Merge branch 'marvell-static-code-analysis'

Markus Elfring says:

====================
Ethernet-Marvell: Fine-tuning for several function implementations

Several update suggestions were taken into account
from static source code analysis.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosky2: Use seq_puts() in sky2_debug_show()
Markus Elfring [Mon, 17 Apr 2017 14:15:12 +0000 (16:15 +0200)]
sky2: Use seq_puts() in sky2_debug_show()

A string which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function "seq_puts".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoskge: Adjust a null pointer check in skge_down()
Markus Elfring [Mon, 17 Apr 2017 14:08:39 +0000 (16:08 +0200)]
skge: Adjust a null pointer check in skge_down()

The script "checkpatch.pl" pointed information out like the following.

Comparison to NULL could be written "!skge->mem".

Thus fix the affected source code place.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoskge: Use seq_puts() in skge_debug_show()
Markus Elfring [Mon, 17 Apr 2017 13:43:08 +0000 (15:43 +0200)]
skge: Use seq_puts() in skge_debug_show()

A string which did not contain a data format specification should be put
into a sequence. Thus use the corresponding function "seq_puts".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: pxa168_eth: Adjust four checks for null pointers
Markus Elfring [Mon, 17 Apr 2017 13:23:45 +0000 (15:23 +0200)]
net: pxa168_eth: Adjust four checks for null pointers

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The script “checkpatch.pl” pointed information out like the following.

Comparison to NULL could be written …

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: pxa168_eth: Use kcalloc() in two functions
Markus Elfring [Mon, 17 Apr 2017 12:32:14 +0000 (14:32 +0200)]
net: pxa168_eth: Use kcalloc() in two functions

Multiplications for the size determination of memory allocations
indicated that array data structures should be processed.
Thus use the corresponding function "kcalloc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Adjust a null pointer check in mvpp2_egress_enable()
Markus Elfring [Mon, 17 Apr 2017 12:07:52 +0000 (14:07 +0200)]
net: mvpp2: Adjust a null pointer check in mvpp2_egress_enable()

The script "checkpatch.pl" pointed information out like the following.

Comparison to NULL could be written "txq->descs".

Thus fix the affected source code place.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Rename a jump label in mvpp2_prs_vlan_add()
Markus Elfring [Mon, 17 Apr 2017 11:50:35 +0000 (13:50 +0200)]
net: mvpp2: Rename a jump label in mvpp2_prs_vlan_add()

Adjust jump labels according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Rename a jump label in mvpp2_prs_double_vlan_add()
Markus Elfring [Mon, 17 Apr 2017 11:03:49 +0000 (13:03 +0200)]
net: mvpp2: Rename a jump label in mvpp2_prs_double_vlan_add()

Adjust jump labels according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Rename a jump label in mvpp2_txq_init()
Markus Elfring [Mon, 17 Apr 2017 10:58:33 +0000 (12:58 +0200)]
net: mvpp2: Rename a jump label in mvpp2_txq_init()

Adjust jump labels according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Rename a jump label in mvpp2_tx_frag_process()
Markus Elfring [Mon, 17 Apr 2017 09:36:34 +0000 (11:36 +0200)]
net: mvpp2: Rename a jump label in mvpp2_tx_frag_process()

Adjust jump labels according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Adjust three error messages
Markus Elfring [Mon, 17 Apr 2017 09:20:41 +0000 (11:20 +0200)]
net: mvpp2: Adjust three error messages

Use the word "failed" in the string for three function calls.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Rename a jump label in two functions
Markus Elfring [Mon, 17 Apr 2017 09:10:47 +0000 (11:10 +0200)]
net: mvpp2: Rename a jump label in two functions

Adjust jump labels according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Fix a jump label position in mvpp2_rx()
Markus Elfring [Mon, 17 Apr 2017 08:52:02 +0000 (10:52 +0200)]
net: mvpp2: Fix a jump label position in mvpp2_rx()

The script "checkpatch.pl" pointed out that labels should not be indented.
Thus delete two horizontal tabs before the jump label "err_drop_frame"
in the function "mvpp2_rx".

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve a size determination in two functions
Markus Elfring [Mon, 17 Apr 2017 08:40:32 +0000 (10:40 +0200)]
net: mvpp2: Improve a size determination in two functions

Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve 27 size determinations
Markus Elfring [Mon, 17 Apr 2017 08:30:29 +0000 (10:30 +0200)]
net: mvpp2: Improve 27 size determinations

Replace the specification of data structures by references to
a local variable as the parameter for the operator "sizeof"
to make the corresponding size determination a bit safer.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve another size determination in mvpp2_prs_default_init()
Markus Elfring [Mon, 17 Apr 2017 07:12:34 +0000 (09:12 +0200)]
net: mvpp2: Improve another size determination in mvpp2_prs_default_init()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve another size determination in mvpp2_bm_init()
Markus Elfring [Mon, 17 Apr 2017 07:06:33 +0000 (09:06 +0200)]
net: mvpp2: Improve another size determination in mvpp2_bm_init()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve another size determination in mvpp2_port_probe()
Markus Elfring [Mon, 17 Apr 2017 06:55:42 +0000 (08:55 +0200)]
net: mvpp2: Improve another size determination in mvpp2_port_probe()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve another size determination in mvpp2_init()
Markus Elfring [Mon, 17 Apr 2017 06:48:23 +0000 (08:48 +0200)]
net: mvpp2: Improve another size determination in mvpp2_init()

Replace the specification of a data structure by a pointer dereference
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Improve two size determinations in mvpp2_probe()
Markus Elfring [Mon, 17 Apr 2017 06:38:32 +0000 (08:38 +0200)]
net: mvpp2: Improve two size determinations in mvpp2_probe()

Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvpp2: Use kmalloc_array() in mvpp2_txq_init()
Markus Elfring [Mon, 17 Apr 2017 06:09:07 +0000 (08:09 +0200)]
net: mvpp2: Use kmalloc_array() in mvpp2_txq_init()

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: Adjust six checks for null pointers
Markus Elfring [Sun, 16 Apr 2017 20:45:33 +0000 (22:45 +0200)]
net: mvneta: Adjust six checks for null pointers

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The script “checkpatch.pl” pointed information out like the following.

Comparison to NULL could be written …

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: Use kmalloc_array() in mvneta_txq_init()
Markus Elfring [Sun, 16 Apr 2017 20:11:22 +0000 (22:11 +0200)]
net: mvneta: Use kmalloc_array() in mvneta_txq_init()

A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: Improve two size determinations in mvneta_init()
Markus Elfring [Sun, 16 Apr 2017 19:45:38 +0000 (21:45 +0200)]
net: mvneta: Improve two size determinations in mvneta_init()

Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: mvneta: Use devm_kmalloc_array() in mvneta_init()
Markus Elfring [Sun, 16 Apr 2017 19:23:19 +0000 (21:23 +0200)]
net: mvneta: Use devm_kmalloc_array() in mvneta_init()

* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "devm_kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data type by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agorhashtable: remove insecure_elasticity
Florian Westphal [Sun, 16 Apr 2017 00:55:09 +0000 (02:55 +0200)]
rhashtable: remove insecure_elasticity

commit 83e7e4ce9e93c3 ("mac80211: Use rhltable instead of rhashtable")
removed the last user that made use of 'insecure_elasticity' parameter,
i.e. the default of 16 is used everywhere.

Replace it with a constant.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'sctp-dup-stream-reconf-events'
David S. Miller [Tue, 18 Apr 2017 17:39:51 +0000 (13:39 -0400)]
Merge branch 'sctp-dup-stream-reconf-events'

Xin Long says:

====================
sctp: add proper process for duplicated stream reconf requests

Now sctp stream reconf will process a request again even if it's seqno
is less than asoc->strreset_inseq. It may cause a replay attack.

This patchset is to avoid it by add proper process for all duplicated
stream reconf requests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: process duplicated strreset asoc request correctly
Xin Long [Sat, 15 Apr 2017 14:00:29 +0000 (22:00 +0800)]
sctp: process duplicated strreset asoc request correctly

This patch is to fix the replay attack issue for strreset asoc requests.

When a duplicated strreset asoc request is received, reply it with bad
seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the
result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.

But note that if the result saved in asoc is performed, the sender's next
tsn and receiver's next tsn for the response chunk should be set. It's
safe to get them from asoc. Because if it's changed, which means the peer
has received the response already, the new response with wrong tsn won't
be accepted by peer.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: process duplicated strreset in and addstrm in requests correctly
Xin Long [Sat, 15 Apr 2017 14:00:28 +0000 (22:00 +0800)]
sctp: process duplicated strreset in and addstrm in requests correctly

This patch is to fix the replay attack issue for strreset and addstrm in
requests.

When a duplicated strreset in or addstrm in request is received, reply it
with bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with
the result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.

For strreset in or addstrm in request, if the receiver side processes it
successfully, a strreset out or addstrm out request(as a response for that
request) will be sent back to peer. reconf_time will retransmit the out
request even if it's lost.

So when receiving a duplicated strreset in or addstrm in request and it's
result was performed, it shouldn't reply this request, but drop it instead.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: process duplicated strreset out and addstrm out requests correctly
Xin Long [Sat, 15 Apr 2017 14:00:27 +0000 (22:00 +0800)]
sctp: process duplicated strreset out and addstrm out requests correctly

Now sctp stream reconf will process a request again even if it's seqno is
less than asoc->strreset_inseq.

If one request has been done successfully and some data chunks have been
accepted and then a duplicated strreset out request comes, the streamin's
ssn will be cleared. It will cause that stream will never receive chunks
any more because of unsynchronized ssn. It allows a replay attack.

A similar issue also exists when processing addstrm out requests. It will
cause more extra streams being added.

This patch is to fix it by saving the last 2 results into asoc. When a
duplicated strreset out or addstrm out request is received, reply it with
bad seqno if it's seqno < asoc->strreset_inseq - 2, and reply it with the
result saved in asoc if it's seqno >= asoc->strreset_inseq - 2.

Note that it saves last 2 results instead of only last 1 result, because
two requests can be sent together in one chunk.

And note that when receiving a duplicated request, the receiver side will
still reply it even if the peer has received the response. It's safe, As
the response will be dropped by the peer.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rosted...
Linus Torvalds [Tue, 18 Apr 2017 17:19:47 +0000 (10:19 -0700)]
Merge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace testcase update from Steven Rostedt:
 "While testing my development branch, without the fix for the pid use
  after free bug, the selftest that Namhyung added triggers it. I
  figured it would be good to add the test for the bug after the fix,
  such that it does not exist without the fix.

  I added another patch that lets the test only test part of the pid
  filtering, and ignores the function-fork (filtering on children as
  well) if the function-fork feature does not exist. This feature is
  added by Namhyung just before he added this test. But since the test
  tests both with and without the feature, it would be good to let it
  not fail if the feature does not exist"

* tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  selftests: ftrace: Add check for function-fork before running pid filter test
  selftests: ftrace: Add a testcase for function PID filter

7 years agoselftests: ftrace: Add check for function-fork before running pid filter test
Steven Rostedt (VMware) [Tue, 18 Apr 2017 16:32:21 +0000 (12:32 -0400)]
selftests: ftrace: Add check for function-fork before running pid filter test

Have the func-filter-pid test check for the function-fork option before
testing it. It can still test the pid filtering, but will stop before
testing the function-fork option for children inheriting the pids.
This allows the test to be added before the function-fork feature, but after
a bug fix that triggers one of the bugs the test can cause.

Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agoMerge tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rosted...
Linus Torvalds [Tue, 18 Apr 2017 16:31:51 +0000 (09:31 -0700)]
Merge tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fix from Steven Rostedt:
 "Namhyung Kim discovered a use after free bug. It has to do with adding
  a pid filter to function tracing in an instance, and then freeing the
  instance"

* tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ftrace: Fix function pid filter on instances

7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Tue, 18 Apr 2017 16:03:50 +0000 (09:03 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes the following problems:

   - regression in new XTS/LRW code when used with async crypto

   - long-standing bug in ahash API when used with certain algos

   - bogus memory dereference in async algif_aead with certain algos"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: algif_aead - Fix bogus request dereference in completion function
  crypto: ahash - Fix EINPROGRESS notification callback
  crypto: lrw - Fix use-after-free on EINPROGRESS
  crypto: xts - Fix use-after-free on EINPROGRESS

7 years agoselftests: ftrace: Add a testcase for function PID filter
Namhyung Kim [Mon, 17 Apr 2017 02:44:30 +0000 (11:44 +0900)]
selftests: ftrace: Add a testcase for function PID filter

Like event pid filtering test, add function pid filtering test with the
new "function-fork" option.  It also tests it on an instance directory
so that it can verify the bug related pid filtering on instances.

Link: http://lkml.kernel.org/r/20170417024430.21194-5-namhyung@kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agonl80211: Fix enum type of variable in nl80211_put_sta_rate()
Matthias Kaehlcke [Mon, 17 Apr 2017 22:59:52 +0000 (15:59 -0700)]
nl80211: Fix enum type of variable in nl80211_put_sta_rate()

rate_flg is of type 'enum nl80211_attrs', however it is assigned with
'enum nl80211_rate_info' values. Change the type of rate_flg accordingly.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agomac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()
Matthias Kaehlcke [Mon, 17 Apr 2017 20:59:53 +0000 (13:59 -0700)]
mac80211: ibss: Fix channel type enum in ieee80211_sta_join_ibss()

cfg80211_chandef_create() expects an 'enum nl80211_channel_type' as
channel type however in ieee80211_sta_join_ibss()
NL80211_CHAN_WIDTH_20_NOHT is passed in two occasions, which is of
the enum type 'nl80211_chan_width'. Change the value to NL80211_CHAN_NO_HT
(20 MHz, non-HT channel) of the channel type enum.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agocfg80211: Fix array-bounds warning in fragment copy
Matthias Kaehlcke [Thu, 13 Apr 2017 17:05:04 +0000 (10:05 -0700)]
cfg80211: Fix array-bounds warning in fragment copy

__ieee80211_amsdu_copy_frag intentionally initializes a pointer to
array[-1] to increment it later to valid values. clang rightfully
generates an array-bounds warning on the initialization statement.

Initialize the pointer to array[0] and change the algorithm from
increment before to increment after consume.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agomac80211: keep a separate list of monitor interfaces that are up
Johannes Berg [Thu, 13 Apr 2017 11:28:18 +0000 (13:28 +0200)]
mac80211: keep a separate list of monitor interfaces that are up

In addition to keeping monitor interfaces on the regular list of
interfaces, keep those that are up and not in cooked mode on a
separate list. This saves having to iterate all interfaces when
delivering to monitor interfaces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agonl80211: add request id in scheduled scan event messages
Arend Van Spriel [Thu, 13 Apr 2017 12:06:27 +0000 (13:06 +0100)]
nl80211: add request id in scheduled scan event messages

For multi-scheduled scan support in subsequent patch a request id
will be added. This patch add this request id to the scheduled
scan event messages. For now the request id will always be zero.
With multi-scheduled scan its value will inform user-space to which
scan the event relates.

Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
7 years agoMerge branch 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Mon, 17 Apr 2017 22:06:34 +0000 (15:06 -0700)]
Merge branch 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

Pull parisc fix from Helge Deller:
 "One patch which fixes get_user() for 64-bit values on 32-bit kernels.

  Up to now we lost the upper 32-bits of the returned 64-bit value"

* 'parisc-4.11-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix get_user() for 64-bit value on 32-bit kernel

7 years agoftrace: Fix function pid filter on instances
Namhyung Kim [Mon, 17 Apr 2017 02:44:27 +0000 (11:44 +0900)]
ftrace: Fix function pid filter on instances

When function tracer has a pid filter, it adds a probe to sched_switch
to track if current task can be ignored.  The probe checks the
ftrace_ignore_pid from current tr to filter tasks.  But it misses to
delete the probe when removing an instance so that it can cause a crash
due to the invalid tr pointer (use-after-free).

This is easily reproducible with the following:

  # cd /sys/kernel/debug/tracing
  # mkdir instances/buggy
  # echo $$ > instances/buggy/set_ftrace_pid
  # rmdir instances/buggy

  ============================================================================
  BUG: KASAN: use-after-free in ftrace_filter_pid_sched_switch_probe+0x3d/0x90
  Read of size 8 by task kworker/0:1/17
  CPU: 0 PID: 17 Comm: kworker/0:1 Tainted: G    B           4.11.0-rc3  #198
  Call Trace:
   dump_stack+0x68/0x9f
   kasan_object_err+0x21/0x70
   kasan_report.part.1+0x22b/0x500
   ? ftrace_filter_pid_sched_switch_probe+0x3d/0x90
   kasan_report+0x25/0x30
   __asan_load8+0x5e/0x70
   ftrace_filter_pid_sched_switch_probe+0x3d/0x90
   ? fpid_start+0x130/0x130
   __schedule+0x571/0xce0
   ...

To fix it, use ftrace_clear_pids() to unregister the probe.  As
instance_rmdir() already updated ftrace codes, it can just free the
filter safely.

Link: http://lkml.kernel.org/r/20170417024430.21194-2-namhyung@kernel.org
Fixes: 0c8916c34203 ("tracing: Add rmdir to remove multibuffer instances")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
7 years agoMerge branch 'bpf-fixes'
David S. Miller [Mon, 17 Apr 2017 19:51:58 +0000 (15:51 -0400)]
Merge branch 'bpf-fixes'

Daniel Borkmann says:

====================
Two BPF fixes

The set fixes cb_access and xdp_adjust_head bits in struct bpf_prog,
that are used for requirement checks on the program rather than f.e.
heuristics. Thus, for tail calls, we cannot make any assumptions and
are forced to set them.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: fix checking xdp_adjust_head on tail calls
Daniel Borkmann [Mon, 17 Apr 2017 01:12:07 +0000 (03:12 +0200)]
bpf: fix checking xdp_adjust_head on tail calls

Commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
added the xdp_adjust_head bit to the BPF prog in order to tell drivers
that the program that is to be attached requires support for the XDP
bpf_xdp_adjust_head() helper such that drivers not supporting this
helper can reject the program. There are also drivers that do support
the helper, but need to check for xdp_adjust_head bit in order to move
packet metadata prepended by the firmware away for making headroom.

For these cases, the current check for xdp_adjust_head bit is insufficient
since there can be cases where the program itself does not use the
bpf_xdp_adjust_head() helper, but tail calls into another program that
uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still
set to 0. Since the first program has no control over which program it
calls into, we need to assume that bpf_xdp_adjust_head() helper is used
upon tail calls. Thus, for the very same reasons in cb_access, set the
xdp_adjust_head bit to 1 when the main program uses tail calls.

Fixes: 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: fix cb access in socket filter programs on tail calls
Daniel Borkmann [Mon, 17 Apr 2017 01:12:06 +0000 (03:12 +0200)]
bpf: fix cb access in socket filter programs on tail calls

Commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
added a fix for socket filter programs such that in i) AF_PACKET the
20 bytes of skb->cb[] area gets zeroed before use in order to not leak
data, and ii) socket filter programs attached to TCP/UDP sockets need
to save/restore these 20 bytes since they are also used by protocol
layers at that time.

The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb()
only look at the actual attached program to determine whether to zero
or save/restore the skb->cb[] parts. There can be cases where the
actual attached program does not access the skb->cb[], but the program
tail calls into another program which does access this area. In such
a case, the zero or save/restore is currently not performed.

Since the programs we tail call into are unknown at verification time
and can dynamically change, we need to assume that whenever the attached
program performs a tail call, that later programs could access the
skb->cb[], and therefore we need to always set cb_access to 1.

Fixes: ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobonding: deliver link-local packets with skb->dev set to link that packets arrived on
Chonggang Li [Sun, 16 Apr 2017 19:02:18 +0000 (12:02 -0700)]
bonding: deliver link-local packets with skb->dev set to link that packets arrived on

Bonding driver changes the skb->dev to the bonding-master before
passing the packet to stack for further processing. This, however
does not make sense for the link-local packets and it loses "the
link info" once its skb->dev is changed to bonding-master.  This
patch changes this behavior for link-local packets by not changing
the skb->dev to the bonding-master and maintaining it as it is,
i.e. the link on which the packet arrived.

Signed-off-by: Chonggang Li <chonggangli@google.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: rtnetlink: plumb extended ack to doit function
David Ahern [Sun, 16 Apr 2017 16:48:24 +0000 (09:48 -0700)]
net: rtnetlink: plumb extended ack to doit function

Add netlink_ext_ack arg to rtnl_doit_func. Pass extack arg to nlmsg_parse
for doit functions that call it directly.

This is the first step to using extended error reporting in rtnetlink.
>From here individual subsystems can be updated to set netlink_ext_ack as
needed.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: sr: fix BUG due to headroom too small after SRH push
David Lebrun [Sun, 16 Apr 2017 10:27:14 +0000 (12:27 +0200)]
ipv6: sr: fix BUG due to headroom too small after SRH push

When a locally generated packet receives an SRH with two or more segments,
the remaining headroom is too small to push an ethernet header. This patch
ensures that the headroom is large enough after SRH push.

The BUG generated the following trace.

[  192.950285] skbuff: skb_under_panic: text:ffffffff81809675 len:198 put:14 head:ffff88006f306400 data:ffff88006f3063fa tail:0xc0 end:0x2c0 dev:A-1
[  192.952456] ------------[ cut here ]------------
[  192.953218] kernel BUG at net/core/skbuff.c:105!
[  192.953411] invalid opcode: 0000 [#1] PREEMPT SMP
[  192.953411] Modules linked in:
[  192.953411] CPU: 5 PID: 3433 Comm: ping6 Not tainted 4.11.0-rc3+ #237
[  192.953411] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
[  192.953411] task: ffff88007c2d42c0 task.stack: ffffc90000ef4000
[  192.953411] RIP: 0010:skb_panic+0x61/0x70
[  192.953411] RSP: 0018:ffffc90000ef7900 EFLAGS: 00010286
[  192.953411] RAX: 0000000000000085 RBX: 00000000000086dd RCX: 0000000000000201
[  192.953411] RDX: 0000000080000201 RSI: ffffffff81d104c5 RDI: 00000000ffffffff
[  192.953411] RBP: ffffc90000ef7920 R08: 0000000000000001 R09: 0000000000000000
[  192.953411] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  192.953411] R13: ffff88007c5a4000 R14: ffff88007b363d80 R15: 00000000000000b8
[  192.953411] FS:  00007f94b558b700(0000) GS:ffff88007fd40000(0000) knlGS:0000000000000000
[  192.953411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  192.953411] CR2: 00007fff5ecd5080 CR3: 0000000074141000 CR4: 00000000001406e0
[  192.953411] Call Trace:
[  192.953411]  skb_push+0x3b/0x40
[  192.953411]  eth_header+0x25/0xc0
[  192.953411]  neigh_resolve_output+0x168/0x230
[  192.953411]  ? ip6_finish_output2+0x242/0x8f0
[  192.953411]  ip6_finish_output2+0x242/0x8f0
[  192.953411]  ? ip6_finish_output2+0x76/0x8f0
[  192.953411]  ip6_finish_output+0xa8/0x1d0
[  192.953411]  ip6_output+0x64/0x2d0
[  192.953411]  ? ip6_output+0x73/0x2d0
[  192.953411]  ? ip6_dst_check+0xb5/0xc0
[  192.953411]  ? dst_cache_per_cpu_get.isra.2+0x40/0x80
[  192.953411]  seg6_output+0xb0/0x220
[  192.953411]  lwtunnel_output+0xcf/0x210
[  192.953411]  ? lwtunnel_output+0x59/0x210
[  192.953411]  ip6_local_out+0x38/0x70
[  192.953411]  ip6_send_skb+0x2a/0xb0
[  192.953411]  ip6_push_pending_frames+0x48/0x50
[  192.953411]  rawv6_sendmsg+0xa39/0xf10
[  192.953411]  ? __lock_acquire+0x489/0x890
[  192.953411]  ? __mutex_lock+0x1fc/0x970
[  192.953411]  ? __lock_acquire+0x489/0x890
[  192.953411]  ? __mutex_lock+0x1fc/0x970
[  192.953411]  ? tty_ioctl+0x283/0xec0
[  192.953411]  inet_sendmsg+0x45/0x1d0
[  192.953411]  ? _copy_from_user+0x54/0x80
[  192.953411]  sock_sendmsg+0x33/0x40
[  192.953411]  SYSC_sendto+0xef/0x170
[  192.953411]  ? entry_SYSCALL_64_fastpath+0x5/0xc2
[  192.953411]  ? trace_hardirqs_on_caller+0x12b/0x1b0
[  192.953411]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[  192.953411]  SyS_sendto+0x9/0x10
[  192.953411]  entry_SYSCALL_64_fastpath+0x1f/0xc2
[  192.953411] RIP: 0033:0x7f94b453db33
[  192.953411] RSP: 002b:00007fff5ecd0578 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  192.953411] RAX: ffffffffffffffda RBX: 00007fff5ecd16e0 RCX: 00007f94b453db33
[  192.953411] RDX: 0000000000000040 RSI: 000055a78352e9c0 RDI: 0000000000000003
[  192.953411] RBP: 00007fff5ecd1690 R08: 000055a78352c940 R09: 000000000000001c
[  192.953411] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a783321e10
[  192.953411] R13: 000055a7839890c0 R14: 0000000000000004 R15: 0000000000000000
[  192.953411] Code: 00 00 48 89 44 24 10 8b 87 c4 00 00 00 48 89 44 24 08 48 8b 87 d8 00 00 00 48 c7 c7 90 58 d2 81 48 89 04 24 31 c0 e8 4f 70 9a ff <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 8b 97 d8 00 00
[  192.953411] RIP: skb_panic+0x61/0x70 RSP: ffffc90000ef7900
[  193.000186] ---[ end trace bd0b89fabdf2f92c ]---
[  193.000951] Kernel panic - not syncing: Fatal exception in interrupt
[  193.001137] Kernel Offset: disabled
[  193.001169] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Fixes: 19d5a26f5ef8de5dcb78799feaf404d717b1aac3 ("ipv6: sr: expand skb head only if necessary")
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agogso: Validate assumption of frag_list segementation
Ilan Tayari [Sun, 16 Apr 2017 08:00:07 +0000 (11:00 +0300)]
gso: Validate assumption of frag_list segementation

Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.

This assumption is not always correct, resulting in the following
warning message in the log:
    skb_segment: too many frags

For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.

Validate the assumption, and fall back to software GSO if it not true.

Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer")
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosctp: get list_of_streams of strreset outreq earlier
Xin Long [Sat, 15 Apr 2017 13:56:57 +0000 (21:56 +0800)]
sctp: get list_of_streams of strreset outreq earlier

Now when processing strreset out responses, it gets outreq->list_of_streams
only when result is performed. But if result is not performed, str_p will
be NULL. It will cause panic in sctp_ulpevent_make_stream_reset_event if
nums is not 0.

This patch is to fix it by getting outreq->list_of_streams earlier, and
also to improve some codes for the strreset inreq process.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoAdd uid and cookie bpf helper to cg_skb_func_proto
Chenbo Feng [Sat, 15 Apr 2017 01:25:26 +0000 (18:25 -0700)]
Add uid and cookie bpf helper to cg_skb_func_proto

BPF helper functions get_socket_cookie and get_socket_uid can be
used for network traffic classifications, among others. Expose
them also to programs of type BPF_PROG_TYPE_CGROUP_SKB. As of
commit 8f917bba0042 ("bpf: pass sk to helper functions") the
required skb->sk function is available at both cgroup bpf ingress
and egress hooks. With these two new helper, cg_skb_func_proto is
effectively the same as sk_filter_func_proto.

Change since V1:
Instead of add the helper to cg_skb_func_proto, redirect the
cg_skb_func_proto to sk_filter_func_proto since all helper function
in sk_filter_func_proto are applicable to cg_skb_func_proto now.

Signed-off-by: Chenbo Feng <fengc@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agohv_netvsc: change netvsc device default duplex to FULL
Simon Xiao [Fri, 14 Apr 2017 21:42:58 +0000 (14:42 -0700)]
hv_netvsc: change netvsc device default duplex to FULL

The netvsc device supports full duplex by default.
This warnings in log from bonding device which did not like
seeing UNKNOWN duplex.

Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonetvsc: fix RCU warning in get_stats
stephen hemminger [Fri, 14 Apr 2017 21:42:57 +0000 (14:42 -0700)]
netvsc: fix RCU warning in get_stats

The statistics functionis called with RTNL held during probe
but with RCU held during access from /proc and elsewhere.
This is safe so update the lockdep annotation.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agonet: phy: test the right variable in phy_write_mmd()
Dan Carpenter [Fri, 14 Apr 2017 19:10:41 +0000 (22:10 +0300)]
net: phy: test the right variable in phy_write_mmd()

This is a copy and paste buglet.  We meant to test for ->write_mmd but
we test for ->read_mmd.

Fixes: 1ee6b9bc6206 ("net: phy: make phy_(read|write)_mmd() generic MMD accessors")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoipv6: drop non loopback packets claiming to originate from ::1
Florian Westphal [Fri, 14 Apr 2017 18:22:43 +0000 (20:22 +0200)]
ipv6: drop non loopback packets claiming to originate from ::1

We lack a saddr check for ::1. This causes security issues e.g. with acls
permitting connections from ::1 because of assumption that these originate
from local machine.

Assuming a source address of ::1 is local seems reasonable.
RFC4291 doesn't allow such a source address either, so drop such packets.

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
David S. Miller [Mon, 17 Apr 2017 19:00:57 +0000 (15:00 -0400)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2017-04-14

Here's the main batch of Bluetooth & 802.15.4 patches for the 4.12
kernel.

 - Many fixes to 6LoWPAN, in particular for BLE
 - New CA8210 IEEE 802.15.4 device driver (accounting for most of the
   lines of code added in this pull request)
 - Added Nokia Bluetooth (UART) HCI driver
 - Some serdev & TTY changes that are dependencies for the Nokia
   driver (with acks from relevant maintainers and an agreement that
   these come through the bluetooth tree)
 - Support for new Intel Bluetooth device
 - Various other minor cleanups/fixes here and there

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'bpf-lru-perf'
David S. Miller [Mon, 17 Apr 2017 17:55:54 +0000 (13:55 -0400)]
Merge branch 'bpf-lru-perf'

Martin KaFai Lau says:

====================
bpf: LRU performance and test-program improvements

The first 4 patches make a few improvements to the LRU tests.

Patch 5/6 is to improve the performance of BPF_F_NO_COMMON_LRU map.

Patch 6/6 adds an example in using LRU map with map-in-map.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Add map-in-map LRU example
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:30 +0000 (10:30 -0700)]
bpf: lru: Add map-in-map LRU example

This patch adds a map-in-map LRU example.
If we know only a subset of cores will use the
LRU, we can allocate a common LRU list per targeting core
and store it into an array-of-hashs.

It allows using the common LRU map with map-update performance
comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
on the unused cores that we know they will never access the LRU map.

BPF_F_NO_COMMON_LRU:
> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9234314 (9.23M/s)

map-in-map LRU:
> map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
9962743 (9.96M/s)

Notes that the max_entries for the map-in-map LRU test is 1260000 which
is the max_entries for each inner LRU map.  8 processes have been
started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
used in the BPF_F_NO_COMMON_LRU test.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:29 +0000 (10:30 -0700)]
bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4

After doing map_perf_test with a much bigger
BPF_F_NO_COMMON_LRU map, the perf report shows a
lot of time spent in rotating the inactive list (i.e.
__bpf_lru_list_rotate_inactive):
> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

By inactive, it usually means the element is not in cache.  Hence,
there is a need to tune the PERCPU_NR_SCANS value.

This patch finds a better number of elements to
scan during each list rotation.  The PERCPU_NR_SCANS (which
is defined the same as PERCPU_FREE_TARGET) decreases
from 16 elements to 4 elements.  This change only
affects the BPF_F_NO_COMMON_LRU map.

The test_lru_dist does not show meaningful difference
between 16 and 4.  Our production L4 load balancer which uses
the LRU map for conntrack-ing also shows little change in cache
hit rate.  Since both benchmark and production data show no
cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4.
We can consider making it configurable if we find a usecase
later that shows another value works better and/or use
a different rotation strategy.

After this change:
> map_perf_test 32 8 10000000 10000000 |  awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

The test_lru_dist has not shown meaningful difference:
> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 40000 1
nr_misses: 67036 (Before) vs 67031 (After)

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: Allow bpf sample programs (*_user.c) to change bpf_map_def
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:28 +0000 (10:30 -0700)]
bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def

The current bpf_map_def is statically defined during compile
time.  This patch allows the *_user.c program to change it during
runtime.  It is done by adding load_bpf_file_fixup_map() which
takes a callback.  The callback will be called before creating
each map so that it has a chance to modify the bpf_map_def.

The current usecase is to change max_entries in map_perf_test.
It is interesting to test with a much bigger map size in
some cases (e.g. the following patch on bpf_lru_map.c).
However,  it is hard to find one size to fit all testing
environment.  Hence, it is handy to take the max_entries
as a cmdline arg and then configure the bpf_map_def during
runtime.

This patch adds two cmdline args.  One is to configure
the map's max_entries.  Another is to configure the max_cnt
which controls how many times a syscall is called.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agobpf: lru: Refactor LRU map tests in map_perf_test
Martin KaFai Lau [Fri, 14 Apr 2017 17:30:27 +0000 (10:30 -0700)]
bpf: lru: Refactor LRU map tests in map_perf_test

One more LRU test will be added later in this patch series.
In this patch, we first move all existing LRU map tests into
a single syscall (connect) first so that the future new
LRU test can be added without hunting another syscall.

One of the map name is also changed from percpu_lru_hash_map
to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>