git.karo-electronics.de Git - linux-beck.git/log

hv_netvsc: Add structs and handlers for VF messages

This patch adds data structures and handlers for messages related
to SRIOV Virtual Function.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rt6_probe_write_lock'

Martin KaFai Lau says:

====================
ipv6: Avoid rt6_probe() taking writer lock in the fast path

v1 -> v2:
1. Separate the code re-arrangement into another patch
2. Fix style
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Avoid rt6_probe() taking writer lock in the fast path

The patch checks neigh->nud_state before acquiring the writer lock.
Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF.

40 udpflood processes and a /64 gateway route are used.
The gateway has NUD_PERMANENT. Each of them is run for 30s.
At the end, the total number of finished sendto():

Before: 55M
After: 95M

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
CC: Julian Anastasov <ja@ssi.bg>
CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Re-arrange code in rt6_probe()

It is a prep work for the next patch to remove write_lock
from rt6_probe().

1. Reduce the number of if(neigh) check. From 4 to 1.
2. Bring the write_(un)lock() closer to the operations that the
lock is protecting.

Hopefully, the above make rt6_probe() more readable.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: convert num_grat_arp to the new bonding option API

num_grat_arp wasn't converted to the new bonding option API, so do this
now and remove the specific sysfs store option in order to use the
standard one. num_grat_arp is the same as num_unsol_na so add it as an
alias with the same option settings. An important difference is the option
name which is matched in bond_sysfs_store_option().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix auto negotiation checking for teranetics

When using fiber port, the phy cannot report it's auto negotiation state,
driver should always report auto negotiation is done when using fiber port.

Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: change prototype of lwtunnel_state_get()

It saves some lines and simplify a bit the code when the state is returning
by this function. It's also useful to handle a NULL entry.

To avoid too long lines, I've also renamed lwtunnel_state_get() and
lwtunnel_state_put() to lwtstate_get() and lwtstate_put().

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: copy lwtstate in ip6_rt_copy_init()

We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc()
use ip6_rt_copy_init() to build a dst).

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 19e42e451506 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: use lwtunnel_output6() only if flag redirect is set

This function make sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set.
The check is already done in IPv4.

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 74a0f2fe8ed5 ("ipv6: rt6_info output redirect to tunnel output")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: fix simple_return.cocci warnings

drivers/net/phy/dp83867.c:126:1-4: WARNING: end returns can be simpified
drivers/net/phy/dp83867.c:74:5-8: WARNING: end returns can be simpified if tested value is negative or 0

Simplify a trivial if-return sequence. Possibly combine with a
preceding function call.

Generated by: scripts/coccinelle/misc/simple_return.cocci

CC: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dev: Spelling fix in comments

Fix the following typo
- unchainged -> unchanged

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: Allow dereferences of PTR_TO_STACK registers

mov %rsp, %r1           ; r1 = rsp
        add $-8, %r1            ; r1 = rsp - 8
        store_q $123, -8(%rsp)  ; *(u64*)r1 = 123  <- valid
        store_q $123, (%r1)     ; *(u64*)r1 = 123  <- previously invalid
        mov $0, %r0
        exit                    ; Always need to exit

And we'd get the following error:

0: (bf) r1 = r10
1: (07) r1 += -8
2: (7a) *(u64 *)(r10 -8) = 999
3: (7a) *(u64 *)(r1 +0) = 999
R1 invalid mem access 'fp'

Unable to load program

We already know that a register is a stack address and the appropriate
offset, so we should be able to validate those references as well.

Signed-off-by: Alex Gartrell <agartrell@fb.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5e-next'

Amir Vadai says:

====================
ConnectX-4 driver update 2015-07-23

This patchset introduce some performance enhancements to the ConnectX-4 driver.
1. Improving RSS distribution, and make RSS function controlable using ethtool.
2. Make memory that is written by NIC and read by host CPU allocate in the
local NUMA to the processing CPU
3. Support tx copybreak
4. Using hardware feature called blueflame to save DMA reads when possible

Another patch by Achiad fix some cosmetic issues in the driver.

Patchset was applied and tested on top of commit 045a0fa ("ip_tunnel: Call
ip_tunnel_core_init() from inet_init()")
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Input IPSEC.SPI into the RX RSS hash function

In addition to the source/destination IP which are already hashed.
Only for unicast traffic for now.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Cosmetics: use BIT() instead of "1 <<", and others

No logical change in this commit.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: TX latency optimization to save DMA reads

A regular TX WQE execution involves two or more DMA reads -
one to fetch the WQE, and another one per WQE gather entry.

These DMA reads obviously increase the TX latency.
There are two mlx5 mechanisms to bypass these DMA reads:
1) Inline WQE
2) Blue Flame (BF)

An inline WQE contains a whole packet, thus saves the DMA read/s
of the regular WQE gather entry/s. Inline WQE support was already
added in the previous commit.

A BF WQE is written directly to the device I/O mapped memory, thus
enables saving the DMA read that fetches the WQE.

The BF WQE I/O write must be in cache line granularity, thus uses
the CPU write combining mechanism.
A BF WQE I/O write acts also as a TX doorbell for notifying the
device of new TX WQEs.
A BF WQE is written to the same I/O mapped address as the regular TX
doorbell, thus this address is being mapped twice - once by ioremap()
and once by io_mapping_map_wc().

While both mechanisms reduce the TX latency, they both consume more CPU
cycles than a regular WQE:
- A BF WQE must still be written to host memory, in addition to being
written directly to the device I/O mapped memory.
- An inline WQE involves copying the SKB data into it.

To handle this tradeoff, we introduce here a heuristic algorithm that
strives to avoid using these two mechanisms in case the TX queue is
being back-pressured by the device, and limit their usage rate otherwise.

An inline WQE will always be "Blue Flamed" (written directly to the
device I/O mapped memory) while a BF WQE may not be inlined (may contain
gather entries).

Preliminary testing using netperf UDP_RR shows that the latency goes down
from 17.5us to 16.9us, while the message rate (tested with pktgen) stays
the same.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Support TX packet copy into WQE

AKA inline WQE.
A TX latency optimization to save data gather DMA reads.
Controlled by ETHTOOL_TX_COPYBREAK.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Allocate DMA coherent memory on reader NUMA node

By affinity hints and XPS, each mlx5e channel is assigned a CPU
core.

Channel DMA coherent memory that is written by the NIC and read
by SW (e.g CQ buffer) is allocated on the NUMA node of the CPU
core assigned for the channel.

Channel DMA coherent memory that is written by SW and read by the
NIC (e.g SQ/RQ buffer) is allocated on the NUMA node of the NIC.

Doorbell record (written by SW and read by the NIC) is an
exception since it is accessed by SW more frequently.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Support ETH_RSS_HASH_XOR

The ConnectX-4 HW implements inverted XOR8.
To make it act as XOR we re-order the HW RSS indirection table.

Set XOR to be the default RSS hash function and add ethtool API to
control it.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netcp-next'

WingMan Kwok says:

====================
net: netcp: Bug fixes of CPSW statistics collection

This patch set contains bug fixes and enhencements of hw ethernet
statistics processing on TI's Keystone2 CPSW ethernet switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Adds missing statistics for K2L and K2E

This patch adds the missing statistics for the host
and slave ports of the CPSW on K2L and K2E platforms.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Fixes to CPSW statistics collection

In certain applications it's beneficial to allow the CPSW h/w
stats counters to continue to increment even while the kernel
polls them. This patch implements this behavior for both 1G
and 10G ethernet subsystem modules.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Consolidates statistics collection code

Different Keystone2 platforms have different number and
layouts of hw statistics modules. This patch consolidates
the statistics processing of different Keystone2 platforms
for easy maintenance.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Fixes error in oversized memory allocation for statistics storage

The CPSW driver keeps internally some, but not all, of
the statistics available in the hw statistics modules.  Furthermore,
some of the locations in the hw statistics modules are reserved and
contain no useful information.  Prior to this patch, the driver
allocates memory of the size of the the whole hw statistics modules,
instead of the size of statistics-entries-interested-in (i.e. et_stats),
for internal storage.  This patch fixes that.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Fixes hw statistics module base setting error

This patch fixes error in the setting of the hw statistics
module base for K2HK platform. In K2HK although there are
4 hw statistics modules, but only 2 are visible at a time.
Thus when setting up the pointers to the base of the
corresponding hw statistics modules, modules 0 and 2 should
point to one base, while modules 1 and 3 should point to the
other.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: Fixes the use of spin_lock_bh in timer function

This patch fixes a bug in which the timer routine synchronized
against the ethtool-triggered statistics updates with spin_lock_bh().
A timer function is itself a bottom-half, so this should be
spin_lock().

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Read correct FL congestion threshold for T5 and T6

VF driver was reading incorrect freelist congestion notification threshold
for FLM queues when packing is enabled for T5 and T6 adapter. Fixing it
now.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

lwtunnel: export linux/lwtunnel.h to userspace

Note also that include/linux/lwtunnel.h is not needed.

CC: Thomas Graf <tgraf@suug.ch>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Fixes: 499a24256862 ("lwtunnel: infrastructure for handling light weight tunnels like mpls")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: mdb: notify on router port add and del

Send notifications on router port add and del/expire, re-use the already
existing MDBA_ROUTER and send NEWMDB/DELMDB netlink notifications
respectively.

Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Retrieve tunnel metadata when receiving from vport-netdev

Retrieve the tunnel metadata for packets received by a net_device and
provide it to ovs_vport_receive() for flow key extraction.

[This hunk was in the GRE patch in the initial series and missed the
cut for the initial submission for merging.]

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: Change capability mask for jumbo support

JUMBO and NO_GIGABIT_HALF have the same capability masks.
Change one of them.

Signed-off-by: Harini Katakam <harinik@xilinx.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: fix compilation when vxlan is a module

With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following
compilation error:
  LD      init/built-in.o
  net/built-in.o: In function `vxlan_tnl_create':
  .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create'
  make: *** [vmlinux] Error 1

CC: Thomas Graf <tgraf@suug.ch>
Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: be more aggressive when probing alternative gateways

Currently, we do not notice if new alternative gateways
are added. We can do it by checking for present neigh
entry. Also, gateways that are currently probed (NUD_INCOMPLETE)
can be skipped from round-robin probing.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix crash over flow-based vxlan device

Similar check was added in ip_rcv but not in ipv6_rcv.

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81734e0a>] ipv6_rcv+0xfa/0x500
Call Trace:
[<ffffffff816c9786>] ? ip_rcv+0x296/0x400
[<ffffffff817732d2>] ? packet_rcv+0x52/0x410
[<ffffffff8168e99f>] __netif_receive_skb_core+0x63f/0x9a0
[<ffffffffc02b34a0>] ? br_handle_frame_finish+0x580/0x580 [bridge]
[<ffffffff8109912c>] ? update_rq_clock.part.81+0x1c/0x40
[<ffffffff8168ed18>] __netif_receive_skb+0x18/0x60
[<ffffffff8168fa1f>] process_backlog+0x9f/0x150

Fixes: ee122c79d422 (vxlan: Flow based tunneling)
Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Register link_update callback for all MoCA PHYs

Commit 8d88c6ebb34c ("net: bcmgenet: enable MoCA link state change
detection") added a fixed PHY link_update callback for MoCA PHYs when
registered using platform_data exclusively, this change is also
applicable to systems using Device Tree as their primary configuration
interface.

In order for this to work, move the link_update assignment into
bcmgenet_moca_phy_setup() where we know for sure that we are running on
a MoCA GENET instance, and do not override phydev->link since this is:

- properly taken care of by the PHY library by getting the link UP/DOWN
  interrupts
- this now runs everytime we call bcmgenet_open(), so we need to
  preserve whatever we detected before we went administratively DOWN and
  then UP
- we need to make sure that MoCA PHYs start with a link DOWN during
  probe in order to force a link transition to occur

To avoid a forward declaration, move bcmgenet_fixed_phy_link_update()
above its caller.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Remove checks on clock handles

Instead of multiplying the number of checks for IS_ERR(priv->clk),
simply NULLify the 'struct clk' pointer which is something the Linux
common clock framework perfectly deals with and does early return for
each and every single clk_* API functions.

Having every single function check for !IS_ERR(priv->clk) is both
redundant and error prone, as it turns out, we were doing it for the
main GENET clock: priv->clk, but not for the Wake-on-LAN or EEE clock,
so let's just be consistent here.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Petri Gynther <pgynther@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Wait for sub-channels to be processed during probe

The current code returns from probe without waiting for the proper handling
of subchannels that may be requested. If the netvsc driver were to be rapidly
loaded/unloaded, we can trigger a panic as the unload will be tearing
down state that may not have been fully setup yet. We fix this issue by making
sure that we return from the probe call only after ensuring that the
sub-channel offers in flight are properly handled.

Reviewed-and-tested-by: Haiyang Zhang <haiyangz@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Allow firmware flash, only if cxgb4 is the master driver

Adapter can go for a toss, if cxgb4 is loaded as slave and we try to
upgrade the firmware. So add a check for the same before flashing
firmware using ethtool.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Use proper endian type for vni in vxlan[6]_xmit_skb

Silences the following sparse warnings:
drivers/net/vxlan.c:1818:21: warning: incorrect type in assignment (different base types)
drivers/net/vxlan.c:1818:21:    expected restricted __be32 [usertype] vx_vni
drivers/net/vxlan.c:1818:21:    got unsigned int [unsigned] [usertype] vni
drivers/net/vxlan.c:2014:58: warning: incorrect type in argument 11 (different base types)
drivers/net/vxlan.c:2014:58:    expected unsigned int [unsigned] [usertype] vni
drivers/net/vxlan.c:2014:58:    got restricted __be32 [usertype] <noident>

Fixes: 614732eaa12d ("openvswitch: Use regular VXLAN net_device device")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc'

Jon Maloy says:

====================
tipc: clean up socket message reception

Despite recent improvements the message reception code in socket.c is
perceived as obscure and hard to follow, especially regarding the logics
for message rejection. With the commits in this series we try to remedy
this situation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: clean up socket layer message reception

When a message is received in a socket, one of the call chains
tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv())
or
tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv())
are followed. At each of these levels we may encounter situations
where the message may need to be rejected, or a new message
produced for transfer back to the sender. Despite recent
improvements, the current code for doing this is perceived
as awkward and hard to follow.

Leveraging the two previous commits in this series, we now
introduce a more uniform handling of such situations. We
let each of the functions in the chain itself produce/reverse
the message to be returned to the sender, but also perform the
actual forwarding. This simplifies the necessary logics within
each function.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: introduce new tipc_sk_respond() function

Currently, we use the code sequence

if (msg_reverse())
tipc_link_xmit_skb()

at numerous locations in socket.c. The preparation of arguments
for these calls, as well as the sequence itself, makes the code
unecessarily complex.

In this commit, we introduce a new function, tipc_sk_respond(),
that performs this call combination. We also replace some, but not
yet all, of these explicit call sequences with calls to the new
function. Notably, we let the function tipc_sk_proto_rcv() use
the new function to directly send out PROBE_REPLY messages,
instead of deferring this to the calling tipc_sk_rcv() function,
as we do now.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: let function tipc_msg_reverse() expand header when needed

The shortest TIPC message header, for cluster local CONNECTED messages,
is 24 bytes long. With this format, the fields "dest_node" and
"orig_node" are optimized away, since they in reality are redundant
in this particular case.

However, the absence of these fields leads to code inconsistencies
that are difficult to handle in some cases, especially when we need
to reverse or reject messages at the socket layer.

In this commit, we concentrate the handling of the absent fields
to one place, by letting the function tipc_msg_reverse() reallocate
the buffer and expand the header to 32 bytes when necessary. This
means that the socket code now can assume that the two previously
absent fields are present in the header when a message needs to be
rejected. This opens up for some further simplifications of the
socket code.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-07-23

This series contains updates to e1000e, igb, ixgbevf, i40e and i40evf.

Emil extends the reporting of the RSS key and hash table by adding support
for x550 VFs.

Jia-Ju Bai fixes a QoS issue in e1000e where the error handling lacked a
call to pm_qos_remove_request() to cleanup the QoS request made in
e1000_open().

Todd updates igb to report unsupported for ethtool coalesce settings
that are not supported.  Also updated the driver to use the ARRAY_SIZE()
macro.

Carolyn fixes and refactors the dynamic ITR code for i40e and i40evf
which would never change dynamically.  So update the switch() statement
to have a default case and switch on "new_latency_range" versus the
current ITR setting.

Shannon cleans up i40e code, where there were un-needed goto's.  Also
clean up error status messages that were causing some confusion in
PHY and FCoE setup error reports.

Mitch updates the virtual channel interface to prepare for the x722 device
and other future devices, so that the VF driver can report what its
capable of supporting to the PF driver.  Updates the i40evf driver to
handle resets like Core or EMP resets, where the device is reinitialized
and the VF will not get the same VSI.

Jesse updates the i40e and i40evf driver to use the kernel BIT() and
BIT_ULL() macros.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Fix setting a flag in br_fill_ifvlaninfo_range().

This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The
assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is
unneeded, as vinfo.flags value is overriden by the immediately following
vinfo.flags = flags | BRIDGE_VLAN_INFO_RANGE_END assignement.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: support ndo_get_phys_port_id()

Add be_get_phys_port_id() function to report physical port id. The port id
should be unique across different be2net devices in the system. We use the
chip serial number along with the physical port number for this.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: use BIT and BIT_ULL macros

Use macros for abstracting (1 << foo) to BIT(foo)
and (1ULL << foo64) to BIT_ULL(foo64) in order to match
better with kernel requirements.

NOTE: the adminq_cmd.h file was not modified on purpose because
of the dependency upon firmware for that file.

Change-ID: I73ee2e48c880d671948aad19bd53ca6b2ac558fc
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: clean up error status messages

Clean up a little confusion in reporting error status in phy and fcoe
setup error reports by separating the return status from the AQ error.

Add two decoder functions to make this easier.

Change-ID: I960bcdeef3978a15fec1cdb5eff781d5cbae42fb
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: provide correct API version to older VF drivers

This driver fully supports VF drivers using both the 1.0 and 1.1
versions of the virtual channel API. However, VF drivers using
version 1.0 get upset if we provide them with a version other than
that, and refuse to play with us.

Correct this by checking the VFs API version at the time that we
store it off, and provide the correct version number back to the VF
so we can all get along.

Change-ID: I86dfe02e67b2bef336b4b49a1bb072f3e7229abc
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: support virtual channel API version 1.1

Store off the PF's API version, then use it to determine whether or not
to send it our capabilities. Change the version checking to allow for PF
drivers with lower API versions than our current version, so we can
still talk to PF drivers over the 1.0 API.

Change-ID: I8edc55d1229c7decf0ed3f285a63032694007c2e
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: handle big resets

The most common type of reset that the VF will encounter is a PF reset
that cascades down into a VF reset for each VF. In this case, the VF
will always be assigned the same VSI and recovery is fairly simple.

However, in the case of 'bigger' resets, such as a Core or EMP reset,
when the device is reinitialized, it's probable that the VF will NOT get
the same VSI. When this happens, the VF will not be able to recover, as
it will continue to request resources for its original VSI.

Add an extra state to the admin queue state machine so that the driver
can re-request its configuration information at runtime. During reset
recovery, set this bit in the aq_required field, and fetch the (possibly
new) configuration information before attempting to bring the driver
back up. Since the driver doesn't know what kind of reset it has
encountered, this step is done even for a PF reset, but it doesn't hurt
anything - it just gets the same VSI back.

Change-ID: I915d59ffb40375215117362f4ac7a37811aba748
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: support virtual channel API 1.1

Store off the VF API version for use when figuring out the VF driver
capabilities. Add support for the VF driver handing its capabilities to
the PF driver and then use this information when sending VF resource
information back to the VF driver.

Change-ID: Ic00d0eeeb5b8118085e12f068ef857089a8f7c2d
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: add macros for virtual channel API version and device capability

Now that we've rolled the virtual channel API version to 1.1, add some
macros to test what version is being used by our partner in crime. For the
VF, add some macros to determine what our device capabilities are.

Change-ID: I79f6683d4c23bd76a8ad9fd492776fcc1208e1dc
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add VF capabilities to virtual channel interface

To prepare for the changes coming up in the X722 device and future
devices, the virtual channel interface has to change slightly. The VF
driver can now report what its capable of supporting, which then informs
the PF driver when it sends the configuration information back to the
VF.

A 1.1 VF driver on a 1.0 PF driver should not send its capabilities.
Likewise, a 1.1 PF driver controlling a 1.0 VF driver should not expect
or depend upon receiving the VF capabilities.

All other aspects of the API are unchanged.

Change-ID: I530cc55f107edd1ee8bdf95830aa90b87854058a
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Anjali Singhai <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: clean up unneeded gotos

With a little work we can clean up some unnecessary logic jumping and
drop a variable.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Cc: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: Fix and refactor dynamic ITR code

This patch changes the switch statement for dynamic interrupt throttling
and adds a default case. With this patch, we check the latency setting
instead of the current ITR settings and the included refactor improves
performance.

Without this patch, the ITR setting would never change dynamically, and
there was no default.

Change-ID: Idb5a8a14c7109ec47c90f6e94bd43baa17d7ee37
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: bump version to igb-5.3.0

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: use ARRAY_SIZE to replace calculating sizeof(a)/sizeof(a[0])

Use the ARRAY_SIZE macro rather than calculating sizeof(a)/sizeof(a[0]).
Also directly replace the code rather than using an unnecessary define.

Reported-by: Maninder Singh <maninder1.s@samsung.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: report unsupported ethtool settings in set_coalesce

There are many settings possible using ethtool -C/--coalesce, but not
all of them are supported in igb. Report failure when an unsupported
option is set.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Cleanup qos request in error handling of e1000_open

The driver lacks pm_qos_remove_request in error handling (err_req_irq) of
e1000_open, and qos request inserted by pm_qos_add_request is not removed.
This patch add pm_qos_remove_request in error handling to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: add support for reporting RSS key and hash table for X550

This patch extends the reporting of the RSS key and hash table by
adding support for X550 VFs. The difference is that X550 VFs have
their own registers for RSS key and indirection table, so there is
no need to query the PF.

The RSS key and indirection table are stored in the adapter structure
during the configuration of VFRSSRK and VFRETA which in turn can be
used in ethtool for reporting.

The logic for writing VFRETA is also changed to make sure that the
indirection table is reported correctly.

In addition this patch adds defines for the VFRETA entries and number
of VFRSSRK registers as well as some whitespace cleanups.

Reported-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ip_tunnel: Call ip_tunnel_core_init() from inet_init()

Convert the module_init() to a invocation from inet_init() since
ip_tunnel_core is part of the INET built-in.

Fixes: 3093fbe7ff4 ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't use shared bluetooth antenna in iwlwifi driver for management
    frames, from Emmanuel Grumbach.

2) Fix device ID check in ath9k driver, from Felix Fietkau.

3) Off by one in xen-netback BUG checks, from Dan Carpenter.

4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.

5) Fix races in setting peeked bit flag in SKBs during datagram
    receive.  If it's shared we have to clone it otherwise the value can
    easily be corrupted.  Fix from Herbert Xu.

6) Revert fec clock handling change, causes regressions.  From Fabio
    Estevam.

7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
    Cong.

8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
    from WANG Cong and Konstantin Khlebnikov.

9) Memory leak in act_bpf packet action, from Alexei Starovoitov.

10) ARM bpf JIT bug fixes from Nicolas Schichan.

11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
    Michael S Tsirkin.

12) Destruction of bond with different ARP header types not handled
    correctly, fix from Nikolay Aleksandrov.

13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
    regressions because the GRO packets created cannot be processed
    properly on the GSO side if we forward the frame.  From Herbert Xu.

14) TCCR update race and other fixes to ravb driver from Sergei
    Shtylyov.

15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.

16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.

17) Make sure AF_PACKET sees properly IP headers in defragmented frames
    (via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.

18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
    Westphal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
  ravb: fix ring memory allocation
  net: phy: dp83867: Fix warning check for setting the internal delay
  openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
  netlink: don't hold mutex in rcu callback when releasing mmapd ring
  ARM: net: fix vlan access instructions in ARM JIT.
  ARM: net: handle negative offsets in BPF JIT.
  ARM: net: fix condition for load_order > 0 when translating load instructions.
  tcp: suppress a division by zero warning
  drivers: net: cpsw: remove tx event processing in rx napi poll
  inet: frags: fix defragmented packet's IP header for af_packet
  net: mvneta: fix refilling for Rx DMA buffers
  stmmac: fix setting of driver data in stmmac_dvr_probe
  sched: cls_flow: fix panic on filter replace
  sched: cls_flower: fix panic on filter replace
  sched: cls_bpf: fix panic on filter replace
  net/mdio: fix mdio_bus_match for c45 PHY
  net: ratelimit warnings about dst entry refcount underflow or overflow
  caif: fix leaks and race in caif_queue_rcv_skb()
  qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
  ravb: fix race updating TCCR
  ...

ip_tunnel: Provide tunnel metadata API for CONFIG_INET=n

Account for the configuration FIB_RULES=y && INET=n as FIB_RULES can
be selected by IPV6 or DECNET without INET.

Fixes: e7030878fc84 ("fib: Add fib rule match on tunnel id")
Fixes: 3093fbe7ff4b ("route: Per route IP tunnel metadata via lightweight tunnel")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: sysctl to restrict candidate source addresses

Per RFC 6724, section 4, "Candidate Source Addresses":

    It is RECOMMENDED that the candidate source addresses be the set
    of unicast addresses assigned to the interface that will be used
    to send to the destination (the "outgoing" interface).

Add a sysctl to enable this behaviour.

Signed-off-by: Erik Kline <ek@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: support the new RTL8153 chip

Support the new USB gigabit ethernet.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference

fix for:
net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison
expression (different address spaces)

remove incorrect rcu_dereference possibly left over from
earlier revisions of the code.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bnx2x-next'

Yuval Mintz says:

====================
bnx2x: update FW, rebrand and more

This patch series does several things - it updates the bnx2x FW into
7.12.30 which both contains some small fixes as well as opening the door
for several new features for the device - mainly vxlan/geneve offloads
and vlan filtering offload.
It then adds a new Multi-function mode [BD] which requires this FW in
order to operate.

In addition, this finally rebrands the driver from a 'broadcom' driver
into a 'qlogic' driver [although it would still reside under Broadcom's
tree in the kernel].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Bump up driver version to 1.712.30

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add MFW dump support

Devices with up-to-date management FW will be able to store register dumps
on their persistent storage - in case management FW identifies a fatal
error it would gather and store such dumps, which could later be retrieved
using specific debug tools.

This patch adds the necessary part in the driver in order to make the
feature operational, as well as update users [under debug] during load
in case their device contains a dump of a previous crash.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: new Multi-function mode - BD

This adds support to a new multi-function mode, enabling driver to
initialize such devices and correctly interacting with management FW
for fully utilizing their features.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Add 84858 phy support

This adds support to a new copper phy.

Signed-off-by: Yaniv Rosner <Yaniv.Rosner@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Rebrand from 'broadcom' into 'qlogic'

bnx2x still appears as a Broadcom driver even though the devices it
utilizes belong to Qlogic for more than a year.

This patch changes the various headers and the device strings to indicate
the correct ownership of the device.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Utilize FW 7.12.30

This moves bnx2x into using 7.12.30 FW. Said firmware fixes the following:

- Packets from a VF with pvid configured which were sent with a
different vlan were transmitted instead of being discarded.

- FCoE traffic might not recover after a failue while there's traffic
to another function.

In addition, this FW opens the door for the driver to implement several
new features; Specifically, this enhances the device's support for
encapsulated packets and will allow vxlan/geneve offloads to be added in
the future, as well as vlan filtering offload.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull ARM64 fixes from Catalin Marinas:

- arm64 build fix following the move of the thread_struct to the end of
   task_struct and the asm offsets becoming too large for the AArch64
   ISA

- preparatory patch for moving irq_data struct members (applied now to
   reduce dependency for the next merging window)

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  ARM64/irq: Use access helper irq_data_get_affinity_mask()
  arm64: switch_to: calculate cpu context pointer using separate register

ARM64/irq: Use access helper irq_data_get_affinity_mask()

This is a preparatory patch for moving irq_data struct members.

Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

arm64: switch_to: calculate cpu context pointer using separate register

Commit 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
moved the thread_struct to the bottom of task_struct. As a result, the
offset is now too large to be used in an immediate add on arm64 with
some kernel configs:

arch/arm64/kernel/entry.S: Assembler messages:
arch/arm64/kernel/entry.S:588: Error: immediate out of range
arch/arm64/kernel/entry.S:597: Error: immediate out of range

This patch calculates the offset using an additional register instead of
an immediate offset.

Fixes: 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

net: track success and failure of TCP PMTU probing

Track success and failure of TCP PMTU probing.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: fix ring memory allocation

The driver is written as if it can adapt to a low memory situation  allocating
less RX  skbs and TX aligned buffers than the respective RX/TX ring sizes.  In
reality  though  the driver  would malfunction in this case. Stop being overly
smart and just fail in such situation -- this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().

We leave dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors  with zero data size which should
prevent DMA to an invalid addresses.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add debugfs entry to enable backdoor access

Add debugfs entry 'use_backdoor' to enable backdoor access to read sge
context. By default, we read sge context's via firmware. In case of FW
issues, one can enable backdoor access via debugfs to dump sge context
for debugging purpose.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: dp83867: Fix warning check for setting the internal delay

Fix warning: logical ‘or’ of collectively exhaustive tests is always true

Change the internal delay check from an 'or' condition to an 'and'
condition.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mpls: make RTA_OIF optional

If user did not specify an oif, try and get it from the via address.
If failed to get device, return with -ENODEV.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes

Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].

Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().

The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot.
Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: don't hold mutex in rcu callback when releasing mmapd ring

Kirill A. Shutemov says:

This simple test-case trigers few locking asserts in kernel:

int main(int argc, char **argv)
{
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
                .nm_block_size          = block_size,
                .nm_block_nr            = 64,
                .nm_frame_size          = 16384,
                .nm_frame_nr            = 64 * block_size / 16384,
        };
        unsigned int ring_size;
int fd;

fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
                exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
                exit(1);

ring_size = req.nm_block_nr * req.nm_block_size;
mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
return 0;
}

+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
#0:  (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
#1:  ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
#2:  (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20

CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
<IRQ>  [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
[<ffffffff81085bed>] __might_sleep+0x4d/0x90
[<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
[<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[<ffffffff817e484d>] __sk_free+0x1d/0x160
[<ffffffff817e49a9>] sk_free+0x19/0x20
[..]

Cong Wang says:

We can't hold mutex lock in a rcu callback, [..]

Thomas Graf says:

The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.

Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-filter-chaining'

Edward Cree says:

====================
sfc: support for cascaded multicast filtering

Recent versions of firmware for SFC9100 adapters add support for filter
chaining, in which packets matching multiple filters are delivered to all
filters' recipients, rather than only the highest match-priority filter as was
previously the case.
This patch series enables this feature and redesigns the filter handling code
to make use of it; in particular, subscribing to a multicast address on one
function no longer prevents traffic to that address reaching another function
which is in promiscuous or allmulti mode.
If the firmware does not support filter chaining, the driver will fall back to
the old behaviour.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: clean fallbacks between promisc/normal in efx_ef10_filter_sync_rx_mode

Separate functions for inserting individual and promisc filters; explicit
fallback logic in efx_ef10_filter_sync_rx_mode(), in order not to overload
the 'promisc' flag as also meaning "fall back to promisc".

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: support cascaded multicast filters

If the workaround to support cascaded multicast filters ("workaround_26807") is
enabled, the broadcast filter and individual multicast filters are not inserted
when in promiscuous or allmulti mode.

There is a race while inserting and removing filters when entering and leaving
promiscuous mode. When changing promiscuous state with cascaded multicast
filters, the old multicast filters are removed before inserting the new filters
to avoid duplicating packets; this can lead to dropped packets until all
filters have been inserted.

The efx_nic:mc_promisc flag is added to record the presence of a multicast
promiscuous filter; this gives a simple way to tell if the promiscuous state is
changing.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: re-factor efx_ef10_filter_sync_rx_mode()

This change is only re-factoring; there are no changes to functionality
except for a slight elaboration of an error message (on mismatch filter
insertion failure).

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Insert multicast filters as well as mismatch filters in promiscuous mode

If a function is in promiscuous mode and another function has a broadcast or
multicast filter inserted, the function in promiscuous mode won't see that
broadcast or multicast traffic.
Most notably this breaks broadcast, which means ARP doesn't work. Less
show-stoppingly, a function listening on a multicast address that's also in
promiscuous mode will not see that multicast traffic if another function is
also listening on that multicast address.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: warn if other functions have been reset by MCFW

When enabling the workaround for cascaded multicast filters, the MC
can reset other functions if they have already inserted filters.
In that case, the workaround has been enabled, but print an info
message in the log recording that other functions had to be reset.

As other functions were reset, the MC will have incremented its boot
count, so also increment the warm_boot_count on the function which
enabled the workaround, as that function won't have received an MC
reboot event and does not need to reset.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add output flag decoding to efx_mcdi_set_workaround

The initial use of this will be to check a flag reporting if an FLR was
performed on other functions when enabling cascaded multicast filters.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: cope with ENOSYS from efx_mcdi_get_workarounds()

GET_WORKAROUNDS was only introduced in May 2014, not all firmware
will have it. So call sites need to handle ENOSYS.
In this case we're probing the bug26807 workaround, which is not
implemented in any firmware that doesn't have GET_WORKAROUNDS.
So interpret ENOSYS as 'false'.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: enable cascaded multicast filters in MCFW

After creating event queue 0, check to see if the workaround is enabled,
and enable it if necessary.  This will be called during PCI probe and
also when coming back up after a reset.  The nic_data->workaround_26807
will be used in the future to control the filter insertion behaviour
based on this workaround.

Only the primary PF can enable this workaround, so tolerate an EPERM
error and continue.  Otherwise, if any step in the checking and enabling
of the workaround fails, the event queue must be removed.

We check that workaround is implemented before trying to enable it,
and store the current workaround setting before trying to change it.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: update MCDI protocol definitions

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'arm-bpf-fixes'

Nicolas Schichan says:

====================
BPF JIT fixes for ARM

These patches are fixing bugs in the ARM JIT and should probably find
their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
are now passing OK (was 54 out of 60 before).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: fix vlan access instructions in ARM JIT.

This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG
and LD_VLAN_TAG_PRESENT tests pass.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: handle negative offsets in BPF JIT.

Previously, the JIT would reject negative offsets known during code
generation and mishandle negative offsets provided at runtime.

Fix that by calling bpf_internal_load_pointer_neg_helper()
appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
the execution flow to the slow path helpers when the offset is
negative.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: net: fix condition for load_order > 0 when translating load instructions.

To check whether the load should take the fast path or not, the code
would check that (r_skb_hlen - load_order) is greater than the offset
of the access using an "Unsigned higher or same" condition. For
halfword accesses and an skb length of 1 at offset 0, that test is
valid, as we end up comparing 0xffffffff(-1) and 0, so the fast path
is taken and the filter allows the load to wrongly succeed. A similar
issue exists for word loads at offset 0 and an skb length of less than
4.

Fix that by using the condition "Signed greater than or equal"
condition for the fast path code for load orders greater than 0.

Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: suppress a division by zero warning

Andrew Morton reported following warning on one ARM build
with gcc-4.4 :

net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero

Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.

Remove the warning by using a temporary variable.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>