git.karo-electronics.de Git - linux-beck.git/log

net: thunderx: introduce a function for mailbox access

This fixes sparse message:

drivers/net/ethernet/cavium/thunder/nicvf_main.c:153:25: sparse: cast to
restricted __le64

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: fix constants

This fixes sparse messages like this:

drivers/net/ethernet/cavium/thunder/thunder_bgx.c:897:24: sparse:
constant 0x300000000000 is so big it is long

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Cleanup duplicate NODE_ID macros, add nic_get_node_id()

There are duplicate NODE_ID macro definitions. Move all of them to
nic.h for usage in nic and bgx driver and introduce nic_get_node_id()
helper function.

This patch also fixes 64bit mask which should have been ULL by
reworking the node calculation.

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/phy/amd-xgbe-phy.c
drivers/net/wireless/iwlwifi/Kconfig
include/net/mac80211.h

iwlwifi/Kconfig and mac80211.h were both trivial overlapping
changes.

The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and
the bug fix that happened on the 'net' side is already integrated
into the rest of the amd-xgbe driver.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
cxgb4/cxgb4vf: Adds support for Chelsio T6 adapter

This patch series adds the following:
Adds NIC driver support for T6 adapter
Adds vNIC driver support for T6 adapter

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.

Thanks

V2:
Fixed compilation issue, when CHELSIO_T4_FCOE is set
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Adds SRIOV driver changes for T6 adapter

Adds vnic driver register related changes for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Adds support for T6 adapter

Adds NIC driver related changes for T6 adapter. Register related
changes, MC related changes, VF related changes, doorbell related
changes, debugfs changes, etc

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add is_t6 macro and T6 register ranges

Adds new macro is_t6 and adds the register address range for T6 adapter

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Various VTI tunnel (mark handling, PMTU) bug fixes from Alexander
    Duyck and Steffen Klassert.

2) Revert ethtool PHY query change, it wasn't correct.  The PHY address
    selected by the driver running the PHY to MAC connection decides
    what PHY address GET ethtool operations return information from.

3) Fix handling of sequence number bits for encryption IV generation in
    ESP driver, from Herbert Xu.

4) UDP can return -EAGAIN when we hit a bad checksum on receive, even
    when there are other packets in the receive queue which is wrong.
    Just respect the error returned from the generic socket recv
    datagram helper.  From Eric Dumazet.

5) Fix BNA driver firmware loading on big-endian systems, from Ivan
    Vecera.

6) Fix regression in that we were inheriting the congestion control of
    the listening socket for new connections, the intended behavior
    always was to use the default in this case.  From Neal Cardwell.

7) Fix NULL deref in brcmfmac driver, from Arend van Spriel.

8) OTP parsing fix in iwlwifi from Liad Kaufman.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (26 commits)
  vti6: Add pmtu handling to vti6_xmit.
  Revert "net: core: 'ethtool' issue with querying phy settings"
  bnx2x: Move statistics implementation into semaphores
  xen: netback: read hotplug script once at start of day.
  xen: netback: fix printf format string warning
  Revert "netfilter: ensure number of counters is >0 in do_replace()"
  net: dsa: Properly propagate errors from dsa_switch_setup_one
  tcp: fix child sockets to use system default congestion control if not set
  udp: fix behavior of wrong checksums
  sfc: free multiple Rx buffers when required
  bna: fix soft lock-up during firmware initialization failure
  bna: remove unreasonable iocpf timer start
  bna: fix firmware loading on big-endian machines
  bridge: fix br_multicast_query_expired() bug
  via-rhine: Resigning as maintainer
  brcmfmac: avoid null pointer access when brcmf_msgbuf_get_pktid() fails
  mac80211: Fix mac80211.h docbook comments
  iwlwifi: nvm: fix otp parsing in 8000 hw family
  iwlwifi: pcie: fix tracking of cmd_in_flight
  ip_vti/ip6_vti: Preserve skb->mark after rcv_cb call
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull Sparc fixes from David Miller:

1) Setup the core/threads/sockets bitmaps correctly so that 'lscpus'
    and friends operate properly.  Frtom Chris Hyser.

2) The bit that normally means "Cached Virtually" on sun4v systems,
    actually changes meaning in M7 and later chips.  Fix from Khalid
    Aziz.

3) One some PCI-E systems we need to probe different OF properties to
    fill in the PCI slot information properly, from Eric Snowberg.

4) Kill an extraneous memset after kzalloc(), from Christophe Jaillet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE
  sparc64: pci slots information is not populated in sysfs
  sparc: kernel: GRPCI2: Remove a useless memset
  sparc64: Setup sysfs to mark LDOM sockets, cores and threads correctly

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fix from Michael Tsirkin:
"Last-minute virtio fix for 4.1

  This tweaks an exported user-space header to fix build breakage for
  userspace using it"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h

geneve: allow user to specify TOS info for tunnel frames

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: allow user to specify TTL for tunnel frames

Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rocker-next'

Scott Feldman says:

====================
rocker: enable by default untagged VLAN support

This patch set is a followup to Simon Horman's RFC patch:

   [PATCH/RFC net-next] rocker: by default accept untagged packets

Now, on port probe, we install untagged VLAN (vid=0) support for each port
as the default.  This is equivalent to the command:

   bridge vlan add vid 0 dev DEV self

Accepting untagged VLAN pkts is a reasonable default, but the user could
override this with:

   bridge vlan del vid 0 dev DEV self

With this, we no longer need 8021q module to install vid=0 when port interface
opens.  In fact, we don't need support for legacy VLAN ndo ops at all since
they're superseded by bridge_setlink/dellink.  So remove legacy VLAN ndo ops
support in driver.  (The legacy VLAN ndo ops are supported by bonding/team
drivers, but don't fit into the transaction model offered by switchdev, so
switching all VLAN functions to bridge_setlink/dellink switchdev support gets
us stacked driver + transaction model support).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: remove support for legacy VLAN ndo ops

Remove support for legacy ndo ops
.ndo_vlan_rx_add_vid/.ndo_vlan_rx_kill_vid.  Rocker will use
bridge_setlink/dellink exclusively for VLAN add/del operations.

The legacy ops are needed if using 8021q driver module to setup VLANs on
the port.  But an alternative exists in using bridge_setlink/delink to
setup VLANs, which doesn't depend on 8021q module.  So rocker will switch
to the newer setlink/dellink ops.  VLANs can added/delete from the port,
regardless if port is bridged or not, using the bridge commands:

bridge vlan [add|del] vid VID dev DEV self

(Yes, I agree it's confusing to use the "bridge" command to set a VLAN on a
non-bridged port).

Using setlink/dellink over legacy ops let's us handle the stacked driver
case automatically.  It's built-in.  setlink also pass additional flags
(PVID, egress untagged) that aren't available with the legacy ops.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: install/remove router MAC for untagged VLAN when joining/leaving bridge

When the port joins a bridge, the port's internal VLAN ID needs to change
to the bridge's internal VLAN ID.  Likewise, when leaving the bridge, the
internal VLAN ID reverts back the port's original internal VLAN ID.  (The
internal VLAN ID is used by device to internally mark untagged pkts with
some VLAN, which will eventually be removed on egress...think PVID).  When
the internal VLAN ID changes, we need to update the VLAN table entries and
the router MAC entries for IP/IPv6 to reflect the new internal VLAN ID.

This patch makes use of the common rocker_port_vlan_add/del functions to
make sure the tables are updated for the current internal VLAN ID.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: install untagged VLAN (vid=0) support for each port

On port probe, install by default untagged VLAN support. This is
equivalent to running the command:

bridge vlan add vid 0 dev DEV self

A user could, if they wanted, manaully removing untagged support from the
port by running the command:

bridge vlan del vid 0 dev DEV self

But installing it by default on port initialization gives the normal
expected behavior.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: cleanup vlan table on error adding vlan

Basic house keeping: If there is an error adding the router MAC for this
vlan, removing the just installed VLAN table entry to leave device in same
state as before failure.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: zero allocate ports array

When allocating the array of rocker port pointers, zero the array values so
we can test for !NULL to see if port is allocated/registered.  We'll need
this later when installing untagged VLAN support for each port, during port
probe.  It's a long story, but to install a VLAN (vid=0 for untagged, in
this case) on a port, we'll need to scan other ports to see if the VLAN
group for that VLAN has been setup.  To scan the other ports, we need to
walk the port array.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fix for net

The following patch reverts the ebtables chunk that enforces counters that was
introduced in the recently applied d26e2c9ffa38 ('Revert "netfilter: ensure
number of counters is >0 in do_replace()"') since this breaks ebtables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: Add GRO support for non hardware accelerated vlan

Currently packets with non-hardware-accelerated vlan cannot be handled
by GRO. This causes low performance for 802.1ad and stacked vlan, as their
vlan tags are currently not stripped by hardware.

This patch adds GRO support for non-hardware-accelerated vlan and
improves receive performance of them.

Test Environment:
vlan device (.1Q) on vlan device (.1ad) on ixgbe (82599)

Result:

- Before

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    60.00    5233.17

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.27     58.03      0.00     41.70      0.00

- After

$ netperf -t TCP_STREAM -H 192.168.20.2 -l 60
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    60.00    7586.85

Rx side CPU usage:
  %usr      %sys      %irq     %soft     %idle
  0.50     25.83      0.00     59.53     14.14

[ Register VLAN offloads with priority 10 -DaveM ]

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: remove unused fn to enable/disable db coalescing

Remove unused function cxgb4_enable_db_coalescing() and
cxgb4_disable_db_coalescing()

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-for-davem-2015-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

Kalle Valo says:

====================
iwlwifi:

* fix OTP parsing 8260
* fix powersave handling for 8260

brcmfmac:

* fix null pointer crash
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: remove rocker parameter from functions that have rocker_port parameter

The rocker (switch) of a rocker_port may be trivially obtained from
the latter it seems cleaner not to pass the former to a function when
the latter is being passed anyway.

rocker_port_rx_proc() is omitted from this change as it is a hot path case.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vti6: Add pmtu handling to vti6_xmit.

We currently rely on the PMTU discovery of xfrm.
However if a packet is localy sent, the PMTU mechanism
of xfrm tries to to local socket notification what
might not work for applications like ping that don't
check for this. So add pmtu handling to vti6_xmit to
report MTU changes immediately.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Alloc 4k fragment for each rx ring buffer element

The driver allocates one page for each buffer on the rx ring, which is
too much on architectures like ppc64 and can cause unexpected allocation
failures when the system is under stress. Now, we keep a memory pool
per queue, and if the architecture's PAGE_SIZE is greater than 4k, we
fragment pages and assign each 4k segment to a ring element, which
reduces the overall memory consumption on such architectures. This
helps avoiding errors like the example below:

[bnx2x_alloc_rx_sge:435(eth1)]Can't alloc sge
[c00000037ffeb900] [d000000075eddeb4] .bnx2x_alloc_rx_sge+0x44/0x200 [bnx2x]
[c00000037ffeb9b0] [d000000075ee0b34] .bnx2x_fill_frag_skb+0x1ac/0x460 [bnx2x]
[c00000037ffebac0] [d000000075ee11f0] .bnx2x_tpa_stop+0x160/0x2e8 [bnx2x]
[c00000037ffebb90] [d000000075ee1560] .bnx2x_rx_int+0x1e8/0xc30 [bnx2x]
[c00000037ffebcd0] [d000000075ee2084] .bnx2x_poll+0xdc/0x3d8 [bnx2x] (unreliable)

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Acked-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Reviewed-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: include datapath actions with sampled-packet upcall to userspace

If new optional attribute OVS_USERSPACE_ATTR_ACTIONS is added to an
OVS_ACTION_ATTR_USERSPACE action, then include the datapath actions
in the upcall.

This Directly associates the sampled packet with the path it takes
through the virtual switch. Path information currently includes mangling,
encapsulation and decapsulation actions for tunneling protocols GRE,
VXLAN, Geneve, MPLS and QinQ, but this extension requires no further
changes to accommodate datapath actions that may be added in the
future.

Adding path information enhances visibility into complex virtual
networks.

Signed-off-by: Neil McKee <neil.mckee@inmon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add priority to packet_offload objects.

When we scan a packet for GRO processing, we want to see the most
common packet types in the front of the offload_base list.

So add a priority field so we can handle this properly.

IPv4/IPv6 get the highest priority with the implicit zero priority
field.

Next comes ethernet with a priority of 10, and then we have the MPLS
types with a priority of 15.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Suggested-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: core: 'ethtool' issue with querying phy settings"

This reverts commit f96dee13b8e10f00840124255bed1d8b4c6afd6f.

It isn't right, ethtool is meant to manage one PHY instance
per netdevice at a time, and this is selected by the SET
command. Therefore by definition the GET command must only
return the settings for the configured and selected PHY.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Move statistics implementation into semaphores

Commit dff173de84958 ("bnx2x: Fix statistics locking scheme") changed the
bnx2x locking around statistics state into using a mutex - but the lock
is being accessed via a timer which is forbidden.

[If compiled with CONFIG_DEBUG_MUTEXES, logs show a warning about
accessing the mutex in interrupt context]

This moves the implementation into using a semaphore [with size '1']
instead.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen: netback: read hotplug script once at start of day.

When we come to tear things down in netback_remove() and generate the
uevent it is possible that the xenstore directory has already been
removed (details below).

In such cases netback_uevent() won't be able to read the hotplug
script and will write a xenstore error node.

A recent change to the hypervisor exposed this race such that we now
sometimes lose it (where apparently we didn't ever before).

Instead read the hotplug script configuration during setup and use it
for the lifetime of the backend device.

The apparently more obvious fix of moving the transition to
state=Closed in netback_remove() to after the uevent does not work
because it is possible that we are already in state=Closed (in
reaction to the guest having disconnected as it shutdown). Being
already in Closed means the toolstack is at liberty to start tearing
down the xenstore directories. In principal it might be possible to
arrange to unregister the device sooner (e.g on transition to Closing)
such that xenstore would still be there but this state machine is
fragile and prone to anger...

A modern Xen system only relies on the hotplug uevent for driver
domains, when the backend is in the same domain as the toolstack it
will run the necessary setup/teardown directly in the correct sequence
wrt xenstore changes.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xen: netback: fix printf format string warning

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1253:8: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
(txreq.offset&~PAGE_MASK) + txreq.size);
^

PAGE_MASK's type can vary by arch, so a cast is needed.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
----
v2: Cast to unsigned long, since PAGE_MASK can vary by arch.
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "netfilter: ensure number of counters is >0 in do_replace()"

This partially reverts commit 1086bbe97a07 ("netfilter: ensure number of
counters is >0 in do_replace()") in net/bridge/netfilter/ebtables.c.

Setting rules with ebtables does not work any more with 1086bbe97a07 place.

There is an error message and no rules set in the end.

e.g.

~# ebtables -t nat -A POSTROUTING --src 12:34:56:78:9a:bc -j DROP
Unable to update the kernel. Two possible causes:
1. Multiple ebtables programs were executing simultaneously. The ebtables
userspace tool doesn't by default support multiple ebtables programs
running

Reverting the ebtables part of 1086bbe97a07 makes this work again.

Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

include/uapi/linux/virtio_balloon.h: include linux/virtio_types.h

Fixes userspace compilation error:

error: unknown type name ‘__virtio16’
__virtio16 tag;

Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

xen-netfront: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e, func, da;
@@

-init_timer (&e);
+setup_timer (&e, func, da);
-e.data = da;
-e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE

sparc: Resolve conflict between sparc v9 and M7 on usage of bit 9 of TTE

Bit 9 of TTE is CV (Cacheable in V-cache) on sparc v9 processor while
the same bit 9 is MCDE (Memory Corruption Detection Enable) on M7
processor. This creates a conflicting usage of the same bit. Kernel
sets TTE.cv bit on all pages for sun4v architecture which works well
for sparc v9 but enables memory corruption detection on M7 processor
which is not the intent. This patch adds code to determine if kernel
is running on M7 processor and takes steps to not enable memory
corruption detection in TTE erroneously.

Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc64: pci slots information is not populated in sysfs

Add PCI slot numbers within sysfs for PCIe hardware. Larger
PCIe systems with nested PCI bridges and slots further
down on these bridges were not being populated within sysfs.
This will add ACPI style PCI slot numbers for these systems
since the OF 'slot-names' information is not available on
all PCIe platforms.

Signed-off-by: Eric Snowberg <eric.snowberg@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sparc: kernel: GRPCI2: Remove a useless memset

grpci2priv is allocated using kzalloc, so there is no need to memset it.

Signed-off-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Properly propagate errors from dsa_switch_setup_one

While shuffling some code around, dsa_switch_setup_one() was introduced,
and it was modified to return either an error code using ERR_PTR() or a
NULL pointer when running out of memory or failing to setup a switch.

This is a problem for its caler: dsa_switch_setup() which uses IS_ERR()
and expects to find an error code, not a NULL pointer, so we still try
to proceed with dsa_switch_setup() and operate on invalid memory
addresses. This can be easily reproduced by having e.g: the bcm_sf2
driver built-in, but having no such switch, such that drv->setup will
fail.

Fix this by using PTR_ERR() consistently which is both more informative
and avoids for the caller to use IS_ERR_OR_NULL().

Fixes: df197195a5248 ("net: dsa: split dsa_switch_setup into two functions")
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix child sockets to use system default congestion control if not set

Linux 3.17 and earlier are explicitly engineered so that if the app
doesn't specifically request a CC module on a listener before the SYN
arrives, then the child gets the system default CC when the connection
is established. See tcp_init_congestion_control() in 3.17 or earlier,
which says "if no choice made yet assign the current value set as
default". The change ("net: tcp: assign tcp cong_ops when tcp sk is
created") altered these semantics, so that children got their parent
listener's congestion control even if the system default had changed
after the listener was created.

This commit returns to those original semantics from 3.17 and earlier,
since they are the original semantics from 2007 in 4d4d3d1e8 ("[TCP]:
Congestion control initialization."), and some Linux congestion
control workflows depend on that.

In summary, if a listener socket specifically sets TCP_CONGESTION to
"x", or the route locks the CC module to "x", then the child gets
"x". Otherwise the child gets current system default from
net.ipv4.tcp_congestion_control. That's the behavior in 3.17 and
earlier, and this commit restores that.

Fixes: 55d8694fa82c ("net: tcp: assign tcp cong_ops when tcp sk is created")
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rds-next'

Sowmini Varadhan says:

====================
net/rds: SOL_RDS socket option to explicitly select transport

Today the underlying transport (TCP or IB) for a PF_RDS socket is
implicitly selected based on the local address used to bind(2) the
PF_RDS socket. This results in some non-deterministic behavior when
there are un-numbered and IPoIB interfaces sharing the same IP address.
It also places the constraint that the IB interface must have an IP
address (and thus, IPoIB) configured on it.

The non-determinism may be avoided by providing the user-space application
a socket option that allows it to explicitly select the transport
prior to bind(2).

Patch 1 of this series provides the constant definitions needed by
the application via <linux/rds.h>.

Patch 2 provides the setsockopt support, and Patch 3 provides the
getsockopt support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds Add getsockopt support for SO_RDS_TRANSPORT

The currently attached transport for a PF_RDS socket may be obtained
from user space by invoking getsockopt(2) using the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer optval returned will be one
of the RDS_TRANS_* constants defined in linux/rds.h.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Add setsockopt support for SO_RDS_TRANSPORT

An application may deterministically attach the underlying transport for
a PF_RDS socket by invoking setsockopt(2) with the SO_RDS_TRANSPORT
option at the SOL_RDS level. The integer argument to setsockopt must be
one of the RDS_TRANS_* transport types, e.g., RDS_TRANS_TCP. The option
must be specified before invoking bind(2) on the socket, and may only
be used once on the socket. An attempt to set the option on a bound
socket, or to invoke the option after a successful SO_RDS_TRANSPORT
attachment, will return EOPNOTSUPP.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/rds: Declare SO_RDS_TRANSPORT and RDS_TRANS_* constants in uapi/linux/rds.h

User space applications that desire to explicitly select the
underlying transport for a PF_RDS socket may do so by using the
SO_RDS_TRANSPORT socket option at the SOL_RDS level before bind().
The integer argument provided to the socket option would be one
of the RDS_TRANS_* values, e.g., RDS_TRANS_TCP. This commit exports
the constant values need by such applications via <linux/rds.h>

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet/intel: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e1, e2, e3, e4, a, b;
@@

-init_timer(&e1);
+setup_timer(&e1, a, b);

... when != a = e2
when != b = e3

-e1.function = a;
... when != b = e4
-e1.data = b;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: misc core cleanup

Besides others, move bpf_tail_call_proto to the remaining definitions
of other protos, improve comments a bit (i.e. remove some obvious ones,
where the code is already self-documenting, add objectives for others),
simplify bpf_prog_array_compatible() a bit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ebpf: allow bpf_ktime_get_ns_proto also for networking

As this is already exported from tracing side via commit d9847d310ab4
("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might
as well want to move it to the core, so also networking users can make
use of it, e.g. to measure diffs for certain flows from ingress/egress.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: fix behavior of wrong checksums

We have two problems in UDP stack related to bogus checksums :

1) We return -EAGAIN to application even if receive queue is not empty.
This breaks applications using edge trigger epoll()

2) Under UDP flood, we can loop forever without yielding to other
processes, potentially hanging the host, especially on non SMP.

This patch is an attempt to make things better.

We might in the future add extra support for rt applications
wanting to better control time spent doing a recv() in a hostile
environment. For example we could validate checksums before queuing
packets in socket receive queue.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/capi: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e1, e2, e3, e4, a, b;
@@

-init_timer(&e1);
+setup_timer(&e1, a, b);

... when != a = e2
when != b = e3

-e1.data = b;
... when != a = e4
-e1.function = a;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dl2k: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e1, e2, e3, e4, a, b;
@@

-init_timer(&e1);
+setup_timer(&e1, a, b);

... when != a = e2
when != b = e3

-e1.data = b;
... when != a = e4
-e1.function = a;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mv643xx_eth: Use setup_timer

Use the timer API function setup_timer instead of structure field
assignments to initialize a timer.

A simplified version of the Coccinelle semantic patch that performs
this transformation is as follows:

@change@
expression e, func, da;
@@

-init_timer (&e);
+setup_timer (&e, func, da);
-e.data = da;
-e.function = func;

Signed-off-by: Vaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.1-rc6

sfc: free multiple Rx buffers when required

When Rx packet data must be dropped, all the buffers
associated with that Rx packet must be freed. Extend
and rename efx_free_rx_buffer() to efx_free_rx_buffers()
and loop through all the fragments.
By doing so this patch fixes a possible memory leak.

Signed-off-by: Shradha Shah <sshah@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-next-for-davem-2015-05-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
As we get closer to the merge window, here are a few
more things for -next:
* disconnect TDLS stations on CSA to avoid issues
* fix a memory leak introduced in a recent commit
* switch rfkill and cfg80211 to PM ops
* in an unlikely scenario, prevent a bookkeeping
value to get corrupted leading to dropped packets
* fix a crash in VLAN assignment
* switch rfkill-gpio to more modern gpiod API
* send disconnected event to userspace with proper
local/remote indication
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs fix from Al Viro:
"Off-by-one in d_walk()/__dentry_kill() race fix.

  It's very hard to hit; possible in the same conditions as the original
  bug, except that you need the skipped branch to contain all the
  remaining evictables, so that the d_walk()-calling loop in
  d_invalidate() decides there's nothing more to do and doesn't go for
  another pass - otherwise that next pass will sweep the sucker.

  So it's not too urgent, but seeing that the fix is obvious and the
  original commit has spread into all -stable branches..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  d_walk() might skip too much

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Three fixes this time around:

   - fix a memory leak which occurs when probing performance monitoring
     unit interrupts

   - fix handling of non-PMD aligned end of RAM causing boot failures

   - fix missing syscall trace exit path with syscall tracing enabled
     causing a kernel oops in the audit code"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8357/1: perf: fix memory leak when probing PMU PPIs
  ARM: fix missing syscall trace exit
  ARM: 8356/1: mm: handle non-pmd-aligned end of RAM

Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux

Pull MIPS fixes from Ralf Baechle:
"MIPS fixes for 4.1 all across the tree"

* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux:
  MIPS: strnlen_user.S: Fix a CPU_DADDI_WORKAROUNDS regression
  MIPS: BMIPS: Fix bmips_wr_vec()
  MIPS: ath79: fix build problem if CONFIG_BLK_DEV_INITRD is not set
  MIPS: Fuloong 2E: Replace CONFIG_USB_ISP1760_HCD by CONFIG_USB_ISP1760
  MIPS: irq: Use DECLARE_BITMAP
  ttyFDC: Fix to use native endian MMIO reads
  MIPS: Fix CDMM to use native endian MMIO reads

Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat tool fixes from Len Brown:
"Just one minor kernel dependency in this batch -- added a #define to
  msr-index.h"

* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: update version number to 4.7
  tools/power turbostat: allow running without cpu0
  tools/power turbostat: correctly decode of ENERGY_PERFORMANCE_BIAS
  tools/power turbostat: enable turbostat to support Knights Landing (KNL)
  tools/power turbostat: correctly display more than 2 threads/core

Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
"These are mostly minor fixes, with the exception of the following that
  address fall-out from recent v4.1-rc1 changes:

   - regression fix related to the big fabric API registration changes
     and configfs_depend_item() usage, that required cherry-picking one
     of HCH's patches from for-next to address the issue for v4.1 code.

   - remaining TCM-USER -v2 related changes to enforce full CDB
     passthrough from Andy + Ilias.

  Also included is a target_core_pscsi driver fix from Andy that
  addresses a long standing issue with a Scsi_Host reference being
  leaked on PSCSI device shutdown"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iser-target: Fix error path in isert_create_pi_ctx()
  target: Use a PASSTHROUGH flag instead of transport_types
  target: Move passthrough CDB parsing into a common function
  target/user: Only support full command pass-through
  target/user: Update example code for new ABI requirements
  target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST
  target: Fix se_tpg_tfo->tf_subsys regression + remove tf_subsystem
  target: Drop signal_pending checks after interruptible lock acquire
  target: Add missing parentheses
  target: Fix bidi command handling
  target/user: Disallow full passthrough (pass_level=0)
  ISCSI: fix minor memory leak

Merge tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Some late hwmon patches, all headed for -stable

   - fix sysfs attribute initialization in nct6775 and nct6683 drivers

   - do not attempt to auto-detect tmp435 on I2C address 0x37

   - ensure iio channel is of type IIO_VOLTAGE in ntc_thermistor driver"

* tag 'hwmon-for-linus-v4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (nct6683) Add missing sysfs attribute initialization
  hwmon: (nct6775) Add missing sysfs attribute initialization
  hwmon: (tmp401) Do not auto-detect chip on I2C address 0x37
  hwmon: (ntc_thermistor) Ensure iio channel is of type IIO_VOLTAGE

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- checkpatch fixes
- code cleanup
- debugfs component is now compiled only if DEBUG_FS is selected
- update copyright years
- disable by default not-so-user-safe features
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add missing rcu protection when releasing programs from prog_array

Normally the program attachment place (like sockets, qdiscs) takes
care of rcu protection and calls bpf_prog_put() after a grace period.
The programs stored inside prog_array may not be attached anywhere,
so prog_array needs to take care of preserving rcu protection.
Otherwise bpf_tail_call() will race with bpf_prog_put().
To solve that introduce bpf_prog_put_rcu() helper function and use
it in 3 places where unattached program can decrement refcnt:
closing program fd, deleting/replacing program in prog_array.

Fixes: 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs")
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hv_netvsc-next'

K. Y. Srinivasan says:

====================
hv_netvsc: Implement NUMA aware memory allocation

Allocate both receive buffer and send buffer from the NUMA node assigned to the
primary channel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Allocate the sendbuf in a NUMA aware way

Allocate the send buffer in a NUMA aware way.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Allocate the receive buffer from the correct NUMA node

Allocate the receive bufer from the NUMA node assigned to the primary
channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netevent: remove automatic variable in register_netevent_notifier()

Remove automatic variable 'err' in register_netevent_notifier() and
return the result of atomic_notifier_chain_register() directly.

Signed-off-by: Wang Long <long.wanglong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, they are:

1) default CONFIG_NETFILTER_INGRESS to y for easier compile-testing of all
   options.

2) Allow to bind a table to net_device. This introduces the internal
   NFT_AF_NEEDS_DEV flag to perform a mandatory check for this binding.
   This is required by the next patch.

3) Add the 'netdev' table family, this new table allows you to create ingress
   filter basechains. This provides access to the existing nf_tables features
   from ingress.

4) Kill unused argument from compat_find_calc_{match,target} in ip_tables
   and ip6_tables, from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'systemport-next'

Florian Fainelli says:

====================
net: systemport: misc improvements

These patches are highly inspired by changes from Petri on bcmgenet, last patch
is a misc fix that I had pending for a while, but is not a candidate for 'net'
at this point.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: Add a check for oversized packets

Occasionnaly we may get oversized packets from the hardware which exceed
the nomimal 2KiB buffer size we allocate SKBs with. Add an early check
which drops the packet to avoid invoking skb_over_panic() and move on to
processing the next packet.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: rewrite bcm_sysport_rx_refill

Currently, bcm_sysport_desc_rx() calls bcm_sysport_rx_refill() at the end of Rx
packet processing loop, after the current Rx packet has already been passed to
napi_gro_receive(). However, bcm_sysport_rx_refill() might fail to allocate a new
Rx skb, thus leaving a hole on the Rx queue where no valid Rx buffer exists.

To eliminate this situation:

1. Rewrite bcm_sysport_rx_refill() to retain the current Rx skb on the
Rx queue if a new replacement Rx skb can't be allocated and DMA-mapped.
In this case, the data on the current Rx skb is effectively dropped.

2. Modify bcm_sysport_desc_rx() to call bcm_sysport_rx_refill() at the
top of Rx packet processing loop, so that the new replacement Rx skb is
already in place before the current Rx skb is processed.

This is loosely inspired from d6707bec5986 ("net: bcmgenet: rewrite
bcmgenet_rx_refill()")

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: Pre-calculate and utilize cb->bd_addr

There is a 1:1 mapping between the software maintained control block in
priv->rx_cbs and the buffer address in priv->rx_bds, such that there is
no need to keep computing the buffer address when refiling a control
block.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: drop unneeded goto

Delete jump to a label on the next line, when that label is not
used elsewhere.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier l;
@@

-if (...) goto l;
-l:
// </smpl>

Also remove the unnecessary ret variable.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bna-fixes'

Ivan Vecera says:

====================
bna: misc bugfixes

These patches fix several bugs found during device initialization debugging.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bna: fix soft lock-up during firmware initialization failure

Bug in the driver initialization causes soft-lockup if firmware
initialization timeout is reached. Polling function bfa_ioc_poll_fwinit()
incorrectly calls bfa_nw_iocpf_timeout() when the timeout is reached.
The problem is that bfa_nw_iocpf_timeout() calls again
bfa_ioc_poll_fwinit()... etc. The bfa_ioc_poll_fwinit() should directly
send timeout event for iocpf and the same should be done if firmware
download into HW fails.

Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bna: remove unreasonable iocpf timer start

Driver starts iocpf timer prior bnad_ioceth_enable() call and this is
unreasonable. This piece of code probably originates from Brocade/Qlogic
out-of-box driver during initial import into upstream. This driver uses
only one timer and queue to implement multiple timers and this timer is
started at this place. The upstream driver uses multiple timers instead
of this.

Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bna: fix firmware loading on big-endian machines

Firmware required by bna is stored in appropriate files as sequence
of LE32 integers. After loading by request_firmware() they need to be
byte-swapped on big-endian arches. Without this conversion the NIC
is unusable on big-endian machines.

Cc: Rasesh Mody <rasesh.mody@qlogic.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: add 64-bit dependency

The thunderx ethernet driver fails to build on architectures
that do not have an atomic readq() and writeq() function for
64-bit PCI bus access:

drivers/net/ethernet/cavium/thunder/thunder_bgx.c: In function 'bgx_reg_read':
include/asm-generic/io.h:195:23: error: implicit declaration of function 'readq' [-Werror=implicit-function-declaration]

It seems impossible to get this driver to work on most 32-bit
hardware, so it's better to add an explicit dependency, in
order to let us keep building 'allmodconfig' kernels on
all architectures.

As the driver is meant for the internal hardware on an arm64 SoC, this
is not a problem for usability. Allowing the build on all 64-bit
architectures rather than just CONFIG_ARM64 on the other hand means that
we get the benefit of build testing on x86.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mac80211-for-davem-2015-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
This just has a single docbook build fix. In my confusion
I'd already sent the same fix for -next, but Ben Hutchings
noted it's necessary in 4.1.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next'

Or Gerlitz says:

====================
mlx4 driver update, May 28, 2015

The 1st patch fixes an issue with a function running DPDK overriding
broadcast steering rules set by other functions. Please add this one
to your -stable queue.

The rest of the series from Matan and Ido deals with scaling the number
of IRQs that serve RoCE applications to be in par with the Ethernet driver.

changes from V0:
- addressed feedback from Sergei, removed extra blank line in patch #4
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Make sure there are no pending async events when freeing CQ

When freeing a CQ, we need to make sure there are no
asynchronous events (on the ASYNC EQ) that could
relate to this CQ before freeing it.

This is done by introducing synchronize_irq.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Move affinity hints to mlx4_core ownership

Now that EQs management is in the sole responsibility of mlx4_core,
the IRQ affinity hints configuration should be in its hands as well.
request_irq is called only once by the first consumer (maybe mlx4_ib),
so mlx4_en passes the affinity mask too late. We also need to request
vectors according to the cores we want to run on.

mlx4_core distribution of IRQs to cores is straight forward,
EQ(i)->IRQ will set affinity hint to core i.
Consumers need to request EQ vectors, according to their cores
considerations (NUMA).

Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Add EQ pool

Previously, mlx4_en allocated EQs and used them exclusively.
This affected RoCE performance, as applications which are
events sensitive were limited to use only the legacy EQs.

Change that by introducing an EQ pool. This pool is managed
by mlx4_core. EQs are assigned to ports (when there are limited
number of EQs, multiple ports could be assigned to the same EQs).

An exception to this rule is the ASYNC EQ which handles various events.

Legacy EQs are completely removed as all EQs could be shared.

When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for
EQ serving on a specific port. The core driver calculates which
EQ should be assigned to that request.

Because IRQs are shared between IB and Ethernet modules, their
names only include the PCI device BDF address.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Demote simple multicast and broadcast flow steering rules

In SRIOV, when simple (i.e - Ethernet L2 only) flow steering rules are
created, always create them at MLX4_DOMAIN_NIC priority (instead of
the real priority the function created them at). This is done in order
to let multiple functions add broadcast/multicast rules without
affecting other functions, which is necessary for DPDK in SRIOV.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix br_multicast_query_expired() bug

br_multicast_query_expired() querier argument is a pointer to
a struct bridge_mcast_querier :

struct bridge_mcast_querier {
struct br_ip addr;
struct net_bridge_port __rcu *port;
};

Intent of the code was to clear port field, not the pointer to querier.

Fixes: 2cd4143192e8 ("bridge: memorize and export selected IGMP/MLD querier port")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Linus Lüssing <linus.luessing@c0d3.blue>
Cc: Linus Lüssing <linus.luessing@web.de>
Cc: Steinar H. Gunderson <sesse@samfundet.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-05-28

Here's a set of patches intended for 4.2. The majority of the changes
are on the 802.15.4 side of things rather than Bluetooth related:

- All sorts of cleanups & fixes to ieee802154 and related drivers
- Rework of tx power support in ieee802154 and its drivers
- Support for setting ieee802154 tx power through nl802154
- New IDs for the btusb driver
- Various cleanups & smaller fixes to btusb
- New btrtl driver for Realtec devices
- Fix suspend/resume for Realtek devices

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

iser-target: Fix error path in isert_create_pi_ctx()

We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed
in the function. That means the cleanup path should use the local
pi_ctx variable, not desc->pi_ctx.

This was detected by Coverity (CID 1260062).

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Use a PASSTHROUGH flag instead of transport_types

It seems like we only care if a transport is passthrough or not. Convert
transport_type to a flags field and replace TRANSPORT_PLUGIN_* with a
flag, TRANSPORT_FLAG_PASSTHROUGH.

Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target: Move passthrough CDB parsing into a common function

Aside from whether they handle BIDI ops or not, parsing of the CDB by
kernel and user SCSI passthrough modules should be identical. Move this
into a new passthrough_parse_cdb() and call it from tcm-pscsi and tcm-user.

Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target/user: Only support full command pass-through

After much discussion, give up on only passing a subset of SCSI commands
to userspace and pass them all. Based on what pscsi is doing, make sure
to set SCF_SCSI_DATA_CDB for I/O ops, and define attributes identical to
pscsi.

Make hw_block_size configurable via dev param.

Remove mention of command filtering from tcmu-design.txt.

Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target/user: Update example code for new ABI requirements

We now require that the userspace handler set a bit if the command is not
handled.

Update calls to tcmu_hdr_get_op for v2.

Signed-off-by: Andy Grover <agrover@redhat.com>
Reviewed-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

target/pscsi: Don't leak scsi_host if hba is VIRTUAL_HOST

See https://bugzilla.redhat.com/show_bug.cgi?id=1025672

We need to put() the reference to the scsi host that we got in
pscsi_configure_device(). In VIRTUAL_HOST mode it is associated with
the dev_virt, not the hba_virt.

Signed-off-by: Andy Grover <agrover@redhat.com>
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge branch 'mlx5-next'

Amir Vadai says:

====================
net/mlx5: ConnectX-4 100G Ethernet driver

This patchset extends the mlx5_core driver to support Ethernet
functionality. The Ethernet functionality in the mlx5 driver is
integrated into the core driver and not as separated driver. The
IB functionality remains in the mlx5_ib driver as before.

This functionality will enable the Ethernet capability of Mellanox's new
famility of cards - ConnectX-4. Due to the fact that backword
compatability is being kept, existing Connect-IB cards that are using
this driver are fully working with the modified driver, and no issues
with current deployments should be seen.

Like the ConnectX-3 cards, ConnectX-4 is a VPI (Virtual Port Interface -
every port can be configured as Infiniband or Ethernet) card.
Unlike previous generations, the ConnectX-4 has a separate PCI function
per port.

The current code has a limitation that Infiniband and Ethernet port types
are mutually exclusive. When the driver is compiled with Ethernet
support, the Infiniband functionality is disabled and vice versa. To
control that we added the CONFIG_MLX5_CORE_EN config directive
which is 'n' by default, but can be changed by the user.

This limitation is short-lived and would be addressed soon.

As part of this patchset, mlx5_ifc.h was heavily modified [1]. This file
is now generated automatically from the device specification document.
Since this patch is too big for the mail server, it might be missing in
the mailing list, but could be pulled from an external git repository [2].

irq name selection is done at driver initialization and doesn't contain the
interface name as part of the irq name.
irq_balancer will still work thanks to an improvement introduced by Neil Horman
[3] to use sysfs instead of /proc/interrupts.

Patchset was applied on top of commit ed2dfd9 ("tcp/dccp: warn user for
preferred ip_local_port_range")

[1] - Patch 4/11 ("net/mlx5_core: HW data structs/types definitions preparation for mlx5 ehternet driver")
[2] - http://git.openfabrics.org/?p=~amirv/linux.git;a=shortlog;h=refs/heads/mlx5e_v1
[3] - kernel: da8d1c8 PCI/sysfs: add per pci device msi[x] irq listing (v5)
      irq_balancer: 32a7757 Complete rework of how we detect and classify irqs

Thanks to Achiad, Saeed, Yevheny, Or and the whole team for making this happen,
Amir

Changes from V4:
- Removed Patch 3/12: net/mlx5_core: Add EQ renaming mechanism
- Patch 12/12: net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality
  - irq name is created on driver initialization, therefore it won't contain
    the network interface name in it. This won't effect irq_balancer thanks to
    patches introduced by Neil Horman to use sysfs instead of /proc/interrupts.

Changes from V3:
- PATCH 8/11: net/mlx5_core: Set/Query port MTU commands
  - Return value directly - no need for err.

Changes from V2:
- Improved changelogs and cover-letter
- Added CONFIG_MLX5_EN to disable/enable the Ethernet functionality
- Moved en.h and wq.[ch] into the patch with data-path related code

Changes from V1:
- Added patch 1/12 ("net/mlx5_core,mlx5_ib: Do not use vmap() on coherent
  memory")

Changes from V0:
- Removed V0 Patch 1/11 ("net/mlx5_core: Virtually extend work/completion queue
  buffers by one page") due to misuse of DMA API. Thanks Dave.
- Patch 1/11 ("net/mlx5_core: Set irq affinity hints"):
  - Use kcalloc instead of kzalloc
  - Fix build error when CONFIG_CPUMASK_OFFSTACK=n. Driver loading will fail
    now if cpumask allocation is failing.
  - Using dev_to_node helper. Thanks, Ido.
- Patch 3/11 ("net/mlx5_core: HW data structs/types definitions preparation for
  mlx5 ehternet driver")
  - Removed Mellanox internal comment at the head of the file. Thanks Joe
- Patch 6/11 ("net/mlx5_core: Implement get/set port status")
  - Use direct return of function's result. Thanks Sergei.
- Added Patch 8/11 ("net/mlx5_core: Set/Query port MTU commands")
- Patch 9/11 ("net/mlx5: Ethernet Datapath files")
  - Use rq->wqe_sz instead of skb_end_offset. Thanks Ido.
  - Use dma_wmb() when possible instead of wmb(). Thanks Alex.
  - Fix checkpatch issues
- Patch 10/11 ("net/mlx5: ethernet resources handling")
  - checkpatch issues
  - Added missing include
- Patch 11/11 ("net/mlx5: Ethernet driver")
  - checkpatch issues
  - fixed typo
  - Modified use of affinity hint
  - Using dev_to_node helper. Thanks, Ido.
  - Use new hardware commands from Patch 8/11 ("net/mlx5_core: Set/Query port
    MTU commands") to get/set port MTU in hardware.
  - Removed NETIF_F_SG since hardware ring wraparound is not supported
  - Use dma_wmb() when possible instead of wmb(). Thanks Alex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality

This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4
Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver
extends the existing mlx5 driver with Ethernet functionality.

This patch contains the driver entry points but does not include
transmit and receive (see the previous patch in the series) routines.

It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the
Ethernet functionality. Currently, Kconfig is programmed to make
Ethernet and Infiniband functionality mutally exclusive.
Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of
selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND
being selected.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Ethernet resource handling files

This patch contains the resource handling files:
- flow_table.c: This file contains the code to handle the low level API
to configure hardware flow table. It is separated from
the flow_table_en.c, because it will be used in the
future by Raw Ethernet QP in mlx5_ib too.
- en_flow_table.[ch]: Ethernet flow steering handling. The flow table
object contain a mapping between flow specs and TIRs.
This mechanism will be used also to configure e-switch
in the future, when SR-IOV support will be added.
- transobj.[ch] - Low level functions to create/modify/destroy the
                  transport objects: RQ/SQ/TIR/TIS
- vport.[ch] - Handle attributes of a virtual port (vPort) in the
  embedded switch. Currently this switch is a passthrough, until SR-IOV
  support will be added.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Ethernet Datapath files

en_[rt]x.c contains the data path related code specific to tx or rx.
en_txrx.c contains data path code which is common for both the rx and
tx, this is mainly napi related code.

Below are the objects that are being used by the hardware and the driver
in the data path:

Channel - one channel per IRQ. Every channel object contains:
  RQ  - describes the rx queue
  TIR - One TIR (Transport Interface Receive) object per flow type. TIR
        contains attributes for a type of rx flow (e.g IPv4, IPv6 etc).
        A flow is defined in the Flow Table.
        Currently TIR describes the RSS hash parameters if exists and LRO
        attributes.
  SQ  - describes the a tx queue. There is one SQ (Send Queue) per
        TC (traffic class).
  TIS - There is one TIS (Transport Interface Send) per TC.  It
        describes the TC and may later be extended to describe more
transport properties.

Both RQ and SQ inherit from the object WQ (work queue). This common code
to describe the layout of CQE's WQE's in memory is in the files wq.[cj]

For every channel there is one NAPI context that is used for RX and
for TX.

Driver is using netdev_alloc_skb() to allocate skb's.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Set/Query port MTU commands

Introduce set/Query low level functions to access MTU in hardware. To be
used by the netdev.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Modify CQ moderation parameters

Introduce mlx5_core_modify_cq_moderation() to be used by the netdev, to
set hardware coalescing.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Implement get/set port status

Implemet get/set port status low level functions to be exposed by the
netdev.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: Implement access functions of ptys register fields

Those registers will be used by the ethtool to set/get settings.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5_core: New device capabilities handling

- Query all supported types of dev caps on driver load.
- Store the Cap data outbox per cap type into driver private data.
- Introduce new Macros to access/dump stored caps (using the auto
generated data types).
- Obsolete SW representation of dev caps (no need for SW copy for each
cap).
- Modify IB driver to use new macros for checking caps.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>