git.karo-electronics.de Git - karo-tx-linux.git/log

ath9k: Use mutex_lock to avoid potential race in start/stop rng

Move ath9k_rng_stop/ath9k_rng_start pair into critical section,
use mutex_lock to void potential race accessing.

Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath9k: avoid potential freezing during random generator read

In the worst case, ath9k_rng_stop() may take 10s to stop rng kthread.
The time is too long for users, use wait_event_interruptible_timeout()
instead of msleep_interruptible(), wakup immediately once
kthread_should_stop() is true.

Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath9k: fix an invalid pointer dereference in ath9k_rng_stop()

The bug was triggered when do suspend/resuming continuously
on Dell XPS L322X/0PJHXN version 9333 (2013) with kernel
4.12.0-041200rc4-generic. But can't reproduce on DELL
E5440 + AR9300 PCIE chips.

The warning is caused by accessing invalid pointer sc->rng_task.
sc->rng_task is not be cleared after kthread_stop(sc->rng_task)
be called in ath9k_rng_stop(). Because the kthread is stopped
before ath9k_rng_kthread() be scheduled.

So set sc->rng_task to null after kthread_stop(sc->rng_task) to
resolve this issue.

WARNING: CPU: 0 PID: 984 at linux/kernel/kthread.c:71 kthread_stop+0xf1/0x100
CPU: 0 PID: 984 Comm: NetworkManager Not tainted 4.12.0-041200rc4-generic #201706042031
Hardware name: Dell Inc.          Dell System XPS L322X/0PJHXN, BIOS A09 05/15/2013
task: ffff950170fdda00 task.stack: ffffa22c01538000
RIP: 0010:kthread_stop+0xf1/0x100
RSP: 0018:ffffa22c0153b5b0 EFLAGS: 00010246
RAX: ffffffffa6257800 RBX: ffff950171b79560 RCX: 0000000000000000
RDX: 0000000080000000 RSI: 000000007fffffff RDI: ffff9500ac9a9680
RBP: ffffa22c0153b5c8 R08: 0000000000000000 R09: 0000000000000000
R10: ffffa22c0153b648 R11: ffff9501768004b8 R12: ffff9500ac9a9680
R13: ffff950171b79f70 R14: ffff950171b78780 R15: ffff9501749dc018
FS:  00007f0d6bfd5540(0000) GS:ffff95017f200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fc190161a08 CR3: 0000000232906000 CR4: 00000000001406f0
Call Trace:
  ath9k_rng_stop+0x1a/0x20 [ath9k]
  ath9k_stop+0x3b/0x1d0 [ath9k]
  drv_stop+0x33/0xf0 [mac80211]
  ieee80211_stop_device+0x43/0x50 [mac80211]
  ieee80211_do_stop+0x4f2/0x810 [mac80211]

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196043
Reported-by: Giulio Genovese <giulio.genovese@gmail.com>
Tested-by: Giulio Genovese <giulio.genovese@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath10k: add const to thermal_cooling_device_ops structure

Declare thermal_cooling_device_ops structure as const as it is only passed
as an argument to the function thermal_cooling_device_register and this
argument is of type const. So, declare the structure as const.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath9k: fix tx99 bus error

The hard coded register 0x9864 and 0x9924 are invalid
for ar9300 chips.

Cc: <stable@vger.kernel.org>
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

ath9k: fix tx99 use after free

One scenario that could lead to UAF is two threads writing
simultaneously to the "tx99" debug file. One of them would
set the "start" value to true and follow to ath9k_tx99_init().
Inside the function it would set the sc->tx99_state to true
after allocating sc->tx99skb. Then, the other thread would
execute write_file_tx99() and call ath9k_tx99_deinit().
sc->tx99_state would be freed. After that, the first thread
would continue inside ath9k_tx99_init() and call
r = ath9k_tx99_send(sc, sc->tx99_skb, &txctl);
that would make use of the freed sc->tx99_skb memory.

Cc: <stable@vger.kernel.org>
Signed-off-by: Miaoqing Pan <miaoqing@codeaurora.org>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>

brcmfmac: Fix a memory leak in error handling path in 'brcmf_cfg80211_attach'

If 'wiphy_new()' fails, we leak 'ops'. Add a new label in the error
handling path to free it in such a case.

Cc: stable@vger.kernel.org
Fixes: 5c22fb85102a7 ("brcmfmac: add wowl gtk rekeying offload support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

brcmfmac: fix double free upon register_netdevice() failure

The function brcmf_net_attach() can only fail when register_netdevice()
fails. When this happens register_netdevice() calls priv_destructor, ie.
brcmf_cfg80211_free_netdev() freeing the vif instance. Also upon this
failure brcmf_net_attach() calls free_netdev(). However, callers are also
doing cleanup resulting in double free. In some places they need netdev
private space as it holds parameters to communicate with the device. So
we want to do the cleanup only in callers of brcmf_net_attach() by making
the following changes:

- set priv_destructor after register_netdevice() succeeds.
- remove call to free_netdev() in brcmf_net_attach().
- call free_netdev() in brcmf_net_detach() for unregistered netdev.
- add free_netdev() if brcmf_net_attach() fails for a created interface.

Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com>
Reviewed-by: Franky Lin <franky.lin@broadcom.com>
Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>

net/mlx4: fix spelling mistake: "coalesing" -> "coalescing"

Trivial fix to spelling mistake in en_dbg debug message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-add-netlink_ext_ack-support-to-rtnl_link_ops'

Matthias Schiffer says:

====================
net: add netlink_ext_ack support to rtnl_link_ops

Same changes as http://patchwork.ozlabs.org/patch/780351/ , split into
separate patches for each rtnl_link_ops field as requested.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.slave_validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.slave_changelink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.validate

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.changelink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add netlink_ext_ack argument to rtnl_link_ops.newlink

Add support for extended error reporting.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: macb: add fixed-link node support

In case the MACB is directly connected to a
non-mdio PHY/device, it should be possible to provide
a fixed link configuration in the DT.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2017-06-25' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
wireless-drivers-next patches for 4.13

New features and bug fixes to quite a few different drivers, but
nothing really special standing out.

What makes me happy that we have now more vendors actively
contributing to upstream drivers. In this pull request we have patches
from Broadcom, Intel, Qualcomm, Realtek and Redpine Signals, and I
still have patches from Marvell and Quantenna pending in patchwork. Now
that's something comparing to how things looked 11 years ago in Jeff
Garzik's "State of the Union: Wireless" email:

https://lkml.org/lkml/2006/1/5/671

Major changes:

wil6210

* add low level RF sector interface via nl80211 vendor commands

* add module parameter ftm_mode to load separate firmware for factory
testing

* support devices with different PCIe bar size

* add support for PCIe D3hot in system suspend

* remove ioctl interface which should not be in a wireless driver

ath10k

* go back to using dma_alloc_coherent() for firmware scratch memory

* add per chain RSSI reporting

brcmfmac

* add support multi-scheduled scan

* add scheduled scan support for specified BSSIDs

* add support for brcm43430 revision 0

wlcore

* add wil1285 compatible

rsi

* add RS9113 USB support

iwlwifi

* FW API documentation improvements (for tools and htmldoc)

* continuing work for the new A000 family

* bump the maximum supported FW API to 31

* improve the differentiation between 8000, 9000 and A000 families
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp-RFC-4960-Errata-fixes'

Marcelo Ricardo Leitner says:

====================
sctp: RFC 4960 Errata fixes

This patchset contains fixes for 4 Errata topics from
https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01
Namely, sections:
3.12. Order of Adjustments of partial_bytes_acked and cwnd
3.22. Increase of partial_bytes_acked in Congestion Avoidance
3.26. CWND Increase in Congestion Avoidance Phase
3.27. Refresh of cwnd and ssthresh after Idle Period

Tests performed with netperf using net namespaces, with drop rates at
0%, 0.5% and 1% by netem, IPv4 and IPv6, 10 runs for each combination.
I couldn't spot differences on the stats. With and without these patches
the results vary in a similar way in terms of throughput and
retransmissions.

Tests with 20ms delay and 20ms delay + drops at 0.5% and 1% also had
results in a similar way, no noticeable difference.

Looking at cwnd, it was possible to notice slightly lower values being
used while still sustaining same throughput profile.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: adjust ssthresh when transport is idle

RFC 4960 Errata 3.27 identifies that ssthresh should be adjusted to cwnd
because otherwise it could cause the transport to lock into congestion
avoidance phase specially if ssthresh was previously reduced by some
packet drop, leading to poor performance.

The Errata says to adjust ssthresh to cwnd only once, though the same
goal is achieved by updating it every time we update cwnd too. The
caveat is that we could take longer to get back up to speed but that
should be compensated by the fact that we don't adjust on RTO basis (as
RFC says) but based on Heartbeats, which are usually way longer.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.27
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: adjust cwnd increase in Congestion Avoidance phase

RFC4960 Errata 3.26 identified that at the same time RFC4960 states that
cwnd should never grow more than 1*MTU per RTT, Section 7.2.2 was
underspecified and as described could allow increasing cwnd more than
that.

This patch updates it so partial_bytes_acked is maxed to cwnd if
flight_size doesn't reach cwnd, protecting it from such case.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.26
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: allow increasing cwnd regardless of ctsn moving or not

As per RFC4960 Errata 3.22, this condition is not needed anymore as it
could cause the partial_bytes_acked to not consider the TSNs acked in
the Gap Ack Blocks although they were received by the peer successfully.

This patch thus drops the check for new Cumulative TSN Ack Point,
leaving just the flight_size < cwnd one.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.22
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: update order of adjustments of partial_bytes_acked and cwnd

RFC4960 Errata 3.12 says RFC4960 is unclear about the order of
adjustments applied to partial_bytes_acked and cwnd in the congestion
avoidance phase, and that the actual order should be:
partial_bytes_acked is reset to (partial_bytes_acked - cwnd). Next, cwnd
is increased by MTU.

We were first increasing cwnd, and then subtracting the new value pba,
which leads to a different result as pba is smaller than what it should
and could cause cwnd to not grow as much.

See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-rfc4960-errata-01#section-3.12
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove ndo_dfwd_start_xmit

Looks like commit f663dd9aaf9e ("net: core: explicitly select a txq before doing l2 forwarding")
has removed the need for this dedicated xmit function [it even explicitly
states so in its commit log message] but it hasn't removed the definition
of the ndo.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
CC: Jason Wang <jasowang@redhat.com>
CC: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qcom-emac-various-minor-improvements'

Timur Tabi says:

====================
net: qcom/emac: various minor improvements

A collection of minor fixes and features to the Qualcomm Technologies
EMAC network driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: add support for emulation systems

On emulation systems, the EMAC's internal PHY ("SGMII") is not present,
but is not needed for network functionality. So just display a warning
message and ignore the SGMII.

Tested-by: Philip Elcan <pelcan@codeaurora.org>
Tested-by: Adam Wallis <awallis@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: do not reset the EMAC during initialization

On ACPI systems, the driver depends on firmware pre-initializing the
EMAC because we don't have access to the clocks, and the EMAC has specific
clock programming requirements. Therefore, we don't want to reset the
EMAC while we are completing the initialization.

Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qcom/emac: add shutdown function

The shutdown function halts all DMA and interrupts, so that all
operations are discontinued when the system shuts down, e.g. via
kexec or a forced reboot.

Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_iucv: Move sockaddr length checks to before accessing sa_family in bind and connect handlers

Verify that the caller-provided sockaddr structure is large enough to
contain the sa_family field, before accessing it in bind() and connect()
handlers of the AF_IUCV socket. Since neither syscall enforces a minimum
size of the corresponding memory region, very short sockaddrs (zero or
one byte long) result in operating on uninitialized memory while
referencing .sa_family.

Fixes: 52a82e23b9f2 ("af_iucv: Validate socket address length in iucv_sock_bind()")
Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com>
[jwi: removed unneeded null-check for addr]
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/iucv: improve endianness handling

Use proper endianness conversion for an skb protocol assignment. Given
that IUCV is only available on big endian systems (s390), this simply
avoids an endianness warning reported by sparse.

Signed-off-by: Hans Wippel <hwippel@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: fix error code in mv88e6390_serdes_power()

We're accidentally returning the wrong variable. "cmode" is
uninitialized at this point so it causes a static checker warning.

Fixes: 6335e9f2446b ("net: dsa: mv88e6xxx: mv88e6390X SERDES support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'nfp-add-flower-app-with-representors'

Simon Horman says:

====================
nfp: add flower app with representors

this series adds a flower app to the NFP driver.
It initialises four types of netdevs:

* PF netdev - lower-device for communication of packets to device
* PF representor netdev
* VF representor netdevs
* Phys port representor netdevs

The PF netdev acts as a lower-device which sends and receives packets to
and from the firmware. The representors act as upper-devices. For TX
representors attach a metadata dst to the skb which is used by the PF
netdev to prepend metadata to the packet before forwarding the firmware. On
RX the PF netdev looks up the representor based on the prepended metadata
received from the firmware and forwards the skb to the representor after
removing the metadata.

Control queues are used to send and receive control messages which are
used to communicate configuration information with the firmware. These
are in separate vNIC to the queues belonging to the PF netdev. The control
queues are not exposed to use-space via a netdev or any other means.

The first 9 patches of this series provide app-independent infrastructure
to instantiate representors and the remaining 3 patches provide an app
which uses this infrastructure.

As the name implies this app is targeted at providing offload of TC flower.
Flower offload - allowing classifiers to be attached to representor netdevs
- is intended to be provided by follow-up patches at which point it will
become the dominant feature of the app.

Minor changes since v2 noted in changelogs of individual patches.
Review of v1 and v2 of this patchset have been addressed either
through discussion on-list or changes in this patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add VF and PF representors to flower app

Initialise VF and PF representors in flower app.

Based in part on work by Benjamin LaHaise, Bert van Leeuwen and
Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add flower app

Add app for flower offload. At this point the PF netdev and phys port
representor netdevs are initialised. Follow-up work will add support for
VF and PF representors and beyond that offloading the flower classifier.

Based in part on work by Benjamin LaHaise and Bert van Leeuwen.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for control messages for flower app

In preparation for adding a new flower app - targeted at offloading
the flower classifier - provide support for control message that it will
use to communicate with the NFP.

Based in part on work by Bert van Leeuwen.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add support for tx/rx with metadata portid

Allow tx/rx with metadata port id. This will be used for tx/rx of
representor netdevs acting as upper-devices while a pf netdev acts
as a lower-device.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: provide nfp_port to of nfp_net_get_mac_addr()

Provide port rather than vNIC as parameter of nfp_net_get_mac_addr.
This is to allow this function to be used by representor netdevs where
a vNIC may have more than one physical port none of which are associated
with the vNIC.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: app callbacks for SRIOV

Add app-callbacks for app-specific initialisation of SRIOV.

Disabling SRIOV is brought forward in nfp_pci_remove()
so that nfp_app_sriov_disable is called while the app still exists.

This is intended to be used to implement representor netdevs for virtual
ports.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add stats and xmit helpers for representors

Provide helpers for stats and xmit on representor netdevs.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise and
Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: general representor implementation

Provide infrastructure to create and destroy representors of a given type.

Parts based on work by Bert van Leeuwen, Benjamin LaHaise,
and Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: map mac_stats and vf_cfg BARs

If present map mac_stats and vf_cfg BARs. These will be used by
representor netdevs to read statistics for phys port and vf representors.

Also provide defines describing the layout of the mac_stats area.
Similar defines are already present for the cf_cfg area.

Based in part on work by Jakub Kicinski.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: move physical port init into a helper

Move MAC/PHY port init into a helper to make it easier to reuse
it in the representor code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: devlink add support for getting eswitch mode

Add app callback for reporting eswitch mode. Non-SRIOV apps
should not implement this callback, nfp_app code will then
respond with -EOPNOTSUPP.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: store port/representator id in metadata_dst

Switches and modern SR-IOV enabled NICs may multiplex traffic from Port
representators and control messages over single set of hardware queues.
Control messages and muxed traffic may need ordered delivery.

Those requirements make it hard to comfortably use TC infrastructure today
unless we have a way of attaching metadata to skbs at the upper device.
Because single set of queues is used for many netdevs stopping TC/sched
queues of all of them reliably is impossible and lower device has to
retreat to returning NETDEV_TX_BUSY and usually has to take extra locks on
the fastpath.

This patch attempts to enable port/representative devs to attach metadata
to skbs which carry port id. This way representatives can be queueless and
all queuing can be performed at the lower netdev in the usual way.

Traffic arriving on the port/representative interfaces will be have
metadata attached and will subsequently be queued to the lower device for
transmission. The lower device should recognize the metadata and translate
it to HW specific format which is most likely either a special header
inserted before the network headers or descriptor/metadata fields.

Metadata is associated with the lower device by storing the netdev pointer
along with port id so that if TC decides to redirect or mirror the new
netdev will not try to interpret it.

This is mostly for SR-IOV devices since switches don't have lower netdevs
today.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy-internal'

Florian Fainelli says:

====================
net: phy: Support "internal" PHY interface

This makes the "internal" phy-mode property generally available and
documented and this allows us to remove some custom parsing code
we had for bcmgenet and bcm_sf2 which both used that specific value.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: bcm_sf2: Remove special handling of "internal" phy-mode

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bcmgenet: Remove special handling of "internal" phy-mode

The PHY library now supports an "internal" phy-mode, thus making our
custom parsing code now unnecessary.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Support "internal" PHY interface

Now that the Device Tree binding has been updated, update the PHY
library phy_interface_t and phy_modes to support the "internal" PHY
interface type.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: Add "internal" as a valid 'phy-mode' property

A number of Ethernet MACs have internal Ethernet PHYs and the internal
wiring makes it so that this knowledge needs to be available using the
standard 'phy-mode' property.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'mlx5-updates-2017-06-23' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-23

This series provides some updates to the mlx5 core and netdevice drivers.

Three patches from Tariq, Introduces page reuse mechanism in non-Striding
RQ RX datapath, we allow the the RX descriptor to reuse its allocated page
as much as it could, until the page is fully consumed. RX page reuse
reduces the stress on page allocator and improves RX performance especially
with high speeds (100Gb/s).

Next four patches of the series from Or allows to offload tc flower matching
on ttl/hoplimit and header re-write of hoplimit.

The rest of the series from Yotam and Or enhances mlx5 to support FW flashing
through the mlxfw module, in a similar manner done by the mlxsw driver.
Currently, only ethtool based flashing is implemented, where both Eth and IB ports
are supported.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Use Firmware params to get buffer-group map

Buffer group mappings can be obtained using FW_PARAMs cmd for newer FW.

Since some of the bg_maps are obtained in atomic context, created another
t4_query_params_ns(), that wont sleep when awaiting mbox cmd completion.

Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Update T6 Buffer Group and Channel Mappings

We were using t4_get_mps_bg_map() for both t4_get_port_stats()
to determine which MPS Buffer Groups to report statistics on for a given
Port, and also for t4_sge_alloc_rxq() to provide a TP Ingress Channel
Congestion Map. For T4/T5 these are actually the same values (because they
are ~somewhat~ related), but for T6 they should return different values
(T6 has Port 0 associated with MPS Buffer Group 0 (with MPS Buffer Group 1
silently cascading off) and Port 1 is associated with MPS Buffer Group 2
(with 3 cascading off)).

Based on the original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tls: return -EFAULT if copy_to_user() fails

The copy_to_user() function returns the number of bytes remaining but we
want to return -EFAULT here.

Fixes: 3c4d7559159b ("tls: kernel TLS support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2017-06-23

1) Use memdup_user to spmlify xfrm_user_policy.
   From Geliang Tang.

2) Make xfrm_dev_register static to silence a sparse warning.
   From Wei Yongjun.

3) Use crypto_memneq to check the ICV in the AH protocol.
   From Sabrina Dubroca.

4) Remove some unused variables in esp6.
   From Stephen Hemminger.

5) Extend XFRM MIGRATE to allow to change the UDP encapsulation port.
   From Antony Antony.

6) Include the UDP encapsulation port to km_migrate announcements.
   From Antony Antony.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ena-new-features-and-improvements'

Netanel Belgazal says:

====================
net: update ena ethernet driver to version 1.2.0

This patchset contains some new features/improvements that were added
to the ENA driver to increase its robustness and are based on
experience of wide ENA deployment.

Change log:

V2:
* Remove patch that add inline to C-file static function (contradict coding style).
* Remove patch that moves MTU parameter validation in ena_change_mtu() instead of
using the network stack.
* Use upper_32_bits()/lower_32_bits() instead of casting.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: update ena driver to version 1.2.0

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: update driver's rx drop statistics

rx drop counter is reported by the device in the keep-alive
event.
update the driver's counter with the device counter.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: use lower_32_bits()/upper_32_bits() to split dma address

In ena_com_mem_addr_set(), use the above functions to split dma address
to the lower 32 bits and the higher 16 bits.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: separate skb allocation to dedicated function

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: use napi_schedule_irqoff when possible

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: allow the driver to work with small number of msix vectors

Current driver tries to allocate msix vectors as the number of the
negotiated io queues. (with another msix vector for management).
If pci_alloc_irq_vectors() fails, the driver aborts the probe
and the ENA network device is never brought up.

With this patch, the driver's logic will reduce the number of IO
queues to the number of allocated msix vectors (minus one for management)
instead of failing probe().

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add support for out of order rx buffers refill

ENA driver post Rx buffers through the Rx submission queue
for the ENA device to fill them with receive packets.
Each Rx buffer is marked with req_id in the Rx descriptor.

Newer ENA devices could consume the posted Rx buffer in out of order,
and as result the corresponding Rx completion queue will have Rx
completion descriptors with non contiguous req_id(s)

In this change the driver holds two rings.
The first ring (called free_rx_ids) is a mapping ring.
It holds all the unused request ids.
The values in this ring are from 0 to ring_size -1.

When the driver wants to allocate a new Rx buffer it uses the head of
free_rx_ids and uses it's value as the index for rx_buffer_info ring.
The req_id is also written to the Rx descriptor

Upon Rx completion,
The driver took the req_id from the completion descriptor and uses it
as index in rx_buffer_info.
The req_id is then return to the free_rx_ids ring.

This patch also adds statistics to inform when the driver receive out
of range or unused req_id.

Note:
free_rx_ids is only accessible from the napi handler, so no locking is
required

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add reset reason for each device FLR

For each device reset, log to the device what is the cause
the reset occur.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: change sizeof() argument to be the type pointer

Instead of using:
memset(ptr, 0x0, sizeof(struct ...))
use:
memset(ptr, 0x0, sizeor(*ptr))

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: add hardware hints capability to the driver

With this patch, ENA device can update the ena driver about
the desired timeout values:
These values are part of the "hardware hints" which are transmitted
to the driver as Asynchronous event through ENA async
event notification queue.

In case the ENA device does not support this capability,
the driver will use its own default values.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ena: change return value for unsupported features unsupported return value

return -EOPNOTSUPP instead of -EPERM.

Signed-off-by: Netanel Belgazal <netanel@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix out-of-bounds access in ULP sysctl

KASAN reports out-of-bound access in proc_dostring() coming from
proc_tcp_available_ulp() because in case TCP ULP list is empty
the buffer allocated for the response will not have anything
printed into it. Set the first byte to zero to avoid strlen()
going out-of-bounds.

Fixes: 734942cc4ea6 ("tcp: ULP infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: possibly avoid extra masking for narrower load in verifier

Commit 31fd85816dbe ("bpf: permits narrower load from bpf program
context fields") permits narrower load for certain ctx fields.
The commit however will already generate a masking even if
the prog-specific ctx conversion produces the result with
narrower size.

For example, for __sk_buff->protocol, the ctx conversion
loads the data into register with 2-byte load.
A narrower 2-byte load should not generate masking.
For __sk_buff->vlan_present, the conversion function
set the result as either 0 or 1, essentially a byte.
The narrower 2-byte or 1-byte load should not generate masking.

To avoid unnecessary masking, prog-specific *_is_valid_access
now passes converted_op_size back to verifier, which indicates
the valid data width after perceived future conversion.
Based on this information, verifier is able to avoid
unnecessary marking.

Since we want more information back from prog-specific
*_is_valid_access checking, all of them are packed into
one data structure for more clarity.

Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: make some functions static

The functions dwmac4_dma_init_rx_chan, dwmac4_dma_init_tx_chan and
dwmac4_dma_init_channel do not need to be in global scope, so them
static.

Cleans up sparse warnings:
"symbol 'dwmac4_dma_init_rx_chan' was not declared. Should it be static?"
"symbol 'dwmac4_dma_init_tx_chan' was not declared. Should it be static?"
"symbol 'dwmac4_dma_init_channel' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'xdp-offload-mode'

Jakub Kicinski says:

====================
xdp: offload mode

While we discuss the representors.. :)

This set adds XDP flag for forcing offload and a attachment mode
for reporting to user space that program has been offloaded.  The
nfp driver is modified to make use of the new flags, but also to
adhere to the DRV_MODE flag which should disable the HW offload.

The intended driver behaviour is:
            DRV mode   offload
no flags      yes     attempted
DRV_MODE      yes        no
HW_MODE      no         yes

Where 'yes' means required, and error will be returned if setup fails.
'Attempted' means the offload will only happen automatically if HW is
capable and offloading the program will cause no change in system
behaviour (e.g. maps don't have to bound).

Thanks to loading the program both to the driver and HW by default we
can fallback to the driver mode without disruption in case user replaces
the program with one which cannot be offloaded later.

Note that the NFP driver currently claims XDP offload support but
lacks most basic features like direct packet access.

Only change compared to the RFC is fixing the double bpf_prog_put()
which Daniel has spotted (patch 5).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: xdp: report if program is offloaded

Make use of just added XDP_ATTACHED_HW.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: add reporting of offload mode

Extend the XDP_ATTACHED_* values to include offloaded mode.
Let drivers report whether program is installed in the driver
or the HW by changing the prog_attached field from bool to
u8 (type of the netlink attribute).

Exploit the fact that the value of XDP_ATTACHED_DRV is 1,
therefore since all drivers currently assign the mode with
double negation:
mode = !!xdp_prog;
no drivers have to be modified.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: add support for XDP_FLAGS_HW_MODE

Respect the XDP_FLAGS_HW_MODE. When it's set install the program
on the NIC and skip enabling XDP in the driver.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: release the reference on offloaded programs

The xdp_prog member of the adapter's data path structure is used
for XDP in driver mode.  In case a XDP program is loaded with in
HW-only mode, we need to store it somewhere else.  Add a new XDP
prog pointer in the main structure and use that when we need to
know whether any XDP program is loaded, not only a driver mode
one.  Only release our reference on adapter free instead of
immediately after netdev unregister to allow offload to be disabled
first.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: bpf: don't offload XDP programs in DRV_MODE

DRV_MODE means that user space wants the program to be run in
the driver. Do not try to offload. Only offload if no mode
flags have been specified.

Remember what the mode is when the program is installed and refuse
new setup requests if there is already a program loaded in a
different mode. This should leave it open for us to implement
simultaneous loading of two programs - one in the drv path and
another to the NIC later.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: xdp: move driver XDP setup into a separate function

In preparation of XDP offload flags move the driver setup into
a function. Otherwise the number of conditions in one function
would make it slightly hard to follow. The offload handler may
now be called with NULL prog, even if no offload is currently
active, but that's fine, offload code can handle that.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: add HW offload mode flag for installing programs

Add an installation-time flag for requesting that the program
be installed only if it can be offloaded to HW.

Internally new command for ndo_xdp is added, this way we avoid
putting checks into drivers since they all return -EINVAL on
an unknown command.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

xdp: pass XDP flags into install handlers

Pass XDP flags to the xdp ndo. This will allow drivers to look
at the mode flags and make decisions about offload.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: fix poll()

Michael reported an UDP breakage caused by the commit b65ac44674dd
("udp: try to avoid 2 cache miss on dequeue").
The function __first_packet_length() can update the checksum bits
of the pending skb, making the scratched area out-of-sync, and
setting skb->csum, if the skb was previously in need of checksum
validation.

On later recvmsg() for such skb, checksum validation will be
invoked again - due to the wrong udp_skb_csum_unnecessary()
value - and will fail, causing the valid skb to be dropped.

This change addresses the issue refreshing the scratch area in
__first_packet_length() after the possible checksum update.

Fixes: b65ac44674dd ("udp: try to avoid 2 cache miss on dequeue")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp/v6: prefetch rmem_alloc in udp6_queue_rcv_skb()

very similar to commit dd99e425be23 ("udp: prefetch
rmem_alloc in udp_queue_rcv_skb()"), this allows saving a cache
miss when the BH is bottle-neck for UDP over ipv6 packet
processing, e.g. for small packets when a single RX NIC ingress
queue is in use.

Performances under flood when multiple NIC RX queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~8% performance improvement.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-mvpp2-misc-improvements'

Thomas Petazzoni says:

====================
net: mvpp2: misc improvements

Here are a few patches making various small improvements/refactoring
in the mvpp2 driver. They are based on today's net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: remove mvpp2_pool_refill()

When all a function does is calling another function with the exact same
arguments, in the exact same order, you know it's time to remove said
function. Which is exactly what this commit does.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: remove unused mvpp2_bm_cookie_pool_set() function

This function is not used in the driver, remove it.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvpp2: add comments about smp_processor_id() usage

A previous commit modified a number of smp_processor_id() used in
migration-enabled contexts into get_cpu/put_cpu sections. However, a few
smp_processor_id() calls remain in the driver, and this commit adds
comments explaining why they can be kept.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'stmmac-pci-Refactor-DMI-probing'

Jan Kiszka says:

====================
stmmac: pci: Refactor DMI probing

Some cleanups of the way we probe DMI platforms in the driver. Reduces
a bit of open-coding and makes the logic easier reusable for any
potential DMI platform != Quark.

Tested on IOT2000 and Galileo Gen2.

Changes in v5:
- fixed a remaining issue in patch 5
- dropped patch 6 for now
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Use dmi_system_id table for retrieving PHY addresses

Avoids reimplementation of DMI matching in stmmac_pci_find_phy_addr.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Select quark_pci_dmi_data from quark_default_data

No need to carry this reference in stmmac_pci_info - the Quark-specific
setup handler knows that it needs to use the Quark-specific DMI table.
This also allows to drop the stmmac_pci_info reference from the setup
handler parameter list.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Make stmmac_pci_find_phy_addr truly generic

Move the special case for the early Galileo firmware into
quark_default_setup. This allows to use stmmac_pci_find_phy_addr for
non-quark cases.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Use stmmac_pci_info for all devices

Make stmmac_default_data compatible with stmmac_pci_info.setup and use
an info structure for all devices. This allows to make the probing more
regular.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: pci: Make stmmac_pci_info structure constant

By removing the PCI device reference from the structure and passing it
as parameters to the interested functions, we can make quark_pci_info
const.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Fix the carrier state error when data path is off

When the VF NIC is opened, the synthetic NIC's carrier state is set to
off. This tells the host to transitions data path to the VF device. But
if startup script or user manipulates the admin state of the netvsc
device directly for example:
# ifconfig eth0 down
# ifconfig eth0 up
Then the carrier state of the synthetic NIC would be on, even though the
data path was still over the VF NIC. This patch sets the carrier state
of synthetic NIC with consideration of the related VF state.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Remove unnecessary var link_state from struct netvsc_device_info

We simply use rndis_device->link_state in the netdev_dbg. The variable,
link_state from struct netvsc_device_info, is not used anywhere else.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: fix a build problem

tracex5_kern.c build failed with the following error message:
../samples/bpf/tracex5_kern.c:12:10: fatal error: 'syscall_nrs.h' file not found
#include "syscall_nrs.h"
The generated file syscall_nrs.h is put in build/samples/bpf directory,
but this directory is not in include path, hence build failed.

The fix is to add $(obj) into the clang compilation path.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rds-tcp-fixes'

Sowmini Varadhan says:

====================
rds: tcp: fixes

Patch1 is a bug fix for correct reconnect when a connection
is restarted. Patch 2 accelerates cleanup by setting linger
to 1 and sending a RST to the peer.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: set linger to 1 when unloading a rds-tcp

If we are unloading the rds_tcp module, we can set linger to 1
and drop pending packets to accelerate reconnect. The peer will
end up resetting the connection based on new generation numbers
of the new incarnation, so hanging on to unsent TCP packets via
linger is mostly pointless in this case.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rds: tcp: send handshake ping-probe from passive endpoint

The RDS handshake ping probe added by commit 5916e2c1554f
("RDS: TCP: Enable multipath RDS for TCP") is sent from rds_sendmsg()
before the first data packet is sent to a peer. If the conversation
is not bidirectional (i.e., one side is always passive and never
invokes rds_sendmsg()) and the passive side restarts its rds_tcp
module, a new HS ping probe needs to be sent, so that the number
of paths can be re-established.

This patch achieves that by sending a HS ping probe from
rds_tcp_accept_one() when c_npaths is 0 (i.e., we have not done
a handshake probe with this peer yet).

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Jenny Xu <jenny.x.xu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Correct return code checking for ibmvnic_init during probe

The update to ibmvnic_init to allow an EAGAIN return code broke
the calling of ibmvnic_init from ibmvnic_probe. The code now
will return from this point in the probe routine if anything
other than EAGAIN is returned. The check should be to see if rc
is non-zero and not equal to EAGAIN.

Without this fix, the vNIC driver can return 0 (success) from
its probe routine due to ibmvnic_init returning zero, but before
completing the probe process and registering with the netdev layer.

Fixes: 6a2fb0e99f9c (ibmvnic: driver initialization for kdump/kexec)
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ibmvnic-Correct-long-term-mapped-buffer-error-handling'

Thomas Falcon says:

====================
ibmvnic: Correct long-term-mapped buffer error handling

This patch set fixes the error-handling of long-term-mapped buffers
during adapter initialization and reset. The first patch fixes a bug
in an incorrectly defined descriptor that was keeping the return
codes from the VIO server from being properly checked. The second patch
fixes and cleans up the error-handling implementation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Fix error handling when registering long-term-mapped buffers

The patch stores the return code of the REQUEST_MAP_RSP sub-CRQ command
in the private data structure, where it can be later checked during
device open or a reset.

In the case of a reset, the mapping request to the vNIC Server may fail,
especially in the case of a partition migration. The driver attempts to
handle this by re-allocating the buffer and re-sending the mapping request.

The original error handling implementation was removed. The separate
function handling the REQUEST_MAP response message was also removed,
since it is now simple enough to be handled in the ibmvnic_handle_crq
function.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Fix incorrectly defined ibmvnic_request_map_rsp structure

This reserved area should be eight bytes in length instead of four.
As a result, the return codes in the REQUEST_MAP_RSP descriptors
were not being properly handled.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Add a tcp_filter hook before handle ack packet

Currently in both ipv4 and ipv6 code path, the ack packet received when
sk at TCP_NEW_SYN_RECV state is not filtered by socket filter or cgroup
filter since it is handled from tcp_child_process and never reaches the
tcp_filter inside tcp_v4_rcv or tcp_v6_rcv. Adding a tcp_filter hooks
here can make sure all the ingress tcp packet can be correctly filtered.

Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>