git.karo-electronics.de Git - linux-beck.git/log

ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it

eBPF defines this as BPF_TUNLEN_MAX and OVS just uses the hard-coded
value inside struct sw_flow_key. Thus, add and use IP_TUNNEL_OPTS_MAX
for this, which makes the code a bit more generic and allows removing
BPF_TUNLEN_MAX from eBPF code.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, dst: add and use dst_tclassid helper

We can just add a small helper dst_tclassid() for retrieving the
dst->tclassid value. It makes the code a bit better in that we can
get rid of the ifdef from filter.c by moving this into the header.
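
A minimal user-space sketch of the pattern described above, not the kernel
code itself. The names struct dst_entry_sketch, HAVE_ROUTE_CLASSID and
dst_tclassid_sketch() are hypothetical stand-ins; the point is only that the
conditional lives inside a header-style helper so that callers need no ifdef.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel config switch guarding the field. */
#define HAVE_ROUTE_CLASSID 1

struct dst_entry_sketch {
#if HAVE_ROUTE_CLASSID
	uint32_t tclassid;
#endif
	int unused;
};

/* The conditional is confined to this helper; callers stay ifdef-free. */
static inline uint32_t dst_tclassid_sketch(const struct dst_entry_sketch *dst)
{
#if HAVE_ROUTE_CLASSID
	if (dst)
		return dst->tclassid;
#else
	(void)dst;
#endif
	return 0;
}
```

A caller can then read the value unconditionally and simply gets 0 when the
field is compiled out or no dst is attached.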

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: make skb->tc_classid also readable

Currently, the tc_classid from eBPF skb context is write-only, but there's
no good reason for tc programs to limit it to write-only. For example,
it can be used to transfer its state via tail calls where the resulting
tc_classid gets filled gradually.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: bm: clarify dependencies

MVNETA_BM has a dependency on MVNETA, so we can only select the former
if the latter is enabled. However, the code dependency is the reverse:
The mvneta module can call into the mvneta_bm module, so mvneta cannot
be a built-in if mvneta_bm is a module, or we get a link error:

drivers/net/built-in.o: In function `mvneta_remove':
drivers/net/ethernet/marvell/mvneta.c:4211: undefined reference to `mvneta_bm_pool_destroy'
drivers/net/built-in.o: In function `mvneta_bm_update_mtu':
drivers/net/ethernet/marvell/mvneta.c:1034: undefined reference to `mvneta_bm_bufs_free'

This avoids the problem by further clarifying the dependency so that
MVNETA_BM is a silent Kconfig option that gets turned on by the
new MVNETA_BM_ENABLE option. This way both the core HWBM module and
the MVNETA_BM code are always built-in when needed.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management")
Signed-off-by: David S. Miller <davem@davemloft.net>

cls_bpf: reset class and reuse major in da

There are two issues with the current code. First one is that we need
to set res->class to 0 in case we use non-default classid matching.

This is important for the case where cls_bpf was initially set up with
an optional binding to a default class with tcf_bind_filter(), where
the underlying qdisc implements bind_tcf() that fills res->class and
tests for it later on when doing the classification. Convention for
these cases is that after tc_classify() was called, such qdiscs (atm,
drr, qfq, cbq, hfsc, htb) first test class, and if 0, then they lookup
based on classid.

Second, there's a bug with da mode, where res->classid is only assigned
a 16 bit minor, but it needs to expand to the full 32 bit major/minor
combination instead, therefore we need to expand with the bound major.
This is fine as classes belonging to a classful qdisc must share the
same major.
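
The expansion described above can be sketched as follows. This is an
illustration only: the masks mirror the usual 16 bit major / 16 bit minor
handle layout, and expand_classid_sketch() is a hypothetical helper, not the
function touched by this patch.

```c
#include <stdint.h>

/* Handle layout: upper 16 bits are the major, lower 16 bits the minor. */
#define H_MAJ_MASK_SKETCH 0xFFFF0000U
#define H_MIN_MASK_SKETCH 0x0000FFFFU

/* Combine the major of the bound class with a minor chosen by the program. */
static uint32_t expand_classid_sketch(uint32_t bound_classid,
				      uint32_t prog_minor)
{
	return (bound_classid & H_MAJ_MASK_SKETCH) |
	       (prog_minor & H_MIN_MASK_SKETCH);
}
```

With a bound class of 1:0 and a program-selected minor of 2, the result is
the full handle 1:2 rather than the bare minor.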

Fixes: 045efa82ff56 ("cls_bpf: introduce integrated actions")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ldmvsw'

Aaron Young says:

====================
ldmvsw: Add ldmvsw driver

This series adds a new Logical Domains vSwitch (ldmvsw) driver.

The ldmvsw driver code will live in the drivers/net/ethernet/sun/
directory and will operate on Oracle systems running SPARC Linux in a
Logical Domains environment (typically in the control domain).

The ldmvsw driver is very similar in function to the existing sunvnet
driver. Ldmvsw creates a network interface for each "vsw-port" node
found in the Machine Description (MD) of a service domain. These
nodes correspond to ports on a vswitch created by the logical domains
manager. The created network interface(s) can be used by bridge/vswitch
software (such as the Linux bridge or Open vSwitch) to provide
guest domain(s) with network interconnectivity or connectivity
to a physical network.

Here is an example diagram of ldmvsw driver usage in a logical
domain environment to provide a guest domain with network connectivity
to a physical NIC on the service domain:

   +----------------+             +----------------+
   | Service Domain |             |  Guest domain  |
   |                |             |                |
   |  LinuxBridge   |             |                |
   |    |    |      |             |                |
   |   NIC Ldmvsw   |             |    Sunvnet     |
   +----------------+             +----------------+
        |    |           LDC              |
       LAN   ------------------------------

As stated, the sunvnet and ldmvsw drivers are _very_ similar in function.
They both create network interface(s) to receive/transmit network
traffic across LDC network channel(s). Since the driver is so similar
in function to sunvnet, the approach will be as follows to integrate
the driver and take advantage of common code:

Patch #1: Split sunvnet.c driver into sunvnet.c and sunvnet_common.c
Patch #2: Modify the sunvnet_common code and data structures to be compatible
          with both the sunvnet and ldmvsw drivers.
Patch #3: Add the new ldmvsw.c driver code
Patch #4: Checkpatch cleanup of the sunvnet/sunvnet_common code.

NOTE - Patch#1 renames a file (sunvnet.h -> sunvnet_common.h). When generating
the patches (using git format-patch), I had to use the --no-renames option
otherwise patch#1 would NOT apply using 'patch -p1' - which as I
understand is a requirement for patch acceptance. I wasn't sure if this
is the proper thing to do.  Please advise if not. Thanks.

v2 changes:
  * change all EXPORT_SYMBOL declarations to EXPORT_SYMBOL_GPL
  * remove inline attribute for external function port_is_up_common()
  * Give all exported/global funcs in sunvnet_common.c a 'sunvnet_' prefix
    to avoid kernel global namespace pollution/collisions
  * ldmvsw.c: Order local variable declarations from longest to shortest line
  * ldmvsw.c: register the netdevice after all supporting state is ready/setup.
              NOTE: The consensus at Oracle is that the following functions
                    must be done AFTER register_netdev() - this is the same
                    ordering currently used in the sunvnet driver:
                    1. sunvnet_port_add_txq_common() - needs registered netdev
                    2. napi_enable() - requires registered netdev
                    3. vio_port_up() - as soon as this function is called
                                       LDC handshake messages will come in
                                       which must be handled by the napi code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c

Checkpatch updates for sunvnet.c and sunvnet_common.c.

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: Add ldmvsw.c driver code

  Add ldmvsw.c driver

  Details:

  The ldmvsw driver very closely follows the sunvnet.c code and makes
  use of the sunvnet_common.c code for core functionality.

  A significant difference between sunvnet and ldmvsw driver is
  sunvnet creates a network interface for each vnet-port *parent*
  node in the MD while the ldmvsw driver creates a network interface
  for every vsw-port node in the Machine Description (MD).
  Therefore the netdev_priv() for sunvnet is a vnet structure while
  the netdev_priv() for ldmvsw is a vnet_port structure.

  Vnet_port structures allocated by ldmvsw have the vsw bit set.
  When finding the net_device associated with a port, the common code keys
  off this bit to use either the net_device found in the vnet_port or the
  net_device in the vnet structure (see the VNET_PORT_TO_NET_DEVICE() macro in
  sunvnet_common.h). This scheme allows the common code to work with
  both drivers with minimal changes.

  Similar to Xen, network interfaces created by the ldmvsw driver will always
  have a HW Addr (i.e. mac address) of FE:FF:FF:FF:FF:FF and each will be
  assigned the devname "vif<cfg_handle>.<port_id>" - where <cfg_handle> and
  <port_id> are a unique handle/port pair assigned to the associated
  vsw-port node in the MD.

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: Make sunvnet_common compatible with ldmvsw

  Modify sunvnet common code and data structures to be compatible
  with both sunvnet and ldmvsw drivers.

  Details:

  Sunvnet operates on "vnet-port" nodes which appear in the Machine
  Description (MD) in a guest domain. Ldmvsw operates on "vsw-port"
  nodes which appear in the MD of a service domain.

  A difference between the sunvnet driver and the ldmvsw driver is
  the sunvnet driver creates a network interface (i.e. a struct net_device)
  for every vnet-port *parent* "network" node. Several vnet-ports may appear
  under this common parent network node - each corresponding to a common parent
  network interface.  Conversely, since bridge/vswitch software will need
  to interface with every vsw-port in a system, the ldmvsw driver creates
  a network interface (i.e. a struct net_device) for every vsw-port - not
  every parent node as with sunvnet.  This difference required some special
  handling in the common code as explained below.

  There are 2 key data structures used by the sunvnet and ldmvsw drivers
  (which are now found in sunvnet_common.h):

  1. struct vnet_port
     This structure represents a vnet-port node in sunvnet and a vsw-port
     in the ldmvsw driver.

  2. struct vnet
     This structure represents a parent "network" node in sunvnet and a parent
     "virtual-network-switch" node in ldmvsw.

  Since the sunvnet driver allocates a net_device for every parent "network"
  node, a net_device member appears in the struct vnet. Since the ldmvsw
  driver allocates a net_device for every port, a net_device member was
  added to the vnet_port. The common code distinguishes which structure
  net_device member to use by checking a 'vsw' bit that was added to the
  vnet_port structure. See the VNET_PORT_TO_NET_DEVICE() macro in
  sunvnet_common.h.
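
  A simplified sketch of the selection described above, using stand-in types.
  The struct layouts and port_to_net_device_sketch() are assumptions made for
  illustration; see sunvnet_common.h for the real macro and structures.

```c
#include <assert.h>
#include <stddef.h>

struct net_device_sketch {
	const char *name;
};

struct vnet_sketch {
	struct net_device_sketch *dev;	/* sunvnet: one device per parent node */
};

struct vnet_port_sketch {
	struct vnet_sketch *vp;		/* parent structure */
	struct net_device_sketch *dev;	/* ldmvsw: one device per port */
	unsigned int vsw : 1;		/* set for ports allocated by ldmvsw */
};

/* Pick the per-port device for ldmvsw ports, the parent's device otherwise. */
static struct net_device_sketch *
port_to_net_device_sketch(const struct vnet_port_sketch *port)
{
	assert(port != NULL);
	return port->vsw ? port->dev : port->vp->dev;
}
```

  Keying off a single bit keeps the common code path identical for both
  drivers; only the lookup of the net_device differs.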

  The netdev_priv() in sunvnet is allocated as a vnet. The netdev_priv()
  in ldmvsw is a vnet_port. Therefore, any place in the common code
  where a netdev_priv() call was made, a wrapper function was implemented
  in each driver to first get the vnet and/or vnet_port (in a driver
  specific way) and pass them as newly added parameters to the common
  functions (see wrapper funcs: vnet_set_rx_mode() and vnet_poll_controller()).
  Since these wrapper functions call __tx_port_find(), __tx_port_find() was
  moved from the common code back into sunvnet.c. Note - ldmvsw.c does not
  require this function.

  These changes also required that port_is_up() be made
  into a common function and thus it was given a _common suffix and
  exported like the other common functions.

  A wrapper function was also added for vnet_start_xmit_common() to pass a
  driver-specific function arg to return the port associated with a given
  struct sk_buff and struct net_device. This was required because
  vnet_start_xmit_common() grabs a lock prior to getting the associated
  port. Using a function pointer arg allowed the code to work unchanged
  without risking changes to the non-trivial locking logic in
  vnet_start_xmit_common().

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ldmvsw: Split sunvnet driver into common code

  Split sunvnet.c into sunvnet.c and sunvnet_common.c.

  Details:

  Since the sunvnet and ldmvsw drivers will both use common sunvnet code,
  move the functions (and support functions) anticipated to be common code
  from sunvnet.c to sunvnet_common.c. Similarly, sunvnet.h was renamed to
  sunvnet_common.h. The sunvnet_common.c code will be compiled into the
  kernel and act as a library of functions that are linked by either
  (or both) drivers when loaded.

  Function names for external functions in sunvnet_common.c (to be
  called by both the sunvnet and ldmvsw drivers) were tagged with a "_common"
  suffix to clearly designate them as common functions.

  No functional changes as of yet... just moved code verbatim to the new
  sunvnet_common.c/h files.

  Makefile/Kconfig support added to build sunvnet_common.c file. The code
  is included in the kernel if SUN_LDOMS is defined/selected.

  NOTE - per the SubmittingPatches documentation, since the code was just
  moved from one file to another, the code was NOT checkpatch'd in this commit
  to aid in review.

Signed-off-by: Aaron Young <aaron.young@oracle.com>
Signed-off-by: Rashmi Narasimhan <rashmi.narasimhan@oracle.com>
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Alexandre Chartre <Alexandre.Chartre@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Prevent false warning for lack of FC NPIV

Not all adapters have FC-NPIV configured. If bnx2fc is used with such an
adapter, the driver would read irrelevant data from the nvram and log
"FC-NPIV table with bad length..." in system logs.

Simply accept that reading '0' as the feature offset in nvram indicates
the feature isn't there and return.

Reported-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ravb: fix result value overwrite

The result value is overwritten by a return value of
ravb_ptp_interrupt().

Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com>
Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qlge: Fix receive packets drop.

When running small packets [length < 256 bytes] traffic, packets were
being dropped due to invalid data in those packets which were
delivered by the driver up to the stack. Using pci_dma_sync_single_for_cpu
ensures copying latest and updated data into skb from the receive buffer.

Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Don't search for phys if mdio node is defined.

If a dt mdio entry has been added, at least assume that we won't
search for attached phys. The DT and of_mdiobus_register already do
this. This stops DSA phys being found and phys created for them, as
this is handled by the DSA driver.

Signed-off-by: Phil Reid <preid@electromag.com.au>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mediatek: unlock on error in mtk_tx_map()

There was a missing unlock on the error path.
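
A generic sketch of the fix pattern, assuming a made-up ring structure. The
names ring_sketch, tx_map_sketch() and the -ENOMEM condition are hypothetical
and are not taken from the mediatek driver; the flag merely stands in for a
spinlock so the example runs in user space.

```c
#include <errno.h>
#include <stdbool.h>

struct ring_sketch {
	bool locked;		/* stands in for a spinlock */
	int free_slots;
};

static void ring_lock_sketch(struct ring_sketch *r)   { r->locked = true; }
static void ring_unlock_sketch(struct ring_sketch *r) { r->locked = false; }

static int tx_map_sketch(struct ring_sketch *r, int needed)
{
	int err = 0;

	ring_lock_sketch(r);
	if (r->free_slots < needed) {
		err = -ENOMEM;
		goto out_unlock;	/* never return with the lock held */
	}
	r->free_slots -= needed;

out_unlock:
	ring_unlock_sketch(r);
	return err;
}
```

Routing every exit through a single unlock label makes it hard to add a new
error return that forgets to drop the lock.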

Fixes: 656e705243fd ('net-next: mediatek: add support for MT7623 ethernet')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: John Crispin <blogic@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mediatek: checking for IS_ERR() instead of NULL

of_phy_connect() returns NULL on error, it never returns error pointers.
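
To illustrate why the two checks are not interchangeable, here is a
user-space sketch of the error-pointer convention. is_err_sketch() and
err_ptr_sketch() are local re-implementations written for this example, not
the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

/* Error pointers occupy the topmost MAX_ERRNO_SKETCH addresses. */
static bool is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Encode a negative errno value as a pointer. */
static void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}
```

Since NULL is address 0, an IS_ERR()-style test reports it as a valid
pointer, so a function that signals failure with NULL needs a NULL check.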

Fixes: 656e705243fd ('net-next: mediatek: add support for MT7623 ethernet')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: John Crispin <blogic@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: allow output of MPLS packets on tunnel vports

Currently output of MPLS packets on tunnel vports is not allowed by Open
vSwitch. This is because historically encapsulation was done in such a way
that the inner_protocol field of the skb needed to hold the inner protocol
for both MPLS and tunnel encapsulation in order for GSO segmentation to be
performed correctly.

Since b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of
vport") Open vSwitch makes use of lwt to output to tunnel netdevs which
perform encapsulation. As no drivers expose support for MPLS offloads this
means that GSO packets are segmented in software by validate_xmit_skb(),
which is called from __dev_queue_xmit(), before tunnel encapsulation occurs.
This means that the inner protocol of MPLS is no longer needed by the time
encapsulation occurs and the contention on the inner_protocol field of the
skb no longer occurs.

Thus it is now safe to output MPLS to tunnel vports.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: Move octeon/octeon_mgmt driver to cavium directory.

No code changes. Since OCTEON is a Cavium product, move the driver to
the vendor directory to unclutter things a bit.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ovs: internal_set_rx_headroom() can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dst_cache_per_cpu_dst_set() can be static

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qmi_wwan: Added support for Gemalto's Cinterion PHxx WWAN interface

Added support for Gemalto's Cinterion PHxx WWAN interfaces
by adding QMI_FIXED_INTF with Cinterion's VID and PID.

PHxx can have:
2 RmNet Interfaces (PID 0x0082) or
1 RmNet + 1 USB Audio interface (PID 0x0083).

Signed-off-by: Hans-Christoph Schemmel <hans-christoph.schemmel@gemalto.com>
Acked-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: remove obsolete WARN_ON() in icmp handlers

Now SYN_RECV request sockets are installed in ehash table, an ICMP
handler can find a request socket while another cpu handles an incoming
packet transforming this SYN_RECV request socket into an ESTABLISHED
socket.

We need to remove the now obsolete WARN_ON(req->sk), since req->sk
is set when a new child is created and added into listener accept queue.

If this race happens, the ICMP will do nothing special.

Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ben Lazarus <blazarus@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: propagate gso_max_segs

vlan drivers lack proper propagation of gso_max_segs from
lower device.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'thunderx-mdio-fixes'

David Daney says:

====================
net/phy: Fixes for Cavium Thunder MDIO code.

Previous patch set:
commit 5fc7cf179449 ("net: thunderx: Cleanup PHY probing code.")
commit 1eefee901fca ("phy: mdio-octeon: Refactor into two files/modules")
commit 379d7ac7ca31 ("phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.")

Had several problems. We try to fix them here.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Don't leak phy device references on -EPROBE_DEFER condition.

It is possible, although unlikely, that probing will find the
phy_device for the first LMAC of a thunder BGX device, but then need
to fail with -EPROBE_DEFER on a subsequent LMAC. In this case, we
need to call put_device() on each of the phy_devices that were
obtained, but will be unused due to returning -EPROBE_DEFER.

Also, since we can break out of the probing loop early, we need to
explicitly call of_node_put() outside of the loop.
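
The unwinding described above can be sketched in user space as follows. The
phy_sketch type, its plain integer refcount and probe_lmacs_sketch() are
hypothetical; the decrement stands in for put_device(), and the of_node_put()
handling is left out.

```c
#include <stddef.h>

#define EPROBE_DEFER_SKETCH 517	/* kernel-internal errno, redefined here */

struct phy_sketch {
	int refcount;
	int ready;	/* 0 means the phy is not available yet */
};

static int probe_lmacs_sketch(struct phy_sketch *phys, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!phys[i].ready)
			goto defer;
		phys[i].refcount++;	/* reference taken for this LMAC */
	}
	return 0;

defer:
	/* Drop every reference obtained before the failing LMAC. */
	while (i-- > 0)
		phys[i].refcount--;
	return -EPROBE_DEFER_SKETCH;
}
```

After a deferred probe no references remain held, so a later retry starts
from a clean state.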

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cavium: For Kconfig THUNDER_NIC_BGX, select MDIO_THUNDER.

Previously we selected MDIO_OCTEON, which after creating the Thunder
specific MDIO bus driver is much less useful.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: mdio-cavium: Add missing MODULE_* annotations.

When the code was factored out of mdio-octeon.c, the
MODULE_DESCRIPTION, MODULE_AUTHOR and MODULE_LICENSE annotations were
inadvertently omitted. Restore them so that we don't get kernel taint
warnings upon module loading.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ppp: ensure file->private_data can't be overridden

Locking ppp_mutex must be done before dereferencing file->private_data,
otherwise it could be modified before ppp_unattached_ioctl() takes the
lock. This could lead ppp_unattached_ioctl() to override ->private_data,
thus leaking reference to the ppp_file previously pointed to.

v2: lock all ppp_ioctl() instead of just checking private_data in
ppp_unattached_ioctl(), to avoid ambiguous behaviour.

Fixes: f3ff8a4d80e8 ("ppp: push BKL down into the driver")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'arc_emac-next'

Caesar Wang says:

====================
arc_emac: fixes the emac issues and cleanup emac drivers

This patch series is based on the kernel 4.5-rc7+ version.
Linux version 4.5.0-rc7-next-20160311+ (wxt@nb) (...) #45 SMP Sun Mar 13 16:17:56

The history patch in here:
Patch-v1: https://lkml.org/lkml/2016/3/11/209
Patch-v2: https://lkml.org/lkml/2016/3/13/39

Verified on kylin board with my github.
https://github.com/Caesar-github/rockchip/tree/kylin/next

That's verified on kylin board with ubuntu os.

This series patches are built all pass with Mr.robot on
https://github.com/Caesar-github/linux/tree/build-emac-v3

How to test and verify?

You can refer to the following wiki document.
http://rockchip.wikidot.com/linux-develop-guide

bootup log:
[    1.264740] rockchip_emac 10200000.ethernet: no regulator found
[    1.270908] rockchip_emac 10200000.ethernet: ARC EMAC detected with id: 0x7fd02
[    1.278362] rockchip_emac 10200000.ethernet: IRQ is 29
[    1.283747] rockchip_emac 10200000.ethernet: MAC address is now 06:5d:61:c7:39:41
[    1.291314] rockchip_emac 10200000.ethernet: GPIO lookup for consumer phy-reset
[    1.291333] rockchip_emac 10200000.ethernet: using device tree for GPIO lookup
[    1.663155] rockchip_emac 10200000.ethernet: connected to Generic PHY phy with id 0xffffc816
[    8.863448] rockchip_emac 10200000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off

root@localhost:/# busybox ping www.baidu.com
PING www.baidu.com (14.215.177.38): 56 data bytes
64 bytes from 14.215.177.38: seq=0 ttl=48 time=35.046 ms
64 bytes from 14.215.177.38: seq=1 ttl=48 time=35.095 ms
64 bytes from 14.215.177.38: seq=2 ttl=48 time=34.203 ms
64 bytes from 14.215.177.38: seq=3 ttl=48 time=38.516 ms
...
---

1) This series has 9 patches: (1--->9)
net: arc_emac: make the rockchip emac document more compatible
net: arc_emac: add phy reset is optional for device tree
net: arc_emac: support the phy reset for emac driver
net: arc: trivial: cleanup the emac driver
clk: rockchip: add node-id for rk3036 emac hclk
clk: rockchip: associate the rk3036 HCLK_EMAC clock-id
clk: rockchip: add clock-id for rk3036 emac pll source clock
clk: rockchip: associate SCLK_MAC_PLL and disable reparenting on rk3036
ARM: dts: rockchip: add support emac for RK3036

2) This series patches have the following descriptions:

Hi Rob, David:
PATCH[1/9-2/9]: ====>
net: arc_emac: make the rockchip emac document more compatible
net: arc_emac: add phy reset is optional for device tree

The patches change the rockchip emac document to be more compatible and
add the phy reset property to the document.
---

Hi David
PATCH[3/9]: ====>
net: arc_emac: support the phy reset for emac driver

The emac didn't work on the kylin board since in some cases the clock's parent changed.
The kylin hardware connects the phy reset pin, so we should use it in the real world.
As discussed for the previous patch on https://patchwork.kernel.org/patch/8186801/

And as sergei/Heiko suggestions on
https://patchwork.kernel.org/patch/8564571/
---

Hi David
PATCH[4/9]: ====>
net: arc: trivial: cleanup the emac driver

When first looking at the emac driver, I thought it had to be cleaned up with the scripts.
Although these are trivial things, they make the driver more readable.
---

Hi Heiko,Michael,Stephen:
PATCH[5/9-8/9]: ====> clk: rockchip: rk3036: fix and add node id for emac clock

Four-part from https://patchwork.kernel.org/patch/8564581/
clk: rockchip: add node-id for rk3036 emac hclk
clk: rockchip: associate the rk3036 HCLK_EMAC clock-id
clk: rockchip: add clock-id for rk3036 emac pll source clock
clk: rockchip: associate SCLK_MAC_PLL and disable reparenting on rk3036

Add the emac needed clocks for rk3036 SoCs
---

Hi Heiko:
PATCH[9/9]: ====>
ARM: dts: rockchip: add support emac for RK3036

Add the emac needed main info for rk3036 dts.
---

Thanks for reviewing! :)

Changes in v3:
- %s/he/the
- Add the Cc people
- As Sergei comments, the original name is better, so
  %s/reset-gpios/phy-reset-gpios
- Add the Cc people.
- Caused the build error since the missing include head file.
- %s/reset/phy-reset to match the device tree.
- Add the Cc people
- Add the Cc people.
- Add the Cc people.
- Add the Cc people.
- Add the Cc people.
- Add the Cc people.
- rename reset-gpio to phy-reset-gpios.
- change the commit.
- remove the pcfg_output_high, that's really not needed for emac.
- Add the Cc people.
- Fixes the 'zhengxing' to 'Xing Zheng'.

Changes in v2:
- change the commit and remove the repeat the name 'rockchip'.
- %s/phy-reset-gpios/reset-gpios
- As the previous version, Sergei and Heiko comments on
  https://patchwork.kernel.org/patch/8564571/.
- Nevermind, add signed-off since Heiko the original patch,
  refer the Heiko's test patch on
  https://github.com/mmind/linux-rockchip/commit/a943c588783438ff1c508dfa8c79f1709aa5775e
  :)
- As the robot notice the build error since overflow in implicit
  constant conversion.
- rename phy-reset-gpio to reset-gpios.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: rockchip: add to support emac for rk3036 SoCs

This patch adds the emac device node for rk3036 SoCs.
We need to put the mac clock under the DPLL, which is able to provide
the accurate 50MHz that mac_ref needs; otherwise the clock may become
unstable while cpufreq is working.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: linux-rockchip@lists.infradead.org
Cc: Xing Zheng <zhengxing@rock-chips.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

clk: rockchip: associate SCLK_MAC_PLL and disable reparenting on rk3036

The emac needs constant and very specific rate but the possible PLL-sources
are very limited, so we expect the PLL source to be set manually per
board and don't want it to get changed in an automatic way later.
So add the necessary clock-id and disable reparenting on set_rate calls.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-clk@vger.kernel.org
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

clk: rockchip: add clock-id for rk3036 emac pll source clock

Suitable PLLs for the emac on the rk3036 are difficult to find
and one of them is the (continuously changing) APLL. So in most
cases it will be necessary to select a PLL manually.
So add a clock-id for it.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Xing Zheng <zhengxing@rock-chips.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-clk@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

clk: rockchip: associate the rk3036 HCLK_EMAC clock-id

Associate the new clock id with the clock.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Xing Zheng <zhengxing@rock-chips.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-clk@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

clk: rockchip: add node-id for rk3036 emac hclk

Add the node-id for the emac hclk to the binding header.

Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Xing Zheng <zhengxing@rock-chips.com>
Cc: Michael Turquette <mturquette@baylibre.com>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linux-clk@vger.kernel.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arc: trivial: cleanup the emac driver

This patch will make the driver more readable.

The emac driver has errors and warnings if you run
'scripts/checkpatch.pl -f --subjective xxx' to check.

Let's clean up such trivial details.

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Jiri Kosina <trivial@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Kochetkov <al.kochet@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arc_emac: support the phy reset for emac driver

This patch adds support for the emac phy reset.

Different boards may require different phy reset duration. Add property
phy-reset-duration for emac driver, so that the boards that need
a longer reset duration can specify it in their device tree.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: Alexander Kochetkov <al.kochet@gmail.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arc_emac: add phy reset is optional for device tree

This patch adds the following properties for arc_emac.

1) phy-reset-gpios:
The phy-reset-gpios is an optional property for arc emac device tree boot.
Change the binding document to match the driver code.

2) phy-reset-duration:
Different boards may require different phy reset duration. Add property
phy-reset-duration for device tree probe, so that the boards that need
a longer reset duration can specify it in their device tree.

Anyway, we can add the above properties for arc emac.

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arc_emac: make the rockchip emac document more compatible

Add the rk3036 SoCs to the document to match the driver, since the emac
driver already supports the rk3036 SoCs.

This patch adds the rk3036/rk3066/rk3188 SoCs to the compatible list in the
rockchip emac document. This will also suit other SoCs in the future.

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: devicetree@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Kochetkov <al.kochet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Set cmd field in ETHTOOL_GLINKSETTINGS response to wrong nwords

When the ETHTOOL_GLINKSETTINGS implementation finds that userland is
using the wrong number of words of link mode bitmaps (or is trying to
find out the right numbers) it sets the cmd field to 0 in the response
structure.

This is inconsistent with the implementation of every other ethtool
command, so let's remove that inconsistency before it gets into a
stable release.

Fixes: 3f1ac7a700d03 ("net: ethtool: add new ETHTOOL_xLINKSETTINGS API")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: do not call netif_start_queue() from sh_eth_dev_init()

If the sh_eth_phy_start() call fails in sh_eth_open(), the netif_start_queue()
call done by sh_eth_dev_init() is not undone. In order to deal with that,
stop calling netif_start_queue() from there, so that it is called only
when the device is fully opened and sh_eth_dev_init() only deals with the
hardware initialization, symmetrically to sh_eth_dev_exit()...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: don't wait for Tx completion on recovery

When the driver has hit a parity event, HW can no longer write to host memory.
As a result, Tx completions cannot be written to the host SB memory, and
waiting for Tx completions eventually times out.
As the driver is willing to delay as much as 1-2 seconds per Tx queue for its
draining and this delay is sequential, the time to recover might lengthen
greatly and needlessly when the recovery is done under multi-connection
traffic.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: consolidate local_bh_disable/enable + spin_lock/unlock to _bh variant

local_bh_disable() + spin_lock() is equivalent to spin_lock_bh(), same for
the unlock/enable case, so replace the calls by the appropriate wrappers.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS/OVS updates for net-next

The following patchset contains Netfilter/IPVS fixes and OVS NAT
support, more specifically this batch is composed of:

1) Fix a crash in ipset when performing a parallel flush/dump with
   set:list type, from Jozsef Kadlecsik.

2) Make sure NFACCT_FILTER_* netlink attributes are in place before
   accessing them, from Phil Turnbull.

3) Check return error code from ip_vs_fill_iph_skb_off() in IPVS SIP
   helper, from Arnd Bergmann.

4) Add workaround to IPVS to reschedule existing connections to new
   destination server by dropping the packet and waiting for retransmission
   of the TCP SYN packet, from Julian Anastasov.

5) Allow connection rescheduling in IPVS when in CLOSE state, also
   from Julian.

6) Fix wrong offset of SIP Call-ID in IPVS helper, from Marco Angaroni.

7) Validate IPSET_ATTR_ETHER netlink attribute length, from Jozsef.

8) Check match/targetinfo netlink attribute size in nft_compat,
   patch from Florian Westphal.

9) Check for integer overflow on 32-bit systems in x_tables, from
   Florian Westphal.

Several patches from Jarno Rajahalme to prepare the introduction of
NAT support to OVS based on the Netfilter infrastructure:

10) Schedule IP_CT_NEW_REPLY definition for removal in
    nf_conntrack_common.h.

11) Simplify checksumming recalculation in nf_nat.

12) Add comments to the openvswitch conntrack code, from Jarno.

13) Update the CT state key only after successful nf_conntrack_in()
    invocation.

14) Find existing conntrack entry after upcall.

15) Handle NF_REPEAT case due to templates in nf_conntrack_in().

16) Call the conntrack helper functions once the conntrack has been
    confirmed.

17) And finally, add the NAT interface to OVS.

The batch closes with:

18) Cleanup to use spin_unlock_wait() instead of
    spin_lock()/spin_unlock(), from Nicholas Mc Guire.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack: consolidate lock/unlock into unlock_wait

The spin_lock()/spin_unlock() pair is synchronizing on the
nf_conntrack_locks_all_lock, which is equivalent to
spin_unlock_wait(), but the latter should be more efficient.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

net: diag: add a scheduling point in inet_diag_dump_icsk()

On loaded TCP servers, looking at millions of sockets can hold
cpu for many seconds, if the lookup condition is very narrow.

(e.g. ss dst 1.2.3.4)

Better add a cond_resched() to allow other processes to access
the cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smc91x: avoid self-comparison warning

The smc91x driver defines a macro that compares its argument to
itself, apparently to get a true result while using its argument
to avoid a warning about unused local variables.

Unfortunately, this triggers a warning with gcc-6, as the comparison
is obviously useless:

drivers/net/ethernet/smsc/smc91x.c: In function 'smc_hardware_send_pkt':
drivers/net/ethernet/smsc/smc91x.c:563:14: error: self-comparison always evaluates to true [-Werror=tautological-compare]
if (!smc_special_trylock(&lp->lock, flags)) {

This replaces the macro with another one that behaves similarly,
with a cast to (void) to ensure the argument is used, and using
a literal 'true' as its value.
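
The replacement pattern can be sketched as follows. This is a minimal
user-space illustration, not the driver's actual macro: the names
demo_special_trylock and demo_send are hypothetical and exist only for
this example.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for a "trylock" that always succeeds on
 * configurations where no real locking is needed. Casting each argument
 * to (void) counts as a use, so the compiler does not warn about unused
 * variables, and the comma operator makes the whole expression 'true'.
 * Nothing is compared to itself, so no tautological-compare warning.
 */
#define demo_special_trylock(lock, flags) ((void)(lock), (void)(flags), true)

/* Hypothetical caller with the same shape as the site in the warning. */
static bool demo_send(void)
{
	int lock = 0;
	unsigned long flags = 0;

	if (!demo_special_trylock(&lock, flags))
		return false;	/* never taken with this macro */
	return true;
}
```

The caller keeps its 'lock' and 'flags' variables and its error path, so
the same source builds with either a real trylock or this stub.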

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Interface with NAT.

Extend OVS conntrack interface to cover NAT. New nested
OVS_CT_ATTR_NAT attribute may be used to include NAT with a CT action.
A bare OVS_CT_ATTR_NAT only mangles existing and expected connections.
If OVS_NAT_ATTR_SRC or OVS_NAT_ATTR_DST is included within the nested
attributes, new (non-committed/non-confirmed) connections are mangled
according to the rest of the nested attributes.

The corresponding OVS userspace patch series includes test cases (in
tests/system-traffic.at) that also serve as example uses.

This work extends on a branch by Thomas Graf at
https://github.com/tgraf/ovs/tree/nat.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

openvswitch: Delay conntrack helper call for new connections.

There is no need to help connections that are not confirmed, so we can
delay helping new connections to the time when they are confirmed.
This change is needed for NAT support, and having this as a separate
patch will make the following NAT patch a bit easier to review.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

openvswitch: Handle NF_REPEAT in conntrack action.

Repeat the nf_conntrack_in() call when it returns NF_REPEAT. This
avoids dropping a SYN packet re-opening an existing TCP connection.
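
A minimal sketch of the repeat handling, assuming a tracker that asks for
one repeat before accepting. All names and verdict values here
(demo_verdict, demo_conntrack_in, demo_track) are hypothetical and only
stand in for nf_conntrack_in() and its return codes.

```c
/* Hypothetical verdict values for this sketch only. */
enum demo_verdict { DEMO_DROP, DEMO_ACCEPT, DEMO_REPEAT };

/* Hypothetical tracker state: counts how often it was invoked. */
struct demo_tracker {
	int calls;
};

/* Asks for a repeat on the first call, accepts on every later call. */
static enum demo_verdict demo_conntrack_in(struct demo_tracker *t)
{
	t->calls++;
	return t->calls == 1 ? DEMO_REPEAT : DEMO_ACCEPT;
}

/* Call the tracker again for as long as it answers "repeat". */
static enum demo_verdict demo_track(struct demo_tracker *t)
{
	enum demo_verdict v;

	do {
		v = demo_conntrack_in(t);
	} while (v == DEMO_REPEAT);

	return v;
}
```

Without the loop, the caller would see the "repeat" verdict as something
other than accept and would drop the packet.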

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

openvswitch: Find existing conntrack entry after upcall.

Add a new function ovs_ct_find_existing() to find an existing
conntrack entry to which this packet was already applied. This is
only to be called when there is evidence that the packet was already
tracked and committed, but we lost the ct reference due to a
userspace upcall.

ovs_ct_find_existing() is called from skb_nfct_cached(), which can now
hide the fact that the ct reference may have been lost due to an
upcall. This allows ovs_ct_commit() to be simplified.

This patch is needed by the later "openvswitch: Interface with NAT" patch,
as we need to be able to pass the packet through NAT using the
original ct reference even after the reference is lost due to an
upcall.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

openvswitch: Update the CT state key only after nf_conntrack_in().

Only a successful nf_conntrack_in() call can effect a connection state
change, so it suffices to update the key only after the
nf_conntrack_in() returns.

This change is needed for the later NAT patches.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

openvswitch: Add commentary to conntrack.c

This makes the code easier to understand and the following patches
more focused.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Allow calling into nat helper without skb_dst.

NAT checksum recalculation code assumes existence of skb_dst, which
becomes a problem for a later patch in the series ("openvswitch:
Interface with NAT."). Simplify this by removing the check on
skb_dst, as the checksum will be dealt with later in the stack.

Suggested-by: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: Remove IP_CT_NEW_REPLY definition.

Remove the definition of IP_CT_NEW_REPLY from the kernel as it does
not make sense. This allows the definition of IP_CT_NUMBER to be
simplified as well.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'dsa-finers-bridging-control'

Vivien Didelot says:

====================
net: dsa: finer bridging control

This patchset renames the bridging routines of the DSA layer, makes the
unbridging routine return void, and reworks the DSA netdev notifier handler,
similar to what the Mellanox Spectrum driver does.

Changes RFC -> v1:
- drop unused NETDEV_PRECHANGEUPPER case
- add Andrew's Tested-by tag
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: refine netdev event notifier

Rework the netdev event handler, similar to what the Mellanox Spectrum
driver does, to easily welcome more events later (for example
NETDEV_PRECHANGEUPPER) and use netdev helpers (such as
netif_is_bridge_master).

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: make port_bridge_leave return void

netdev_upper_dev_unlink(), which notifies NETDEV_CHANGEUPPER, returns
void, as does del_nbp(). So there's no advantage in catching a possible
error from the port_bridge_leave routine at the DSA level.

Make this routine void for the DSA layer and its existing drivers.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: rename port_*_bridge routines

Rename DSA port_join_bridge and port_leave_bridge routines to
respectively port_bridge_join and port_bridge_leave in order to respect
an implicit Port::Bridge namespace.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Support DR6 indication in mISDNipac driver

According to figure 39 in the PEB3086 data sheet, version 1.4, this indication
replaces DR when the layer 1 transition source state is F6.

This fixes mISDN layer 1 getting stuck in F6 state in TE mode on
Dialogic Diva 2.02 card (and possibly others) when NT deactivates it.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Order IPAC register defines

It looks like IPAC/ISAC chips register defines weren't in any particular
order.

Order them by their number to make it easier to spot holes.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: kill useless initializers

Some of the local variable initializers in the driver turned out to be
pointless, kill 'em.

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mvneta-fixes'

Gregory CLEMENT says:

====================
Few mvneta fixes

In this second version I split the last patch in two parts as
requested.

For the record the initial cover letter was:
"here is a patch set of few fixes. Without the first one, a kernel
configured with debug features ended up hanging when the driver is built
as a module and is removed. This is quite annoying for debugging!

The second patch fixes a flag forgotten at the initial submission of the
driver.

The third patch is only really a cosmetic one so I have no problem to
not apply it for 4.5 and wait for 4.6.

I really would like to see the first one applied for 4.5 and for the
second I let you judge if it is something needed for now or that should
wait for the next release."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: replace magic numbers by existing macros

Some literal values are actually already defined by macros, so let's use
them.

[gregory.clement@free-electrons.com: split initial commit in two
individual changes]
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: fix error messages in mvneta_port_down function

This commit corrects error printing when shutting down the port.

[gregory.clement@free-electrons.com: split initial commit in two
individual changes]
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: enable change MAC address when interface is up

Function eth_prepare_mac_addr_change() is called as part of a MAC
address change. This function checks whether the interface is running.
To allow changing the MAC address while the interface is running, the
IFF_LIVE_ADDR_CHANGE flag must be set in the dev->priv_flags field.
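
The check can be sketched in user space as follows. This is an
illustration under assumptions, not the kernel function: struct
demo_netdev, DEMO_LIVE_ADDR_CHANGE and demo_prepare_mac_addr_change are
hypothetical names modelling dev->priv_flags, IFF_LIVE_ADDR_CHANGE and
the running-state test.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag bit standing in for IFF_LIVE_ADDR_CHANGE. */
#define DEMO_LIVE_ADDR_CHANGE 0x1u

/* Hypothetical, heavily reduced model of a network device. */
struct demo_netdev {
	unsigned int priv_flags;
	bool running;
};

/*
 * Returns 0 when the address may be changed now, or -EBUSY when the
 * interface is running and the driver did not opt in to live changes.
 */
static int demo_prepare_mac_addr_change(const struct demo_netdev *dev)
{
	if (!(dev->priv_flags & DEMO_LIVE_ADDR_CHANGE) && dev->running)
		return -EBUSY;
	return 0;
}
```

A driver opts in once at setup time by setting the flag; the check itself
stays generic.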

Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP
network unit")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitri Epshtein <dima@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Fix spinlock usage

In the previous patch, the spinlock was not initialized. While it has not
caused any trouble yet, using it uninitialized could become a problem.

The most annoying part was the critical section protected by the spinlock
in mvneta_stop(). Some of the functions called there could sleep, as pointed
out when CONFIG_DEBUG_ATOMIC_SLEEP was activated. Actually, in mvneta_stop()
we only need to protect the is_stopped flag: the code of the notifier
for CPU online is protected by the same spinlock, so when we get the
lock, the notifier work is done.

Reported-by: Patrick Uiterwijk <patrick@puiterwijk.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict

Zefir Kurtisi reported kernel panic with an openwrt specific patch.
However, it turns out that mainline has a similar bug waiting to happen.

Once NF_HOOK() returns the skb is in undefined state and must not be
used. Moreover, the okfn must consume the skb to support async
processing (NF_QUEUE).

Current okfn in this spot doesn't consume it and caller assumes that
NF_HOOK return value tells us if skb was freed or not, but that's wrong.

It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at
LOCAL_IN that returns STOLEN or NF_QUEUE verdicts.

Once we add NF_QUEUE support for nftables bridge this will break --
NF_QUEUE holds the skb for async processing, caller will erroneously
return RX_HANDLER_PASS and on reinject netfilter will access free'd skb.

Fix this by pushing skb up the stack in the okfn instead.

NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING
completely in this case but this is how it's been forever so it seems
preferable to not change this.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2016-03-12

Here's the last bluetooth-next pull request for the 4.6 kernel.

- New USB ID for AR3012 in btusb
- New BCM2E55 ACPI ID
- Buffer overflow fix for the Add Advertising command
- Support for a new Bluetooth LE limited privacy mode
- Fix for firmware activation in btmrvl_sdio
- Cleanups to mac802154 & 6lowpan code

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-cleanups'

Andrew Lunn says:

====================
DSA cleanup and fixes

The RFC patchset for re-architecting DSA probing contains a few
standalone patches, either cleanups or fixes. This pulls them out for
submission.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phy: fixed: Fix removal of phys.

The fixed phys delete function simply removed the fixed phy from the
internal linked list and freed the memory. It however did not
unregister the associated phy device. This meant it was still possible
to find the phy device on the mdio bus.

Make fixed_phy_del() an internal function and add a
fixed_phy_unregister() that unregisters the phy device and then uses
fixed_phy_del() to free resources.

Modify DSA to use this new API function, so we don't leak phys.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: dsa: Fix freeing of fixed-phys from user ports.

All port types can have a fixed PHY associated with them. Remove the
check which limits removal to only CPU and DSA ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Destroy fixed link phys after the phy has been disconnected

The phy is disconnected from the slave in dsa_slave_destroy(). Don't
destroy fixed link phys until after this, since there can be fixed
link phys connected to ports.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: slave: Don't reference NULL pointer during phy_disconnect

When the phy is disconnected, the parent pointer to the netdev it was
attached to is set to NULL. The code then tries to suspend the phy,
but dsa_slave_fixed_link_update needs the parent pointer to determine
which switch the phy is connected to. So it dereferenced a NULL
pointer. Check for this condition.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dsa: Rename mv88e6123_61_65 to mv88e6123 to be consistent

All the drivers support multiple chips, but mv88e6123_61_65 is the
only one that reflects this in its naming. Change it to be consistent
with the other drivers.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'of_mdio-checks'

Sergei Shtylyov says:

====================
of_mdio: use IS_ERR_OR_NULL() and PTR_ERR_OR_ZERO()

Here's the set of 3 patches against DaveM's 'net-next.git' repo. They deal
with some error checks in the device tree MDIO code...
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

of_mdio: use PTR_ERR_OR_ZERO()

PTR_ERR_OR_ZERO() is open coded in of_phy_register_fixed_link(), so just
call it directly...
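
For illustration, the open-coded form and the helper can be compared in a
small user-space sketch. The demo_* functions below are hypothetical
re-implementations written for this example; they assume the usual
convention that the top 4095 values of the address space encode negative
errno codes, and they are not the kernel's own definitions.

```c
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define DEMO_MAX_ERRNO 4095

/* True when the pointer value falls in the reserved error range. */
static int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Recover the negative error code stored in an error pointer. */
static long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Encode a negative error code as a pointer. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* The open-coded pattern: test, extract, otherwise return 0. */
static long demo_open_coded(const void *phy)
{
	if (demo_is_err(phy))
		return demo_ptr_err(phy);
	return 0;
}

/* The same logic folded into a single helper. */
static long demo_ptr_err_or_zero(const void *ptr)
{
	return demo_is_err(ptr) ? demo_ptr_err(ptr) : 0;
}
```

Both functions return the same value for every input, which is why the
call site can use the helper directly.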

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of_mdio: use IS_ERR_OR_NULL()

IS_ERR_OR_NULL() is open coded in of_mdiobus_register_phy(), so just call
it directly...
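
A minimal user-space sketch of the combined check, assuming the same
error-pointer convention (the top 4095 address values encode negative
errno codes). sketch_is_err and sketch_is_err_or_null are hypothetical
names for this example, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define SKETCH_MAX_ERRNO 4095

/* True when the pointer value falls in the reserved error range. */
static bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* One call replaces the open-coded "!ptr || is_err(ptr)" test. */
static bool sketch_is_err_or_null(const void *ptr)
{
	return ptr == NULL || sketch_is_err(ptr);
}
```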

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

of_mdio: mdio_device_create() never returns NULL

mdio_device_create() never returns NULL, thus checking for it in
of_mdiobus_register_device() is pointless...

Suggested-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'thunderx-phy'

David Daney says:

====================
net/phy: Improvements to Cavium Thunder MDIO code.

Changes from v1:

- In 1/3 Add back check for non-OF objects in bgx_init_of_phy().  It
   is probably not necessary, but better safe than sorry...

The firmware on many Cavium Thunder systems configures the MDIO bus
hardware to be probed as a PCI device.  In order to use the MDIO bus
drivers in this configuration, we must add PCI probing to the driver.

There are two parts to this set of three patches:

1) Cleanup the PHY probing code in thunder_bgx.c to handle the case
    where there is no PHY attached to a port, as well as being more
    robust in the face of driver loading order by use of
    -EPROBE_DEFER.

2) Split mdio-octeon.c into two drivers, one with platform probing,
    and the other with PCI probing.  Common code is shared between the
    two.

Tested on several different Thunder and OCTEON systems, also compile
tested on x86_64.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

phy: mdio-thunder: Add driver for Cavium Thunder SoC MDIO buses.

The Cavium Thunder SoCs have multiple MDIO buses that are part of a
single PCI device. To model this in the device tree we call the PCI
parent device a "cavium,thunder-8890-mdio-nexus"; it has several
children, one for each MDIO bus.

The MDIO bus hardware is identical to that found in the OCTEON SoCs,
so we use that code for things that are not part of the PCI driver
probe/remove.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: mdio-octeon: Refactor into two files/modules

A follow-on patch uses PCI probing to find the Thunder MDIO hardware.
In preparation for this, split out the common code into a new file
mdio-cavium.c, which will be used by both the existing OCTEON driver,
and the new Thunder PCI based driver.

As part of the refactoring simplify the struct cavium_mdiobus by
removing fields that are only ever used in the probe function and can
just as well be local variables.

Use readq/writeq in preference to readq_relaxed/writeq_relaxed as the
relaxed form was an optimization for an early chip revision, and the
MDIO drivers are not performance bottlenecks that need optimization in
the first place.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Cleanup PHY probing code.

Remove the call to force the octeon-mdio driver to be loaded. Allow
the standard driver loading mechanisms to load the PHY drivers, and
use -EPROBE_DEFER to cause the BGX driver to be probed only after the
PHY drivers are available.

Reorder the setting of MAC addresses and PHY probing to allow BGX
LMACs with no attached PHY to still be assigned a MAC address.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: Add missing hotplug notifier transition

The mvneta_percpu_notifier() hotplug callback lacks handling of the
CPU_DOWN_FAILED case. That means, if CPU_DOWN_PREPARE fails, the
driver is not well configured on the CPU.

Add handling for CPU_DOWN_FAILED[_FROZEN] hotplug notifier transition
to set up the driver.

Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

fsl/fman: fix dtsec_set_tx_pause_frames

Fix a bug introduced in e06a03b (fsl/fman: fix the pause_time test).
When pause_time is set to '0', pause frames are disabled and
there's no need to apply the dTSEC-A003 Errata workaround.

Signed-off-by: Igal Liberman <igal.liberman@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: networking: phy.txt: Add missing functions

Some new development in PHYLIB added new function pointers to struct
phy_driver; document these.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Add RFC4898 tcpEStatsPerfDataSegsOut/In

Per RFC4898, they count segments sent/received
containing a positive length data segment (that includes
retransmission segments carrying data). Unlike
tcpi_segs_out/in, tcpi_data_segs_out/in excludes segments
carrying no data (e.g. pure ack).

The patch also updates the segs_in in tcp_fastopen_add_skb()
so that segs_in >= data_segs_in property is kept.

Together with retransmission data, tcpi_data_segs_out
gives a better signal on the rxmit rate.
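
The relationship between the two counters can be sketched as below. This
is a simplified user-space model, not the kernel helper: struct
demo_tcp_counters, struct demo_segment and demo_segs_in are hypothetical,
and the sketch counts one segment per call rather than accounting for
aggregated segments.

```c
#include <stdint.h>

/* Hypothetical per-connection counters. */
struct demo_tcp_counters {
	uint32_t segs_in;	/* every received segment */
	uint32_t data_segs_in;	/* only segments carrying payload */
};

/* Hypothetical received segment: total and header length in bytes. */
struct demo_segment {
	uint32_t len;
	uint32_t hdr_len;
};

/*
 * Count the segment, and count it as a data segment only when it is
 * longer than its header. Because data_segs_in is only ever bumped
 * together with segs_in, segs_in >= data_segs_in always holds.
 */
static void demo_segs_in(struct demo_tcp_counters *c,
			 const struct demo_segment *seg)
{
	c->segs_in++;
	if (seg->len > seg->hdr_len)
		c->data_segs_in++;
}
```

A pure ack (length equal to header length) therefore raises segs_in only.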

v6: Rebase on the latest net-next

v5: Eric pointed out that checking skb->len is still needed in
tcp_fastopen_add_skb() because skb can carry a FIN without data.
Hence, instead of open coding segs_in and data_segs_in, tcp_segs_in()
helper is used. Comment is added to the fastopen case to explain why
segs_in has to be reset and tcp_segs_in() has to be called before
__skb_pull().

v4: Add comment to the changes in tcp_fastopen_add_skb()
and also add remark on this case in the commit message.

v3: Add const modifier to the skb parameter in tcp_segs_in()

v2: Rework based on recent fix by Eric:
commit a9d99ce28ed3 ("tcp: fix tcpi_segs_in after connection establishment")

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Chris Rapier <rapier@psc.edu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Marcelo Ricardo Leitner <mleitner@redhat.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vmxnet3: fix lock imbalance in vmxnet3_tq_xmit()

A recent bug fix rearranged the code in vmxnet3_tq_xmit() in a
way that left the error handling for oversized headers unlocking
a lock that had not been taken yet. Gcc warns about the incorrect
use of the 'flags' variable because of that:

drivers/net/vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_tq_xmit.constprop':
include/linux/spinlock.h:246:3: error: 'flags' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This changes the error handling path to 'goto' the end of the function
beyond the lock/unlock pair.
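
The shape of such an error path can be sketched in user space as follows.
Everything here is hypothetical (struct demo_lock, DEMO_MAX_HDR,
demo_xmit); a depth counter stands in for the spinlock so that an
unbalanced unlock would be visible.

```c
/* Hypothetical lock model: depth must return to 0 after each call. */
struct demo_lock {
	int depth;
};

static void demo_lock_take(struct demo_lock *l)
{
	l->depth++;
}

static void demo_lock_release(struct demo_lock *l)
{
	l->depth--;
}

/* Hypothetical limit on header size for this sketch. */
#define DEMO_MAX_HDR 128

/* Returns 0 when the packet is queued, -1 when it is dropped. */
static int demo_xmit(struct demo_lock *l, unsigned int hdr_len)
{
	int ret = 0;

	/*
	 * This check runs before the lock is taken, so its failure path
	 * must jump past the unlock rather than to it.
	 */
	if (hdr_len > DEMO_MAX_HDR) {
		ret = -1;
		goto out;
	}

	demo_lock_take(l);
	/* ... queue the packet while holding the lock ... */
	demo_lock_release(l);
out:
	return ret;
}
```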

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cec05562fb1d ("vmxnet3: avoid calling pskb_may_pull with interrupts disabled")
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-gcc60-fixes'

Arnd Bergmann says:

====================
net: gcc-6.0 warning fixes

I've just installed gcc-6.0 to see what kinds of new warnings
we get. It turns out that it's actually really useful once I
disabled -Wunused-const-variable, and all of the warnings it
found in network drivers seem valid.

Sorry for the bad timing in the merge window, but I figured
it would be better to send the fixes as I found the bugs
rather than waiting for the next cycle. The first three
look appropriate for stable backports.

The other two only fix a gcc warning about incorrect whitespace,
probably not worth backporting those.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: caif: fix misleading indentation

gcc points out code that is not indented the way it is
interpreted:

net/caif/cfpkt_skbuff.c: In function 'cfpkt_setlen':
net/caif/cfpkt_skbuff.c:289:4: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
    return cfpkt_getlen(pkt);
    ^~~~~~
net/caif/cfpkt_skbuff.c:286:3: note: ...this 'else' clause, but it is not
   else
   ^~~~

It is clear from the context that not returning here would be
a bug, as we'd end up passing a negative length into a function
that takes a u16 length, so it is not missing curly braces
here, and I'm assuming that the indentation is the only part
that's wrong about it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ath9k: fix misleading indentation

A cleanup patch in linux-3.18 moved around some code in the ath9k
driver and left some code to be indented in a misleading way,
made worse by the addition of some new code for p2p mode, as
discovered by a new gcc-6 warning:

drivers/net/wireless/ath/ath9k/init.c: In function 'ath9k_set_hw_capab':
drivers/net/wireless/ath/ath9k/init.c:851:4: warning: statement is indented as if it were guarded by... [-Wmisleading-indentation]
    hw->wiphy->iface_combinations = if_comb;
    ^~
drivers/net/wireless/ath/ath9k/init.c:847:3: note: ...this 'if' clause, but it is not
   if (ath9k_is_chanctx_enabled())
   ^~

The code is in fact correct, but the indentation is not, so I'm
reformatting it as it should have been after the original cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 499afaccf6f3 ("ath9k: Isolate ath9k_use_chanctx module parameter")
Fixes: eb61f9f623f7 ("ath9k: advertise p2p dev support when chanctx")
Signed-off-by: David S. Miller <davem@davemloft.net>

ath9k: fix buffer overrun for ar9287

Code that was added back in 2.6.38 has an obvious overflow
when accessing a static array, and at the time it was added
only a code comment was put in front of it as a reminder
to have it reviewed properly.

This has not happened, but gcc-6 now points to the specific
overflow:

drivers/net/wireless/ath/ath9k/eeprom.c: In function 'ath9k_hw_get_gain_boundaries_pdadcs':
drivers/net/wireless/ath/ath9k/eeprom.c:483:44: error: array subscript is above array bounds [-Werror=array-bounds]
maxPwrT4[i] = data_9287[idxL].pwrPdg[i][4];
~~~~~~~~~~~~~~~~~~~~~~~~~^~~

It turns out that the correct array length exists in the local
'intercepts' variable of this function, so we can just use that
instead of hardcoding '4', so this patch changes all three
instances to use that variable. The other two instances were
already correct, but it's more consistent this way.
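
The idea of indexing with the real row length instead of a hardcoded
constant can be sketched as follows. The function and parameter names are
hypothetical, and the table is a flat array with 'intercepts' entries per
gain row; this is not the driver's data layout.

```c
#include <stddef.h>

/*
 * For each gain row, read the last intercept. The row length is passed
 * in, so the index is always intercepts - 1 and stays in bounds whether
 * a table variant has 4 or 5 entries per row.
 */
static void demo_max_power(const int *table, size_t gains,
			   size_t intercepts, int *max_pwr)
{
	for (size_t i = 0; i < gains; i++)
		max_pwr[i] = table[i * intercepts + (intercepts - 1)];
}
```

With a hardcoded index of 4, a row of only four entries would be read one
element past its end.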

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 940cd2c12ebf ("ath9k_hw: merge the ar9287 version of ath9k_hw_get_gain_boundaries_pdadcs")
Signed-off-by: David S. Miller <davem@davemloft.net>

farsync: fix off-by-one bug in fst_add_one

gcc-6 finds an out of bounds access in the fst_add_one function
when calculating the end of the mmio area:

drivers/net/wan/farsync.c: In function 'fst_add_one':
drivers/net/wan/farsync.c:418:53: error: index 2 denotes an offset greater than size of 'u8[2][8192] {aka unsigned char[2][8192]}' [-Werror=array-bounds]
#define BUF_OFFSET(X)   (BFM_BASE + offsetof(struct buf_window, X))
                                                     ^
include/linux/compiler-gcc.h:158:21: note: in definition of macro '__compiler_offsetof'
  __builtin_offsetof(a, b)
                     ^
drivers/net/wan/farsync.c:418:37: note: in expansion of macro 'offsetof'
#define BUF_OFFSET(X)   (BFM_BASE + offsetof(struct buf_window, X))
                                     ^~~~~~~~
drivers/net/wan/farsync.c:2519:36: note: in expansion of macro 'BUF_OFFSET'
                                  + BUF_OFFSET ( txBuffer[i][NUM_TX_BUFFER][0]);
                                    ^~~~~~~~~~

The warning is correct, but not critical because this appears
to be a write-only variable that is set by each WAN driver but
never accessed afterwards.

I'm taking the minimal fix here, using the correct pointer by
pointing 'mem_end' to the last byte inside of the register area
as all other WAN drivers do, rather than the first byte outside of
it. An alternative would be to just remove the mem_end member
entirely.
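
A minimal sketch of the "last byte inside" convention, assuming a
region described by a start address and a length. region_last_byte()
is a hypothetical helper, not a function in the driver.

```c
#include <assert.h>

/*
 * Address of the last byte inside a region, as opposed to the first
 * byte past it (start + len), which is what triggers the warning when
 * it is computed by indexing one element beyond an array.
 */
static unsigned long region_last_byte(unsigned long start, unsigned long len)
{
	assert(len > 0);
	return start + len - 1;
}
```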

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: add missing braces in verify_qp_parameters

The implementation of QP paravirtualization back in linux-3.7 included
some code that looks very dubious, and gcc-6 has grown smart enough
to warn about it:

drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
     if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) {
     ^~
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not
    if (slave != mlx4_master_func_num(dev))

From looking at the context, I'm reasonably sure that the indentation
is correct but that it should have contained curly braces from the
start, as the update_gid() function in the same patch correctly does.
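
To show why the braces matter, here is a small standalone sketch with
hypothetical names (it is not the mlx4 code). The first function is
laid out the way the compiler parses a brace-less nested 'if', the
second is what the indentation suggests was intended.

```c
#include <stdbool.h>

/* How the compiler reads it: the second check is NOT guarded by is_slave. */
static int count_as_parsed(bool is_slave, bool primary, bool alt)
{
	int checks = 0;

	if (is_slave)
		if (primary)
			checks++;
	if (alt)		/* runs even when is_slave is false */
		checks++;
	return checks;
}

/* With braces, both checks are guarded by is_slave. */
static int count_as_intended(bool is_slave, bool primary, bool alt)
{
	int checks = 0;

	if (is_slave) {
		if (primary)
			checks++;
		if (alt)
			checks++;
	}
	return checks;
}
```

The two functions only differ when is_slave is false and alt is true.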

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 54679e148287 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop")
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: check device_reset return code

The device_reset() function may fail, so we have to check
its return value, e.g. to make deferred probing work correctly.
gcc warns about it because of the warn_unused_result attribute:

drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function 'mtk_probe':
drivers/net/ethernet/mediatek/mtk_eth_soc.c:1679:2: error: ignoring return value of 'device_reset', declared with attribute warn_unused_result [-Werror=unused-result]

This adds the trivial error check to propagate the return value
to the generic platform device probe code.
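
The pattern can be sketched in userspace as follows. Everything here
is a stand-in: fake_device, fake_device_reset() and fake_probe() are
hypothetical, FAKE_EPROBE_DEFER mirrors the kernel's EPROBE_DEFER
value, and the attribute syntax assumes gcc or clang.

```c
#define FAKE_EPROBE_DEFER 517	/* stand-in for the kernel's EPROBE_DEFER */

struct fake_device {
	int reset_ready;	/* is the reset controller available yet? */
	int probed;
};

/* Ignoring this return value produces a compiler warning. */
__attribute__((warn_unused_result))
static int fake_device_reset(struct fake_device *dev)
{
	return dev->reset_ready ? 0 : -FAKE_EPROBE_DEFER;
}

static int fake_probe(struct fake_device *dev)
{
	int err;

	err = fake_device_reset(dev);
	if (err)
		return err;	/* let the caller retry the probe later */

	dev->probed = 1;
	return 0;
}
```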

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: remove incorrect dma_mask assignment

Device drivers should not mess with the DMA mask directly,
but instead call dma_set_mask() etc if needed.

In case of the mtk_eth_soc driver, the mask already gets set
correctly when the device is created, and setting it again
is against the documented API.

This removes the incorrect setting.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: use dma_addr_t correctly

dma_alloc_coherent() expects a dma_addr_t pointer as its argument,
not an 'unsigned int', and gcc correctly warns about broken
code in the mtk_init_fq_dma function:

drivers/net/ethernet/mediatek/mtk_eth_soc.c: In function 'mtk_init_fq_dma':
drivers/net/ethernet/mediatek/mtk_eth_soc.c:463:13: error: passing argument 3 of 'dma_alloc_coherent' from incompatible pointer type [-Werror=incompatible-pointer-types]

This changes the type of the local variable to dma_addr_t.
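
A userspace sketch of why the type matters, assuming a configuration
where the DMA address type is 64 bits wide. fake_dma_addr_t and
fake_alloc_coherent() are hypothetical stand-ins for the kernel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t fake_dma_addr_t;	/* assumption: 64-bit DMA addresses */

/*
 * Allocate a buffer and report its "bus address" through 'handle'.
 * The callee writes sizeof(fake_dma_addr_t) bytes, so a caller that
 * passed a pointer to a narrower 'unsigned int' would have memory
 * next to that variable overwritten.
 */
static void *fake_alloc_coherent(size_t size, fake_dma_addr_t *handle)
{
	void *cpu_addr = calloc(1, size);

	if (cpu_addr)
		*handle = (fake_dma_addr_t)(uintptr_t)cpu_addr;
	return cpu_addr;
}
```

Declaring the local variable with the handle's own type keeps the
caller correct regardless of how wide that type is on a given build.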

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix use after free in the recvmmsg exit path

The syzkaller fuzzer hit the following use-after-free:

  Call Trace:
   [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
   [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
   [<     inline     >] SYSC_recvmmsg net/socket.c:2281
   [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
   [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
  arch/x86/entry/entry_64.S:185

And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which makes recvmmsg set
sock->sk->sk_err. Fix it.
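
The ordering rule behind the fix can be sketched with a toy reference
count. fake_sock, fake_put() and finish_batch() are hypothetical and
only model "record the error first, drop the reference last"; they do
not reproduce the real function in net/socket.c.

```c
#include <assert.h>

struct fake_sock {
	int refs;
	int err;
	int freed;	/* set instead of really freeing, so it stays testable */
};

static void fake_put(struct fake_sock *s)
{
	if (--s->refs == 0)
		s->freed = 1;
}

/* 'err' is zero or a negative errno, 'datagrams' is what was received. */
static int finish_batch(struct fake_sock *s, int datagrams, int err)
{
	if (err && datagrams > 0) {
		assert(!s->freed);	/* still holding our reference here */
		s->err = -err;		/* report the error on the next call */
	}
	fake_put(s);			/* last use of 's' in this function */
	return datagrams > 0 ? datagrams : err;
}
```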

Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'thunderx-perf'

Sunil Goutham says:

====================
net: thunderx: Performance enhancement changes

The patches below attempt to improve performance by reducing the
number of atomic operations while allocating new receive buffers
and reducing cache misses by adjusting nicvf structure elements.

Changes from v1:
No changes; resubmitting afresh as per David's suggestion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Adjust nicvf structure to reduce cache misses

Adjusted the nicvf structure such that all elements used in the hot
path (napi, xmit, etc.) fall into the same cache line. This reduced
the number of cache misses and resulted in a ~2% increase in the
number of packets handled on a core.

Also modified elements with :1 notation to boolean, to be
consistent with other element definitions.
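
A rough sketch of the layout idea, assuming a 64-byte cache line and a
structure whose start is cache-line aligned. struct fake_nic and its
fields are hypothetical and do not reflect the real nicvf layout.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINE 64	/* assumption: cache line size in bytes */

struct fake_nic {
	/* hot path: grouped at the front so they share one cache line */
	void		*napi;
	void		*tx_queue;
	unsigned int	rx_budget;
	bool		link_up;	/* plain bool rather than a :1 bitfield */

	/* cold: configuration and statistics */
	char		name[64];
	unsigned long	stats[16];
};

/* True if two member offsets land in the same cache line. */
static bool same_cache_line(size_t off_a, size_t off_b)
{
	return off_a / CACHE_LINE == off_b / CACHE_LINE;
}
```

Checking offsets with offsetof() in this way is one option for
confirming that a reordering had the intended effect.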

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: Set receive buffer page usage count in bulk

Instead of calling get_page() for every receive buffer carved out
of a page, set the page's usage count once at the end, to reduce the
number of atomic calls.
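
The batching idea can be sketched with C11 atomics as below.
fake_page and carve_buffers() are hypothetical and only model the
counting, not the driver's buffer management.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct fake_page {
	atomic_int refcount;
};

/*
 * Carve as many buf_size buffers as fit into the page and take all
 * the references with one atomic add, instead of one atomic
 * increment per buffer.
 */
static int carve_buffers(struct fake_page *page, size_t page_size,
			 size_t buf_size)
{
	int carved = 0;
	size_t off;

	assert(buf_size > 0);
	for (off = 0; off + buf_size <= page_size; off += buf_size)
		carved++;	/* plain counter, no atomic operation here */

	if (carved)
		atomic_fetch_add(&page->refcount, carved);
	return carved;
}
```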

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>