git.karo-electronics.de Git - karo-tx-linux.git/log

igb: add statistic indicating number of skipped Tx timestamps

The igb driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: add statistic indicating number of skipped Tx timestamps

The e1000e driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: avoid permanent lock of *_PTP_TX_IN_PROGRESS

The igb driver uses a state bit lock to avoid handling more than one Tx
timestamp request at once. This is required because hardware is limited
to a single set of registers for Tx timestamps.

The state bit lock is not properly cleaned up during
igb_xmit_frame_ring() if the transmit fails such as due to DMA or TSO
failure. In some hardware this results in blocking timestamps until the
service task times out. In other hardware this results in a permanent
lock of the timestamp bit because we never receive an interrupt
indicating the timestamp occurred, since indeed the packet was never
transmitted.

Fix this by checking for DMA and TSO errors in igb_xmit_frame_ring() and
properly cleaning up after ourselves when these occur.

Reported-by: Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: fix race condition with PTP_TX_IN_PROGRESS bits

Hardware related to the igb driver has a limitation of only handling one
Tx timestamp at a time. Thus, the driver uses a state bit lock to
enforce that only one timestamp request is honored at a time.

Unfortunately this suffers from a simple race condition. The bit lock is
not cleared until after skb_tstamp_tx() is called notifying the stack of
a new Tx timestamp. Even a well behaved application which sends only one
timestamp request at once and waits for a response might wake up and
send a new packet before the bit lock is cleared. This results in
needlessly dropping some Tx timestamp requests.

We can fix this by unlocking the state bit as soon as we read the
Timestamp register, as this is the first point at which it is safe to
unlock.

To avoid issues with the skb pointer, we'll use a copy of the pointer
and set the global variable in the driver structure to NULL first. This
ensures that the next timestamp request does not modify our local copy
of the skb pointer.

This ensures that well behaved applications do not accidentally race
with the unlock bit. Obviously an application which sends multiple Tx
timestamp requests at once will still only timestamp one packet at
a time. Unfortunately there is nothing we can do about this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: fix race condition around skb_tstamp_tx()

The e1000e driver and related hardware has a limitation on Tx PTP
packets which requires we limit to timestamping a single packet at once.
We do this by verifying that we never request a new Tx timestamp while
we still have a tx_hwtstamp_skb pointer.

Unfortunately the driver suffers from a race condition around this. The
tx_hwtstamp_skb pointer is not set to NULL until after skb_tstamp_tx()
is called. This function notifies the stack and applications of a new
timestamp. Even a well behaved application that only sends a new request
when the first one is finished might be woken up and possibly send
a packet before we can free the timestamp in the driver again. The
result is that we needlessly ignore some Tx timestamp requests in this
corner case.

Fix this by assigning the tx_hwtstamp_skb pointer prior to calling
skb_tstamp_tx() and use a temporary pointer to hold the timestamped skb
until that function finishes. This ensures that the application is not
woken up until the driver is ready to begin timestamping a new packet.

This ensures that well behaved applications do not accidentally race
with condition to skip Tx timestamps. Obviously an application which
sends multiple Tx timestamp requests at once will still only timestamp
one packet at a time. Unfortunately there is nothing we can do about
this.

Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: mark PM functions as __maybe_unused

The new wake function is only used by the suspend/resume handlers that
are defined in inside of an #ifdef, which can cause this harmless
warning:

drivers/net/ethernet/intel/igb/igb_main.c:7988:13: warning: 'igb_deliver_wake_packet' defined but not used [-Wunused-function]

Removing the #ifdef, instead using a __maybe_unused annotation
simplifies the code and avoids the warning.

Fixes: b90fa8763560 ("igb: Enable reading of wake up packet")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Explicitly select page 0 at initialization

The functions igb_read_phy_reg_gs40g/igb_write_phy_reg_gs40g (which were
removed in 2a3cdea) explicitly selected the required page at every phy_reg
access. Currently, igb_get_phy_id_82575 relays on the fact that page 0 is
already selected. The assumption is not fulfilled for my Lex 3I380CW
motherboard with integrated dual i211 based gigabit ethernet. This leads to igb
initialization failure and network interfaces are not working:

    igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
    igb: Copyright (c) 2007-2014 Intel Corporation.
    igb: probe of 0000:01:00.0 failed with error -2
    igb: probe of 0000:02:00.0 failed with error -2

In order to fix it, we explicitly select page 0 before first access to phy
registers.

See also: https://bugzilla.suse.com/show_bug.cgi?id=1009911
See also: http://www.lex.com.tw/products/pdf/3I380A&3I380CW.pdf

Fixes: 2a3cdea ("igb: Remove GS40G specific defines/functions")
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Matwey V Kornilov <matwey@sai.msu.ru>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes. One to address RCU delay of cpuset removal affecting
  userland visible behaviors. The other fixes a race condition between
  controller disable and cgroup removal"

* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: consider dying css as offline
  cgroup: Prevent kill_css() from being called more than once

Merge branch 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:

- Revert of sata_mv devm_ioremap_resource() conversion. It made init
   fail if there are overlapping resources which led to detection
   failures on some setups.

- A workaround for an Acer laptop which sometimes reports corrupt port
   map.

- Other non-critical fixes.

* 'for-4.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix error checking in in ata_parse_force_one()
  Revert "ata: sata_mv: Convert to devm_ioremap_resource()"
  ata: libahci: properly propagate return value of platform_get_irq()
  ata: sata_rcar: Handle return value of clk_prepare_enable
  ahci: Acer SA5-271 SSD Not Detected Fix

mdio: mux: fix an incorrect less than zero error check using a u32

The u32 variable v is being checked to see if an error return is
less than zero and this check has no effect because it is unsigned.
Fix this by making v and int (this also matches the type of
cb->bus_number which is assigned to the value in v).

Detected by CoverityScan, CID#1440454 ("Unsigned compared against zero")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'iwlwifi-for-kalle-2017-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes

Fixes for 4.12:

* Some memory leaks;
* IBSS support;
* Some bugzilla bugs;
* Some runtime PM fixes;
* Rate-scaling issues;
* Some locking problems;

iwlwifi: fix host command memory leaks

Sending host command with CMD_WANT_SKB flag demands the release of the
response buffer with iwl_free_resp function.
The patch adds the memory release in all the relevant places

Signed-off-by: Shahar S Matityahu <shahar.s.matityahu@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: fix min API version for 7265D, 3168, 8000 and 8265

In a previous commit, we removed support for API versions earlier than
22 for these NICs. By mistake, the *_UCODE_API_MIN definitions were
set to 17. Fix that.

Fixes: 4b87e5af638b ("iwlwifi: remove support for fw older than -17 and -22")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: clear new beacon command template struct

Clear the struct so that all reserved fields are zero when we
send the struct down to the device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: don't fail when removing a key from an inexisting sta

The iwl_mvm_remove_sta_key() function handles removing a key when the
sta doesn't exist anymore. Mistakenly, this was changed to return an
error while fixing another bug.

If the mvm_sta doesn't exist, we continue normally, but just don't try
to remove the igtk key.

Fixes: cd4d23c1ea9b ("iwlwifi: mvm: Fix removal of IGTK")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: pcie: only use d0i3 in suspend/resume if system_pm is set to d0i3

We only need to handle d0i3 entry and exit during suspend resume if
system_pm is set to IWL_PLAT_PM_MODE_D0I3, otherwise d0i3 entry
failures will cause suspend to fail.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=194791

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: fix firmware debug restart recording

When we want to stop the recording of the firmware debug
and restart it later without reloading the firmware we
don't need to resend the configuration that comes with
host commands.
Sending those commands confused the hardware and led to
an NMI 0x66.

Change the flow as following:
* read the relevant registers (DBGC_IN_SAMPLE, DBGC_OUT_CTRL)
* clear those registers
* wait for the hardware to complete its write to the buffer
* get the data
* restore the value of those registers (to restart the
recording)

For early start (where the configuration is already
compiled in the firmware), we don't need to set those
registers after the firmware has been loaded, but only
when we want to restart the recording without having
restarted the firmware.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: tt: move ucode_loaded check under mutex

The ucode_loaded check should be under the mutex, since it can
otherwise change state after we looked at it and before we got
the mutex. Fix that.

Fixes: 5c89e7bc557e ("iwlwifi: mvm: add registration to cooling device")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: support ibss in dqa mode

Allow working IBSS also when working in DQA mode.
This is done by setting it to treat the queues the
same as a BSS AP treats the queues.

Fixes: 7948b87308a4 ("iwlwifi: mvm: enable dynamic queue allocation mode")
Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: Fix command queue number on d0i3 flow

During d0i3 flow we flush all the queue except from the command queue.
Currently, in this flow the command queue is hard coded to 9.
In DQA the command queue number has changed from 9 to 0.
Fix that.

This fixes a problem in runtime PM resume flow.

Fixes: 097129c9e625 ("iwlwifi: mvm: move cmd queue to be #0 in dqa mode")
Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

iwlwifi: mvm: rs: start using LQ command color

Up until now, the driver was comparing the rate reported by the FW and
the rate of the latest LQ command to avoid processing data belonging
to the old LQ command. Recently, FW changed the meaning of the initial
rate field in tx response and it holds the actual rate (which is not
necessarily the initial rate of LQ's rate table). Use instead LQ cmd
color to be able to filter out tx responses/BA notifications which
where sent during earlier LQ commands' time frame.

This fixes some throughput degradation in noisy environments.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

sparc64: Add __multi3 for gcc 7.x and later.

Reported-by: Waldemar Brodkorb <wbx@openadk.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
"Three fixes this time around:

   - Two fixes for noMMU, fixing the decompressor header layout, and
     preventing a build error with some configurations.

   - Fixing the hyp-stub updates that went in during the merge window
     for platforms that use MCPM"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M
  ARM: 8676/1: NOMMU: provide pgprot_device() macro
  ARM: 8675/1: MCPM: ensure not to enter __hyp_soft_restart from loopback and cpu_power_down

net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport

The Granular QoS per VF feature must be enabled in FW before it can be
used.

Thus, the driver cannot modify a QP's qos_vport value (via the UPDATE_QP FW
command) if the feature has not been enabled -- the FW returns an error if
this is attempted.

Fixes: 08068cd5683f ("net/mlx4: Added qos_vport QP configuration in VST mode")
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: fix kernel-doc warnings

Fix kernel-doc warnings (typo) in drivers/net/phy/phy.c:

..//drivers/net/phy/phy.c:259: warning: No description found for parameter 'features'
..//drivers/net/phy/phy.c:259: warning: Excess function parameter 'feature' description in 'phy_lookup_setting'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

devlink: fix potential memort leak

We must free allocated skb when genlmsg_put() return fails.

Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: stmmac: dwmac-sun8i: ensure the EPHY is properly reseted

The EPHY may be already enabled by bootloaders which have Ethernet
capability (e.g. current U-Boot). Thus it should be reseted properly
before doing the enabling sequence in the dwmac-sun8i driver, otherwise
the EMAC reset process may fail if no cable is plugged, and then fail
the dwmac-sun8i probing.

Tested on Orange Pi PC, One and Zero. All the boards fail to have
dwmac-sun8i probed with "EMAC reset timeout" without cable plugged
before, and with this fix they're now all able to successfully probe the
EMAC without cable plugged and then use the connection after a cable is
hot-plugged in.

Fixes: 9f93ac8d408 ("net-next: stmmac: Add dwmac-sun8i")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Reviewed-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: is not as formal as Signed-off-by:. It is a record that the acker
Reviewed-by: is similar.
Signed-off-by: David S. Miller <davem@davemloft.net>

net/3com: Make el3_netdev_get_ecmd return void

Make return value void since function never returns meaningfull value.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/{mii, smsc}: Make mii_ethtool_get_link_ksettings and smc_netdev_get_ecmd return void

Make return value void since functions never returns meaningfull value.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/dec: Make __de_get_link_ksettings return void

Make return value void since function never return meaningfull value

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: select cls when cls_act is enabled

It really makes no sense to have cls_act enabled without cls. In that
case, the cls_act code is dead. So select it.

This also fixes an issue recently reported by kbuild robot:
[linux-next:master 1326/4151] net/sched/act_api.c:37:18: error: implicit declaration of function 'tcf_chain_get'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: db50514f9a9c ("net: sched: add termination action to allow goto chain")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

genetlink: remove ops_list from genetlink header.

commit d91824c08fbc ("genetlink: register family ops as array") removed the
ops_list member from both genl_family and genl_ops; while the
documentation of genl_family was updated accordingly by this patch,
ops_list remained in the documentation of the genl_ops object.
This patch fixes it by removing ops_list from genl_ops documentation.

Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Update TCP congestion control documentation

Update tcp.txt to fix mandatory congestion control ops and default
CCA selection. Also, fix comment in tcp.h for undo_cwnd.

Signed-off-by: Anmol Sarma <me@anmolsarma.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: Add service upgrade support for client connections

Make it possible for a client to use AuriStor's service upgrade facility.

The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call.  This takes no parameters.

When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt.  If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.

Note that:

(1) The choice of upgrade service is up to the server

(2) Further client calls to the same server that would share a connection
     are blocked if an upgrade probe is in progress.

(3) This should only be used to probe the service.  Clients should then
     use the returned service ID in all subsequent communications with that
     server (and not set the upgrade).  Note that the kernel will not
     retain this information should the connection expire from its cache.

(4) If a server that supports upgrading is replaced by one that doesn't,
     whilst a connection is live, and if the replacement is running, say,
     OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
     server will not respond to packets sent to the upgraded connection.

     At this point, calls will time out and the server must be reprobed.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Implement service upgrade

Implement AuriStor's service upgrade facility.  There are three problems
that this is meant to deal with:

(1) Various of the standard AFS RPC calls have IPv4 addresses in their
     requests and/or replies - but there's no room for including IPv6
     addresses.

(2) Definition of IPv6-specific RPC operations in the standard operation
     sets has not yet been achieved.

(3) One could envision the creation a new service on the same port that as
     the original service.  The new service could implement improved
     operations - and the client could try this first, falling back to the
     original service if it's not there.

     Unfortunately, certain servers ignore packets addressed to a service
     they don't implement and don't respond in any way - not even with an
     ABORT.  This means that the client must then wait for the call timeout
     to occur.

What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats.  Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.

In the protocol, the procedure is then as follows:

(1) To request an upgrade, the first DATA packet in a new connection must
     have the userStatus set to 1 (this is normally 0).  The userStatus
     value is normally ignored by the server.

(2) If the server doesn't support upgrading, the reply packets will
     contain the same service ID as for the first request packet.

(3) If the server does support upgrading, all future reply packets on that
     connection will contain the new service ID and the new service ID will
     be applied to *all* further calls on that connection as well.

(4) The RPC op used to probe the upgrade must take the same request data
     as the shadow call in the upgrade set (but may return a different
     reply).  GetCapability RPC ops were added to all standard sets for
     just this purpose.  Ops where the request formats differ cannot be
     used for probing.

(5) The client must wait for completion of the probe before sending any
     further RPC ops to the same destination.  It should then use the
     service ID that recvmsg() reported back in all future calls.

(6) The shadow service must have call definitions for all the operation
     IDs defined by the original service.

To support service upgrading, a server should:

(1) Call bind() twice on its AF_RXRPC socket before calling listen().
     Each bind() should supply a different service ID, but the transport
     addresses must be the same.  This allows the server to receive
     requests with either service ID.

(2) Enable automatic upgrading by calling setsockopt(), specifying
     RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
     unsigned shorts as the argument:

unsigned short optval[2];

     This specifies a pair of service IDs.  They must be different and must
     match the service IDs bound to the socket.  Member 0 is the service ID
     to upgrade from and member 1 is the service ID to upgrade to.

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Permit multiple service binding

Permit bind() to be called on an AF_RXRPC socket more than once (currently
maximum twice) to bind multiple listening services to it. There are some
restrictions:

(1) All bind() calls involved must have a non-zero service ID.

(2) The service IDs must all be different.

(3) The rest of the address (notably the transport part) must be the same
in all (a single UDP socket is shared).

(4) This must be done before listen() or sendmsg() is called.

This allows someone to connect to the service socket with different service
IDs and lays the foundation for service upgrading.

The service ID used by an incoming call can be extracted from the msg_name
returned by recvmsg().

Signed-off-by: David Howells <dhowells@redhat.com>

rxrpc: Separate the connection's protocol service ID from the lookup ID

Keep the rxrpc_connection struct's idea of the service ID that is exposed
in the protocol separate from the service ID that's used as a lookup key.

This allows the protocol service ID on a client connection to get upgraded
without making the connection unfindable for other client calls that also
would like to use the upgraded connection.

The connection's actual service ID is then returned through recvmsg() by
way of msg_name.

Whilst we're at it, we get rid of the last_service_id field from each
channel. The service ID is per-connection, not per-call and an entire
connection is upgraded in one go.

Signed-off-by: David Howells <dhowells@redhat.com>

ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M

As reported by Patrice, the header layout of the decompressor is
incorrect when building for v7-M. In this case, the __nop macro
resolves to 'mov r0, r0', which is emitted as a narrow encoding,
resulting in the header data fields to end up at lower offsets than
required.

Given the variety of targets we need to support with the same code,
the startup sequence is a bit of a jumble, and uses instructions
and macros whose encoding widths cannot be specified (badr), or only
exist in a narrow encoding (bx)

So force the use of a wide encoding in __nop, and replace the start
sequence with a simple jump to the label marking the start of code,
preceded by a Thumb2 mode switch if required (using explicit wide
encodings where appropriate). The label itself can be moved to the
start of code [where it belongs] due to the larger range of branch
instructions as compared to adr instructions.

Reported-by: Patrice CHOTARD <patrice.chotard@st.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

ARM: 8676/1: NOMMU: provide pgprot_device() macro

NOMMU build leads to the following error:

  CC      drivers/pci/mmap.o
drivers/pci/mmap.c: In function 'pci_mmap_resource_range':
drivers/pci/mmap.c:60:3: error: implicit declaration of function 'pgprot_device' [-Werror=implicit-function-declaration]
   vma->vm_page_prot = pgprot_device(vma->vm_page_prot);
   ^

cc1: some warnings being treated as errors
scripts/Makefile.build:302: recipe for target 'drivers/pci/mmap.o' failed
make[2]: *** [drivers/pci/mmap.o] Error 1
scripts/Makefile.build:561: recipe for target 'drivers/pci' failed
make[1]: *** [drivers/pci] Error 2
Makefile:1016: recipe for target 'drivers' failed
make: *** [drivers] Error 2

Fix it with support of pgprot_device() macro for NOMMU.

Fixes: 00d2904ffeac ("ARM/PCI: Use generic pci_mmap_resource_range()")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Merge branch 'mlxsw-Minor-cleanup'

Jiri Pirko says:

====================
mlxsw: Minor cleanup

Fix small issues I noticed during the refactoring.

First patch adds file name comments in the header file to make it clear
what goes where. Second patch fixes a typo and third patch simply aligns
RIF index allocation with similar allocations in the driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_router: Align RIF index allocation with existing code

The way we usually allocate an index is by letting the allocation
function return an error instead of an invalid index.

Do the same for RIF index.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: Fix typo inside enumeration

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Tidy up header file

Make it clear where functions are defined and move misplaced declaration
to their correct place.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Rename the firmware file

Change the firmware file name to be in "mellanox" directory.

This commit is a followup to the linux-firmware commit a4c72696f5f4
("Mellanox: Add firmware for mlxsw_spectrum")

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Fix the check in attaching steering rules

Our previous patch (cited below) introduced a regression
for RAW Eth QPs.

Fix it by checking if the QP number provided by user-space
exists, hence allowing steering rules to be added for valid
QPs only.

Fixes: 89c557687a32 ("net/mlx4_en: Avoid adding steering rules with invalid ring")
Reported-by: Or Gerlitz <gerlitz.or@gmail.com>
Signed-off-by: Talat Batheesh <talatb@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-vf-xdp'

Yuval Mintz says:

qed*: Support VF XDP attachment

====================
Each driver queue [Rx, Tx, XDP-forwarding] requires an allocated HW/FW
connection + configured queue-zone.

VF handling by the PF has several limitations that prevented adding the
capability to perform XDP at driver-level:

- The VF assumes there's 1-to-1 correspondance between the VF queue and
   the used connection, meaning q<x> is always going to use cid<x>,
   whereas for its own queues the PF is acquiring a new cid per each new
   queue.

- There's a 1-to-1 correspondate between the VF-queues and the HW queue
   zones. While this is necessary for Rx-queues [as the queue-zone
   contains the producer], transmission queues can share the underlaying
   queue-zone [only shared configuration is coalescing].
   But all VF<->PF communication mechanisms assume there's a single
   identifier that identify a queue [as queue-zone == queue], while
   sharing queue-zones requires passing additional information.

- VFs currently don't try mapping a doorbell bar - there's a small
   doorbell window in the regview allowing VFs to doorbell up to 16
   connections; but this window isn's wide enough for the added XDP
   forwarding queues.

This series is going to add the necessary infrastrucutre to finally let
our VFs support XDP assuming both the PF and VF drivers are sufficiently
new [Legacy support would be retained both for older VFs and older PFs,
but both will be needed for this new support to work].
Basically, the various database driver maintains for its queue-cids
would be revised, and queue-cids would be identified using the
(queue-zone, unique index) pair. The TLV mechanism would then be
extended to allow VFs to communicate that unique-index as well as the
already provided queue-zone. Finally, the VFs would try to map their
doorbell bar and inform their PF that they're using it.

Almost all the changes are in qed, with exception of #3 [which does some
cleanup in qede as well] and #11 that actually enables the feature.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qede: VF XDP support

This introduces 2 changes needed for XDP to be supported for VFs:

a. On VF-side, publish the NDO based on qed outputs

b. On PF-side, request qed to allocate sufficient cids per-VF
to allow the child vfs to support it

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: VF XDP support

The final addition on the qed front -
- VFs would now require their PFs to provide multiple CIDs
- Based on the availability of connections from PF, determine whether
XDP is feasible and share it with qede via dev_info.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: VFs to try utilizing the doorbell bar

VFs are currently not mapping their doorbell bar, instead relying
on the small doorbell window they have in their limited regview bar.

In order to increase the number of possible Tx connections [queues]
employeed by VF past 16, we need to start using the doorbell bar if
one such is exposed - VF would communicate this fact to PF which would
return the size-bar internally configured into chip, according to
which the VF would decide whether to actually utilize the doorbell
bar.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Multiple qzone queues for VFs

This adds the infrastructure for supporting VFs that want to open
multiple transmission queues on the same queue-zone.
At this point, there are no VFs that actually request this functionality,
but later patches would remedy that.

a. VF and PF would communicate the capability during ACQUIRE;
    Legacy VFs would continue on behaving as they do today

b. PF would communicate number of supported CIDs to the VF
    and would enforce said limitation

c. Whenever VF passes a request for a given queue configuration
    it would also pass an associated index within said queue-zone

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: IOV db support multiple queues per qzone

Allow the infrastructure a PF maintains for each one of its VFs
to support multiple queue-cids on a single queue-zone.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Make VF legacy a bitfield

Until now we used to have a single VF legacy compatibility mode,
one that affected the place of the Rx producers of those VFs [mostly].

As PF would soon support allocating CIDs for VFs instead of having
a static CID<->queue configuration for them, we'll need to have
an additional legacy mode since existing VFs would need to continue
on using the older mode of operation.

Change the infrastrucutre so that the legacy would be able to indicate
which of the legacy behaviors is needed for a given VF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Assign a unique per-queue index to queue-cid

When a queue-cid is allocated, assign an index inside that's
CID's queue-zone.

For PFs and VFS, this number is going to be unique and derive
from a per-queue-zone bitmap, while for PF's VFs queues the
number is currently going to constant; Later, we'd add the
capability of a VF to communicate such an index to its PF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Pass vf_params when creating a queue-cid

We're going to need additional information for queue-cids
that a PF creates for its VFs, so start by refactoring existing
logic used for initializing said struct into receiving a structure
encapsulating the VF-specific information that needs to be provided.

This also introduces QED_QUEUE_CID_SELF - each queue-cid would hold
an indication to whether it belongs to the hw-function holding it
[whether that's a PF or a VF], or else what's the VF id it belongs
to.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed*: L2 interface to use the SB structures directly

Part of an effort of a cleaner seperation between qed and the protocol
drivers, the L2 interface is to use the SB structure for initialization
purposes opaquely.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Create L2 queue database

First step in allowing a single PF/VF to open multiple queues on
the same queue zone is to add per-hwfn database of queue-cids
as a two-dimensional array where entry would be according to
[queue zone][internal index].

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add bitmaps for VF CIDs

Each PF has a bitmap for its own ranges of CIDs, to allow easy grabbing
of an available CID when such is needed. But VFs are not using the same
mechanism, instead relying on hard-coded CIDs [ queue-index == cid ].

As an infrastructure step toward increasing number of CIDs of VFs,
the PF is going to maintain bitmaps for the VF CIDs as well -
the bitmaps would be per-VF and the ranges would be the same [in HW all
VFs of a given PF have the same mapping of CIDs, and the HW is capable
of distinguishing between those according to the VF index]

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: reload iphdr in ipip6_rcv

Since iptunnel_pull_header() can call pskb_may_pull(),
we must reload any pointer that was related to skb->head.

Fixes: a09a4c8dd1ec ("tunnels: Remove encapsulation offloads on decap")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'skb-sgvec-overflow'

Jason A. Donenfeld says:

====================
net: Avoiding stack overflow in skb_to_sgvec

The recent bug with macsec and historical one with virtio have
indicated that letting skb_to_sgvec trounce all over an sglist
without checking the length is probably a bad idea. And it's not
necessary either: an sglist already explicitly marks its last
item, and the initialization functions are diligent in doing so.
Thus there's a clear way of avoiding future overflows.

So, this patchset, from a high level, makes skb_to_sgvec return
a potential error code, and then adjusts all callers to check
for the error code. There are two situations in which skb_to_sgvec
might return such an error:

1) When the passed in sglist is too small; and
2) When the passed in skbuff is too deeply nested.

So, the first patch in this series handles the issues with
skb_to_sgvec directly, and the remaining ones then handle the call
sites.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: check return value of skb_to_sgvec always

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macsec: check return value of skb_to_sgvec always

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: check return value of skb_to_sgvec always

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipsec: check return value of skb_to_sgvec always

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow

This is a defense-in-depth measure in response to bugs like
4d6fa57b4dab ("macsec: avoid heap overflow in skb_to_sgvec"). There's
not only a potential overflow of sglist items, but also a stack overflow
potential, so we fix this by limiting the amount of recursion this function
is allowed to do. Not actually providing a bounded base case is a future
disaster that we can easily avoid here.

As a small matter of house keeping, we take this opportunity to move the
documentation comment over the actual function the documentation is for.

While this could be implemented by using an explicit stack of skbuffs,
when implementing this, the function complexity increased considerably,
and I don't think such complexity and bloat is actually worth it. So,
instead I built this and tested it on x86, x86_64, ARM, ARM64, and MIPS,
and measured the stack usage there. I also reverted the recent MIPS
changes that give it a separate IRQ stack, so that I could experience
some worst-case situations. I found that limiting it to 24 layers deep
yielded a good stack usage with room for safety, as well as being much
deeper than any driver actually ever creates.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ping: do not abuse udp_poll()

Alexander reported various KASAN messages triggered in recent kernels

The problem is that ping sockets should not use udp_poll() in the first
place, and recent changes in UDP stack finally exposed this old bug.

Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Solar Designer <solar@openwall.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Acked-By: Lorenzo Colitti <lorenzo@google.com>
Tested-By: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Fix stale cpu_switch reference after unbind then bind

Commit 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
replaced the use of dst->ds[0] with dst->cpu_switch since that is
functionally equivalent, however, we can now run into an use after free
scenario after unbinding then rebinding the switch driver.

The use after free happens because we do correctly initialize
dst->cpu_switch the first time we probe in dsa_cpu_parse(), then we
unbind the driver: dsa_dst_unapply() is called, and we rebind again.
dst->cpu_switch now points to a freed "ds" structure, and so when we
finally dereference it in dsa_cpu_port_ethtool_setup(), we oops.

To fix this, simply set dst->cpu_switch to NULL in dsa_dst_unapply()
which guarantees that we always correctly re-assign dst->cpu_switch in
dsa_cpu_parse().

Fixes: 9520ed8fb841 ("net: dsa: use cpu_switch instead of ds[0]")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-Add-BPF-support-to-all-perf_event'

Merge branch 'bpf-Add-BPF-support-to-all-perf_event'

Alexei Starovoitov says:

====================
bpf: Add BPF support to all perf_event

v3->v4: one more tweak to reject unsupported events at map
update time as Peter suggested

v2->v3: more refactoring to address Peter's feedback.
Now all perf_events are attachable and readable

v1->v2: address Peter's feedback. Refactor patch 1 to allow attaching
bpf programs to all event types and reading counters from all of them as well
patch 2 - more tests
patch 3 - address Dave's feedback and document bpf_perf_event_read()
and bpf_perf_event_output() properly
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: update perf event helper functions documentation

This commit updates documentation of the bpf_perf_event_output and
bpf_perf_event_read helpers to match their implementation.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

samples/bpf: add tests for more perf event types

$ trace_event

tests attaching BPF program to HW_CPU_CYCLES, SW_CPU_CLOCK, HW_CACHE_L1D and other events.
It runs 'dd' in the background while bpf program collects user and kernel
stack trace on counter overflow.
User space expects to see sys_read and sys_write in the kernel stack.

$ tracex6

tests reading of various perf counters from BPF program.

Both tests were refactored to increase coverage and be more accurate.

Signed-off-by: Teng Qin <qinteng@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

perf, bpf: Add BPF support to all perf_event types

Allow BPF_PROG_TYPE_PERF_EVENT program types to attach to all
perf_event types, including HW_CACHE, RAW, and dynamic pmu events.
Only tracepoint/kprobe events are treated differently which require
BPF_PROG_TYPE_TRACEPOINT/BPF_PROG_TYPE_KPROBE program types accordingly.

Also add support for reading all event counters using
bpf_perf_event_read() helper.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Fix leak in ipv6_gso_segment().

If ip6_find_1stfragopt() fails and we return an error we have to free
up 'segs' because nobody else is going to.

Fixes: 2423496af35d ("ipv6: Prevent overrun when parsing v6 header options")
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: Really delete an arp/neigh entry on "ip neigh delete" or "arp -d"

The command
  # arp -s 62.2.0.1 a:b:c:d:e:f dev eth2
adds an entry like the following (listed by "arp -an")
  ? (62.2.0.1) at 0a:0b:0c:0d:0e:0f [ether] PERM on eth2
but the symmetric deletion command
  # arp -i eth2 -d 62.2.0.1
does not remove the PERM entry from the table, and instead leaves behind
  ? (62.2.0.1) at <incomplete> on eth2

The reason is that there is a refcnt of 1 for the arp_tbl itself
(neigh_alloc starts off the entry with a refcnt of 1), thus
the neigh_release() call from arp_invalidate() will (at best) just
decrement the ref to 1, but will never actually free it from the
table.

To fix this, we need to do something like neigh_forced_gc: if
the refcnt is 1 (i.e., on the table's ref), remove the entry from
the table and free it. This patch refactors and shares common code
between neigh_forced_gc and the newly added neigh_remove_one.

A similar issue exists for IPv6 Neighbor Cache entries, and is fixed
in a similar manner by this patch.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Reviewed-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: smsc: Implement PHY statistics

Most of the PHYs supported by the SMSC driver have a counter of symbol
errors. This is 16 bit wide and wraps around when it reaches its
maximum value.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-By: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-Fixes-for-mv88e6161'

Andrew Lunn says:

====================
dsa: Fixes for mv88e6161

Testing a board with an mv88e6161 turned up two issues. The PHYs were
not found, because the wrong method to access them was used. The
statistics did not work, because the wrong snapshot method was used
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: mv88e6161 uses mv88e6320 stats snapshot

The mv88e6161 was using the wrong method to perform statistics
snapshot.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: 6161 uses global 2 for PHY access

Access to the internal PHYs of the 6161 and 6123 go through global 2
SMI registers. Fix the ops structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-mv88e6xxx-move-registers-macros'

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: move registers macros

This patchset brings no functional changes.

It is the first step of a cleanup renaming the chip header file and
moving the Register definitions _as is_ in their proper header files.

A following patchset will prefix them with the appropriate model
(MV88E6XXX_ or e.g. MV88E6390_) to respect an implicit namespace and
easily identify model subtleties in registers layout, as correctly done
in the newly added serdes.h header.
====================

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: move the Global 2 macros

Move the GLOBAL2_* macros where they belong, in the related global2.h
header.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: move the Global 1 macros

Move the GLOBAL_* macros where they belong, in the related global1.h
header. Include it in global2.c which uses GLOBAL_STATUS_IRQ_DEVICE.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: move the Port macros

Move the PORT_* macros where they belong, in the related port.h header.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: move PHY macros

Move the PHY_* macros where they belong, in the related phy.h header.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: rename chip header

The mv88e6xxx.h is meant to contains the chip structures and data.
Rename it to chip.h, as for other source/header pairs of the driver.

At the same time, ensure that relative header inclusions are separated
by a newline and sorted alphabetically.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dsa-cleanups-before-multi-CPU-port'

Florian Fainelli says:

====================
net: dsa: Cleanups before multi-CPU port

This patch series does a bunch of cleanups before we start adding support
for multiple CPU ports.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Initialize all CPU and enabled ports masks in dsa_ds_parse()

There was no reason for duplicating the code that initializes
ds->enabled_port_mask in both dsa_parse_ports_dn() and
dsa_parse_ports(), instead move this to dsa_ds_parse() which is early
enough before ops->setup() has run.

While at it, we can now make dsa_is_cpu_port() check ds->cpu_port_mask
which is a step towards being multi-CPU port capable.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Consistently use dsa_port for dsa_*_port_{apply, unapply}

We have all the information we need in dsa_port, so use it instead of
repeating the same arguments over and over again.

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: Pass dsa_port reference to ethtool setup/restore

We do not need to have a reference to a dsa_switch, instead we should
pass a reference to a CPU dsa_port, change that. This is a preliminary
change to better support multiple CPU ports.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

geneve: fix needed_headroom and max_mtu for collect_metadata

Since commit 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
when using COLLECT_METADATA geneve devices are created with too small of
a needed_headroom and too large of a max_mtu. This is because
ip_tunnel_info_af() is not valid with the device level info when using
COLLECT_METADATA and we mistakenly fall into the IPv4 case.

For COLLECT_METADATA, always use the worst case of ipv6 since both
sockets are created.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver <e@erig.me>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

sock: reset sk_err when the error queue is empty

Prior to f5f99309fa74 (sock: do not set sk_err in
sock_dequeue_err_skb), sk_err was reset to the error of
the skb on the head of the error queue.

Applications, most notably ping, are relying on this
behavior to reset sk_err for ICMP packets.

Set sk_err to the ICMP error when there is an ICMP packet
at the head of the error queue.

Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb)
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Tested-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: use PAGE_ALLOC_COSTLY_ORDER in xgbe_map_rx_buffer

xgbe_map_rx_buffer is rather confused about what PAGE_ALLOC_COSTLY_ORDER
means. It uses PAGE_ALLOC_COSTLY_ORDER-1 assuming that
PAGE_ALLOC_COSTLY_ORDER is the first costly order which is not the case
actually because orders larger than that are costly. And even that
applies only to sleeping allocations which is not the case here. We
simply do not perform any costly operations like reclaim or compaction
for those. Simplify the code by dropping the order calculation and use
PAGE_ALLOC_COSTLY_ORDER directly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rxrpc: remove redundant proc_remove call

The proc_remove call is dead code as it occurs after a return and
hence can never be called. Remove it.

Detected by CoverityScan, CID#1437743 ("Logically dead code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: consistently use dccp_write_space()

DCCP uses dccp_write_space() for sk->sk_write_space method.

Unfortunately a passive connection (as provided by accept())
is using the generic sk_stream_write_space() function.

Lets simply inherit sk->sk_write_space from the parent
instead of forcing the generic one.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: marvell: make some functions static

functions m88e1510_get_temp_critical, m88e1510_set_temp_critical and
m88e1510_get_temp_alarm can be made static as they not need to be
in global scope.

Cleans up sparse warnings:
"symbol 'm88e1510_get_temp_alarm' was not declared. Should it be static?"
"symbol 'm88e1510_get_temp_critical' was not declared. Should it be
static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netxen: remove writeq/readq function definitions

Instead of rewriting write/readq, use linux/io-64-nonatomic-lo-hi.h which
already have them.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-procfs: Use vsnprintf extension %phN

Save a bit of code by using the kernel extension.

$ size net/core/net-procfs.o*
   text    data     bss     dec     hex filename
   3701     120       0    3821     eed net/core/net-procfs.o.new
   3764     120       0    3884     f2c net/core/net-procfs.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6_tunnel: fix traffic class routing for tunnels

ip6_route_output() requires that the flowlabel contains the traffic
class for policy routing.

Commit 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets") removed the code which previously added the
traffic class to the flowlabel.

The traffic class is added here because only route lookup needs the
flowlabel to contain the traffic class.

Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets")
Signed-off-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Peter Dawson <peter.a.dawson@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 4.12-rc4

fs/ufs: Set UFS default maximum bytes per file

This fixes a problem with reading files larger than 2GB from a UFS-2
file system:

https://bugzilla.kernel.org/show_bug.cgi?id=195721

The incorrect UFS s_maxsize limit became a problem as of commit
c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
which started using s_maxbytes to avoid a page index overflow in
do_generic_file_read().

That caused files to be truncated on UFS-2 file systems because the
default maximum file size is 2GB (MAX_NON_LFS) and UFS didn't update it.

Here I simply increase the default to a common value used by other file
systems.

Signed-off-by: Richard Narron <comet.berkeley@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Will B <will.brokenbourgh2877@gmail.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: <stable@vger.kernel.org> # v4.9 and backports of c2a9737f45e2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: qcom/emac: do not use hardware mdio automatic polling

Use software polling (PHY_POLL) to check for link state changes instead
of relying on the EMAC's hardware polling feature.  Some PHY drivers
are unable to get a functioning link because the HW polling is not
robust enough.

The EMAC is able to poll the PHY on the MDIO bus looking for link state
changes (via the Link Status bit in the Status Register at address 0x1).
When the link state changes, the EMAC triggers an interrupt and tells the
driver what the new state is.  The feature eliminates the need for
software to poll the MDIO bus.

Unfortunately, this feature is incompatible with phylib, because it
ignores everything that the PHY core and PHY drivers are trying to do.
In particular:

1. It assumes a compatible register set, so PHYs with different registers
   may not work.

2. It doesn't allow for hardware errata that have work-arounds implemented
   in the PHY driver.

3. It doesn't support multiple register pages. If the PHY core switches
   the register set to another page, the EMAC won't know the page has
   changed and will still attempt to read the same PHY register.

4. It only checks the copper side of the link, not the SGMII side.  Some
   PHY drivers (e.g. at803x) may also check the SGMII side, and
   report the link as not ready during autonegotiation if the SGMII link
   is still down.  Phylib then waits for another interrupt to query
   the PHY again, but the EMAC won't send another interrupt because it
   thinks the link is up.

Cc: stable@vger.kernel.org # 4.11.x
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mdio: mux: make child bus walking more permissive and errors more verbose

If any errors are encountered while walking the device tree structure of
the MDIO bus for children, the code may silently continue, silently
exit, or throw an error and exit.  This make it difficult for device
tree writers to know there is an error.  Also, it makes any error in a
child entry of the MDIO bus be fatal for all entries.  Instead, we
should provide verbose errors describing the error and then attempt to
continue if it all possible.  Also, use of_mdio_parse_addr()

Signed-off-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dissection-and-matching-on-tos-and-ttl'

Or Gerlitz says:

====================
net: add support for dissection and matching on ip tos and ttl

The 1st two patches enable matching/classifying on ip tos and ttl by
the flow dissector and flower. The other two patches offload matching
on tcp flags and ip tos in mlx5.

The mlx5 patches touch single file/function and not interfere with
other inflight mlx5 submissions.

V2: repost as asked by Dave.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>