The driver now reports support for TSO (v4 & v6) and IP checksum offload
in addition to the previously supported features. In the data path,
skbs are checked for a non-zero gso_size and, when one is detected, sent
to an additional function that processes TSO skbs. Since the HW does not
fully support TSO, additional effort is required from the driver: it
partitions the data into MSS-sized descriptors, which are
then DMAed to the HW.
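As a rough illustration of the partitioning step (not the driver's actual
code), the sketch below splits a payload into MSS-sized chunks, one per
descriptor. The helper name split_into_mss_chunks and the sample sizes are
assumptions made for this example only.

```python
def split_into_mss_chunks(payload: bytes, mss: int) -> list:
    """Split payload into chunks of at most mss bytes (hypothetical helper)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    # Each slice models the data carried by one descriptor.
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


# Illustrative sizes only: a 4000-byte payload with an MSS of 1460 bytes.
chunks = split_into_mss_chunks(bytes(4000), 1460)
chunk_sizes = [len(c) for c in chunks]
```

Only the last chunk may be shorter than the MSS; joining the chunks gives
back the original payload.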
Signed-off-by: Vladimir Shulman <QCA_shulmanv@QCA.qualcomm.com> Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Peter Oh [Wed, 29 Jul 2015 08:58:50 +0000 (11:58 +0300)]
ath10k: initialize msdu ext. descriptor before use
Initial QCA99X0 support has a known issue with TCP Tx throughput.
All other paths, such as UDP Tx/Rx and TCP Rx, meet expectations
(> 900Mbps), but TCP Tx drops as low as 5Mbps when a single pair is
used with iperf.
The root cause turned out to be that the TSO flags are not initialized
properly, so the firmware configures TSO in the wrong way.
The TSO flags in the msdu extension descriptor must be reset
to tell the firmware that TSO is not enabled; otherwise it
could act as if TSO were enabled, which causes a huge throughput drop.
In fact, resetting only the TSO flags is enough to prevent the
unexpected behavior, but initializing the whole msdu ext. descriptor
helps guard against uncertainty the firmware could introduce, as it is
constantly updated.
Signed-off-by: Peter Oh <poh@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Add vendor/device id of QCA99X0 V2.0 to pci id table and
QCA99X0_HW_2_0_CHIP_ID_REV to ath10k_pci_supp_chips[] for
QCA99X0 to get detected by the driver.
kvalo: now QCA99X0 family of chipsets is supported by ath10k.
Tested client, AP and monitor mode with QCA9990.
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Raja Mani [Wed, 29 Jul 2015 08:40:38 +0000 (11:40 +0300)]
ath10k: increase max client to 512 in qca99x0
When max client was set to 512 in qca99x0, there was a host memory
allocation failure during wmi service ready event handling. That issue
is now resolved, so increase the max client limit from 256 to 512.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Raja Mani [Wed, 29 Jul 2015 08:40:38 +0000 (11:40 +0300)]
ath10k: fix memory alloc failure in qca99x0 during wmi svc rdy event
Host memory required by the firmware is allocated while handling
the wmi service ready event. Right now, wmi service ready is handled
in tasklet context and calls dma_alloc_coherent() with the atomic
flag (GFP_ATOMIC) to allocate the host memory needed by the firmware.
The problem is that dma_alloc_coherent() with GFP_ATOMIC fails on
platforms (at least on AP platforms) that have a small atomic
memory pool (< 2 MB). QCA99X0 requires around 2 MB of host memory
for one card, and an additional QCA99X0 card in the same platform
requires a similar amount of memory. So it is not guaranteed that
every platform has a large enough atomic memory pool.
Fix this issue by handling the wmi service ready event in workqueue
context and calling dma_alloc_coherent() with GFP_KERNEL. The mac80211 work
queue is not yet ready at the time wmi service ready is handled,
so it can't be used. Also, the register work
gets scheduled during insmod on the existing ath10k_wq and waits for
wmi service ready to complete. Neither workqueue can be used for
this purpose, so a new auxiliary workqueue is added to handle wmi service
ready.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
David Liu [Fri, 24 Jul 2015 17:25:32 +0000 (20:25 +0300)]
ath10k: enable raw encap mode and software crypto engine
This patch enables raw Rx/Tx encap mode to support software based
crypto engine. This patch introduces a new module param 'cryptmode'.
cryptmode:
0: Use hardware crypto engine globally with native Wi-Fi mode TX/RX
encapsulation to the firmware. This is the default mode.
1: Use software crypto engine globally with raw mode TX/RX
encapsulation to the firmware.
Known limitation:
A-MSDU must be disabled for RAW Tx encap mode to perform well when
heavy traffic is applied.
Testing: (by Michal Kazior <michal.kazior@tieto.com>)
Note:
- only open network tested for RAW vs nwifi performance comparison
- killer1525 (qca6174 hw2.2) is 2x2 device (hence max 866mbps)
- used iperf
- OTA, devices a few cm apart from each other, no shielding
- tcpX/udpX, X - means number of threads used
Overview:
- relative Tx performance drop is seen but is within reasonable and
expected threshold (A-MSDU must be disabled with RAW Tx)
b) Connectivity Testing
cryptmode=1
ap=iwl6205 sta1=qca988x crypto=open topology-1ap1sta OK
ap=iwl6205 sta1=qca988x crypto=wep1 topology-1ap1sta OK
ap=iwl6205 sta1=qca988x crypto=wpa topology-1ap1sta OK
ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta OK
ap=qca988x sta1=iwl6205 crypto=open topology-1ap1sta OK
ap=qca988x sta1=iwl6205 crypto=wep1 topology-1ap1sta OK
ap=qca988x sta1=iwl6205 crypto=wpa topology-1ap1sta OK
ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta OK
ap=iwl6205 sta1=qca988x crypto=open topology-1ap1sta2br OK
ap=iwl6205 sta1=qca988x crypto=wep1 topology-1ap1sta2br OK
ap=iwl6205 sta1=qca988x crypto=wpa topology-1ap1sta2br OK
ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br OK
ap=qca988x sta1=iwl6205 crypto=open topology-1ap1sta2br OK
ap=qca988x sta1=iwl6205 crypto=wep1 topology-1ap1sta2br OK
ap=qca988x sta1=iwl6205 crypto=wpa topology-1ap1sta2br OK
ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br OK
ap=iwl6205 sta1=qca988x crypto=open topology-1ap1sta2br1vlan OK
ap=iwl6205 sta1=qca988x crypto=wep1 topology-1ap1sta2br1vlan OK
ap=iwl6205 sta1=qca988x crypto=wpa topology-1ap1sta2br1vlan OK
ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br1vlan OK
ap=qca988x sta1=iwl6205 crypto=open topology-1ap1sta2br1vlan OK
ap=qca988x sta1=iwl6205 crypto=wep1 topology-1ap1sta2br1vlan OK
ap=qca988x sta1=iwl6205 crypto=wpa topology-1ap1sta2br1vlan OK
ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br1vlan OK
Note:
- each test takes all possible endpoint pairs and pings
- each pair-ping flushes arp table
- ip6 is used
endpoints: vlan0_id2, vlan1_id2
note: STA works in 4addr mode, AP has wds_sta=1
Credits:
Thanks to Michal Kazior <michal.kazior@tieto.com> who helped find the
amsdu issue, contributed a workaround (already squashed into this
patch), and contributed the throughput and connectivity tests results.
Signed-off-by: David Liu <cfliu.tw@gmail.com> Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Tested-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k: Improve performance by reducing tx_lock contention
During tx completion, tx_lock is held for longer than required, preventing
efficient refill of htt->pending_tx. Refactor the code so that only MSDU
related operations are protected by the lock.
Improves downstream performance on a dual-core ARM Freescale LS1024A
(f.k.a. Mindspeed Comcerto 2000) AP with a 3x3 client from 495 to 580 Mbps.
Other CPU bound multicore systems may also benefit.
Signed-off-by: Denton Gentry <dgentry@google.com> Signed-off-by: Avery Pennarun <apenwarr@google.com>
[mfaltesek@google.com: removed conflicting code for tracking msdu_ids.] Signed-off-by: Marty Faltesek <mfaltesek@google.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k: suppress 'failed to process fft' warning messages
When using DFS channels on ath10k, the kernel log shows the repeated warning
message 'failed to process fft: -22', typically under medium/heavy traffic.
This patch emits the warning only in driver debug (WMI events) mode,
thus reducing log file noise.
DFS and spectral scan share underlying HW mechanisms and enabling one
(DFS) enables the other (spectral scan) as far as event reporting from
firmware to driver is concerned. Spectral scan events take no part in
processing of DFS radar pulses which are delivered as distinct events,
so the fft (spectral event) warning is harmless and DFS interference
detection/protection still occurs.
Symptoms seen & fix tested in both debug & non-debug modes on TP-Link
Archer C7 v2 platform.
Signed-off-by: Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Raja Mani [Tue, 21 Jul 2015 05:22:00 +0000 (10:52 +0530)]
ath10k: extend struct htt_mgmt_tx_desc for qca99x0
The HTT_H2T_MSG_TYPE_MGMT_TX msg in 10.4 firmware carries an additional
4 bytes in htt_mgmt_tx_desc that tell the firmware at what
rate the mgmt frame has to go out on the air. It's an optional parameter;
setting this field to zero makes the firmware choose the rate automatically
and send the frame out.
These 4 bytes are missing in the current code, so 10.4 firmware
ends up reading junk in those 4 bytes and sometimes malfunctions.
Fix it by adding 4 bytes to struct htt_mgmt_tx_desc. Non-10.4 firmware
does not process those four bytes, so adding 4 bytes at the end of
struct htt_mgmt_tx_desc has no impact on other chipsets.
Signed-off-by: Raja Mani <rmani@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k: fix wrong initialization of struct channel
chandef is initialized with NULL and on the very next line, we are using it to
get channel, which is not correct. Channel should be initialized after
obtaining chandef.
Found by cppcheck:
ath/ath10k/mac.c:839]: (error) Possible null pointer dereference: chandef
Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Peter Oh [Thu, 16 Jul 2015 02:01:21 +0000 (19:01 -0700)]
ath10k: add support for qca99x0 Rx descriptors
QCA99X0 chip has an extra 4 bytes in rx_msdu_start,
20 bytes in rx_msdu_end and 20 bytes in rx_ppdu_end structure
which are used in htt_rx_desc and HTT Rx ring offset setup.
This is necessary for correct Rx on QCA99X0; otherwise Rx descriptors
will be overwritten and corrupted.
With this patch QCA988X and QCA6174 will have an extra 44 bytes
of padding in the Rx descriptor layout, which is harmless.
Signed-off-by: Peter Oh <poh@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Peter Oh [Thu, 16 Jul 2015 02:01:20 +0000 (19:01 -0700)]
ath10k: redefine rx_ppdu_end_common structure to cover qca99x0
rx_ppdu_end_common structure is valid for both qca988x and
qca6174, but not for qca99x0 since it has new additional members.
Hence update the common structure to cover qca99x0 as well.
Signed-off-by: Peter Oh <poh@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Peter Oh [Thu, 16 Jul 2015 02:01:19 +0000 (19:01 -0700)]
ath10k: update tx path to support QCA99X0
Since QCA99X0 uses the fragmentation descriptor differently from
other chips on the tx path, we need to handle it separately.
QCA99X0 uses 48 bits for the address and 16 bits for the length
out of 2 dwords, and each value has to be programmed by frag
desc base addr + msdu id, so that the hardware can retrieve the
corresponding frag data.
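A minimal sketch of packing a 48-bit address and a 16-bit length into two
32-bit dwords. The bit placement (address in the low 48 bits, length in the
top 16 bits), the little-endian byte order, and the helper names are
assumptions for illustration; the commit message does not spell out the
layout.

```python
import struct

ADDR_BITS = 48
LEN_BITS = 16


def pack_frag_desc(paddr: int, length: int) -> bytes:
    """Pack (paddr, length) into 8 bytes, i.e. two little-endian dwords."""
    if not 0 <= paddr < (1 << ADDR_BITS):
        raise ValueError("address does not fit in 48 bits")
    if not 0 <= length < (1 << LEN_BITS):
        raise ValueError("length does not fit in 16 bits")
    word = paddr | (length << ADDR_BITS)
    dword0 = word & 0xFFFFFFFF   # low 32 bits of the address
    dword1 = word >> 32          # high 16 address bits plus the 16-bit length
    return struct.pack("<II", dword0, dword1)


def unpack_frag_desc(raw: bytes):
    """Inverse of pack_frag_desc: return (paddr, length)."""
    dword0, dword1 = struct.unpack("<II", raw)
    word = dword0 | (dword1 << 32)
    return word & ((1 << ADDR_BITS) - 1), word >> ADDR_BITS


# Arbitrary illustrative values.
raw = pack_frag_desc(0x123456789ABC, 1500)
```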
Signed-off-by: Peter Oh <poh@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
It is observed that, during cold reset, a PCIe access right
after a write to SOC_GLOBAL_RESET_ADDRESS causes a
Data Bus Error and a hard system lockup. The reason
for the bus error is that PCIe needs some time to get
back to a stable state for any transaction during cold reset. Add
a delay of 20 msecs after the write to SOC_GLOBAL_RESET_ADDRESS
to fix this issue. This patch was tested on QCA988X. It was
also tested on QCA99X0, which is WIP.
Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Zefir Kurtisi [Tue, 16 Jun 2015 10:52:16 +0000 (12:52 +0200)]
ath9k: DFS - add pulse chirp detection for FCC
FCC long pulse radar (type 5) requires pulses to be
checked for chirping. This patch implements chirp
detection based on the FFT data provided for long
pulses.
A chirp is detected when a set of criteria defined
by FCC pulse characteristics is met, including
* have at least 4 FFT samples
* max_bin index moves equidistantly between samples
* the gradient is within defined range
The chirp detection has been tested with reference
radar generating devices and proved to work reliably.
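The three criteria listed above can be sketched roughly as follows. This is
not the ath9k implementation: the gradient bounds, the tolerance, and the
function name are hypothetical values chosen for illustration; only the
minimum of 4 samples comes from the text.

```python
MIN_SAMPLES = 4     # stated in the commit message
MIN_GRADIENT = 1    # hypothetical lower bound on the per-sample step
MAX_GRADIENT = 10   # hypothetical upper bound on the per-sample step
TOLERANCE = 1       # hypothetical slack when comparing steps


def looks_like_chirp(max_bins) -> bool:
    """Return True if the max_bin indices move like a chirp (sketch only)."""
    if len(max_bins) < MIN_SAMPLES:
        return False
    steps = [b - a for a, b in zip(max_bins, max_bins[1:])]
    first = steps[0]
    # Equidistant movement: every step matches the first one within the slack.
    if any(abs(s - first) > TOLERANCE for s in steps):
        return False
    # Gradient check: the step magnitude must fall inside the allowed range.
    return MIN_GRADIENT <= abs(first) <= MAX_GRADIENT
```

A steadily moving max_bin index such as 10, 13, 16, 19, 22 passes, while a
constant index or a short run of three samples does not.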
Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Zefir Kurtisi [Tue, 16 Jun 2015 09:46:42 +0000 (11:46 +0200)]
ath9k: DFS - consider ext_channel pulses only in HT40 mode
The chip reports radar pulses on extension channel
even if operating in HT20 mode. This patch adds a
sanity check for HT40 mode before it feeds pulses
on extension channel to the pattern detector.
Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
commit 2c86c275015c ("Add ipw2100 wireless driver.") introduced
HW_PHY_OFF_LOOP_DELAY (HZ / 5000), which always evaluated to 0. Stanislav
Yakovlev <stas.yakovlev@gmail.com> clarified that it should be 50
milliseconds, so it is fixed up to msecs_to_jiffies(50).
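The arithmetic can be checked with a small model. Integer division of HZ by
5000 truncates to 0 for the common tick rates used below, while a 50 ms
delay converts to a non-zero tick count. msecs_to_jiffies_model is a
simplified stand-in (round up to whole ticks) written for this example, not
the kernel function itself.

```python
def msecs_to_jiffies_model(msecs: int, hz: int) -> int:
    """Simplified model: convert milliseconds to ticks, rounding up."""
    return (msecs * hz + 999) // 1000


# Map each tick rate to (old delay in ticks, new delay in ticks).
results = {}
for hz in (100, 250, 1000):
    old_delay = hz // 5000                       # models HZ / 5000 in C
    new_delay = msecs_to_jiffies_model(50, hz)   # models msecs_to_jiffies(50)
    results[hz] = (old_delay, new_delay)
```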
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Acked-by: Stanislav Yakovlev <stas.yakovlev@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Michal Kazior [Thu, 9 Jul 2015 11:08:38 +0000 (13:08 +0200)]
ath10k: fix per-vif queue locking
Whenever any vdev was supposed to be paused, all Tx
queues were stopped (except offchannel) instead of
only those associated with the given vdev.
This caused subtle issues with
multi-channel/multi-vif scenarios, e.g.
authentication of station vif could sometimes fail
depending on fw tx pause request timing.
Fixes: b4aa539dd8f2 ("ath10k: implement tx pause wmi event") Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Michal Kazior [Thu, 9 Jul 2015 11:08:37 +0000 (13:08 +0200)]
ath10k: update vdev ps state on start
Psmode can be forcefully enabled when vdev isn't
started. It isn't guaranteed that mac80211 will
re-issue psmode setting after vdev is started
unless actual bss_conf.ps value has changed.
Even if this doesn't fix any problems now it may
prevent future breakage.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Michal Kazior [Thu, 9 Jul 2015 11:08:36 +0000 (13:08 +0200)]
ath10k: fix hw roc expiration notification
The expiration function must not be called when
roc is explicitly cancelled by mac80211. However
since fcf9844636be ("ath10k: fix hw roc
expiration") the notification was never sent when
roc actually expired.
This fixes some P2P connection setup issues.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Michal Kazior [Thu, 9 Jul 2015 11:08:35 +0000 (13:08 +0200)]
ath10k: limit multi-vif ps more aggressively
Further testing proved that multi-channel AP+STA
on QCA6174 with RM.2.0-00088 should have powersave
force-disabled to avoid beacon misses/skipping on
either side which in turn could disrupt
communication.
Since AP never has arvif->ps don't even bother
checking it. Other combinations may be broken as
well so disallow powersave with multivif outright
unless firmware advertises otherwise.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Michal Kazior [Thu, 9 Jul 2015 11:08:34 +0000 (13:08 +0200)]
ath10k: don't set cck/ofdm scan flags
mac80211 already does provide complete IEs for
Probe Requests for hw scan and ath10k firmware was
appending duplicate Supported Rates IEs
unnecessarily.
Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
ath10k: Fix target to cpu address conversion logic
In commit 418ca5992e2f ("ath10k: Make target cpu address to
CE address conversion chip specific") the mask 0x7fff was added
by mistake instead of 0x7ff. Fix this regression.
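To see what the wider mask lets through, the sketch below compares the two
masks on an arbitrary sample value. The sample value is made up for
illustration and does not come from real hardware.

```python
WRONG_MASK = 0x7fff   # 15 bits, added by mistake
RIGHT_MASK = 0x7ff    # 11 bits, the intended mask

# Bits that the wrong mask keeps but the right mask clears.
extra_bits = WRONG_MASK & ~RIGHT_MASK

sample = 0x1234       # arbitrary illustrative register value
with_wrong_mask = sample & WRONG_MASK
with_right_mask = sample & RIGHT_MASK
```

Any value with bits 11 to 14 set yields a different result under the two
masks, which is how the wrong mask can corrupt the converted address.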
Fixes: 418ca5992e2f ("ath10k: Make target cpu address to CE address conversion chip specific") Signed-off-by: Vasanthakumar Thiagarajan <vthiagar@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
commit a521ee983d312db7 ("ath10k: Add new reg_address/mask to hw register
table") broke QCA61x4 support by providing wrong
fw_indicator_address, which should have been 0x0003a028 instead of 0x00009028.
User experience was a failing boot up sequence (crashing device during
initialization):
[ 181.663874] ath10k_pci 0000:02:00.0: enabling device (0000 -> 0002)
[ 181.664787] ath10k_pci 0000:02:00.0: pci irq msi-x interrupts 8 irq_mode 0 reset_mode 0
[ 181.688886] ath10k_pci 0000:02:00.0: device has crashed during init
[ 181.688897] ath10k_pci 0000:02:00.0: failed to wait for target after cold reset: -70
[ 181.688902] ath10k_pci 0000:02:00.0: failed to reset chip: -70
[ 181.689774] ath10k_pci: probe of 0000:02:00.0 failed with error -70
Fix it by updating the address with correct value.
Fixes: a521ee983d31 ("ath10k: Add new reg_address/mask to hw register table") Signed-off-by: Bartosz Markowski <bartosz.markowski@tieto.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
David S. Miller [Sat, 11 Jul 2015 06:24:31 +0000 (23:24 -0700)]
Merge branch 'be2net-next'
Sathya Perla says:
====================
be2net: patch set
Hi David, the following patch set has code cleanup patches, minor enhancements
and non-critical fixes. Pls consider applying to the net-next tree. Thanks!
Patch 1 removes duplicate code in be_setup_wol() routine making it simpler
and more readable.
Patch 2 fixes the bridge mode return value for the ndo_bridge_getlink()
call. Instead of just relying on the SRIOV enabled state, the driver now
queries the FW for the actual bridge mode.
Patch 3 removes code for setting D0 power state as it's already done
in pci_enable_device()
Patch 4 fixes a bad return value in be_check_ufi_compatibility() routine
introduced by an earlier commit.
Patch 5 fixes a field in udp header being accessed while in network endian
format.
Patch 6 fixes the be_mcc_notify() routine to return an error status when
the FW/HW is in an error state.
Patch 7 fixes the be_cmd_rx_filter() routine to issue the RX_FILTER cmd
and not wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver times out waiting for
a response from the FW.
Patch 8 fixes the be_cmd_set_loopback() routine to issue the LOOPBACK cmd
and not wait for the FW completion while spin_lock_bh() is held on the
mcc_lock. As the cmd is always issued from ethtool in a process context,
it can sleep till the FW completion is received.
Patch 9 bumps up the driver version to 10.6.0.3
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The SET_LOOPBACK_MODE command is always issued from ethtool only in a
process context. So, while waiting for the cmd to complete, the driver
can sleep instead of holding spin_lock_bh() on the mcc_lock. This is done
by calling be_mcc_notify() instead of be_mcc_notify_wait() (that returns
only after the cmd completes while the MCCQ is locked).
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This fix makes the RX_FILTER cmd asynchronous, i.e., the caller issues
this cmd and doesn't wait for a completion from the FW. If the FW/adapter
is in an error state, this change helps in not holding up the rtnl_lock
and keeping bottom halves disabled while the driver times out waiting for
a response from the FW.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the adapter is in error state, return error from be_mcc_notify()
so that the caller routines need not sleep waiting for a response.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: convert dest field in udp-hdr to host-endian
The "dest" field in the UDP-hdr of a TX skb is in network endian format.
Convert it to host endian before accessing it. The os2bmc patch
mentioned below introduced this code.
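A small illustration of why the conversion matters on a little-endian host.
The port number is an arbitrary example, not one taken from the patch.

```python
import struct

ILLUSTRATIVE_PORT = 623  # arbitrary example port

# The UDP header carries the port in network (big-endian) byte order.
wire_bytes = struct.pack("!H", ILLUSTRATIVE_PORT)

# Correct: interpret the bytes as big-endian, as ntohs() would.
as_host_order = int.from_bytes(wire_bytes, "big")

# Wrong: what a little-endian host reads if it skips the conversion.
misread_on_le = int.from_bytes(wire_bytes, "little")
```

Comparing the unconverted value against a host-order constant would never
match on a little-endian machine.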
Fixes: 760c295e0e8d ("be2net: Support for OS2BMC") Signed-off-by: Venkat Duvvuru <VenkatKumar.Duvvuru@Emulex.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
be2net: fix wrong return value in be_check_ufi_compatibility()
In the commit a6e6ff6eee12f3e
("be2net: simplify UFI compatibility checking"), a return value of "-1"
was incorrectly used in place of "false". This patch fixes it.
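The effect of the wrong return value can be modelled as below, assuming the
result is consumed as a boolean. Python's truthiness rule for integers
matches C's here: any non-zero value, including -1, counts as true. The
function names are hypothetical.

```python
def check_compat_buggy() -> bool:
    # Models "return -1" in a function whose result is used as a boolean.
    return bool(-1)


def check_compat_fixed() -> bool:
    # Models the corrected "return false".
    return False
```

In the buggy variant a failed check reads as success to the caller.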
Fixes: a6e6ff6eee12f3e ("be2net: simplify UFI compatibility checking") Signed-off-by: Vasundhara Volam <vasundhara.volam@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The pci_enable_device() call sets the device power state to D0; there is
no need to do it again.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The current code assumes that bridge functionality (EVB) in the adapter
is enabled only when SR-IOV is enabled. This is not always true.
This patch uses the GET_HSW_CONFIG FW cmd to query this from the FW.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change will make be_setup_wol() routine more compact and readable
by removing some duplicate code.
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com> Signed-off-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Do not iterate over all interfaces when finding source address on specific interface.
If outgoing interface is specified and the candidate address is
restricted to the outgoing interface, it is enough to iterate
over that given interface only.
Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Thomson [Fri, 10 Jul 2015 01:56:54 +0000 (13:56 +1200)]
net: phy: Pass mdix ethtool setting through to phy driver
Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.
Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 9 Jul 2015 16:01:40 +0000 (18:01 +0200)]
tcp: do not export tcp_init_xmit_timers()
After commit 900f65d361d3 ("tcp: move duplicate code from
tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer
need to export tcp_init_xmit_timers()
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fill also the port group state when sending notifications.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 8 Jul 2015 23:58:22 +0000 (16:58 -0700)]
ipv6: Nonlocal bind
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.
This adds the ip_nonlocal_bind sysctl under ipv6.
Testing:
Set up nonlocal binding and receive routing on a host, e.g.:
ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1
Set up routing to 2001:0:0:1::/64 on peer to go to first host
ping6 -I 2001:0:0:1::1 peer-address -- to verify
Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Jul 2015 21:28:29 +0000 (14:28 -0700)]
inet: simplify timewait refcounting
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()
In particular, the deferred inet_twsk_put() added in commit 13475a30b66cd ("tcp: connect() race with timewait reuse")
looks unnecessary: when removing a timewait socket from
ehash or bhash, the caller must own a reference on the socket
anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dsa: mv88e6352/mv88e6xxx: Add support for Marvell 88E6320 and 88E6321
MV88E6320 and MV88E6321 are largely compatible with MV88E6352,
but are members of a different chip family.
Signed-off-by: Aleksey S. Kazantsev <ioctl@yandex.ru> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Jul 2015 21:22:53 +0000 (14:22 -0700)]
Merge branch 'tcp-in-slow-start'
Yuchung Cheng says:
====================
tcp: fixes some congestion control corner cases
This patch series fixes corner cases of TCP congestion control.
The first issue is to avoid continuing slow start when cwnd reaches ssthresh.
The second issue is the incorrect processing order of congestion state and
cwnd update when entering fast recovery or undoing cwnd.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: update congestion state first before raising cwnd
The congestion state and cwnd can be updated in the wrong order.
For example, upon receiving a dubious ACK, we incorrectly raise
the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because
the state is still Open, then enter recovery state to reduce cwnd.
For another example, if the ACK indicates spurious timeout or
retransmits, we first revert the cwnd reduction and congestion
state back to Open state. But we don't raise the cwnd even though
the ACK does not indicate any congestion.
To fix this problem we should first call tcp_fastretrans_alert() to
process the dubious ACK and update the congestion state, then call
tcp_may_raise_cwnd() that raises cwnd based on the current state.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In the original design slow start is only used to raise cwnd
when cwnd is strictly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.
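The change in policy comes down to one comparison. The sketch below
contrasts an inclusive test (slow start also when cwnd == ssthresh) with the
strict test described above; both function names are hypothetical and the
window values are arbitrary.

```python
def in_slow_start_inclusive(cwnd: int, ssthresh: int) -> bool:
    # Behaviour the text argues against: still slow starting at cwnd == ssthresh.
    return cwnd <= ssthresh


def in_slow_start_strict(cwnd: int, ssthresh: int) -> bool:
    # Original design: slow start only while cwnd is strictly below ssthresh.
    return cwnd < ssthresh
```

The two tests differ only at cwnd == ssthresh, for example right after
recovery when cwnd is reset to ssthresh.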
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Jul 2015 18:02:52 +0000 (11:02 -0700)]
net: skb_defer_rx_timestamp should check for phydev before setting up classify
This change makes it so that the call skb_defer_rx_timestamp will first
check for a phydev before going in and manipulating the skb->data and
skb->len values. By doing this we can avoid unnecessary work on network
devices that don't support phydev. As a result we reduce the total
instruction count needed to process this on most devices.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maxwell [Wed, 8 Jul 2015 00:12:28 +0000 (10:12 +1000)]
tcp: v1 always send a quick ack when quickacks are enabled
V1 of this patch contains Eric Dumazet's suggestion to move the per
dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric.
I ran some tests and after setting the "ip route change quickack 1"
knob there were still many delayed ACKs sent. This occurred
because when icsk_ack.quick=0 the !icsk_ack.pingpong value is
subsequently ignored as tcp_in_quickack_mode() checks both these
values. The condition for a quick ack to trigger requires
that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently
only icsk_ack.pingpong is controlled by the knob. But the
icsk_ack.quick value changes dynamically depending on heuristics.
The crux of the matter is that delayed acks still cannot be entirely
disabled even with the RTAX_QUICKACK per dst knob enabled. This
patch ensures that a quick ack is always sent when the RTAX_QUICKACK
per dst knob is turned on.
The "ip route change quickack 1" knob was recently added to enable
quickacks. It was modeled around the TCP_QUICKACK setsockopt() option.
The issue is that even with "ip route change quickack 1" enabled
we still see delayed ACKs under some conditions. It would be nice
to be able to completely disable delayed ACKs.
Here is an example:
# netstat -s|grep dela
3 delayed acks sent
For all routes enable the knob
# ip route change quickack 1
Generate some traffic across a slow link and we still see the delayed
acks.
# netstat -s|grep dela
106 delayed acks sent
1 delayed acks further delayed because of locked socket
The issue is that both the "ip route change quickack 1" knob and
the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0.
However at the business end in the __tcp_ack_snd_check() routine,
tcp_in_quickack_mode() checks that both icsk_ack.quick != 0
and icsk_ack.pingpong=0 in order to trigger a quickack. As
icsk_ack.quick is determined by heuristics it can be 0. When
that occurs the icsk_ack.pingpong value is ignored and a delayed
ACK is sent regardless.
This patch moves the RTAX_QUICKACK per dst check into the
tcp_in_quickack_mode() routine which ensures that a quickack is
always sent when the quickack knob is enabled for that dst.
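The decision logic before and after the patch can be modelled as two small
predicates. This is a sketch of the condition described above, not the
kernel source; the function and parameter names are assumptions.

```python
def in_quickack_mode_before(quick: int, pingpong: int) -> bool:
    # Quick ACK only if the heuristic counter is non-zero and pingpong is 0.
    return quick != 0 and pingpong == 0


def in_quickack_mode_after(dst_quickack: bool, quick: int, pingpong: int) -> bool:
    # The per-dst RTAX_QUICKACK knob now short-circuits the heuristics.
    return dst_quickack or (quick != 0 and pingpong == 0)
```

With quick == 0 the old predicate returns False regardless of pingpong,
which is the delayed-ACK case from the netstat example; with the knob set
the new predicate returns True.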
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Scott Feldman [Wed, 8 Jul 2015 23:06:47 +0000 (16:06 -0700)]
rocker: add change MTU support
Implement ndo_change_mtu: on MTU change, reallocate Rx ring bufs and signal
HW of new port MTU value.
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Tested-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Xi Wang [Wed, 8 Jul 2015 21:00:56 +0000 (14:00 -0700)]
test_bpf: extend tests for 32-bit endianness conversion
Currently "ALU_END_FROM_BE 32" and "ALU_END_FROM_LE 32" do not test if
the upper bits of the result are zeros (the arm64 JIT had such bugs).
Extend the two tests to catch this.
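What the extended tests look for can be modelled as follows, assuming a
64-bit register and a little-endian host. The exact arm64 JIT bug is not
described here, so the buggy variant below is purely hypothetical: it swaps
the low 32 bits but leaves stale upper bits in place.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def bswap32(value: int) -> int:
    """Byte-swap the low 32 bits of value."""
    return int.from_bytes((value & MASK32).to_bytes(4, "little"), "big")


def end_from_be32_correct(reg64: int) -> int:
    # The result is zero-extended: upper 32 bits are cleared.
    return bswap32(reg64)


def end_from_be32_buggy(reg64: int) -> int:
    # Hypothetical bug: the upper 32 bits of the register survive.
    return (reg64 & ~MASK32 & MASK64) | bswap32(reg64)


reg = 0x0123456789ABCDEF   # arbitrary value with non-zero upper bits
correct = end_from_be32_correct(reg)
buggy = end_from_be32_buggy(reg)
```

A test that only compares the low 32 bits cannot tell the two apart; one
that also checks the upper half can.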
Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Jul 2015 23:13:55 +0000 (16:13 -0700)]
Merge branch 'cxgb4-t6'
Hariprasad Shenai says:
====================
Cleanup, T6 changes and register range update
This patch series adds the following:
Don't use entire L2T table, update register ranges for T6 adapter,
read stats for only available channels for T6 and enable cim_la dump for
T6 adapter also.
This patch series has been created against net-next tree and includes
patches on cxgb4 driver.
We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.
====================
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: Don't use entire L2T table, use only its slice
The driver was retrieving the parameters for the bounds of its
slice of the L2T from the firmware and then throwing those away and
using the entire table. This corrects that problem.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: Add support to set MTU reservation from guest side
When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU changes.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Carol Soto [Mon, 6 Jul 2015 14:20:19 +0000 (09:20 -0500)]
net/mlx4_core: Add extra check for total vfs for SRIOV
Add extra check for total vfs for SRIOV to check if that value is
bigger than total vfs in pci SRIOV capabilities. Fix a check and
print of the number of maximum vfs that hw can handle. Fix a check
and print of the number of maximum vfs per port that driver can handle.
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Holzheu [Mon, 6 Jul 2015 14:20:07 +0000 (16:20 +0200)]
samples: bpf: enable trace samples for s390x
The trace bpf samples do not compile on s390x because they use x86
specific fields from the "pt_regs" structure.
Fix this and access the fields via new PT_REGS macros.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Punnaiah Choudary Kalluri <punnaia@xilinx.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:09 +0000 (05:18 -0700)]
net_sched: act_mirred: remove spinlock in fast path
Like act_gact, act_mirred can be lockless in packet processing
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.
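Point 2 above can be sketched as a compare-before-store. This is a
user-space illustration only: struct action_times and lastuse_update() are
hypothetical names, and the tick value is passed in by the caller rather
than read from jiffies.

```c
/* Hypothetical stand-in for the action's timestamp block. */
struct action_times {
    unsigned long lastuse;   /* tick of the most recent packet */
};

/* Store the current tick only if it differs from the recorded one.
 * Skipping the redundant store keeps the cache line clean, so CPUs
 * handling packets within the same tick do not bounce it around.
 * Returns 1 when a write happened, 0 when the store was skipped. */
int lastuse_update(struct action_times *tm, unsigned long now)
{
    if (tm->lastuse == now)
        return 0;
    tm->lastuse = now;
    return 1;
}
```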
Next step : add multi queue capability to ifb device
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:08 +0000 (05:18 -0700)]
net_sched: act_gact: remove spinlock in fast path
Final step for gact RCU operation :
1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.
Since this is the last contended lock in packet RX when tc gact is used,
this gives an impressive gain.
My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after the patch.
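The per-cpu stats idea can be sketched in plain C as one counter slot per
CPU, each on its own cache line, summed only when the stats are read. The
names here (NR_CPUS_SKETCH, struct cpu_stats, stats_update() and so on) are
hypothetical, and a 64-byte cache line is assumed; the kernel uses its own
percpu allocator and helpers instead.

```c
#include <stdint.h>

#define NR_CPUS_SKETCH 8   /* hypothetical fixed CPU count */

/* One slot per CPU, aligned to an assumed 64-byte cache line so that
 * updates from different CPUs never touch the same line. */
struct cpu_stats {
    _Alignas(64) uint64_t packets;
    uint64_t bytes;
};

struct action_stats {
    struct cpu_stats cpu[NR_CPUS_SKETCH];
};

void stats_init(struct action_stats *s)
{
    for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++) {
        s->cpu[i].packets = 0;
        s->cpu[i].bytes = 0;
    }
}

/* Fast path: each CPU writes only its own slot, so no lock is taken. */
void stats_update(struct action_stats *s, unsigned int cpu, uint64_t len)
{
    s->cpu[cpu].packets += 1;
    s->cpu[cpu].bytes += len;
}

/* Slow path: readers fold all slots together. */
uint64_t stats_total_packets(const struct action_stats *s)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += s->cpu[i].packets;
    return sum;
}

uint64_t stats_total_bytes(const struct action_stats *s)
{
    uint64_t sum = 0;

    for (unsigned int i = 0; i < NR_CPUS_SKETCH; i++)
        sum += s->cpu[i].bytes;
    return sum;
}
```

The cost moves from the per-packet path to the rarely used read path,
which is the trade-off the numbers above reflect.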
Tested:
On receiver :
dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
u32 match ip src 7.0.0.0/8 flowid 1:15 action drop
Sender sends a packet flood from the 7/8 network
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:07 +0000 (05:18 -0700)]
net_sched: act_gact: read tcfg_ptype once
Third step for gact RCU operation :
Following patch will get rid of spinlock protection,
so we need to read tcfg_ptype once.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:06 +0000 (05:18 -0700)]
net_sched: act_gact: use a separate packet counters for gact_determ()
Second step for gact RCU operation :
We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.
gact_determ() would not work without a central packet counter,
so let's add it for this mode.
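A sketch of why the deterministic mode needs one shared counter, written
with C11 atomics. struct gact_sketch and gact_determ_sketch() are
hypothetical names, and the rule "take the alternate action on every
pval-th packet" is an assumption made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, reduced view of the action state. */
struct gact_sketch {
    atomic_uint packets;  /* central counter shared by all CPUs */
    uint16_t pval;        /* period of the alternate action; non-zero */
    int action;           /* default action */
    int paction;          /* alternate action */
};

/* Every pval-th packet, counted across all CPUs, gets the alternate
 * action. Per-cpu counters could not give this global ordering, hence
 * the single atomic counter. */
int gact_determ_sketch(struct gact_sketch *g)
{
    unsigned int pack = atomic_fetch_add(&g->packets, 1) + 1;

    if (pack % g->pval)
        return g->action;
    return g->paction;
}
```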
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:05 +0000 (05:18 -0700)]
net_sched: act_gact: make tcfg_pval non zero
First step for gact RCU operation :
Instead of testing if tcfg_pval is zero or not, just make it 1.
No change in behavior, but slightly faster code.
The smp_rmb()/smp_wmb() barriers, while not strictly needed at this
stage, are added for upcoming spinlock removal.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:04 +0000 (05:18 -0700)]
net: sched: add percpu stats to actions
Reuse existing percpu infrastructure John Fastabend added for qdisc.
This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 6 Jul 2015 12:18:03 +0000 (05:18 -0700)]
net: sched: extend percpu stats helpers
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.
We want to add percpu stats for tc action, so this patch adds common
helpers.
qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 2 Jul 2015 11:24:44 +0000 (13:24 +0200)]
mlx4: TCP/UDP packets have L4 hash
Mellanox driver has the knowledge if rxhash is a L4 hash,
if it receives a non fragmented TCP or UDP frame and
NETIF_F_RXCSUM is enabled on netdev.
ip_summed value is CHECKSUM_UNNECESSARY in this case.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Amir Vadai <amirv@mellanox.com> Cc: Ido Shamay <idos@mellanox.com> Acked-by: Ido Shamay <idos@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Jul 2015 20:29:46 +0000 (13:29 -0700)]
Merge branch 'tcp-policer-drops'
Yuchung Cheng says:
====================
tcp: reducing lost retransmits in recovery
This patch series reduces lost retransmits in recovery, in particular
when dealing with traffic policers. The main problem is that
slow start in recovery under policing can cause massive loss and
retransmit storms: any excess sending rate turns into drops. The
solution is to avoid doing slow start when lost retransmit is
detected and use packet conservation instead.
On networks with traffic policers the patches have lowered the
TCP loss rates by ~20% from Google servers without latency regressions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: PRR uses CRB mode by default and SS mode conditionally
PRR slow start is often too aggressive especially when drops are
caused by traffic policers. The policers mainly use token bucket
to enforce the rate so sending (twice) faster than the delivery
rate causes excessive drops.
This patch changes PRR to the conservative reduction bound
(CRB) mode in RFC 6937 by default. CRB follows the packet
conservation rule to send at most the delivery rate by default.
But if many packets are lost and the pipe is empty, CRB may take N
round trips to repair N losses. We conditionally turn on slow start
mode to speed up the recovery if all of these conditions are met:
1) on the second round or later in recovery
2) retransmission sent in the previous round is delivered on this ACK
3) no retransmission is marked lost on this ACK
By using packet conservation by default, this change reduces the lost
retransmits significantly on networks that deploy traffic policers,
up to 20% reduction of overall loss rate.
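The choice between the two modes can be sketched as a per-ACK send quota.
This is a simplified user-space model under stated assumptions, not the
kernel function: struct prr_state and prr_sndcnt() are hypothetical, the
two boolean arguments stand in for conditions 2 and 3, and condition 1 is
assumed to be implied by a previous-round retransmission being delivered.

```c
#include <stdbool.h>

/* Hypothetical, simplified recovery state. */
struct prr_state {
    int ssthresh;       /* target cwnd after the reduction */
    int prior_cwnd;     /* cwnd when recovery started; must be > 0 */
    int prr_delivered;  /* packets delivered during recovery so far */
    int prr_out;        /* packets sent during recovery so far */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Number of packets that may be sent in response to one ACK. */
int prr_sndcnt(const struct prr_state *s, int in_flight, int newly_delivered,
               bool retrans_delivered, bool retrans_lost)
{
    int delta = s->ssthresh - in_flight;
    int sndcnt;

    if (delta < 0) {
        /* Pipe above ssthresh: proportional rate reduction, rounding up. */
        long long dividend = (long long)s->ssthresh * s->prr_delivered +
                             s->prior_cwnd - 1;
        sndcnt = (int)(dividend / s->prior_cwnd) - s->prr_out;
    } else if (retrans_delivered && !retrans_lost) {
        /* Slow start mode: one extra packet beyond what was delivered. */
        sndcnt = min_int(delta,
                         max_int(s->prr_delivered - s->prr_out,
                                 newly_delivered) + 1);
    } else {
        /* CRB mode (default): send at most what was just delivered. */
        sndcnt = min_int(delta, newly_delivered);
    }
    return max_int(sndcnt, 0);
}
```

In CRB mode the quota never exceeds the number of newly delivered packets,
which is the packet conservation rule described above.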
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the retransmission in CA_Loss is lost again, we should not
continue to slow start or raise cwnd in congestion avoidance mode.
Instead we should enter fast recovery and use PRR to reduce cwnd,
following the principle in RFC5681:
"... or the loss of a retransmission, should be taken as two
indications of congestion and, therefore, cwnd (and ssthresh) MUST
be lowered twice in this case."
This is especially important to reduce loss when the CA_Loss
state was caused by a traffic policer dropping the entire inflight.
The CA_Loss state has a problem where a loss of L packets causes the
sender to send a burst of L packets. So a policer that's dropping
most packets in a given RTT can cause a huge retransmit storm. By
contrast, PRR includes logic to bound the number of outbound packets
that result from a given ACK. So switching to CA_Recovery on lost
retransmits in CA_Loss avoids this retransmit storm problem when
in CA_Loss.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: Fix incorrect sequence numbers shown in devlog
Part of the change in commit 49aa284fe64c4c1 ("cxgb4: Add support for devlog")
introduced a real bug where the Device Log Sequence Numbers are
no longer being converted from firmware Big-Endian to local CPU-Endian
format.
This patch moves all of the translation into the devlog_show() routine.
The only endianness code now in devlog_open() is the small loop to find the
earliest (lowest Sequence Number) Device Log entry in the circular buffer.
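The search for the earliest entry can be sketched as below. This is an
illustration only: struct devlog_entry_sketch, be32_to_host(),
host_to_be32() and devlog_first() are hypothetical names, each entry is
reduced to its sequence number, and every slot is assumed to hold a valid
entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical log entry holding only the big-endian sequence number. */
struct devlog_entry_sketch {
    uint32_t seqno_be;   /* as written by firmware, big-endian */
};

/* Convert a big-endian 32-bit value to host order on any host. */
uint32_t be32_to_host(uint32_t be)
{
    unsigned char b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Inverse helper, used here only to build sample data. */
uint32_t host_to_be32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8), (unsigned char)v
    };
    uint32_t be;

    memcpy(&be, b, sizeof(be));
    return be;
}

/* Return the index of the entry with the lowest sequence number, i.e.
 * the oldest entry in the circular buffer. Returns 0 when n is 0. */
size_t devlog_first(const struct devlog_entry_sketch *log, size_t n)
{
    size_t first = 0;

    for (size_t i = 1; i < n; i++)
        if (be32_to_host(log[i].seqno_be) <
            be32_to_host(log[first].seqno_be))
            first = i;
    return first;
}
```

Comparing the raw big-endian values instead of the converted ones would
pick the wrong entry on a little-endian host, which is the class of bug
the patch addresses.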
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Make MLD packets to only be processed locally
Before commit daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it
from ip6_mc_input().") MLD packets were only processed locally. After the
change, a copy of MLD packet goes through ip6_mr_input, causing
MRT6MSG_NOCACHE message to be generated to user space.
Make MLD packets be processed only locally again.
Fixes: daad151263cf ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>