yankejian [Tue, 8 Dec 2015 03:02:31 +0000 (11:02 +0800)]
net: hns: optimize XGE capability by reducing cpu usage
here is the patch raising the performance of XGE by:
1)changes the way page management method for enet momery, and
2)reduces the count of rmb, and
3)adds Memory prefetching
Signed-off-by: Kejian Yan <yankejian@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tejun Heo [Mon, 7 Dec 2015 22:38:53 +0000 (17:38 -0500)]
sock, cgroup: add sock->sk_cgroup
In cgroup v1, dealing with cgroup membership was difficult because the
number of membership associations was unbound. As a result, cgroup v1
grew several controllers whose primary purpose is either tagging
membership or pull in configuration knobs from other subsystems so
that cgroup membership test can be avoided.
net_cls and net_prio controllers are examples of the latter. They
allow configuring network-specific attributes from cgroup side so that
network subsystem can avoid testing cgroup membership; unfortunately,
these are not only cumbersome but also problematic.
Both net_cls and net_prio aren't properly hierarchical. Both inherit
configuration from the parent on creation but there's no interaction
afterwards. An ancestor doesn't restrict the behavior in its subtree
in anyway and configuration changes aren't propagated downwards.
Especially when combined with cgroup delegation, this is problematic
because delegatees can mess up whatever network configuration
implemented at the system level. net_prio would allow the delegatees
to set whatever priority value regardless of CAP_NET_ADMIN and net_cls
the same for classid.
While it is possible to solve these issues from controller side by
implementing hierarchical allowable ranges in both controllers, it
would involve quite a bit of complexity in the controllers and further
obfuscate network configuration as it becomes even more difficult to
tell what's actually being configured looking from the network side.
While not much can be done for v1 at this point, as membership
handling is sane on cgroup v2, it'd be better to make cgroup matching
behave like other network matches and classifiers than introducing
further complications.
In preparation, this patch updates sock->sk_cgrp_data handling so that
it points to the v2 cgroup that sock was created in until either
net_prio or net_cls is used. Once either of the two is used,
sock->sk_cgrp_data reverts to its previous role of carrying prioidx
and classid. This is to avoid adding yet another cgroup related field
to struct sock.
As the mode switching can happen at most once per boot, the switching
mechanism is aimed at lowering hot path overhead. It may leak a
finite, likely small, number of cgroup refs and report spurious
prioidx or classid on switching; however, dynamic updates of prioidx
and classid have always been racy and lossy - socks between creation
and fd installation are never updated, config changes don't update
existing sockets at all, and prioidx may index with dead and recycled
cgroup IDs. Non-critical inaccuracies from small race windows won't
make any noticeable difference.
This patch doesn't make use of the pointer yet. The following patch
will implement netfilter match for cgroup2 membership.
v2: Use sock_cgroup_data to avoid inflating struct sock w/ another
cgroup specific field.
v3: Add comments explaining why sock_data_prioidx() and
sock_data_classid() use different fallback values.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Daniel Wagner <daniel.wagner@bmw-carit.de> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tejun Heo [Mon, 7 Dec 2015 22:38:52 +0000 (17:38 -0500)]
net: wrap sock->sk_cgrp_prioidx and ->sk_classid inside a struct
Introduce sock->sk_cgrp_data which is a struct sock_cgroup_data.
->sk_cgroup_prioidx and ->sk_classid are moved into it. The struct
and its accessors are defined in cgroup-defs.h. This is to prepare
for overloading the fields with a cgroup pointer.
This patch mostly performs equivalent conversions but the followings
are noteworthy.
* Equality test before updating classid is removed from
sock_update_classid(). This shouldn't make any noticeable
difference and a similar test will be implemented on the helper side
later.
* sock_update_netprioidx() now takes struct sock_cgroup_data and can
be moved to netprio_cgroup.h without causing include dependency
loop. Moved.
* The dummy version of sock_update_netprioidx() converted to a static
inline function while at it.
Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Tejun Heo [Mon, 7 Dec 2015 22:38:51 +0000 (17:38 -0500)]
netprio_cgroup: limit the maximum css->id to USHRT_MAX
netprio builds per-netdev contiguous priomap array which is indexed by
css->id. The array is allocated using kzalloc() effectively limiting
the maximum ID supported to some thousand range. This patch caps the
maximum supported css->id to USHRT_MAX which should be way above what
is actually useable.
This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed
up part will be used to overload the cgroup related fields.
sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
two cgroup related fields are adjacent.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Daniel Borkmann <daniel@iogearbox.net> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The virtio-vsock device specification is not finalized yet. Michael
Tsirkin voiced concerned about merging this code when the hardware
interface (and possibly the userspace interface) could still change.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Mon, 7 Dec 2015 21:41:43 +0000 (00:41 +0300)]
sh_eth: get rid of bb_{set|clr|read}()
After the MDIO bitbang code consolidation, there's no need anymore for
bb_{set|clr}() as well as bb_read() -- just expand them inline, thus
saving more LoCs...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Mon, 7 Dec 2015 21:40:57 +0000 (00:40 +0300)]
sh_eth: factor out common code from MDIO bitbang methods
sh_mm[cd]_ctrl() and sh_set_mdio() all look mostly the same -- factor out
their common code and put it into sh_mdio_ctrl().
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Mon, 7 Dec 2015 21:40:19 +0000 (00:40 +0300)]
sh_eth: remove mask fields from 'struct bb_info'
The MDIO control bits are always mapped to the same bits of the same register
(PIR), so there's no need to store their masks in the 'struct bb_info'...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 8 Dec 2015 17:39:15 +0000 (12:39 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2015-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Vallo says:
====================
brcfmac
* support bcm4359 which can operate in two bands concurrently
* disable runtime pm for USB avoiding issues
* use generic pm callback in PCIe driver
* support wowlan wake indication reporting
* add beamforming support
* unified handling of firmware files
ath10k
* support Manegement Frame Protection (MFP)
* add thermal throttling support for 10.4 firmware
* add support for pktlog in QCA99X0
* add debugfs file to enable Bluetooth coexistence feature
* use firmware's native mesh interface type instead of raw mode
iwlwifi
* BT coex improvements
* D3 operation bugfixes
* rate control improvements
* firmware debugging infra improvements
* ground work for multi Rx
* various security fixes
====================
Rainer Weikusat [Tue, 8 Dec 2015 14:47:56 +0000 (14:47 +0000)]
net: Fix inverted test in __skb_recv_datagram
As the kernel generally uses negated error numbers, *err needs to be
compared with -EAGAIN (d'oh).
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ea3793ee29d3 ("core: enable more fine-grained datagram reception control") Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Dec 2015 21:35:51 +0000 (16:35 -0500)]
Merge branch 'more-dsa-unbinding-fixes'
Neil Armstrong says:
====================
Further fix for dsa unbinding
This series fixes further issues for DSA dynamic unbinding.
The first patch completely removes the PHY link state polling.
The two following cleans up the dsa state upon removal.
The last patch moves slave destroy code as slave function and
adds missing netdev and phy cleanup calls.
v1: http://lkml.kernel.org/r/562F8ECB.6050709@baylibre.com
v2: http://lkml.kernel.org/r/56321D9A.8010109@baylibre.com
remove phy fix and add missing calls in dsa_switch_destroy
then add dedicated dsa_slave_destroy
v3: remove polling instead of fixing it, make single patch for
dsa slave destroy
====================
Acked-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Mon, 7 Dec 2015 12:57:35 +0000 (13:57 +0100)]
net: dsa: move dsa slave destroy code to slave.c
Move dsa slave dedicated code from dsa_switch_destroy to a new
dsa_slave_destroy function in slave.c.
Add the netif_carrier_off and phy_disconnect calls in order to
correctly cleanup the netdev state and PHY state machine.
Signed-off-by: Frode Isaksen <fisaksen@baylibre.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Mon, 7 Dec 2015 12:57:33 +0000 (13:57 +0100)]
net: dsa: cleanup resources upon module removal
Make sure that we unassign the master_netdev dsa_ptr to make the packet
processing go through the regular Ethernet receive path.
Suggested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Dec 2015 19:10:10 +0000 (14:10 -0500)]
Merge tag 'mac80211-next-for-davem-2015-12-07' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
This pull request got a bit bigger than I wanted, due to
needing to reshuffle and fix some bugs. I merged mac80211
to get the right base for some of these changes.
* new mac80211 API for upcoming driver changes: EOSP handling,
key iteration
* scan abort changes allowing to cancel an ongoing scan
* VHT IBSS 80+80 MHz support
* re-enable full AP client state tracking after fixes
* various small fixes (that weren't relevant for mac80211)
* various cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we have moved on to using allocated pages to carve receive
buffers instead of netdev_alloc_skb() there is no need to store
any pointers for later retrieval. Earlier we had to store
skb and skb->data pointers which later are used to handover
received packet to network stack.
This will avoid an unnecessary cache miss as well.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Peer [Sun, 6 Dec 2015 19:19:15 +0000 (21:19 +0200)]
mac80211: handle HW ROC expired properly
In case of HW ROC, when the driver reports that the ROC expired,
it is not sufficient to purge the ROCs based on the remaining
time, as it possible that the device finished the ROC session
before the actual requested duration.
To handle such cases, in case of ROC expired notification from
the driver, complete all the ROCs which are marked with hw_begun,
regardless of the remaining duration.
Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Rainer Weikusat [Sun, 6 Dec 2015 21:11:38 +0000 (21:11 +0000)]
af_unix: fix unix_dgram_recvmsg entry locking
The current unix_dgram_recvsmg code acquires the u->readlock mutex in
order to protect access to the peek offset prior to calling
__skb_recv_datagram for actually receiving data. This implies that a
blocking reader will go to sleep with this mutex held if there's
presently no data to return to userspace. Two non-desirable side effects
of this are that a later non-blocking read call on the same socket will
block on the ->readlock mutex until the earlier blocking call releases it
(or the readers is interrupted) and that later blocking read calls
will wait longer than the effective socket read timeout says they
should: The timeout will only start 'ticking' once such a reader hits
the schedule_timeout in wait_for_more_packets (core.c) while the time it
already had to wait until it could acquire the mutex is unaccounted for.
The patch avoids both by using the __skb_try_recv_datagram and
__skb_wait_for_more packets functions created by the first patch to
implement a unix_dgram_recvmsg read loop which releases the readlock
mutex prior to going to sleep and reacquires it as needed
afterwards. Non-blocking readers will thus immediately return with
-EAGAIN if there's no data available regardless of any concurrent
blocking readers and all blocking readers will end up sleeping via
schedule_timeout, thus honouring the configured socket receive timeout.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rainer Weikusat [Sun, 6 Dec 2015 21:11:34 +0000 (21:11 +0000)]
core: enable more fine-grained datagram reception control
The __skb_recv_datagram routine in core/ datagram.c provides a general
skb reception factility supposed to be utilized by protocol modules
providing datagram sockets. It encompasses both the actual recvmsg code
and a surrounding 'sleep until data is available' loop. This is
inconvenient if a protocol module has to use additional locking in order
to maintain some per-socket state the generic datagram socket code is
unaware of (as the af_unix code does). The patch below moves the recvmsg
proper code into a new __skb_try_recv_datagram routine which doesn't
sleep and renames wait_for_more_packets to
__skb_wait_for_more_packets, both routines being exported interfaces. The
original __skb_recv_datagram routine is reimplemented on top of these
two functions such that its user-visible behaviour remains unchanged.
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 7 Dec 2015 03:38:58 +0000 (04:38 +0100)]
PHY: DP83867: Remove looking in parent device for OF properties
Device tree properties for a phy device are expected to be in the phy
node. The current code for the DP83867 also tries to look in the
parent node. The devices binding documentation does not mention this,
no current device tree file makes use of this, and it is not behaviour
we want. So remove looking in the parent device.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 21:47:15 +0000 (22:47 +0100)]
net: cdc_ncm: add "ndp_to_end" sysfs attribute
Adding a writable sysfs attribute for the "NDP to end"
quirk flag.
This makes it easier for end users to test new devices for
this firmware bug. We've been lucky so far, but we should
not depend on reporters capable of rebuilding the driver.
Cc: Enrico Mioso <mrkiko.rs@gmail.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 7 Dec 2015 03:40:46 +0000 (22:40 -0500)]
Merge branch 'mlx4-HA-LAG-SRIOV-VF'
Or Gerlitz says:
====================
Add HA and LAG support for mlx4 SRIOV VFs
This series is built upon the code added in commit ce388ff "Merge branch
'mlx4-next'" which added HA and LAG support to mlx4 RoCE and SRIOV services.
We add HA and Link Aggregation support to single ported mlx4 Ethernet VFs.
In this case, the PF Ethernet interfaces are bonded, the VFs see single
port HW devices (already supported) -- however, now this port is highly
available. This means that all VF HW QPs (both VF Ethernet driver and VF
RoCE / RAW QPs) are subject to the V2P (Virtual-To-Physical) mapping which
is managed by the PF driver, and hence resilient across link failures and
such events.
When bonding operates in Dynamic link aggregation (802.3ad) mode, traffic
from each VF will go over the VF "base port" (the port this VF is assigned
to by the admin) as long as this port is up. When the port fails, traffic
from all VFs that are defined on this port will move to the other port, and
be back to their base-port when it recovers.
Moni and Or.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:43 +0000 (18:07 +0200)]
net/mlx4_core: Support the HA mode for SRIOV VFs too
When the mlx4 driver runs in HA mode, and all VFs are single ported
ones, we make their single port Highly-Available.
This is done by taking advantage of the HA mode properties (following
bonding changes with programming the port V2P map, etc) and adding
the missing parts which are unique to SRIOV such as mirroring VF
steering rules on both ports.
Due to limits on the MAC and VLAN table this mode is enabled only when
number of total VFs is under 64.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:42 +0000 (18:07 +0200)]
IB/mlx4: Use the VF base-port when demuxing mad from wire
Under HA mode, it's possible that the VF registered its GID
(and expects to get mads through the PV scheme) on a port which is
different from the one this mad arrived on, due to HA fail over.
Therefore, if the gid is not matched on the port that the packet arrived
on, check for a match on the other port if HA mode is active -- and if a
match is found on the other port, continue processing the mad using that
other port.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:41 +0000 (18:07 +0200)]
net/mlx4_core: Keep VLAN/MAC tables mirrored in multifunc HA mode
Due to HW limitations, indexes to MAC and VLAN tables are always taken
from the table of the actual port. So, if a resource holds an index to
a table, it may refer to different values during the lifetime of the
resource, unless the tables are mirrored. Also, even when
driver is not in HA mode the policy of allocating an index to these
tables is such to make sure, as much as possible, that when the time
comes the mirroring will be successful. This means that in multifunction
mode the allocation of a free index in a port's table tries to make sure
that the same index in the other's port table is also free.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:40 +0000 (18:07 +0200)]
net/mlx4_core: Support mirroring VF DMFS rules on both ports
Under HA mode, steering rules set by VFs should be mirrored on both
ports of the device so packets will be accepted no matter on which
port they arrived.
Since getting into HA mode is done dynamically when the user bonds mlx4
Ethernet netdevs, we keep hold of the VF DMFS rule mbox with the port
value flipped (1->2,2->1) and execute the mirroring when getting into
HA mode. Later, when going out of HA mode, we unset the mirrored rules.
In that context note that mirrored rules cannot be removed explicitly.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Sun, 6 Dec 2015 16:07:39 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to dispatch link state events to VF
Under HA mode, the link down event should be sent to VFs only if both
ports are down.
Signed-off-by: Moni Shoua <monis@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Sun, 6 Dec 2015 16:07:38 +0000 (18:07 +0200)]
net/mlx4_core: Use both physical ports to set the VF link state
In HA mode, the link state for VFs for which the policy is "auto"
(i.e. follow the physical link state) should be ORed from both ports.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Asias He <asias@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Julia Lawall <julia.lawall@lip6.fr> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 6 Dec 2015 20:25:50 +0000 (21:25 +0100)]
net: qmi_wwan: should hold RTNL while changing netdev type
The notifier calls were thrown in as a last-minute fix for an
imagined "this device could be part of a bridge" problem. That
revealed a certain lack of locking. Not to mention testing...
Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Singhai, Anjali [Fri, 4 Dec 2015 07:49:31 +0000 (23:49 -0800)]
Revert "i40e: remove CONFIG_I40E_VXLAN"
This reverts commit 8fe269991aece394a7ed274f525d96c73f94109a.
The case where VXLAN is a module and i40e driver is inbuilt
will not be handled properly with this change since i40e
will have an undefined symbol vxlan_get_rx_port in it.
v2: Add a signed-off-by.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 6 Dec 2015 16:14:06 +0000 (11:14 -0500)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2015-12-05
This series contains updates to fm10k only.
Jacob provides the remaining fm10k patches in the series. First change
ensures that all the logic regarding the setting of netdev features is
consolidated in one place of the driver. Fixed an issue where an assumption
was being made on how many queues are available, especially when init_hw_vf()
errors out. Fixed up an number of issues with init_hw() where failures
were not being handled properly or at all, so update the driver to check
returned error codes and respond appropriately. Fixed up typecasting
issues found where either the incorrect typecast size was used or
explicitly typecast values. Added additional debugging statistics and
rename statistic to better reflect its true value. Added support for
ITR scaling based on PCIe link speed for fm10k. Fixed up code comment
where "hardware" was misspelled.
v2: Dropped patches #1 and #10 from original submission, patch #1 was from
Nick Krause and due to his past kernel interactions, dropping his patch.
Patch #10 had questions and concerns from Tom Herbert which cannot be
addressed at this time since the author (Jacob Keller) is currently on
sabbatical, so dropping this patch for now until we can properly address
Tom's questions and concerns.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Fri, 16 Oct 2015 17:57:09 +0000 (10:57 -0700)]
fm10k: change default Tx ITR to 25usec
The current default ITR for Tx is overly restrictive. Using a simple
netperf TCP_STREAM test, we top out at about 10Gb/s for a single thread
when running using 1500 byte frames. By reducing the ITR value to 25usec
(up to 40K interrupts a second from 10K), we are able to achieve 36Gb/s
for a single thread TCP stream test.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:08 +0000 (10:57 -0700)]
fm10k: use macro for default Tx and Rx ITR values
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:07 +0000 (10:57 -0700)]
fm10k: Update adaptive ITR algorithm
The existing adaptive ITR algorithm is overly restrictive. It throttles
incorrectly for various traffic rates, and does not produce good
performance. The algorithm now allows for more interrupts per second,
and does some calculation to help improve for smaller packet loads. In
addition, take into account the new itr_scale from the hardware which
indicates how much to scale due to PCIe link speed.
Reported-by: Matthew Vick <matthew.vick@intel.com> Reported-by: Alex Duyck <alexander.duyck@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:06 +0000 (10:57 -0700)]
fm10k: introduce ITR_IS_ADAPTIVE macro
Define a macro for identifying when the itr value is dynamic or
adaptive. The concept was taken from i40e. This helps make clear what
the check is, and reduces the line length to something more reasonable
in a few places.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:05 +0000 (10:57 -0700)]
fm10k: Add support for ITR scaling based on PCIe link speed
The Intel Ethernet Switch FM10000 Host Interface interrupt throttle
timers are based on the PCIe link speed. Because of this, the value
being programmed into the ITR registers must be scaled accordingly.
For the PF, this is as simple as reading the PCIe link speed and storing
the result. However, in the case of SR-IOV, the VF's interrupt throttle
timers are based on the link speed of the PF. However, the VF is unable
to get the link speed information from its configuration space, so the
PF must inform it of what scale to use.
Rather than pass this scale via mailbox message, take advantage of
unused bits in the TDLEN register to pass the scale. It is the
responsibility of the PF to program this for the VF while setting up the
VF queues and the responsibility of the VF to get the information
accordingly. This is preferable because it allows the VF to set up the
interrupts properly during initialization and matches how the MAC
address is passed in the TDBAL/TDBAH registers.
Since we're modifying fm10k_type.h, we may as well also update the
copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:03 +0000 (10:57 -0700)]
fm10k: rename mbx_tx_oversized statistic to mbx_tx_dropped
Originally this statistic was renamed because the method of dropping was
called "drop_oversized_messages", but this logic has changed much, and
this counter does actually represent messages which we failed to
transmit for a number of reasons. Rename the counter back to tx_dropped
since this is when it will increment, and it is less confusing.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:02 +0000 (10:57 -0700)]
fm10k: add statistics for actual DWORD count of mbmem mailbox
A previous bug was uncovered by addition of a debug stat to indicate the
actual number of DWORDS we pulled from the mbmem. It turned out this was
not the same as the tx_dwords counter. While the previous bug fix should
have corrected this in all cases, add some debug stats that count the
number of DWORDs pushed or pulled from the mbmem. A future debugger may
take advantage of this statistic for debugging purposes. Since we're
modifying fm10k_mbx.h, update the copyright year as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:57:00 +0000 (10:57 -0700)]
fm10k: Correct typecast in fm10k_update_xc_addr_pf
Since the resultant data type of the mac_update.mac_upper field is u16,
it does not make sense to typecast u8 variables to u32 first. Since
we're modifying fm10k_pf.c, also update the copyright year.
Reported-by: Matthew Vick <matthew.vick@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:59 +0000 (10:56 -0700)]
fm10k: reinitialize queuing scheme after calling init_hw
The init_hw function may fail, and in the case of VFs, it might change
the number of maximum queues available. Thus, for every flow which
checks init_hw, we need to ensure that we clear the queue scheme before,
and initialize it after. The fm10k_io_slot_reset path will end up
triggering a reset so fm10k_reinit needs this change. The
fm10k_io_error_detected and fm10k_io_resume also need to properly clear
and reinitialize the queue scheme.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:58 +0000 (10:56 -0700)]
fm10k: always check init_hw for errors
A recent change modified init_hw in some flows the function may fail on
VF devices. For example, if a VF doesn't yet own its own queues.
However, many callers of init_hw didn't bother to check the error code.
Other callers checked but only displayed diagnostic messages without
actually handling the consequences.
Fix this by (a) always returning and preventing the netdevice from going
up, and (b) printing the diagnostic in every flow for consistency. This
should resolve an issue where VF drivers would attempt to come up
before the PF has finished assigning queues.
In addition, change the dmesg output to explicitly show the actual
function that failed, instead of combining reset_hw and init_hw into a
single check, to help for future debugging.
Fixes: 1d568b0f6424 ("fm10k: do not assume VF always has 1 queue") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:57 +0000 (10:56 -0700)]
fm10k: reset max_queues on init_hw_vf failure
VF drivers must detect how many queues are available. Previously, the
driver assumed that each VF has at minimum 1 queue. This assumption is
incorrect, since it is possible that the PF has not yet assigned the
queues to the VF by the time the VF checks. To resolve this, we added a
check first to ensure that the first queue is infact owned by the VF at
init_hw_vf time. However, the code flow did not reset hw->mac.max_queues
to 0. In some cases, such as during reinit flows, we call init_hw_vf
without clearing the previous value of hw->mac.max_queues. Due to this,
when init_hw_vf errors out, if its error code is not properly handled
the VF driver may still believe it has queues which no longer belong to
it. Fix this by clearing the hw->mac.max_queues on exit due to errors.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 Oct 2015 17:56:56 +0000 (10:56 -0700)]
fm10k: set netdev features in one location
Don't change netdev hw_features later in fm10k_probe, instead set all
values inside fm10k_alloc_netdev. To do so, we need to know the MAC type
(whether it is PF or VF) in order to determine what to do. This helps
ensure that all logic regarding features is co-located.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Krishneil Singh <krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:57 +0000 (00:58 +0300)]
sh_eth: read MAC address registers only once
The code reading the MAHR/MALR registers in read_mac_address() is terribly
ineffective -- it reads MAHR 4 times and MALR 2 times, while it's enough to
read each register only once. Use the local variables to achieve that,
somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 4 Dec 2015 21:58:07 +0000 (00:58 +0300)]
ravb: read MAC address registers only once
The code reading the MAHR/MALR registers in ravb_read_mac_address() is
terribly ineffective -- it reads MAHR 4 times and MALR 2 times, while
it's enough to read each register only once. Use the local variables to
achieve that, somewhat beautifying the code while at it...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This removes one redundant error message in bnx2x and changes another one to
WARN_ONCE. The third patch is a small simplification in ethtool stats.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:36 +0000 (17:22 +0100)]
bnx2x: simplify distinction between port and func stats
The 'flags' field in bnx2x_stats_arr[] serves only one purpose - to tell
us if the statistic is a per-port stat and thus should not be shown for
virtual functions. It's strange that the field can have three different
values. A boolean will do just fine.
Also remove IS_FUNC_STAT(). It was used only once and it's in fact just
a negation of IS_PORT_STAT().
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:35 +0000 (17:22 +0100)]
bnx2x: change FW GRO error message to WARN_ONCE
It's supposed to be impossible for TPA to give us anything else
than IPv4 or IPv6 here. But in case there is a way to reach this error
by some strange received frames, we don't want to flood the kernel log.
WARN_ONCE is better for this purpose.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 4 Dec 2015 16:22:34 +0000 (17:22 +0100)]
bnx2x: drop redundant error message about allocation failure
alloc_pages() already prints a warning when it fails. No need to emit
another message. Certainly not at KERN_ERR level, because it is no big
deal if this GFP_ATOMIC allocation fails occasionally.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Pieczko [Fri, 4 Dec 2015 08:48:39 +0000 (08:48 +0000)]
sfc: check warm_boot_count after other functions have been reset
A change in MCFW behaviour means that the net driver must update its record
of the warm_boot_count by reading it from the ER_DZ_BIU_MC_SFT_STATUS
register.
On v4.6.x MCFW the global boot count was incremented when some functions
needed to be reset to enable multicast chaining, so all functions saw the
same value. In that case, the driver needed to increment its
warm_boot_count when other functions were reset, to avoid noticing it later
and then trying to reset itself to recover unnecessarily.
With v4.7+ MCFW, the boot count in firmware doesn't change as that is
unnecessary since the PFs that have been reset will each receive an MC
reboot notification. In that case, the driver re-reads the unchanged
value.
Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Fri, 4 Dec 2015 07:43:19 +0000 (08:43 +0100)]
atm: solos-pci: Replace simple_strtol by kstrtoint
The simple_strtol function is obsolete.
This patch replace it by kstrtoint.
This will simplify code, since some error case not handled by
simple_strtol are handled by kstrtoint.
Signed-off-by: LABBE Corentin <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 5 Dec 2015 22:41:42 +0000 (17:41 -0500)]
Merge branch 'batman-hdlc'
Andrew Lunn says:
====================
Allow BATMAN to use hdlc-eth interfaces
BATMAN works over Ethernet like interfaces. hdlc-eth provides the need
requirements. However, hdlc devices are often created as raw hdlc
devices, which batman cannot use, and are then be transmuted into
other types using sethdlc(1). Have the HDLC code emit
NETDEV_*_TYPE_CHANGE events when the type changes, and have BATMAN
react on these events.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:33 +0000 (21:12 +0100)]
batman-adv: Act on NETDEV_*_TYPE_CHANGE events
A network interface can change type. It may change from a type which
batman does not support, e.g. hdlc, to one it does, e.g. hdlc-eth.
When an interface changes type, it sends two notifications. Handle
these notifications.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:31 +0000 (21:12 +0100)]
WAN: HDLC: Call notifiers before and after changing device type
An HDLC device can change type when the protocol driver is changed.
Calling the notifier change allows potential users of the interface
know about this planned change, and even block it. After the change
has occurred, send a second notification to users can evaluate the new
device type etc.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 3 Dec 2015 20:12:30 +0000 (21:12 +0100)]
WAN: HDLC: Detach protocol before unregistering device
The current code first unregisters the device, and then detaches the
protocol from it. This should be performed the other way around, since
the detach may try to use state which has been freed by the
unregister. Swap the order, so that we first detach and then remove the
netdev.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 21:56:23 +0000 (16:56 -0500)]
Merge branch 'qmi_wwan_MDM9x30'
Bjørn Mork says:
====================
net: qmi_wwan: MDM9x30 support
We add new device IDs all the time, often without any testing on
actual hardware. This is usually OK as long as the device is similar
to already supported devices, using the same chipset and firmware
basis. But the Sierra Wireless MC7455 is an example of a new chipset
generation. Adding it based on assumed similarity with its ancestors
proved too optimistic.
This series adds the missing bits and pieces necessary to support LTE
Advanced modems based on the Qualcomm MDM9x30 chipset. A big thanks to
Sierra Wireless for providing MC7455 samples for testing
The most important change is the "raw-ip" support. The series also
adds a necessary control request, removes an unsupported device ID,
and adds a driver specific entry in MAINTAINERS.
A few random notes about "raw-ip":
"I rather have these all running in raw IP mode. The 802.3 framing is
utterly stupid." - Marcel Holtmann in Jan 2012 [1]
Marcel was right. I should have listened to him. What more can I say?
The 802.3 framing has provided a steady supply of firmware bugs for
many years. We've added driver workarounds for many of these, but
there are still known bugs where the workaround is so yucky that we
have refused to apply it. But all that is over now. The latest
generation Qualcomm chips no longer supports 802.3 framing at all.
I had two open questions regarding the "raw-ip" userspace API:
1) Should we continue faking an ethernet device, even if we don't use
the L2 headers on the USB link anymore?
There was a vote in favour of the "headerless" device. This is the
honest representation of the hardware/firmware interface.
2) What input should the driver base its framing on?
Snooping or directly manipulating QMI is considered out of the
question. We delegated all QMI handling to userspace from the
beginning.
We have so far required userspace to configure the firmware for
"802.3" framing, or fail if that proved impossible. This
requirement is now changed. Userspace must now inform the driver
if it negotiates "raw-ip" framing. Two alternative interfaces were
proposed:
- ethtool private driver flag, or
- sysfs file
The NetworkManager/ModemManager developers were in favour of the
sysfs alternative.
These questions (or any other you migh have :) are of course still
open. This patch set presents the solutions I currently prefer,
considering the above.
All comments are appreciated, even simple '+1' ones.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:21 +0000 (19:24 +0100)]
net: qmi_wwan: support "raw IP" mode
QMI wwan devices have traditionally emulated ethernet devices
by default. But they have always had the capability of operating
without any L2 header at all, transmitting and receiving "raw"
IP packets over the USB link. This firmware feature used to be
configurable through the QMI management protocol.
Traditionally there was no way to verify the firmware mode
without attempting to change it. And the firmware would often
disallow changes anyway, i.e. due to a session already being
established. In some cases, this could be a hidden firmware
internal session, completely outside host control. For these
reasons, sticking with the "well known" default mode was safest.
But newer generations of QMI hardware and firmware have moved
towards defaulting to "raw IP" mode instead, followed by an
increasing number of bugs in the already buggy "802.3" firmware
implementation. At the same time, the QMI management protocol
gained the ability to detect the current mode. This has enabled
the userspace QMI management application to verify the current
firmware mode without trying to modify it.
Following this development, the latest QMI hardware and firmware
(the MDM9x30 generation) has dropped support for "802.3" mode
entirely. Support for "raw IP" framing in the driver is therefore
necessary for these devices, and to a certain degree to work
around problems with the previous generation,
This patch adds support for "raw IP" framing for QMI devices,
changing the netdev from an ethernet device to an ARPHRD_NONE
p-t-p device when "raw IP" framing is enabled.
The firmware setup is fully delegated to the QMI userspace
management application, through simple tunneling of the QMI
protocol. The driver will therefore not know which mode has been
"negotiated" between firmware and userspace. Allowing userspace
to inform the driver of the result through a sysfs switch is
considered a better alternative than to change the well established
clean delegation of firmware management to userspace.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:20 +0000 (19:24 +0100)]
usbnet: allow mini-drivers to consume L2 headers
Assume the minidriver has taken care of all L2 header parsing
if it sets skb->protocol. This allows the minidriver to
support non-ethernet L2 headers, and even operate without
any L2 header at all.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Thu, 3 Dec 2015 18:24:18 +0000 (19:24 +0100)]
net: qmi_wwan: MDM9x30 specific power management
MDM9x30 based modems appear to go into a deeper sleep when
suspended without "Remote Wakeup" enabled. The QMI interface
will not respond unless a "set DTR" control request is sent
on resume. The effect is similar to a QMI_CTL SYNC request,
resetting (some of) the firmware state.
We allow userspace sessions to span multiple character device
open/close sequences. This means that userspace can depend
on firmware state while both the netdev and the character
device are closed. We have disabled "needs_remote_wakeup" at
this point to allow devices without remote wakeup support to
be auto-suspended.
To make sure the MDM9x30 keeps firmware state, we need to
keep "needs_remote_wakeup" always set. We also need to
issue a "set DTR" request to enable the QMI interface.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 4 Dec 2015 19:36:16 +0000 (14:36 -0500)]
Merge branch 'hip06-soc'
Salil Mehta says:
====================
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This PATCH V7 addresses the TAB formatting comments by
Sergei Shtylyov. Missing TABs at some other palces have
also been corrected.
PATCH V6:
This addresses the review comments provided by
David Miller over the existing use of ENABLE/DISABLE
hash defines with the code. These hash defines are doing
a similar job as implicit type bool would do. So these are
kind of duplicate and are redundant.
PATCH V5:
This PATCH addresses the review comments by Yuval Mintz
<Yuval.Mintz@qlogic.com>. This rework of comments are basically
related to:
1) styling of the code,
2) RSS default Key initiailization related code
3) redundant code removal
PATCH V4:
This addresses the review comment provided by
Sergei Shtylyov. The changelog of every patch has also
been modified.
PATCH V3:
Addresses the review comment floated by David Miller
PATCH V2:
1) Bug Fixes and Clean-up: Internally identified
2) Addresses internal review comments by Kenneth Lee and
by Huang Daode
3) Addresses the review comment from "Yisen.Zhuang(Zhuangyuzeng)"
4) Adds fix from Fengguang Wu for an error generated from
"kbuild test robot" from Intel
5) Ethtool support for TSO set option from Lisheng
PATCH V1:
Adds initial support of Hip06 SoC with below changes:
This patch-set adds support of new Hisilicon Hip06 SoC to the existing
(already part of net-next) HNS ethernet driver for Hip05 SoC. Hip06 is
a multi-core SoC and is a derivative of Hip05 SoC with lots of new
hardware featres supported like RSS, TSO, hardware VLAN assist etc.
The changes in the driver are mainly due to following:
1) changes in the DMA descriptor provided by the Hip06 ethernet
hardware. These changes need to co-exist with already present
Hip05 DMA descriptor and its operating functions. The decision
to choose the correct type of DMA descriptor is taken dynamically
depending upon the version of the hardware (i.e. V1/hip05 or
V2/hip06, see already existing hisilicon-hns-nic.txt binding file
for the detailed description version and naming).
2) To support new features added to the Hip06 ethernet hardware:
a. RSS (Receive Side Scaling)
b. TSO (TCP Segment Offload)
c. Hardware VLAN support (currently we are initializing hardware
to not assist in stripping the vlan tag at hardware level.
Proper support of this feature and ethtool would come after
these patches have been accepted)
Kindly note that, this patchset has been based on latest net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:56 +0000 (12:17 +0000)]
net:hns: Add support of ethtool TSO set option for Hip06 in HNS
This patch adds the support of ethtool TSO option to support
Hip06 SoC to HNS
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:55 +0000 (12:17 +0000)]
net:hns: Add Hip06 "TSO(TCP Segment Offload)" support HNS Driver
This patch adds the support of "TSO (TCP Segment Offload)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
Enabling this feature would help offload the TCP Segmentation
process to the Hip06 ethernet hardware. This eventually would help
in saving precious cpu cycles.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:54 +0000 (12:17 +0000)]
net:hns: Add Hip06 "RSS(Receive Side Scaling)" support to HNS Driver
This patch adds the support of "RSS (Receive Side Scaling)" feature
provided by the Hip06 ethernet hardware to the HNS ethernet
driver.
This feature helps in distributing the different flows (mapped as
hash by hardware using Toeplitz Hash) to different Queues asssociated
with the processor cores. The mapping of flow-hash values to the
different queues is stored in indirection table (which is per Packet-
parse-Engine/PPE). This patch also provides the changes to re-program
the (flow-hash<->Qid) mapping using the ethtool.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Kenneth Lee <liguozhu@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil [Thu, 3 Dec 2015 12:17:53 +0000 (12:17 +0000)]
net:hns: Add support of Hip06 SoC to the Hislicon Network Subsystem
This patchset adds support of Hisilicon Hip06 SoC to the existing HNS
ethernet driver.
The changes in the driver are mainly due to changes in the DMA
descriptor provided by the Hip06 ethernet hardware. These changes
need to co-exist with already present Hip05 DMA descriptor and its
operating functions. The decision to choose the correct type of DMA
descriptor is taken dynamically depending upon the version of the
hardware (i.e. V1/hip05 or V2/hip06, see already existing
hisilicon-hns-nic.txt binding file for detailed description). other
changes includes in SBM, DSAF and PPE modules as well. Changes
affecting the driver related to the newly added ethernet hardware
features in Hip06 would be added as separate patch over this and
subsequent patches.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: yankejian <yankejian@huawei.com> Signed-off-by: huangdaode <huangdaode@hisilicon.com> Signed-off-by: lipeng <lipeng321@huawei.com> Signed-off-by: lisheng <lisheng011@huawei.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adam Welle [Tue, 1 Dec 2015 22:13:52 +0000 (17:13 -0500)]
mac80211_hwsim: check ATTR_FREQ for wmediumd (netlink) packets
If a packet is received from netlink with the frequency value set it is
checked against the current radio's frequency and discarded if different.
The frequency is also checked against data2->tmp_chan to support the "hw"
off-channel/scan case.
Signed-off-by: Adam Welle <arwelle@cert.org>
[allow both simultaneously, add locking] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 1 Dec 2015 22:15:26 +0000 (23:15 +0100)]
mac80211: reject zero cookie in mgmt-tx/roc cancel
When cancelling, you can cancel "any" (first in list) mgmt-tx
or remain-on-channel operation by using the value 0 for the
cookie along with the *opposite* operation, i.e.
* cancel the first mgmt-tx by cancelling roc with 0 cookie
* cancel the first roc by cancelling mgmt-tx with 0 cookie
This isn't really that bad since userspace should only pass
cookies that we gave it, but could lead to hard-to-debug
issues so better prevent it and reject zero values since we
never hand those out.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Fri, 6 Nov 2015 10:57:23 +0000 (11:57 +0100)]
mac80211_hwsim: stop using pointers as cookies
Instead of using pointers, use sequentially assigned cookies.
This is easier to understand while debugging and also avoids
problems when the pointer is reused for the next allocation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni Malinen [Thu, 26 Nov 2015 18:50:12 +0000 (20:50 +0200)]
mac80211_hwsim: Update timestamp in Probe Response frames
Previously, this was done only for Beacon frames, but similar timestamp
update is needed for Probe Response frames to make these more accurately
match the real IEEE 802.11 behavior. Previously, all zeros timestamp was
sent in Probe Response frames.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Jouni Malinen [Thu, 26 Nov 2015 18:49:38 +0000 (20:49 +0200)]
mac80211: Allow a STA to join an IBSS with 80+80 MHz channel
While it was possible to create an IBSS with 80+80 MHz channel, joining
such an IBSS resulted in falling back to 20 MHz channel with VHT
disabled due to a missing switch case for 80+80.
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Mon, 23 Nov 2015 22:53:51 +0000 (23:53 +0100)]
mac80211: rewrite remain-on-channel logic
Jouni found a bug in the remain-on-channel logic: when a short item
is queued, a long item is combined with it extending the original
one, and then the long item is deleted, the timeout doesn't go back
to the short one, and the short item ends up taking a long time. In
this case, this showed as blocking scan when running two test cases
back to back - the scan from the second was delayed even though all
the remain-on-channel items should long have been gone.
Fixing this with the current data structures turns out to be a bit
complicated, we just remove the long item from the dependents list
right now and don't recalculate the timeouts.
There's a somewhat similar bug where we delete the short item and
all the dependents go with it; to fix this we'd have to move them
from the dependents to the real list.
Instead of trying to do that, rewrite the code to not have all this
complexity in the data structures: use a single list and allow more
than one entry in it being marked as started. This makes the code a
bit more complex, the worker needs to understand that it might need
to just remove one of the started items, while keeping the device
off-channel, but that's not more complicated than the nested data
structures.
This then fixes both issues described, and makes it easier to also
limit the overall off-channel time when combining.
TODO: as before, with hardware remain-on-channel, deleting an item
after combining results in cancelling them all - we can keep track
of the time elapsed and only cancel after that to fix this.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Typically drivers that implement hardware remain-on-channel will
have to wait for scheduling constraints, so make hwsim also wait
a little bit (only 20ms) before actually starting the operation.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 24 Nov 2015 13:25:49 +0000 (14:25 +0100)]
mac80211: simplify ack_skb handling
Since the cookie is assigned inside ieee80211_make_ack_skb()
now, we no longer need to return the ack_skb as the cookie
and can simplify the function's return and the callers. Also
rename it to ieee80211_attach_ack_skb() to more accurately
reflect its purpose.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 24 Nov 2015 19:28:27 +0000 (20:28 +0100)]
mac80211: fix mgmt-tx abort cookie and leak
If a mgmt-tx operation is aborted before it runs, the wrong
cookie is reported back to userspace, and the ack_skb gets
leaked since the frame is freed directly instead of freeing
it using ieee80211_free_txskb(). Fix that.
Fixes: 3b79af973cf4 ("mac80211: stop using pointers as userspace cookies") Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Johannes Berg [Tue, 24 Nov 2015 14:29:53 +0000 (15:29 +0100)]
mac80211: catch queue stop underflow
If some code stops the queues more times than having started
(for when refcounting is used), warn on and reset the counter
to 0 to avoid blocking forever.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>