This patch enhances the fjes_change_mtu() method
by introducing new flag named FJES_RX_MTU_CHANGING_DONE
in rx_status. At the same time, default MTU value is
changed into 65510 bytes.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fjes: fix incorrect statistics information in fjes_xmit_frame()
There are bugs of acounting statistics in fjes_xmit_frame().
Accounting self stats is wrong. accounting stats of other
EPs to be transmitted is right.
This patch fixes this bug.
Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: socfpga: remove extra call to socfpga_dwmac_setup
In the socfpga_dwmac_probe function, we have a call to socfpga_dwmac_setup,
which is already called from socfpga_dwmac_init later in the probe function.
Remove this extra call to socfpga_dwmac_setup.
Also we should not be calling socfpga_dwmac_setup() directly without wrapping
it around the proper reset assert/deasserts. That is because the
socfpga_dwmac_setup() is setting up PHY modes in the system manager, and it
is requires the EMAC's to be in reset during the PHY setup.
Reported-by: Matthew Gerlach <mgerlach@opensource.altera.com> Signed-off-by: Dinh Nguyen <dinguyen@opensource.altera.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 14 Apr 2016 21:47:12 +0000 (23:47 +0200)]
dsa: mv88e6xxx: Kill the REG_READ and REG_WRITE macros
These macros hide a ds variable and a return statement on error, which
can lead to locking issues. Kill them off.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 21:04:34 +0000 (17:04 -0400)]
netdev_features: Add NETIF_F_TSO_MANGLEID to NETIF_F_ALL_TSO
I realized that when I added NETIF_F_TSO_MANGLEID as a TSO type I forgot to
add it to NETIF_F_ALL_TSO. This patch corrects that so the flag will be
included correctly.
The result should be minor as it was only used by a few drivers and in a
few specific cases such as when NETIF_F_SG was not supported on a device so
the TSO flags were cleared.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 16 Apr 2016 23:09:14 +0000 (19:09 -0400)]
Merge branch 'ipv6-gre-offloads'
Alexander Duyck says:
====================
Add support for offloads with IPv6 GRE tunnels
This patch series enables the use of segmentation and checksum offloads
with IPv6 based GRE tunnels.
In order to enable this series I had to make a change to
iptunnel_handle_offloads so that it would no longer free the skb. This was
necessary as there were multiple paths in the IPv6 GRE code that required
the skb to still be present so it could be freed. As it turned out I
believe this actually fixes a bug that was present in FOU/GUE based tunnels
anyway.
Below is a quick breakdown of the performance gains seen with a simple
netperf test passing traffic through a ip6gretap tunnel and then an i40e
interface:
Throughput Throughput Local Local Result
Units CPU Service Tag
Util Demand
%
3544.93 10^6bits/s 6.30 4.656 "before"
13081.75 10^6bits/s 3.75 0.752 "after"
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 19:34:04 +0000 (15:34 -0400)]
ip6gre: Add support for GSO
This patch adds code borrowed from bits and pieces of other protocols to
the IPv6 GRE path so that we can support GSO over IPv6 based GRE tunnels.
By adding this support we are able to significantly improve the throughput
for GRE tunnels as we are able to make use of GSO.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 19:33:58 +0000 (15:33 -0400)]
GRE: Add support for GRO/GSO of IPv6 GRE traffic
Since GRE doesn't really care about L3 protocol we can support IPv4 and
IPv6 using the same offloads. With that being the case we can add a call
to register the offloads for IPv6 as a part of our GRE offload
initialization.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 19:33:51 +0000 (15:33 -0400)]
ip6gre: Add support for basic offloads offloads excluding GSO
This patch adds support for the basic offloads we support on most devices.
Specifically with this patch set we can support checksum offload, basic
scatter-gather, and highdma.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 19:33:45 +0000 (15:33 -0400)]
ip6gretap: Fix MTU to allow for Ethernet header
When we were creating an ip6gretap interface the MTU was about 6 bytes
short of what was needed. It turns out we were not taking the Ethernet
header into account and as a result we were eating into the 8 bytes
reserved for the encap limit.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 14 Apr 2016 19:33:37 +0000 (15:33 -0400)]
ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb
This patch updates the IP tunnel core function iptunnel_handle_offloads so
that we return an int and do not free the skb inside the function. This
actually allows us to clean up several paths in several tunnels so that we
can free the skb at one point in the path without having to have a
secondary path if we are supporting tunnel offloads.
In addition it should resolve some double-free issues I have found in the
tunnels paths as I believe it is possible for us to end up triggering such
an event in the case of fou or gue.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 16 Apr 2016 22:30:27 +0000 (18:30 -0400)]
Merge branch 'w5100-spi-and-w5200-support'
Akinobu Mita says:
====================
net: w5100: add support W5100/W5200 for SPI interface
This series add support for Wiznet W5100 and W5200 for SPI interface.
We can easily find the ethernet modules and shield for Arduino with
these chips for purchase. I've tested them with BeagleBone.
Wiznet W5100 for mmio access has already supported by w5100 driver.
In order to share the code between mmio mode and SPI mode, this series
firstly adds ability to support another register access interface to
the existing w5100 driver. This ground work also requires to introduce
workqueue and threaded irq because SPI transfers are callable only from
contexts that can sleep unlike mmio access.
The latter part of this series adds w5100-spi driver which actually
support W5100 and W5200 for SPI interface. Supporting W5100 is
straight forward because it only required to add a register access
interface by the SPI transfer. W5100 and W5200 have similar memory
map which justifies adding W5200 support to w5100 driver.
* Changes from v2 to v3
- Add comment for reg_lock
- Add ability to allocate ops specific data structure
- Allocate w5200 ops specific data structure to put DMA-safe buffer
- Add missing chip_id assignment for w5100_*_ops
* Changes from v1 to v2
- Use a plain single pointer instead of SKB queue, spotted by David S. Miller
- Correct timeout period in w5100_command
- Use spi_write_then_read instead of spi_write which needs DMA-safe buffer
- Support W5200
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Thu, 14 Apr 2016 15:11:33 +0000 (00:11 +0900)]
net: w5100: support W5200
This adds support for W5200 chip.
W5100 and W5200 have similar memory map although some of their offsets
are different. The register access sequences between them are different
but w5100 driver has abstraction layer for difference bus interface
modes so it is easy to add W5200 support to w5100 and w5100-spi drivers.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Thu, 14 Apr 2016 15:11:32 +0000 (00:11 +0900)]
net: w5100: support SPI interface mode
This adds new w5100-spi driver which shares the bus interface
independent code with existing w5100 driver.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Thu, 14 Apr 2016 15:11:31 +0000 (00:11 +0900)]
net: w5100: enable to support sleepable register access interface
SPI transfer routines are callable only from contexts that can sleep.
This adds ability to tell the core driver that the interface mode
cannot access w5100 register on atomic contexts. In this case,
workqueue and threaded irq are required.
This also corrects timeout period waiting for command register to be
automatically cleared because the latency of the register access with
SPI transfer can be interfered by other contexts.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Thu, 14 Apr 2016 15:11:30 +0000 (00:11 +0900)]
net: w5100: add ability to support other bus interface
The w5100 driver currently only supports direct and indirect bus
interface mode which use MMIO space for accessing w5100 registers.
In order to support SPI interface mode which is supported by W5100 chip,
this makes the bus interface abstraction layer more generic so that
separated w5100-spi driver can use w5100 driver as core module.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Akinobu Mita [Thu, 14 Apr 2016 15:11:29 +0000 (00:11 +0900)]
net: w5100: move mmiowb into register access callbacks
Instead of sprinkle mmiowb over the driver code, move it into primary
register write callbacks. (w5100_write, w5100_write16, w5100_writebuf)
This is a preparation for supporting SPI interface which doesn't use
MMIO for accessing w5100 registers.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Mike Sinkovsky <msink@permonline.ru> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan: reduce usage of synchronize_net in ndo_stop
We only need to do the synchronize_net dance once for both, ipv4 and
ipv6 sockets, thus removing one synchronize_net in case both sockets get
dismantled.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan: synchronously and race-free destruction of vxlan sockets
Due to the fact that the udp socket is destructed asynchronously in a
work queue, we have some nondeterministic behavior during shutdown of
vxlan tunnels and creating new ones. Fix this by keeping the destruction
process synchronous in regards to the user space process so IFF_UP can
be reliably set.
udp_tunnel_sock_release destroys vs->sock->sk if reference counter
indicates so. We expect to have the same lifetime of vxlan_sock and
vxlan_sock->sock->sk even in fast paths with only rcu locks held. So
only destruct the whole socket after we can be sure it cannot be found
by searching vxlan_net->sock_list.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
phy: make some bits preserved while setup forced mode
When tested the PHY SGMII Loopback:
1.set the LOOPBACK bit,
2.set the autoneg to AUTONEG_DISABLE, it calls the
genphy_setup_forced which will clear the bit.
The BMCR_LOOPBACK bit should be preserved.
As Florian pointed out that other bits should be preserved too.
So I make the BMCR_ISOLATE and BMCR_PDOWN as well.
Signed-off-by: Weidong Wang <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The entries with '- ESTAB' are the assocs, some of them may belong to
the same endpoint. So we will dump the parent endpoint first, like the
entry with 'LISTEN'. then dump the assocs. ep and assocs entries will
be dumped in right order so that ss can show them in tree format easily.
Besides, this patchset also simplifies sctp proc codes, cause it has
some similar codes with sctp diag in sctp transport traversal.
v1->v2:
1. inet_diag_get_handler needs to return it as const.
2. merge 5/7 into 2/7 of v1.
v2->v3:
do some improvements and fixes in patch 1-4, see the details in
each patch's comment.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Apr 2016 07:35:35 +0000 (15:35 +0800)]
sctp: fix some rhashtable functions using in sctp proc/diag
When rhashtable_walk_init return err, no release function should be
called, and when rhashtable_walk_start return err, we should only invoke
rhashtable_walk_exit to release the source.
But now when sctp_transport_walk_start return err, we just call
rhashtable_walk_stop/exit, and never care about if rhashtable_walk_init
or start return err, which is so bad.
We will fix it by calling rhashtable_walk_exit if rhashtable_walk_start
return err in sctp_transport_walk_start, and if sctp_transport_walk_start
return err, we do not need to call sctp_transport_walk_stop any more.
For sctp proc, we will use 'iter->start_fail' to decide if we will call
rhashtable_walk_stop/exit.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Apr 2016 07:35:32 +0000 (15:35 +0800)]
sctp: export some functions for sctp_diag in inet_diag
inet_diag_msg_common_fill is used to fill the diag msg common info,
we need to use it in sctp_diag as well, so export it.
inet_diag_msg_attrs_fill is used to fill some common attrs info between
sctp diag and tcp diag.
v2->v3:
- do not need to define and export inet_diag_get_handler any more.
cause all the functions in it are in sctp_diag.ko, we just call
them in sctp_diag.ko.
- add inet_diag_msg_attrs_fill to make codes clear.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Apr 2016 07:35:31 +0000 (15:35 +0800)]
sctp: export some apis or variables for sctp_diag and reuse some for proc
For some main variables in sctp.ko, we couldn't export it to other modules,
so we have to define some api to access them.
It will include sctp transport and endpoint's traversal.
There are some transport traversal functions for sctp_diag, we can also
use it for sctp_proc. cause they have the similar situation to traversal
transport.
v2->v3:
- rhashtable_walk_init need the parameter gfp, because of recent upstrem
update
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 14 Apr 2016 07:35:30 +0000 (15:35 +0800)]
sctp: add sctp_info dump api for sctp_diag
sctp_diag will dump some important details of sctp's assoc or ep, we use
sctp_info to describe them, sctp_get_sctp_info to get them, and export
it to sctp_diag.ko.
v2->v3:
- we will not use list_for_each_safe in sctp_get_sctp_info, cause
all the callers of it will use lock_sock.
- fix the holes in struct sctp_info with __reserved* field.
because sctp_diag is a new feature, and sctp_info is just for now,
it may be changed in the future.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP already serializes access to rcvbuf through its sock lock:
sctp_recvmsg takes it right in the start and release at the end, while
rx path will also take the lock before doing any socket processing. On
sctp_rcv() it will check if there is an user using the socket and, if
there is, it will queue incoming packets to the backlog. The backlog
processing will do the same. Even timers will do such check and
re-schedule if an user is using the socket.
Simplifying this will allow us to remove sctp_skb_list_tail and get ride
of some expensive lockings. The lists that it is used on are also
mangled with functions like __skb_queue_tail and __skb_unlink in the
same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
sctp_close() will also purge those while using only the sock lock.
Therefore the lockings performed by sctp_skb_list_tail() are not
necessary. This patch removes this function and replaces its calls with
just skb_queue_splice_tail_init() instead.
The biggest gain is at sctp_ulpq_tail_event(), because the events always
contain a list, even if it's queueing a single skb and this was
triggering expensive calls to spin_lock_irqsave/_irqrestore for every
data chunk received.
As SCTP will deliver each data chunk on a corresponding recvmsg, the
more effective the change will be.
Before this patch, with chunks with 30 bytes:
netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
400000 -s 400000 400000
on a 10Gbit link with 1500 MTU:
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
David S. Miller [Fri, 15 Apr 2016 21:21:10 +0000 (17:21 -0400)]
Merge branch 'mlx5_ifc-updates'
Saeed Mahameed says:
====================
mlx5_core: mlx5_ifc updates
This series include mlx5_core updates for both net-next and rdma
trees for 4.7 kernel cycle. This is the only shared code planned
for 4.7 between rdma and net trees. Hopefully, this will prevent
future conflicts when merging between ib-next and net-next once
4.7 cycle is over and merge window is opened.
Both Mellanox rdma and net submissions will proceed once this series
is applied into both trees.
Future shared code will be sent to both maintainers as pull requests
from Mellanox's kernel.org tree.
We have included all the maintainers of respective drivers.
Kindly review the change and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding the needed mlx5_ifc hardware bits and structs
for the following feature:
* Add vport to steering commands for SRIOV ACL support
* Add mlcr, pcmr and mcia registers for dump module EEPROM
* Add support for FCS, baeacon led and disable_link bits to
hca caps
* Add CQE period mode bit in CQ context for CQE based CQ
moderation support
* Add umr SQ bit for fragmented memory registration
* Add needed bits and caps for Striding RQ support
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
All reserved fields after early_vf_enable are off by 1, since
early_vf_enable was not explicitly declared as array of size 1.
Reserved field before cqe_zip had a wrong size, it should
be 0x80 + 0x3f.
Fixes: b0844444590e ("net/mlx5_core: Introduce access function to read internal timer ") Fixes: b4ff3a36d3e4 ("net/mlx5: Use offset based reserved field names in the IFC header file") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2016 21:08:09 +0000 (17:08 -0400)]
Merge branch 'qed-tunneling-offload'
Manish Chopra says:
====================
qed/qede: Add tunneling support
This patch series adds support for VXLAN, GRE and GENEVE tunnels
to be used over this driver. With this support, adapter can perform
TSO offload, inner/outer checksums offloads on TX and RX for
encapsulated packets.
V1->V2 [ Comments from Jesse Gross incorporated ]
* Drop general infrastructure change patch.
"net: Make vxlan/geneve default udp ports public"
* Remove by default Linux default UDP ports configurations in driver.
Instead, use general registration APIs for UDP port configurations
* Removing .ndo_features_check - we will add it later with proper change.
Please consider applying this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
qed/qede: Add GENEVE tunnel slowpath configuration support
This patch enables GENEVE tunnel on the adapter and
add support for driver hooks to configure UDP ports
for GENEVE tunnel offload to be performed by the adapter.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed/qede: Add VXLAN tunnel slowpath configuration support
This patch enables VXLAN tunnel on the adapter and
add support for driver hooks to configure UDP ports
for VXLAN tunnel offload to be performed by the adapter.
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Heise [Wed, 13 Apr 2016 11:52:22 +0000 (13:52 +0200)]
net/hsr: Added support for HSR v1
This patch adds support for the newer version 1 of the HSR
networking standard. Version 0 is still default and the new
version has to be selected via iproute2.
Main changes are in the supervision frame handling and its
ethertype field.
Signed-off-by: Peter Heise <peter.heise@airbus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 14 Apr 2016 05:05:39 +0000 (22:05 -0700)]
tcp: do not mess with listener sk_wmem_alloc
When removing sk_refcnt manipulation on synflood, I missed that
using skb_set_owner_w() was racy, if sk->sk_wmem_alloc had already
transitioned to 0.
We should hold sk_refcnt instead, but this is a big deal under attack.
(Doing so increase performance from 3.2 Mpps to 3.8 Mpps only)
In this patch, I chose to not attach a socket to syncookies skb.
Performance is now 5 Mpps instead of 3.2 Mpps.
Following patch will remove last known false sharing in
tcp_rcv_state_process()
Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlge: Replace create_singlethread_workqueue with alloc_ordered_workqueue
Replace deprecated create_singlethread_workqueue with
alloc_ordered_workqueue.
Work items include getting tx/rx frame sizes, resetting MPI processor,
setting asic recovery bit so ordering seems necessary as only one work
item should be in queue/executing at any given time, hence the use of
alloc_ordered_workqueue.
WQ_MEM_RECLAIM flag has been set since ethernet devices seem to sit in
memory reclaim path, so to guarantee forward progress regardless of
memory pressure.
Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2016 20:09:07 +0000 (16:09 -0400)]
Merge branch 'tipc-link-setup-improvements'
Jon Maloy says:
====================
tipc: improvements to the link setup algorithm
This series addresses some smaller issues regarding the link setup
algorithm. The first commit fixes a rare bug we have discovered during
testing; the second one may have some future impact on cluster
scalabilty, while remaining ones can be regarded as cosmetic in
a wider sense of the word.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 15 Apr 2016 17:33:07 +0000 (13:33 -0400)]
tipc: let first message on link be a state message
According to the link FSM, a received traffic packet can take a link
from state ESTABLISHING to ESTABLISHED, but the link can still not be
fully set up in one atomic operation. This means that even if the the
very first packet on the link is a traffic packet with sequence number
1 (one), it has to be dropped and retransmitted.
This can be avoided if we let the mentioned packet be preceded by a
LINK_PROTOCOL/STATE message, which takes up the endpoint before the
arrival of the traffic.
We add this small feature in this commit.
This is a fully compatible change.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 15 Apr 2016 17:33:06 +0000 (13:33 -0400)]
tipc: ensure that first packets on link are sent in order
In some link establishment scenarios we see that packet #2 may be sent
out before packet #1, forcing the receiver to demand retransmission of
the missing packet. This is harmless, but may cause confusion among
people tracing the packet flow.
Since this is extremely easy to fix, we do so by adding en extra send
call to the bearer immediately after the link has come up.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 15 Apr 2016 17:33:04 +0000 (13:33 -0400)]
tipc: reduce transmission rate of reset messages when link is down
When a link is down, it will continuously try to re-establish contact
with the peer by sending out a RESET or an ACTIVATE message at each
timeout interval. The default value for this interval is currently
375 ms. This is wasteful, and may become a problem in very large
clusters with dozens or hundreds of nodes being down simultaneously.
We now introduce a simple backoff algorithm for these cases. The
first five messages are sent at default rate; thereafter a message
is sent only each 16th timer interval.
This will cover the vast majority of link recycling cases, since the
endpoint starting last will transmit at the higher speed, and the link
should normally be established well be before the rate needs to be
reduced.
The only case where we will see a degradation of link re-establishment
times is when the endpoints remain intact, and a glitch in the
transmission media is causing the link reset. We will then experience
a worst-case re-establishing time of 6 seconds, something we deem
acceptable.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Fri, 15 Apr 2016 17:33:03 +0000 (13:33 -0400)]
tipc: guarantee peer bearer id exchange after reboot
When a link endpoint is going down locally, e.g., because its interface
is being stopped, it will spontaneously send out a RESET message to
its peer, informing it about this fact. This saves the peer from
detecting the failure via probing, and hence gives both speedier and
less resource consuming failure detection on the peer side.
According to the link FSM, a receiver of a RESET message, ignoring the
reason for it, must now consider the sender ready to come back up, and
starts periodically sending out ACTIVATE messages to the peer in order
to re-establish the link. Also, according to the FSM, the receiver of
an ACTIVATE message can now go directly to state ESTABLISHED and start
sending regular traffic packets. This is a well-proven and robust FSM.
However, in the case of a reboot, there is a small possibilty that link
endpoint on the rebooted node may have been re-created with a new bearer
identity between the moment it sent its (pre-boot) RESET and the moment
it receives the ACTIVATE from the peer. The new bearer identity cannot
be known by the peer according to this scenario, since traffic headers
don't convey such information. This is a problem, because both endpoints
need to know the correct value of the peer's bearer id at any moment in
time in order to be able to produce correct link events for their users.
The only way to guarantee this is to enforce a full setup message
exchange (RESET + ACTIVATE) even after the reboot, since those messages
carry the bearer idientity in their header.
In this commit we do this by introducing and setting a "stopping" bit in
the header of the spontaneously generated RESET messages, informing the
peer that the sender will not be immediately ready to re-establish the
link. A receiver seeing this bit must act as if this were a locally
detected connectivity failure, and hence has to go through a full two-
way setup message exchange before any link can be re-established.
Although never reported, this problem seems to have always been around.
This protocol addition is fully backwards compatible.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_buffers: Use MLXSW_SP_PB_UNUSED define for unused pb
Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_buffers: Use designated initializers for mlxsw_sp_pbs
Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 13 Apr 2016 08:52:20 +0000 (10:52 +0200)]
tun: use per cpu variables for stats accounting
Currently the tun device accounting uses dev->stats without applying any
kind of protection, regardless that accounting happens in preemptible
process context.
This patch move the tun stats to a per cpu data structure, and protect
the updates with u64_stats_update_begin()/u64_stats_update_end() or
this_cpu_inc according to the stat type. The per cpu stats are
aggregated by the newly added ndo_get_stats64 ops.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 15 Apr 2016 01:40:53 +0000 (21:40 -0400)]
Merge branch 'bpf-ARG_PTR_TO_RAW_STACK'
Merge branch 'bpf-ARG_PTR_TO_RAW_STACK'
Daniel Borkmann says:
====================
BPF updates
This series adds a new verifier argument type called
ARG_PTR_TO_RAW_STACK and converts related helpers to make
use of it. Basic idea is that we can save init of stack
memory when the helper function is guaranteed to fully
fill out the passed buffer in every path. Series also adds
test cases and converts samples. For more details, please
see individual patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Apr 2016 22:10:54 +0000 (00:10 +0200)]
bpf, samples: add test cases for raw stack
This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
verifier behaviour.
[...]
#84 raw_stack: no skb_load_bytes OK
#85 raw_stack: skb_load_bytes, no init OK
#86 raw_stack: skb_load_bytes, init OK
#87 raw_stack: skb_load_bytes, spilled regs around bounds OK
#88 raw_stack: skb_load_bytes, spilled regs corruption OK
#89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
#90 raw_stack: skb_load_bytes, spilled regs + data OK
#91 raw_stack: skb_load_bytes, invalid access 1 OK
#92 raw_stack: skb_load_bytes, invalid access 2 OK
#93 raw_stack: skb_load_bytes, invalid access 3 OK
#94 raw_stack: skb_load_bytes, invalid access 4 OK
#95 raw_stack: skb_load_bytes, invalid access 5 OK
#96 raw_stack: skb_load_bytes, invalid access 6 OK
#97 raw_stack: skb_load_bytes, large access OK
Summary: 98 PASSED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Apr 2016 22:10:53 +0000 (00:10 +0200)]
bpf, samples: don't zero data when not needed
Remove the zero initialization in the sample programs where appropriate.
Note that this is an optimization which is now possible, old programs
still doing the zero initialization are just fine as well. Also, make
sure we don't have padding issues when we don't memset() the entire
struct anymore.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Apr 2016 22:10:52 +0000 (00:10 +0200)]
bpf: convert relevant helper args to ARG_PTR_TO_RAW_STACK
This patch converts all helpers that can use ARG_PTR_TO_RAW_STACK as argument
type. For tc programs this is bpf_skb_load_bytes(), bpf_skb_get_tunnel_key(),
bpf_skb_get_tunnel_opt(). For tracing, this optimizes bpf_get_current_comm()
and bpf_probe_read(). The check in bpf_skb_load_bytes() for MAX_BPF_STACK can
also be removed since the verifier already makes sure we stay within bounds
on stack buffers.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Apr 2016 22:10:51 +0000 (00:10 +0200)]
bpf, verifier: add ARG_PTR_TO_RAW_STACK type
When passing buffers from eBPF stack space into a helper function, we have
ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure
that such buffers are initialized, within boundaries, etc.
However, the downside with this is that we have a couple of helper functions
such as bpf_skb_load_bytes() that fill out the passed buffer in the expected
success case anyway, so zero initializing them prior to the helper call is
unneeded/wasted instructions in the eBPF program that can be avoided.
Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK.
The idea is to skip the STACK_MISC check in check_stack_boundary() and color
the related stack slots as STACK_MISC after we checked all call arguments.
Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of
the helper function will fill the provided buffer area, so that we cannot leak
any uninitialized stack memory. This f.e. means that error paths need to
memset() the buffers, but the expected fast-path doesn't have to do this
anymore.
Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK
argument, we can keep it simple and don't need to check for multiple areas.
Should in future such a use-case really appear, we have check_raw_mode() that
will make sure we implement support for it first.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Apr 2016 22:10:50 +0000 (00:10 +0200)]
bpf, verifier: add bpf_call_arg_meta for passing meta data
Currently, when the verifier checks calls in check_call() function, we
call check_func_arg() for all 5 arguments e.g. to make sure expected types
are correct. In some cases, we collect meta data (here: map pointer) to
perform additional checks such as checking stack boundary on key/value
sizes for subsequent arguments. As we're going to extend the meta data,
add a generic struct bpf_call_arg_meta that we can use for passing into
check_func_arg().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds what's missing to properly support RPS and RFS on SCTP,
as some of it is already implemented in common calls.
Having support for RPS and RFS allows better scaling specially because
not all NICs support hashing SCTP headers.
Save the hash right when we dequeue a skb from inqueue so we do it only
once per skb instead of per chunk. New sockets will then inherit the
hash through sctp_copy_sock().
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
consume_skb() isn't for error cases that kfree_skb() is more proper
one. At this patch, it fixed tpacket_rcv() and packet_rcv() to be
consistent for error or non-error cases letting perf trace its event
properly.
Signed-off-by: Weongyo Jeong <weongyo.linux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: fix a race condition leading to subscriber refcnt bug
Until now, the requests sent to topology server are queued
to a workqueue by the generic server framework.
These messages are processed by worker threads and trigger the
registered callbacks.
To reduce latency on uniprocessor systems, explicit rescheduling
is performed using cond_resched() after MAX_RECV_MSG_COUNT(25)
messages.
This implementation on SMP systems leads to an subscriber refcnt
error as described below:
When a worker thread yields by calling cond_resched() in a SMP
system, a new worker is created on another CPU to process the
pending workitem. Sometimes the sleeping thread wakes up before
the new thread finishes execution.
This breaks the assumption on ordering and being single threaded.
The fault is more frequent when MAX_RECV_MSG_COUNT is lowered.
If the first thread was processing subscription create and the
second thread processing close(), the close request will free
the subscriber and the create request oops as follows:
In this commit, we
- rename tipc_conn_shutdown() to tipc_conn_release().
- move connection release callback execution from tipc_close_conn()
to a new function tipc_sock_release(), which is executed before
we free the connection.
Thus we release the subscriber during connection release procedure
rather than connection shutdown procedure.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 20:23:42 +0000 (16:23 -0400)]
Merge branch 'gro-fixed-id-gso-partial'
Alexander Duyck says:
====================
GRO Fixed IPv4 ID support and GSO partial support
This patch series sets up a few different things.
First it adds support for GRO of frames with a fixed IP ID value. This
will allow us to perform GRO for frames that go through things like an IPv6
to IPv4 header translation.
The second item we add is support for segmenting frames that are generated
this way. Most devices only support an incrementing IP ID value, and in
the case of TCP the IP ID can be ignored in many cases since the DF bit
should be set. So we can technically segment these frames using existing
TSO if we are willing to allow the IP ID to be mangled. As such I have
added a matching feature for the new form of GRO/GSO called TCP IPv4 ID
mangling. With this enabled we can assemble and disassemble a frame with
the sequence number fixed and the only ill effect will be that the IPv4 ID
will be altered which may or may not have any noticeable effect. As such I
have defaulted the feature to disabled.
The third item this patch series adds is support for partial GSO
segmentation. Partial GSO segmentation allows us to split a large frame
into two pieces. The first piece will have an even multiple of MSS worth
of data and the headers before the one pointed to by csum_start will have
been updated so that they are correct for if the data payload had already
been segmented. By doing this we can do things such as precompute the
outer header checksums for a frame to be segmented allowing us to perform
TSO on devices that don't support tunneling, or tunneling with outer header
checksums.
This patch set is based on the net-next tree, but I included "net: remove
netdevice gso_min_segs" in my tree as I assume it is likely to be applied
before this patch set will and I wanted to avoid a merge conflict.
v2: Fixed items reported by Jesse Gross
fixed missing GSO flag in MPLS check
adding DF check for MANGLEID
Moved extra GSO feature checks into gso_features_check
Rebased batches to account for "net: remove netdevice gso_min_segs"
Driver patches from the first patch set should still be compatible. However
I do have a few changes in them so I will submit a v2 of those to Jeff
Kirsher once these patches are accepted into net-next.
Example driver patches for i40e, ixgbe, and igb:
https://patchwork.ozlabs.org/patch/608221/
https://patchwork.ozlabs.org/patch/608224/
https://patchwork.ozlabs.org/patch/608225/
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 11 Apr 2016 01:45:09 +0000 (21:45 -0400)]
Documentation: Add documentation for TSO and GSO features
This document is a starting point for defining the TSO and GSO features.
The whole thing is starting to get a bit messy so I wanted to make sure we
have notes somwhere to start describing what does and doesn't work.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 11 Apr 2016 01:45:03 +0000 (21:45 -0400)]
GSO: Support partial segmentation offload
This patch adds support for something I am referring to as GSO partial.
The basic idea is that we can support a broader range of devices for
segmentation if we use fixed outer headers and have the hardware only
really deal with segmenting the inner header. The idea behind the naming
is due to the fact that everything before csum_start will be fixed headers,
and everything after will be the region that is handled by hardware.
With the current implementation it allows us to add support for the
following GSO types with an inner TSO_MANGLEID or TSO6 offload:
NETIF_F_GSO_GRE
NETIF_F_GSO_GRE_CSUM
NETIF_F_GSO_IPIP
NETIF_F_GSO_SIT
NETIF_F_UDP_TUNNEL
NETIF_F_UDP_TUNNEL_CSUM
In the case of hardware that already supports tunneling we may be able to
extend this further to support TSO_TCPV4 without TSO_MANGLEID if the
hardware can support updating inner IPv4 headers.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 11 Apr 2016 01:44:57 +0000 (21:44 -0400)]
GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values
This patch does two things.
First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field. As
a result we should now be able to aggregate flows that were converted from
IPv6 to IPv4. In addition this allows us more flexibility for future
implementations of segmentation as we may be able to use a fixed IP ID when
segmenting the flow.
The second thing this does is that it places limitations on the outer IPv4
ID header in the case of tunneled frames. Specifically it forces the IP ID
to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
This way we can avoid creating overlapping series of IP IDs that could
possibly be fragmented if the frame goes through GRO and is then
resegmented via GSO.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 11 Apr 2016 01:44:51 +0000 (21:44 -0400)]
GSO: Add GSO type for fixed IPv4 ID
This patch adds support for TSO using IPv4 headers with a fixed IP ID
field. This is meant to allow us to do a lossless GRO in the case of TCP
flows that use a fixed IP ID such as those that convert IPv6 header to IPv4
headers.
In addition I am adding a feature that for now I am referring to TSO with
IP ID mangling. Basically when this flag is enabled the device has the
option to either output the flow with incrementing IP IDs or with a fixed
IP ID regardless of what the original IP ID ordering was. This is useful
in cases where the DF bit is set and we do not care if the original IP ID
value is maintained.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 11 Apr 2016 01:44:44 +0000 (21:44 -0400)]
ethtool: Add support for toggling any of the GSO offloads
The strings were missing for several of the GSO offloads that are
available. This patch provides the missing strings so that we can toggle
or query any of them via the ethtool command.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 14 Apr 2016 20:22:12 +0000 (16:22 -0400)]
Merge branch 'mlxsw-devlink-shared-buffers'
Jiri Pirko says:
====================
devlink + mlxsw: add support for config and control of shared buffers
ASICs implement shared buffer for packet forwarding purposes and enable
flexible partitioning of the shared buffer for different flows and ports,
enabling non-blocking progress of different flows as well as separation
of lossy traffic from loss-less traffic when using Per-Priority Flow
Control (PFC). The shared buffer optimizes the buffer utilization for better
absorption of packet bursts.
This patchset implements API which is based on the model SAI uses. That is
aligned with multiple ASIC vendors so this API should be vendor neutral.
Userspace counterpart patchset for devlink iproute2 tool can be found here:
https://github.com/jpirko/iproute2_mlxsw/tree/devlink_sb
Couple of examples of usage:
switch$ devlink sb help
Usage: devlink sb show [ DEV [ sb SB_INDEX ] ]
devlink sb pool show [ DEV [ sb SB_INDEX ] pool POOL_INDEX ]
devlink sb pool set DEV [ sb SB_INDEX ] pool POOL_INDEX
size POOL_SIZE thtype { static | dynamic }
devlink sb port pool show [ DEV/PORT_INDEX [ sb SB_INDEX ]
pool POOL_INDEX ]
devlink sb port pool set DEV/PORT_INDEX [ sb SB_INDEX ]
pool POOL_INDEX th THRESHOLD
devlink sb tc bind show [ DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX ]
devlink sb tc bind set DEV/PORT_INDEX [ sb SB_INDEX ] tc TC_INDEX
type { ingress | egress } pool POOL_INDEX
th THRESHOLD
devlink sb occupancy show { DEV | DEV/PORT_INDEX } [ sb SB_INDEX ]
devlink sb occupancy snapshot DEV [ sb SB_INDEX ]
devlink sb occupancy clearmax DEV [ sb SB_INDEX ]
Implement occupancy API introduced in devlink and mlxsw core. This is
done by accessing SBPM register for Port-Pool and SBSR for Port-TC
current and max occupancy values. Max clear is implemented using the
same registers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: core: Introduce support for asynchronous EMAD register access
So far it was possible to have one EMAD register access at a time,
locked by mutex. This patch extends this interface to allow multiple
EMAD register accesses to be in fly at once. That allows faster
processing on firmware side avoiding unused time in between EMADs.
Measured speedup is ~30% for shared occupancy snapshot operation.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processing
Follow-up patch is going to need to use delayed work as well and
frequently. The FDB notification processing is already using that and
also quite frequently. It makes sense to create separate workqueue just
for mlxsw driver in this case and do not pollute system_wq.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Extend SBPM register for occupancy control
Since it is not possible to get and clear Port-Pool occupancy data using
SBSR register, there's a need to implement that using SBPM.
Extend pack helper and add unpack helper to get occupancy values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_buffers: Get max_buff defaults into limits exposed to user
Although the device supports max_buff magic values 0 and 0xff, these are
not exposed to the user via devlink.
Therefore, adjust the default values to be within configurable range.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_buffers: Change initialization of PG 9
As explained in commit ff6551ec0c27 ("mlxsw: spectrum: Correctly
configure headroom size") control packets are directed to priority group
buffer 9 (PG9) in the ports' headroom buffers.
Since we don't want to drop control packets in case they can't be
admitted to the switch's shared buffer we bind PG9 to a different
ingress pool from the one used by all other PGs.
Unlike other PGs, we currently don't expose the binding between PG9 to a
pool and leave it fixed.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_buffers: Remove eg pool 3 default init and CPU port TC binding to it
Since there is no congestion control for CPU port traffic, we can change
the CPU port TC binding to pool 0 with min_buff and max_buff zeroed.
Remove initialization for pool egress pool 3 since it is no longer used
by dafault.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In order to achieve faster dumping of current setting and also in order
to provide possibility to get pool mode without a need to query hardware,
do cache the configuration in driver.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
User needs to monitor shared buffer occupancy. For that, he issues a
snapshot command in order to instruct hardware to catch current and
maximal occupancy values, and clear command in order to clear the
historical maximal values.
Also port-pool and tc-pool-bind command response messages are extended to
carry occupancy values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Define userspace API and drivers API for configuration of shared
buffers. Four basic objects are defined:
shared buffer - attributes are size, number of pools and TCs
pool - chunk of sharedbuffer definition, it has some size and either
static or dynamic threshold
port pool threshold - to set per-port threshold for each pool
port tc threshold bind - to bind port and TC to specified pool
with threshold.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sun, 10 Apr 2016 20:55:15 +0000 (23:55 +0300)]
ravb: make ravb_ptp_interrupt() *void*
When we have the ISS.CGIS bit set, we already know that gPTP interrupt has
happened, so an extra GIS register check at the end of ravb_ptp_interrupt()
seems superfluous. We can model the gPTP interrupt handler like all other
dedicated interrupt handlers in the driver and make it *void*.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adds support for the following via ethtool:
- UDP configuration of RSS based on 2-tuple/4-tuple.
- RSS hash key.
- RSS indirection table.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>