Wei Wang [Thu, 20 Apr 2017 21:45:46 +0000 (14:45 -0700)]
net/tcp_fastopen: Disable active side TFO in certain scenarios
Middlebox firewall issues can potentially cause server's data being
blackholed after a successful 3WHS using TFO. Following are the related
reports from Apple:
https://www.nanog.org/sites/default/files/Paasch_Network_Support.pdf
Slide 31 identifies an issue where the client ACK to the server's data
sent during a TFO'd handshake is dropped.
C ---> syn-data ---> S
C <--- syn/ack ----- S
C (accept & write)
C <---- data ------- S
C ----- ACK -> X S
[retry and timeout]
https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf
Slide 5 shows a similar situation that the server's data gets dropped
after 3WHS.
C ---- syn-data ---> S
C <--- syn/ack ----- S
C ---- ack --------> S
S (accept & write)
C? X <- data ------ S
[retry and timeout]
This is the worst failure b/c the client can not detect such behavior to
mitigate the situation (such as disabling TFO). Failing to proceed, the
application (e.g., SSL library) may simply timeout and retry with TFO
again, and the process repeats indefinitely.
The proposed solution is to disable active TFO globally under the
following circumstances:
1. client side TFO socket detects out of order FIN
2. client side TFO socket receives out of order RST
We disable active side TFO globally for 1hr at first. Then if it
happens again, we disable it for 2h, then 4h, 8h, ...
And we reset the timeout to 1hr if a client side TFO sockets not opened
on loopback has successfully received data segs from server.
And we examine this condition during close().
The rational behind it is that when such firewall issue happens,
application running on the client should eventually close the socket as
it is not able to get the data it is expecting. Or application running
on the server should close the socket as it is not able to receive any
response from client.
In both cases, out of order FIN or RST will get received on the client
given that the firewall will not block them as no data are in those
frames.
And we want to disable active TFO globally as it helps if the middle box
is very close to the client and most of the connections are likely to
fail.
Also, add a debug sysctl:
tcp_fastopen_blackhole_detect_timeout_sec:
the initial timeout to use when firewall blackhole issue happens.
This can be set and read.
When setting it to 0, it means to disable the active disable logic.
Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 18:11:10 +0000 (14:11 -0400)]
Merge tag 'mlx5-updates-2017-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-04-22
Sparse and compiler warnings fixes from Stephen Hemminger.
From Roi Dayan and Or Gerlitz, Add devlink and mlx5 support for controlling
E-Switch encapsulation mode, this knob will enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 22 Apr 2017 16:33:16 +0000 (09:33 -0700)]
net: add rcu locking when changing early demux
systemd-sysctl is triggering a suspicious RCU usage message when
net.ipv4.tcp_early_demux or net.ipv4.udp_early_demux is changed via
a sysctl config file:
[ 33.896184] ===============================
[ 33.899558] [ ERR: suspicious RCU usage. ]
[ 33.900624] 4.11.0-rc7+ #104 Not tainted
[ 33.901698] -------------------------------
[ 33.903059] /home/dsa/kernel-2.git/net/ipv4/sysctl_net_ipv4.c:305 suspicious rcu_dereference_check() usage!
[ 33.905724]
other info that might help us debug this:
Fixes: dddb64bcb3461 ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 17:54:48 +0000 (13:54 -0400)]
Merge branch 'bnxt_en-misc-next'
Michael Chan says:
====================
bnxt_en: Updates for net-next.
Miscellaneous updates include passing DCBX RoCE VLAN priority to firmware,
checking one more new firmware flag before allowing DCBX to run on the host,
adding 100Gbps speed support, adding check to disallow speed settings on
Multi-host NICs, and a minor fix for reporting VF attributes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_en: Restrict a PF in Multi-Host mode from changing port PHY configuration
This change restricts the PF in multi-host mode from setting any port
level PHY configuration. The settings are controlled by firmware in
Multi-Host mode.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnxt_en: Add 100G link speed reporting for BCM57454 ASIC in ethtool
Added support for 100G link speed reporting for Broadcom BCM57454
ASIC in ethtool command.
Signed-off-by: Deepak Khungar <deepak.khungar@broadcom.com> Signed-off-by: Ray Jui <ray.jui@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sat, 22 Apr 2017 00:11:23 +0000 (20:11 -0400)]
bnxt_en: Fix VF attributes reporting.
The .ndo_get_vf_config() is returning the wrong qos attribute. Fix
the code that checks and reports the qos and spoofchk attributes. The
BNXT_VF_QOS and BNXT_VF_LINK_UP flags should not be set by default
during init. time.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Sat, 22 Apr 2017 00:11:22 +0000 (20:11 -0400)]
bnxt_en: Pass DCB RoCE app priority to firmware.
When the driver gets the RoCE app priority set/delete call through DCBNL,
the driver will send the information to the firmware to set up the
priority VLAN tag for RDMA traffic.
[ New version using the common ETH_P_IBOE constant in if_ether.h ]
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
which can be used in conjunction with the commit flag
(OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
conntrack events (IPCT_*) should be delivered via the Netfilter
netlink multicast groups. Default behavior depends on the system
configuration, but typically a lot of events are delivered. This can be
very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
types of events are of interest.
Netfilter core init_conntrack() adds the event cache extension, so we
only need to set the ctmask value. However, if the system is
configured without support for events, the setting will be skipped due
to extension not being found.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Reviewed-by: Greg Rose <gvrose8192@gmail.com> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 16:52:15 +0000 (12:52 -0400)]
Merge branch 'ibmvnic-updates-and-fixes'
Nathan Fontenot says:
====================
ibmvnic: Additional updates and bug fixes
This set of patches is an additional set of updates and bug fixes to
the ibmvnic driver which applies on top of the previous set of updates
sent out on 4/19.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic: Add set_link_state routine for setting adapter link state
Create a common routine for setting the link state for the vnic adapter.
This update moves the sending of the crq and waiting for the link state
response to a common place. The new routine also adds handling of
resending the crq in cases of getting a partial success response.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When handling a fatal error in the driver, there can be additional
error information provided by the vios. This information is not
always present, so only retrieve the additional error information
when present.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic: Insert header on VLAN tagged received frame
This patch addresses a modification in the PAPR+ specification which now
defines a previously reserved value for vNIC capabilities. It indicates
whether the system firmware performs a VLAN header stripping on all VLAN
tagged received frames, in case it does, the behavior expected is for
the ibmvnic driver to be responsible for inserting the VLAN header.
Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com> Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Fri, 21 Apr 2017 19:38:40 +0000 (15:38 -0400)]
ibmvnic: Set real number of rx queues
Along with 5 TX queues, 5 RX queues are allocated at the beginning of
device probe. However, only the real number of TX queues is set. Configure
the real number of RX queues as well.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 16:46:28 +0000 (12:46 -0400)]
Merge branch 'packet-fanout-unique-id'
Mike Maloney says:
====================
packet: Add option to create new fanout group with unique id.
Fanout uses a per net global namespace. A process that intends to create a
new fanout group can accidentally join an existing group. It is
not possible to detect this.
Add a socket option to specify on the first call to
setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
Also add tests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Fri, 21 Apr 2017 14:56:12 +0000 (10:56 -0400)]
selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID
Create two groups with PACKET_FANOUT_FLAG_UNIQUEID, add a socket to one.
Ensure that the groups can only be joined if all options are consistent
with the original except for this flag.
Signed-off-by: Mike Maloney <maloney@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Fri, 21 Apr 2017 14:56:11 +0000 (10:56 -0400)]
packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
Fanout uses a per net global namespace. A process that intends to create
a new fanout group can accidentally join an existing group. It is not
possible to detect this.
Add socket option PACKET_FANOUT_FLAG_UNIQUEID. When specified the
supplied fanout group id must be set to 0, and the kernel chooses an id
that is not already in use. This is an ephemeral flag so that
other sockets can be added to this group using setsockopt, but NOT
specifying this flag. The current getsockopt(..., PACKET_FANOUT, ...)
can be used to retrieve the new group id.
We assume that there are not a lot of fanout groups and that this is not
a high frequency call.
The method assigns ids starting at zero and increases until it finds an
unused id. It keeps track of the last assigned id, and uses it as a
starting point to find new ids.
Signed-off-by: Mike Maloney <maloney@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Quadros [Fri, 21 Apr 2017 13:15:38 +0000 (16:15 +0300)]
mdio_bus: Issue GPIO RESET to PHYs.
Some boards [1] leave the PHYs at an invalid state
during system power-up or reset thus causing unreliability
issues with the PHY which manifests as PHY not being detected
or link not functional. To fix this, these PHYs need to be RESET
via a GPIO connected to the PHY's RESET pin.
Some boards have a single GPIO controlling the PHY RESET pin of all
PHYs on the bus whereas some others have separate GPIOs controlling
individual PHY RESETs.
In both cases, the RESET de-assertion cannot be done in the PHY driver
as the PHY will not probe till its reset is de-asserted.
So do the RESET de-assertion in the MDIO bus driver.
[1] - am572x-idk, am571x-idk, a437x-idk
Signed-off-by: Roger Quadros <rogerq@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 16:35:57 +0000 (12:35 -0400)]
Merge branch 'VSOCK-add-vsockmon'
Stefan Hajnoczi says:
====================
VSOCK: vsockmon virtual device to monitor AF_VSOCK sockets.
v5:
* Change vsock_deliver_tap() API to avoid unnecessary skb creation
[Jorgen]
* Fix skb leak when no taps are registered [Jorgen]
* s/cpu_to_le16(pkt->hdr.op)/le16_to_cpu(pkt->hdr.op)/ [Michael]
* Add af_vsock_tap.c and vsockmon.[ch] to MAINTAINERS
* checkpatch.pl and sparse fixes
v4:
* Add explicit reserved padding field to struct af_vsockmon_hdr and
drop __attribute__((packed)) [Michael, DaveM]
* Call synchronize_net() before module_put() [Michael]
v3:
* Hook virtio_transport.c (guest driver), not just drivers/vhost/vsock.c (host
driver)
* Fix DEFAULT_MTU macro definition [Zhu Yanjun]
* Rename af_vsockmon_hdr->t field ->transport for clarity
* Update .ndo_get_stats64() return type since it has changed
* Include missing <linux/module.h> header in af_vsock_tap.c
This is a continuation of Gerard Garcia's work on the vsockmon packet capture
interface for AF_VSOCK. Packet capture is an essential feature for network
communication. Gerard began addressing this feature gap in his Google Summer
of Code 2016 project. I have cleaned up, rebased, and retested the v2 series
he posted previously.
The design follows the nlmon packet capture interface closely. This is because
vsock has the same problem as netlink: there is no netdev on which packets can
be captured. The nlmon driver is a synthetic netdev purely for the purpose of
enabling packet capture. We follow the same approach here with vsockmon.
See include/uapi/linux/vsockmon.h in this series for details on the packet
layout.
The virtio drivers deal with struct virtio_vsock_pkt. Add
virtio_transport_deliver_tap_pkt(pkt) for handing packets to the
vsockmon device.
We call virtio_transport_deliver_tap_pkt(pkt) from
net/vmw_vsock/virtio_transport.c and drivers/vhost/vsock.c instead of
common code. This is because the drivers may drop packets before
handing them to common code - we still want to capture them.
Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add vsockmon virtual network device that receives packets from the vsock
transports and exposes them to user space.
Based on the nlmon device.
Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add tap functions that can be used by the vsock transports to
deliver packets to vsockmon virtual network devices.
Signed-off-by: Gerard Garcia <ggarcia@deic.uab.cat> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The series has set of enhancements for dcbx/dcbnl implementation of
qed/qede drivers.
- Patches (1) & (3) capture the sematic and debug changes.
- Patch (2) adds the driver support for populating RoCEv2 dcb data.
- Patch (4) adds the required support for reading/configuring the
IEEE selection field (SF).
- Patch (5) adds the support for configuring the static dcbx mode.
Please consider applying this to 'net-next' branch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch adds driver support for static/local dcbx mode. In this mode
adapter brings up the dcbx link with locally configured parameters
instead of performing the dcbx negotiation with the peer. The feature
is useful when peer device/switch doesn't support dcbx.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In the older firmware there was no distinction between RoCE and RoCEv2
whereas the newer firmware (8.15.3.0) allows us to configure each
independently. Driver need to populate the RoCEv2 data in its specific
structure.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Apr 2017 15:53:53 +0000 (11:53 -0400)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2017-04-20
This series contains updates to e1000, e1000e, igb/vf and ixgb.
Tobias Klauser cleans up e1000, ixgb and igbvf from having a local
function or structure for netdev stats.
Bernd Faust fixes an issue for 82579 devices, where the clock frequency
was being incorrectly set for these devices. These devices only support
96MHz, so make sure they are set to use only that.
Yury Kylulin extends the work Jake and Alex did for ixgbe in MAC filter
handling into the igb driver.
Kim Tatt Chuah enables igb to wake up by packet and to read the necessary
Wake Up Status (WUS) and Wake Up Packet Memory (WUPM) registers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bpf, doc: update list of architectures that do eBPF JIT
update the list and remove 'in the future' statement,
since all still alive 64-bit architectures now do eBPF JIT.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix sparse warnings in recent ipoib support.
The RDMA functions are not used yet, hide behind #ifdef.
Based on comment, they will eventually be local so make static.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Sun, 25 Sep 2016 11:27:17 +0000 (14:27 +0300)]
net/mlx5: E-Switch, Add control for encapsulation
Implement the devlink e-switch encapsulation control set and get
callbacks. Apply the value set by the user on the switchdev offloads
mode when creating the fast FDB table where offloaded rules will be set.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Roi Dayan [Sun, 25 Sep 2016 10:52:44 +0000 (13:52 +0300)]
net/devlink: Add E-Switch encapsulation control
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
The actual encap/decap is carried out (along with the matching and other actions)
per offloaded e-switch rules, e.g as done when offloading the TC tunnel key action.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ensure that HCI_UART_PROTO_READY is cleared before close(hu) is
called which closes the Data Link protocol layer.
Therefore, add the missing bit clear of HCI_UART_PROTO_READY to
hci_uart_init_work() so that the flag is cleared when
hci_register_dev fails.
Without the fix, the functions of the Data Link protocol layer could
potentially be accessed after that layer has been closed. This
could lead to a crash as memory would have been freed in that layer.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Dean Jenkins [Thu, 20 Apr 2017 17:06:39 +0000 (18:06 +0100)]
Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
If hci_register_dev() returns an error in hci_uart_init_work()
then the HCI_UART_REGISTERED bit gets erroneously set due to
a missing return statement. Therefore, add the missing return
statement.
The consequence of the missing return is that the HCI UART is not
registered but HCI_UART_REGISTERED is set which allows the code
to think that hu->hdev is safe to access but hu->hdev has been
freed so could lead to a crash.
Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
A device driver must not select the COMMON_CLK subsystem, as that conflicts
with platforms that provide a legacy implementation of the clk API:
drivers/clk/clk.o: In function `clk_enable':
clk.c:(.text.clk_enable+0x0): multiple definition of `clk_enable'
arch/arm/mach-sa1100/clock.o:clock.c:(.text.clk_enable+0x0): first defined here
drivers/clk/clk.o: In function `clk_round_rate':
clk.c:(.text.clk_round_rate+0x0): multiple definition of `clk_round_rate'
arch/arm/mach-sa1100/clock.o:clock.c:(.text.clk_round_rate+0x0): first defined here
drivers/clk/clk.o: In function `clk_get_parent':
clk.c:(.text.clk_get_parent+0x0): multiple definition of `clk_get_parent'
arch/arm/mach-sa1100/clock.o:clock.c:(.text.clk_get_parent+0x0): first defined here
drivers/clk/clk.o: In function `clk_get_rate':
clk.c:(.text.clk_get_rate+0x0): multiple definition of `clk_get_rate'
This changes the 'select' into 'depends on', as all other similar drivers do.
Bluetooth: try to improve CONFIG_SERIAL_DEV_BUS dependency
With CONFIG_SERIAL_DEV_BUS=m, the hci_serdev.o file does not actually
get built into hci_uart.o as the Makefile doesn't pick it up, leading
to a link error with anything referring to it:
ERROR: "hci_uart_register_device" [drivers/bluetooth/hci_nokia.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
Changing this in the Makefile would cause another problem when
hci_uart itself is built-in and cannot reference symbols from the
serdev module.
This tries to address both problems by introducing a new hidden
Kconfig symbol that controls both the compilation of hci_serdev.o
and whether the Nokia driver can be selected. This seems to address
the problem for me, though there might be a better way to do it.
Bluetooth: hci_ll: Fix NULL pointer deref on FW upload failure
Avoid NULL pointer dereference occurring due to freeing
skb containing an error pointer. It can easily be triggered
by using the driver with broken uart (i.e. due to misconfigured
pinmuxing).
2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu.
3) Fix out of bounds access in ipv6 segment routing code, from David
Lebrun.
4) Don't write into the header of cloned SKBs in smsc95xx driver, from
James Hughes.
5) Several other drivers have this bug too, fix them. From Eric
Dumazet.
6) Fix access to uninitialized data in TC action cookie code, from
Wolfgang Bumiller.
7) Fix double free in IPV6 segment routing, again from David Lebrun.
8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
9) Fix use after free in qrtr code, from Dan Carpenter.
10) Don't double-destroy devices in ip6mr code, from Nikolay
Aleksandrov.
11) Don't pass out-of-range TX queue indices into drivers, from Tushar
Dave.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
netpoll: Check for skb->queue_mapping
ip6mr: fix notification device destruction
bpf, doc: update bpf maintainers entry
net: qrtr: potential use after free in qrtr_sendmsg()
bpf: Fix values type used in test_maps
net: ipv6: RTF_PCPU should not be settable from userspace
gso: Validate assumption of frag_list segementation
kaweth: use skb_cow_head() to deal with cloned skbs
ch9200: use skb_cow_head() to deal with cloned skbs
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
ipv6: sr: fix double free of skb after handling invalid SRH
MAINTAINERS: Add "B:" field for networking.
net sched actions: allocate act cookie early
qed: Fix issue in populating the PFC config paramters.
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed: Fix sending an invalid PFC error mask to MFW.
qed: Fix possible error in populating max_tc field.
...
Tushar Dave [Thu, 20 Apr 2017 22:57:31 +0000 (15:57 -0700)]
netpoll: Check for skb->queue_mapping
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
With CONFIG_I2C=m and NET_DSA_SMSC_LAN9303=y, we run into a link error:
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_read':
regmap-i2c.c:(.text.regmap_smbus_byte_reg_read+0x18): undefined reference to `i2c_smbus_read_byte_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_write':
regmap-i2c.c:(.text.regmap_smbus_byte_reg_write+0x18): undefined reference to `i2c_smbus_write_byte_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_reg_read':
regmap-i2c.c:(.text.regmap_smbus_word_reg_read+0x18): undefined reference to `i2c_smbus_read_word_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_read_swapped':
regmap-i2c.c:(.text.regmap_smbus_word_read_swapped+0x18): undefined reference to `i2c_smbus_read_word_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_write_swapped':
This adds a Kconfig dependency to avoid the broken configuration.
Fixes: be4e119f9914 ("net: dsa: LAN9303: add I2C managed mode support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 19:29:40 +0000 (15:29 -0400)]
Merge tag 'nfc-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.12 pull request
This is the NFC pull request for 4.12. We have:
- Improvements for the pn533 command queue handling and device
registration order.
- Removal of platform data for the pn544 and st21nfca drivers.
- Additional device tree options to support more trf7970a hardware options.
- Support for Sony's RC-S380P through the port100 driver.
- Removal of the obsolte nfcwilink driver.
- Headers inclusion cleanups (miscdevice.h, unaligned.h) for many drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix wq initialization for links created via netlink
Earlier patch 4493b81bea ("bonding: initialize work-queues during
creation of bond") moved the work-queue initialization from bond_open()
to bond_create(). However this caused the link those are created using
netlink 'create bond option' (ip link add bondX type bond); create the
new trunk without initializing work-queues. Prior to the above mentioned
change, ndo_open was in both paths and things worked correctly. The
consequence is visible in the report shared by Joe Stringer -
I've noticed that this patch breaks bonding within namespaces if
you're not careful to perform device cleanup correctly.
Here's my repro script, you can run on any net-next with this patch
and you'll start seeing some weird behaviour:
ip netns add foo
ip li add veth0 type veth peer name veth0+ netns foo
ip li add veth1 type veth peer name veth1+ netns foo
ip netns exec foo ip li add bond0 type bond
ip netns exec foo ip li set dev veth0+ master bond0
ip netns exec foo ip li set dev veth1+ master bond0
ip netns exec foo ip addr add dev bond0 192.168.0.1/24
ip netns exec foo ip li set dev bond0 up
ip li del dev veth0
ip li del dev veth1
The second to last command segfaults, last command hangs. rtnl is now
permanently locked. It's not a problem if you take bond0 down before
deleting veths, or delete bond0 before deleting veths. If you delete
either end of the veth pair as per above, either inside or outside the
namespace, it hits this problem.
Here's some kernel logs:
[ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
[ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
[ 1281.193863] bond0: Releasing backup interface veth0+
[ 1281.193866] bond0: the permanent HWaddr of veth0+ -
16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
veth0+ to a different address to avoid conflicts
[ 1281.193867] ------------[ cut here ]------------
[ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
__queue_delayed_work+0x13f/0x150
[ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
hid mptspi mptscsih e1000 mptbase ahci libahci
[ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G W
4.10.0-bisect-bond-v0.14 #37
[ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 1281.193906] Call Trace:
[ 1281.193912] dump_stack+0x63/0x89
[ 1281.193915] __warn+0xd1/0xf0
[ 1281.193917] warn_slowpath_null+0x1d/0x20
[ 1281.193918] __queue_delayed_work+0x13f/0x150
[ 1281.193920] queue_delayed_work_on+0x27/0x40
[ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding]
[ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30
[ 1281.193935] __bond_release_one+0x489/0x510 [bonding]
[ 1281.193939] ? addrconf_notify+0x1b7/0xab0
[ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding]
[ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole]
[ 1281.193947] notifier_call_chain+0x49/0x70
[ 1281.193948] raw_notifier_call_chain+0x16/0x20
[ 1281.193950] call_netdevice_notifiers_info+0x35/0x60
[ 1281.193951] rollback_registered_many+0x23b/0x3e0
[ 1281.193953] unregister_netdevice_many+0x24/0xd0
[ 1281.193955] rtnl_delete_link+0x3c/0x50
[ 1281.193956] rtnl_dellink+0x8d/0x1b0
[ 1281.193960] rtnetlink_rcv_msg+0x95/0x220
[ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280
[ 1281.193964] ? __netlink_lookup+0xf1/0x110
[ 1281.193966] ? rtnl_newlink+0x830/0x830
[ 1281.193967] netlink_rcv_skb+0xa7/0xc0
[ 1281.193969] rtnetlink_rcv+0x28/0x30
[ 1281.193970] netlink_unicast+0x15b/0x210
[ 1281.193971] netlink_sendmsg+0x319/0x390
[ 1281.193974] sock_sendmsg+0x38/0x50
[ 1281.193975] ___sys_sendmsg+0x25c/0x270
[ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0
[ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0
[ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0
[ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170
[ 1281.193987] __sys_sendmsg+0x45/0x80
[ 1281.193989] SyS_sendmsg+0x12/0x20
[ 1281.193991] do_syscall_64+0x6e/0x180
[ 1281.193993] entry_SYSCALL64_slow_path+0x25/0x25
[ 1281.193995] RIP: 0033:0x7f6ec122f5a0
[ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
[ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
[ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
[ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
[ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
[ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---
Fixes: 4493b81bea ("bonding: initialize work-queues during creation of bond") Reported-by: Joe Stringer <joe@ovn.org> Tested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Apr 2017 15:27:58 +0000 (17:27 +0200)]
bpf, doc: update bpf maintainers entry
Add various related files that have been missing under
BPF entry covering essential parts of its infrastructure
and also add myself as co-maintainer.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently driver use phy_start_aneg() in arc_emac_open() to bring
up PHY. But phy_start() function is more appropriate for this purposes.
Besides that it call phy_start_aneg() as part of PHY startup sequence
it also can correctly bring up PHY from error and suspended states.
So the patch replace phy_start_aneg() to phy_start().
Also the patch add call to phy_stop() to arc_emac_stop() to allow
the PHY device to be fully suspended when the interface is unused.
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 20 Apr 2017 10:21:30 +0000 (13:21 +0300)]
net: qrtr: potential use after free in qrtr_sendmsg()
If skb_pad() fails then it frees the skb so we should check for errors.
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Thu, 20 Apr 2017 19:20:16 +0000 (15:20 -0400)]
bpf: Fix values type used in test_maps
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
This adds the basic infrastructure for IPsec hardware
offloading, it creates a configuration API and adjusts
the packet path.
1) Add the needed netdev features to configure IPsec offloads.
2) Add the IPsec hardware offloading API.
3) Prepare the ESP packet path for hardware offloading.
4) Add gso handlers for esp4 and esp6, this implements
the software fallback for GSO packets.
5) Add xfrm replay handler functions for offloading.
6) Change ESP to use a synchronous crypto algorithm on
offloading, we don't have the option for asynchronous
returns when we handle IPsec at layer2.
7) Add a xfrm validate function to validate_xmit_skb. This
implements the software fallback for non GSO packets.
8) Set the inner_network and inner_transport members of
the SKB, as well as encapsulation, to reflect the actual
positions of these headers, and removes them only once
encryption is done on the payload.
From Ilan Tayari.
9) Prepare the ESP GRO codepath for hardware offloading.
10) Fix incorrect null pointer check in esp6.
From Colin Ian King.
11) Fix for the GSO software fallback path to detect the
fallback correctly.
From Ilan Tayari.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 18:13:11 +0000 (14:13 -0400)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-04-19
This series contains updates to i40e and i40evf only, most notable being
the addition of trace points for BPF programs.
Tobias Klauser updates i40evf to use net_device stats struct instead
of a local private copy.
Preethi updates the VF driver to not enable receive checksum offload by
default for tunneled packets.
Alex fixes an issue he introduced when he converted the code over to
using the length field to determine if a descriptor was done or not.
Mitch adds the ability to dump additional information on the VFs, which
is not available through 'ip link show' using debugfs.
Scott adds trace points to the drivers so that BPF programs can be
attached for feature testing and verification.
Jingjing adds admin queue functions for Pipeline Personalization Profile
commands.
Jake does most of the heavy lifting in this series, starting with the
a reduction in the scope of the RTNL lock being held while resetting VFs
to allow multiple PFs to reset in a timely manner. Factored out the
direct queue modification so that we are able to re-use the code.
Reduced the wait time for admin queue commands to complete, since we were
waiting a minimum of a millisecond, when in practice the admin queue
command is processed often much faster. Cleaned up code (flag) we never
use. Make the code to resetting all the VFs optimized for parallel
computing instead of the current way is a serialized fashion, to help
reduce the time it takes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The NAPI data structure is embedded in the netvsc_device structure
and is freed when device is closed. There is still a reference
(in NAPI list) to this which causes a crash in netif_napi_del
when device is removed. Fix by managing NAPI instances correctly.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 19 Apr 2017 21:21:22 +0000 (14:21 -0700)]
net_sched: remove useless NULL to tp->root
There is no need to NULL tp->root in ->destroy(), since tp is
going to be freed very soon, and existing readers are still
safe to read them.
For cls_route, we always init its tp->root, so it can't be NULL,
we can drop more useless code.
Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 19 Apr 2017 21:21:21 +0000 (14:21 -0700)]
net_sched: move the empty tp check from ->destroy() to ->delete()
We could have a race condition where in ->classify() path we
dereference tp->root and meanwhile a parallel ->destroy() makes it
a NULL. Daniel cured this bug in commit d936377414fa
("net, sched: respect rcu grace period on cls destruction").
This happens when ->destroy() is called for deleting a filter to
check if we are the last one in tp, this tp is still linked and
visible at that time. The root cause of this problem is the semantic
of ->destroy(), it does two things (for non-force case):
1) check if tp is empty
2) if tp is empty we could really destroy it
and its caller, if cares, needs to check its return value to see if it
is really destroyed. Therefore we can't unlink tp unless we know it is
empty.
As suggested by Daniel, we could actually move the test logic to ->delete()
so that we can safely unlink tp after ->delete() tells us the last one is
just deleted and before ->destroy().
Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Roi Dayan <roid@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 19 Apr 2017 21:01:17 +0000 (23:01 +0200)]
bpf: add napi_id read access to __sk_buff
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
K. Y. Srinivasan [Wed, 19 Apr 2017 20:53:49 +0000 (13:53 -0700)]
netvsc: Deal with rescinded channels correctly
We will not be able to send packets over a channel that has been
rescinded. Make necessary adjustments so we can properly cleanup
even when the channel is rescinded. This issue can be trigerred
in the NIC hot-remove path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 17:33:55 +0000 (13:33 -0400)]
Merge branch 'ibmvnic-updates-and-bug-fixes'
Nathan Fontenot says:
====================
ibmvnic: Updates and bug fixes
This set of patches is a series of updates to remove some unneeded
and unused code in the driver as well as bug fixes for the
ibmvnic driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:45:10 +0000 (13:45 -0400)]
ibmvnic: Disable irq prior to close
Add some code to call disable_irq on all the vnic interface's irqs.
This fixes a crash observed when closing an active interface, as
seen in the oops below when we try to access a buffer in the interrupt
handler which we've already freed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We should not be releasing the crq's when calling close for the
adapter, these need to remain open to facilitate operations such
as updating the mac address. The crq's should be released in the
adpaters remove routine.
Additionally, we need to call release_reources from remove. This
corrects the scenario of trying to remove an adapter that has only
been probed.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The inflight list used to track memory that is allocated for crq that are
inflight is not needed. The one piece of the inflight list that does need
to be cleaned at module exit is the error buffer list which is already
attached to the adapter struct.
This patch removes the inflight list and moves checking the error buffer
list to ibmvnic_remove.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:53 +0000 (13:44 -0400)]
ibmvnic: Do not disable IRQ after scheduling tasklet
Since the primary CRQ is only used for service functions and
not in the performance path, simplify the code a bit and avoid
disabling the IRQ.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:47 +0000 (13:44 -0400)]
ibmvnic: Fixup atomic API usage
Replace a couple of modifications of an atomic followed
by a read of the atomic, which is no longer atomic, to
use atomic_XX_return variants to avoid race conditions.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:41 +0000 (13:44 -0400)]
ibmvnic: Unmap longer term buffer before free
Make sure we unregister long term buffers from the adapter
prior to DMA unmapping it and freeing the buffer. Failure
to do so could result in a DMA to a now invalid address.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic: Fix ibmvnic_change_mac_addr struct format
The ibmvnic_change_mac_addr struct alignment was not matching the defined
format in PAPR+, it had the reserved and return code fields swapped. As a
consequence, the CHANGE_MAC_ADDR_RSP commands were being improperly handled
and executed even when the operation wasn't successfully completed by the
system firmware.
Also changing the endianness of the debug message to make it easier to
parse the CRQ content.
Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 19 Apr 2017 17:44:29 +0000 (13:44 -0400)]
ibmvnic: Report errors when failing to release sub-crqs
Add reporting of errors when releasing sub-crqs fails.
Signed-off-by: Thomas Falcon <tlfalcon@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Wed, 19 Apr 2017 18:26:07 +0000 (21:26 +0300)]
gso: Validate assumption of frag_list segementation
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gcc points out an useless assignment that was added during code refactoring:
drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'octnet_intrmod_callback':
drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:1315:59: error: parameter 'oct_dev' set but not used [-Werror=unused-but-set-parameter]
This is harmless but can clearly be remove to avoid the warning.
Fixes: 50c0add534d2 ("liquidio: refactor interrupt moderation code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:26 +0000 (09:59 -0700)]
kaweth: use skb_cow_head() to deal with cloned skbs
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:25 +0000 (09:59 -0700)]
ch9200: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:24 +0000 (09:59 -0700)]
lan78xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:23 +0000 (09:59 -0700)]
sr9700: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:22 +0000 (09:59 -0700)]
cx82310_eth: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:21 +0000 (09:59 -0700)]
smsc75xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning
Constants used for tuning are generally a bad idea, especially as hardware
changes over time. Replace the constant 2 jiffies with sysctl variable
netdev_budget_usecs to enable sysadmins to tune the softirq processing.
Also document the variable.
For example, a very fast machine might tune this to 1000 microseconds,
while my regression testing 486DX-25 needs it to be 4000 microseconds on
a nearly idle network to prevent time_squeeze from being incremented.
Version 2: changed jiffies to microseconds for predictable units.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 17:21:32 +0000 (13:21 -0400)]
Merge branch 'iptunnel-policy-based-routing'
Craig Gallek says:
====================
ip_tunnel: Allow policy-based routing through tunnels
iproute2 changes to follow. Example usage:
ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4
ip -detail link show gre-test
...
ip link set gre-test type gre fwmark 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_tunnel: Allow policy-based routing through tunnels
This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.
There is no concept of per-packet routing decisions through IPv4
tunnels, so this implementation does not need to work with
per-packet route lookups as the v6 implementation may
(with IP6_TNL_F_USE_ORIG_FWMARK).
Further, since the v4 tunnel ioctls share datastructures
(which can not be trivially modified) with the kernel's internal
tunnel configuration structures, the mark attribute must be stored
in the tunnel structure itself and passed as a parameter when
creating or changing tunnel attributes.
Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>