Daniel Borkmann [Mon, 11 Jan 2016 00:16:39 +0000 (01:16 +0100)]
bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key
After IPv6 support has recently been added to metadata dst and related
encaps, add support for populating/reading it from an eBPF program.
Commit d3aa45ce6b ("bpf: add helpers to access tunnel metadata") started
with initial IPv4-only support back then (due to IPv6 metadata support
not being available yet).
To stay compatible with older programs, we need to test for the passed
structure size. Also TOS and TTL support from the ip_tunnel_info key has
been added. Tested with vxlan devs in collect meta data mode with IPv4,
IPv6 and in compat mode over different network namespaces.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 11 Jan 2016 00:16:38 +0000 (01:16 +0100)]
bpf: export helper function flags and reject invalid ones
Export flags used by eBPF helper functions through UAPI, so they can be
used by programs (instead of them redefining all flags each time or just
using the hard-coded values). It also gives a better overview what flags
are used where and we can further get rid of the extra macros defined in
filter.c. Moreover, reject invalid flags.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Sat, 9 Jan 2016 01:35:36 +0000 (20:35 -0500)]
bonding: make mii_status sysfs node consistent
The spew in /proc/net/bonding/bond0 uses netif_carrier_ok() to determine
mii_status, while /sys/class/net/bond0/bonding/mii_status looks at
curr_active_slave, which doesn't actually seem to be set sometimes when
the bond actually is up. A mode 4 bond configured via ifcfg-foo files on a
Red Hat Enterprise Linux system, after boot, comes up clean and
functional, but the sysfs node shows mii_status of down, while proc shows
up. A simple enough fix here seems to be to use the same method for
determining up or down in both places, and I'd opt for the one that seems
to match reality.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Sat, 9 Jan 2016 23:07:11 +0000 (16:07 -0700)]
openvswitch: update kernel doc for struct vport
commit be4ace6e6b1b ("openvswitch: Move dev pointer into vport itself")
The commit above added @dev and moved @rcu to the bottom of struct
vport, but the change was not reflected in the kernel doc. So let's
update the kernel doc as well.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Sat, 9 Jan 2016 23:07:10 +0000 (16:07 -0700)]
openvswitch: fix struct geneve_port member name
commit 6b001e682e90 ("openvswitch: Use Geneve device.")
The commit above introduced 'port_no' as the name for the member of
struct geneve_port. The correct name should be 'dst_port' as described
in the kernel doc. Let's fix that member name and all the pertinent
instances so that both doc and code would be consistent.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Sat, 9 Jan 2016 23:07:09 +0000 (16:07 -0700)]
openvswitch: clean up unused function
commit 6b001e682e90 ("openvswitch: Use Geneve device.")
The commit above deleted the only call site of ovs_tunnel_route_lookup()
and now that function is not used any more. So let's delete the function
definition as well.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sat, 9 Jan 2016 21:19:39 +0000 (13:19 -0800)]
net: ti: cpmac: Fix build error due to missed API change
Commit 7f854420fbfe ("phy: Add API for {un}registering an mdio device to
a bus") introduces an API to access mii_bus structures, but missed to
update the TI cpamc driver. This results in the following error message.
drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_probe':
drivers/net/ethernet/ti/cpmac.c:1119:18: error:
'struct mii_bus' has no member named 'phy_map'
Fixes: 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sun, 10 Jan 2016 15:10:45 +0000 (07:10 -0800)]
net: tc35815: Drop unused variable
Commit e7f4dc3536a4 ("mdio: Move allocation of interrupts into core")
removes some code from tc_mii_init(), but does not remove a now unused
variable. This results in the following build warning.
drivers/net/ethernet/toshiba/tc35815.c: In function 'tc_mii_init':
drivers/net/ethernet/toshiba/tc35815.c:670:6: warning: unused variable 'i'
Fixes: e7f4dc3536a4 ("mdio: Move allocation of interrupts into core") Cc: Andrew Lunn <andrew@lunn.ch> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sun, 10 Jan 2016 15:10:44 +0000 (07:10 -0800)]
net: tc35815: Fix build error due to missed API change
Commit 7f854420fbfe ("phy: Add API for {un}registering an mdio device to
a bus") introduces an API to access mii_bus structures, but missed to
update the tc35815 driver. This results in the following error message.
drivers/net/ethernet/toshiba/tc35815.c: In function 'tc_mii_probe':
drivers/net/ethernet/toshiba/tc35815.c:617:18: error:
'struct mii_bus' has no member named 'phy_map'
drivers/net/ethernet/toshiba/tc35815.c:623:24: error:
'struct mii_bus' has no member named 'phy_map'
Instead of looping over the list of phy addresses to find a phy chip,
use phy_find_first(). While the intent of the original code was to return
an error if more than one phy was specified, this code path was never
executed because the loop aborted after finding the first phy. The
original code is therefore semantically identical to phy_find_first(),
thus it is simpler and more straightforward to use phy_find_first()
directly.
Fixes: 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Jan 2016 04:10:10 +0000 (23:10 -0500)]
Merge tag 'wireless-drivers-next-for-davem-2016-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
brcmfmac
* query features through firmware command
* ARP offload through inet notifier
* force probe to succeed for debugging purposes
* random mac support for scheduled scan
* support wowl upon net detect
iwlwifi
* bug fixes and improvements for firmware debug system
* advertise support for Rx A-MSDU in A-MPDU
* support -20.ucode
* fix WoWLAN for iwldvm
* preparations towards multiple Rx queues
* platform power improvements for GO mode when no clients are associated
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lubomir Rintel [Fri, 8 Jan 2016 12:47:23 +0000 (13:47 +0100)]
ipv6: always add flag an address that failed DAD with DADFAILED
The userspace needs to know why is the address being removed so that it can
perhaps obtain a new address.
Without the DADFAILED flag it's impossible to distinguish removal of a
temporary and tentative address due to DAD failure from other reasons (device
removed, manual address removal).
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fabio Estevam [Fri, 8 Jan 2016 12:19:16 +0000 (10:19 -0200)]
net: lpc_eth: Remove unused variables
Commit e7f4dc3536a400 ("mdio: Move allocation of interrupts into core")
introduced the following build warnings:
drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_mii_init':
drivers/net/ethernet/nxp/lpc_eth.c:865:1: warning: label 'err_out_1' defined but not used [-Wunused-label]
drivers/net/ethernet/nxp/lpc_eth.c:826:20: warning: unused variable 'i' [-Wunused-variable]
Remove the unused variables to fix them.
Reported-by: Olof's autobuilder <build@lixom.net> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Sudip Mukherjee [Fri, 8 Jan 2016 11:28:15 +0000 (16:58 +0530)]
bfin_mac: fix error path
While building blackfin defconfig we were getting a build warning:
warning: label 'out_err_irq_alloc' defined but not used.
Commit e7f4dc3536a4 ("mdio: Move allocation of interrupts into core")
removed the label out_err_mdiobus_register but then mistakenly jumped to
out_err_alloc. But it was actually supposed to jump to out_err_irq_alloc.
Fixes: e7f4dc3536a4 ("mdio: Move allocation of interrupts into core") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sudip Mukherjee [Fri, 8 Jan 2016 11:28:14 +0000 (16:58 +0530)]
phy: fix blackfin build failure
The build of blackfin defconfig is failing with the error:
error: 'struct mii_bus' has no member named 'phy_map'
A new API mdiobus_get_phy() was introduced and phy_map was removed but
it was not changed here.
Fixes: 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus.") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: Fixes static checker warning in mps_tcam_show()
The commit 115b56af88b5 ("cxgb4: Update mps_tcam output to include T6
fields") from Dec 23, 2015, leads to the following static checker
warning:
drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c:1735
mps_tcam_show()
warn: we tested 'lookup_type' before and it was 'true'
Fixing it.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Jan 2016 03:21:32 +0000 (22:21 -0500)]
Merge branch 'emac-RK3036'
Xing Zheng says:
====================
Add support emac for the RK3036 SoC platform
We have supported the emac for RK3066/RK3188, but the RK3036 have
some configuration different with them. We should let the driver of
emac_rockchip compatible with other Rockchip SoCs.
Changes in v2:
- Separate DTS from patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xing Zheng [Fri, 8 Jan 2016 01:35:01 +0000 (09:35 +0800)]
net: ethernet: arc: Keep emac compatibility for more Rockchip SoCs
On the RK3066/RK3188, there was fixed GRF offset configuration to set emac
and fixed DIV2 mac TX/RX clock. So, we need to easily set and fit to other
SoCs (RK3036) which maybe have different GRF offset, and need adjust mac
TX/RX clock.
Signed-off-by: Xing Zheng <zhengxing@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xing Zheng [Fri, 8 Jan 2016 01:35:00 +0000 (09:35 +0800)]
net: ethernet: arc: Probe emac after set RMII clock
After enter arc_emac_probe, emac will get_phy_id, phy_poll_reset and
other connecting PHY via mdiobus_read, so we need to set correct
ref clock rate for emac before probe emac.
Signed-off-by: Xing Zheng <zhengxing@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rob Swindell [Fri, 8 Jan 2016 00:56:58 +0000 (19:56 -0500)]
bnxt_en: Reset embedded processor after applying firmware upgrade
Use HWRM_FW_RESET command to request a self-reset of the embedded
processor(s) after successfully applying a firmware update. For boot
processor, the self-reset is currently deferred until the next PCIe reset.
Signed-off-by: Rob Swindell <swindell@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Fri, 8 Jan 2016 00:56:57 +0000 (19:56 -0500)]
bnxt_en: Zero pad firmware messages to 128 bytes.
For future compatibility, zero pad all messages that the driver sends
to the firmware to 128 bytes. If these messages are extended in the
future with new byte enables, zero padding these messages now will
guarantee future compatibility.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 7 Jan 2016 21:29:47 +0000 (22:29 +0100)]
net, sched: add clsact qdisc
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).
Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.
Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).
The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.
Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.
I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.
Adding filters (deleting, etc works analogous by specifying ingress/egress):
# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.
Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
[1] http://patchwork.ozlabs.org/patch/512949/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 11 Jan 2016 00:13:21 +0000 (01:13 +0100)]
ethernet: amd: au1000: Remove pointless warning
The warning about being able to read any MDIO device, not just the
attached ethernet devices PHY applies to all MDIO drivers. So remove
it. This also removes a reference to a member in phy_device which has
moved.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sun, 10 Jan 2016 20:04:32 +0000 (12:04 -0800)]
net: ethernet: faraday: Use phy_find_first() instead of open coding it
Use phy_find_first() to find the first phy device instead of
open coding it.
Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Guenter Roeck [Sun, 10 Jan 2016 20:03:16 +0000 (12:03 -0800)]
net: ethernet: broadcom: Fix build errors
Commit 7f854420fbfe ("phy: Add API for {un}registering an mdio device to
a bus") introduces an API to access mii_bus structures, but missed to
update the sb1250 driver. This results in the following build error.
drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
drivers/net/ethernet/broadcom/sb1250-mac.c:2360:24: error:
'struct mii_bus' has no member named 'phy_map'
Use phy_find_first() instead of open coding it.
Commit 2220943a21e2 ("phy: Centralise print about attached phy") introduces
the following build error.
drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
drivers/net/ethernet/broadcom/sb1250-mac.c:2383:20: error: 'phydev' undeclared
Fixes: 7f854420fbfe ("phy: Add API for {un}registering an mdio device to a bus") Fixes: 2220943a21e2 ("phy: Centralise print about attached phy") Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Romain Perier [Thu, 7 Jan 2016 20:13:28 +0000 (21:13 +0100)]
stmmac: Don't exit mdio registration when mdio subnode is not found in the DTS
Originally, most of the platforms using this driver did not define an mdio subnode
in the devicetree. Commit e34d65 ("stmmac: create of compatible mdio bus for stmmac driver")
introduced a backward compatibily issue by using of_mdiobus_register explicitly
with an mdio subnode. This patch fixes the issue by calling the function
mdiobus_register, when mdio subnode is not found. The driver is now compatible
with both modes.
Fixes: e34d65696d2e ("stmmac: create of compatible mdio bus for stmmac driver") Signed-off-by: Romain Perier <romain.perier@gmail.com> Tested-by: Phil Reid <preid@electromag.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 10 Jan 2016 22:54:28 +0000 (17:54 -0500)]
Merge branch 'bpf-next'
Daniel Borkmann says:
====================
BPF update
Fixes a csum issue on ingress. As mentioned previously, net-next
seems just fine imho. Later on, will follow up with couple of
replacements like ovs_skb_postpush_rcsum() etc.
Thanks!
v1 -> v2:
- Added patch 1 with helper
- Implemented Hannes' idea to just use csum_partial, thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 7 Jan 2016 14:50:23 +0000 (15:50 +0100)]
bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions
Add a small helper skb_postpush_rcsum() and fix up redirect locations
that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
a proper csum that covers also Ethernet header, f.e. since 2c26d34bbcc0
("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
also do skb_postpull_rcsum() after pulling Ethernet header off via
eth_type_trans().
When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
I can trigger the following csum issue with an IPv6 setup:
What happens is that on ingress, we push Ethernet header back in, either
from cls_bpf or right before skb_do_redirect(), but without updating csum.
The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
helper for the dev_forward_skb() case to correct the csum diff again.
Thanks to Hannes Frederic Sowa for the csum_partial() idea!
Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 7 Jan 2016 14:50:22 +0000 (15:50 +0100)]
net, sched: add skb_at_tc_ingress helper
Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
can hide the ugly ifdef.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch series enables the tcp keepalive mechanism
to be configured per net namespace. This is especially useful
if you have multiple containers hosted on one node and one of
them is under DoS- in such situations one thing which could
be done is to configure the tcp keepalive settings such that
connections for that particular container are being reset
faster.
Another scenario where not being able to control those knob
comes per container is problematic is occurs the value of
net.netfilter.nf_conntrack_tcp_timeout_established is set
below the keepalive interval, in such situations the server won't
send an RST packet resulting in applications not trying to
reconnect and stale connection waiting. Changing the global
keepalive value is a possible solution but it might interfere
with other containers.
The three patches gradually convert each of the affected knobs
to be per netns. I thought it would be easier for review than
put everything in one patch. If people deem it more appropriate
to squash everything in one patch (maybe after review) I'd
be more than happy to do it.
The patches have been compile-tested on 4.4 and functionally
tested on 3.12 and they work as expected.
These are based off 4.4-rc8
====================
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Borisov [Thu, 7 Jan 2016 14:38:43 +0000 (16:38 +0200)]
ipv4: Namespaceify tcp_keepalive_time sysctl knob
Different net namespaces might have different requirements as to
the keepalive time of tcp sockets. This might be required in cases
where different firewall rules are in place which require tcp
timeout sockets to be increased/decreased independently of the host.
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset add Linux hardware reflection for L2 multicast offload and add
MC support in mlxsw. For every bridge MDB entry insertion, either by IGMP
snooping or by static insertion/removal, a switchdev ops is been called.
In mlxsw, a new multicast group (MID) is been created and ports are assigned.
When all ports are removed, the multicast group is been deleted.
---
v1->v2:
- GFP_ATOMIC->GFP_KERNEL change in patch 7/8
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:29 +0000 (21:06 +0100)]
switchdev: Adding IGMP snooping documentation
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:28 +0000 (21:06 +0100)]
mlxsw: Adding layer 2 multicast support
Add SWITCHDEV_OBJ_ID_PORT_MDB switchdev ops support. On first MDB insertion
creates a new multicast group (MID) and add members port to the MID. Also
add new MDB entry for the flooding-domain (fid-vid) and link the MDB entry
to the newly constructed MC group.
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:27 +0000 (21:06 +0100)]
mlxsw: Adding VID to FID translatation
Adding a generic function that translate VID to FID.
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:26 +0000 (21:06 +0100)]
mlxsw: Changing the maximum number of multicast group to a define
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:25 +0000 (21:06 +0100)]
mlxsw: reg: Adding SMID register
Adding back SMID register definition and packing. For each MC group a new
SMID entry will be generated.
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:24 +0000 (21:06 +0100)]
mlxsw: reg: Add definition of multicast record for SFD register
Multicast-related records have specific format in SFD register.
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:23 +0000 (21:06 +0100)]
bridge: Reflect MDB entries to hardware
Offload MDB changes per port to hardware
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Sun, 10 Jan 2016 20:06:22 +0000 (21:06 +0100)]
switchdev: Adding MDB entry offload
Define HW multicast entry: MAC and VID.
Using a MAC address simplifies support for both IPV4 and IPv6.
Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 10 Jan 2016 02:48:36 +0000 (21:48 -0500)]
Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
Included changes:
- increase internal module version
- increase BLA wait periods to 6
- purge BLA backbone table when it is disabled
- make sure post function is invoked only if sysfs value is changed
- simplify code by removing useless NULL checks
- various corrections to existing kerneldoc
- minor cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 7 Jan 2016 10:50:30 +0000 (11:50 +0100)]
mlxsw: spectrum: remove FDB entry in case we get unknown object notification
It may happen that we get notification for FDB entry for object (port,
lag, vport), which does not exist. Currently we ignore that, which only
causes this being re-sent in next notification. The entry will never
disappear. So get rid of it by simply removing it using SFD register.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 7 Jan 2016 10:50:29 +0000 (11:50 +0100)]
mlxsw: spectrum: pass local_port to mlxsw_sp_port_fdb_uc_op
Do not pass struct mlxsw_sp_port to mlxsw_sp_port_fdb_uc_op and rather
just pass local_port. This is needed in case this is called from SFN
process function and mlxsw_sp_port is not existent for particular
local_port.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sun, 6 Sep 2015 19:38:48 +0000 (21:38 +0200)]
batman-adv: Add kerneldoc for batadv_neigh_node::refcount
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Sun, 6 Sep 2015 19:38:47 +0000 (21:38 +0200)]
batman-adv: Remove kerneldoc for missing struct members
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Sun, 6 Sep 2015 19:38:46 +0000 (21:38 +0200)]
batman-adv: Fix kerneldoc member names in for main structs
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Sun, 6 Sep 2015 19:38:45 +0000 (21:38 +0200)]
batman-adv: Fix kernel-doc parsing of main structs
kernel-doc is not able to skip an #ifdef between the kernel documentation
block and the start of the struct. Moving the #ifdef before the kernel doc
block avoids this problem
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Wed, 28 Oct 2015 09:58:00 +0000 (10:58 +0100)]
batman-adv: Change ifconfig examples to iproute2
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Markus Elfring [Tue, 3 Nov 2015 18:20:34 +0000 (19:20 +0100)]
batman-adv: Split a condition check
Let us split a check for a condition at the beginning of the
batadv_is_ap_isolated() function so that a direct return can be performed
in this function if the variable "vlan" contained a null pointer.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Markus Elfring [Tue, 3 Nov 2015 18:20:34 +0000 (19:20 +0100)]
batman-adv: Delete an unnecessary check before the function call "batadv_softif_vlan_free_ref"
The batadv_softif_vlan_free_ref() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Markus Elfring [Sun, 15 Nov 2015 08:00:42 +0000 (09:00 +0100)]
batman-adv: Less checks in batadv_tvlv_unicast_send()
* Let us return directly if a call of the batadv_orig_hash_find() function
returned a null pointer.
* Omit the initialisation for the variable "skb" at the beginning.
* Replace an assignment by a call of the kfree_skb() function
and delete the affected variable "ret" then.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Markus Elfring [Sun, 15 Nov 2015 07:04:43 +0000 (08:04 +0100)]
batman-adv: Delete unnecessary checks before the function call "kfree_skb"
The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Sun, 13 Sep 2015 07:44:45 +0000 (09:44 +0200)]
batman-adv: Add function to convert string to batadv throughput
The code to convert the throughput information from a string to the
batman-adv internal (100Kibit/s) representation is duplicated in
batadv_parse_gw_bandwidth. Move this functionality to its own function
batadv_parse_throughput to reduce the code complexity.
Signed-off-by: Sven Eckelmann <sven@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Simon Wunderlich [Tue, 17 Nov 2015 13:11:26 +0000 (14:11 +0100)]
batman-adv: only call post function if something changed
Currently, the post function is also called on errors or if there were
no changes, which is redundant for the functions currently using these
facilities.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
If networks take a long time to come up, e.g. due to lossy links, then
the bridge loop avoidance wait time to suppress broadcasts may not wait
long enough and detect a backbone before the mesh is brought up.
Increasing the wait period further to 60 seconds makes this scenario
less likely.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
batman-adv: purge bridge loop avoidance when its disabled
When bridge loop avoidance is disabled through sysfs, the internal
datastructures are not disabled, but only BLA operations are disabled.
To be sure that they are removed, purge the data immediately. That is
especially useful if a firmwares network state is changed, and the BLA
wait periods should restart on the new network.
Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Sven Eckelmann [Tue, 3 Nov 2015 18:20:34 +0000 (19:20 +0100)]
batman-adv: Fix lockdep annotation of batadv_tlv_container_remove
The function handles tlv containers and not tlv handlers. Thus the
lockdep_assert_held has to check for the container_list lock.
Fixes: 2c72d655b044 ("batman-adv: Annotate deleting functions with external lock via lockdep") Signed-off-by: Sven Eckelmann <sven@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <a@unstable.cc>
Daniel Borkmann [Wed, 6 Jan 2016 21:32:16 +0000 (22:32 +0100)]
bpf: cleanup bpf_prog_run_{save,clear}_cb helpers
Move the details behind the cb[] access into a small helper to decouple
and make them generic for bpf_prog_run_save_cb()/bpf_prog_run_clear_cb()
that was introduced via commit ff936a04e5f2 ("bpf: fix cb access in socket
filter programs"). Also add a comment to better clarify what is done in
bpf_skb_cb().
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 8 Jan 2016 18:31:06 +0000 (13:31 -0500)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2016-01-08
This series contains updates to ixgbe only.
Vasu provides three fixes for ixgbe, first assigns a minimum credit to
a traffic class to resolve a Tx hang for CEE mode configuration. Second
fix changes the driver to use netdev->fcoe_ddp_xid instead of our local
IXGBE_FCOE_DDP_MAX, since it is correctly set for our different devices
and avoids a DDP skip error on X550. Lastly fix the PFC configuration
to include X550 devices.
Emil provides a fix for reporting the speed in ethtool by using the
stored value in out adapter structure. This is due to external drivers
may end up with unknown speed when calling ethtool_get_settings().
Mark fixes the handling of any outer UDP checksum, by passing the
skb up with CHECKSUM_NONE when an outer UDP checksum is set. This
will cause the stack to check the checksum, also do not increment an
error counter because we do not really know if there is an actual error.
Ixgbe ATR was not handling IPv6 extended headers, so ATR is not being
performed on such packets. Fix this by skipping extended headers
when they are present.
Usha fixes an issue with X550 and getting FDMI HBA attributes when
FCoE support is enabled.
Neerav fixes an issue for X550 when FCoE and SR-IOV are enabled, which
the hardware generates MDD events. Resolve this by setting the expected
values in the transmit context descriptors for FCoE/FIP frames and
adding a flush after writing the RDLEN register.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Here's one more bluetooth-next pull request for the 4.5 kernel:
- Support for CRC check and promiscuous mode for CC2520
- Fixes to btmrvl driver
- New ACPI IDs for hci_bcm driver
- Limited Discovery support for the Bluetooth mgmt interface
- Minor other cleanups here and there
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 8 Jan 2016 17:44:21 +0000 (12:44 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-01-08
This series contains updates to i40e and i40evf only.
Mitch adds a useful error message and return value when the VFs are being
reset, since there is a brief window of time when the VF cannot be
configured because they do not have a VSI to configure. Also made a driver
change to allow the user to specify a zero MAC address for VFs, which
causes the existing MAC address to be removed and allows the VF to use a
random address (like libvirt).
Sowmini Varadhan from Oracle adds similar functionality/fix to i40e that
was added to ixgbe earlier. The fix attempts to look up the MAC address
in the Open Firmware on systems that support it and use IDPROM on SPARC
if no OF address is found.
Anjali provides a fix the code path so that we do not call skb_set_hash()
if the feature is disabled.
Jesse removes a device ID that has never been productized and there are
no plans to use it, so just remove it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nf_tables: Add new attributes into nft_set to store user data.
User data is stored at after 'nft_set_ops' private data into 'data[]'
flexible array. The field 'udata' points to user data and 'udlen' stores
its length.
Add new flag NFTA_SET_USERDATA.
Signed-off-by: Carlos Falgueras García <carlosfg@riseup.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nfnetlink_queue: validate dependencies to avoid breaking atomicity
Check that dependencies are fulfilled before updating the queue
instance, otherwise we can leave things in intermediate state on errors
in nfqnl_recv_config().
Signed-off-by: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Mark Rustad [Wed, 9 Dec 2015 22:55:37 +0000 (14:55 -0800)]
ixgbe: Make ATR recognize IPv6 extended headers
Right now ATR is not handling IPv6 extended headers, so ATR is not
being performed on such packets. Fix that by skipping extended
headers when they are present. This also fixes a problem where
the ATR code was not checking that the inner protocol was actually
TCP before setting up the signature rules. Since the protocol check
is intimately involved with the extended header processing as well,
this all gets fixed together.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Neerav Parikh [Wed, 9 Dec 2015 06:13:58 +0000 (22:13 -0800)]
ixgbe: Fix MDD events generated when FCoE+SRIOV are enabled
When FCoE is enabled with SR-IOV on the X550 NIC the hardware
generates MDD events.
This patch fixes these by setting the expected values in the
Tx context descriptors for FCoE/FIP frames and adding a flush
after writing the RDLEN register.
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Usha Ketineni [Tue, 8 Dec 2015 12:01:18 +0000 (04:01 -0800)]
ixgbe: Fix to get FDMI HBA attributes information with X550
Check whether the FCOE support is enabled for the devices to get the
FDMI HBA attributes information instead of checking each device id.
Also, add Model string information for X550.
Signed-off-by: Usha Ketineni <usha.k.ketineni@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mark Rustad [Fri, 4 Dec 2015 19:26:43 +0000 (11:26 -0800)]
ixgbe: Correct handling of any outer UDP checksum setting
If an outer UDP checksum is set, pass the skb up with CHECKSUM_NONE
so that the stack will check the checksum. Do not increment an
error counter, because we don't know that there is an actual error.
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 3 Dec 2015 23:20:06 +0000 (15:20 -0800)]
ixgbe: do not call check_link for ethtool in ixgbe_get_settings()
In ixgbe_get_settings() the link status and speed of the interface
are determined based on a read from the LINKS register via the call
to mac.ops.check.link(). This can cause issues where external drivers
may end up with unknown speed when calling ethtool_get_setings().
Instead of calling the mac.ops.check_link() we can report the speed
from the adapter structure which is populated by the driver.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasu Dev [Mon, 23 Nov 2015 18:31:25 +0000 (10:31 -0800)]
ixgbe: fix broken PFC with X550
PFC is configuration is skipped for X550 devices due to a incorrect
device id check, fixing that to include X550 PFC configuration.
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasu Dev [Mon, 23 Nov 2015 18:31:01 +0000 (10:31 -0800)]
ixgbe: use correct FCoE DDP max check
Use fcoe_ddp_xid from netdev as this is correctly set for different
device IDs to avoid DDP skip error on X550 as "xid=0x20b out-of-range"
Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Vasu Dev [Mon, 23 Nov 2015 18:30:51 +0000 (10:30 -0800)]
ixgbe: Fill at least min credits to a TC credit refills
Currently credit_refill and credit_max could be zero for a TC and that
is causing Tx hang for CEE mode configuration, so to fix that have at
min credit assigned to a TC and that is as what IEEE mode already does.
Change-ID: If652c133093a21e530f4e9eab09097976f57fb12 Signed-off-by: Vasu Dev <vasu.dev@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>