====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for your net-next tree.
They are:
* nf_tables set timeout infrastructure from Patrick Mchardy.
1) Add support for set timeout support.
2) Add support for set element timeouts using the new set extension
infrastructure.
4) Add garbage collection helper functions to get rid of stale elements.
Elements are accumulated in a batch that are asynchronously released
via RCU when the batch is full.
5) Add garbage collection synchronization helpers. This introduces a new
element busy bit to address concurrent access from the netlink API and the
garbage collector.
5) Add timeout support for the nft_hash set implementation. The garbage
collector peridically checks for stale elements from the workqueue.
* iptables/nftables cgroup fixes:
6) Ignore non full-socket objects from the input path, otherwise cgroup
match may crash, from Daniel Borkmann.
7) Fix cgroup in nf_tables.
8) Save some cycles from xt_socket by skipping packet header parsing when
skb->sk is already set because of early demux. Also from Daniel.
* br_netfilter updates from Florian Westphal.
9) Save frag_max_size and restore it from the forward path too.
10) Use a per-cpu area to restore the original source MAC address when traffic
is DNAT'ed.
11) Add helper functions to access physical devices.
12) Use these new physdev helper function from xt_physdev.
13) Add another nf_bridge_info_get() helper function to fetch the br_netfilter
state information.
14) Annotate original layer 2 protocol number in nf_bridge info, instead of
using kludgy flags.
15) Also annotate the pkttype mangling when the packet travels back and forth
from the IP to the bridge layer, instead of using a flag.
* More nf_tables set enhancement from Patrick:
16) Fix possible usage of set variant that doesn't support timeouts.
17) Avoid spurious "set is full" errors from Netlink API when there are pending
stale elements scheduled to be released.
18) Restrict loop checks to set maps.
19) Add support for dynamic set updates from the packet path.
20) Add support to store optional user data (eg. comments) per set element.
BTW, I have also pulled net-next into nf-next to anticipate the conflict
resolution between your okfn() signature changes and Florian's br_netfilter
updates.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 18:43:13 +0000 (14:43 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2015-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
iwlwifi:
* some more work on LAR
* fixes for UMAC scan
* more work on debugging framework
* more work for 8000 devices
* cleanups and small bugfixes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Prohibit the use/abuse of the xfrm netlink interface on
32/64 bit compatibility tasks. We need a full compat
layer before we can allow this. From Fan Du.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 9 Apr 2015 18:25:26 +0000 (14:25 -0400)]
Merge branch 'dma_rmb_wmb'
Alexander Duyck says:
====================
Replace wmb()/rmb() with dma_wmb()/dma_rmb() where appropriate, round 2
More cleanup of drivers in order to start making use of dma_rmb and dma_wmb
calls. This is another pass of what I would consider to be low hanging
fruit. There may be other opportunities to make use of the barriers in the
Mellanox and Chelsio drivers but I didn't want to risk meddling with code I
was not completely familiar with so I am leaving that for future work.
I have revisited the Mellanox driver changes. This time around I went only
for the sections with a clearly defined pattern. For dma_wmb I used it
between accesses of the descriptor bits followed by owner or size. For
dma_rmb I used it to replace rmb following a read of the ownership bit in
the descriptor.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:49 +0000 (18:49 -0700)]
e100: Use dma_rmb/wmb where appropriate
Reduce the CPU overhead for transmit and receive by using lightweight dma_
barriers instead of full barriers where they are applicable.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:43 +0000 (18:49 -0700)]
i40e/i40evf: Use dma_rmb where appropriate
Update i40e and i40evf to use dma_rmb. This should improve performance by
decreasing the barrier overhead on strong ordered architectures.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:36 +0000 (18:49 -0700)]
mlx4/mlx5: Use dma_wmb/rmb where appropriate
This patch should help to improve the performance of the mlx4 and mlx5 on a
number of architectures. For example, on x86 the dma_wmb/rmb equates out
to a barrer() call as the architecture is already strong ordered, and on
PowerPC the call works out to a lwsync which is significantly less expensive
than the sync call that was being used for wmb.
I placed the new barriers between any spots that seemed to be trying to
order memory/memory reads or writes, if there are any spots that involved
MMIO I left the existing wmb in place as the new barriers cannot order
transactions between coherent and non-coherent memories.
v2: Reduced the replacments to just the spots where I could clearly
identify the usage pattern.
Cc: Amir Vadai <amirv@mellanox.com> Cc: Ido Shamay <idos@mellanox.com> Cc: Eli Cohen <eli@mellanox.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Thu, 9 Apr 2015 01:49:29 +0000 (18:49 -0700)]
cxgb3/4/4vf: Update drivers to use dma_rmb/wmb where appropriate
Update the Chelsio Ethernet drivers to use the dma_rmb/wmb calls instead of
the full barriers in order to improve performance.
Cc: Santosh Raspatur <santosh@chelsio.com> Cc: Hariprasad S <hariprasad@chelsio.com> Cc: Casey Leedom <leedom@chelsio.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
For the long-term, we should handle NETDEV_{UP,DOWN} event
from the lower device of a tunnel device.
Fixes: 56ef9c909b40 ("vxlan: Move socket initialization to within rtnl scope") Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 8 Apr 2015 22:34:04 +0000 (15:34 -0700)]
tcp: do not rearm rsk_timer on FastOpen requests
FastOpen requests are not like other regular request sockets.
They do not yet use rsk_timer : tcp_fastopen_queue_check()
simply manually removes one expired request from fastopenq->rskq_rst
list.
Therefore, tcp_check_req() must not call mod_timer_pending(),
otherwise we crash because rsk_timer was not initialized.
Fixes: fa76ce7328b ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 21:12:50 +0000 (17:12 -0400)]
Merge tag 'nfc-next-4.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC: 4.1 pull request
This is the NFC pull request for 4.1.
This is a shorter one than usual, as the Intel Field Peak NFC
driver could not make it in time.
We have:
- A new driver for NXP NCI based chipsets, like e.g. the NPC100 or
the PN7150. It currently only supports an i2c physical layer, but
could easily be extended to work on top of e.g. SPI.
This driver also includes support for user space triggered firmware
updates.
- A few minor st21nfc[ab] fixes, cleanups, and comments improvements.
- A pn533 error return fix.
- A few NFC related logs formatting cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Wed, 8 Apr 2015 03:05:42 +0000 (23:05 -0400)]
netfilter: Fix switch statement warnings with recent gcc.
More recent GCC warns about two kinds of switch statement uses:
1) Switching on an enumeration, but not having an explicit case
statement for all members of the enumeration. To show the
compiler this is intentional, we simply add a default case
with nothing more than a break statement.
2) Switching on a boolean value. I think this warning is dumb
but nevertheless you get it wholesale with -Wswitch.
This patch cures all such warnings in netfilter.
Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Wed, 8 Apr 2015 19:19:17 +0000 (15:19 -0400)]
Merge branch 'selinux-nlmsg'
Nicolas Dichtel says:
====================
selinux: add some missing nlmsg commands
It's not a critical issue, thus the patches are based on net-next.
Patches are splitted because the 'Fixes' tag is not the same for all
commands.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:42 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_[NEW|GET]SADINFO
These commands are missing.
Fixes: 28d8909bc790 ("[XFRM]: Export SAD info.") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:41 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_GETSPDINFO
This command is missing.
Fixes: ecfd6b183780 ("[XFRM]: Export SPD info") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:40 +0000 (18:36 +0200)]
selinux/nlmsg: add XFRM_MSG_NEWSPDINFO
This new command is missing.
Fixes: 880a6fab8f6b ("xfrm: configure policy hash table thresholds by netlink") Reported-by: Christophe Gouault <christophe.gouault@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:39 +0000 (18:36 +0200)]
selinux/nlmsg: add RTM_GETNSID
This new command is missing.
Fixes: 9a9634545c70 ("netns: notify netns id events") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 8 Apr 2015 16:36:38 +0000 (18:36 +0200)]
selinux/nlmsg: add RTM_NEWNSID and RTM_GETNSID
These new commands are missing.
Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The DWMAC block on certain SoCs (such as IMG Pistachio) have a second
clock which must be enabled in order to access the peripheral's
register interface, so add support for requesting and enabling an
optional "pclk".
Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Cc: James Hartley <james.hartley@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 16:27:26 +0000 (12:27 -0400)]
Merge branch 'hv_netvsc_linearize'
Vitaly Kuznetsov says:
====================
hv_netvsc: linearize SKBs bigger than MAX_PAGE_BUFFER_COUNT-2 pages
This patch series fixes the same issue which was fixed in Xen with commit 97a6d1bb2b658ac85ed88205ccd1ab809899884d ("xen-netfront: Fix handling packets on
compound pages with skb_linearize").
It is relatively easy to create a packet which is small in size but occupies
more than 30 (MAX_PAGE_BUFFER_COUNT-2) pages. Here is a kernel-mode reproducer
which tries sending a packet with only 34 bytes of payload (but on 34 pages)
and fails:
static int __init sendfb_init(void)
{
struct socket *sock;
int i, ret;
struct sockaddr_in in4_addr = { 0 };
struct page *pages[17];
unsigned long flags;
ret = sock_create_kern(AF_INET, SOCK_STREAM, IPPROTO_TCP, &sock);
if (ret) {
pr_err("failed to create socket: %d!\n", ret);
return ret;
}
ret = sock->ops->connect(sock, (struct sockaddr *)&in4_addr, sizeof(in4_addr), 0);
if (ret) {
pr_err("failed to connect: %d!\n", ret);
return ret;
}
/* We can send up to 17 frags */
flags = MSG_MORE;
for (i = 0; i < 17; i++) {
if (i == 16)
flags = MSG_EOR;
pages[i] = alloc_pages(GFP_KERNEL | __GFP_COMP, 1);
if (!pages[i]) {
pr_err("out of memory!");
goto free_pages;
}
sock->ops->sendpage(sock, pages[i], PAGE_SIZE -1, 2, flags);
}
free_pages:
for (; i > 0; i--)
__free_pages(pages[i - 1], 1);
printk("sendfb_init: test done\n");
return -1;
}
module_init(sendfb_init);
MODULE_LICENSE("GPL");
A try to load such module results in multiple
'kernel: hv_netvsc vmbus_15 eth0: Packet too big: 100' messages as all retries
fail as well. It should also be possible to trigger the issue from userspace, I
expect e.g. NFS under heavy load to get stuck sometimes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: try linearizing big SKBs before dropping them
In netvsc_start_xmit() we can handle packets which are scattered around not
more than MAX_PAGE_BUFFER_COUNT-2 pages. It is, however, easy to create a
packet which is not big in size but occupies more pages (e.g. if it uses frags
on compound pages boundaries). When we drop such packet it cases sender to try
resending it but in most cases it will try resending the same packet which will
also get dropped, this will cause the particular connection to stick. To solve
the issue we can try linearizing skb.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: use single existing drop path in netvsc_start_xmit
... which validly uses dev_kfree_skb_any() instead of dev_kfree_skb().
Setting ret to -EFAULT and -ENOMEM have no real meaning here (we need to set
it to anything but -EAGAIN) as we drop the packet and return NETDEV_TX_OK
anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 16:21:36 +0000 (12:21 -0400)]
Merge branch 'sfc-next'
Shradha Shah says:
====================
sfc: Nic specific sriov functions, netdev_ops and sriov_configure
First two patches among the series of patches to support SRIOV on EF10.
First patch declares nic specific sriov functions in nic specific headers,
creates only one instance of the netdev_ops, removes sriov functionality
from Falcon code.
Second patch adds support for sriov_configure.
The Virtual Functions can be enabled but they do not bind to the SFC
driver just yet.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shradha Shah [Wed, 8 Apr 2015 14:24:45 +0000 (15:24 +0100)]
sfc: Own header for nic-specific sriov functions, single instance of netdev_ops and sriov removed from Falcon code
By putting all the efx_{siena,ef10}_sriov_* declarations in
{siena,ef10}_sriov.h, ensure they cannot be called from nic-generic code.
Also fixes up an instance of this, where mcdi.c was calling
efx_siena_sriov_flr.
The single instance of netdev_ops should call general high level
functions that can then call something adapter specific in efx_nic_type.
We should only do adapter specialisation via efx_nic_type.
Removal of sriov functionality from the Falcon code means that tests
are needed for the presence of some callbacks.
Signed-off-by: Shradha Shah <sshah@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Apr 2015 16:15:15 +0000 (12:15 -0400)]
Merge branch 'dma_rmb_wmb'
Alexander Duyck says:
====================
Replace wmb()/rmb() with dma_wmb()/dma_rmb() where appropriate
This is a start of a side project cleaning up the drivers that can make use
of the dma_wmb and dma_rmb calls. The general idea is to start removing
the unnecessary wmb/rmb calls from a number of drivers and to make use of
the lighter weight dma_wmb/dma_rmb calls as this should allow for an
overall improvement in performance as each barrier can cost a significant
number of cycles and on architectures such as x86 this is unnecessary.
These changes are what I would consider low hanging fruit. The likelihood
of the changes introducing an error should be low since the use of the
barriers in these cases are fairly obvious.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:27 +0000 (16:55 -0700)]
e1000, e1000e: Use dma_rmb instead of rmb for descriptor read ordering
This change replaces calls to rmb with dma_rmb in the case where we want to
order all follow-on descriptor reads after the check for the descriptor
status bit.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:21 +0000 (16:55 -0700)]
s2io: Update driver to use dma_wmb
This change updates several spots where a wmb was being used to instead use
a dma_wmb to flush out writes before updating the control portion of the
descriptor.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 7 Apr 2015 23:55:14 +0000 (16:55 -0700)]
sungem, sunhme, sunvnet: Update drivers to use dma_wmb/rmb
This patch goes through and replaces wmb/rmb with dma_wmb/dma_rmb in cases
where the barrier is being used to order writes or reads to just memory and
doesn't involve any programmed I/O.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: call iptunnel_xmit with NULL sock pointer if no tunnel sock is available
Fixes: 79b16aadea32cce ("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb().") Reported-by: David S. Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: ip_tunnel: use net namespace from rtable not socket
The socket parameter might legally be NULL, thus sock_net is sometimes
causing a NULL pointer dereference. Using net_device pointer in dst_entry
is more reliable.
Fixes: b6a7719aedd7e5c ("ipv4: hash net ptr into fragmentation bucket selection") Reported-by: Rick Jones <rick.jones2@hp.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Sun, 5 Apr 2015 12:43:38 +0000 (14:43 +0200)]
netfilter: nf_tables: support optional userdata for set elements
Add an userdata set extension and allow the user to attach arbitrary
data to set elements. This is intended to hold TLV encoded data like
comments or DNS annotations that have no meaning to the kernel.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sun, 5 Apr 2015 12:41:08 +0000 (14:41 +0200)]
netfilter: nf_tables: add support for dynamic set updates
Add a new "dynset" expression for dynamic set updates.
A new set op ->update() is added which, for non existant elements,
invokes an initialization callback and inserts the new element.
For both new or existing elements the extenstion pointer is returned
to the caller to optionally perform timer updates or other actions.
Element removal is not supported so far, however that seems to be a
rather exotic need and can be added later on.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sun, 5 Apr 2015 12:41:07 +0000 (14:41 +0200)]
netfilter: nf_tables: support different set binding types
Currently a set binding is assumed to be related to a lookup and, in
case of maps, a data load.
In order to use bindings for set updates, the loop detection checks
must be restricted to map operations only. Add a flags member to the
binding struct to hold the set "action" flags such as NFT_SET_MAP,
and perform loop detection based on these.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patrick McHardy [Sun, 5 Apr 2015 12:41:06 +0000 (14:41 +0200)]
netfilter: nf_tables: prepare set element accounting for async updates
Use atomic operations for the element count to avoid races with async
updates.
To properly handle the transactional semantics during netlink updates,
deleted but not yet committed elements are accounted for seperately and
are treated as being already removed. This means for the duration of
a netlink transaction, the limit might be exceeded by the amount of
elements deleted. Set implementations must be prepared to handle this.
Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
nf_bridge_info->mask is used for several things, for example to
remember if skb->pkt_type was set to OTHER_HOST.
For a bridge, OTHER_HOST is expected case. For ip forward its a non-starter
though -- routing expects PACKET_HOST.
Bridge netfilter thus changes OTHER_HOST to PACKET_HOST before hook
invocation and then un-does it after hook traversal.
This information is irrelevant outside of br_netfilter.
After this change, ->mask now only contains flags that need to be
known outside of br_netfilter in fast-path.
Future patch changes mask into a 2bit state field in sk_buff, so that
we can remove skb->nf_bridge pointer for good and consider all remaining
places that access nf_bridge info content a not-so fastpath.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: bridge: start splitting mask into public/private chunks
->mask is a bit info field that mixes various use cases.
In particular, we have flags that are mutually exlusive, and flags that
are only used within br_netfilter while others need to be exposed to
other parts of the kernel.
Remove BRNF_8021Q/PPPoE flags. They're mutually exclusive and only
needed within br_netfilter context.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: bridge: add helpers for fetching physin/outdev
right now we store this in the nf_bridge_info struct, accessible
via skb->nf_bridge. This patch prepares removal of this pointer from skb:
Instead of using skb->nf_bridge->x, we use helpers to obtain the in/out
device (or ifindexes).
Followup patches to netfilter will then allow nf_bridge_info to be
obtained by a call into the br_netfilter core, rather than keeping a
pointer to it in sk_buff.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: bridge: don't use nf_bridge_info data to store mac header
br_netfilter maintains an extra state, nf_bridge_info, which is attached
to skb via skb->nf_bridge pointer.
Amongst other things we use skb->nf_bridge->data to store the original
mac header for every processed skb.
This is required for ip refragmentation when using conntrack
on top of bridge, because ip_fragment doesn't copy it from original skb.
However there is no need anymore to do this unconditionally.
Move this to the one place where its needed -- when br_netfilter calls
ip_fragment().
Also switch to percpu storage for this so we can handle fragmenting
without accessing nf_bridge meta data.
Only user left is neigh resolution when DNAT is detected, to hold
the original source mac address (neigh resolution builds new mac header
using bridge mac), so rename ->data and reduce its size to whats needed.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Daniel Borkmann [Thu, 2 Apr 2015 12:28:30 +0000 (14:28 +0200)]
netfilter: x_tables: don't extract flow keys on early demuxed sks in socket match
Currently in xt_socket, we take advantage of early demuxed sockets
since commit 00028aa37098 ("netfilter: xt_socket: use IP early demux")
in order to avoid a second socket lookup in the fast path, but we
only make partial use of this:
We still unnecessarily parse headers, extract proto, {s,d}addr and
{s,d}ports from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup via
xt_socket_get_sock_{v4,v6}() due to skb->sk hit, meaning those cycles
can be spared.
After this patch, we only proceed the slower, manual lookup path
when we have a skb->sk miss, thus time to match verdict for early
demuxed sockets will improve further, which might be i.e. interesting
for use cases such as mentioned in 681f130f39e1 ("netfilter: xt_socket:
add XT_SOCKET_NOWILDCARD flag").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
hv_netvsc: Fix the packet free when it is in skb headroom
In the two places changed, we now use netvsc_xmit_completion() which properly
frees hv_netvsc_packet in or not in skb headroom.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The sum of RNDIS msg and PPI struct sizes is used in multiple places, so we define
a macro for them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lee [Mon, 6 Apr 2015 21:37:27 +0000 (14:37 -0700)]
tcp: RFC7413 option support for Fast Open client
Fast Open has been using an experimental option with a magic number
(RFC6994). This patch makes the client by default use the RFC7413
option (34) to get and send Fast Open cookies. This patch makes
the client solicit cookies from a given server first with the
RFC7413 option. If that fails to elicit a cookie, then it tries
the RFC6994 experimental option. If that also fails, it uses the
RFC7413 option on all subsequent connect attempts. If the server
returns a Fast Open cookie then the client caches the form of the
option that successfully elicited a cookie, and uses that form on
later connects when it presents that cookie.
The idea is to gradually obsolete the use of experimental options as
the servers and clients upgrade, while keeping the interoperability
meanwhile.
Signed-off-by: Daniel Lee <Longinus00@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Lee [Mon, 6 Apr 2015 21:37:26 +0000 (14:37 -0700)]
tcp: RFC7413 option support for Fast Open server
Fast Open has been using the experimental option with a magic number
(RFC6994) to request and grant Fast Open cookies. This patch enables
the server to support the official IANA option 34 in RFC7413 in
addition.
The change has passed all existing Fast Open tests with both
old and new options at Google.
Signed-off-by: Daniel Lee <Longinus00@gmail.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Tue, 7 Apr 2015 10:10:16 +0000 (12:10 +0200)]
netdevice.h: remove iflink description
Also move 'group' description to match the order of the net_device structure.
Fixes: 7a66bbc96ce9 ("net: remove iflink field from struct net_device") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Apr 2015 21:29:47 +0000 (17:29 -0400)]
Merge branch 'netns-next'
Nicolas Dichtel says:
====================
netns: enhance netlink interface for nsid
The first patch is a small cleanup. The second patch implements notifications
for netns id events. And the last one allows to dump existing netns id from
userland.
iproute2 patches are available, I can send them on demand.
v2: drop the first patch (the fix is now in net-next)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 7 Apr 2015 19:29:30 +0000 (15:29 -0400)]
Merge branch 'udp_tunnel_sk'
Prevent UDP tunnels from operating on garbage socket
So this should do the rest of the work such that when we encapsulate
into a UDP tunnel, the output path works on the UDP tunnel's socket
rather than skb->sk.
Part of this work is based upon changes done by Jiri Pirko some time
ago.
Basically the first step is to pass the socket through the nf_hook
okfn(), and then next we do the same for the UDP tunnel xmit routines.
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Mon, 6 Apr 2015 02:19:04 +0000 (22:19 -0400)]
netfilter: Pass socket pointer down through okfn().
On the output paths in particular, we have to sometimes deal with two
socket contexts. First, and usually skb->sk, is the local socket that
generated the frame.
And second, is potentially the socket used to control a tunneling
socket, such as one the encapsulates using UDP.
We do not want to disassociate skb->sk when encapsulating in order
to fix this, because that would break socket memory accounting.
The most extreme case where this can cause huge problems is an
AF_PACKET socket transmitting over a vxlan device. We hit code
paths doing checks that assume they are dealing with an ipv4
socket, but are actually operating upon the AF_PACKET one.
Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Mon, 6 Apr 2015 02:19:00 +0000 (22:19 -0400)]
netfilter: Add socket pointer to nf_hook_state.
It is currently always set to NULL, but nf_queue is adjusted to be
prepared for it being set to a real socket by taking and releasing a
reference to that socket when necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Miaoqing Pan [Wed, 1 Apr 2015 02:19:57 +0000 (10:19 +0800)]
ath9k: add extra GPIO led support
ar9550 or later chips, the AR_GPIO_IN_OUT register only can
control GPIO[0:3]. For the extra GPIO, use standard GPIO calls
instead of WMAC internal registers.
Signed-off-by: Miaoqing Pan <miaoqing@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Taehee Yoo [Fri, 27 Mar 2015 15:22:39 +0000 (00:22 +0900)]
rtlwifi: Add encryption argument in rtl_is_special_data for checking DHCP packet.
rtl8192cu can't connect to AP after physical reconnect.
according to dmesg, that problem's cause was DHCP timeout.
rtl_is_special_data function checks packet type for adjusting rate.
when that function is called from _rtl_rc_get_highest_rix, it can not
calculate offset correctly. so i add argument is_encn in rtl_is_special_data.
is_enc variable mean that iv header is added in skb parameter.
i test only rtl8192cu chipset. because i doesn't have other rtlwifi chipsets.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Here's what's probably the last bluetooth-next pull request for 4.1:
- Fixes for LE advertising data & advertising parameters
- Fix for race condition with HCI_RESET flag
- New BNEPGETSUPPFEAT ioctl, needed for certification
- New HCI request callback type to get the resulting skb
- Cleanups to use BIT() macro wherever possible
- Consolidate Broadcom device entries in the btusb HCI driver
- Check for valid flags in CMTP, HIDP & BNEP
- Disallow local privacy & OOB data combo to prevent a potential race
- Expose SMP & ECDH selftest results through debugfs
- Expose current Device ID info through debugfs
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
1) In TCP, don't register an FRTO for cumulatively ACK'd data that was
previously SACK'd, from Neal Cardwell.
2) Need to hold RNL mutex in ipv4 multicast code namespace cleanup,
from Cong WANG.
3) Similarly we have to hold RNL mutex for fib_rules_unregister(), also
from Cong WANG.
4) Revert and rework netns nsid allocation fix, from Nicolas Dichtel.
5) When we encapsulate for a tunnel device, skb->sk still points to the
user socket. So this leads to cases where we retraverse the
ipv4/ipv6 output path with skb->sk being of some other address
family (f.e. AF_PACKET). This can cause things to crash since the
ipv4 output path is dereferencing an AF_PACKET socket as if it were
an ipv4 one.
The short term fix for 'net' and -stable is to elide these socket
checks once we've entered an encapsulation sequence by testing
xmit_recursion.
Longer term we have a better solution wherein we pass the tunnel's
socket down through the output paths, but that is way too invasive
for 'net' and -stable.
From Hannes Frederic Sowa.
6) l2tp_init() failure path forgets to unregister per-net ops, from
Cong WANG.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
net: dsa: fix filling routing table from OF description
l2tp: unregister l2tp_net_ops on failure path
mvneta: dont call mvneta_adjust_link() manually
ipv6: protect skb->sk accesses from recursive dereference inside the stack
netns: don't allocate an id for dead netns
Revert "netns: don't clear nsid too early on removal"
ip6mr: call del_timer_sync() in ip6mr_free_table()
net: move fib_rules_unregister() under rtnl lock
ipv4: take rtnl_lock and mark mrt table as freed on namespace cleanup
tcp: fix FRTO undo on cumulative ACK of SACKed range
xen-netfront: transmit fully GSO-sized packets
net/mlx4_core: Fix error message deprecation for ConnectX-2 cards
Commit 1daa4303b4ca ("net/mlx4_core: Deprecate error message at
ConnectX-2 cards startup to debug") did the deprecation only for port 1
of the card. Need to deprecate for port 2 as well.
Fixes: 1daa4303b4ca ("net/mlx4_core: Deprecate error message at ConnectX-2 cards startup to debug") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: fix filling routing table from OF description
According to description in 'include/net/dsa.h', in cascade switches
configurations where there are more than one interconnected devices,
'rtable' array in 'dsa_chip_data' structure is used to indicate which
port on this switch should be used to send packets to that are destined
for corresponding switch.
However, dsa_of_setup_routing_table() fills 'rtable' with port numbers
of the _target_ switch, but not current one.
This commit removes redundant devicetree parsing and adds needed port
number as a function argument. So dsa_of_setup_routing_table() now just
looks for target switch number by parsing parent of 'link' device node.
To remove possible misunderstandings with the way of determining target
switch number, a corresponding comment was added to the source code and
to the DSA device tree bindings documentation file.
This was tested on a custom board with two Marvell 88E6095 switches with
following corresponding routing tables: { -1, 10 } and { 8, -1 }.
Signed-off-by: Pavel Nakonechny <pavel.nakonechny@skitlab.ru> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Updates for the input subsystem - two more tweaks for ALPS driver to
work out kinks after splitting the touchpad, trackstick, and potential
external PS/2 mouse into separate input devices.
Changes to support ALPS SS4 devices (protocol V8) will be coming in
4.1..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: alps - document stick behavior for protocol V2
Input: alps - report V2 Dualpoint Stick events via the right evdev node
Input: alps - report interleaved bare PS/2 packets via dev3
Commit 608cd71a9c7c ("tc: bpf: generalize pedit action") has added the
possibility to mangle packet data to BPF programs in the tc pipeline.
This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace()
for fixing up the protocol checksums after the packet mangling.
It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid
unnecessary checksum recomputations when BPF programs adjusting l3/l4
checksums and documents all three helpers in uapi header.
Moreover, a sample program is added to show how BPF programs can make use
of the mangle and csum helpers.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: protect skb->sk accesses from recursive dereference inside the stack
We should not consult skb->sk for output decisions in xmit recursion
levels > 0 in the stack. Otherwise local socket settings could influence
the result of e.g. tunnel encapsulation process.
ipv6 does not conform with this in three places:
1) ip6_fragment: we do consult ipv6_npinfo for frag_size
2) sk_mc_loop in ipv6 uses skb->sk and checks if we should
loop the packet back to the local socket
3) ip6_skb_dst_mtu could query the settings from the user socket and
force a wrong MTU
Furthermore:
In sk_mc_loop we could potentially land in WARN_ON(1) if we use a
PF_PACKET socket ontop of an IPv6-backed vxlan device.
Reuse xmit_recursion as we are currently only interested in protecting
tunnel devices.
Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
NFC: st21nfca: fix st21nfca_get_iso14443_3_uid data copy
st21nfca_get_iso14443_3_uid() does not correctly copy the uid from
uid_skb->data to its gate parameter. "gate = uid_skb->data;" only
puts a pointer to uid_skb->data to the local variable gate.
This means that in st21nfca_hci_target_from_gate() the content
of "u8 uid[NFC_NFCID1_MAXSIZE]" local variable is never initialized
before being used in memcpy(target->nfcid1, uid, len).
Fix this by replacing the local variable assignment with a memcpy.
This was found by compiling Linux with
"gcc -Wunused-but-set-parameter".
Acked-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
NFC: st21nfcb: Retry i2c_master_send if it returns a negative value
i2c_master_send may return many negative values different than
-EREMOTEIO.
In case an i2c transaction is NACK'ed, on raspberry pi B+
kernel 3.18, -EIO is generated instead.
Cc: stable@vger.kernel.org Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>