Sean Hefty [Wed, 29 May 2013 17:09:27 +0000 (10:09 -0700)]
RDMA/ucma: Support querying when IB paths are not reversible
The current query_route call can return up to two path records. The
assumption being that one is the primary path, with optional support
for an alternate path. In both cases, the paths are assumed to be
reversible and are used to send CM MADs.
With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:
forward primary, reverse primary,
forward alternate, reverse alternate,
reversible primary path for CM MADs
reversible alternate path for CM MADs.
(It is unclear at this time if IB routing will complicate this) In
order to handle more flexible routing topologies, add a new command to
report any number of paths.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:26 +0000 (10:09 -0700)]
IB/sa: Export function to pack a path record into wire format
Allow converting from struct ib_sa_path_rec to the IB defined SA path
record wire format. This will be used to report path data from the
rdma cm into user space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:25 +0000 (10:09 -0700)]
RDMA/ucma: Support querying for AF_IB addresses
The sockaddr structure for AF_IB is larger than sockaddr_in6. The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.
To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information. Route (i.e. path record) data is separated.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
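The size problem described above can be illustrated in user space. This is a minimal sketch, not the kernel ABI: struct example_sockaddr_ib is a hypothetical mirror of the AF_IB address layout (field names and sizes are assumptions), and addr_fits() is a helper invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical mirror of an AF_IB socket address; the layout is an
 * assumption made for this sketch, not copied from a kernel header. */
struct example_sockaddr_ib {
    sa_family_t sib_family;    /* address family (AF_IB in the kernel) */
    uint16_t    sib_pkey;      /* partition key */
    uint32_t    sib_flowinfo;  /* flow information */
    uint8_t     sib_addr[16];  /* 128-bit IB address (GID) */
    uint64_t    sib_sid;       /* service ID */
    uint64_t    sib_sid_mask;  /* service ID mask */
    uint64_t    sib_scope_id;  /* scope ID */
};

/* Returns 1 if an address of addr_len bytes fits in buf_len bytes. */
static int addr_fits(size_t addr_len, size_t buf_len)
{
    return addr_len <= buf_len;
}
```

Under these assumptions the AF_IB address does not fit in a sockaddr_in6 sized field but does fit in sockaddr_storage, which is why a new command is needed.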
Sean Hefty [Wed, 29 May 2013 17:09:23 +0000 (10:09 -0700)]
RDMA/cma: Set qkey for AF_IB
Allow the user to specify the qkey when using AF_IB. The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatibility, is only accessed if the associated
rdma_cm_id is using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:22 +0000 (10:09 -0700)]
RDMA/cma: Expose private data when using AF_IB
If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:21 +0000 (10:09 -0700)]
RDMA/cma: Merge cma_get/save_net_info
With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:19 +0000 (10:09 -0700)]
RDMA/cma: Add support for AF_IB to cma_get_service_id()
cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id. Extend the call to support AF_IB,
which contains the service ID directly. This will be needed to
support any arbitrary SID.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
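A rough sketch of the two cases described above. The message does not give the formula, so the layout used here (port space shifted left by 16 bits, plus the port number) is an assumption, as are the names ip_service_id() and get_service_id(). Values are kept in host byte order for readability, whereas the kernel works with big-endian service IDs.

```c
#include <stdint.h>

/* Port space value used by the rdma cm for TCP (RDMA_PS_TCP). */
#define EXAMPLE_PS_TCP 0x0106

/* Assumed IP-style layout: port space in bits 16..31, port in bits 0..15. */
static uint64_t ip_service_id(uint16_t port_space, uint16_t port)
{
    return ((uint64_t)port_space << 16) + port;
}

/* AF_IB carries the service ID directly; other families derive it. */
static uint64_t get_service_id(int is_af_ib, uint64_t ib_sid,
                               uint16_t port_space, uint16_t port)
{
    return is_af_ib ? ib_sid : ip_service_id(port_space, port);
}
```

With AF_IB the stored SID is returned unchanged, which is what allows an arbitrary SID to be used.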
Sean Hefty [Wed, 29 May 2013 17:09:17 +0000 (10:09 -0700)]
RDMA/cma: Add support for AF_IB to rdma_resolve_addr()
Allow the user to specify the remote address using AF_IB format. When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done. The local address may still need to be
matched with a local IB device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:13 +0000 (10:09 -0700)]
RDMA/cma: Do not modify sa_family when setting loopback address
cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port. Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.
As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback. cma_bind_loopback is only invoked when
the source address is the loopback address.
Finally, add loopback support for AF_IB as part of the change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:12 +0000 (10:09 -0700)]
RDMA/cma: Allow user to specify AF_IB when binding
Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device. AF_IB indicates that the user is not mapping an IP
address to the native IB addressing. (The mapping may have already
been done, or is not needed)
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:11 +0000 (10:09 -0700)]
RDMA/cma: Update port reservation to support AF_IB
AF_IB uses a 64-bit service id (SID), which the user can control
through the use of a mask. The rdma_cm will assign values to the
unmasked portions of the SID based on the selected port space and port
number.
Because the IB spec divides the SID range into several regions, a
SID/mask combination may fall into one of the existing port space
ranges as defined by the RDMA CM IP Annex. Map the AF_IB SID to the
correct RDMA port space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
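The SID/mask combination can be pictured as a simple bit merge. This is only a sketch of the idea: it assumes that bits set in the mask are the ones the user controls and that the rdma_cm fills in the remaining bits; merge_sid() is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Keep the user's bits where the mask is set, and take the
 * rdma_cm-assigned bits everywhere else. */
static uint64_t merge_sid(uint64_t user_sid, uint64_t mask, uint64_t cm_sid)
{
    return (user_sid & mask) | (cm_sid & ~mask);
}
```

For example, a user who masks only the top 16 bits leaves the low bits free for the port space and port number chosen by the rdma_cm.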
Sean Hefty [Wed, 29 May 2013 17:09:10 +0000 (10:09 -0700)]
IB/addr: Add AF_IB support to ip_addr_size
Add support for AF_IB to ip_addr_size, and rename the function to
account for the change. Give the compiler more control over whether
the call should be inline or not by moving the definition into the .c
file, removing the static inline, and exporting it.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:09 +0000 (10:09 -0700)]
RDMA/cma: Include AF_IB in loopback and any address checks
Enhance checks for loopback and any address to support AF_IB in
addition to AF_INET and AF_INET6. This will allow future patches to
use AF_IB when binding and resolving addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Sean Hefty [Wed, 29 May 2013 17:09:08 +0000 (10:09 -0700)]
RDMA/cma: Allow enabling reuseaddr in any state
The rdma_cm only allows setting reuseaddr if the corresponding
rdma_cm_id is in the idle state. Allow setting this value in other
states. This brings the behavior more in line with sockets.
Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
Rami Rosen [Sat, 15 Jun 2013 20:04:56 +0000 (23:04 +0300)]
inet: frag , remove an empty ifdef.
This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.
commit b67bfe0d42cac56c512dd5da4b1b347a23f4b70a
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef command,
which is now empty.
Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Sat, 15 Jun 2013 01:39:18 +0000 (09:39 +0800)]
tcp: introduce a per-route knob for quick ack
In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:
"ACKS might also be delayed because of bidirectional
traffic, and is more controlled by the application
response time. TCP stack can not easily estimate it."
"ACK can be incredibly useful to recover from losses in
a short time.
The vast majority of TCP sessions are small lived, and we
send one ACK per received segment anyway at beginning or
retransmits to let the sender smoothly increase its cwnd,
so an auto-tuning facility wont help them that much."
and according to David:
"ACKs are the only information we have to detect loss.
And, for the same reasons that TCP VEGAS is fundamentally
broken, we cannot measure the pipe or some other
receiver-side-visible piece of information to determine
when it's "safe" to stretch ACK.
And even if it's "safe", we should not do it so that losses are
accurately detected and we don't spuriously retransmit.
The only way to know when the bandwidth increases is to
"test" it, by sending more and more packets until drops happen.
That's why all successful congestion control algorithms must
operate on explicited tested pieces of information.
Similarly, it's not really possible to universally know if
it's safe to stretch ACK or not."
It still makes sense to enable or disable quick ack mode like
what TCP_QUICKACK does.
This knob is similar to the TCP_QUICKACK option, but is for people who can't
modify the source code and still want to control
TCP delayed ACK behavior. As David suggested, this should belong
to per-path scope, since different paths may want different
behaviors.
Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Rick Jones <rick.jones2@hp.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Graf <tgraf@suug.ch> CC: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
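For comparison, this is roughly what an application that can change its source does today with the per-socket TCP_QUICKACK option on Linux. It is a minimal sketch; set_quickack() is a hypothetical wrapper name. The per-route knob targets the case where adding such a call is not possible.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn quick ACK mode on (non-zero) or off (zero) for one TCP socket.
 * Returns 0 on success, or -1 with errno set. Linux-specific option. */
static int set_quickack(int fd, int enable)
{
    int val = enable ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val));
}
```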
Yijing Wang [Tue, 18 Jun 2013 08:12:37 +0000 (16:12 +0800)]
bnx2: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
The PCI core already saves the PM capability register offset in pdev->pm_cap
during pci_pm_init() in the init path. So we can use pdev->pm_cap instead of calling
pci_find_capability(pdev, PCI_CAP_ID_PM), for better performance and simpler code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: Michael Chan <mchan@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
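To show why the cached offset is cheaper, here is a user-space model of a capability lookup over a 256-byte config-space image. It is a sketch under stated assumptions: struct example_dev, find_capability() and example_pm_init() are hypothetical stand-ins, not the kernel's implementation. The walk runs once at init time, and later users only read the saved field.

```c
#include <stdint.h>

#define CFG_CAP_PTR 0x34  /* offset of the capabilities pointer */
#define CAP_ID_PM   0x01  /* power management capability ID */

/* Walk the capability list; return the capability offset, or 0 if absent. */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t id)
{
    uint8_t pos = cfg[CFG_CAP_PTR] & 0xFC;
    int ttl = 48;  /* guard against malformed, looping lists */

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == id)
            return pos;
        pos = cfg[pos + 1] & 0xFC;  /* follow the "next" pointer */
    }
    return 0;
}

/* Hypothetical device model with a cached PM capability offset. */
struct example_dev {
    uint8_t cfg[256];
    uint8_t pm_cap;
};

/* Done once at init; drivers then read dev->pm_cap instead of re-walking. */
static void example_pm_init(struct example_dev *dev)
{
    dev->pm_cap = find_capability(dev->cfg, CAP_ID_PM);
}
```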
Yijing Wang [Tue, 18 Jun 2013 08:06:37 +0000 (16:06 +0800)]
amd8111e: use pdev->pm_cap instead of pci_find_capability(.., PCI_CAP_ID_PM)
The PCI core already saves the PM capability register offset in pdev->pm_cap
during pci_pm_init() in the init path. So we can use pdev->pm_cap instead of calling
pci_find_capability(pdev, PCI_CAP_ID_PM), for better performance and simpler code.
Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: netdev@vger.kernel.org (open list:NETWORKING DRIVERS) Signed-off-by: David S. Miller <davem@davemloft.net>
Yijing Wang [Tue, 18 Jun 2013 08:05:39 +0000 (16:05 +0800)]
Bnx2x: remove redundant D0 power state set
pci_enable_device() sets the device power state to D0,
so there is no need to do it again in bnx2x_init_dev().
Also remove the redundant PM capability lookup code, because the PCI core
has already saved the device's PM capability offset.
Signed-off-by: Yijing Wang <wangyijing@huawei.com> Cc: Eilon Greenstein <eilong@broadcom.com> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Acked-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 18 Jun 2013 02:37:05 +0000 (03:37 +0100)]
net: Add missing dependencies on NETDEVICES
ETRAX_ETHERNET selects ETHERNET and MII, which depend on NETDEVICES.
I don't think anything should select NETDEVICES, so make it a
dependency. It also doesn't need to select or depend on ETHERNET,
which has nothing to do with the Ethernet library functions.
BPCTL selects MII, which depends on NETDEVICES. But everything in the
drivers/staging/silicom directory is related to net devices, so make
NET_VENDOR_SILICOM depend on NETDEVICES and remove the now-redundant
dependencies on NET.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 18 Jun 2013 02:27:29 +0000 (03:27 +0100)]
at91_ether: Do not select NET_CORE
This has no dependency on any of the drivers under NET_CORE.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 18 Jun 2013 02:24:51 +0000 (03:24 +0100)]
net: Move MII out from under NET_CORE and hide it
All drivers that select MII also need to select NET_CORE because MII
depends on it. This is a bit ridiculous because NET_CORE is just a
menu option that doesn't enable any code by itself.
There is also no need for it to be a visible option, since its users
all select it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 19 Jun 2013 10:51:20 +0000 (12:51 +0200)]
net: sock: adapt SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF
The current situation is that SOCK_MIN_RCVBUF is (2048 + sizeof(struct sk_buff))
while SOCK_MIN_SNDBUF is 2048. Since in both cases, skb->truesize is used for
sk_{r,w}mem_alloc accounting, we should have both sizes adjusted via defining a
TCP_SKB_MIN_TRUESIZE.
Further, as Eric Dumazet points out, the minimal skb truesize in transmit path is
SKB_TRUESIZE(2048) after commit f07d960df33c5 ("tcp: avoid frag allocation for
small frames"), and tcp_sendmsg() tries to limit skb size to half the congestion
window, meaning we try to build two skbs at minimum. Thus, having SOCK_MIN_SNDBUF
as 2048 can cause a small regression for some applications that set
SO_SNDBUF / SO_RCVBUF too low. Note that we define a TCP_SKB_MIN_TRUESIZE, because
SKB_TRUESIZE(2048) adds SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), but in
case of TCP skbs, the skb_shared_info is part of the 2048 bytes allocation for
skb->head.
The minor adaptation in sk_stream_moderate_sndbuf() uses a typed max macro,
as is similarly done for SOCK_MIN_RCVBUF occurrences, to silence a warning
that would otherwise appear.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
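The arithmetic can be sketched as follows. The message does not quote the final macro definitions, so these formulas are an assumption derived from its wording: a minimum truesize of 2048 bytes plus the cache-line-aligned size of struct sk_buff, one such unit for the receive minimum and two for the send minimum. The 64-byte cache line and the function names are also illustrative.

```c
#include <stddef.h>

#define EXAMPLE_CACHE_BYTES 64  /* assumed cache line size for alignment */

/* Round n up to a multiple of the assumed cache line size. */
static size_t data_align(size_t n)
{
    return (n + EXAMPLE_CACHE_BYTES - 1) & ~(size_t)(EXAMPLE_CACHE_BYTES - 1);
}

/* 2048 bytes for skb->head plus the aligned struct sk_buff itself. */
static size_t tcp_skb_min_truesize(size_t skb_struct_size)
{
    return 2048 + data_align(skb_struct_size);
}

/* The transmit path tries to build at least two skbs. */
static size_t sock_min_sndbuf(size_t skb_struct_size)
{
    return 2 * tcp_skb_min_truesize(skb_struct_size);
}

static size_t sock_min_rcvbuf(size_t skb_struct_size)
{
    return tcp_skb_min_truesize(skb_struct_size);
}
```

With a hypothetical 232-byte struct sk_buff this gives a receive minimum of 2304 bytes and a send minimum of 4608 bytes.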
Gao feng [Thu, 20 Jun 2013 02:01:33 +0000 (10:01 +0800)]
neigh: only allow init_net to change the default neigh_parms
Although we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
directory to namespaces other than init_net, a command such as
"ip ntable change name arp_cache locktime 129" can still change the locktime
of the default neigh_parms from such a namespace.
This patch prevents namespaces other than init_net from looking up
neigh_table.parms, so they can no longer influence init_net.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Thu, 20 Jun 2013 02:01:32 +0000 (10:01 +0800)]
neigh: no need to call lookup_neigh_parms in neigh_parms_alloc
neigh_table.parms always exists and is initialized, so kmemdup
can use it to create the new neigh_parms. In fact, lookup_neigh_parms
here would return neigh_table.parms too.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:05 +0000 (01:36 +0300)]
bnx2x: replace mechanism to check for next available packet
Check next packet availability by validating that HW has finished CQE
placement. This saves latency of another dma transaction performed to update
SB indexes.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Tue, 18 Jun 2013 22:36:04 +0000 (01:36 +0300)]
bnx2x: add support for ndo_ll_poll
Adds ndo_ll_poll method and locking for FPs between LL and the napi.
When receiving a packet we use skb_mark_ll to record the napi it came from.
Add each napi to the napi_hash right after netif_napi_add().
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Jun 2013 01:07:49 +0000 (18:07 -0700)]
openvswitch: gre tunneling support.
Pravin B Shelar says:
====================
The following patch series adds support for gre tunneling.
The first six patches extend the kernel gre and ip_tunnel module
APIs so that there is more code sharing between the gre modules
and ovs. The rest of the patches add the ovs tunneling infrastructure
and the gre protocol vport.
V2 fixes two patches according to comments from Jesse.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Tue, 18 Jun 2013 00:50:33 +0000 (17:50 -0700)]
openvswitch: Add gre tunnel support.
Add gre vport implementation. Most of the gre protocol processing
is pushed to the gre module. It makes use of the gre demultiplexer,
so it can co-exist with linux device based gre tunnels.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Tue, 18 Jun 2013 00:50:12 +0000 (17:50 -0700)]
openvswitch: Copy individual actions.
Rather than validating actions and then copying all actions
in one block, the following patch does the same operation in a single pass.
It validates and copies the actions one by one. This is required for
the ovs tunneling patch.
This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Tue, 18 Jun 2013 00:49:38 +0000 (17:49 -0700)]
gre: Allow multiple protocol listener for gre protocol.
Currently only one user is allowed to register for the gre
protocol. The following patch adds a de-multiplexer so that multiple
modules can listen on the gre protocol, e.g. kernel gre devices and ovs.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
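The de-multiplexing idea can be sketched in plain C. This is a generic illustration and not the kernel's gre code: the listener table, the return codes, the registration function and the two example handlers (which claim packets by their first byte) are all hypothetical.

```c
#include <stddef.h>

enum { PKT_REJECTED = 0, PKT_HANDLED = 1 };

typedef int (*gre_rcv_fn)(const unsigned char *pkt, size_t len);

#define MAX_LISTENERS 4
static gre_rcv_fn listeners[MAX_LISTENERS];

/* Add a listener; returns 0 on success, -1 if the table is full. */
static int demux_register(gre_rcv_fn fn)
{
    for (size_t i = 0; i < MAX_LISTENERS; i++) {
        if (!listeners[i]) {
            listeners[i] = fn;
            return 0;
        }
    }
    return -1;
}

/* Offer the packet to each listener in turn until one accepts it. */
static int demux_rcv(const unsigned char *pkt, size_t len)
{
    for (size_t i = 0; i < MAX_LISTENERS; i++) {
        if (listeners[i] && listeners[i](pkt, len) == PKT_HANDLED)
            return PKT_HANDLED;
    }
    return PKT_REJECTED;
}

/* Example listeners standing in for a gre device and for ovs. */
static int example_dev_rcv(const unsigned char *pkt, size_t len)
{
    return (len > 0 && pkt[0] == 0x01) ? PKT_HANDLED : PKT_REJECTED;
}

static int example_ovs_rcv(const unsigned char *pkt, size_t len)
{
    return (len > 0 && pkt[0] == 0x02) ? PKT_HANDLED : PKT_REJECTED;
}
```

A packet that the first listener rejects is still offered to the second, which is what lets both users share the protocol.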
Sergei Shtylyov [Thu, 13 Jun 2013 18:12:45 +0000 (22:12 +0400)]
sh_eth: get R8A7740 Rx descriptor word 0 shift out of #ifdef
The only R8A7740 specific #ifdef hindering ARM multiplatform build is left in
sh_eth_rx(): it covers the code shifting Rx buffer descriptor word 0 by 16. Get
rid of the #ifdef by adding 'shift_rd0' field to the 'struct sh_eth_cpu_data',
making the shift dependent on it, and setting it to 1 for the R8A7740 case...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
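The replacement of the #ifdef by a run-time flag can be pictured like this. It is a stand-alone sketch: struct example_cpu_data and rx_desc_status() are simplified, hypothetical stand-ins for the driver's structures, keeping only the 'shift_rd0' field named in the message.

```c
#include <stdint.h>

/* Per-SoC data; only the flag relevant to this change is modelled. */
struct example_cpu_data {
    unsigned shift_rd0:1;  /* 1 when Rx descriptor word 0 must be shifted */
};

/* Return the Rx status bits, shifting word 0 by 16 only when required. */
static uint32_t rx_desc_status(const struct example_cpu_data *cd, uint32_t rd0)
{
    if (cd->shift_rd0)
        rd0 >>= 16;
    return rd0;
}
```

Because the decision is made from data rather than at compile time, one binary can serve several SoCs.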
The ath9k Kconfig conflict was a change of a Kconfig option name right
next to the deletion of another option.
The xen-netback conflict was overlapping changes involving the
handling of the notify list in xen_netbk_rx_action().
Batman conflict resolution provided by Antonio Quartulli, basically
keep everything in both conflict hunks.
The nl80211 conflict is a little more involved. In 'net' we added a
dynamic memory allocation to nl80211_dump_wiphy() to fix a race that
Linus reported. Meanwhile in 'net-next' the handlers were converted
to use pre and post doit handlers which use a flag to determine
whether to hold the RTNL mutex around the operation.
However, the dump handlers do not use this logic. Instead they have
to explicitly do the locking. There were apparent bugs in the
conversion of nl80211_dump_wiphy() in that we were not dropping the
RTNL mutex in all the return paths, and it seems we very much should
be doing so. So I fixed that whilst handling the overlapping changes.
To simplify the initial returns, I take the RTNL mutex after we try
to allocate 'tb'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Wed, 19 Jun 2013 08:09:57 +0000 (10:09 +0200)]
nl80211: fix attrbuf access race by allocating a separate one
Since my commit 3713b4e364 ("nl80211: allow splitting wiphy
information in dumps"), nl80211_dump_wiphy() uses the global
nl80211_fam.attrbuf for parsing the incoming data. This wouldn't
be a problem if it only did so on the first dump iteration which
is locked against other commands in generic netlink, but due to
space constraints in cb->args (the needed state doesn't fit) I
decided to always parse the original message. That's racy though
since nl80211_fam.attrbuf could be used by some other parsing in
generic netlink concurrently.
For now, fix this by allocating a separate parse buffer (it's a
bit too big for the stack, currently 1448 bytes on 64-bit). For
-next, I'll change the code to parse into the global buffer in
the first round only and then allocate a smaller buffer to keep
the data in cb->args.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The check introduced by:
commit 26a41ae604381c5cc0caf1c3261ca6b298b5fe69
Author: stephen hemminger <stephen@networkplumber.org>
Date: Mon Jun 17 12:09:58 2013 -0700
vxlan: only migrate dynamic FDB entries
was not correct because it is checking flag about type of FDB
entry, rather than the state (dynamic versus static). The confusion
arises because vxlan is reusing values from bridge, and bridge is
reusing values from neighbour table, and it is easy to get lost in translation.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Wed, 19 Jun 2013 02:32:32 +0000 (10:32 +0800)]
bcm63xx_enet: fix return value check in bcm_enet_shared_probe()
In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
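Why a NULL test misses the failure can be shown with a user-space model of the error-pointer convention. The example_* helpers below are re-implementations written for this sketch (an assumption about how such helpers behave, not the kernel headers): an error code is encoded in the top 4095 values of the address space, so a failed call returns a non-NULL pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *example_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the range reserved for error codes. */
static int example_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-EXAMPLE_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long example_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}
```

A check of the form "if (!p)" therefore never fires for such a return value, while the range check does.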
The usb_8dev hardware has problems on some xhci USB hosts. The driver fails to
read the firmware revision in the probe function. This leads to the following
Oops:
[ 3356.635912] kernel BUG at net/core/dev.c:5701!
The driver tries to free the netdev, which has already been registered, without
unregistering it.
This patch fixes the problem by unregistering the netdev in the error path.
Reported-by: Michael Olbrich <m.olbrich@pengutronix.de> Reviewed-by: Bernd Krumboeck <krumboeck@universalnet.at> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
ipv6: ndisc: fix ndisc_send_redirect writing to the wrong skb
Since some refactoring in 5f5a011, ndisc_send_redirect called
ndisc_fill_redirect_hdr_option on the wrong skb, leading to data corruption or
in the worst case a panic when the skb_put failed.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Isaku Yamahata [Fri, 14 Jun 2013 08:58:35 +0000 (17:58 +0900)]
staging/rtl8192u: convert skb->tail into skb_tail_pointer(skb)
The change set of 27a884dc "[SK_BUFF]: Convert skb->tail to sk_buff_data_t"
converted skb->tail from a pointer into sk_buff_data_t.
Since skb->tail is no longer always a pointer, the area it points to
should be accessed via skb_tail_pointer().
Found by inspection. Compile tested only.
Cc: Simon Horman <horms@verge.net.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: devel@driverdev.osuosl.org Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Reviewed-by: Simon Horman <horms@verge.net.au> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
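The reason an accessor is needed can be sketched with a toy buffer. This models only the configuration in which the tail is stored as an offset from the head; struct example_skb and example_tail_pointer() are hypothetical names, not the kernel definitions.

```c
#include <stddef.h>

/* Toy buffer descriptor: tail is an offset from head, not an address. */
struct example_skb {
    unsigned char *head;
    size_t tail;
};

/* Turn the stored offset back into a usable pointer. */
static unsigned char *example_tail_pointer(const struct example_skb *skb)
{
    return skb->head + skb->tail;
}
```

Code that used the raw field as an address would be wrong whenever it holds an offset, whereas the accessor works either way.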
Isaku Yamahata [Fri, 14 Jun 2013 08:58:34 +0000 (17:58 +0900)]
pxa168_eth: convert skb->end into skb_end_pointer(skb)
The change set of 4305b541, "[SK_BUFF]: Convert skb->end to sk_buff_data_t"
converted skb->end from pointer type to sk_buff_data_t.
The pointed value should be accessed via skb_end_pointer().
Since the arm arch doesn't define NET_SKBUFF_DATA_USES_OFFSET,
skb->end is effectively a pointer, so it doesn't cause a real problem.
But this patch is good for consistency.
Found by inspection. Compile tested only.
Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Isaku Yamahata [Fri, 14 Jun 2013 08:58:33 +0000 (17:58 +0900)]
mv643xx_eth.c: convert skb->end into skb_end_pointer(skb)
The change set of 4305b541 "[SK_BUFF]: Convert skb->end to sk_buff_data_t"
converted skb->end from pointer to sk_buff_data_t.
The pointed value should be accessed via skb_end_pointer().
Since the arm and ppc archs don't define NET_SKBUFF_DATA_USES_OFFSET,
skb->end is effectively a pointer, so it doesn't cause a real problem.
But this patch is good for consistency.
Found by inspection. Compile tested only.
Cc: Simon Horman <horms@verge.net.au> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Isaku Yamahata [Fri, 14 Jun 2013 08:58:32 +0000 (17:58 +0900)]
net, scsi/cxgb4i: convert skb->transport_header into skb_transport_header(skb)
The change set of 1a37e412, "net: Use 16bits for *_headers fields
of struct skbuff" converted the header fields from sk_buff_data_t into 16-bit integers.
So skb->transport_header needs to be converted to skb_transport_header(skb).
Found by inspection. Compile tested only.
Cc: Simon Horman <horms@verge.net.au> Cc: Li RongQing <roy.qing.li@gmail.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Isaku Yamahata [Fri, 14 Jun 2013 08:58:31 +0000 (17:58 +0900)]
net, atm/ambassador: convert skb->tail into skb_tail_pointer(skb)
The change set of 27a884dc, "[SK_BUFF]: Convert skb->tail to sk_buff_data_t"
converted skb->tail from pointer into sk_buff_data_t. It missed skb->tail
in drivers/atm/ambassador.c.
This patch converts skb->tail into skb_tail_pointer(skb).
Found by inspection. Compile tested only.
Cc: Simon Horman <horms@verge.net.au> Cc: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp> Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sun, 16 Jun 2013 21:20:34 +0000 (23:20 +0200)]
bridge: fix switched interval for MLD Query types
General Queries (the one with the Multicast Address field
set to zero / '::') are supposed to have a Maximum Response Delay
of [Query Response Interval], while for Multicast-Address-Specific
Queries it is [Last Listener Query Interval] - not the other way
round. (see RFC2710, section 7.3+7.8)
Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
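The corrected mapping can be written down as a small selector. This is a sketch, not the bridge code: the function name is hypothetical, and the two constants are the protocol default values from RFC 2710 rather than the bridge's configurable settings.

```c
#include <netinet/in.h>

#define QUERY_RESPONSE_INTERVAL_MS      10000  /* RFC 2710 default */
#define LAST_LISTENER_QUERY_INTERVAL_MS 1000   /* RFC 2710 default */

/* Pick the Maximum Response Delay according to the query type. */
static unsigned int mld_max_response_delay_ms(const struct in6_addr *group)
{
    /* Multicast Address field of '::' marks a General Query. */
    if (IN6_IS_ADDR_UNSPECIFIED(group))
        return QUERY_RESPONSE_INTERVAL_MS;

    /* Otherwise this is a Multicast-Address-Specific Query. */
    return LAST_LISTENER_QUERY_INTERVAL_MS;
}
```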
vlan: restore ethtool ABI to control VLAN hardware acceleration
As part of the push to add 802.1ad service provider tagging support to the
kernel the VLAN features flags were renamed. Unfortunately the kernel name
for the VLAN hardware acceleration features that the kernel shows user space
was included in the rename, which broke ethtool (txvlan and rxvlan options
do not work). This patch restores the original names, i.e. the original ABI.
If we wanted to make clear to users that we are referring to CTAGs we can
always change ethtool's short_name and long_name for these features (for
example something along the lines of txvlan -> txvlan-ctag, tx-vlan-offload ->
tx-vlan-ctag-offload).
Cc: Patrick McHardy <kaber@trash.net> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Jun 2013 09:40:05 +0000 (11:40 +0200)]
net: sctp: remove SCTP_STATIC macro
SCTP_STATIC is just another define for the static keyword. Its use
is inconsistent in the SCTP code anyway and it was introduced in the
initial implementation of SCTP in 2.5. We have a regression suite in
lksctp-tools, but this is for user space only, so no one makes use of
this macro anymore. The kernel test suite for 2.5 is incompatible with
the current SCTP code anyway.
So simply remove it, to be more consistent with the rest of the kernel
code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Jun 2013 09:40:04 +0000 (11:40 +0200)]
net: sctp: get rid of t_new macro for kzalloc
t_new rather obfuscates things where everyone else is using actual
function names instead of that macro, so replace it with kzalloc,
which is the function t_new wraps.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Mon, 17 Jun 2013 17:30:35 +0000 (19:30 +0200)]
bonding: don't call alb_set_slave_mac_addr() while atomic
alb_set_slave_mac_addr() sets the mac address in alb mode via
dev_set_mac_address(), which might sleep. It's called from
alb_handle_addr_collision_on_attach() in atomic context (under
read_lock(bond->lock)), thus triggering a bug.
Fix this by moving the lock inside alb_handle_addr_collision_on_attach().
v1->v2:
As Nikolay Aleksandrov noticed, we can drop the bond->lock completely.
Also, use bond_slave_has_mac(), when possible.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: cpsw: check for cpts pointer after its allocation
After priv->cpts is allocated, the pointer should be checked to determine
whether the allocation succeeded.
Cc: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jun 2013 23:15:51 +0000 (16:15 -0700)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into wireless
John W. Linville says:
====================
This will probably be the last batch of wireless fixes intended
for 3.10. Many of these are one- or two-liners, and a couple of
others are mostly relocating existing code to avoid races or to
limit the code to affecting specific hardware, etc.
The mac80211 fixes have a couple of exceptions to the above.
Regarding those, Johannes says:
"Following davem's complaint about my patch, here's a new pull request
w/o the patch he was complaining about, but instead with the const
fix rolled into the fix.
I have a fix for radar detection, one for rate control and a workaround
for broken HT APs which is a regression fix because we didn't rely
on them to be correct before."
Johannes also sends some iwlwifi fixes:
"I picked up Nikolay's patch for the chain noise calibration bug
that seems to have been there forever, a fix from Emmanuel for
setting TX flags on BAR frames and a fix of my own to avoid printing
request_module() errors if the kernel isn't even modular. We also
have our own version of Stanislaw's fix for rate control."
Along with those...
Anderson Lizardo fixes a Bluetooth memory corruption bug when an MTU
value is set to too small of a value.
Arend van Spriel sends a revised brcmsmac patch that fixes a regression
caused by a bad return value in an earlier patch. He also sends a
brcmfmac fix to avoid an oops when loading the driver at boot.
Daniel Drake fixes a race condition in btmrvl that causes hangs on
suspend for OLPC hardware.
Johan Hedberg adds a check to avoid sending a
HCI_Delete_Stored_Link_Key command to devices that don't support it,
avoiding some scary-looking log spam.
Stanislaw Gruszka gives us a fix for iwlegacy to be able to use rates
higher than 1Mb/s on older wireless networks. He also sends an rt2x00
fix to reinstate older tx power handling behavior for some devices
that didn't work well with the current code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jun 2013 23:13:45 +0000 (16:13 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter fixes. They are targeted to the
TCP option targets, which have received some scrutiny in the last week. The
changes are:
* Fix TCPOPTSTRIP, it stopped working in the forward chain as tcp_hdr
uses skb->transport_header, and we cannot use that in the forwarding
case, from myself.
* Fix default IPv6 MSS in TCPMSS in case of absence of TCP MSS options,
from Phil Oester.
* Fix missing fragmentation handling again in TCPMSS, from Phil Oester.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
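For the default-MSS item in the netfilter merge above, a short sketch of where per-family fallback values can come from when a SYN carries no MSS option. The enum and function names are hypothetical and this is not the xt_TCPMSS code; the numbers follow from the minimum packet sizes each protocol guarantees (576 bytes for IPv4, 1280 bytes for IPv6) minus the IP and TCP headers:

```c
#include <assert.h>
#include <stdint.h>

enum family { FAMILY_IPV4, FAMILY_IPV6 };	/* hypothetical */

/* Smallest MSS a peer can be assumed to accept without an MSS option. */
static uint16_t default_mss(enum family f)
{
	/* IPv4: 576-byte minimum datagram - 20 (IP) - 20 (TCP) = 536. */
	/* IPv6: 1280-byte minimum MTU - 40 (IPv6) - 20 (TCP) = 1220. */
	return f == FAMILY_IPV4 ? 536 : 1220;
}

/* Clamp a requested MSS to the family default when no option was present. */
uint16_t mss_for_missing_option(enum family f, uint16_t requested)
{
	uint16_t limit = default_mss(f);

	return requested < limit ? requested : limit;
}
```

Using the IPv4 value of 536 for IPv6 traffic would be needlessly small, which is one plausible reading of why a separate IPv6 default matters.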
Johannes Berg [Mon, 17 Jun 2013 20:44:02 +0000 (22:44 +0200)]
alx: add a simple AR816x/AR817x device driver
This is a very simple driver, based on the original vendor
driver that Qualcomm/Atheros published/submitted previously,
but reworked to make the code saner. However, it also lost
a number of features (TSO/GSO, VLAN acceleration and multi-
queue support) in the process, as well as debugging support
features I didn't have any use for. The only thing I left
is checksum offload.
More features can obviously be added, but this seemed like
a good start for having a driver in mainline at all.
Johannes Stezenbach has verified that the driver works on
AR8161; I have an AR8171 myself. The E2200 device ID I found
on github in somebody's repository.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 17 Jun 2013 20:47:25 +0000 (13:47 -0700)]
tg3: Prevent system hang during repeated EEH errors.
The current tg3 code assumes the pci_error_handlers to be always called
in sequence. In particular, during ->error_detected(), NAPI is disabled
and the device is shut down. The device is later reset and NAPI
re-enabled in ->slot_reset() and ->resume().
In EEH, if more than 6 errors are detected in an hour, only
->error_detected() will be called. This will leave the driver in an
inconsistent state as NAPI is disabled but netif_running state is still
true. When the device is later closed, we'll try to disable NAPI again
and it will loop forever.
We fix this by closing the device if we encounter any error conditions
during the normal sequence of the pci_error_handlers.
v2: Remove the changes in tg3_io_resume() based on Benjamin Poirier's
feedback.
Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
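The inconsistent state described in the tg3 commit above can be modelled with two flags. This is a toy model under stated assumptions, not tg3 code: struct dev_model and every model_*() function are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the commit message talks about. */
struct dev_model {
	bool napi_enabled;
	bool running;		/* mirrors the netif_running() state */
};

/* ->error_detected(): NAPI is disabled and the device is shut down. */
void model_error_detected(struct dev_model *d)
{
	d->napi_enabled = false;
}

/* ->slot_reset() and ->resume() completed: NAPI is enabled again. */
void model_recovered(struct dev_model *d)
{
	d->napi_enabled = true;
}

/* Recovery did not complete: close the device so both flags agree. */
void model_recovery_failed(struct dev_model *d)
{
	d->running = false;
}

/*
 * A later close is unsafe when the device still counts as running
 * while NAPI is already disabled, because close disables NAPI again.
 */
bool model_close_is_safe(const struct dev_model *d)
{
	return !(d->running && !d->napi_enabled);
}
```

If only model_error_detected() runs, model_close_is_safe() reports false; marking the device closed on the failure path restores a consistent state.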
Haiyang Zhang [Mon, 17 Jun 2013 22:36:49 +0000 (15:36 -0700)]
Fix the VLAN_TAG_PRESENT in netvsc_recv_callback()
We should call __vlan_hwaccel_put_tag() only if the packet
comes from vlan, otherwise VLAN_TAG_PRESENT will always be
added.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
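A minimal sketch of the conditional tagging described in the netvsc commit above. struct rx_packet, its has_vlan field and rx_vlan_tag() are hypothetical, and the 0x1000 flag value is an assumption about the kernel's VLAN_TAG_PRESENT definition at the time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_PRESENT 0x1000u	/* assumed value of VLAN_TAG_PRESENT */

struct rx_packet {		/* hypothetical receive descriptor */
	bool has_vlan;		/* the packet arrived with VLAN information */
	uint16_t vlan_tci;
};

/* Return the tag word to store for the packet, or 0 for untagged traffic. */
uint16_t rx_vlan_tag(const struct rx_packet *pkt)
{
	if (!pkt->has_vlan)
		return 0;	/* leave the "present" bit clear */

	return (uint16_t)(TAG_PRESENT | pkt->vlan_tci);
}
```

Calling the tagging helper unconditionally would set the "present" bit even for untagged packets, which is the symptom the commit message describes.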
If skb_clone() fails because the system is out of memory, just skip the fanout.
Problem was introduced in 3.10 with:
commit 6681712d67eef14c4ce793561c3231659153a320
Author: David Stevens <dlstevens@us.ibm.com>
Date: Fri Mar 15 04:35:51 2013 +0000
vxlan: generalize forwarding tables
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
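A user-space sketch of skipping a destination when the clone fails, as the vxlan commit above describes. struct buf, buf_clone() and fanout() are hypothetical; the oom_at array only simulates allocation failure for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct buf { size_t len; unsigned char data[64]; };	/* hypothetical packet */

/* Stand-in for skb_clone(): returns NULL when memory is unavailable. */
static struct buf *buf_clone(const struct buf *src, int simulate_oom)
{
	struct buf *copy = simulate_oom ? NULL : malloc(sizeof(*copy));

	if (copy)
		memcpy(copy, src, sizeof(*copy));
	return copy;
}

/* Send one clone per destination; returns how many were actually sent. */
int fanout(const struct buf *pkt, const int *oom_at, int ndest)
{
	int sent = 0;

	for (int i = 0; i < ndest; i++) {
		struct buf *c = buf_clone(pkt, oom_at[i]);

		if (!c)
			continue;	/* clone failed: skip this destination */

		/* A real driver would transmit here; the sketch just frees. */
		free(c);
		sent++;
	}
	return sent;
}
```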
Only migrate dynamic forwarding table entries; don't modify
static entries. If a packet is received from an incorrect source IP address,
assume it is an imposter and drop it.
This patch applies only to -net; a different patch would be needed for earlier
kernels since the NTF_SELF flag was introduced with 3.10.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Jun 2013 22:53:09 +0000 (15:53 -0700)]
Merge branch 'tipc'
Paul Gortmaker says:
====================
This is a rework of the content sent earlier[1], with the following changes:
-drop the Kconfig --> modparam conversion patch; this was
requested to be replaced[2] with a dynamic port quantity resizing.
Ying and Erik were discussing how best to achieve this, and then
vacation schedules got in the way, so implementing that will
come (hopefully) in the next round.
-rework the sk_rcvbuf patch to allow memory resizing via sysctl
as per what Ying and Neil discussed[3]
-add 4 more seemingly straightforward and relatively small changes
from Ying (the last 4 in the series).
-add cosmetic UAPI comment update patch from Ying.
That said, the largest change is still the one where we make use of
the fact that Linux supports kernel threads and do the server-like
operations within kernel threads. As Jon says:
We remove the last remnants of the TIPC native API, to make it
possible to simplify locking policy and solve a problem with lost
topology events.
First, we introduce a socket-based alternative to the native API.
Second, we convert the two remaining users of the native API, the
TIPC internal topology server and the configuration server, to use the
new API.
Third, we remove the remaining code pertaining to the native API.
I have re-tested this collection of commits between 32 and 64 bit x86
machines using the standard tipc test suite, and build tested for ppc.
Ying Xue [Mon, 17 Jun 2013 14:54:51 +0000 (10:54 -0400)]
tipc: remove dev_base_lock use from enable_bearer
Convert enable_bearer() to RCU locking with dev_get_by_name().
Based on a similar changeset in commit 840a185d ["aoe: remove
dev_base_lock use from aoecmd_cfg_pkts()"] -- quoting that:
"dev_base_lock is the legacy way to lock the device list,
and is planned to disappear. (writers hold RTNL, readers
hold RCU lock)"
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Mon, 17 Jun 2013 14:54:50 +0000 (10:54 -0400)]
tipc: fix wrong return value for link_send_sections_long routine
When an skb buffer cannot be allocated in link_send_sections_long(),
-ENOMEM rather than -EFAULT should be returned to the
caller.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Mon, 17 Jun 2013 14:54:49 +0000 (10:54 -0400)]
tipc: make tipc_link_send_sections_fast exit earlier
Once the message build function returns an error code, the
message cannot be sent. So in case of message
build failure, tipc_link_send_sections_fast() should return
immediately.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
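The early-exit behaviour described in the commit above can be sketched as follows. build_msg(), send_fast() and the sends_attempted counter are hypothetical, and treating a negative return value as the failure indication is an assumption:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical builder: returns the message length or a negative errno. */
static int build_msg(int fail)
{
	return fail ? -EFAULT : 128;
}

static int sends_attempted;	/* counts how often the send path was reached */

/* Return at once on build failure instead of continuing to the send path. */
int send_fast(int build_fails)
{
	int res = build_msg(build_fails);

	if (res < 0)
		return res;	/* nothing valid to send */

	sends_attempted++;	/* stands in for handing the buffer to the link */
	return res;
}
```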
Ying Xue [Mon, 17 Jun 2013 14:54:48 +0000 (10:54 -0400)]
tipc: enhance priority of link protocol packet
pfifo_fast is the default traffic-class queueing discipline. This
queue has three so-called "bands". Within each band, FIFO rules apply.
However, as long as there are packets waiting in band 0, band 1 won't
be processed.
Currently no TIPC packet type has its priority set, so all priorities
are 0 and the packets are mapped to band 1 of the pfifo_fast
qdisc. During link congestion in particular, if link protocol packets
are sent out ahead of other packet types so that they arrive
at the peer endpoint in time, the peer can reset its link timeout
timer promptly and keep the link alive.
So raising the priority of link protocol packets avoids
unnecessary link resets due to transient link
congestion.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
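The band mapping described in the commit above can be sketched with a lookup table. The table is reproduced from memory of the pfifo_fast implementation and should be treated as an assumption, as should the choice of a "control" priority of 7 as the raised value; the commit message does not name the priority it sets:

```c
#include <assert.h>

#define PRIO_MAX	15	/* mirrors TC_PRIO_MAX */
#define PRIO_BESTEFFORT	0	/* mirrors TC_PRIO_BESTEFFORT */
#define PRIO_CONTROL	7	/* mirrors TC_PRIO_CONTROL (assumed target) */

/* Priority-to-band table; band 0 is always served before band 1. */
static const unsigned char prio2band[PRIO_MAX + 1] = {
	1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/* Map a packet priority to the pfifo_fast band it would be queued in. */
int band_for_priority(unsigned int prio)
{
	return prio2band[prio & PRIO_MAX];
}
```

Under these assumptions a packet left at priority 0 lands in band 1, while one marked with the control priority lands in band 0 and is dequeued first.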
Paul Gortmaker [Mon, 17 Jun 2013 14:54:47 +0000 (10:54 -0400)]
tipc: cosmetic realignment of function arguments
No runtime code changes here. Just a realign of the function
arguments to start where the 1st one was, and fit as many args
as can be put in an 80 char line.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Mon, 17 Jun 2013 14:54:46 +0000 (10:54 -0400)]
tipc: save sock structure pointer instead of void pointer to tipc_port
Directly save sock structure pointer instead of void pointer to avoid
unnecessary cast conversions.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ying Xue [Mon, 17 Jun 2013 14:54:45 +0000 (10:54 -0400)]
tipc: convert config_lock from spinlock to mutex
As the configuration server is now running under process context,
it's unnecessary for us to have a spinlock serializing the TIPC
configuration process. Instead, we replace it with a mutex lock,
which gives us more freedom. For instance, we can now call
pre-emptable functions within the protected area.
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>