Vlad Yasevich [Wed, 13 Feb 2013 12:00:12 +0000 (12:00 +0000)]
bridge: Add netlink interface to configure vlans on bridge ports
Add a netlink interface to add and remove vlan configuration on bridge port.
The interface uses the RTM_SETLINK message and encodes the vlan
configuration inside the IFLA_AF_SPEC. It is possble to include multiple
vlans to either add or remove in a single message.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:09 +0000 (12:00 +0000)]
bridge: Add vlan filtering infrastructure
Adds an optional infrustructure component to bridge that would allow
native vlan filtering in the bridge. Each bridge port (as well
as the bridge device) now get a VLAN bitmap. Each bit in the bitmap
is associated with a vlan id. This way if the bit corresponding to
the vid is set in the bitmap that the packet with vid is allowed to
enter and exit the port.
Write access the bitmap is protected by RTNL and read access
protected by RCU.
Vlan functionality is disabled by default.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 13 Feb 2013 01:03:50 +0000 (01:03 +0000)]
net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack
In order to avoid any future surprises of kernel panics due to jprobes
function mismatches (as e.g. fixed in 4cb9d6eaf85ecd: sctp: jsctp_sf_eat_sack:
fix jprobes function signature mismatch), we should check both function
types during build and scream loudly if they do not match. __same_type
resolves to __builtin_types_compatible_p, which is 1 in case both types
are the same and 0 otherwise, qualifiers are ignored. Tested by myself.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Mon, 11 Feb 2013 09:27:41 +0000 (09:27 +0000)]
net: Fix possible wrong checksum generation.
Patch cef401de7be8c4e (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.
Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.
tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Feb 2013 18:22:24 +0000 (13:22 -0500)]
Merge branch 'tcp_tsoffset'
Andrey Vagin says:
====================
If a TCP socket will get live-migrated from one box to another the
timestamps (which are typically ON) will get screwed up -- the new
kernel will generate TS values that has nothing to do with what they
were on dump. The solution is to yet again fix the kernel and put a
"timestamp offset" on a socket.
A socket offset is added in places where externally visible tcp
timestamp option is parsed/initialized.
Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way as for
TIME_WAIT sockets.
For time-wait sockets offset is inhereted by a proper tcp_sock.
A per-socket offset can be set only for sockets in repair mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:19 +0000 (05:50 +0000)]
tcp: send packets with a socket timestamp
A socket timestamp is a sum of the global tcp_time_stamp and
a per-socket offset.
A socket offset is added in places where externally visible
tcp timestamp option is parsed/initialized.
Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way
as for TIME_WAIT sockets.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:18 +0000 (05:50 +0000)]
tcp: set and get per-socket timestamp
A timestamp can be set, only if a socket is in the repair mode.
This patch adds a new socket option TCP_TIMESTAMP, which allows to
get and set current tcp times stamp.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:17 +0000 (05:50 +0000)]
tcp: adding a per-socket timestamp offset
This functionality is used for restoring tcp sockets. A tcp timestamp
depends on how long a system has been running, so it's differ for each
host. The solution is to set a per-socket offset.
A per-socket offset for a TIME_WAIT socket is inherited from a proper
tcp socket.
tcp_request_sock doesn't have a timestamp offset, because the repair
mode for them are not implemented.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Feb 2013 18:18:20 +0000 (13:18 -0500)]
Merge branch 'gfar-ethtool-atomic' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Paul Gortmaker says:
====================
Eric noticed that the handling of local u64 ethtool counters for
this driver commonly found on Freescale ppc-32 boards was racy.
However, before converting them over to atomic64_t, I noticed
that an internal struct was being used to determine the offsets
for exporting this data into the ethtool buffer, and in doing
so, it assumed that the counters would always be u64. Rather
than keep this implicit assumption, a simple code cleanup gets
rid of the struct completely, and leaves less conversion sites.
The alternative solution would have been to take advantage of
the fact that the counters are all relating to error conditions,
and hence make them internally u32. In doing so, we'd be assuming
that U32_MAX of any particular error condition is highly unlikely.
This might have made sense if any increments were in a hot path.
Tested with "ethtool -S eth0" on sbc8548 board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Wed, 13 Feb 2013 16:32:42 +0000 (11:32 -0500)]
netpoll: fix smatch warnings in netpoll core code
Dan Carpenter contacted me with some notes regarding some smatch warnings in the
netpoll code, some of which I introduced with my recent netpoll locking fixes,
some which were there prior. Specifically they were:
net-next/net/core/netpoll.c:243 netpoll_poll_dev() warn: inconsistent
returns mutex:&ni->dev_lock: locked (213,217) unlocked (210,243)
net-next/net/core/netpoll.c:706 netpoll_neigh_reply() warn: potential
pointer math issue ('skb_transport_header(send_skb)' is a 128 bit pointer)
This patch corrects the locking imbalance (the first error), and adds some
parenthesis to correct the second error. Tested by myself. Applies to net-next
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Dan Carpenter <dan.carpenter@oracle.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
It adds an argument called panic to a function that uses the BUG() macro
which tries to call panic, but the argument masks the panic() function
declaration, resulting in the following error (gcc 4.2.4):
net/core/skbuff.c In function 'skb_panic':
net/core/skbuff.c +126 : error: called object 'panic' is not a function
This is fixed by renaming the argument to msg.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Jean Sacren <sakiwit@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Tue, 12 Feb 2013 20:28:35 +0000 (15:28 -0500)]
gianfar: remove largely unused gfar_stats struct
The gfar_stats struct is only used in copying out data
via ethtool. It is declared as the extra stats, followed
by the rmon stats. However, the rmon stats are never
actually ever used in the driver; instead the rmon data
is a u32 register read that is cast directly into the
ethtool buf.
It seems the only reason rmon is in the struct at all is
to give the offset(s) at which it should be exported into
the ethtool buffer. But note gfar_stats doesn't contain
a gfar_extra_stats as a substruct -- instead it contains
a u64 array of equal element count. This implicitly means
we have two independent declarations of what gfar_extra_stats
really is. Rather than have this duality, we already have
defines which give us the offset directly, and hence do not
need the struct at all.
Further, since we know the extra_stats is unconditionally
always present, we can write it out to the ethtool buf
1st, and then optionally write out the rmon data. There
is no need for two independent loops, both of which are
simply copying out the extra_stats to buf offset zero.
This also helps pave the way towards allowing the extra
stats fields to be converted to atomic64_t values, without
having their types directly influencing the ethtool stats
export code (gfar_fill_stats) that expects to deal with u64.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jiri Pirko [Tue, 12 Feb 2013 00:12:07 +0000 (00:12 +0000)]
act_police: improved accuracy at high rates
Current act_police uses rate table computed by the "tc" userspace
program, which has the following issue:
The rate table has 256 entries to map packet lengths to token (time
units). With TSO sized packets, the 256 entry granularity leads to
loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
Jiri Pirko [Tue, 12 Feb 2013 00:12:06 +0000 (00:12 +0000)]
act_police: move struct tcf_police to act_police.c
It's not used anywhere else, so move it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 12 Feb 2013 00:12:05 +0000 (00:12 +0000)]
tbf: improved accuracy at high rates
Current TBF uses rate table computed by the "tc" userspace program,
which has the following issue:
The rate table has 256 entries to map packet lengths to
token (time units). With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
The bnx2x gso_type setting bug fix in 'net' conflicted with
changes in 'net-next' that broke the gso_* setting logic
out into a seperate function, which also fixes the bug in
question. Thus, use the 'net-next' version.
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Roese [Tue, 12 Feb 2013 03:05:38 +0000 (03:05 +0000)]
net: fec_mpc52xx: Read MAC address from device-tree
Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
(U-Boot) to write the MAC address into the ethernet controller
registers. The Linux driver should not rely on such a thing. So
lets read the MAC address from the DT as it should be done here.
The following priority is now used to read the MAC address:
1) First, try OF node MAC address, if not present or invalid, then:
2) Read from MAC address registers, if invalid, then:
3) Log a warning message, and choose a random MAC address.
This fixes a problem with a MPC5200 board that uses the SPL U-Boot
version without FEC initialization before Linux booting for
boot speedup.
Additionally a status line is now be printed upon successful
driver probing, also displaying this MAC address.
Signed-off-by: Stefan Roese <sr@denx.de> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Vipul Pandya [Tue, 12 Feb 2013 00:36:21 +0000 (00:36 +0000)]
cxgb4vf: Fix VLAN extraction counter increment
Signed-off-by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The CPSW switch can act as Dual EMAC by segregating the switch ports
using VLAN and port VLAN as per the TRM description in
14.3.2.10.2 Dual Mac Mode
Following CPSW components will be common for both the interfaces.
* Interrupt source is common for both eth interfaces
* Interrupt pacing is common for both interfaces
* Hardware statistics is common for all the ports
* CPDMA is common for both eth interface
* CPTS is common for both the interface and it should not be enabled on
both the interface as timestamping information doesn't contain port
information.
Constrains
* Reserved VID of One port should not be used in other interface which will
enable switching functionality
* Same VID must not be used in both the interface which will enable switching
functionality
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2013 21:11:09 +0000 (16:11 -0500)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
Here is another handful of late-breaking fixes intended for the 3.8
stream... Hopefully the will still make it! :-)
There are three mac80211 fixes pulled from Johannes:
"Here are three fixes still for the 3.8 stream, the fix from Cong Ding
for the bad sizeof (Stephen Hemminger had pointed it out before but I'd
promptly forgotten), a mac80211 managed-mode channel context usage fix
where a downgrade would never stop until reaching non-HT and a bug in
the channel determination that could cause invalid channels like HT40+
on channel 11 to be used."
Also included is a mwl8k fix that avoids an oops when using mwl8k
devices that only support the 5 GHz band.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 12 Feb 2013 09:45:44 +0000 (09:45 +0000)]
ixgbe: Only set gso_type to SKB_GSO_TCPV4 as RSC does not support IPv6
The original fix that was applied for setting gso_type required more change
than necessary because it was assumed ixgbe does RSC on IPv6 frames and this
is not correct. RSC is only supported with IPv4/TCP frames only. As such we
can simplify the fix and avoid the unnecessary move of eth_type_trans.
The previous patch "ixgbe: fix gso type" and this patch reduce the entire fix
to one line that sets gso_type to TCPV4 if the frame is RSC.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Tommi Rantala <tt.rantala@gmail.com> Tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Feb 2013 05:15:33 +0000 (05:15 +0000)]
net: sctp: remove unused multiple cookie keys
Vlad says: The whole multiple cookie keys code is completely unused
and has been all this time. Noone uses anything other then the
secret_key[0] since there is no changeover support anywhere.
Thus, for now clean up its left-over fragments.
Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 11 Feb 2013 10:25:31 +0000 (10:25 +0000)]
netpoll: cleanup sparse warnings
With my recent commit I introduced two sparse warnings. Looking closer there
were a few more in the same file, so I fixed them all up. Basic rcu pointer
dereferencing suff.
I've validated these changes using CONFIG_PROVE_RCU while starting and stopping
netconsole repeatedly in bonded and non-bonded configurations
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: fengguang.wu@intel.com CC: David Miller <davem@davemloft.net> CC: eric.dumazet@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 11 Feb 2013 10:25:30 +0000 (10:25 +0000)]
netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held. The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it. Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.
Fix this by converting the rcu callback to a work queue that is guaranteed to
get scheduled in process context, so that we can hold the rtnl properly while
calling __netpoll_cleanup
Tested successfully by myself.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Cong Wang <amwang@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Mon, 11 Feb 2013 13:30:38 +0000 (13:30 +0000)]
skbuff: create skb_panic() function and its wrappers
Create skb_panic() function in lieu of both skb_over_panic() and
skb_under_panic() so that code duplication would be avoided. Update type
and variable name where necessary.
Jiri Pirko suggested using wrappers so that we would be able to keep the
fruits of the original code.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Gorski [Fri, 8 Feb 2013 15:07:25 +0000 (16:07 +0100)]
mwl8k: fix band for supported channels
The band field for the supported channels were left unpopulated, making
them default to 0 == IEEE80211_BAND_2GHZ, even for the 5GHz channels.
This resulted in null pointer accesses if anything tries to access
wiphy->bands[channel->band] of a 5GHz channel on 5GHz only cards, since
wiphy->bands[2GHZ] is NULL for them (e.g. cfg80211_chandef_usable does).
Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes sparse complaints about dropping __user in casts.
warning: cast removes address space of expression
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 10 Feb 2013 11:41:01 +0000 (11:41 +0000)]
bridge: use dev->addr_assign_type to see if user change mac
And remove no longer used br->flags.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: by default join ff01::1 and in case of forwarding ff01::2 and ff05:2
Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: don't let node/interface scoped multicast traffic escape on the wire
Reported-by: Erik Hugne <erik.hugne@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Sat, 9 Feb 2013 20:46:34 +0000 (21:46 +0100)]
mac80211: fix channel selection bug
When trying to connect to an AP that advertises HT but not
VHT, the mac80211 code erroneously uses the configuration
from the AP as is instead of checking it against regulatory
and local capabilities. This can lead to using an invalid
or even inexistent channel (like 11/HT40+).
Additionally, the return flags from downgrading must be
ORed together, to collect them from all of the downgrades.
Also clarify the message.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Register for firmware based Inter Driver Communication (IDC) using initialize
NIC as the first mailbox command
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net neighbour, decnet: Ensure to align device private data on preferred alignment.
To allow both of protocol-specific data and device-specific data
attached with neighbour entry, and to eliminate size calculation
cost when allocating entry, sizeof protocol-speicic data must be
multiple of NEIGH_PRIV_ALIGN. On 64bit archs,
sizeof(struct dn_neigh) is multiple of NEIGH_PRIV_ALIGN, but on
32bit archs, it was not.
Introduce NEIGH_ENTRY_SPACE() macro to ensure that protocol-specific
entry-size meets our requirement.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6 mcast: Do not join device multicast for interface-local multicasts.
RFC4291 (IPv6 addressing architecture) says that interface-Local scope
spans only a single interface on a node. We should not join L2 device
multicast list for addresses in interface-local (or smaller) scope.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:44:08 +0000 (20:44 -0500)]
Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for 3.8-rc7, they are:
* Fix oops in IPVS state-sync due to releasing a random memory area due
to unitialized pointer, from Dan Carpenter.
* Fix SCTP flow establishment due to bad checksumming mangling in IPVS,
from Daniel Borkmann.
* Three fixes for the recently added IPv6 NPT, all from YOSHIFUJI Hideaki,
with an amendment collapsed into those patches from Ulrich Weber. They
fiix adjustment calculation, fix prefix mangling and ensure LSB of
prefixes are zeroes (as required by RFC).
Specifically, it took me a while to validate the 1's complement arithmetics/
checksumming approach in the IPv6 NPT code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:37:27 +0000 (20:37 -0500)]
Merge branch 'mvrp'
David Ward says:
====================
The Linux kernel currently implements the GARP VLAN Registration
Protocol (GVRP) from IEEE 802.1Q-1998 (applicant-only participant).
When the GVRP flag is set for a VLAN interface on a Linux host, the
host advertises its membership in the VLAN to the attached bridge/
switch, so that it is not necessary to manually configure the bridge/
switch port to participate in the VLAN.
GVRP has been superseded by the Multiple VLAN Registration Protocol
(MVRP) in IEEE 802.1Q-2011, which addresses scalability concerns about
the earlier protocol. The following patches add support for MVRP to
the Linux kernel and iproute2 utility. They are based largely off of
the existing implementation of GVRP, but have been modified for the
new PDU structure and state machine.
This implementation was tested with two Juniper EX4200 switches.
====================
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple VLAN Registration Protocol
(MVRP) from IEEE 802.1Q-2011, based on the existing implementation
of the GARP VLAN Registration Protocol (GVRP).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple Registration Protocol (MRP)
from IEEE 802.1Q-2011, based on the existing implementation of the
Generic Attribute Registration Protocol (GARP).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:14:46 +0000 (20:14 -0500)]
Merge branch 'gso_type'
Michael S. Tsirkin says:
====================
At the moment, macvtap crashes are observed if macvtap is attached
to an interface with LRO enabled.
The crash in question is BUG() in macvtap_skb_to_vnet_hdr.
This happens because several drivers set gso_size but not gso_type
in incoming skbs.
This didn't use to be the case: with intel cards on 3.2 and older
kernels, with qlogic - on 3.4 and older kernels, so it's a regression if
not a recent one.
The following patches fix this for qlogic, broadcom and intel drivers.
I tested that the patch fixes the crash for ixgbe but
don't have qlogic/broadcom hardware to test.
I also only tested TCPv4.
Please review, and consider for 3.8.
Changes from v1:
- added missing htons as suggested by Eric
- backported the relevant bits from cbf1de72324a8105ddcc3d9ce9acbc613faea17e for bnx2x
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In LRO mode, bnx2x set gso_size but not gso type.
This leads to crashes in macvtap.
Commit cbf1de72324a8105ddcc3d9ce9acbc613faea17e
queued for 3.9 includes a more complete fix.
This is a minimal patch to avoid the crash, for 3.8.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic set gso_size but not gso type. This leads to crashes
in macvtap.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
stmmac: mdio register has to fail if the phy is not found
With this patch the stmmac fails in case of the phy device
is not found; w/o this fix the mdio can be register twice when
do down/up the iface and this is not correct.
Reported-by: Stas <stsp@list.ru> Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:10:10 +0000 (20:10 -0500)]
Merge branch 'vsock'
Andy King says:
====================
In an effort to improve the out-of-the-box experience with Linux kernels for
VMware users, VMware is working on readying the VM Sockets (VSOCK, formerly
VMCI Sockets) (vsock) kernel module for inclusion in the Linux kernel. The
purpose of this post is to acquire feedback on the vsock kernel module.
Unlike previous submissions, where the new socket family was entirely reliant
on VMware's VMCI PCI device (and thus VMware's hypervisor), VM Sockets is now
completely[1] separated out into two parts, each in its own module:
o Core socket code, which is transport-neutral and invokes transport
callbacks to communicate with the hypervisor. This is vsock.ko.
o A VMCI transport, which communicates over VMCI with the VMware hypervisor.
This is vmw_vsock_vmci_transport.ko, and it registers with the core module
as a transport.
This should provide a path to introducing additional transports, for example
virtio, with the ultimate goal being to make this new socket family
hypervisor-neutral.
[1] If Gerd tries it and determines this to be false (still), I'll ship him
a keg of beer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy King [Wed, 6 Feb 2013 14:23:56 +0000 (14:23 +0000)]
VSOCK: Introduce VM Sockets
VM Sockets allows communication between virtual machines and the hypervisor.
User level applications both in a virtual machine and on the host can use the
VM Sockets API, which facilitates fast and efficient communication between
guest virtual machines and their host. A socket address family, designed to be
compatible with UDP and TCP at the interface level, is provided.
Today, VM Sockets is used by various VMware Tools components inside the guest
for zero-config, network-less access to VMware host services. In addition to
this, VMware's users are using VM Sockets for various applications, where
network access of the virtual machine is restricted or non-existent. Examples
of this are VMs communicating with device proxies for proprietary hardware
running as host applications and automated testing of applications running
within virtual machines.
The VMware VM Sockets are similar to other socket types, like Berkeley UNIX
socket interface. The VM Sockets module supports both connection-oriented
stream sockets like TCP, and connectionless datagram sockets like UDP. The VM
Sockets protocol family is defined as "AF_VSOCK" and the socket operations
split for SOCK_DGRAM and SOCK_STREAM.
For additional information about the use of VM Sockets, please refer to the
VM Sockets Programming Guide available at:
Signed-off-by: George Zhang <georgezhang@vmware.com> Signed-off-by: Dmitry Torokhov <dtor@vmware.com> Signed-off-by: Andy king <acking@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 7 Feb 2013 23:22:58 +0000 (23:22 +0000)]
net: sctp: sctp_auth_make_key_vector: use sctp_auth_create_key
In sctp_auth_make_key_vector, we allocate a temporary sctp_auth_bytes
structure with kmalloc instead of the sctp_auth_create_key allocator.
Change this to sctp_auth_create_key as it is the case everywhere else,
so that we also can properly free it via sctp_auth_key_put. This makes
it easier for future code changes in the structure and allocator itself,
since a single API is consistently used for this purpose. Also, by
using sctp_auth_create_key we're doing sanity checks over the arguments.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 7 Feb 2013 16:41:02 +0000 (16:41 +0000)]
macvlan: add a salt to mc_hash()
Some multicast addresses are common to all macvlans,
so if a multicast message has a hash value collision, we
have to deliver a copy to all macvlans, adding significant
latency and possible packet drops if netdev_max_backlog
limit is hit.
Having a per macvlan hash function permits to reduce the
impact of hash collisions.
Suggested-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 7 Feb 2013 16:02:57 +0000 (16:02 +0000)]
macvlan: broadcast addr should be part of mc_filter
commit cd431e738509e (macvlan: add multicast filter) forgot
the broadcast case.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> SIgned-off-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
we should use rcu_dereference_bh() when we hold rcu_read_lock_bh().
Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 7 Feb 2013 11:46:27 +0000 (11:46 +0000)]
drivers: net: Remove remaining alloc/OOM messages
alloc failures already get standardized OOM
messages and a dump_stack.
For the affected mallocs around these OOM messages:
Converted kmallocs with multiplies to kmalloc_array.
Converted a kmalloc/memcpy to kmemdup.
Removed now unused stack variables.
Removed unnecessary parentheses.
Neatened alignment.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Arend van Spriel <arend@broadcom.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rafał Miłecki [Thu, 7 Feb 2013 05:40:38 +0000 (05:40 +0000)]
bgmac: fix "cmdcfg" calls for promisc and loopback modes
The last (bool) parameter in bgmac_cmdcfg_maskset says if the write
should be made, even if value didn't change. Currently driver doesn't
match the specs about (not) forcing some changes. This makes it follow
them.
Reported-by: Nathan Hintz <nlhintz@hotmail.com> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Fri, 8 Feb 2013 10:17:15 +0000 (10:17 +0000)]
skbuff: Move definition of NETDEV_FRAG_PAGE_MAX_SIZE
In order to address the fact that some devices cannot support the full 32K
frag size we need to have the value accessible somewhere so that we can use it
to do comparisons against what the device can support. As such I am moving
the values out of skbuff.c and into skbuff.h.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 8 Feb 2013 21:01:18 +0000 (08:01 +1100)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM fixes from Russell King:
"I was going to hold these off until v3.8 was out, and send them with a
stable tag, but as everyone else is pushing much bigger fixes which
Linus is accepting, let's save people from the hastle of having to
patch v3.8 back into working or use a stable kernel.
Looking at the diffstat, this really is high value for its size; this
is miniscule compared to how the -rc6 to tip diffstat currently looks.
So, four patches in this set:
- Punit Agrawal reports that the kernel no longer boots on MPCore due
to a new assumption made in the GIC code which isn't true of
earlier GIC designs. This is the biggest change in this set.
- Punit's boot log also revealed a bunch of WARN_ON() dumps caused by
the DT-ification of the GIC support without fixing up non-DT
Realview - which now sees a greater number of interrupts than it
did before.
- A fix for the DMA coherent code from Marek which uses the wrong
check for atomic allocations; this can result in spinlock lockups
or other nasty effects.
- A fix from Will, which will affect all Android based platforms if
not applied (which use the 2G:2G VM split) - this causes
particularly 'make' to misbehave unless this bug is fixed."
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
ARM: DMA mapping: fix bad atomic test
ARM: realview: ensure that we have sufficient IRQs available
ARM: GIC: fix GIC cpumask initialization
1) Revert iwlwifi reclaimed packet tracking, it causes problems for a
bunch of folks. From Emmanuel Grumbach.
2) Work limiting code in brcmsmac wifi driver can clear tx status
without processing the event. From Arend van Spriel.
3) rtlwifi USB driver processes wrong SKB, fix from Larry Finger.
4) l2tp tunnel delete can race with close, fix from Tom Parkin.
5) pktgen_add_device() failures are not checked at all, fix from Cong
Wang.
6) Fix unintentional removal of carrier off from tun_detach(),
otherwise we confuse userspace, from Michael S. Tsirkin.
7) Don't leak socket reference counts and ubufs in vhost-net driver,
from Jason Wang.
8) vmxnet3 driver gets it's initial carrier state wrong, fix from Neil
Horman.
9) Protect against USB networking devices which spam the host with 0
length frames, from Bjørn Mork.
10) Prevent neighbour overflows in ipv6 for locally destined routes,
from Marcelo Ricardo. This is the best short-term fix for this, a
longer term fix has been implemented in net-next.
11) L2TP uses ipv4 datagram routines in it's ipv6 code, whoops. This
mistake is largely because the ipv6 functions don't even have some
kind of prefix in their names to suggest they are ipv6 specific.
From Tom Parkin.
12) Check SYN packet drops properly in tcp_rcv_fastopen_synack(), from
Yuchung Cheng.
13) Fix races and TX skb freeing bugs in via-rhine's NAPI support, from
Francois Romieu and your's truly.
14) Fix infinite loops and divides by zero in TCP congestion window
handling, from Eric Dumazet, Neal Cardwell, and Ilpo Järvinen.
15) AF_PACKET tx ring handling can leak kernel memory to userspace, fix
from Phil Sutter.
16) Fix error handling in ipv6 GRE tunnel transmit, from Tommi Rantala.
17) Protect XEN netback driver against hostile frontend putting garbage
into the rings, don't leak pages in TX GOP checking, and add proper
resource releasing in error path of xen_netbk_get_requests(). From
Ian Campbell.
18) SCTP authentication keys should be cleared out and released with
kzfree(), from Daniel Borkmann.
19) L2TP is a bit too clever trying to maintain skb->truesize, and ends
up corrupting socket memory accounting to the point where packet
sending is halted indefinitely. Just remove the adjustments
entirely, they aren't really needed. From Eric Dumazet.
20) ATM Iphase driver uses a data type with the same name as the S390
headers, rename to fix the build. From Heiko Carstens.
21) Fix a typo in copying the inner network header offset from one SKB
to another, from Pravin B Shelar.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
net: sctp: sctp_endpoint_free: zero out secret key data
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
atm/iphase: rename fregt_t -> ffreg_t
net: usb: fix regression from FLAG_NOARP code
l2tp: dont play with skb->truesize
net: sctp: sctp_auth_key_put: use kzfree instead of kfree
netback: correct netbk_tx_err to handle wrap around.
xen/netback: free already allocated memory on failure in xen_netbk_get_requests
xen/netback: don't leak pages on failure in xen_netbk_tx_check_gop.
xen/netback: shutdown the ring if it contains garbage.
net: qmi_wwan: add more Huawei devices, including E320
net: cdc_ncm: add another Huawei vendor specific device
ipv6/ip6_gre: fix error case handling in ip6gre_tunnel_xmit()
tcp: fix for zero packets_in_flight was too broad
brcmsmac: rework of mac80211 .flush() callback operation
ssb: unregister gpios before unloading ssb
bcma: unregister gpios before unloading bcma
rtlwifi: Fix scheduling while atomic bug
net: usbnet: fix tx_dropped statistics
tcp: ipv6: Update MIB counters for drops
...
David S. Miller [Fri, 8 Feb 2013 20:18:40 +0000 (15:18 -0500)]
Merge branch 'wireless'
John W. Linville says:
====================
Please accept this pull request intended for the 3.9 stream!
Included are a mac80211 pull, an iwlwifi pull (actually two -- one
was a fast-forward), a wl12xx pull, and a couple of Bluetooth pulls.
On mac80211, Johannes says:
"I've included
* AKM definitions from Bing,
* mesh fixes from Thomas, including a fix from him for me breaking his
patch while applying,
* channel check fix from Simon,
* an old patch from Yoni Divinsky who doesn't even work for TI any
more, to configure the WEP TX key for ARP offload etc.
* MAC ACL API from Vasanth
* a fix for the infamous chanctx_conf warning from Arnd
* from myself, a fix for my previous aggregation changes, some cleanup
and some improvements and fixes for WoWLAN"
On iwlwifi, Johannes says:
"Two small changes for iwlwifi-next, one to update all our Copyright
notices and one to provide the RX page order."
And also:
"So what I have here is some cleanups, preparations and the new MVM
(multi-virtual MAC) driver itself and (this is new) some work on the
transport API as well as a message flooding fix."
On wl12xx, Luca says:
"Lots of bugfixes and improvements in our TI wireless drivers,
including support for multi-channel. Intended for 3.9."
On Bluetooth, Gustavo says:
"This is my first pull request to 3.9. The biggest changes here are from Johan
Hedberg who made a lot of fixes in the Management interface. The issues arose
due to a new test tool we wrote and the usage of the Management interface as
default in BlueZ 5. The rest of the patches are more clean ups and small
fixes."
And also:
"Here goes another batch intended for 3.9, the majority of the patch here are
from Johan who is fixing many issues in the management interface that have
appeared lately. The rest of the patches are just small improvements, fixes
and clean ups."
Along with those are the usual variety of updates/enhancements to
the mwl8k, mwifiex, ath9k, rtlwifi, and rt2x00 drivers as well as
a few updates for the ssb and bcma busses. I don't think there are
any big headliners there.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 8 Feb 2013 19:55:08 +0000 (14:55 -0500)]
Merge branch 'sctp_keys'
Daniel Borkmann says:
====================
Cryptographically used keys should be zeroed out when our session
ends resp. memory is freed, thus do not leave them somewhere in the
memory.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 8 Feb 2013 03:04:35 +0000 (03:04 +0000)]
net: sctp: sctp_endpoint_free: zero out secret key data
On sctp_endpoint_destroy, previously used sensitive keying material
should be zeroed out before the memory is returned, as we already do
with e.g. auth keys when released.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 8 Feb 2013 03:04:34 +0000 (03:04 +0000)]
net: sctp: sctp_setsockopt_auth_key: use kzfree instead of kfree
In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Carstens [Fri, 8 Feb 2013 00:19:11 +0000 (00:19 +0000)]
atm/iphase: rename fregt_t -> ffreg_t
We have conflicting type qualifiers for "freg_t" in s390's ptrace.h and the
iphase atm device driver, which causes the compile error below.
Unfortunately the s390 typedef can't be renamed, since it's a user visible api,
nor can I change the include order in s390 code to avoid the conflict.
So simply rename the iphase typedef to a new name. Fixes this compile error:
In file included from drivers/atm/iphase.c:66:0:
drivers/atm/iphase.h:639:25: error: conflicting type qualifiers for 'freg_t'
In file included from next/arch/s390/include/asm/ptrace.h:9:0,
from next/arch/s390/include/asm/lowcore.h:12,
from next/arch/s390/include/asm/thread_info.h:30,
from include/linux/thread_info.h:54,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:29,
from include/linux/time.h:5,
from include/linux/stat.h:18,
from include/linux/module.h:10,
from drivers/atm/iphase.c:43:
next/arch/s390/include/uapi/asm/ptrace.h:197:3: note: previous declaration of 'freg_t' was here
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: chas williams - CONTRACTOR <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
Will Deacon [Fri, 8 Feb 2013 11:52:29 +0000 (12:52 +0100)]
ARM: 7641/1: memory: fix broken mmap by ensuring TASK_UNMAPPED_BASE is aligned
We have received multiple reports of mmap failures when running with a
2:2 vm split. These manifest as either -EINVAL with a non page-aligned
address (ending 0xaaa) or a SEGV, depending on the application. The
issue is commonly observed in children of make, which appears to use
bottom-up mmap (assumedly because it changes the stack rlimit).
Further investigation reveals that this regression was triggered by 394ef6403abc ("mm: use vm_unmapped_area() on arm architecture"), whereby
TASK_UNMAPPED_BASE is no longer page-aligned for bottom-up mmap, causing
get_unmapped_area to choke on misaligned addressed.
This patch fixes the problem by defining TASK_UNMAPPED_BASE in terms of
TASK_SIZE and explicitly aligns the result to 16M, matching the other
end of the heap.
Acked-by: Nicolas Pitre <nico@linaro.org> Reported-by: Steve Capper <steve.capper@arm.com> Reported-by: Jean-Francois Moine <moinejf@free.fr> Reported-by: Christoffer Dall <cdall@cs.columbia.edu> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>