Alexander Duyck [Sat, 9 Feb 2013 04:27:48 +0000 (04:27 +0000)]
igb: Update igb to use a path similar to ixgbe to determine when to stop Tx
After reviewing the igb and ixgbe code I realized there are a few issues in
how the code is structured. Specifically we are not checking the size of the
buffers being used in transmits and we are not using the same value to
determine when to stop or start a Tx queue. As such the code is prone to be
buggy.
This patch makes it so that we have one value DESC_NEEDED that we will use for
starting and stopping the queue. In addition we will check the size of
buffers being used when setting up a transmit so as to avoid a possible buffer
overrun if we were to receive a frame with a block of data larger than 32K in
skb->data.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Sat, 2 Feb 2013 05:07:11 +0000 (05:07 +0000)]
igb: Support using build_skb in the case that jumbo frames are disabled
This change makes it so that we can enable the use of build_skb for cases
where jumbo frames are disabled. The advantage to this is that we do not
have to perform a memcpy to populate the header and as a result we see a
significant performance improvement.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Thu, 14 Feb 2013 20:57:38 +0000 (15:57 -0500)]
net: Don't write to current task flags on every packet received.
Even for non-pfmalloc SKBs, __netif_receive_skb() will do a
tsk_restore_flags() on current unconditionally.
Make __netif_receive_skb() a shim around the existing code, renamed to
__netif_receive_skb_core(). Let __netif_receive_skb() wrap the
__netif_receive_skb_core() call with the task flag modifications, if
necessary.
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Feb 2013 05:00:07 +0000 (05:00 +0000)]
gianfar: Fix and cleanup Rx FCB indication
This fixes a less obvious error on one hand, and prevents futher
similar errors by disambiguating and optimizing RxFCB indication,
on the other hand.
The error consists in NETIF_F_HW_VLAN_TX flag being used as an
indication of Rx FCB insertion. This happened as soon gfar_uses_fcb(),
which despite its name indicates Rx FCB insertion, started
incorporating is_vlan_on().
is_vlan_on(), on the other hand, is also a misleading construct because
we need to differentiate b/w hw VLAN extraction/VLEX (marked by VLAN_RX
flag) and hw VLAN insertion/VLINS (VLAN_TX flag), which are different
mechanisms using different types of FCBs.
The hw spec for the RxFCB feature is as follows:
In the case of RxBD rings, FCBs (Frame Control Block) are inserted by
the eTSEC whenever RCTRL[PRSDEP] is set to a non-zero value. Only one
FCB is inserted per frame (in the buffer pointed to by the RxBD with
bit F set). TOE acceleration for receive is enabled for all rx frames
in this case.
This patch introduces priv->uses_rxfcb field to quickly signal RxFCB
insertion in accordance with the specification above.
The dependency on FSL_GIANFAR_DEV_HAS_TIMER was also eliminated as
another source of confusion. The actual dependency is to priv->hwts_rx_en.
Upon changing priv->hwts_rx_en via IOCTL, the gfar device is being
restarted and on init_mac() the priv->hwts_rx_en flag determines RxFCB
insertion, and rctrl is programmed accordingly. The patch takes care
of this case too.
Though maybe not as self documenting as the inlining version uses_fcb(),
priv->uses_rxfcb has the main purpose to quickly signal, on the hot path,
that the incoming frame has a *Rx* FCB block inserted which needs to be
pulled out before passing the skb to the stack. This is a performance
critical operation, it needs to happen fast, that's why uses_rxfcb is
placed in the first cacheline of gfar_private.
This is also why a cached rctrl cannot be used instead: 1) because
we don't have 32 bits available in the first cacheline of gfar_priv
(but only 16); 2) bit operations are expensive on the hot path.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Feb 2013 05:00:06 +0000 (05:00 +0000)]
gianfar: Remove wrong buffer size conditioning to VLAN h/w offload
The controller's ref manual states clearly that when the hw Rx vlan
offload feature is enabled, meaning that the VLEX bit from RCTRL is
correctly enabled, then the hw performs automatic VLAN tag extraction
and deletion from the ethernet frames. So there's no point in trying to
increase the rx buff size when rxvlan is on, as the frame is actually
smaller.
And the Tx vlan hw accel feature (VLINS) has nothing to do with rx buff
size computation.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Feb 2013 05:00:04 +0000 (05:00 +0000)]
gianfar: GRO_DROP is unlikely
The change is significant since it affects the rx hot path.
Paul observed and documented the effects at asm level, see
below:
"It turns out that it does make a difference, since gfar_process_frame
gets inlined, and so the increment code gets moved out of line (I have
marked the if statment with * and the increment code within "-----"):
So, the increment does actually get moved ~1k away."
Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Feb 2013 05:00:03 +0000 (05:00 +0000)]
gianfar: Cleanup and optimize struct gfar_private
Group run-time critical fields within the 1st cacheline (32B)
followed by the tx|rx_queue reference arrays and the interrupt
group instances (gfargrp), all cacheline aligned.
This has several benefits. Firstly comes the performance benefit
by having the members required by the driver's hot path re-grouped
in the structure's first cache lines, whereas the unimportant
members were pushed towards the end of the struct.
Another benefit comes from eliminating a 24 byte memory hole that
was rendering gfar_priv's 2nd cacheline useless. The default gcc
layout of gfar_private leaves an implicit 24 byte hole after the
errata (enum) member. This patch fixes it.
The uchar bitfields were pushed towards the end of the struct
as these are not run-time performance critical (used for init
time operations). Because there is no other 2 byte member
around to couple the uchar bitfields memeber with, we will
have an addititnal 2 byte hole after the bitfields. This is
unsignificant however, and it doesn't influence gfar_priv's
size, because the whole structure is padded to be a 32B multiple.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Thu, 14 Feb 2013 05:00:02 +0000 (05:00 +0000)]
gianfar: Add device ref (dev) in gfar_private
Use device pointer (dev) to simplify the code and to
avoid double indirections, especially on the hot path.
Basically, instead of accessing priv to get the ofdev
reference and then accessing the ofdev structure to
dereference the needed dev pointer, we will get the
dev pointer directly from priv.
The dev pointer is required on the hot path, see gfar_new_rxbdp
or gfar_clean_rx_ring (or xmit), and this patch makes
it available directly from priv's 1st cacheline.
This change is reflected at asm level too, taking (the hot)
gfar_new_rxbdp():
initial version -
18c0: 7c 7e 1b 78 mr r30,r3
David S. Miller [Thu, 14 Feb 2013 18:29:20 +0000 (13:29 -0500)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
1) Remove a duplicated call to skb_orphan() in pf_key, from Cong Wang.
2) Prepare xfrm and pf_key for algorithms without pf_key support,
from Jussi Kivilinna.
3) Fix an unbalanced lock in xfrm_output_one(), from Li RongQing.
4) Add an IPsec state resolution packet queue to handle
packets that are send before the states are resolved.
5) xfrm4_policy_fini() is unused since 2.6.11, time to remove it.
From Michal Kubecek.
6) The xfrm gc threshold was configurable just in the initial
namespace, make it configurable in all namespaces. From
Michal Kubecek.
7) We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
Allow this if the priorities of these policies are different,
the one with the higher priority is used in this case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Wed, 13 Feb 2013 19:57:12 +0000 (19:57 +0000)]
bridge: make ifla_br_policy and br_af_ops static
They are only used within this file.
Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 14 Feb 2013 05:32:31 +0000 (13:32 +0800)]
bridge: use __u16 in if_bridge.h
We should use "__u16" instead of "u16" in the user-space visable
header.
Cc: Vlad Yasevich <vyasevic@redhat.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Changes since v10
* Updated implemenation of ndo_fdb_del in emulex and qlogic drivers.
Changes since v9:
* series re-ordering so make functionality more distinct. Basic vlan
filtering is patches 1-4. Support for PVID/untagged vlans is patches
5 and 6. VLAN support for FDB/MDB is patches 7-11. Patch 12 is
still additional egress policy.
* Slight simplification to code that extracts the VID from skb. Since we
now depend on the vlan module, at the time of input skb_tci is guaranteed
to be set if the packet had 8021q header. We can simply refere to it.
* Changed the opaque 'parent' pointer from prior patches to a union so we
can be much more explicit in our assignments.
* Lots of additional testing with STP turned on. No issues were observed.
Changes since v8:
* Unified vlans_to_* calls into a single interface
* Fixed the rest of the issues report by Michal Miroslaw
* Fixed a bug where fdb entries were not created for all added vlans.
Changes since v7:
* Rebases on the latest net-next and removed the vlan wrapper patch from
the series.
* Fixed a crash in br_fdb_add/br_fdb_delete.
Changes since v6:
* VLANs are now stored in a VLAN bitmap per port. This allows for O(1)
lookup at ingress and egress. We simply check to see if the bit associated
with the vlan id is set in the map. The drawback to this approach is that
it wastes some space when there is only a small number of VLANs.
* In addition to the build time configuration option, VLAN filtering also has
a configuration paramter in sysfs. By default the filtering is turned off
and all traffic is permitted. When the filtring is turned on, we do strict
matching to the filter configured. Thus, if there is no configuration, all
packets are rejected. This was done to make the behavior more streight
forward. Without this (and if egress policy patch is rejected), the
decision for how to forward untagged traffic that was not filtered at ingress
is almost impossible to make. It would not be right to deliver to every
port that has PVID set as, each port may have a different PVID.
* Separate egress policy bitmap patch has been isolated and is provided last
in the series. This has been a more contentious piece of functionality and I
wanted to isolate it so that it could easily be dropped and not block the whole
series.
Changes since v5:
- Pulled VLAN filtering into its own file and made it a configuration options.
- Made new vlan filtering option dependent on VLAN_8021Q.
- Got rid of HW filter inlines and moved then vlan_core.c.
(All of the above suggested by Stephen Hemminger)
Changes since v4:
- Pull per-port vlan data into its own structures and give it to the bridge
device thus making bridge device behave like a regular port for vlan
configuration.
- Add a per-vlan 'untagged' bitmap that determins egress policy. If a port
is part of this bitmap, traffic egresses untagged.
- PVID is now used for ingress policy only. Incomming frames without VLAN tag
are assigned to the PVID vlan. Egress is determined via bitmap memberships.
- Allow for incremental config of a vlan. Now, PVID and untagged memberships
may be set on existing vlans. They however can NOT be cleared separately.
- VLAN deletion is now done via RTM_DELLINK command for PF_BRIDGE family.
This cleans up the netlink interface.
Changes since v3:
- Re-integrated compiler problems that got left out last time. Appologies.
- checkpatches.pl errors fixed
Changes since v2:
- Added inline functiosn to manimulate vlan hw filters and re-use in 8021q
and bridge code.
- Use rtnl_dereference (Michael Tsirkin)
- Remove synchronize_net() call (Eric Dumazet)
- Fix NULL ptr deref bug I introduced in br_ifinfo_notify.
Changes since v1:
- Fixed some forwarding bugs.
- Add vlan to local fdb entries. New local entries are created per vlan
to facilite correct forwarding to bridge interface.
- Allow configuration of vlans directly on the bridge master device
in addition to ports.
Changes since rfc v2:
- Per-port vlan bitmap is gone and is replaced with a vlan list.
- Added bridge vlan list, which is referenced by each port. Entries in
the birdge vlan list have port bitmap that shows which port are parts
of which vlan.
- Netlink API changes.
- Dropped sysfs support for now. If people think this is really usefull,
can add it back.
- Support for native/untagged vlans.
Changes since rfc v1:
- Comments addressed regarding formatting and RCU usage
- iocts have been removed and changed over the netlink interface.
- Added support of user added ndb entries.
- changed sysfs interface to export a bitmap. Also added a write interface.
I am not sure how much I like it, but it made my testing easier/faster. I
might change the write interface to take text instead of binary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:20 +0000 (12:00 +0000)]
bridge: Separate egress policy bitmap
Add an ability to configure a separate "untagged" egress
policy to the VLAN information of the bridge. This superseeds PVID
policy and makes PVID ingress-only. The policy is configured with a
new flag and is represented as a port bitmap per vlan. Egress frames
with a VLAN id in "untagged" policy bitmap would egress
the port without VLAN header.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:19 +0000 (12:00 +0000)]
bridge: Add vlan support for local fdb entries
When VLAN is added to the port, a local fdb entry for that port
(the entry with the mac address of the port) is added for that
VLAN. This way we can correctly determine if the traffic
is for the bridge itself. If the address of the port changes,
we try to change all the local fdb entries we have for that port.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:18 +0000 (12:00 +0000)]
bridge: Add vlan support to static neighbors
When a user adds bridge neighbors, allow him to specify VLAN id.
If the VLAN id is not specified, the neighbor will be added
for VLANs currently in the ports filter list. If no VLANs are
configured on the port, we use vlan 0 and only add 1 entry.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:16 +0000 (12:00 +0000)]
bridge: Add vlan to unicast fdb entries
This patch adds vlan to unicast fdb entries that are created for
learned addresses (not the manually configured ones). It adds
vlan id into the hash mix and uses vlan as an addditional parameter
for an entry match.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:15 +0000 (12:00 +0000)]
bridge: Add the ability to configure pvid
A user may designate a certain vlan as PVID. This means that
any ingress frame that does not contain a vlan tag is assigned to
this vlan and any forwarding decisions are made with this vlan in mind.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:13 +0000 (12:00 +0000)]
bridge: Dump vlan information from a bridge port
Using the RTM_GETLINK dump the vlan filter list of a given
bridge port. The information depends on setting the filter
flag similar to how nic VF info is dumped.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:12 +0000 (12:00 +0000)]
bridge: Add netlink interface to configure vlans on bridge ports
Add a netlink interface to add and remove vlan configuration on bridge port.
The interface uses the RTM_SETLINK message and encodes the vlan
configuration inside the IFLA_AF_SPEC. It is possble to include multiple
vlans to either add or remove in a single message.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 13 Feb 2013 12:00:09 +0000 (12:00 +0000)]
bridge: Add vlan filtering infrastructure
Adds an optional infrustructure component to bridge that would allow
native vlan filtering in the bridge. Each bridge port (as well
as the bridge device) now get a VLAN bitmap. Each bit in the bitmap
is associated with a vlan id. This way if the bit corresponding to
the vid is set in the bitmap that the packet with vid is allowed to
enter and exit the port.
Write access the bitmap is protected by RTNL and read access
protected by RCU.
Vlan functionality is disabled by default.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 13 Feb 2013 01:03:50 +0000 (01:03 +0000)]
net: sctp: add build check for sctp_sf_eat_sack_6_2/jsctp_sf_eat_sack
In order to avoid any future surprises of kernel panics due to jprobes
function mismatches (as e.g. fixed in 4cb9d6eaf85ecd: sctp: jsctp_sf_eat_sack:
fix jprobes function signature mismatch), we should check both function
types during build and scream loudly if they do not match. __same_type
resolves to __builtin_types_compatible_p, which is 1 in case both types
are the same and 0 otherwise, qualifiers are ignored. Tested by myself.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Mon, 11 Feb 2013 09:27:41 +0000 (09:27 +0000)]
net: Fix possible wrong checksum generation.
Patch cef401de7be8c4e (net: fix possible wrong checksum
generation) fixed wrong checksum calculation but it broke TSO by
defining new GSO type but not a netdev feature for that type.
net_gso_ok() would not allow hardware checksum/segmentation
offload of such packets without the feature.
Following patch fixes TSO and wrong checksum. This patch uses
same logic that Eric Dumazet used. Patch introduces new flag
SKBTX_SHARED_FRAG if at least one frag can be modified by
the user. but SKBTX_SHARED_FRAG flag is kept in skb shared
info tx_flags rather than gso_type.
tx_flags is better compared to gso_type since we can have skb with
shared frag without gso packet. It does not link SHARED_FRAG to
GSO, So there is no need to define netdev feature for this.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Feb 2013 18:22:24 +0000 (13:22 -0500)]
Merge branch 'tcp_tsoffset'
Andrey Vagin says:
====================
If a TCP socket will get live-migrated from one box to another the
timestamps (which are typically ON) will get screwed up -- the new
kernel will generate TS values that has nothing to do with what they
were on dump. The solution is to yet again fix the kernel and put a
"timestamp offset" on a socket.
A socket offset is added in places where externally visible tcp
timestamp option is parsed/initialized.
Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way as for
TIME_WAIT sockets.
For time-wait sockets offset is inhereted by a proper tcp_sock.
A per-socket offset can be set only for sockets in repair mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:19 +0000 (05:50 +0000)]
tcp: send packets with a socket timestamp
A socket timestamp is a sum of the global tcp_time_stamp and
a per-socket offset.
A socket offset is added in places where externally visible
tcp timestamp option is parsed/initialized.
Connections in the SYN_RECV state are not supported, global
tcp_time_stamp is used for them, because repair mode doesn't support
this state. In a future it can be implemented by the similar way
as for TIME_WAIT sockets.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:18 +0000 (05:50 +0000)]
tcp: set and get per-socket timestamp
A timestamp can be set, only if a socket is in the repair mode.
This patch adds a new socket option TCP_TIMESTAMP, which allows to
get and set current tcp times stamp.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey Vagin [Mon, 11 Feb 2013 05:50:17 +0000 (05:50 +0000)]
tcp: adding a per-socket timestamp offset
This functionality is used for restoring tcp sockets. A tcp timestamp
depends on how long a system has been running, so it's differ for each
host. The solution is to set a per-socket offset.
A per-socket offset for a TIME_WAIT socket is inherited from a proper
tcp socket.
tcp_request_sock doesn't have a timestamp offset, because the repair
mode for them are not implemented.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 13 Feb 2013 18:18:20 +0000 (13:18 -0500)]
Merge branch 'gfar-ethtool-atomic' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Paul Gortmaker says:
====================
Eric noticed that the handling of local u64 ethtool counters for
this driver commonly found on Freescale ppc-32 boards was racy.
However, before converting them over to atomic64_t, I noticed
that an internal struct was being used to determine the offsets
for exporting this data into the ethtool buffer, and in doing
so, it assumed that the counters would always be u64. Rather
than keep this implicit assumption, a simple code cleanup gets
rid of the struct completely, and leaves less conversion sites.
The alternative solution would have been to take advantage of
the fact that the counters are all relating to error conditions,
and hence make them internally u32. In doing so, we'd be assuming
that U32_MAX of any particular error condition is highly unlikely.
This might have made sense if any increments were in a hot path.
Tested with "ethtool -S eth0" on sbc8548 board.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Wed, 13 Feb 2013 16:32:42 +0000 (11:32 -0500)]
netpoll: fix smatch warnings in netpoll core code
Dan Carpenter contacted me with some notes regarding some smatch warnings in the
netpoll code, some of which I introduced with my recent netpoll locking fixes,
some which were there prior. Specifically they were:
net-next/net/core/netpoll.c:243 netpoll_poll_dev() warn: inconsistent
returns mutex:&ni->dev_lock: locked (213,217) unlocked (210,243)
net-next/net/core/netpoll.c:706 netpoll_neigh_reply() warn: potential
pointer math issue ('skb_transport_header(send_skb)' is a 128 bit pointer)
This patch corrects the locking imbalance (the first error), and adds some
parenthesis to correct the second error. Tested by myself. Applies to net-next
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Dan Carpenter <dan.carpenter@oracle.com> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
It adds an argument called panic to a function that uses the BUG() macro
which tries to call panic, but the argument masks the panic() function
declaration, resulting in the following error (gcc 4.2.4):
net/core/skbuff.c In function 'skb_panic':
net/core/skbuff.c +126 : error: called object 'panic' is not a function
This is fixed by renaming the argument to msg.
Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Jean Sacren <sakiwit@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Tue, 12 Feb 2013 20:28:35 +0000 (15:28 -0500)]
gianfar: remove largely unused gfar_stats struct
The gfar_stats struct is only used in copying out data
via ethtool. It is declared as the extra stats, followed
by the rmon stats. However, the rmon stats are never
actually ever used in the driver; instead the rmon data
is a u32 register read that is cast directly into the
ethtool buf.
It seems the only reason rmon is in the struct at all is
to give the offset(s) at which it should be exported into
the ethtool buffer. But note gfar_stats doesn't contain
a gfar_extra_stats as a substruct -- instead it contains
a u64 array of equal element count. This implicitly means
we have two independent declarations of what gfar_extra_stats
really is. Rather than have this duality, we already have
defines which give us the offset directly, and hence do not
need the struct at all.
Further, since we know the extra_stats is unconditionally
always present, we can write it out to the ethtool buf
1st, and then optionally write out the rmon data. There
is no need for two independent loops, both of which are
simply copying out the extra_stats to buf offset zero.
This also helps pave the way towards allowing the extra
stats fields to be converted to atomic64_t values, without
having their types directly influencing the ethtool stats
export code (gfar_fill_stats) that expects to deal with u64.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Jiri Pirko [Tue, 12 Feb 2013 00:12:07 +0000 (00:12 +0000)]
act_police: improved accuracy at high rates
Current act_police uses rate table computed by the "tc" userspace
program, which has the following issue:
The rate table has 256 entries to map packet lengths to token (time
units). With TSO sized packets, the 256 entry granularity leads to
loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
Jiri Pirko [Tue, 12 Feb 2013 00:12:06 +0000 (00:12 +0000)]
act_police: move struct tcf_police to act_police.c
It's not used anywhere else, so move it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 12 Feb 2013 00:12:05 +0000 (00:12 +0000)]
tbf: improved accuracy at high rates
Current TBF uses rate table computed by the "tc" userspace program,
which has the following issue:
The rate table has 256 entries to map packet lengths to
token (time units). With TSO sized packets, the 256 entry granularity
leads to loss/gain of rate, making the token bucket inaccurate.
Thus, instead of relying on rate table, this patch explicitly computes
the time and accounts for packet transmission times with nanosecond
granularity.
The bnx2x gso_type setting bug fix in 'net' conflicted with
changes in 'net-next' that broke the gso_* setting logic
out into a seperate function, which also fixes the bug in
question. Thus, use the 'net-next' version.
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Roese [Tue, 12 Feb 2013 03:05:38 +0000 (03:05 +0000)]
net: fec_mpc52xx: Read MAC address from device-tree
Until now, the MPC5200 FEC ethernet driver relied upon the bootloader
(U-Boot) to write the MAC address into the ethernet controller
registers. The Linux driver should not rely on such a thing. So
lets read the MAC address from the DT as it should be done here.
The following priority is now used to read the MAC address:
1) First, try OF node MAC address, if not present or invalid, then:
2) Read from MAC address registers, if invalid, then:
3) Log a warning message, and choose a random MAC address.
This fixes a problem with a MPC5200 board that uses the SPL U-Boot
version without FEC initialization before Linux booting for
boot speedup.
Additionally a status line is now be printed upon successful
driver probing, also displaying this MAC address.
Signed-off-by: Stefan Roese <sr@denx.de> Cc: Anatolij Gustschin <agust@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Vipul Pandya [Tue, 12 Feb 2013 00:36:21 +0000 (00:36 +0000)]
cxgb4vf: Fix VLAN extraction counter increment
Signed-off-by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The CPSW switch can act as Dual EMAC by segregating the switch ports
using VLAN and port VLAN as per the TRM description in
14.3.2.10.2 Dual Mac Mode
Following CPSW components will be common for both the interfaces.
* Interrupt source is common for both eth interfaces
* Interrupt pacing is common for both interfaces
* Hardware statistics is common for all the ports
* CPDMA is common for both eth interface
* CPTS is common for both the interface and it should not be enabled on
both the interface as timestamping information doesn't contain port
information.
Constrains
* Reserved VID of One port should not be used in other interface which will
enable switching functionality
* Same VID must not be used in both the interface which will enable switching
functionality
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 Feb 2013 21:11:09 +0000 (16:11 -0500)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:
====================
Here is another handful of late-breaking fixes intended for the 3.8
stream... Hopefully the will still make it! :-)
There are three mac80211 fixes pulled from Johannes:
"Here are three fixes still for the 3.8 stream, the fix from Cong Ding
for the bad sizeof (Stephen Hemminger had pointed it out before but I'd
promptly forgotten), a mac80211 managed-mode channel context usage fix
where a downgrade would never stop until reaching non-HT and a bug in
the channel determination that could cause invalid channels like HT40+
on channel 11 to be used."
Also included is a mwl8k fix that avoids an oops when using mwl8k
devices that only support the 5 GHz band.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 12 Feb 2013 09:45:44 +0000 (09:45 +0000)]
ixgbe: Only set gso_type to SKB_GSO_TCPV4 as RSC does not support IPv6
The original fix that was applied for setting gso_type required more change
than necessary because it was assumed ixgbe does RSC on IPv6 frames and this
is not correct. RSC is only supported with IPv4/TCP frames only. As such we
can simplify the fix and avoid the unnecessary move of eth_type_trans.
The previous patch "ixgbe: fix gso type" and this patch reduce the entire fix
to one line that sets gso_type to TCPV4 if the frame is RSC.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Tommi Rantala <tt.rantala@gmail.com> Tested-by: Tommi Rantala <tt.rantala@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 12 Feb 2013 05:15:33 +0000 (05:15 +0000)]
net: sctp: remove unused multiple cookie keys
Vlad says: The whole multiple cookie keys code is completely unused
and has been all this time. Noone uses anything other then the
secret_key[0] since there is no changeover support anywhere.
Thus, for now clean up its left-over fragments.
Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 11 Feb 2013 10:25:31 +0000 (10:25 +0000)]
netpoll: cleanup sparse warnings
With my recent commit I introduced two sparse warnings. Looking closer there
were a few more in the same file, so I fixed them all up. Basic rcu pointer
dereferencing suff.
I've validated these changes using CONFIG_PROVE_RCU while starting and stopping
netconsole repeatedly in bonded and non-bonded configurations
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: fengguang.wu@intel.com CC: David Miller <davem@davemloft.net> CC: eric.dumazet@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 11 Feb 2013 10:25:30 +0000 (10:25 +0000)]
netpoll: Fix __netpoll_rcu_free so that it can hold the rtnl lock
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held. The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it. Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.
Fix this by converting the rcu callback to a work queue that is guaranteed to
get scheduled in process context, so that we can hold the rtnl properly while
calling __netpoll_cleanup
Tested successfully by myself.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: Cong Wang <amwang@redhat.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Sacren [Mon, 11 Feb 2013 13:30:38 +0000 (13:30 +0000)]
skbuff: create skb_panic() function and its wrappers
Create skb_panic() function in lieu of both skb_over_panic() and
skb_under_panic() so that code duplication would be avoided. Update type
and variable name where necessary.
Jiri Pirko suggested using wrappers so that we would be able to keep the
fruits of the original code.
Signed-off-by: Jean Sacren <sakiwit@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Jonas Gorski [Fri, 8 Feb 2013 15:07:25 +0000 (16:07 +0100)]
mwl8k: fix band for supported channels
The band field for the supported channels were left unpopulated, making
them default to 0 == IEEE80211_BAND_2GHZ, even for the 5GHz channels.
This resulted in null pointer accesses if anything tries to access
wiphy->bands[channel->band] of a 5GHz channel on 5GHz only cards, since
wiphy->bands[2GHZ] is NULL for them (e.g. cfg80211_chandef_usable does).
Spanning Tree Protocol packets should have always been marked as
control packets, this causes them to get queued in the high prirority
FIFO. As Radia Perlman mentioned in her LCA talk, STP dies if bridge
gets overloaded and can't communicate. This is a long-standing bug back
to the first versions of Linux bridge.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes sparse complaints about dropping __user in casts.
warning: cast removes address space of expression
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 10 Feb 2013 11:41:01 +0000 (11:41 +0000)]
bridge: use dev->addr_assign_type to see if user change mac
And remove no longer used br->flags.
Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: by default join ff01::1 and in case of forwarding ff01::2 and ff05:2
Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: don't let node/interface scoped multicast traffic escape on the wire
Reported-by: Erik Hugne <erik.hugne@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert [Mon, 11 Feb 2013 06:02:36 +0000 (07:02 +0100)]
xfrm: Allow inserting policies with matching mark and different priorities
We currently can not insert policies with mark and mask
such that some flows would be matched from both policies.
We make this possible when the priority of these policies
are different. If both policies match a flow, the one with
the higher priority is used.
Johannes Berg [Sat, 9 Feb 2013 20:46:34 +0000 (21:46 +0100)]
mac80211: fix channel selection bug
When trying to connect to an AP that advertises HT but not
VHT, the mac80211 code erroneously uses the configuration
from the AP as is instead of checking it against regulatory
and local capabilities. This can lead to using an invalid
or even inexistent channel (like 11/HT40+).
Additionally, the return flags from downgrading must be
ORed together, to collect them from all of the downgrades.
Also clarify the message.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Register for firmware based Inter Driver Communication (IDC) using initialize
NIC as the first mailbox command
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net neighbour, decnet: Ensure to align device private data on preferred alignment.
To allow both of protocol-specific data and device-specific data
attached with neighbour entry, and to eliminate size calculation
cost when allocating entry, sizeof protocol-speicic data must be
multiple of NEIGH_PRIV_ALIGN. On 64bit archs,
sizeof(struct dn_neigh) is multiple of NEIGH_PRIV_ALIGN, but on
32bit archs, it was not.
Introduce NEIGH_ENTRY_SPACE() macro to ensure that protocol-specific
entry-size meets our requirement.
Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6 mcast: Do not join device multicast for interface-local multicasts.
RFC4291 (IPv6 addressing architecture) says that interface-Local scope
spans only a single interface on a node. We should not join L2 device
multicast list for addresses in interface-local (or smaller) scope.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:44:08 +0000 (20:44 -0500)]
Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:
====================
The following patchset contains Netfilter/IPVS fixes for 3.8-rc7, they are:
* Fix oops in IPVS state-sync due to releasing a random memory area due
to unitialized pointer, from Dan Carpenter.
* Fix SCTP flow establishment due to bad checksumming mangling in IPVS,
from Daniel Borkmann.
* Three fixes for the recently added IPv6 NPT, all from YOSHIFUJI Hideaki,
with an amendment collapsed into those patches from Ulrich Weber. They
fiix adjustment calculation, fix prefix mangling and ensure LSB of
prefixes are zeroes (as required by RFC).
Specifically, it took me a while to validate the 1's complement arithmetics/
checksumming approach in the IPv6 NPT code.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:37:27 +0000 (20:37 -0500)]
Merge branch 'mvrp'
David Ward says:
====================
The Linux kernel currently implements the GARP VLAN Registration
Protocol (GVRP) from IEEE 802.1Q-1998 (applicant-only participant).
When the GVRP flag is set for a VLAN interface on a Linux host, the
host advertises its membership in the VLAN to the attached bridge/
switch, so that it is not necessary to manually configure the bridge/
switch port to participate in the VLAN.
GVRP has been superseded by the Multiple VLAN Registration Protocol
(MVRP) in IEEE 802.1Q-2011, which addresses scalability concerns about
the earlier protocol. The following patches add support for MVRP to
the Linux kernel and iproute2 utility. They are based largely off of
the existing implementation of GVRP, but have been modified for the
new PDU structure and state machine.
This implementation was tested with two Juniper EX4200 switches.
====================
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple VLAN Registration Protocol
(MVRP) from IEEE 802.1Q-2011, based on the existing implementation
of the GARP VLAN Registration Protocol (GVRP).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Initial implementation of the Multiple Registration Protocol (MRP)
from IEEE 802.1Q-2011, based on the existing implementation of the
Generic Attribute Registration Protocol (GARP).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 11 Feb 2013 01:14:46 +0000 (20:14 -0500)]
Merge branch 'gso_type'
Michael S. Tsirkin says:
====================
At the moment, macvtap crashes are observed if macvtap is attached
to an interface with LRO enabled.
The crash in question is BUG() in macvtap_skb_to_vnet_hdr.
This happens because several drivers set gso_size but not gso_type
in incoming skbs.
This didn't use to be the case: with intel cards on 3.2 and older
kernels, with qlogic - on 3.4 and older kernels, so it's a regression if
not a recent one.
The following patches fix this for qlogic, broadcom and intel drivers.
I tested that the patch fixes the crash for ixgbe but
don't have qlogic/broadcom hardware to test.
I also only tested TCPv4.
Please review, and consider for 3.8.
Changes from v1:
- added missing htons as suggested by Eric
- backported the relevant bits from cbf1de72324a8105ddcc3d9ce9acbc613faea17e for bnx2x
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In LRO mode, bnx2x set gso_size but not gso type.
This leads to crashes in macvtap.
Commit cbf1de72324a8105ddcc3d9ce9acbc613faea17e
queued for 3.9 includes a more complete fix.
This is a minimal patch to avoid the crash, for 3.8.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qlcnic set gso_size but not gso type. This leads to crashes
in macvtap.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>