Herbert Xu [Fri, 18 Nov 2011 02:20:04 +0000 (02:20 +0000)]
net: Remove all uses of LL_ALLOCATED_SPACE
net: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE with the macro
LL_RESERVED_SPACE and direct reference to needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 18 Nov 2011 02:20:04 +0000 (02:20 +0000)]
ipv6: Remove all uses of LL_ALLOCATED_SPACE
ipv6: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 18 Nov 2011 02:20:04 +0000 (02:20 +0000)]
ipv4: Remove all uses of LL_ALLOCATED_SPACE
ipv4: Remove all uses of LL_ALLOCATED_SPACE
The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the
alignment to the sum of needed_headroom and needed_tailroom. As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.
This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.
This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Thu, 17 Nov 2011 09:38:23 +0000 (09:38 +0000)]
net-forcedeth: fix possible stats inaccuracies on 32b hosts
The software stats are updated from BH, this change ensures that 32b
UP hosts use appropriate protection.
Tested:
- HW/SW stats consistent with pktgen (in particular tx=rx)
- HW/SW stats consistent when tx/rx offloads disabled
- no problem with+without lockdep (SMP 16-way)
Signed-off-by: David Decotigny <david.decotigny@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2011 07:30:05 +0000 (07:30 +0000)]
bnx2: switch to build_skb() infrastructure
This is very similar to bnx2x conversion, but bnx2 only requires 16bytes
alignement at start of the received frame to store its l2_fhdr, so goal
was not to reduce skb truesize (in fact it should not change after this
patch)
Using build_skb() reduces cache line misses in the driver, since we
use cache hot skb instead of cold ones. Number of in-flight sk_buff
structures is lower, they are more likely recycled in SLUB caches
while still hot.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Michael Chan <mchan@broadcom.com> CC: Eilon Greenstein <eilong@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 17 Nov 2011 03:13:26 +0000 (03:13 +0000)]
net: use jump_label to shortcut RPS if not setup
Most machines dont use RPS/RFS, and pay a fair amount of instructions in
netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover
RPS/RFS is not setup.
Add a jump_label named rps_needed.
If no device rps_map or global rps_sock_flow_table is setup,
netif_receive_skb() / netif_rx() do a single instruction instead of many
ones, including conditional jumps.
jmp +0 (if CONFIG_JUMP_LABEL=y)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matti Vaittinen [Wed, 16 Nov 2011 21:18:02 +0000 (21:18 +0000)]
IPV6 Fix a crash when trying to replace non existing route
This patch fixes a crash when non existing IPv6 route is tried to be changed.
When new destination node was inserted in middle of FIB6 tree, no relevant
sanity checks were performed. Later route insertion might have been prevented
due to invalid request, causing node with no rt info being left in tree.
When this node was accessed, a crash occurred.
Patch adds missing checks in fib6_add_1()
Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Wed, 16 Nov 2011 12:15:15 +0000 (12:15 +0000)]
forcedeth: stats updated with a deferrable timer
Mark stats timer as deferrable: punctuality in waking the stats timer
callback doesn't matter much, as it is responsible only to avoid
integer wraparound.
We need at least 1 other timer to fire within 17s (fully loaded 1Gbps)
to avoid wrap-arounds. Desired period is still 10s.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Wed, 16 Nov 2011 12:15:13 +0000 (12:15 +0000)]
forcedeth: implement ndo_get_stats64() API
This commit implements the ndo_get_stats64() API for forcedeth. Since
hardware stats are being updated from different contexts (process and
timer), this commit adds synchronization. For software stats, it
relies on the u64_stats_sync.h API.
Sameer Nanda [Wed, 16 Nov 2011 12:15:12 +0000 (12:15 +0000)]
forcedeth: allow to silence "TX timeout" debug messages
This adds a new module parameter "debug_tx_timeout" to silence most
debug messages in case of TX timeout. These messages don't provide a
signal/noise ratio high enough for production systems and, with ~30kB
logged each time, they tend to add to a cascade effect if the system
is already under stress (memory pressure, disk, etc.).
By default, the parameter is clear, meaning that only a single warning
will be reported.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Wed, 16 Nov 2011 12:15:10 +0000 (12:15 +0000)]
net: new counter for tx_timeout errors in sysfs
This adds the /sys/class/net/DEV/queues/Q/tx_timeout attribute
containing the total number of timeout events on the given queue. It
is always available with CONFIG_SYSFS, independently of
CONFIG_RPS/XPS.
Credits to Stephen Hemminger for a preliminary version of this patch.
Tested:
without CONFIG_SYSFS (compilation only)
with sysfs and without CONFIG_RPS & CONFIG_XPS
with sysfs and without CONFIG_RPS
with sysfs and without CONFIG_XPS
with defaults
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Wed, 16 Nov 2011 12:15:08 +0000 (12:15 +0000)]
net-sysfs: fixed minor sparse warning
This commit fixes following warning:
net/core/net-sysfs.c:921:6: warning: symbol 'numa_node' shadows an earlier one
include/linux/topology.h:222:1: originally declared here
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
david decotigny [Wed, 16 Nov 2011 12:15:07 +0000 (12:15 +0000)]
forcedeth: fix stats on hardware without extended stats support
This change makes sure that tx_packets/rx_bytes ifconfig counters are
updated even on NICs that don't provide hardware support for these
stats: they are now updated in software. For the sake of consistency,
we also now have tx_bytes updated in software (hardware counters
include ethernet CRC, and software doesn't account for it).
Tested:
pktgen + loopback (http://patchwork.ozlabs.org/patch/124698/)
reports identical tx_packets/rx_packets and tx_bytes/rx_bytes.
Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 898bdf2cb43eb0a962c397eb4dd1aec2c7211be2)
Michał Mirosław [Wed, 16 Nov 2011 14:05:33 +0000 (14:05 +0000)]
net: drivers: use bool type instead of double negation
Save some punctuation by using bool type's property equivalent to
doubled negation operator.
Reported-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Wed, 16 Nov 2011 23:36:59 +0000 (18:36 -0500)]
net: Add ethtool to mii advertisment conversion helpers
Translating between ethtool advertisement settings and MII
advertisements are common operations for ethernet drivers. This patch
adds a set of helper functions that implements the conversion. The
patch then modifies a couple of the drivers to use the new functions.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 16 Nov 2011 11:09:09 +0000 (11:09 +0000)]
team: replicate options on register
Since multiple team instances are putting defined options into their
option list, during register each option must be cloned before added
into list. This resolves uncool memory corruptions when using multiple
teams.
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 15 Nov 2011 22:36:43 +0000 (22:36 +0000)]
6LoWPAN: double free in lowpan_fragment_xmit()
dev_queue_xmit() consumes its own skb, so the call to dev_kfree_skb()
in lowpan_fragment_xmit() is a double free.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To enable VLAN promiscous mode, the HW interface should be created
with VLAN promiscous capability in Lancer. Add this capability during
creation of the HW interface.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lancer uses V1 version of TXQ create. This command needs interface
id for TX queue creation. Rearrange code such that tx queue create
is after interface create. As TXQ create is now called after MCC
ring create use MCC instead of MBOX.
Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Nov 2011 15:29:55 +0000 (15:29 +0000)]
net: remove NETIF_F_NO_CSUM feature bit
Only distinct use is checking if NETIF_F_NOCACHE_COPY should be
enabled by default. The check heuristics is altered a bit here,
so it hits other people than before. The default shouldn't be
trusted for performance-critical cases anyway.
For all other uses NETIF_F_NO_CSUM is equivalent to NETIF_F_HW_CSUM.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Nov 2011 15:29:55 +0000 (15:29 +0000)]
net: Define enum for net device features.
Define feature values by bit position instead of direct 2**i values
and force the values to be of type netdev_features_t.
Cleaned and extended from patch by Mahesh Bandewar <maheshb@google.com>:
+ added netdev_features_t casts
+ included bits under NETIF_F_GSO_MASK
+ moved feature #defines out of struct net_device definition
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Nov 2011 15:29:55 +0000 (15:29 +0000)]
net: split netdev features to separate header
Move features definitions to separate header so that linux/skbuff.h won't
need to include linux/netdevice.h after netdev_features_t is introduced.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Nov 2011 15:29:55 +0000 (15:29 +0000)]
net: ethtool: break association of ETH_FLAG_* with NETIF_F_*
This is the only place left where dev->features are directly
exposed to userspace.
I know checkpatch.pl complains about __ethtool_{get,set}_flags(), but
the code is easier to read this way.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michał Mirosław [Tue, 15 Nov 2011 15:29:55 +0000 (15:29 +0000)]
net: remove legacy ethtool ops
As all drivers are converted, we may now remove discrete offload setting
callback handling.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 15 Nov 2011 04:12:55 +0000 (04:12 +0000)]
net: use jump_label for netstamp_needed
netstamp_needed seems a good candidate to jump_label conversion.
This avoids 3 conditional branches per incoming packet in fast path.
No measurable difference, given that these conditional branches are
predicted on modern cpus. Only a small icache reduction, thanks to the
unlikely() stuff.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Mon, 14 Nov 2011 14:17:08 +0000 (14:17 +0000)]
enable virtio_net to return bus_info in ethtool -i consistent with emulated NICs
Add a new .bus_name to virtio_config_ops then modify virtio_net to
call through to it in an ethtool .get_drvinfo routine to report
bus_info in ethtool -i output which is consistent with other
emulated NICs and the output of lspci.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Fri, 21 Oct 2011 20:04:09 +0000 (20:04 +0000)]
igb: Convert bare printk to pr_notice
printks should use KERN_ levels.
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Sat, 29 Oct 2011 06:54:55 +0000 (06:54 +0000)]
ixgbe: fix LED blink logic to check for link
Previously the driver would force link without checking whether the link was
already established. This caused some inconsistencies in the LED blink rate.
Do not force link if link is already up.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jeff Kirsher [Thu, 3 Nov 2011 11:40:28 +0000 (11:40 +0000)]
e1000e: Convert printks to pr_<level>
Based on the original patch from Joe Perches.
Use the current logging styles.
pr_<level> conversions are now prefixed with "e1000e:"
Correct a couple of defects where the trailing NTU may have
been printed on a separate line because of an interleaving
hex_dump.
Remove unnecessary uses of KERN_CONT and use single pr_info()s
to avoid any possible output interleaving from other modules.
Coalesce formats as appropriate.
Remove an extra space from a broken across lines
coalescing of "Link Status " and " Change".
-v2 Remove changes to Copyright string
CC: Joe Perches <joe@perches.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Commit afc4b13d (net: remove use of ndo_set_multicast_list in
drivers) changed e1000e to use the ndo_set_rx_mode entry point,
but didn't implement the unicast address programming
functionality. Implement it to achieve the ability to add unicast
addresses.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
mdio-gpio: Add reset functionality to mdio-gpio driver(v2).
This patch adds phy reset functionality to mdio-gpio driver. Now
mdio_gpio_platform_data has new member as function pointer which can be
filled at the bsp level for a callback from phy infrastructure. Also the
mdio-bitbang driver fills-in the reset function of mii_bus structure.
Without this patch the bsp level code has to takecare of the reseting
PHY's on the bus, which become bit hacky for every bsp and
phy-infrastructure is ignored aswell.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matti Vaittinen [Mon, 14 Nov 2011 00:15:14 +0000 (00:15 +0000)]
IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag
The support for NLM_F_* flags at IPv6 routing requests.
If NLM_F_CREATE flag is not defined for RTM_NEWROUTE request,
warning is printed, but no error is returned. Instead new route is
added. Later NLM_F_CREATE may be required for
new route creation.
Exception is when NLM_F_REPLACE flag is given without NLM_F_CREATE, and
no matching route is found. In this case it should be safe to assume
that the request issuer is familiar with NLM_F_* flags, and does really
not want route to be created.
Specifying NLM_F_REPLACE flag will now make the kernel to search for
matching route, and replace it with new one. If no route is found and
NLM_F_CREATE is specified as well, then new route is created.
Also, specifying NLM_F_EXCL will yield returning of error if matching
route is found.
Patch created against linux-3.2-rc1
Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Mon, 14 Nov 2011 08:13:25 +0000 (08:13 +0000)]
Sweep the last of the active .get_drvinfo floors under ethernet/
This round of floor sweeping converts strncpy calls in various .get_drvinfo
routines to the preferred strlcpy. It also does a modicum of other
cleaning in those routines.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 14 Nov 2011 06:05:34 +0000 (06:05 +0000)]
bnx2x: uses build_skb() in receive path
bnx2x uses following formula to compute its rx_buf_sz :
dev->mtu + 2*L1_CACHE_BYTES + 14 + 8 + 8 + 2
Then core network adds NET_SKB_PAD and SKB_DATA_ALIGN(sizeof(struct
skb_shared_info))
Final allocated size for skb head on x86_64 (L1_CACHE_BYTES = 64,
MTU=1500) : 2112 bytes : SLUB/SLAB round this to 4096 bytes.
Since skb truesize is then bigger than SK_MEM_QUANTUM, we have lot of
false sharing because of mem_reclaim in UDP stack.
One possible way to half truesize is to reduce the need by 64 bytes
(2112 -> 2048 bytes)
Instead of allocating a full cache line at the end of packet for
alignment, we can use the fact that skb_shared_info sits at the end of
skb->head, and we can use this room, if we convert bnx2x to new
build_skb() infrastructure.
skb_shared_info will be initialized after hardware finished its
transfert, so we can eventually overwrite the final padding.
Using build_skb() also reduces cache line misses in the driver, since we
use cache hot skb instead of cold ones. Number of in-flight sk_buff
structures is lower, they are recycled while still hot.
Performance results :
(820.000 pps on a rx UDP monothread benchmark, instead of 720.000 pps)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eilon Greenstein <eilong@broadcom.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Tom Herbert <therbert@google.com> CC: Jamal Hadi Salim <hadi@mojatatu.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Thomas Graf <tgraf@infradead.org> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 14 Nov 2011 06:03:34 +0000 (06:03 +0000)]
net: introduce build_skb()
One of the thing we discussed during netdev 2011 conference was the idea
to change some network drivers to allocate/populate their skb at RX
completion time, right before feeding the skb to network stack.
In old days, we allocated skbs when populating the RX ring.
This means bringing into cpu cache sk_buff and skb_shared_info cache
lines (since we clear/initialize them), then 'queue' skb->data to NIC.
By the time NIC fills a frame in skb->data buffer and host can process
it, cpu probably threw away the cache lines from its caches, because lot
of things happened between the allocation and final use.
So the deal would be to allocate only the data buffer for the NIC to
populate its RX ring buffer. And use build_skb() at RX completion to
attach a data buffer (now filled with an ethernet frame) to a new skb,
initialize the skb_shared_info portion, and give the hot skb to network
stack.
build_skb() is the function to allocate an skb, caller providing the
data buffer that should be attached to it. Drivers are expected to call
skb_reserve() right after build_skb() to adjust skb->data to the
Ethernet frame (usually skipping NET_SKB_PAD and NET_IP_ALIGN, but some
drivers might add a hardware provided alignment)
Data provided to build_skb() MUST have been allocated by a prior
kmalloc() call, with enough room to add SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)) bytes at the end of the data without corrupting
incoming frame.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Eilon Greenstein <eilong@broadcom.com> CC: Ben Hutchings <bhutchings@solarflare.com> CC: Tom Herbert <therbert@google.com> CC: Jamal Hadi Salim <hadi@mojatatu.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Thomas Graf <tgraf@infradead.org> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds listen only mode to the mscan controller.
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 9 Nov 2011 12:07:14 +0000 (12:07 +0000)]
neigh: new unresolved queue limits
Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
> From: David Miller <davem@davemloft.net>
> Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
>
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Wed, 09 Nov 2011 12:14:09 +0100
> >
> >> unres_qlen is the number of frames we are able to queue per unresolved
> >> neighbour. Its default value (3) was never changed and is responsible
> >> for strange drops, especially if IP fragments are used, or multiple
> >> sessions start in parallel. Even a single tcp flow can hit this limit.
> > ...
> >
> > Ok, I've applied this, let's see what happens :-)
>
> Early answer, build fails.
>
> Please test build this patch with DECNET enabled and resubmit. The
> decnet neigh layer still refers to the removed ->queue_len member.
>
> Thanks.
Ouch, this was fixed on one machine yesterday, but not the other one I
used this morning, sorry.
[PATCH V5 net-next] neigh: new unresolved queue limits
unres_qlen is the number of frames we are able to queue per unresolved
neighbour. Its default value (3) was never changed and is responsible
for strange drops, especially if IP fragments are used, or multiple
sessions start in parallel. Even a single tcp flow can hit this limit.
$ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
Signed-off-by: David S. Miller <davem@davemloft.net>
More changes to the recent code to support control of forwarding
database via netlink.
* Support NTF_USE like neighbour table
* Validate state bits from application
* Only send notifications (and change bits) if new entry is
different.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Wed, 9 Nov 2011 09:58:07 +0000 (09:58 +0000)]
Sweep additional floors of strcpy in .get_drvinfo routines
Perform another round of floor sweeping, converting the .get_drvinfo
routines of additional drivers from strcpy to strlcpy along with
some conversion of sprintf to snprintf.
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Fleming [Fri, 11 Nov 2011 05:10:39 +0000 (05:10 +0000)]
fsl_pq_mdio: Clean up tbi address configuration
The code for setting the address of the internal TBI PHY was
convoluted enough without a maze of ifdefs. Clean it up a bit
so we allow the logic to fail down to -ENODEV at the end of
the if/else ladder, rather than using ifdefs to repeat the same
failure code over and over.
Also, remove the support for the auto-configuration. I'm not aware of
anyone using it, and it ends up using the bus mutex before it's been
initialized.
Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sanjay Hortikar [Fri, 11 Nov 2011 16:11:21 +0000 (16:11 +0000)]
net-forcedeth: Add internal loopback support for forcedeth NICs.
Support enabling/disabling/querying internal loopback mode for
forcedeth NICs using ethtool.
Signed-off-by: Sanjay Hortikar <horti@google.com> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes settings for device initialization which makes possible to
use NDISC and TCP.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch disables debug output enabled by default.
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Acked-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 13 Nov 2011 01:24:04 +0000 (01:24 +0000)]
ipv6: reduce percpu needs for icmpv6msg mibs
Reading /proc/net/snmp6 on a machine with a lot of cpus is very
expensive (can be ~88000 us).
This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
values for all possible cpus can read 16 Mbytes of memory (32MBytes on
non x86 arches)
ICMP messages are not considered as fast path on a typical server, and
eventually few cpus handle them anyway. We can afford an atomic
operation instead of using percpu data.
This saves 4096 bytes per cpu and per network namespace.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 11 Nov 2011 22:16:48 +0000 (22:16 +0000)]
net: introduce ethernet teaming device
This patch introduces new network device called team. It supposes to be
very fast, simple, userspace-driven alternative to existing bonding
driver.
Userspace library called libteam with couple of demo apps is available
here:
https://github.com/jpirko/libteam
Note it's still in its dipers atm.
team<->libteam use generic netlink for communication. That and rtnl
suppose to be the only way to configure team device, no sysfs etc.
Python binding of libteam was recently introduced.
Daemon providing arpmon/miimon active-backup functionality will be
introduced shortly. All what's necessary is already implemented in
kernel team driver.
v7->v8:
- check ndo_ndo_vlan_rx_[add/kill]_vid functions before calling
them.
- use dev_kfree_skb_any() instead of dev_kfree_skb()
v6->v7:
- transmit and receive functions are not checked in hot paths.
That also resolves memory leak on transmit when no port is
present
v5->v6:
- changed couple of _rcu calls to non _rcu ones in non-readers
v4->v5:
- team_change_mtu() uses team->lock while travesing though port
list
- mac address changes are moved completely to jurisdiction of
userspace daemon. This way the daemon can do FOM1, FOM2 and
possibly other weird things with mac addresses.
Only round-robin mode sets up all ports to bond's address then
enslaved.
- Extended Kconfig text
v3->v4:
- remove redundant synchronize_rcu from __team_change_mode()
- revert "set and clear of mode_ops happens per pointer, not per
byte"
- extend comment of function __team_change_mode()
v2->v3:
- team_change_mtu() uses rcu version of list traversal to unwind
- set and clear of mode_ops happens per pointer, not per byte
- port hashlist changed to be embedded into team structure
- error branch in team_port_enter() does cleanup now
- fixed rtln->rtnl
v1->v2:
- modes are made as modules. Makes team more modular and
extendable.
- several commenters' nitpicks found on v1 were fixed
- several other bugs were fixed.
- note I ignored Eric's comment about roundrobin port selector
as Eric's way may be easily implemented as another mode (mode
"random") in future.
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:32 +0000 (04:34 +0000)]
bnx2x: update driver version to 1.70.35-0
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 13 Nov 2011 04:34:31 +0000 (04:34 +0000)]
bnx2x: Remove on-stack napi struct variable
Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:30 +0000 (04:34 +0000)]
bnx2x: prevent race in statistics flow
The race may cause access of registers while MAC hw block is
in reset state. As a result syslog will show error messages.
We can prevent this by using state from local variable.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Sun, 13 Nov 2011 04:34:29 +0000 (04:34 +0000)]
bnx2x: add fan failure event handling
Shut down the device in case of fan failure to prevent HW damage.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:28 +0000 (04:34 +0000)]
bnx2x: remove unused #define
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:27 +0000 (04:34 +0000)]
bnx2x: simplify definition of RX_SGE_MASK_LEN and use it.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:26 +0000 (04:34 +0000)]
bnx2x: DCBX: use #define instead of magic
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:25 +0000 (04:34 +0000)]
bnx2x: propagate DCBX negotiation
We need propagate the DCBX results from PMF to other functions
on the same port, in order to properly update netdev structure
and allow following new ETS and PFC configurations.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:24 +0000 (04:34 +0000)]
bnx2x: separate FCoE and iSCSI license initialization.
FCoE license info must be initialized at probe(), but
iSCSI at open().
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:23 +0000 (04:34 +0000)]
bnx2x: remove unused variable
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:22 +0000 (04:34 +0000)]
bnx2x: use rx_queue index for skb_record_rx_queue()
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Kravkov [Sun, 13 Nov 2011 04:34:21 +0000 (04:34 +0000)]
bnx2x: allow FCoE and DCB for 578xx
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 10 Nov 2011 19:18:00 +0000 (19:18 +0000)]
be2net: stop issuing FW cmds if any cmd times out
A FW cmd timeout (with a sufficiently large timeout value in the
order of tens of seconds) indicates an unresponsive FW. In this state
issuing further cmds and waiting for a completion will only stall the process.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Thu, 10 Nov 2011 19:17:57 +0000 (19:17 +0000)]
be2net: init (vf)_if_handle/vf_pmac_id to handle failure scenarios
Initialize if_handle, vf_if_handle and vf_pmac_id with "-1" so that in
failure cases when be_clear() is called, we can skip over
if_destroy/pmac_del cmds if they have not been created.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>