Tim Chen [Mon, 22 Aug 2011 14:57:26 +0000 (14:57 +0000)]
Scm: Remove unnecessary pid & credential references in Unix socket's send and receive path
Patch series 109f6e39..7361c36c back in 2.6.36 added functionality to
allow credentials to work across pid namespaces for packets sent via
UNIX sockets. However, the atomic reference counts on pid and
credentials caused plenty of cache bouncing when there are numerous
threads of the same pid sharing a UNIX socket. This patch mitigates the
problem by eliminating extraneous reference counts on pid and
credentials on both send and receive path of UNIX sockets. I found a 2x
improvement in hackbench's threaded case.
On the receive path in unix_dgram_recvmsg, there is currently an
increment of the reference counts on pid and credentials in scm_set_cred.
Then there are two decrements of the reference counts: once in scm_recv
and once when skb_free_datagram calls the skb->destructor function
unix_destruct_scm. One pair of increment and decrement of the ref counts on
pid and credentials can be eliminated from the receive path, because the
reference taken when the skb was created on the send side is held until
the skb is destroyed.
On the send path, there are two increments of ref count on pid and
credentials, once in scm_send and once in unix_scm_to_skb. Then there
is a decrement of the reference counts in scm_destroy's call to
scm_destroy_cred at the end of unix_dgram_sendmsg functions. One pair
of increment and decrement of the reference counts can be removed so we
only need to increment the ref counts once.
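The reference-count arithmetic described above can be tallied with a small model. This is only an illustrative sketch in Python, not kernel code: the operation lists are transcribed from the description in this message, and the names BEFORE, AFTER and tally are made up for the example.

```python
# Atomic refcount operations per datagram, as described in the commit
# message. +1 is an increment, -1 a decrement; each applies to both the
# pid and the credentials.
BEFORE = {
    "send": [+1, +1, -1],   # scm_send, unix_scm_to_skb, scm_destroy
    "recv": [+1, -1, -1],   # scm_set_cred, scm_recv, unix_destruct_scm
}
AFTER = {
    "send": [+1],           # the single reference held on behalf of the skb
    "recv": [-1],           # the one remaining decrement
}


def tally(paths):
    """Return (number of atomic operations, net refcount change)."""
    ops = [op for path in paths.values() for op in path]
    return len(ops), sum(ops)
```

In this model both variants leave the reference count balanced, but the number of atomic operations per datagram drops from six to two, which is where the reduction in cache-line bouncing would come from.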
By incorporating these changes, for hackbench running on a 4 socket
NHM-EX machine with 40 cores, the execution of hackbench on
50 groups of 20 threads sped up by a factor of 2.
Hackbench command used for testing:
./hackbench 50 thread 2000
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements Proportional Rate Reduction (PRR) for TCP.
PRR is an algorithm that determines TCP's sending rate in fast
recovery. PRR avoids excessive window reductions and aims for
the actual congestion window size at the end of recovery to be as
close as possible to the window determined by the congestion control
algorithm. PRR also improves accuracy of the amount of data sent
during loss recovery.
The patch implements the recommended flavor of PRR called PRR-SSRB
(Proportional rate reduction with slow start reduction bound) and
replaces the existing rate halving algorithm. PRR improves upon the
existing Linux fast recovery under a number of conditions including:
1) burst losses where the losses implicitly reduce the amount of
outstanding data (pipe) below the ssthresh value selected by the
congestion control algorithm and,
2) losses near the end of short flows where the application runs out of
data to send.
As an example, with the existing rate halving implementation a single
loss event can cause a connection carrying short Web transactions to
go into the slow start mode after the recovery. This is because during
recovery Linux pulls the congestion window down to packets_in_flight+1
on every ACK. A short Web response often runs out of new data to send
and its pipe reduces to zero by the end of recovery when all its packets
are drained from the network. Subsequent HTTP responses using the same
connection will have to slow start to raise cwnd to ssthresh. PRR on
the other hand aims for the cwnd to be as close as possible to ssthresh
by the end of recovery.
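A rough sketch of the per-ACK computation may help here. It follows the PRR-SSRB pseudocode as published in the IETF draft referenced in this message, written from memory in Python rather than taken from the kernel patch; the function name and parameter names are assumptions, and all quantities are counted in packets so that one MSS equals 1.

```python
def prr_sndcnt(prr_delivered, prr_out, ssthresh, recover_fs, pipe, delivered):
    """Packets that may be sent in response to one ACK during fast recovery.

    prr_delivered: packets delivered since recovery started, including
                   those newly reported by this ACK
    prr_out:       packets sent since recovery started
    ssthresh:      target window chosen by the congestion control algorithm
    recover_fs:    packets in flight when recovery started (must be > 0)
    pipe:          current estimate of packets in flight
    delivered:     packets newly delivered by this ACK
    """
    if pipe > ssthresh:
        # Proportional rate reduction:
        # ceil(prr_delivered * ssthresh / recover_fs) - prr_out
        sndcnt = (prr_delivered * ssthresh + recover_fs - 1) // recover_fs
        sndcnt -= prr_out
    else:
        # Slow start reduction bound: grow back toward ssthresh, but by no
        # more than what was just delivered plus one packet.
        limit = max(prr_delivered - prr_out, delivered) + 1
        sndcnt = min(ssthresh - pipe, limit)
    return max(sndcnt, 0)
```

The congestion window for this ACK would then be pipe + prr_sndcnt(...). For example, with 20 packets in flight at the start of recovery and ssthresh of 10, the proportional branch releases roughly one packet for every two delivered.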
A description of PRR and a discussion of its performance can be found at
the following links:
- IETF Draft:
http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01
- IETF Slides:
http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf
http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf
- Paper to appear in Internet Measurements Conference (IMC) 2011:
Improving TCP Loss Recovery
Nandita Dukkipati, Matt Mathis, Yuchung Cheng
Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Blocks can be configured with non-static frame-size.
2) Read/poll is at a block-level (as opposed to packet-level).
3) Added poll timeout to avoid indefinite user-space wait on idle links.
4) Added user-configurable knobs:
4.1) block::timeout.
4.2) tpkt_hdr::sk_rxhash.
Changes:
C1) tpacket_rcv()
C1.1) packet_current_frame() is replaced by packet_current_rx_frame()
The bulk of the processing is then moved in the following chain:
packet_current_rx_frame()
  __packet_lookup_frame_in_block
    fill_curr_block()
    or
    retire_current_block
    dispatch_next_block
  or
  return NULL (queue is plugged/paused)
Signed-off-by: Chetan Loke <loke.chetan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides base support for transmission of IPv6 packets as
well as the formation of IPv6 link-local addresses and statelessly
autoconfigured addresses on top of IEEE 802.15.4 networks.
For more information please look at RFC4944, "Compression Format
for IPv6 Datagrams in Low Power and Lossy Networks (6LoWPAN)".
Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Mon, 22 Aug 2011 23:45:01 +0000 (23:45 +0000)]
net: xfrm: convert to SKB frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Mon, 22 Aug 2011 23:44:58 +0000 (23:44 +0000)]
net: convert core to skb paged frag APIs
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 22 Aug 2011 19:41:53 +0000 (19:41 +0000)]
be2net: fix erx->rx_drops_no_frags wrap around
The rx_drops_no_frags HW counter for RSS rings is 16 bits wide in HW and can
wrap around often. Maintain a 32-bit accumulator in the driver to prevent
frequent wraparound.
Also incorporated Eric's feedback to use ACCESS_ONCE() for the accumulator
write.
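The accumulator idea can be sketched as follows. This is an illustrative Python model, not the driver code; the function name is hypothetical, and it assumes the hardware counter is sampled at least once per wrap.

```python
def accumulate_16bit(acc, hw_val):
    """Fold a wrapping 16-bit hardware counter into a 32-bit accumulator.

    acc:    previously accumulated 32-bit value
    hw_val: current reading of the 16-bit hardware counter
    """
    lo = acc & 0xFFFF
    hi = acc >> 16
    if hw_val < lo:
        # The reading went backwards, so the hardware counter wrapped once.
        hi += 1
    # Keep the result within 32 bits, as a u32 would in C.
    return ((hi << 16) | hw_val) & 0xFFFFFFFF
```

The low 16 bits always mirror the latest hardware reading, while the high 16 bits count how many wraps have been observed.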
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla [Mon, 22 Aug 2011 19:41:51 +0000 (19:41 +0000)]
be2net: Fix race in posting rx buffers.
There is a possibility of be_post_rx_frags() being called simultaneously from
both be_worker() (when rx_post_starved) and be_poll_rx() (when rxq->used is 0).
This can be avoided by posting rx buffers only when some completions have been
reaped.
Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 24 Aug 2011 10:41:19 +0000 (10:41 +0000)]
rps: support IPIP encapsulation
Skip IPIP header to get proper layer-4 information.
Like GRE tunnels, this only works if rxhash is not already provided by
the device itself (ethtool -K ethX rxhash off), so that the kernel computes
a software rxhash.
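As a rough illustration of "skipping the IPIP header", the sketch below finds the layer-4 tuple of an IPv4 packet, looking through one level of IPv4-in-IPv4 encapsulation. It is a simplified Python model with a hypothetical function name; it ignores fragments and IP options beyond honoring the header length field, and it does not reproduce the kernel's hashing.

```python
import struct

IPPROTO_IPIP = 4    # IPv4-in-IPv4 encapsulation
IPPROTO_TCP = 6
IPPROTO_UDP = 17


def l4_tuple(packet):
    """Return (src_ip, dst_ip, src_port, dst_port) or None.

    packet is a bytes object starting at the (outer) IPv4 header.
    """
    if len(packet) < 20:
        return None
    offset = 0
    if packet[9] == IPPROTO_IPIP:
        # Skip the outer header; its length is the IHL field in 32-bit words.
        offset = (packet[0] & 0x0F) * 4
        if len(packet) < offset + 20:
            return None
    ihl = (packet[offset] & 0x0F) * 4
    proto = packet[offset + 9]
    if proto not in (IPPROTO_TCP, IPPROTO_UDP):
        return None
    l4 = offset + ihl
    if len(packet) < l4 + 4:
        return None
    src, dst = struct.unpack_from("!II", packet, offset + 12)
    sport, dport = struct.unpack_from("!HH", packet, l4)
    return src, dst, sport, dport
```

Hashing the inner tuple rather than the outer addresses lets different flows inside the same tunnel spread across CPUs.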
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ian Campbell [Fri, 19 Aug 2011 06:25:00 +0000 (06:25 +0000)]
net: add APIs for manipulating skb page fragments.
The primary aim is to add skb_frag_(ref|unref) in order to remove the use of
bare get/put_page on SKB pages fragments and to isolate users from subsequent
changes to the skb_frag_t data structure.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 22 Aug 2011 19:43:22 +0000 (12:43 -0700)]
net: vlan: goto another_round instead of calling __netif_receive_skb
Now that headers are reset right away when the vlan tag is stripped from
the skb (untagging) in the non-accelerated path, benefit from that and avoid
calling __netif_receive_skb recursively; just use another_round.
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper Juhl [Mon, 22 Aug 2011 18:30:38 +0000 (11:30 -0700)]
net/wan/hdlc_ppp: use break in switch
We'll either hit one of the case labels or the default in the switch,
and in all cases we then 'goto out'; we also have a 'goto out'
after the switch that is redundant. Change to just use break in the
case statements and leave the 'goto out' after the switch for everyone to
hit.
Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David S. Miller <davem@davemloft.net>
batman-adv: print client flags in the local/global transtables output
Since clients can have several flags on or off, this patch makes them
appear in the local/global transtable output so that they can be checked
for debugging purposes.
Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
batman-adv: hash_add() has to discriminate on the return value
hash_add() returns 0 on success, but returns -1 both on error and when the
entry is already present. The caller could use this information to select
its behaviour. For this reason it is useful that hash_add() returns -1
in case of error and returns 1 when the entry is already present.
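The three-way return convention can be illustrated with a toy table. This is a Python sketch under stated assumptions, not the batman-adv implementation: the table is a plain dict, and the max_entries limit is a hypothetical stand-in for an allocation failure in C.

```python
def hash_add(table, key, value, max_entries=1024):
    """Insert key -> value into table (a dict).

    Returns 0 if the entry was added, 1 if the key was already present,
    and -1 on error (here: the hypothetical capacity limit was reached).
    """
    if key in table:
        return 1
    if len(table) >= max_entries:
        return -1
    table[key] = value
    return 0
```

A caller can then treat a return of 1 as "nothing to do" and reserve its error handling for -1 only.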
Signed-off-by: Antonio Quartulli <ordex@autistici.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
This change allows driver-specific debug messages to be output by
providing a module parameter. Because the maximum level of verbosity
is too high, it is demoted by default.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:24 +0000 (13:58 +0000)]
tg3: Update version to 3.120
This patch updates the tg3 version to 3.120.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:23 +0000 (13:58 +0000)]
tg3: Add external loopback support to selftest
This patch adds external loopback support to tg3's ethtool selftest.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:22 +0000 (13:58 +0000)]
tg3: Restructure tg3_test_loopback
The tg3_test_loopback() function is starting to get more complicated as
more loopback tests are added. This patch cleans up the code.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:21 +0000 (13:58 +0000)]
tg3: Pull phy int lpbk setup into separate func
This patch pulls out the internal phy loopback setup code into a
separate function. This cleans up the loopback test code and makes it
available for NETIF_F_LOOPBACK support later.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:20 +0000 (13:58 +0000)]
tg3: Consolidate MAC loopback code
The driver puts the device into MAC loopback in two places in the
driver. This patch consolidates the code into a single routine.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Carlson [Fri, 19 Aug 2011 13:58:19 +0000 (13:58 +0000)]
tg3: Remove dead code
Now that CPMU devices don't do MAC loopback, all the CPMU power saving
mode adjustments are unneeded. This patch removes the dead code.
Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Reviewed-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 29 Jun 2011 05:43:22 +0000 (05:43 +0000)]
ixgbe: Cleanup FCOE and VLAN handling in xmit_frame_ring
This change is meant to further cleanup the transmit path by streamlining
some of the VLAN and FCOE/DCB tasks in the transmit path. In addition it
adds code to support software VLANs in the event that they are used in
conjunction with DCB and/or FCOE.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 15 Jul 2011 02:31:30 +0000 (02:31 +0000)]
ixgbe: replace reference to CONFIG_FCOE with IXGBE_FCOE
CONFIG_FCOE is not the correct define to check since it is possible for it
to be CONFIG_FCOE_MODULE, as such the reference to it should be replaced
with IXGBE_FCOE.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Fri, 15 Jul 2011 02:31:25 +0000 (02:31 +0000)]
ixgbe: Refactor transmit map and cleanup routines
This patch implements a partial refactor of the TX map/queue and cleanup
routines. It merges the map and queue functionality and as a result
improves the transmit performance by avoiding unnecessary reads from memory.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Amir Hanania [Thu, 28 Apr 2011 08:47:23 +0000 (08:47 +0000)]
ixgbe - DDP last user buffer - error to warn
Change the error message in the last DDP user buffer to warn_once
Signed-off-by: Amir Hanania <amir.hanania@intel.com> Tested-by: Ross Brattain <ross.b.brattain@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com> Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bruce Allan [Fri, 29 Jul 2011 05:53:12 +0000 (05:53 +0000)]
e1000e: bump driver version number
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 22 Jul 2011 06:21:46 +0000 (06:21 +0000)]
e1000e: convert driver to use extended descriptors
Some features currently not supported by the driver (e.g. RSS) require the
use of extended descriptors, but the driver is set up to only use legacy
descriptors in all modes except for when jumbo frames are enabled on some
parts. Convert the driver to always use extended descriptors in order to
enable the forthcoming support of these other features.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changli Gao [Fri, 19 Aug 2011 05:07:54 +0000 (22:07 -0700)]
net: rps: support 802.1Q
For the 802.1Q packets, if the NIC doesn't support hw-accel-vlan-rx, RPS
won't inspect the internal 4 tuples to generate skb->rxhash, so this kind
of traffic can't get any benefit from RPS.
This patch adds the support for 802.1Q to RPS.
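To make the 802.1Q case concrete, the sketch below locates the network header of an Ethernet frame, stepping over a single VLAN tag when one is present. It is a simplified Python illustration with a hypothetical function name, not the RPS code itself.

```python
import struct

ETH_P_8021Q = 0x8100   # 802.1Q tag protocol identifier
ETH_HLEN = 14          # destination MAC + source MAC + EtherType
VLAN_HLEN = 4          # tag control information + encapsulated EtherType


def network_header(frame):
    """Return (ethertype, offset of the L3 header), or None if truncated."""
    if len(frame) < ETH_HLEN:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = ETH_HLEN
    if ethertype == ETH_P_8021Q:
        if len(frame) < ETH_HLEN + VLAN_HLEN:
            return None
        # The real EtherType follows the 2-byte tag control information.
        (ethertype,) = struct.unpack_from("!H", frame, 16)
        offset += VLAN_HLEN
    return ethertype, offset
```

Once the offset of the IP header is known, the usual 4-tuple inspection can proceed as it does for untagged frames.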
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Schmidt [Fri, 19 Aug 2011 04:51:01 +0000 (21:51 -0700)]
bnx2x: downgrade Max BW error message to debug
There are valid configurations where Max BW is configured to zero for
some VNs.
Print the message only if debugging is enabled and do not call the
configuration "illegal".
[v2: use DP(), not BNX2X_DBG_ERR(); recommended by Eilon Greenstein.]
Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The conversion was incomplete, so even if the driver was added to
the .config it wasn't built (but there were no errors). In this commit
we also update the various defconfigs that use EMAC to use the new
Kconfig symbol, and explicitly add the NET_VENDOR_IBM guard.
We do not explicitly select the Kconfig dependencies, as this would force
EMAC on. Doing it in the defconfig allows more flexibility.
Tested on a canyonlands board.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 19 Aug 2011 04:29:27 +0000 (21:29 -0700)]
vlan: reset headers on accel emulation path
It is necessary after all to reset headers here. The reason is that we
cannot depend on them getting reset in __netif_receive_skb once the skb is
reinjected. For incoming vlanids without a vlan_dev, vlan_do_receive()
returns false with skb != NULL and __netif_receive_skb continues; the skb is
not reinjected.
This might be good material for 3.0-stable as well
Reported-by: Mike Auty <mike.auty@gmail.com> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 18 Aug 2011 06:50:37 +0000 (23:50 -0700)]
forcedeth: call vlan_mode only if hw supports vlans
If the hw does not support vlans, don't call nv_vlan_mode because there is no point.
I believe that this should fix issues on older chips without vlan support
(like Ingo has).
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:24 +0000 (17:32 +0000)]
flexcan: Add flexcan device support for p1010rdb.
Allow the p1010 processor to select the flexcan network driver.
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>, Acked-by: Wolfgang Grandegger <wg@grandegger.com>, Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: socketcan-core@lists.berlios.de, Cc: netdev@vger.kernel.org, Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:23 +0000 (17:32 +0000)]
flexcan: Prefer device tree clock frequency if available.
If our CAN device's device tree node has a clock-frequency property,
then use that value for the CAN device's clock frequency. If not, fall
back to asking the platform/mach code for the clock frequency associated
with the flexcan device.
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Wolfgang Grandegger <wg@grandegger.com>, Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Marc Kleine-Budde <mkl@pengutronix.de>, Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: socketcan-core@lists.berlios.de, Cc: netdev@vger.kernel.org, Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Cc: devicetree-discuss@lists.ozlabs.org Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:22 +0000 (17:32 +0000)]
flexcan: Add of_match to platform_device definition.
On powerpc, the OpenFirmware devices are not matched without specifying
an of_match array. Introduce that array as that is used for matching
on the Freescale P1010 processor.
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: socketcan-core@lists.berlios.de Cc: netdev@vger.kernel.org Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Cc: devicetree-discuss@lists.ozlabs.org Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:21 +0000 (17:32 +0000)]
flexcan: Fix up fsl-flexcan device tree binding.
This patch cleans up the documentation of the device-tree binding for
the Flexcan devices on Freescale's PowerPC and ARM cores. Extra
properties are not used by the driver so we are removing them.
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>, Acked-by: Wolfgang Grandegger <wg@grandegger.com>, Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: Scott Wood <scottwood@freescale.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: socketcan-core@lists.berlios.de, Cc: netdev@vger.kernel.org, Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Cc: devicetree-discuss@lists.ozlabs.org Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:20 +0000 (17:32 +0000)]
flexcan: Abstract off read/write for big/little endian.
Make the flexcan driver handle register reads in the appropriate endianness.
This was a basic search and replace and then define some inlines.
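The effect of such accessor helpers can be modeled in a few lines. This is a Python sketch only: reg_read and reg_write are hypothetical names, and a bytearray stands in for the memory-mapped register window.

```python
import struct


def reg_read(regs, offset, big_endian):
    """Read a 32-bit register from a bytes-like register window."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack_from(fmt, regs, offset)[0]


def reg_write(regs, offset, value, big_endian):
    """Write a 32-bit register into a bytearray register window."""
    fmt = ">I" if big_endian else "<I"
    struct.pack_into(fmt, regs, offset, value)
```

Routing every access through one pair of helpers means the byte order is chosen in a single place instead of at each call site.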
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: socketcan-core@lists.berlios.de Cc: netdev@vger.kernel.org Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
holt@sgi.com [Tue, 16 Aug 2011 17:32:19 +0000 (17:32 +0000)]
flexcan: Remove #include <mach/clock.h>
powerpc does not have a mach-####/clock.h. When testing, I found neither
arm nor powerpc needed the mach/clock.h at all so I removed it.
Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Wolfgang Grandegger <wg@grandegger.com> Cc: U Bhaskar-B22300 <B22300@freescale.com> Cc: socketcan-core@lists.berlios.de Cc: netdev@vger.kernel.org Cc: PPC list <linuxppc-dev@lists.ozlabs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Tue, 16 Aug 2011 06:29:00 +0000 (06:29 +0000)]
net: introduce IFF_UNICAST_FLT private flag
Use IFF_UNICAST_FLT to find out if the driver handles unicast address
filtering. If it does not, promisc mode is entered.
The patch also fixes the following drivers:
stmmac, niu: support uc filtering and yet propagated
ndo_set_multicast_list
bna, benet, pxa168_eth, ks8851, ks8851_mll, ksz884x: have set
ndo_set_rx_mode but do not support uc filtering
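The decision this flag enables can be sketched as below. This is an illustrative Python model, not the kernel logic: the bit value assigned to IFF_UNICAST_FLT is a placeholder, and needs_promisc is a hypothetical helper name.

```python
IFF_UNICAST_FLT = 0x20000   # placeholder bit value, assumed for this sketch


def needs_promisc(priv_flags, extra_unicast_addrs):
    """Return True if the device has to fall back to promiscuous mode in
    order to receive traffic for its additional unicast addresses."""
    if priv_flags & IFF_UNICAST_FLT:
        # The driver filters unicast addresses itself.
        return False
    return len(extra_unicast_addrs) > 0
```

A device without the flag and without extra unicast addresses stays out of promiscuous mode, so the fallback only costs something when it is actually needed.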
Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Mon, 15 Aug 2011 22:33:34 +0000 (22:33 +0000)]
via-velocity: remove non-tagged packet filtering
It's undesired to filter untagged packets at any time. So simply remove this.
Reported-by: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de> Tested-by: Stephan Bärwolf <stephan.baerwolf@tu-ilmenau.de> Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Pan(潘卫平) [Mon, 15 Aug 2011 15:57:35 +0000 (15:57 +0000)]
bonding:reset backup and inactive flag of slave
Eduard Sinelnikov (eduard.sinelnikov@gmail.com) found that if we change
the bonding mode from active backup to round robin, some slaves still keep
the "backup" flag and won't transmit packets.
Jay Vosburgh (fubar@us.ibm.com) pointed out that we can work around that by
removing the bond_is_active_slave() check, because the "backup" flag is only
meaningful in active backup mode.
But if we simply drop the bond_is_active_slave() check,
transmission will work fine, but we can't maintain the correct value of the
"backup" flag for each slave, though it is meaningless in modes other than
active backup.
I'd like to reset the "backup" and "inactive" flags in bond_open,
so that we keep the correct values for them.
As for bond_is_active_slave(), I'd like to prepare another patch to handle it.
V2:
Use C style comment.
Move read_lock(&bond->curr_slave_lock).
Replace restore with reset, for active backup mode, it means "restore",
but for other modes, it means "reset".
Signed-off-by: Weiping Pan <panweiping3@gmail.com> Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 14 Aug 2011 19:46:29 +0000 (19:46 +0000)]
rps: Inspect GRE encapsulated packets to get flow hash
Crack open GRE packets in __skb_get_rxhash to compute a 4-tuple hash on
the encapsulated packet. Note that this is used only when the
__skb_get_rxhash path is taken, in particular only when the device does
not provide the rxhash (i.e. the feature is disabled).
This was tested by creating a single GRE tunnel between two 16 core
AMD machines. 200 netperf TCP_RR streams were run with 1 byte
request and response size.
Without patch: 157497 tps, 50/90/99% latencies 1250/1292/1364 usecs
With patch: 325896 tps, 50/90/99% latencies 603/848/1169 usecs
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 14 Aug 2011 19:45:55 +0000 (19:45 +0000)]
rps: Add flag to skb to indicate rxhash is based on L4 tuple
The l4_rxhash flag was added to the skb structure to indicate
that the rxhash value was computed over the 4 tuple for the
packet which includes the port information in the encapsulated
transport packet. This is used by the stack to preserve the
rxhash value in __skb_rx_tunnel.
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 14 Aug 2011 12:16:21 +0000 (12:16 +0000)]
bnx2x: Use pr_fmt and message logging cleanups
Add pr_fmt(fmt) KBUILD_MODNAME ": " to prefix messages with "bnx2x: ".
Remove #define DP_LEVEL and use pr_notice.
Repeating KERN_<LEVEL> isn't necessary in multi-line printks.
printk macro neatening, use fmt and ##__VA_ARGS__.
Coalesce long formats.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Sun, 14 Aug 2011 12:16:20 +0000 (12:16 +0000)]
bnx2x: Coalesce pr_cont uses and fix DP typos
Uses of pr_cont should be avoided where reasonably possible
because they can be interleaved by other threads and processes.
Coalesce pr_cont uses.
Fix typos, duplicated words and spacing in DP uses caused
by split multi-line formats. Coalesce some of these
split formats. Add missing terminating newlines to DP uses.
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Mon, 15 Aug 2011 14:07:47 +0000 (14:07 +0000)]
ethtool: Correct description of 'max_coalesced_frames' fields
The current descriptions state that these fields specify 'How many
packets to delay ... after a packet ...' which implies that the
hardware should wait for (max_coalesced_frames + 1) completions before
generating an interrupt. It is also stated that setting both this
field and the corresponding 'coalesce_usecs' field to 0 is invalid.
Together, this implies that the hardware must always be configured
to delay a completion IRQ for at least 1 usec or 1 more completion.
I believe that the addition of 1 is not intended, and David Miller
confirms that the original implementation (in tg3) does not do this.
Clarify the descriptions of these fields to avoid this interpretation.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sit tunnels: propagate IPv6 traffic class to IPv4 Type of Service
sit tunnels (IPv6 tunnel over IPv4) do not implement the "tos inherit"
case to copy the IPv6 traffic class byte from the inner packet to
the IPv4 type of service byte in the outer packet. By contrast, ipip
tunnels and GRE tunnels do.
This patch, adapted from the similar code in net/ipv4/ipip.c and
net/ipv4/ip_gre.c, implements that.
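For illustration, the byte being propagated sits in the first 32-bit word of the IPv6 header: 4 bits of version, 8 bits of Traffic Class, then 20 bits of flow label. The Python sketch below extracts it and applies an "inherit" choice; both function names and the boolean inherit parameter are assumptions made for the example, not the tunnel code.

```python
import struct


def ipv6_traffic_class(ipv6_header):
    """Extract the 8-bit Traffic Class from an IPv6 header (bytes)."""
    (word,) = struct.unpack_from("!I", ipv6_header, 0)
    return (word >> 20) & 0xFF


def outer_tos(configured_tos, inherit, ipv6_header):
    """Choose the TOS byte for the outer IPv4 header."""
    if inherit:
        return ipv6_traffic_class(ipv6_header)
    return configured_tos
```

With inheritance enabled, an inner packet marked with DSCP EF (Traffic Class 0xB8) would carry the same marking on the outer IPv4 header.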
This patch applies to 3.0.1, and has been tested on that version.
Signed-off-by: Lionel Elie Mamane <lionel@mamane.lu> Signed-off-by: David S. Miller <davem@davemloft.net>
Wang Shaoyan [Thu, 11 Aug 2011 17:07:25 +0000 (17:07 +0000)]
gianfar: reduce stack usage in gianfar_ethtool.c
drivers/net/gianfar_ethtool.c:765: warning: the frame size of 2048 bytes is larger than 1024 bytes
Signed-off-by: Wang Shaoyan <wangshaoyan.pt@taobao.com> Reviewed-and-tested-by: Sebastian Pöhn <sebastian.poehn@belden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
error: expected '=', ',', ';', 'asm' or '__attribute__' before 'sh_eth_interrupt'
error: implicit declaration of function 'request_irq'
error: 'sh_eth_interrupt' undeclared (first use in this function)
error: (Each undeclared identifier is reported only once
drivers/net/sh_eth.c:1386: error: for each function it appears in.)
error: 'IRQF_SHARED' undeclared (first use in this function)
error: implicit declaration of function 'free_irq'
Signed-off-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> CC: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In this code, the failure_cleanup label calls the function
plx_pci_del_card, which frees everything in the card->net_dev array. dev
is placed in this array immediately after allocation, so the two subsequent
jumps to failure_cleanup should not also call free_sja1000dev, but the
second one does.
If plx_pci_check_sja1000 fails, then free_sja1000dev is also called on
dev. Because dev is already in the card->net_dev array, this implies that
when plx_pci_del_card is later called, it may get freed again. So that
entry is reset to NULL after the free.
Finally, if there is a problem with one channel, there will be a hole in the
array. card->channels counts the number of channels that have succeeded,
and does not keep track of the index of the largest element in the array
that is valid. So the loop in plx_pci_del_card is changed to go up to
PLX_PCI_MAX_CHAN, which is only 2.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Boyer [Mon, 8 Aug 2011 02:34:07 +0000 (02:34 +0000)]
usbnet/cdc_ncm: Don't use stack variables for DMA
The cdc_ncm driver still has a few places where stack variables are
passed to the cdc_ncm_do_request function. This triggers a stack trace in
lib/dma-debug.c if the CONFIG_DEBUG_DMA_API option is set.
Adjust these calls to pass parameters that have been allocated with
kzalloc.
Signed-off-by: Josh Boyer <jwboyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>