Cyrill Gorcunov [Wed, 21 Jan 2009 23:55:40 +0000 (15:55 -0800)]
net: pppoe,pppol2tp - register channels with explicit net
In PPPo[E|L2TP] we could explicitly point which net namespace
we're going to use for channels - make it so.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
- Each tunnel and appropriate lock are inside own namespace now.
- pppox code allows to create per-namespace sockets for
both PX_PROTO_OE and PX_PROTO_OL2TP protocols. Actually since
now pppox_create support net-namespaces new PPPo... protocols
(if they ever will be) should support net-namespace too otherwise
explicit check for &init_net would be needed.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
- each net-namespace for pppoe module is having own
hash table and appropriate locks wich are allocated
at time of namespace intialization. It requires about
140 bytes of memory for every new namespace but such
approach allow us to escape from hash chains growing
and additional lock contends (especially in SMP environment).
- pppox code allows to create per-namespace sockets for
PX_PROTO_OE protocol only (since at this moment support
for pppol2tp net-namespace is not implemented yet).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 21 Jan 2009 22:42:28 +0000 (14:42 -0800)]
igb: make certain to power on optics for 82576 fiber nics
It appears that a step was missed in the initialization of 82576 fiber nics
that resulted in it not powering on the optics.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 21 Jan 2009 22:42:07 +0000 (14:42 -0800)]
igb: igb should not flag lltx
Igb has flags enabling lltx but this is a holdover from the earlier
e1000 driver which the igb driver was based off of.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Tue, 20 Jan 2009 01:27:06 +0000 (17:27 -0800)]
bnx2: annotate bp->phy_lock functions
It looks like the locking is OK as the locks were being taken before the
various phy setup functions, add the annotations as they release and
reacquire the phy_lock.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 21 Jan 2009 22:39:13 +0000 (14:39 -0800)]
cxgb3: Replace LRO with GRO
This patch makes cxgb3 invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
I've kept the ioctl controls for per-queue LRO switches. However,
we should not encourage anyone to use these.
Because of that, I've also kept the skb construction code in
cxgb3. Hopefully we can phase out those per-queue switches
and then kill this too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mark McLoughlin [Tue, 20 Jan 2009 01:09:49 +0000 (17:09 -0800)]
virtio_net: add link status handling
Allow the host to inform us that the link is down by adding
a VIRTIO_NET_F_STATUS which indicates that device status is
available in virtio_net config.
This is currently useful for simulating link down conditions
(e.g. using proposed qemu 'set_link' monitor command) but
would also be needed if we were to support device assignment
via virtio.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (added future masking) Signed-off-by: David S. Miller <davem@davemloft.net>
Evgeniy Polyakov [Tue, 20 Jan 2009 00:46:02 +0000 (16:46 -0800)]
inet: Allowing more than 64k connections and heavily optimize bind(0) time.
With simple extension to the binding mechanism, which allows to bind more
than 64k sockets (or smaller amount, depending on sysctl parameters),
we have to traverse the whole bind hash table to find out empty bucket.
And while it is not a problem for example for 32k connections, bind()
completion time grows exponentially (since after each successful binding
we have to traverse one bucket more to find empty one) even if we start
each time from random offset inside the hash table.
So, when hash table is full, and we want to add another socket, we have
to traverse the whole table no matter what, so effectivelly this will be
the worst case performance and it will be constant.
Attached picture shows bind() time depending on number of already bound
sockets.
Green area corresponds to the usual binding to zero port process, which
turns on kernel port selection as described above. Red area is the bind
process, when number of reuse-bound sockets is not limited by 64k (or
sysctl parameters). The same exponential growth (hidden by the green
area) before number of ports reaches sysctl limit.
At this time bind hash table has exactly one reuse-enbaled socket in a
bucket, but it is possible that they have different addresses. Actually
kernel selects the first port to try randomly, so at the beginning bind
will take roughly constant time, but with time number of port to check
after random start will increase. And that will have exponential growth,
but because of above random selection, not every next port selection
will necessary take longer time than previous. So we have to consider
the area below in the graph (if you could zoom it, you could find, that
there are many different times placed there), so area can hide another.
Blue area corresponds to the port selection optimization.
This is rather simple design approach: hashtable now maintains (unprecise
and racely updated) number of currently bound sockets, and when number
of such sockets becomes greater than predefined value (I use maximum
port range defined by sysctls), we stop traversing the whole bind hash
table and just stop at first matching bucket after random start. Above
limit roughly corresponds to the case, when bind hash table is full and
we turned on mechanism of allowing to bind more reuse-enabled sockets,
so it does not change behaviour of other sockets.
Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Tested-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 19 Jan 2009 23:20:57 +0000 (15:20 -0800)]
igb: Replace LRO with GRO
This patch makes igb invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
Three things of note:
1) I've kept the LRO Kconfig option until we decide to enable
GRO across the board at which point it can also be killed.
2) The poll_controller stuff is broken in igb as it tries to do
the same work as the normal poll routine. Since poll_controller
can be called in the middle of a poll, this can't be good.
I noticed this because poll_controller can invoke the GRO hooks
without flushing held GRO packets.
However, this should be harmless (assuming the poll_controller
bug above doesn't kill you first :) since the next ->poll will
clear the backlog. The only time when we'll have a problem is
if we're already executing the GRO code on the same ring, but
that's no worse than what happens now.
3) I kept the ip_summed check before calling GRO so that we're
on par with previous behaviour.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Harvey Harrison [Mon, 19 Jan 2009 06:03:01 +0000 (22:03 -0800)]
typhoon: replace users of __constant_{endian}
The base versions handle constant folding just fine, use them
directly.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: David Dillow <dave@thedillows.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 19 Jan 2009 05:50:16 +0000 (21:50 -0800)]
sfc: Replace LRO with GRO
This patch makes sfc invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
Everything should appear identical to the user except that the
offload is now controlled by the GRO ethtool option instead of
LRO. I've kept the lro module parameter as is since that's for
compatibility only.
I have eliminated efx_rx_mk_skb as the GRO layer can take care
of all packets regardless of whether GRO is enabled or not.
So the only case where we don't call GRO is if the packet checksum
is absent. This is to keep the behaviour changes of the patch to
a minimum.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 19 Jan 2009 05:49:45 +0000 (21:49 -0800)]
ixgbe: Replace LRO with GRO
This patch makes ixgbe invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
As GRO uses the napi structure to track the held packets, I've
modified the code paths involved to pass that along.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 16 Jan 2009 23:36:33 +0000 (23:36 +0000)]
dccp: Debugging functions for feature negotiation
Since all feature-negotiation processing now takes place in feat.c,
functions for producing verbose debugging output are concentrated
there.
New functions to print out values, entry records, and options are
provided, and also a macro is defined to not always have the function
name in the output line.
Thanks a lot to Wei Yongjun and Giuseppe Galeota for help and
discussion with an earlier revision of this patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 16 Jan 2009 23:36:32 +0000 (23:36 +0000)]
dccp: Initialisation and type-checking of feature sysctls
This patch takes care of initialising and type-checking sysctls
related to feature negotiation. Type checking is important since some
of the sysctls now directly impact the feature-negotiation process.
The sysctls are initialised with the known default values for each
feature. For the type-checking the value constraints from RFC 4340
are used:
* Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
tested and confirmed that it works up to 4294967295 - for Gbps speed;
* Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
* CCIDs are between 0 .. 255;
* request_retries, retries1, retries2 also between 0..255 for good measure;
* tx_qlen is checked to be non-negative;
* sync_ratelimit remains as before.
Notes:
------
1. Die s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
2. As pointed out by Arnaldo, the pattern of type-checking repeats itself in
other places, sometimes with exactly the same kind of definitions (e.g.
"static int zero;"). It may be a good idea (kernel janitors?) to consolidate
type checking. For the sake of keeping the changeset small and in order not
to affect other subsystems, I have not strived to generalise here.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 16 Jan 2009 23:36:31 +0000 (23:36 +0000)]
dccp: Implement both feature-local and feature-remote Sequence Window feature
This adds full support for local/remote Sequence Window feature, from which the
* sequence-number-validity (W) and
* acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.
Specifically, the following is contained in this patch:
* integrated new socket fields into dccp_sk;
* updated the update_gsr/gss routines with regard to these fields;
* updated handler code: the Sequence Window feature is located at the TX side,
so the local feature is meant if the handler-rx flag is false;
* the initialisation of `rcv_wnd' in reqsk is removed, since
- rcv_wnd is not used by the code anywhere;
- sequence number checks are not done in the LISTEN state (cf. 7.5.3);
- dccp_check_req checks the Ack number validity more rigorously;
* the `struct dccp_minisock' became empty and is now removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Gerrit Renker [Fri, 16 Jan 2009 23:36:30 +0000 (23:36 +0000)]
dccp: Initialisation framework for feature negotiation
This initialises feature negotiation from two tables, which are in
turn are initialised from sysctls.
As a novel feature, specifics of the implementation (e.g. that short
seqnos and ECN are not yet available) are advertised for robustness.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Tue, 20 Jan 2009 00:43:59 +0000 (16:43 -0800)]
net: Remove redundant NAPI functions
Following the removal of the unused struct net_device * parameter from
the NAPI functions named *netif_rx_* in commit 908a7a1, they are
exactly equivalent to the corresponding *napi_* functions and are
therefore redundant.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Thomas Sailer <t.sailer@alumni.ethz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Christian Eggers [Wed, 21 Jan 2009 20:56:24 +0000 (12:56 -0800)]
usb/mcs7830: Don't use buffers from stack for USB transfers
mcs7830_set_reg() and mcs7830_get_reg() are called with buffers
from stack which must not be used directly for USB transfers.
This causes corruption of the stack particulary on non x86
architectures because DMA may be used for these transfers.
Signed-off-by: Christian Eggers <christian.eggers@kathrein.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
usbnet: allow type check of devdbg arguments in non-debug build
Improve usbnet's devdbg to always type-check diagnostic arguments,
like dev_dbg (device.h). This makes no change to the resulting size of
usbnet modules.
This patch also removes an #ifdef DEBUG directive from rndis_wlan so
it's devdbg statements are always type-checked at compile time.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Patrick McHardy [Wed, 21 Jan 2009 20:19:49 +0000 (12:19 -0800)]
netfilter: ctnetlink: fix scheduling while atomic
Caused by call to request_module() while holding nf_conntrack_lock.
Reported-and-tested-by: Kövesdi György <kgy@teledigit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 17 Jan 2009 19:48:13 +0000 (19:48 +0000)]
gro: Fix merging of paged packets
The previous fix to paged packets broke the merging because it
reset the skb->len before we added it to the merged packet. This
wasn't detected because it simply resulted in the truncation of
the packet while the missing bit is subsequently retransmitted.
The fix is to store skb->len before we clobber it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>