git.karo-electronics.de Git - linux-beck.git/log

de2104x: remove experimental status

It should be ready after 8 years...remove the experimental dependency.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2104x: disable media debug messages by default

Print media debug messages only when HW debug is enabled.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: DCA update (resubmit)

This patch contains the following DCA improvements to myri10ge:

1) Finally move myri10ge to use dca3 API

2) Disable PCIe relaxed ordering when enabling DCA on
     myri10ge.  This provides a performance boost on Nehalem
     based Xeons

3) Make sure to properly initialize NIC's DCA state when it is enabled,
     rather than giving the NIC a bogus tag (0) and waiting for
     the first received packet to trigger an update.  Not using a
     real tag can cause hardware exceptions on some motherboards
     when a CPU socket is empty.

3) Always update the cached CPU when our interrupt affinity changes
     so as to avoid excessive calls to dca3_get_tag()

Signed-off-by: Andrew Gallatin <gallatin@myri.com>
Signed-off-by: Loic Prylli <loic@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Moved enabling of MSI to the bnx2x_set_num_queues()

Moved enabling of MSI to the bnx2x_set_num_queues() - the same functions that
handles the initialization of the MSI-X.

From: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tcp_enter_quickack_mode can be static

Function only used in tcp_input.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

arp: remove unnecessary export of arp_broken_ops

arp_broken_ops is only used in arp.c

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rename netdev rx_queue to ingress_queue

There is some confusion with rx_queue name after RPS, and net drivers
private rx_queue fields.

I suggest to rename "struct net_device"->rx_queue to ingress_queue.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6tnl: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipip: enable lockless xmits

IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one ipip tunnel (size:200 bytes per frame)

Before patch :
real 2m53.321s
user 0m10.277s
sys 46m0.597s

After patch:
real 0m32.063s
user 0m9.237s
sys 8m16.255s

Last problem to solve is the contention on dst :

16118.00 28.3% __ip_route_output_key         vmlinux
6135.00 10.8% dst_release                   vmlinux
3220.00  5.6% ip_finish_output              vmlinux
2149.00  3.8% ip_route_output_flow          vmlinux
1575.00  2.8% ip_append_data                vmlinux
1481.00  2.6% ip_push_pending_frames        vmlinux
1349.00  2.4% __xfrm_lookup                 vmlinux
1216.00  2.1% csum_partial_copy_generic     vmlinux
1208.00  2.1% udp_sendmsg                   vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: lockless xmit

GRE tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Note: If tunnels are created with the "oseq" option, LLTX is not
enabled :

Even using an atomic_t o_seq, we would increase chance for packets being
out of order at receiver.

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one gre tunnel (size:200 bytes per frame)

Before patch :
real 3m0.094s
user 0m9.365s
sys 47m50.103s

After patch:
real 0m29.756s
user 0m11.097s
sys 7m33.012s

Last problem to solve is the contention on dst :

38660.00 21.4% __ip_route_output_key          vmlinux
20786.00 11.5% dst_release                    vmlinux
14191.00  7.8% __xfrm_lookup                  vmlinux
12410.00  6.9% ip_finish_output               vmlinux
4540.00  2.5% ip_push_pending_frames         vmlinux
4427.00  2.4% ip_append_data                 vmlinux
4265.00  2.4% __alloc_skb                    vmlinux
4140.00  2.3% __ip_local_out                 vmlinux
3991.00  2.2% dev_queue_xmit                 vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: enable lockless xmits

SIT tunnels can benefit from lockless xmits, using NETIF_F_LLTX

Bench on a 16 cpus machine (dual E5540 cpus), 16 threads sending
10000000 UDP frames via one sit tunnel (size:220 bytes per frame)

Before patch :

real 3m15.399s
user 0m9.185s
sys 51m55.403s

75029.00 87.5% _raw_spin_lock            vmlinux
1090.00  1.3% dst_release               vmlinux
  902.00  1.1% dev_queue_xmit            vmlinux
  627.00  0.7% sock_wfree                vmlinux
  613.00  0.7% ip6_push_pending_frames   ipv6.ko
  505.00  0.6% __ip_route_output_key     vmlinux

After patch:

real 1m1.387s
user 0m12.489s
sys 15m58.868s

28239.00 23.3% dst_release               vmlinux
13570.00 11.2% ip6_push_pending_frames   ipv6.ko
13118.00 10.8% ip6_append_data           ipv6.ko
7995.00  6.6% __ip_route_output_key     vmlinux
7924.00  6.5% sk_dst_check              vmlinux
5015.00  4.1% udpv6_sendmsg             ipv6.ko
3594.00  3.0% sock_alloc_send_pskb      vmlinux
3135.00  2.6% sock_wfree                vmlinux
3055.00  2.5% ip6_sk_dst_lookup         ipv6.ko
2473.00  2.0% ip_finish_output          vmlinux

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: fix percpu stats accounting

commit 15fc1f7056ebd (sit: percpu stats accounting) forgot the fallback
tunnel case (sit0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipip: fix percpu stats accounting

commit 3c97af99a5aa1 (ipip: percpu stats accounting) forgot the fallback
tunnel case (tunl0), and can crash pretty fast.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dummy: percpu stats and lockless xmit

Converts dummy network device driver to :

- percpu stats

- 64bit stats

- lockless xmit (NETIF_F_LLTX)

- performance features added (NETIF_F_SG | NETIF_F_FRAGLIST |
NETIF_F_TSO | NETIF_F_NO_CSUM | NETIF_F_HIGHDMA)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add a recursion limit in xmit path

As tunnel devices are going to be lockless, we need to make sure a
misconfigured machine wont enter an infinite loop.

Add a percpu variable, and limit to three the number of stacked xmits.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Implement Any-IP support for IPv6.

AnyIP is the capability to receive packets and establish incoming
connections on IPs we have not explicitly configured on the machine.

An example use case is to configure a machine to accept all incoming
traffic on eth0, and leave the policy of whether traffic for a given IP
should be delivered to the machine up to the load balancer.

Can be setup as follows:
ip -6 rule from all iif eth0 lookup 200
ip -6 route add local default dev lo table 200
(in this case for all IPv6 addresses)

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Allow configuring subnets as local addresses

This patch allows a host to be configured to respond to any address in
a specified range as if it were local, without actually needing to
configure the address on an interface.  This is done through routing
table configuration.  For instance, to configure a host to respond
to any address in 10.1/16 received on eth0 as a local address we can do:

ip rule add from all iif eth0 lookup 200
ip route add local 10.1/16 dev lo proto kernel scope host src 127.0.0.1 table 200

This host is now reachable by any 10.1/16 address (route lookup on
input for packets received on eth0 can find the route).  On output, the
rule will not be matched so that this host can still send packets to
10.1/16 (not sent on loopback).  Presumably, external routing can be
configured to make sense out of this.

To make this work, we needed to modify the logic in finding the
interface which is assigned a given source address for output
(dev_ip_find).  We perform a normal fib_lookup instead of just a
lookup on the local table, and in the lookup we ignore the input
interface for matching.

This patch is useful to implement IP-anycast for subnets of virtual
addresses.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pch_gbe: add header files

Fix build errors, more header files needed.

drivers/net/pch_gbe/pch_gbe_main.c:965: error: implicit declaration of function 'tcp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:965: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:968: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:976: error: implicit declaration of function 'udp_hdr'
drivers/net/pch_gbe/pch_gbe_main.c:976: error: invalid type argument of '->' (have 'int')
drivers/net/pch_gbe/pch_gbe_main.c:980: error: invalid type argument of '->' (have 'int')

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: Update Phonet doc for Pipe Controller implementation

Updates the Phonet document with description related to Pipe controller
implementation

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tg3: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8021q: Use netif_copy_real_num_queues() to set queue counts

This covers RX if necessary, as well as TX.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

niu: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mv643xx_eth: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gianfar: Use netif_set_real_num_rx_queues()

Do not set num_tx_queues or real_num_tx_queues, since
alloc_etherdev_mq() does that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: Use netif_set_real_num_{rx, tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2: Use netif_set_real_num_{rx,tx}_queues()

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add netif_copy_real_num_queues() for use by virtual net drivers

This sets the active numbers of queues on a net device to match another.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Allow changing number of RX queues after device allocation

For RPS, we create a kobject for each RX queue based on the number of
queues passed to alloc_netdev_mq(). However, drivers generally do not
determine the numbers of hardware queues to use until much later, so
this usually represents the maximum number the driver may use and not
the actual number in use.

For TX queues, drivers can update the actual number using
netif_set_real_num_tx_queues(). Add a corresponding function for RX
queues, netif_set_real_num_rx_queues().

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_{detach|attach}_filter() rcu fixes

sk_attach_filter() and sk_detach_filter() are run with socket locked.

Use the appropriate rcu_dereference_protected() instead of blocking BH,
and rcu_dereference_bh().
There is no point adding BH prevention and memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fib: use atomic_inc_not_zero() in fib_rules_lookup

It seems we dont use appropriate refcount increment in an
rcu_read_lock() protected section.

fib_rule_get() might increment a null refcount and bad things could
happen.

While fib_nl_delrule() respects an rcu grace period before calling
fib_rule_put(), fib_rules_cleanup_ops() calls fib_rule_put() without a
grace period.

Note : after this patch, we might avoid the synchronize_rcu() call done
in fib_nl_delrule()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipip: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_gre: percpu stats accounting

Le lundi 27 septembre 2010 à 14:29 +0100, Ben Hutchings a écrit :

> > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> > index 5d6ddcb..de39b22 100644
> > --- a/net/ipv4/ip_gre.c
> > +++ b/net/ipv4/ip_gre.c
> [...]
> > @@ -377,7 +405,7 @@ static struct ip_tunnel *ipgre_tunnel_locate(struct net *net,
> >   if (parms->name[0])
> >   strlcpy(name, parms->name, IFNAMSIZ);
> >   else
> > - sprintf(name, "gre%%d");
> > + strcpy(name, "gre%d");
> >
> >   dev = alloc_netdev(sizeof(*t), name, ipgre_tunnel_setup);
> >   if (!dev)
> [...]
>
> This is a valid fix, but doesn't belong in this patch!
>

Sorry ? It was not a fix, but at most a cleanup ;)

Anyway I forgot the gretap case...

[PATCH 2/4 v2] ip_gre: percpu stats accounting

Maintain per_cpu tx_bytes, tx_packets, rx_bytes, rx_packets.

Other seldom used fields are kept in netdev->stats structure, possibly
unsafe.

This is a preliminary work to support lockless transmit path, and
correct RX stats, that are already unsafe.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tunnels: prepare percpu accounting

Tunnels are going to use percpu for their accounting.

They are going to use a new tstats field in net_device.

skb_tunnel_rx() is changed to be a wrapper around __skb_tunnel_rx()

IPTUNNEL_XMIT() is changed to be a wrapper around __IPTUNNEL_XMIT()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: use this_cpu_ptr() in vlan_skb_recv()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: Implement Pipe Controller to support Nokia Slim Modems

Phonet stack assumes the presence of Pipe Controller, either in Modem or
on Application Processing Engine user-space for the Pipe data.
Nokia Slim Modems like WG2.5 used in ST-Ericsson U8500 platform do not
implement Pipe controller in them.
This patch adds Pipe Controller implemenation to Phonet stack to support
Pipe data over Phonet stack for Nokia Slim Modems.

Signed-off-by: Kumar Sanghvi <kumar.sanghvi@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/qlcnic/qlcnic_init.c
net/ipv4/ip_output.c

net: r6040: store BIOS default MAC in perm_add

Signed-off-by: Otavio Salvador <otavio@ossystems.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add a missing unregister_pernet_subsys call

Clean up a missing exit path in the ipv6 module init routines.  In
addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys
for the ipv6_addr_label_ops structure.  But if module loading fails, or if the
ipv6 module is removed, there is no corresponding unregister_pernet_subsys call,
which leaves a now-bogus address on the pernet_list, leading to oopses in
subsequent registrations.  This patch cleans up both the failed load path and
the unload path.  Tested by myself with good results.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
include/net/addrconf.h |    1 +
net/ipv6/addrconf.c    |   11 ++++++++---
net/ipv6/addrlabel.c   |    5 +++++
3 files changed, 14 insertions(+), 3 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

net: loopback driver cleanup

loopback driver uses dev->ml_priv to store its percpu stats pointer.
It uses ugly casts "(void __percpu __force *)" to shut up sparse
complains.

Define an union to better document we use ml_priv in loopback driver and
define a lstats field with appropriate types.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix rcu use in ip_route_output_slow

__in_dev_get_rtnl(dev_out) is called while RTNL is not held, thus
triggers a lockdep fault.

At this point, we only perform a raw test of dev_out->ip_ptr being NULL,
we dont need to make sure ip_ptr cant changed right after.

We can use rcu_dereference_raw() for this.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: allocate rx queues in register_netdevice only

Instead of having two places were we allocate dev->_rx, introduce
netif_alloc_rx_queues() helper and call it only from
register_netdevice(), not from alloc_netdev_mq()

Goal is to let drivers change dev->num_rx_queues after allocating netdev
and before registering it.

This also removes a lot of ifdefs in net/core/dev.c

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>

sgiseeq: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>

rionet: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>

ibm_newemac: use free_netdev(netdev) instead of kfree()

Freeing netdev without free_netdev() leads to net, tx leaks.
I might lead to dereferencing freed pointer.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

@@
struct net_device* dev;
@@

-kfree(dev)
+free_netdev(dev)

Signed-off-by: David S. Miller <davem@davemloft.net>

net: update SOCK_MIN_RCVBUF

SOCK_MIN_RCVBUF current value is 256 bytes

It doesnt permit to receive the smallest possible frame, considering
socket sk_rmem_alloc/sk_rcvbuf account skb truesizes. On 64bit arches,
sizeof(struct sk_buff) is 240 bytes. Add the typical 64 bytes of
headroom, and we go over the limit.

With old kernels and 32bit arches, we were under the limit, if netdriver
was doing copybreak.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc911x: Add MODULE_ALIAS()

This enables auto loading for the smsc911x ethernet driver.

Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: reset skb queue mapping when rx'ing over tunnel

Reset queue mapping when an skb is reentering the stack via a tunnel.
On second pass, the queue mapping from the original device is no
longer valid.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skb_frag_t can be smaller on small arches

On 32bit arches, if PAGE_SIZE is smaller than 65536, we can use 16bit
offset and size fields. This patch saves 72 bytes per skb on i386, or
128 bytes after rounding.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

br2684: fix scheduling while atomic

You can't call atomic_notifier_chain_unregister() while in atomic context.

Fix, call un/register_atmdevice_notifier in module __init and __exit.

Bug report:
http://comments.gmane.org/gmane.linux.network/172603

Reported-by: Mikko Vinni <mmvinni@yahoo.com>
Tested-by: Mikko Vinni <mmvinni@yahoo.com>
Signed-off-by: Karl Hiramoto <karl@hiramoto.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: propagate NETIF_F_HIGHDMA to vlans

Automatically allows vlans to get NETIF_F_HIGHDMA if underlying device
supports it.

On 32bit arches (and more precisely if CONFIG_HIGHMEM is enabled), it
can help to reduce cost of illegal_highdma() and __skb_linearize()
calls.

Tested on tg3 , bnx2, bonding, this worked very well.

This is a generalization of a patch provided by Yi Zou & Jeff Kirsher.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2104x: fix TP link detection

Compex FreedomLine 32 PnP-PCI2 cards have only TP and BNC connectors but the
SROM contains AUI port too. When TP loses link, the driver switches to
non-existing AUI port (which reports that carrier is always present).

Connecting TP back generates LinkPass interrupt but de_media_interrupt() is
broken - it only updates the link state of currently connected media, ignoring
the fact that LinkPass and LinkFail bits of MacStatus register belong to the
TP port only (the chip documentation says that).

This patch changes de_media_interrupt() to switch media to TP when link goes
up (and media type is not locked) and also to update the link state only when
the TP port is used.

Also the NonselPortActive (and also SelPortActive) bits of SIAStatus register
need to be cleared (by writing 1) after reading or they're useless.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2104x: fix power management

At least my 21041 cards come out of suspend with bus mastering disabled so
they did not work after resume(no data transferred).
After adding pci_set_master(), the driver oopsed immediately on resume -
because de_clean_rings() is called on suspend but de_init_rings() call
was missing in resume.

Also disable link (reset SIA) before sleep (de4x5 does this too).

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbevf: Refactor ring parameter re-size

The function to resize the Tx/Rx rings had the potential to
dereference a NULL pointer and the code would attempt to resize
the Tx ring even if the Rx ring allocation had failed. This
would cause some confusion in the return code semantics. Fixed
up to just unwind the allocations if any of them fail and return
an error.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

de2104x: disable autonegotiation on broken hardware

At least on older 21041-AA chips (mine is rev. 11), TP duplex autonegotiation
causes the card not to work at all (link is up but no packets are transmitted).

de4x5 disables autonegotiation completely. But it seems to work on newer
(21041-PA rev. 21) so disable it only on rev<20 chips.

Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix a lockdep splat

We have for each socket :

One spinlock (sk_slock.slock)
One rwlock (sk_callback_lock)

Possible scenarios are :

(A) (this is used in net/sunrpc/xprtsock.c)
read_lock(&sk->sk_callback_lock) (without blocking BH)
<BH>
spin_lock(&sk->sk_slock.slock);
...
read_lock(&sk->sk_callback_lock);
...

(B)
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)

(C)
spin_lock_bh(&sk->sk_slock)
...
write_lock_bh(&sk->sk_callback_lock)
stuff
write_unlock_bh(&sk->sk_callback_lock)
spin_unlock_bh(&sk->sk_slock)

This (C) case conflicts with (A) :

CPU1 [A]                         CPU2 [C]
read_lock(callback_lock)
<BH>                             spin_lock_bh(slock)
<wait to spin_lock(slock)>
                                 <wait to write_lock_bh(callback_lock)>

We have one problematic (C) use case in inet_csk_listen_stop() :

local_bh_disable();
bh_lock_sock(child); // spin_lock_bh(&sk->sk_slock)
WARN_ON(sock_owned_by_user(child));
...
sock_orphan(child); // write_lock_bh(&sk->sk_callback_lock)

lockdep is not happy with this, as reported by Tetsuo Handa

It seems only way to deal with this is to use read_lock_bh(callbacklock)
everywhere.

Thanks to Jarek for pointing a bug in my first attempt and suggesting
this solution.

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: review the wake-up support

If the PM support is available this is passed
through the platform instead to be hard-coded
in the core files.
WoL on Magic Frame can be enabled by using
the ethtool support.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add Gigabit Ethernet driver of Topcliff PCH

Signed-off-by: Masayuki Ohtake <masa-korg@dsn.okisemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: take care of last fragment in ip_append_data

While investigating a bit, I found ip_fragment() slow path was taken
because ip_append_data() provides following layout for a send(MTU +
N*(MTU - 20)) syscall :

- one skb with 1500 (mtu) bytes
- N fragments of 1480 (mtu-20) bytes (before adding IP header)
last fragment gets 17 bytes of trail data because of following bit:

if (datalen == length + fraggap)
alloclen += rt->dst.trailer_len;

Then esp4 adds 16 bytes of data (while trailer_len is 17... hmm...
another bug ?)

In ip_fragment(), we notice last fragment is too big (1496 + 20) > mtu,
so we take slow path, building another skb chain.

In order to avoid taking slow path, we should correct ip_append_data()
to make sure last fragment has real trail space, under mtu...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: return operator cleanup

Change "return (EXPR);" to "return EXPR;"

return is not a function, parentheses are not required.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: use GRO for receive

E1000 can benefit from calling the GRO receive functions.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: fix occasional panic on unload

Net drivers in general have an issue where timers fired
by mod_timer or work threads with schedule_work are running
outside of the rtnl_lock.

With no other lock protection these routines are vulnerable
to races with driver unload or reset paths.

The longer term solution to this might be a redesign with
safer locks being taken in the driver to guarantee no
reentrance, but for now a safe and effective fix is
to take the rtnl_lock in these routines.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: use work queues

E1000 is using several timers that in a follow on patch
will need to acquire the rtnl_lock in order to be safe.

This patch moves the timer bodies into work queues which
will allow the next patch to add rtnl_lock.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000/e1000e/igb/ixgb/ixgbe: set NETIF_F_HIGHDMA for VLAN feature flags

If the netdev->features is set with NETIF_F_HIGHDMA, we should set the
corresponding netdev->vlan_features as well to allow VLAN netdev created
on top of the real netdev to be able to also benefit from HIGHDMA on 32bit
system, reducing the performance hit that is caused by __skb_linearize(),
particularly for large send. This is fixed in this patch for all Intel e1000,
e1000e, igb, ixgbe, and ixgbe drivers since this should be beneficial
to all devices supported by these drivers.

Signed-off-by: Yi Zou <yi.zou@intel.com>
Tested-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Add support for DH89xxCC

This patch adds support for the Intel(r) DH89xxCC series. The new
device will be using Intel(r) i347-AT4 and Marvell(r) M88E1322 and
M88E1112 PHYs. Support for these devices has also been added here.

Signed-off-by: Joseph Gasparakis <joseph.gasparakis@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: clear VF_PROMISC bits instead of setting all other bits

This change corrects an issue in which we were setting all flag bits except
for promisc instead of clearing the promisc bits due to the incorrect use
of an |= instead of an &=.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82579 do not gate auto config of PHY by hardware during nominal use

For non-managed versions of 82579, set the bit that prevents the hardware
from automatically configuring the PHY after resets only when the driver
performs a reset, clear the bit after resets. This is so the hardware can
configure the PHY automatically when the part is reset in a manner that is
not controlled by the driver (e.g. in a virtual environment via PCI FLR)
otherwise the PHY will be mis-configured causing issues such as failing to
link at 1000Mbps.
For managed versions of 82579, keep the previous behavior since the
manageability firmware will handle the PHY configuration.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82579 jumbo frame workaround causing CRC errors

The subject workaround was causing CRC errors due to writing the wrong
register with updates of the RCTL register. It was also found that the
workaround function which modifies the RCTL register was being called in
the middle of a read-modify-write operation of the RCTL register, so the
function call has been moved appropriately. Lastly, jumbo frames must not
be allowed when CRC stripping is disabled by a module parameter because the
workaround requires the CRC be stripped.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82579 unaccounted missed packets

On 82579, there is a hardware bug that can cause received packets to not
get transferred from the PHY to the MAC due to K1 (a power saving feature
of the PHY-MAC interconnect similar to ASPM L1). Since the MAC controls
the accounting of missed packets, these will go unnoticed. Workaround the
issue by setting the K1 beacon duration according to the link speed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82566DC fails to get link

Two recent patches to cleanup the reset[1] and initial PHY configuration[2]
code paths for ICH/PCH devices inadvertently left out a 10msec delay and
device ID check respectively which are necessary for the 82566DC (device id
0x104b) to be configured properly, otherwise it will not get link.

[1] commit e98cac447cc1cc418dff1d610a5c79c4f2bdec7f
[2] commit 3f0c16e84438d657d29446f85fe375794a93f159

CC: stable@kernel.org
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82579 SMBus address and LEDs incorrect after device reset

Since the hardware is prevented from performing automatic PHY configuration
(the driver does it instead), the OEM_WRITE_ENABLE bit in the EXTCNF_CTRL
register will not get cleared preventing the SMBus address and the LED
configuration to be written to the PHY registers. On 82579, do not check
the OEM_WRITE_ENABLE bit.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: 82577/8/9 issues with device in Sx

When going to Sx, disable gigabit in PHY (e1000_oem_bits_config_ich8lan)
in addition to the MAC before configuring PHY wakeup otherwise the PHY
configuration writes might be missed. Also write the LED configuration
and SMBus address to the PHY registers (e1000_oem_bits_config_ich8lan and
e1000_write_smbus_addr, respectively). The reset is no longer needed
since re-auto-negotiation is forced in e1000_oem_bits_config_ich8lan and
leaving it in causes issues with auto-negotiating the link.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm4: strip ECN bits from tos field

otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1: zero out CMB and SBM in atl1_free_ring_resources

They are allocated in atl1_setup_ring_resources, zero out the pointers
in atl1_free_ring_resources (like the other resources).

Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

atl1: fix resume

adapter->cmb.cmb is initialized when the device is opened and freed when
it's closed. Accessing it unconditionally during resume results either
in a crash (NULL pointer dereference, when the interface has not been
opened yet) or data corruption (when the interface has been used and
brought down adapter->cmb.cmb points to a deallocated memory area).

Cc: stable@kernel.org
Signed-off-by: Luca Tettamanti <kronos.it@gmail.com>
Acked-by: Chris Snook <chris.snook@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

X.25 remove bkl in poll

The x25_datagram_poll didn't add anything, removed it.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

X.25 remove bkl in getsockname

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Add support for SFE4003 board and TXC43128 PHY

This board never went into production, but some engineering samples
are in use.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Remove support for SFN4111T, SFT9001 and Falcon GMAC

SFN4111T never reached production and is not being used for internal
or customer testing.

Since we have no production Falcon boards using the SFT9001 or the
GMAC, remove support for them as well.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move "struct net" declaration inside the __KERNEL__ macro guard

This patch reduces namespace pollution by moving the "struct net" declaration
out of the userspace-facing portion of linux/netlink.h. It has no impact on
the kernel.

(This came up because we have several C++ applications which use "net" as a
namespace name.)

Signed-off-by: Ollie Wild <aaw@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_conntrack_defrag: check socket type before touching nodefrag flag

we need to check proper socket type within ipv4_conntrack_defrag
function before referencing the nodefrag flag.

For example the tun driver receive path produces skbs with
AF_UNSPEC socket type, and so current code is causing unwanted
fragmented packets going out.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_nat_snmp: fix checksum calculation (v4)

Fix checksum calculation in nf_nat_snmp_basic.

Based on patches by Clark Wang <wtweeker@163.com> and
Stephen Hemminger <shemminger@vyatta.com>.

https://bugzilla.kernel.org/show_bug.cgi?id=17622

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: fix a race in nf_ct_ext_create()

As soon as rcu_read_unlock() is called, there is no guarantee current
thread can safely derefence t pointer, rcu protected.

Fix is to copy t->alloc_size in a temporary variable.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: fix ipt_REJECT TCP RST routing for indev == outdev

ip_route_me_harder can't create the route cache when the outdev is the same
with the indev for the skbs whichout a valid protocol set.

__mkroute_input functions has this check:
1998         if (skb->protocol != htons(ETH_P_IP)) {
1999                 /* Not IP (i.e. ARP). Do not create route, if it is
2000                  * invalid for proxy arp. DNAT routes are always valid.
2001                  *
2002                  * Proxy arp feature have been extended to allow, ARP
2003                  * replies back to the same interface, to support
2004                  * Private VLAN switch technologies. See arp.c.
2005                  */
2006                 if (out_dev == in_dev &&
2007                     IN_DEV_PROXY_ARP_PVLAN(in_dev) == 0) {
2008                         err = -EINVAL;
2009                         goto cleanup;
2010                 }
2011         }

This patch gives the new skb a valid protocol to bypass this check. In order
to make ipt_REJECT work with bridges, you also need to enable ip_forward.

This patch also fixes a regression. When we used skb_copy_expand(), we
didn't have this issue stated above, as the protocol was properly set.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_ct_sip: default to NF_ACCEPT in sip_help_tcp()

I initially noticed this because of the compiler warning below, but it
does seem to be a valid concern in the case where ct_sip_get_header()
returns 0 in the first iteration of the while loop.

net/netfilter/nf_conntrack_sip.c: In function 'sip_help_tcp':
net/netfilter/nf_conntrack_sip.c:1379: warning: 'ret' may be used uninitialized in this function

Signed-off-by: Simon Horman <horms@verge.net.au>
[Patrick: changed NF_DROP to NF_ACCEPT]
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: tproxy: nf_tproxy_assign_sock() can handle tw sockets

transparent field of a socket is either inet_twsk(sk)->tw_transparent
for timewait sockets, or inet_sk(sk)->transparent for other sockets
(TCP/UDP).

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6

caif: Use default send and receive buffer size in caif_socket.

CAIF sockets should use socket's default send and receive buffers sizes.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Fix function NULL pointer check.

Check that receive function pointer is not null before calling it.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Minor fixes in log prints.

Use pr_debug for flow control printouts, and refine an error printout.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>