git.karo-electronics.de Git - linux-beck.git/log

igb: Move DMA Coalescing init code to separate function.

This patch moves the DMA Coalescing feature initialization code from
igb_reset to a new function and replaces it with a call to the new
function.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb: Fix for Alt MAC Address feature on 82580 and later devices

In 82580 and later devices, the alternate MAC address feature is
completely handled by the option ROM and software does not handle
it anymore. This patch changes the check_alt_mac_addr function to
exit immediately if device is 82580 or later.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igbvf: Bump version number

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igbvf: Update module identification strings

Update adapter identification strings to properly indicate i350 VF devices
in the VF driver. Change the driver ID string to remove 82576-specific
wording. Update copyright date.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

tcp: add const qualifiers where possible

Adding const qualifiers to pointers can ease code review, and spot some
bugs. It might allow compiler to optimize code further.

For example, is it legal to temporary write a null cksum into tcphdr
in tcp_md5_hash_header() ? I am afraid a sniffer could catch the
temporary null value...

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dev: use name hash for dev_seq_ops

Instead of using the dev->next chain and trying to resync at each call to
dev_seq_start, use the name hash, keeping the bucket and the offset in
seq->private field.

Tests revealed the following results for ifconfig > /dev/null
* 1000 interfaces:
* 0.114s without patch
* 0.089s with patch
* 3000 interfaces:
* 0.489s without patch
* 0.110s with patch
* 5000 interfaces:
* 1.363s without patch
* 0.250s with patch
* 128000 interfaces (other setup):
* ~100s without patch
* ~30s with patch

Signed-off-by: Mihai Maruseac <mmaruseac@ixiacom.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Fix the minor device number allocation

On systems that create and delete lots of dynamic devices the
31bit linux ifindex fails to fit in the 16bit macvtap minor,
resulting in unusable macvtap devices.  I have systems running
automated tests that that hit this condition in just a few days.

Use a linux idr allocator to track which mavtap minor numbers
are available and and to track the association between macvtap
minor numbers and macvtap network devices.

Remove the unnecessary unneccessary check to see if the network
device we have found is indeed a macvtap device.  With macvtap
specific data structures it is impossible to find any other
kind of networking device.

Increase the macvtap minor range from 65536 to the full 20 bits
that is supported by linux device numbers.  It doesn't solve the
original problem but there is no penalty for a larger minor
device range.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Rewrite macvtap_newlink so the error handling works.

Place macvlan_common_newlink at the end of macvtap_newlink because
failing in newlink after registering your network device is not
supported.

Move device_create into a netdevice creation notifier. The network device
notifier is the only hook that is called after the network device has been
registered with the device layer and before register_network_device returns
success.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Don't leak unreceived packets when we delete a macvtap device.

To avoid leaking packets in the receive queue. Add a socket destructor
that will run whenever destroy a macvtap socket.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Fix macvtap_open races in the zero copy enable code.

To see if it is appropriate to enable the macvtap zero copy feature
don't test the lowerdev network device flags. Instead test the
macvtap network device flags which are a direct copy of the lowerdev
flags. This is important because nothing holds a reference to lowerdev
and on a very bad day we lowerdev could be a pointer to stale memory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: Close a race between macvtap_open and macvtap_dellink.

There is a small window in macvtap_open between looking up a
networking device and calling macvtap_set_queue in which
macvtap_del_queues called from macvtap_dellink. After
calling macvtap_del_queues it is totally incorrect to
allow macvtap_set_queue to proceed so prevent success by
reporting that all of the available queues are in use.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: fix truesize underestimation

We must account in skb->truesize, the size of the fragments, not the
used part of them.

Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: virtualization@lists.linux-foundation.org
CC: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: fix skb truesize underestimation

bnx2x allocates a full page per fragment.

We must account in skb->truesize, the size of the fragment, not the used
part of it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add opaque struct around skb frag page

I've split this bit out of the skb frag destructor patch since it helps enforce
the use of the fragment API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgbi: convert to SKB paged frag API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Bottomley <James.Bottomley@suse.de>
Cc: Karen Xie <kxie@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4vf: convert to SKB paged frag API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: convert to SKB paged frag API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: convert to SKB paged frag API.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow CAP_NET_RAW to set socket options IP{,V6}_TRANSPARENT

Up till now the IP{,V6}_TRANSPARENT socket options (which actually set
the same bit in the socket struct) have required CAP_NET_ADMIN
privileges to set or clear the option.

- we make clearing the bit not require any privileges.
- we allow CAP_NET_ADMIN to set the bit (as before this change)
- we allow CAP_NET_RAW to set this bit, because raw
sockets already pretty much effectively allow you
to emulate socket transparency.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: constify skbuff and Qdisc elements

Preliminary patch before tcp constification

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove unused tcp_fin() parameters

tcp_fin() only needs socket pointer, we can remove skb and th params.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ll_temac: Add support for ethtool

This patch enables the ethtool interface. The implementation is done
using the libphy helper functions.

Signed-off-by: David S. Miller <davem@davemloft.net>

igb: fix a compile warning

control these three function declarations and
definitions with same macro CONFIG_PCI_IOV

drivers/net/ethernet/intel/igb/igb_main.c:165:
warning: ‘igb_vf_configure’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:166:
warning: ‘igb_find_enabled_vfs’ declared ‘static’ but never defined
drivers/net/ethernet/intel/igb/igb_main.c:167:
warning: ‘igb_check_vf_assignment’ declared ‘static’ but never defined

Signed-off-by: RongQing Li <roy.qing.li@gmail.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

myri10ge: fix truesize underestimation

skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jon Mason <mason@myri.com>
Acked-by: Jon Mason <mason@myri.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igbvf: fix truesize underestimation

igbvf allocates half a page per skb fragment. We must account
PAGE_SIZE/2 increments on skb->truesize, not the actual frag length.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: remove ndelay() call

Daniel Turull reported inaccuracies in pktgen when using low packet
rates, because we call ndelay(val) with values bigger than 20000.

Instead of calling ndelay() for delays < 100us, we can instead loop
calling ktime_now() only.

Reported-by: Daniel Turull <daniel.turull@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use TCP_DEFAULT_INIT_RCVWND in tcp_fixup_rcvbuf()

Since commit 356f039822b (TCP: increase default initial receive
window.), we allow sender to send 10 (TCP_DEFAULT_INIT_RCVWND) segments.

Change tcp_fixup_rcvbuf() to reflect this change, even if no real change
is expected, since sysctl_tcp_rmem[1] = 87380 and this value
is bigger than tcp_fixup_rcvbuf() computed rcvmem (~23720)

Note: Since commit 356f039822b limited default window to maximum of
10*1460 and 2*MSS, we use same heuristic in this patch.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mm: add a "struct page_frag" type containing a page, offset and length

A few network drivers currently use skb_frag_struct for this purpose but I have
patches which add additional fields and semantics there which these other uses
do not want.

A structure for reference sub-page regions seems like a generally useful thing
so do so instead of adding a network subsystem specific structure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jens Axboe <jaxboe@fusionio.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: fix skb truesize underestimation

skb->truesize must account for allocated memory, not the used part of
it. Doing this work is important to avoid unexpected OOM situations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio_net: Clean up set_skb_frag()

Remove manual initialization in set_skb_frag, and instead
use __skb_fill_page_desc() to do the same. Patch tested
on net-next.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: do not take an additional reference in skb_frag_set_page

I audited all of the callers in the tree and only one of them (pktgen) expects
it to do so. Taking this reference is pretty obviously confusing and error
prone.

In particular I looked at the following commits which switched callers of
(__)skb_frag_set_page to the skb paged fragment api:

6a930b9f163d7e6d9ef692e05616c4ede65038ec cxgb3: convert to SKB paged frag API.
5dc3e196ea21e833128d51eb5b788a070fea1f28 myri10ge: convert to SKB paged frag API.
0e0634d20dd670a89af19af2a686a6cce943ac14 vmxnet3: convert to SKB paged frag API.
86ee8130a46769f73f8f423f99dbf782a09f9233 virtionet: convert to SKB paged frag API.
4a22c4c919c201c2a7f4ee09e672435a3072d875 sfc: convert to SKB paged frag API.
18324d690d6a5028e3c174fc1921447aedead2b8 cassini: convert to SKB paged frag API.
b061b39e3ae18ad75466258cf2116e18fa5bbd80 benet: convert to SKB paged frag API.
b7b6a688d217936459ff5cf1087b2361db952509 bnx2: convert to SKB paged frag API.
804cf14ea5ceca46554d5801e2817bba8116b7e5 net: xfrm: convert to SKB frag APIs
ea2ab69379a941c6f8884e290fdd28c93936a778 net: convert core to skb paged frag APIs

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: fix rcu splat in neigh_update()

when use dst_get_neighbour to get neighbour, we need
rcu_read_lock to protect, since dst_get_neighbour uses
rcu_dereference.

The bug was reported by Ari Savolainen <ari.m.savolainen@gmail.com>

[  105.612095]
[  105.612096] ===================================================
[  105.612100] [ INFO: suspicious rcu_dereference_check() usage. ]
[  105.612101] ---------------------------------------------------
[  105.612103] include/net/dst.h:91 invoked rcu_dereference_check()
without protection!
[  105.612105]
[  105.612106] other info that might help us debug this:
[  105.612106]
[  105.612108]
[  105.612108] rcu_scheduler_active = 1, debug_locks = 0
[  105.612110] 1 lock held by dnsmasq/2618:
[  105.612111]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff815df8c7>]
rtnl_lock+0x17/0x20
[  105.612120]
[  105.612121] stack backtrace:
[  105.612123] Pid: 2618, comm: dnsmasq Not tainted 3.1.0-rc1 #41
[  105.612125] Call Trace:
[  105.612129]  [<ffffffff810ccdcb>] lockdep_rcu_dereference+0xbb/0xc0
[  105.612132]  [<ffffffff815dc5a9>] neigh_update+0x4f9/0x5f0
[  105.612135]  [<ffffffff815da001>] ? neigh_lookup+0xe1/0x220
[  105.612139]  [<ffffffff81639298>] arp_req_set+0xb8/0x230
[  105.612142]  [<ffffffff8163a59f>] arp_ioctl+0x1bf/0x310
[  105.612146]  [<ffffffff810baa40>] ? lock_hrtimer_base.isra.26+0x30/0x60
[  105.612150]  [<ffffffff8163fb75>] inet_ioctl+0x85/0x90
[  105.612154]  [<ffffffff815b5520>] sock_do_ioctl+0x30/0x70
[  105.612157]  [<ffffffff815b55d3>] sock_ioctl+0x73/0x280
[  105.612162]  [<ffffffff811b7698>] do_vfs_ioctl+0x98/0x570
[  105.612165]  [<ffffffff811a5c40>] ? fget_light+0x340/0x3a0
[  105.612168]  [<ffffffff811b7bbf>] sys_ioctl+0x4f/0x80
[  105.612172]  [<ffffffff816fdcab>] system_call_fastpath+0x16/0x1b

Reported-by: Ari Savolainen <ari.m.savolainen@gmail.com>
Signed-off-by: RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

filter: use unsigned int to silence static checker warning

This is just a cleanup.

My testing version of Smatch warns about this:
net/core/filter.c +380 check_load_and_stores(6)
warn: check 'flen' for negative values

flen comes from the user. We try to clamp the values here between 1
and BPF_MAXINSNS but the clamp doesn't work because it could be
negative. This is a bug, but it's not exploitable.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

NET: asix: fix ethtool -e for AX88178 USB dongle

"ethtool -e ethX" dumps EEPROM data. Patch sets EEPROM length for device.
Ethtool works alot better when the kernel believes the length is > 0.

From: Allan Chou <allan@asix.com.tw>
Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cleanup: remove unnecessary include.

This cleanup patch removes unnecessary include from net/ipv6/ip6_fib.c.

Signed-off-by: Kevin Wilson <wkevils@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: compat_ioctl is local to af_inet.c, make it static

ipv4: compat_ioctl is local to af_inet.c, make it static

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: limit max_mtu in case of 4KiB and use __netdev_alloc_skb (V2)

Problem using big mtu around 4096 bytes is you end allocating (4096
+NET_SKB_PAD + NET_IP_ALIGN + sizeof(struct skb_shared_info) bytes ->
8192 bytes : order-1 pages

It's better to limit the mtu to SKB_MAX_HEAD(NET_SKB_PAD),
to have no more than one page per skb.

Also the patch changes the netdev_alloc_skb_ip_align() done in
init_dma_desc_rings() and uses a variant allowing GFP_KERNEL allocations
allowing the driver to load even in case of memory pressure.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: add CHAINED descriptor mode support (V4)

This patch enhances the STMMAC driver to support CHAINED mode of
descriptor.

STMMAC supports DMA descriptor to operate both in dual buffer(RING)
and linked-list(CHAINED) mode. In RING mode (default) each descriptor
points to two data buffer pointers whereas in CHAINED mode they point
to only one data buffer pointer.

In CHAINED mode each descriptor will have pointer to next descriptor in
the list, hence creating the explicit chaining in the descriptor itself,
whereas such explicit chaining is not possible in RING mode.

First version of this work has been done by Rayagond.
Then the patch has been reworked avoiding ifdef inside the C code.
A new header file has been added to define all the functions needed for
managing enhanced and normal descriptors.
In fact, these have to be specialized according to the ring/chain usage.
Two new C files have been also added to implement the helper routines
needed to manage: jumbo frames, chain and ring setup (i.e. desc3).

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: allow mmc usage only if feature actually available (V4)

Enable the MMC support if it is actually available from the
HW capability register.

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: use predefined macros for HW cap register fields (V4)

Signed-off-by: Rayagond Kokatanur <rayagond@vayavyalabs.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: allow mtu bigger than 1500 in case of normal desc (V4)

This patch allows to set the mtu bigger than 1500
in case of normal descriptors.
This is helping some SPEAr customers.

Signed-off-by: Deepak SIKRI <deepak.sikri@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: update the driver version and doc (V4)

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: protect tx process with lock (V4)

This patch fixes a problem raised on Orly ARM SMP platform
where, in case of fragmented frames, the descriptors
in the TX ring resulted broken. This was due to a missing lock
protection in the tx process.

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: Stop advertising 1000Base capabilties for non GMII iface (V4).

This patch stops advertising 1000Base capablities if GMAC is either
configured for MII or RMII mode and on board there is a GPHY plugged on.
Without this patch if an GBit switch is connected on MII interface,
Ethernet stops working at all.

Discovered as part of
https://bugzilla.stlinux.com/show_bug.cgi?id=14148 triage

Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@st.com>
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysfs: Reject with a warning invalid uses of tagged directories.

sysfs is a core piece of ifrastructure that many people use and
few people have all of the rules in their head on how to use
it correctly. Add warnings for people using tagged directories
improperly to that any misuses can be caught and diagnosed quickly.

A single inexpensive test in sysfs_find_dirent is almost sufficient
to catch all possible misuses. An additional warning is needed
in sysfs_add_dirent so that we actually fail when attempting to
add an untagged dirent in a tagged directory.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysfs: Remove support for tagged directories with untagged members.

Now that /sys/class/net/bonding_masters is implemented as a tagged sysfs
file we can remove support for untagged files in tagged directories.

This change removes any ambiguity of what a NULL namespace value
means. A NULL namespace parameter after this patch means
that we are talking about an untagged sysfs dirent.

This makes the sysfs code much less prone to mistakes when during
maintenance.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Use a per netns implementation of /sys/class/net/bonding_masters.

This fixes a network namespace misfeature that bonding_masters looked at
current instead of the remembering the context where in which
/sys/class/net/bonding_masters was opened in to see which network
namespace to act upon.

This removes the need for sysfs to handle tagged directories with
untagged members allowing for a conceptually simpler sysfs
implementation.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

class: Implement support for class attrs in tagged sysfs directories.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

sysfs: Implement support for tagged files in sysfs.

Looking up files in sysfs is hard to understand and analyize because we
currently allow placing untagged files in tagged directories.  In the
implementation of that we have two subtly different meanings of NULL.
NULL meaning there is no tag on a directory entry and NULL meaning
we don't care which namespace the lookup is performed for.  This
multiple uses of NULL have resulted in subtle bugs (since fixed)
in the code.

Currently it is only the bonding driver that needs to have an untagged
file in a tagged directory.

To untagle this mess I am adding support for tagged files to sysfs.
Modifying the bonding driver to implement bonding_masters as a tagged
file.  Registering bonding_masters once for each network namespace.
Then I am removing support for untagged entries in tagged sysfs
directories.

Resulting in code that is much easier to reason about.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: fix wrong port enabling in 802.3ad

The port shouldn't be enabled unless its current MUX
state is DISTRIBUTING which is correctly handled by
ad_mux_machine(), otherwise the packet sent can be
lost because the other end may not be ready.

The issue happens on every port initialization, but
as the ports are expected to move quickly to DISTRIBUTING,
it doesn't cause much problem. However, it does cause
constant packet loss if the other peer has the port
configured to stay in STANDBY (i.e. SYNC set to OFF).

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: validate HWTSTAMP ioctl parameters

This patch adds a sanity check on the values provided by user space for
the hardware time stamping configuration. If the values lie outside of
the absolute limits, then the ioctl request will be denied.

Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move rcu_barrier from rollback_registered_many to netdev_run_todo.

This patch moves the rcu_barrier from rollback_registered_many
(inside the rtnl_lock) into netdev_run_todo (just outside the rtnl_lock).
This allows us to gain the full benefit of sychronize_net calling
synchronize_rcu_expedited when the rtnl_lock is held.

The rcu_barrier in rollback_registered_many was originally a synchronize_net
but was promoted to be a rcu_barrier() when it was found that people were
unnecessarily hitting the 250ms wait in netdev_wait_allrefs().  Changing
the rcu_barrier back to a synchronize_net is therefore safe.

Since we only care about waiting for the rcu callbacks before we get
to netdev_wait_allrefs() it is also safe to move the wait into
netdev_run_todo.

This was tested by creating and destroying 1000 tap devices and observing
/proc/lock_stat.  /proc/lock_stat reports this change reduces the hold
times of the rtnl_lock by a factor of 10.  There was no observable
difference in the amount of time it takes to destroy a network device.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use TCP_INIT_CWND in tcp_fixup_sndbuf()

Initial cwnd being 10 (TCP_INIT_CWND) instead of 3, change
tcp_fixup_sndbuf() to get more than 16384 bytes (sysctl_tcp_wmem[1]) in
initial sk_sndbuf

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phylib: Modify Vitesse RGMII skew settings

The Vitesse driver was using the RGMII_ID interface type to determine if
skew was necessary.  However, we want to move away from using that
interface type, as it's really a property of the board's PHY connection.
However, some boards depend on it, so we want to support it, while
allowing new boards to use the more flexible "fixups" approach.  To do
this, we extract the code which adds skew into its own function, and
call that function when RGMII_ID has been selected.

Another side-effect of this change is that if your PHY has skew set
already, it doesn't clear it.  This way, the fixup code can modify the
register without config_init then clearing it.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Allow skb_recycle_check to be done in stages

skb_recycle_check resets the skb if it's eligible for recycling.
However, there are times when a driver might want to optionally
manipulate the skb data with the skb before resetting the skb,
but after it has determined eligibility. We do this by splitting the
eligibility check from the skb reset, creating two inline functions to
accomplish that task.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Acked-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Updating driver version

Driver version updated to 1.5.4.2

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Adding rxhash support

Moving to Toeplitz function in RSS calculation.
Reporting rxhash in skb.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Recording rx queue for gro packets

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Checksum counters per ring

Not updating common counters from data path.
The checksum counters are per ring, summarizing them when collecting statistics.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4_en: Controlling FCS header removal

Canceling FCS removal where FW allows for better alignment
of incoming data.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: Fix vlan table overflow

Prevent overflow when trying to register more Vlans then the Vlan table in
HW is configured to.
Need to take into acount that the first 2 entries are reserved.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Added recovery check of CA wake status.

Added recovery check of CA wake status in case of wake up timeout.
Added check of CA wake status in case of wake down timeout.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Added sanity check for length of CAIF frames

Added sanity check for length of CAIF frames, and tear down of
CAIF link-layer device upon protocol error.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Make inactivity timeout configurable.

CAIF HSI uses a timer for inactivity. Upon timeout HSI-wake signaling
is initiated to allow power-down of the HSI block.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: HSI-Platform device register and unregisters itself

Platform device is no longer removed from caif_hsi at shutdown.
The HSI-platform device must do it's own registration and unregistration.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Making read and writes asynchronous.

Some platforms do not allow to put HSI block into low-power
mode when FIFO is not empty. The patch flushes (by reading)
FIFO at wake down sequence. Asynchronous read and write is
implemented for that. As a side effect this will also greatly
improve performance.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Fix for wakeup condition problem

Under stressed conditions a race could happen when del_timer_sync() was called
from softirq context at the same time when mod_timer_pending() for the same
timer was called from the workqueue. This leaded to a state mismatch in the
CAIF HSI driver and following unexpected link wakeup procedure.

The fix puts del_timer_sync() and mod_timer_pending() calls under a spin lock
to protect against the race condition.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: Fixing a race condition in the caif_hsi code

cfhsi->tx_state was not protected by a spin lock. TX soft-irq could interrupt
cfhsi_tx_done_work work leading to inconsistent state of the driver.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif-hsi: HSI Fix uninitialized data in HSI header

CAIF HSI header may be uninitialized and cause last message to
be repeated if transmit size is ~86 bytes long.

Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add skb frag size accessors

To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm6: Don't call icmpv6_send on local error

Calling icmpv6_send() on a local message size error leads to
an incorrect update of the path mtu. So use xfrm6_local_rxpmtu()
to notify about the pmtu if the IPV6_DONTFRAG socket option is
set on an udp or raw socket, according RFC 3542 and use
ipv6_local_error() otherwise.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Fix IPsec slowpath fragmentation problem

ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path).
On IPsec the effective mtu is lower because we need to add the protocol
headers and trailers later when we do the IPsec transformations. So after
the IPsec transformations the packet might be too big, which leads to a
slowpath fragmentation then. This patch fixes this by building the packets
based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr
handling to this.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove superfluous NULL pointer check in ipv6_local_rxpmtu

The pointer to mtu_info is taken from the common buffer
of the skb, thus it can't be a NULL pointer. This patch
removes this check on mtu_info.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: Simplify the replay check and advance functions

The replay check and replay advance functions had some code
duplications. This patch removes the duplications.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/phy: extra delay only for RGMII interfaces for IC+ IP 1001

The extra delay of 2ns to adjust RX clock phase is actually needed
in RGMII mode. Tested on the HDK7108 (STx7108c2).

Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow vlan traffic to be received under bond

The following configuration used to work as I expected. At least
we could use the fcoe interfaces to do MPIO and the bond0 iface
to do load balancing or failover.

       ---eth2.228-fcoe
       |
eth2 -----|
          |
          |---- bond0
          |
eth3 -----|
       |
       ---eth3.228-fcoe

This worked because of a change we added to allow inactive slaves
to rx 'exact' matches. This functionality was kept intact with the
rx_handler mechanism. However now the vlan interface attached to the
active slave never receives traffic because the bonding rx_handler
updates the skb->dev and goto's another_round. Previously, the
vlan_do_receive() logic was called before the bonding rx_handler.

Now by the time vlan_do_receive calls vlan_find_dev() the
skb->dev is set to bond0 and it is clear no vlan is attached
to this iface. The vlan lookup fails.

This patch moves the VLAN check above the rx_handler. A VLAN
tagged frame is now routed to the eth2.228-fcoe iface in the
above schematic. Untagged frames continue to the bond0 as
normal. This case also remains intact,

eth2 --> bond0 --> vlan.228

Here the skb is VLAN tagged but the vlan lookup fails on eth2
causing the bonding rx_handler to be called. On the second
pass the vlan lookup is on the bond0 iface and completes as
expected.

Putting a VLAN.228 on both the bond0 and eth2 device will
result in eth2.228 receiving the skb. I don't think this is
completely unexpected and was the result prior to the rx_handler
result.

Note, the same setup is also used for other storage traffic that
MPIO is used with eg. iSCSI and similar setups can be contrived
without storage protocols.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Tested-by: Hans Schillstrom <hams.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cs89x0: Move the driver into the Cirrus dir

The cs89x0 driver was initial placed in the apple/ when it
should have been placed in the cirrus/. This resolves the
issue by moving the dirver and fixing up the respective
Kconfig(s) and Makefile(s).

Thanks to Sascha for reporting the issue.

-v2 Fix a config error that was introduced with v1 by removing
the dependency on MACE for NET_VENDOR_APPLE.

CC: Russell Nelson <nelson@crynwr.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: give proper headroom in pppol2tp_xmit()

pppol2tp_xmit() calls skb_cow_head(skb, 2) before calling
l2tp_xmit_skb()

Then l2tp_xmit_skb() calls again skb_cow_head(skb, large_headroom)

This patchs changes the first skb_cow_head() call to supply the needed
headroom to make sure at most one (expensive) pskb_expand_head() is
done.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan: handle fragmented multicast frames

Fragmented multicast frames are delivered to a single macvlan port,
because ip defrag logic considers other samples are redundant.

Implement a defrag step before trying to send the multicast frame.

Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

ixgbe: change the eeprom version reported by ethtool

Use 32bit value starting at offset 0x2d for displaying the firmware
version in ethtool. This should work for all current ixgbe HW

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Stephen Ko <stephen.s.ko@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ipv6: remove a rcu_read_lock in ndisc_constructor

in6_dev_get(dev) takes a reference on struct inet6_dev, we dont need
rcu locking in ndisc_constructor()

Signed-off-by: Roy.Li <rongqing.li@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: remove references to berlios mailinglist

The BerliOS project, which currently hosts our mailinglist, will
close with the end of the year. Now take the chance and remove all
occurrences of the mailinglist address from the source files.

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

MAINTAINERS: can: the mailinglist moved to vger.kernel.org

Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/flow: Fix potential memory leak

While preparing net flow caches, once a fail may cause potential
memory leak , fix it.

Signed-off-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove unused tcp_end field in send WQ

The tcp_end field is not actually used by the hardware, so there
is no need to set it.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Add GRO support

Add GRO support to the ehea driver.

v3:
[cascardo] no need to enable GRO, since it's enabled by default
[cascardo] vgrp was removed in the vlan cleanup

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove LRO support

In preparation for adding GRO to ehea, remove LRO.

v3:
[cascardo] fixed conflict with vlan cleanup

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Add 64bit statistics

Switch to using ndo_get_stats64 to get 64bit statistics.

v3:
[cascardo] use rtnl_link_stats64 as port stats

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove some unused definitions

The queue macros are many levels deep and it makes it harder to
work your way through them when many of the versions are unused.
Remove the unused versions.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Simplify type 3 transmit routine

If a nonlinear skb fits within the immediate area, use skb_copy_bits
instead of copying the frags by hand.

v3:
[cascardo] fixed conflict with use of skb frag API

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Merge swqe2 TSO and non TSO paths

write_swqe2_TSO and write_swqe2_nonTSO are almost identical.

For TSO we have to set the TSO and mss bits in the wqe and we only
put the header in the immediate area, no data. Collapse both
functions into write_swqe2_immediate.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Simplify ehea_xmit2 and ehea_xmit3

Based on a patch from Michael Ellerman, clean up a significant
portion of the transmit path. There was a lot of duplication here.
Even worse, we were always checksumming tx packets and ignoring the
skb->ip_summed field.

Also remove NETIF_F_FRAGLIST from dev->features, I'm not sure why
it was enabled.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Allocate large enough skbs to avoid partial cacheline DMA writes

The ehea adapter has a mode where it will avoid partial cacheline DMA
writes on receive by always padding packets to fall on a cacheline
boundary.

Unfortunately we currently aren't allocating enough space for a full
ethernet MTU packet to be rounded up, so this optimisation doesn't hit.

It's unfortunate that the next largest packet size exposed by the
hypervisor interface is 2kB, meaning our skb allocation comes out of a
4kB SLAB. However the performance increase due to this optimisation is
quite large and my TCP stream numbers increase from 900MB to 1000MB/sec.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Add vlan_features

We weren't enabling any VLAN features so we missed out on checksum
offload and TSO when using VLANs. Enable them.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Dont check NETIF_F_TSO in TX path

It seems like the ehea xmit routine and an ethtool change of TSO
mode could race, resulting in corrupt packets. Checking gso_size
is enough and we can use the helper function.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove num_tx_qps module option

The num_tx_qps module option allows a user to configure a different
number of tx and rx queues. Now the networking stack is multiqueue
aware it makes little sense just to enable the tx queues and not the
rx queues so remove the option.

v3:
[cascardo] fixed conflict with get_stats change

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove force_irq logic in napi poll routine

commit 18604c548545 (ehea: NAPI multi queue TX/RX path for SMP) added
driver specific logic for exiting napi mode. I'm not sure what it was
trying to solve and it should be up to the network stack to decide when
we are done polling so remove it.

v3:
[cascardo] Fixed extra parentheses.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Update multiqueue support

The ehea driver had some multiqueue support but was missing the last
few years of networking stack improvements:

- Use skb_record_rx_queue to record which queue an skb came in on.

- Remove the driver specific netif_queue lock and use the networking
  stack transmit lock instead.

- Remove the driver specific transmit queue hashing and use
  skb_get_queue_mapping instead.

- Use netif_tx_{start|stop|wake}_queue where appropriate. We can also
  remove pr->queue_stopped and just check the queue status directly.

- Print all 16 queues in the ethtool stats.

We now enable multiqueue by default since it is a clear win on all my
testing so far.

v3:
[cascardo] fixed use_mcs parameter description
[cascardo] set ehea_ethtool_stats_keys as const

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ehea: Remove NETIF_F_LLTX

Remove the deprecated NETIF_F_LLTX feature. Since the network stack
now provides the locking we can remove the driver specific
pr->xmit_lock.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>