git.karo-electronics.de Git - linux-beck.git/log

Johannes Berg [Wed, 19 Aug 2015 07:46:22 +0000 (09:46 +0200)]

average: remove out-of-line implementation

Since all users are now converted to the inline implementation,
remove the out-of-line implementation entirely.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Johannes Berg [Wed, 19 Aug 2015 07:46:21 +0000 (09:46 +0200)]

rt2x00: use DECLARE_EWMA

Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in code that's one
byte larger (for me), but it shrinks struct link_ant and struct link
by the two unsigned long values each of them used to store the
parameters.
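
A minimal userspace sketch of the idea behind DECLARE_EWMA(), not the kernel macro itself: the hypothetical SKETCH_DECLARE_EWMA() below bakes the factor and weight (given as log2 shift amounts, an assumption of this sketch) into generated static inline functions, so each instance stores only the running value. struct ewma_old_style is a hypothetical stand-in for a layout that kept both parameters in every instance.

```c
/*
 * Hypothetical sketch, NOT the kernel's DECLARE_EWMA().
 * factor_shift = log2(factor): fixed-point fractional bits.
 * weight_shift = log2(weight): each new sample contributes 1/weight.
 * Both are compile-time constants, so no per-instance storage is needed.
 */
#define SKETCH_DECLARE_EWMA(name, factor_shift, weight_shift)           \
	struct ewma_##name {                                              \
		unsigned long internal; /* fixed-point running average */  \
	};                                                                \
	static inline void ewma_##name##_init(struct ewma_##name *e)      \
	{                                                                 \
		e->internal = 0;                                          \
	}                                                                 \
	static inline unsigned long                                       \
	ewma_##name##_read(const struct ewma_##name *e)                   \
	{                                                                 \
		return e->internal >> (factor_shift);                     \
	}                                                                 \
	static inline void ewma_##name##_add(struct ewma_##name *e,       \
					     unsigned long val)           \
	{                                                                 \
		unsigned long in = e->internal;                           \
		/* the first sample seeds the average directly */        \
		e->internal = in ?                                        \
			(((in << (weight_shift)) - in) +                  \
			 (val << (factor_shift))) >> (weight_shift) :     \
			(val << (factor_shift));                          \
	}

/* Hypothetical out-of-line layout: parameters stored per instance. */
struct ewma_old_style {
	unsigned long internal;
	unsigned long factor;
	unsigned long weight;
};

/* factor = 1024 (shift 10), weight = 8 (shift 3) */
SKETCH_DECLARE_EWMA(signal, 10, 3)
```

Adding 100 and then 200 reads back 112, i.e. 100 * 7/8 + 200/8 = 112.5 truncated by the fixed-point shift, while the per-instance struct holds one unsigned long instead of three.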

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Johannes Berg [Wed, 19 Aug 2015 07:46:20 +0000 (09:46 +0200)]

ath5k: use DECLARE_EWMA

This reduces code size slightly (at least on x86/64) and also saves
two unsigned long values of memory for each ath5k device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Johannes Berg [Wed, 19 Aug 2015 07:48:40 +0000 (09:48 +0200)]

virtio_net: use DECLARE_EWMA

Instead of using the out-of-line EWMA calculation, use DECLARE_EWMA()
to create static inlines. On x86/64 this results in no change in code
size for me, but it reduces the size of struct receive_queue by the two
unsigned long values that stored the parameters.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Yuval Mintz [Wed, 19 Aug 2015 07:21:58 +0000 (10:21 +0300)]

bnx2x: Fix vxlan endianity issue

Commit f34fa14cc033 ("bnx2x: Add vxlan RSS support") introduced an
endianness issue when passing the vxlan UDP port to the HW.
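
A minimal sketch of this class of bug, assuming (as with the __be16 port argument of the vxlan NDOs of that time) that the UDP port arrives in network byte order while the code programming the hardware expects host order. The helper names below are hypothetical and are not the bnx2x code.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical: port as delivered by the stack, network byte order. */
typedef uint16_t be16_port;

/* Buggy: hands the raw network-order value to the HW path. */
uint16_t vxlan_port_for_hw_buggy(be16_port port)
{
	return port;
}

/* Fixed: convert to host order exactly once, at the boundary. */
uint16_t vxlan_port_for_hw(be16_port port)
{
	return ntohs(port);
}
```

For port 4789 (0x12B5), the buggy variant yields 0xB512 (46354) on a little-endian host, while on big-endian hosts the two coincide, which makes such bugs easy to miss in testing.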

Reported-by: <fengguang.wu@intel.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Thu, 20 Aug 2015 20:01:57 +0000 (13:01 -0700)]

Merge branch 'vrf-cleanups-part-2'

Nikolay Aleksandrov says:

====================
vrf: cleanups part 2

This is the next part of the vrf cleanups. Patch 1 drops SLAB_PANIC
when creating the kmem cache, since the failure is already handled;
patch 2 removes a duplicate-slave check that the lower/upper code
already does; patch 3 moves the ndo_add_slave code around a bit so we
can drop an error label; and patch 4 drops the master device checks,
which are unnecessary because the ops are taken from the master device
itself, so the master can't be anything other than a VRF.
====================

Acked-by: David Ahern <dsa@cumulusnetworks.com>

Nikolay Aleksandrov [Wed, 19 Aug 2015 03:27:10 +0000 (06:27 +0300)]

vrf: ndo_add|del_slave drop unnecessary checks

When the ndo_add|del_slave ops are used, they're taken from the
respective master device's netdev ops, so the VRF ops are only ever
called when the master device is a VRF. There is thus no need to check
the type of the master.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Wed, 19 Aug 2015 03:27:09 +0000 (06:27 +0300)]

vrf: move vrf_insert_slave so we can drop a goto label

We can simplify do_vrf_add_slave by moving vrf_insert_slave to the end
of the enslave path and thus eliminate an error goto label. The insert
always succeeds and isn't needed before that point anyway.
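
A generic sketch of the pattern, with hypothetical step functions: when the last step of a setup sequence cannot fail, doing it last means no error path has to undo it, so it needs no unwind label of its own. This is not the actual do_vrf_add_slave() code.

```c
#include <stdbool.h>

/* Hypothetical enslave context; only the first two steps can fail. */
struct slave_ctx {
	bool linked;
	bool configured;
	bool inserted;
	bool fail_config;	/* test knob: force step 2 to fail */
};

static int link_upper(struct slave_ctx *c)    { c->linked = true; return 0; }
static void unlink_upper(struct slave_ctx *c) { c->linked = false; }

static int configure(struct slave_ctx *c)
{
	if (c->fail_config)
		return -1;
	c->configured = true;
	return 0;
}

/* Always succeeds, so it is done last and needs no rollback. */
static void insert_slave(struct slave_ctx *c) { c->inserted = true; }

int add_slave(struct slave_ctx *c)
{
	int err;

	err = link_upper(c);
	if (err)
		return err;

	err = configure(c);
	if (err)
		goto err_unlink;

	insert_slave(c);	/* infallible: nothing after it to undo */
	return 0;

err_unlink:
	unlink_upper(c);
	return err;
}
```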

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Wed, 19 Aug 2015 03:27:08 +0000 (06:27 +0300)]

vrf: remove unnecessary duplicate check

The upper/lower functions already check for duplicate slaves, so there
is no need to do it again.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Wed, 19 Aug 2015 03:27:07 +0000 (06:27 +0300)]

vrf: don't panic on cache create failure

It's pointless to panic on cache creation failure when that case is
handled, even more so since it isn't a kernel-wide fatal problem, so
don't panic.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Wed, 19 Aug 2015 03:12:29 +0000 (06:12 +0300)]

vrf: plug skb leaks

Currently, whenever a packet other than ETH_P_IP is sent through the
VRF device, it is leaked. Plug the leaks and properly drop these
packets.
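
A hypothetical userspace model of the fix: every branch of a transmit routine must either pass the buffer on or free it, and the leak came from branches that did neither. struct pkt, xmit_ip() and the stats struct are illustrative stand-ins, not the VRF driver's types; free() stands in for freeing the skb.

```c
#include <stdlib.h>

#define PROTO_IP   0x0800	/* value of ETH_P_IP */
#define PROTO_IPV6 0x86DD	/* value of ETH_P_IPV6 */

/* Hypothetical packet and stats types. */
struct pkt {
	unsigned short protocol;
};

struct dev_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

/* Hypothetical IPv4 path: consumes the packet. */
static void xmit_ip(struct pkt *p, struct dev_stats *st)
{
	st->tx_packets++;
	free(p);
}

/* Every branch consumes the packet exactly once. */
void vrf_like_xmit(struct pkt *p, struct dev_stats *st)
{
	switch (p->protocol) {
	case PROTO_IP:
		xmit_ip(p, st);
		break;
	default:
		/* unsupported protocol: count and drop instead of leaking */
		st->tx_dropped++;
		free(p);
		break;
	}
}
```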

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Tue, 18 Aug 2015 18:40:16 +0000 (21:40 +0300)]

vrf: vrf_master_ifindex_rcu is not always called with rcu read lock

While running net-next I hit this:
[  634.073119] ===============================
[  634.073150] [ INFO: suspicious RCU usage. ]
[  634.073182] 4.2.0-rc6+ #45 Not tainted
[  634.073213] -------------------------------
[  634.073244] include/net/vrf.h:38 suspicious rcu_dereference_check() usage!
[  634.073274]
               other info that might help us debug this:

[  634.073307]
               rcu_scheduler_active = 1, debug_locks = 1
[  634.073338] 2 locks held by swapper/0/0:
[  634.073369]  #0:  (((&n->timer))){+.-...}, at: [<ffffffff8112bc35>] call_timer_fn+0x5/0x480
[  634.073412]  #1:  (slock-AF_INET){+.-...}, at: [<ffffffff8174f0f5>] icmp_send+0x155/0x5f0
[  634.073450]
               stack backtrace:
[  634.073483] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.2.0-rc6+ #45
[  634.073514] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  634.073545]  0000000000000000 0593ba8242d9ace4 ffff88002fc03b48 ffffffff81803f1b
[  634.073612]  0000000000000000 ffffffff81e12500 ffff88002fc03b78 ffffffff811003c5
[  634.073642]  0000000000000000 ffff88002ec4e600 ffffffff81f00f80 ffff88002fc03cf0
[  634.073669] Call Trace:
[  634.073694]  <IRQ>  [<ffffffff81803f1b>] dump_stack+0x4c/0x65
[  634.073728]  [<ffffffff811003c5>] lockdep_rcu_suspicious+0xc5/0x100
[  634.073763]  [<ffffffff8174eb56>] icmp_route_lookup+0x176/0x5c0
[  634.073793]  [<ffffffff8174f2fb>] ? icmp_send+0x35b/0x5f0
[  634.073818]  [<ffffffff8174f274>] ? icmp_send+0x2d4/0x5f0
[  634.073844]  [<ffffffff8174f3ce>] icmp_send+0x42e/0x5f0
[  634.073873]  [<ffffffff8170b662>] ipv4_link_failure+0x22/0xa0
[  634.073899]  [<ffffffff8174bdda>] arp_error_report+0x3a/0x80
[  634.073926]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.073952]  [<ffffffff816d396e>] neigh_invalidate+0x8e/0x110
[  634.073984]  [<ffffffff816d62ae>] neigh_timer_handler+0x1ae/0x290
[  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.074013]  [<ffffffff8112bce3>] call_timer_fn+0xb3/0x480
[  634.074013]  [<ffffffff8112bc35>] ? call_timer_fn+0x5/0x480
[  634.074013]  [<ffffffff816d6100>] ? neigh_lookup+0x2c0/0x2c0
[  634.074013]  [<ffffffff8112c2bc>] run_timer_softirq+0x20c/0x430
[  634.074013]  [<ffffffff810af50e>] __do_softirq+0xde/0x630
[  634.074013]  [<ffffffff810afc97>] irq_exit+0x117/0x120
[  634.074013]  [<ffffffff81810976>] smp_apic_timer_interrupt+0x46/0x60
[  634.074013]  [<ffffffff8180e950>] apic_timer_interrupt+0x70/0x80
[  634.074013]  <EOI>  [<ffffffff8106b9d6>] ? native_safe_halt+0x6/0x10
[  634.074013]  [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
[  634.074013]  [<ffffffff81027d43>] default_idle+0x23/0x200
[  634.074013]  [<ffffffff8102852f>] arch_cpu_idle+0xf/0x20
[  634.074013]  [<ffffffff810f89ba>] default_idle_call+0x2a/0x40
[  634.074013]  [<ffffffff810f8dcc>] cpu_startup_entry+0x39c/0x4c0
[  634.074013]  [<ffffffff817f9cad>] rest_init+0x13d/0x150
[  634.074013]  [<ffffffff81f69038>] start_kernel+0x4a8/0x4c9
[  634.074013]  [<ffffffff81f68120>] ? early_idt_handler_array+0x120/0x120
[  634.074013]  [<ffffffff81f68339>] x86_64_start_reservations+0x2a/0x2c
[  634.074013]  [<ffffffff81f68485>] x86_64_start_kernel+0x14a/0x16d

It seems vrf_master_ifindex_rcu() can be called without RCU held in
other contexts as well, so introduce a new helper which acquires the
RCU read lock and returns the ifindex.
Also add curly braces around both the "if" and "else" parts as per the
style guide.
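
A userspace model of the helper pattern described above. The stub fake_rcu_read_lock()/fake_rcu_read_unlock() only track nesting depth and stand in for rcu_read_lock()/rcu_read_unlock(); master_ifindex_rcu(), master_ifindex() and struct net_dev are hypothetical names modeled on the commit text, not the code it added.

```c
#include <assert.h>

/* Stand-ins for rcu_read_lock()/rcu_read_unlock(): nesting depth only. */
static int rcu_depth;
static void fake_rcu_read_lock(void)   { rcu_depth++; }
static void fake_rcu_read_unlock(void) { rcu_depth--; }

/* Hypothetical device with an RCU-protected master pointer. */
struct net_dev {
	int ifindex;
	struct net_dev *master;
};

/* Caller must already be inside a read-side critical section. */
static int master_ifindex_rcu(const struct net_dev *dev)
{
	assert(rcu_depth > 0);	/* plays the role of the lockdep check */
	return dev->master ? dev->master->ifindex : 0;
}

/* Safe from any context: takes and drops the read lock itself. */
int master_ifindex(const struct net_dev *dev)
{
	int ifindex;

	fake_rcu_read_lock();
	ifindex = master_ifindex_rcu(dev);
	fake_rcu_read_unlock();

	return ifindex;
}
```

Keeping both variants lets callers that already hold the read lock avoid nesting it, while callers in other contexts use the wrapper.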

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Ying Xue [Wed, 19 Aug 2015 07:46:17 +0000 (15:46 +0800)]

lwtunnel: Fix the sparse warnings in fib_encap_match

When CONFIG_LWTUNNEL is not enabled, lwtstate_free() is not declared
in lwtunnel.h at all. However, even in this case the function is still
referenced in fib_semantics.c, which produces the following sparse and
build errors:

net/ipv4/fib_semantics.c:553:17: error: undefined identifier 'lwtstate_free'
  CC      net/ipv4/fib_semantics.o
  net/ipv4/fib_semantics.c: In function ‘fib_encap_match’:
  net/ipv4/fib_semantics.c:553:3: error: implicit declaration of function ‘lwtstate_free’ [-Werror=implicit-function-declaration]
  cc1: some warnings being treated as errors
  make[1]: *** [net/ipv4/fib_semantics.o] Error 1
  make: *** [net/ipv4/fib_semantics.o] Error 2

To eliminate the error, we define an empty function for lwtstate_free()
in lwtunnel.h when CONFIG_LWTUNNEL is disabled.
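
A sketch of the usual header idiom, assuming a free()-based body when the option is enabled: with CONFIG_LWTUNNEL undefined, an empty static inline keeps callers compiling without #ifdefs of their own. The struct contents, the enabled-branch body and drop_state() are simplified, hypothetical stand-ins, not the kernel's lwtunnel.h.

```c
#include <stdlib.h>

/* Simplified, hypothetical placeholder for the real struct. */
struct lwtunnel_state {
	int type;
};

#ifdef CONFIG_LWTUNNEL
/* Feature built in: release the state (simplified). */
static inline void lwtstate_free(struct lwtunnel_state *lws)
{
	free(lws);
}
#else
/* Feature compiled out: callers may still call this unconditionally. */
static inline void lwtstate_free(struct lwtunnel_state *lws)
{
	(void)lws;
}
#endif

/* Hypothetical caller in the spirit of fib_encap_match(): no #ifdef. */
void drop_state(struct lwtunnel_state *lws)
{
	lwtstate_free(lws);
}
```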

Fixes: df383e6240ef ("lwtunnel: fix memory leak")
Cc: Jiri Benc <jbenc@redhat.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Wed, 19 Aug 2015 03:21:32 +0000 (20:21 -0700)]

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-08-18

This series contains updates to igb, e100, e1000e and ixgbe.

Shota Suzuki provides a fix for a possible overflow in
igb_set_interrupt_capability() which leads to an oops.  When changing the
number of queues by "ethtool -L", set IGB_FLAG_QUEUE_PAIRS in the same
manner as when initializing the igb driver.

Vasily Averin provides a fix for a missing rtnl_unlock() for when we
error out due to not being able to allocate memory for our queues.

Stefan Assmann provides a couple of fixes for igb/igbvf. The first
changes igb_probe to simply call igb_enable_sriov() instead of
igb_sriov_reinit(), since we are starting from scratch. The second
fixes igbvf so that it clears buffer_info->dma after calling
dma_unmap_single() in all cases; the problem was found by changing the
MTU twice.

Richard Cochran implements the periodic output function using the
programmable clock outputs available in i210 when possible, falling
back to the target time for longer periods.

Todd adds support for the Marvell PHY 1512, which is required for i354
devices, and then updates igb to make sure SR-IOV init uses the correct
number of queues, since recent changes could result in the PF holding
onto all of the queues.

Alex Williamson provides a fix for the case where a guest OS does not
support hot-unplug: SR-IOV is now disabled prior to unregister_netdev()
to avoid the problem.

Jia-Ju Bai provides several patches. The first knocks some dust off
the old e100 driver by adding a check to avoid a null pointer
dereference. The next cleans up a possible resource leak by releasing
the skb allocated when e100_xmit_prepare() runs into a DMA mapping
error. In igb, add a missing rtnl_unlock() for when
igb_init_interrupt_scheme() fails in igb_sriov_reinit(). An e1000e
fix, based on suggestions from Alex Duyck, moves head/tail register
writing to e1000_configure_tx/rx() to avoid a possible null pointer
dereference (similar to the igb driver). Lastly, fix a possible memory
leak in igb_probe(), where the shadow_vfta memory allocated by kcalloc
in igb_sw_init() is not freed.

Mark simplifies port-specific macros for ixgbe by eliminating explicit
comparisons with 0 and enclosing formal parameters in parens to
eliminate the risk of an operator precedence issue.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Wed, 19 Aug 2015 03:16:53 +0000 (20:16 -0700)]

Merge branch 'vrf-next'

Nikolay Aleksandrov says:

====================
vrf: a few simplifications and cleanups

These patches remove some unnecessary checks (patches 3 and 4), an
unused num_slaves member, and refcnt manipulations which are already
done by the upper functions.
====================

Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Tue, 18 Aug 2015 17:28:04 +0000 (20:28 +0300)]

vrf: simplify the netdev notifier function

We can drop the check because if vrf_ptr is present then the vrf
device must be our master, and since we're running with rtnl held it
can't go away.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Tue, 18 Aug 2015 17:28:03 +0000 (20:28 +0300)]

vrf: don't check for dstats and rth in uninit path

dstats and rth are always present because we fail the device registration
if they can't be allocated in vrf_init() (ndo_init) so drop the unnecessary
checks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Tue, 18 Aug 2015 17:28:02 +0000 (20:28 +0300)]

vrf: drop unused num_slaves member

slave_queue has a num_slaves member which is unused; drop it.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Nikolay Aleksandrov [Tue, 18 Aug 2015 17:28:01 +0000 (20:28 +0300)]

vrf: drop unnecessary dev refcnt changes

netdev_master_upper_dev_link/unlink already do a dev_hold/put on the
devices being linked, so no need to take another reference.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Andrew Schwartzmeyer [Wed, 19 Aug 2015 03:06:32 +0000 (20:06 -0700)]

hv_netvsc: Fix dereference of nvdev before check

With this fix the code passes static analysis by Smatch.
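
A generic sketch of the bug class named in the subject, with hypothetical types and field names rather than the hv_netvsc code: a pointer is used before it is NULL-checked, so the check comes too late to help. The fix moves every use after the check.

```c
#include <stddef.h>

/* Hypothetical device struct. */
struct nvdev_like {
	int num_chn;
};

/* Buggy order: the load happens before the NULL check. A compiler may
 * even assume nvdev != NULL after the load and drop the check. */
int get_channels_buggy(const struct nvdev_like *nvdev)
{
	int n = nvdev->num_chn;	/* crashes if nvdev == NULL */

	if (!nvdev)
		return -1;
	return n;
}

/* Fixed order: check first, then dereference. */
int get_channels(const struct nvdev_like *nvdev)
{
	if (!nvdev)
		return -1;
	return nvdev->num_chn;
}
```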

Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jiri Benc [Tue, 18 Aug 2015 16:42:09 +0000 (18:42 +0200)]

lwtunnel: ip tunnel: fix multiple routes with different encap

Currently, two routes going through the same tunnel interface are considered
the same even when they are routed to a different host after encapsulation.
This causes all routes added after the first one to have incorrect
encapsulation parameters.

This is nicely visible by doing:

  # ip r a 192.168.1.2/32 dev vxlan0 tunnel dst 10.0.0.2
  # ip r a 192.168.1.3/32 dev vxlan0 tunnel dst 10.0.0.3
  # ip r
  [...]
  192.168.1.2/32 tunnel id 0 src 0.0.0.0 dst 10.0.0.2 [...]
  192.168.1.3/32 tunnel id 0 src 0.0.0.0 dst 10.0.0.2 [...]

Implement the missing comparison function.
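
A minimal sketch of what such a comparison callback has to do, assuming the convention that it returns 0 when two encapsulations are identical and nonzero otherwise. struct tun_encap and tun_cmp_encap() are simplified, hypothetical stand-ins for the IP tunnel metadata (tunnel id, source, destination). Without a real comparison, the two routes in the example above are treated as equal and the second inherits the first route's destination.

```c
#include <stdint.h>

/* Hypothetical, simplified per-route tunnel encap parameters. */
struct tun_encap {
	uint64_t tun_id;
	uint32_t src;	/* IPv4 addresses */
	uint32_t dst;
};

/* Returns 0 if the encaps are the same, nonzero if they differ. */
int tun_cmp_encap(const struct tun_encap *a, const struct tun_encap *b)
{
	/* Compare field by field rather than memcmp() the whole struct,
	 * so padding bytes (in general) can't cause a false mismatch. */
	return a->tun_id != b->tun_id ||
	       a->src != b->src ||
	       a->dst != b->dst;
}
```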

Fixes: 3093fbe7ff4bc ("route: Per route IP tunnel metadata via lightweight tunnel")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jiri Benc [Tue, 18 Aug 2015 16:41:13 +0000 (18:41 +0200)]

lwtunnel: fix memory leak

The built lwtunnel_state struct has to be freed after comparison.

Fixes: 571e722676fe3 ("ipv4: support for fib route lwtunnel encap attributes")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Dan Carpenter [Tue, 18 Aug 2015 09:31:44 +0000 (12:31 +0300)]

cxgb4: memory corruption in debugfs

kstrtoul() writes an unsigned long, so you can't use it with an int or
it causes memory corruption. Also, j should be unsigned or we have
underflow bugs.

I considered changing "j" to unsigned long but everything fits in a u32.
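
A userspace sketch of the bug and one way to avoid it. kstrtoul() stores a full unsigned long, so on 64-bit kernels pointing it at an int writes 8 bytes into a 4-byte object. Below, strtoul() stands in for kstrtoul(), and parse_u32() is a hypothetical helper that parses into an unsigned long temporary and range-checks before narrowing; in the kernel, kstrtouint() or kstrtou32() avoid the temporary altogether.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse s (base auto-detected) into a u32 without
 * writing through a pointer of the wrong width.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
int parse_u32(const char *s, uint32_t *res)
{
	unsigned long tmp;
	char *end;

	/* accept only strings that start with a digit (no sign/space) */
	if (*s < '0' || *s > '9')
		return -EINVAL;

	errno = 0;
	tmp = strtoul(s, &end, 0);
	if (*end != '\0')
		return -EINVAL;
	if (errno == ERANGE || tmp > UINT32_MAX)
		return -ERANGE;

	*res = (uint32_t)tmp;	/* narrow only after the range check */
	return 0;
}
```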

Fixes: 8e3d04fd7d70 ('cxgb4: Add MPS tracing support')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Tue, 18 Aug 2015 21:24:18 +0000 (14:24 -0700)]

enic: Fix namespace pollution causing build errors.

drivers/net/built-in.o: In function `.vnic_wq_devcmd2_alloc':
(.text+0x49fe40): multiple definition of `.vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.text+0xb4318): first defined here
drivers/net/built-in.o:(.opd+0x2af00): multiple definition of `vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.opd+0xad70): first defined here
drivers/net/built-in.o: In function `.vnic_wq_init_start':
(.text+0x49f9c0): multiple definition of `.vnic_wq_init_start'
drivers/scsi/built-in.o:(.text+0xb3b58): first defined here
drivers/net/built-in.o:(.opd+0x2ae88): multiple definition of `vnic_wq_init_start'
drivers/scsi/built-in.o:(.opd+0xace0): first defined here

Rename these to 'enic_*' to avoid the conflict with the functions of
the same name in the snic scsi driver.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Rajesh Borundia [Tue, 18 Aug 2015 07:22:59 +0000 (10:22 +0300)]

bnx2x: Add vxlan RSS support

The latest FW submission added some vxlan offload capabilities to our
device. This patch adds the ability to connect to the vxlan NDOs and
configure the associated UDP port in the HW.

The device is now capable of performing RSS according to the inner
headers of vxlan packets.

Signed-off-by: Rajesh Borundia <Rajesh.Borundia@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

David S. Miller [Tue, 18 Aug 2015 21:17:22 +0000 (14:17 -0700)]

Merge branch 'dsa-multi-swtich'

Andrew Lunn says:

====================
D in DSA patches

The D in DSA is distributed, meaning multiple switches can be
connected together. Currently no mainline system does this, and so the
code is broken. This patchset contains two fixes, and a small helper.

With three or more switches, the current device tree binding is not
sufficient to express the routing between the switches. The first
patch extends the binding, in a backwards-compatible way, to allow a
link from a switch to describe all the switches reachable over that
link, not just the direct neighbor.

The third patch fixes the port configuration on newer devices for
links connecting switches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Andrew Lunn [Mon, 17 Aug 2015 21:52:52 +0000 (23:52 +0200)]

dsa: mv88e6xxx: Set DSA mode based on chip abilities

Older devices only support a single DSA frame format, whereas newer
devices have two. Take this into account when configuring a DSA port.
The port needs to be in plain old DSA mode, since this is a DSA link,
whereas the newer format can be used for the CPU port.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Andrew Lunn [Mon, 17 Aug 2015 21:52:51 +0000 (23:52 +0200)]

net: dsa: Add dsa_is_dsa_port() helper

Add an inline helper for determining if a port is a DSA port.
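
A simplified sketch of such a helper, assuming the switch keeps a bitmask with one bit per DSA (inter-switch) port. struct dsa_switch_sketch and its field are hypothetical and differ from the kernel's struct dsa_switch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical switch description. */
struct dsa_switch_sketch {
	uint32_t dsa_port_mask;	/* bit p set => port p links to another switch */
};

static inline bool dsa_is_dsa_port(const struct dsa_switch_sketch *ds, int p)
{
	return (ds->dsa_port_mask >> p) & 1;
}
```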

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Andrew Lunn [Mon, 17 Aug 2015 21:52:50 +0000 (23:52 +0200)]

net: dsa: Allow multi hop routes to be expressed

With more than two switches in a hierarchy, it becomes necessary to
describe multi-hop routes between switches. The current binding does
not allow this, although the older platform_data did. Extend the link
property to be a list rather than a single phandle to a remote switch.
It is then possible to express that a port should be used to reach
more than one switch and that the switch may be more than one hop away.
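
A sketch of what the extended binding lets software build, assuming a per-switch routing table indexed by destination switch that gives the local port to use (the names here are hypothetical). Because each inter-switch port lists every switch reachable through it, not only its direct neighbor, a chain of three switches can be routed.

```c
#include <string.h>

#define MAX_SWITCHES 4

/* Hypothetical "link" entry: outgoing port and switches reachable via it. */
struct dsa_link {
	int port;
	int n_targets;
	int targets[MAX_SWITCHES];
};

/* rtable[sw] = local port used to reach switch sw, or -1 if unknown. */
void build_rtable(signed char rtable[MAX_SWITCHES],
		  const struct dsa_link *links, int n_links)
{
	memset(rtable, -1, MAX_SWITCHES);
	for (int i = 0; i < n_links; i++)
		for (int t = 0; t < links[i].n_targets; t++)
			rtable[links[i].targets[t]] = (signed char)links[i].port;
}
```

For switch 0 in a chain 0 - 1 - 2, one link on port 5 listing switches 1 and 2 routes both through port 5, which a single-phandle link could not express.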

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Jacob Keller [Wed, 10 Jun 2015 18:44:45 +0000 (11:44 -0700)]

ixgbe: TRIVIAL fix up double 'the' and comment style

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Mark Rustad [Sat, 6 Jun 2015 17:41:03 +0000 (10:41 -0700)]

ixgbe: Simplify port-specific macros

Simplify port-specific macros by eliminating explicit comparison
with 0. More importantly, enclose formal parameter in parens to
eliminate the risk of an operator precedence surprise.
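
A small illustration of the "operator precedence surprise", using hypothetical macros rather than the actual ixgbe ones: without parentheses around the parameter, an argument containing a lower-precedence operator is split apart by the macro body.

```c
/* Unsafe: parameter not parenthesized, explicit comparison with 0. */
#define PORT_IS_ODD_UNSAFE(id)	(id & 1) != 0

/* Safer: parameter and whole expression parenthesized, no "!= 0". */
#define PORT_IS_ODD(id)		(!!((id) & 1))

int unsafe_result(int a, int b)
{
	/* expands to (a | b & 1) != 0, i.e. (a | (b & 1)) != 0 */
	return PORT_IS_ODD_UNSAFE(a | b);
}

int safe_result(int a, int b)
{
	/* expands to (!!((a | b) & 1)) */
	return PORT_IS_ODD(a | b);
}
```

With a = 2 and b = 0, the unsafe form reports the even value 2 as odd, while the parenthesized form does not.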

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Todd Fujinaka [Sat, 8 Aug 2015 00:27:39 +0000 (17:27 -0700)]

igb: make sure SR-IOV init uses the right number of queues

Recent changes to igb_probe_vfs() could lead to the PF holding onto all
of the queues. Reorder igb_probe_vfs() to be before
igb_init_queue_configuration() and add some more error checking.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Stefan Assmann [Thu, 6 Aug 2015 07:32:17 +0000 (09:32 +0200)]

igbvf: clear buffer_info->dma after dma_unmap_single()

The driver doesn't clear buffer_info->dma after calling
dma_unmap_single() in all cases. This was discovered by changing the
MTU twice, which caused the following backtrace.

[   68.569280] WARNING: CPU: 2 PID: 1860 at drivers/iommu/intel-iommu.c:3517 intel_unmap+0x20c/0x220()
[   68.579392] Driver unmaps unmatched page at PFN fffc2a40
[   68.585322] Modules linked in: igbvf ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat kvm_intel kvm igb megs
[   68.599163] CPU: 2 PID: 1860 Comm: ifconfig Not tainted 4.2.0-rc4+ #147
[   68.606543] Hardware name: IBM  -[546025Z]-/00Y7630, BIOS -[VVE134TUS-1.51]- 10/17/2013
[   68.615473]  0000000000000dbd ffff88046441bb08 ffffffff81a5ad0b ffffffff81e2f9ea
[   68.623775]  ffff88046441bb58 ffff88046441bb48 ffffffff81056b55 ffff88047fc583c0
[   68.632075]  0000000000000000 ffff880469a8e600 00000000fffc2a40 ffff880465b32098
[   68.640375] Call Trace:
[   68.643109]  [<ffffffff81a5ad0b>] dump_stack+0x48/0x5d
[   68.648844]  [<ffffffff81056b55>] warn_slowpath_common+0x95/0xe0
[   68.655549]  [<ffffffff81056c56>] warn_slowpath_fmt+0x46/0x70
[   68.661960]  [<ffffffff8158a614>] ? find_iova+0x54/0x90
[   68.667791]  [<ffffffff815988dc>] intel_unmap+0x20c/0x220
[   68.673815]  [<ffffffff8159891e>] intel_unmap_page+0xe/0x10
[   68.680038]  [<ffffffffa0067536>] igbvf_clean_rx_ring+0x96/0x370 [igbvf]
[   68.687516]  [<ffffffffa0067915>] igbvf_down+0x105/0x110 [igbvf]
[   68.694219]  [<ffffffffa0067beb>] igbvf_change_mtu+0x16b/0x180 [igbvf]
[...]
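
A userspace model of the fix, with a stub unmap that counts calls and hypothetical struct/field names: clearing the stored handle right after unmapping makes a second cleanup pass (as triggered by changing the MTU twice) skip the already-unmapped buffer instead of unmapping it again.

```c
#include <stdint.h>

typedef uint64_t dma_addr;

/* Hypothetical per-buffer bookkeeping. */
struct buffer_info {
	dma_addr dma;	/* 0 means "not mapped" */
};

static int unmap_calls;

/* Stand-in for dma_unmap_single(); an IOMMU warns on a repeat unmap. */
static void unmap_single(dma_addr addr)
{
	(void)addr;
	unmap_calls++;
}

void clean_rx_ring(struct buffer_info *ring, int count)
{
	for (int i = 0; i < count; i++) {
		if (!ring[i].dma)
			continue;
		unmap_single(ring[i].dma);
		ring[i].dma = 0;	/* the fix: forget the stale mapping */
	}
}
```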

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jia-Ju Bai [Wed, 5 Aug 2015 14:05:16 +0000 (22:05 +0800)]

igb: Fix a memory leak in igb_probe

In the error handling code of igb_probe, the adapter->shadow_vfta memory
allocated by kcalloc in igb_sw_init is not freed. So when register_netdev
or igb_init_i2c fails, a memory leak occurs.
This patch adds a kfree to fix it.
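
A sketch of the goto-based unwind such a fix extends, with hypothetical step functions: memory allocated before the failing step is released in the error path. Only the shadow_vfta name comes from the commit; the array size and everything else are illustrative, and calloc()/free() stand in for kcalloc()/kfree().

```c
#include <stdlib.h>

/* Hypothetical adapter state. */
struct adapter {
	unsigned int *shadow_vfta;
	int fail_register;	/* test knob: make registration fail */
	int registered;
};

static int sw_init(struct adapter *a)
{
	/* 128 entries is an arbitrary size for this sketch */
	a->shadow_vfta = calloc(128, sizeof(*a->shadow_vfta));
	return a->shadow_vfta ? 0 : -1;
}

static int register_dev(struct adapter *a)
{
	if (a->fail_register)
		return -1;
	a->registered = 1;
	return 0;
}

int probe(struct adapter *a)
{
	int err;

	err = sw_init(a);
	if (err)
		return err;

	err = register_dev(a);
	if (err)
		goto err_register;

	return 0;

err_register:
	free(a->shadow_vfta);	/* previously missing: the leak */
	a->shadow_vfta = NULL;
	return err;
}
```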

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jia-Ju Bai [Wed, 5 Aug 2015 10:16:10 +0000 (18:16 +0800)]

e1000e: Modify Tx/Rx configurations to avoid null pointer dereferences in e1000_open

When e1000e_setup_rx_resources fails in e1000_open,
e1000e_free_tx_resources in the "err_setup_rx" segment is executed.
The "writel(0, tx_ring->head)" statement in e1000_clean_tx_ring,
called from e1000e_free_tx_resources, causes a null pointer dereference
(crash), because "tx_ring->head" is only assigned in e1000_configure_tx
in e1000_configure, which runs after e1000e_setup_rx_resources.

This patch moves head/tail register writing to e1000_configure_tx/rx,
which fixes this problem. It is inspired by igb_configure_tx_ring
in the igb driver.

Special thanks to Alexander Duyck for his valuable suggestion.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jia-Ju Bai [Mon, 3 Aug 2015 03:36:26 +0000 (11:36 +0800)]

igb: Fix a deadlock in igb_sriov_reinit

When igb_init_interrupt_scheme fails in igb_sriov_reinit, the lock
acquired by rtnl_lock() is not released, which causes a deadlock.
This patch adds an rtnl_unlock() in the error handling to fix it.
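
A userspace model of this bug class, using a pthread mutex in place of the RTNL lock and a hypothetical init_scheme() step: every early return taken while the lock is held must release it first, otherwise the next locker blocks forever.

```c
#include <pthread.h>

/* Stand-in for the RTNL lock. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical step that may fail, e.g. interrupt scheme setup. */
static int init_scheme(int should_fail)
{
	return should_fail ? -1 : 0;
}

int sriov_reinit(int should_fail)
{
	int err;

	pthread_mutex_lock(&rtnl);

	err = init_scheme(should_fail);
	if (err) {
		pthread_mutex_unlock(&rtnl);	/* the fix */
		return err;
	}

	/* ... more work under the lock ... */
	pthread_mutex_unlock(&rtnl);
	return 0;
}
```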

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jia-Ju Bai [Mon, 3 Aug 2015 02:40:48 +0000 (10:40 +0800)]

e100: Release skb when DMA mapping is failed in e100_xmit_prepare

When pci_dma_mapping_error reports a failure in e100_xmit_prepare, the
skb buffer allocated by netdev_alloc_skb_ip_align in e100_rx_alloc_skb
is not released, which causes a possible resource leak.
This patch adds error handling code to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Jia-Ju Bai [Mon, 3 Aug 2015 02:17:08 +0000 (10:17 +0800)]

e100: Add a check after pci_pool_create to avoid null pointer dereference

The driver lacks a check of nic->cbs_pool after pci_pool_create
in e100_probe. When this function fails, a null pointer dereference
occurs when pci_pool_alloc uses nic->cbs_pool in e100_alloc_cbs.
This patch adds a check and related error handling code to fix it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Alex Williamson [Wed, 29 Jul 2015 20:38:15 +0000 (14:38 -0600)]

igb: Teardown SR-IOV before unregister_netdev()

When the .remove() callback for a PF is called, SR-IOV support for the
device is disabled, which requires unbinding and removing the VFs.
The VFs may be in-use either by the host kernel or userspace, such as
assigned to a VM through vfio-pci.  In this latter case, the VFs may
be removed either by shutting down the VM or hot-unplugging the
devices from the VM.  Unfortunately in the case of a Windows 2012 R2
guest, hot-unplug is broken due to the ordering of the PF driver
teardown.  Disabling SR-IOV prior to unregister_netdev() avoids this
issue.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Todd Fujinaka [Wed, 29 Jul 2015 14:32:06 +0000 (07:32 -0700)]

igb: add support for 1512 PHY

This patch adds support for Marvell PHY 1512 (required for I354).

Submitted by: Maciej Szwed <maciej.szwed@intel.com>
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Richard Cochran [Thu, 23 Jul 2015 21:59:30 +0000 (14:59 -0700)]

igb: implement high frequency periodic output signals

In addition to interrupt driven target time output events, the i210
also has two programmable clock outputs. These clocks support periods
between 16 nanoseconds and 140 milliseconds. This patch implements
the periodic output function using the clock outputs when possible,
falling back to the target time for longer periods.
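
A sketch of the selection rule described above, with hypothetical names: periods inside the programmable clock range use a clock output, and longer periods fall back to interrupt-driven target time events. The 16 ns and 140 ms bounds come from the commit text; treating both endpoints as inclusive and rejecting shorter periods are assumptions of this sketch.

```c
#include <stdint.h>

#define CLKOUT_MIN_NS 16ULL		/* 16 ns */
#define CLKOUT_MAX_NS 140000000ULL	/* 140 ms */

/* Hypothetical mode selector. */
enum perout_mode {
	PEROUT_INVALID,
	PEROUT_CLOCK_OUTPUT,	/* programmable clock output */
	PEROUT_TARGET_TIME,	/* interrupt-driven target time events */
};

enum perout_mode choose_perout_mode(uint64_t period_ns)
{
	if (period_ns < CLKOUT_MIN_NS)
		return PEROUT_INVALID;	/* assumed: too fast to support */
	if (period_ns <= CLKOUT_MAX_NS)
		return PEROUT_CLOCK_OUTPUT;
	return PEROUT_TARGET_TIME;
}
```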

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Stefan Assmann [Fri, 10 Jul 2015 13:01:12 +0000 (15:01 +0200)]

igb: do not re-init SR-IOV during probe

During driver probing the following code path is triggered.
igb_probe
->igb_sw_init
  ->igb_probe_vfs
    ->igb_pci_enable_sriov
      ->igb_sriov_reinit

Doing the SR-IOV re-init is not necessary during probing since we're
starting from scratch. Here we can call igb_enable_sriov() right away.

Running igb_sriov_reinit() during igb_probe() also seems to cause
occasional packet loss on some onboard 82576 NICs. Reproduced on
Dell and HP servers with onboard 82576 NICs.
Example:
Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
Subsystem: Dell Device [1028:0481]

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Vasily Averin [Tue, 7 Jul 2015 15:53:45 +0000 (18:53 +0300)]

igb: missing rtnl_unlock in igb_sriov_reinit()

Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Shota Suzuki [Wed, 1 Jul 2015 00:25:52 +0000 (09:25 +0900)]

igb: Fix oops caused by missing queue pairing

When initializing the igb driver (e.g. 82576, I350), IGB_FLAG_QUEUE_PAIRS
is set if adapter->rss_queues exceeds half of max_rss_queues in
igb_init_queue_configuration().
On the other hand, IGB_FLAG_QUEUE_PAIRS is not set, even if the number of
queues exceeds half of max_combined, in igb_set_channels() when changing
the number of queues with "ethtool -L".
In this case, if numvecs is larger than MAX_MSIX_ENTRIES (10), the size
of adapter->msix_entries[], an overflow can occur in
igb_set_interrupt_capability(), which in turn leads to an oops.

Fix this problem as follows:
- When changing the number of queues by "ethtool -L", set
IGB_FLAG_QUEUE_PAIRS in the same way as initializing igb driver.
- When increasing the size of q_vector, reallocate it appropriately.
(With IGB_FLAG_QUEUE_PAIRS set, the size of q_vector gets larger.)

Another possible way to fix this problem is to cap the queues at their
initial number, which is the number of CPUs initially online. But this
is not optimal, because we then cannot increase the queues when another
CPU comes online.
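
A back-of-the-envelope model of the overflow, assuming one vector per queue pair when IGB_FLAG_QUEUE_PAIRS is set, separate Rx and Tx vectors otherwise, plus one extra vector for other causes. This approximates, and is not copied from, igb_set_interrupt_capability(); the function names are hypothetical.

```c
#include <stdbool.h>

#define MAX_MSIX_ENTRIES 10	/* size of adapter->msix_entries[] */

/* Same rule as at init time: pair queues above half the maximum. */
bool want_queue_pairs(unsigned int queues, unsigned int max_queues)
{
	return queues > max_queues / 2;
}

unsigned int msix_vectors_needed(unsigned int queues, bool pairs)
{
	unsigned int numvecs = queues;	/* Rx (or paired Rx/Tx) vectors */

	if (!pairs)
		numvecs += queues;	/* separate Tx vectors */
	return numvecs + 1;		/* one extra vector for other causes */
}
```

Under these assumptions, 8 queues with pairing need 9 vectors, which fits the 10-entry array; leaving the flag clear, as igb_set_channels() did, asks for 17 and overruns it.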

Note that before commit cd14ef54d25b ("igb: Change to use statically
allocated array for MSIx entries"), this problem did not cause oops
but just made the number of queues become 1 because of entering msi_only
mode in igb_set_interrupt_capability().

Fixes: 907b7835799f ("igb: Add ethtool support to configure number of channels")
CC: stable <stable@vger.kernel.org>
Signed-off-by: Shota Suzuki <suzuki_shota_t3@lab.ntt.co.jp>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

David S. Miller [Tue, 18 Aug 2015 18:55:08 +0000 (11:55 -0700)]

Merge branch 'drivers_iff_no_queue'

Phil Sutter says:

====================
net: Convert drivers to IFF_NO_QUEUE and cleanup afterwards

This series converts in-tree users away from the old and deprecated
'tx_queue_len = 0' idiom, adds a warning to notify out-of-tree driver
maintainers that action is needed on their part and finally drops all
workarounds in scheduling algorithm implementations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:49 +0000 (10:30 +0200)]

net: sched: drop all special handling of tx_queue_len == 0

These were all workarounds for the former double meaning of
tx_queue_len, which broke scheduling algorithms if untreated.

Now that all in-tree drivers have been converted away from setting
tx_queue_len = 0, it should be safe to drop these workarounds for
categorically broken setups.
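A toy model of why tx_queue_len == 0 needed special handling in the first place: a FIFO whose packet limit is derived from the device queue length would refuse every packet if that length were 0. The struct and functions are hypothetical, not the kernel's scheduler code; the "substitute 1" fallback only illustrates the kind of workaround being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy FIFO whose limit defaults to the device's tx_queue_len. */
struct toy_fifo {
	unsigned int limit;
	unsigned int qlen;
};

/* Old style: treat tx_queue_len == 0 as "unset" and substitute 1, so a
 * qdisc attached to a formerly queueless device still passes traffic. */
static void toy_fifo_init_workaround(struct toy_fifo *q, unsigned int tx_queue_len)
{
	assert(q != NULL);
	q->limit = tx_queue_len ? tx_queue_len : 1;
	q->qlen = 0;
}

/* New style: take tx_queue_len at face value; queueless devices are
 * expected to say so with IFF_NO_QUEUE rather than a zero length. */
static void toy_fifo_init(struct toy_fifo *q, unsigned int tx_queue_len)
{
	assert(q != NULL);
	q->limit = tx_queue_len;
	q->qlen = 0;
}

/* Returns false (drop) once the queue holds `limit` packets. */
static bool toy_fifo_enqueue(struct toy_fifo *q)
{
	if (q->qlen >= q->limit)
		return false;
	q->qlen++;
	return true;
}
```

With a limit of 0, toy_fifo_enqueue() drops everything, which is why such a workaround was needed only while drivers could still set tx_queue_len = 0.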

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:48 +0000 (10:30 +0200)]

net: warn if drivers set tx_queue_len = 0

With the introduction of IFF_NO_QUEUE, there is a better way for
drivers to indicate that no qdisc should be attached by default. However,
the old convention can't be dropped yet, since ignoring that setting would
break drivers still using it. Instead, add a warning so out-of-tree
driver maintainers get a chance to adjust their code before we finally
get rid of any special handling of tx_queue_len == 0.
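A hypothetical user-space model of the transition: the new flag is checked first, and the legacy tx_queue_len == 0 idiom is still honored but reported. TOY_IFF_NO_QUEUE and toy_wants_noqueue() are illustrative names; where and how the kernel actually emits its warning is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Flag value for this model only; the real IFF_NO_QUEUE lives in the
 * kernel's netdev_priv_flags enum. */
#define TOY_IFF_NO_QUEUE (1u << 0)

struct toy_netdev {
	const char *name;
	unsigned int priv_flags;
	unsigned int tx_queue_len;
};

/* Returns true when no qdisc should be attached by default. The new flag
 * is preferred; the legacy idiom still works but is logged so that
 * out-of-tree maintainers notice it. */
static bool toy_wants_noqueue(const struct toy_netdev *dev)
{
	assert(dev != NULL);
	if (dev->priv_flags & TOY_IFF_NO_QUEUE)
		return true;
	if (dev->tx_queue_len == 0) {
		fprintf(stderr, "%s: uses deprecated tx_queue_len = 0, "
			"set IFF_NO_QUEUE instead\n", dev->name);
		return true;
	}
	return false;
}
```

On the driver side, the in-tree conversions in this series essentially replace `dev->tx_queue_len = 0;` with `dev->priv_flags |= IFF_NO_QUEUE;` in the device setup routine.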

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:47 +0000 (10:30 +0200)]

staging: wilc1000: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Johnny Kim <johnny.kim@atmel.com>
Cc: Rachel Kim <rachel.kim@atmel.com>
Cc: Dean Lee <dean.lee@atmel.com>
Cc: Chris Park <chris.park@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:46 +0000 (10:30 +0200)]

net: caif: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:45 +0000 (10:30 +0200)]

net: hsr: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:44 +0000 (10:30 +0200)]

net: batman-adv: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:43 +0000 (10:30 +0200)]

net: mac80211_hwsim: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:42 +0000 (10:30 +0200)]

net: hostap: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jouni Malinen <j@w1.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:41 +0000 (10:30 +0200)]

net: dsa: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Lennert Buytenhek <buytenh@wantstofly.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:40 +0000 (10:30 +0200)]

net: ipvlan: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:39 +0000 (10:30 +0200)]

net: bonding: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:38 +0000 (10:30 +0200)]

net: 6lowpan: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:37 +0000 (10:30 +0200)]

net: bridge: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:36 +0000 (10:30 +0200)]

net: 8021q: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:35 +0000 (10:30 +0200)]

net: vxlan: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:34 +0000 (10:30 +0200)]

net: team: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:33 +0000 (10:30 +0200)]

net: nlmon: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:32 +0000 (10:30 +0200)]

net: loopback: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:31 +0000 (10:30 +0200)]

net: geneve: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:30 +0000 (10:30 +0200)]

net: dummy: convert to using IFF_NO_QUEUE

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Tue, 18 Aug 2015 08:30:29 +0000 (10:30 +0200)]

net: veth: enable noqueue operation by default

Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 18 Aug 2015 04:33:06 +0000 (21:33 -0700)]

Merge branch 'Identifier-Locator-Addressing'

Tom Herbert says:

====================
net: Identifier Locator Addressing - Part I

This patch set provides rudimentary support for Identifier Locator
Addressing or ILA. The basic concept of ILA is that we split an IPv6
address into a 64 bit locator and 64 bit identifier. The identifier is
the identity of an entity in communication ("who"), and the locator
expresses the location of the entity ("where"). Applications
use an externally visible address that contains the identifier.
When a packet is actually sent, a translation is done that
overwrites the first 64 bits of the address with a locator.
The packet can then be forwarded over the network to the host where
the addressed entity is located. At the receiver, the reverse
translation is done so that the application sees the original,
untranslated address. Presumably an external control plane will
provide identifier->locator mappings.

v2:
  - Fix compilation errors when LWT is not configured
  - Consolidate ILA into a single ila.c

v3:
  - Change pseudohdr argument of inet_proto_csum_replace functions to
    be a bool

v4:
  - In ila_build_state, check that the locator is present in the netlink
    params before allocating tunnel state

The data path for ILA is a simple NAT translation that only operates
on the upper 64 bits of a destination address in IPv6 packets. The
basic process is:

   1) Lookup 64 bit identifier (lower 64 bits of destination)
   2) If a match is found
      a) Overwrite locator (upper 64 bits of destination) with
         the new locator
      b) Adjust any checksum that has destination address included in
         pseudo header
   3) Send or receive packet
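Steps 1 and 2a can be sketched in user space as below. The linear mapping table, struct toy_ila_map and toy_ila_translate() are hypothetical (in this patch set the translation is driven by per-route LWT state rather than a table), and the checksum fixup of step 2b is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mapping entry: 64-bit identifier -> 64-bit locator. */
struct toy_ila_map {
	uint64_t identifier;
	uint64_t locator;
};

/* Read/write one big-endian 64-bit half of a 16-byte IPv6 address. */
static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static void put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)v;
		v >>= 8;
	}
}

/* Step 1: look up the identifier (low 64 bits of the destination).
 * Step 2a: on a match, overwrite the locator (high 64 bits).
 * Returns true if the address was rewritten. */
static bool toy_ila_translate(uint8_t daddr[16], const struct toy_ila_map *map,
			      size_t n)
{
	uint64_t id;

	assert(daddr != NULL && (map != NULL || n == 0));
	id = get_be64(daddr + 8);
	for (size_t i = 0; i < n; i++) {
		if (map[i].identifier == id) {
			put_be64(daddr, map[i].locator);
			return true;
		}
	}
	return false;	/* no mapping: leave the packet untouched */
}
```

For instance, with identifier 5555:0:2:0 mapped to locator 2001:0:0:2, the destination 3333:0:0:1:5555:0:2:0 becomes 2001:0:0:2:5555:0:2:0, matching the configuration example in this message.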

ILA is a means to implement tunnels or network virtualization without
encapsulation. Since there is no encapsulation involved, we assume that
stateless support in the network for IPv6 (e.g. RSS, ECMP, TSO, etc.)
just works. Also, since we're minimally changing the packet, many of
the worries about encapsulation (MTU, checksum, fragmentation) are
not relevant. The downside is that ILA is not extensible like other
encapsulations (GUE for instance) so it might not be appropriate for
all use cases. Also, this only makes sense to do in IPv6!

A key aspect of ILA is performance. The intent is that ILA would be
used in data centers in virtualizing tasks or jobs. In the fullest
incarnation all intra data center communications might be targeted to
virtual ILA addresses. This is basically adding a new virtualization
capability to the existing services in a datacenter, so there is a
strong expectation that this does not degrade performance for
existing applications.

Performance seems to be dependent on how ILA is hooked into the kernel.
ILA can be implemented under some different models:

  - Mechanically it is a form of stateless DNAT
  - It can be thought of as a type of (source) routing
  - As a functional replacement of encapsulation

In this patch set we hook into the data path using Light Weight
Tunnels (LWT) infrastructure. As part of that, we add support in LWT
to redirect dst input. iproute will be modified to take a new ila encap
type. ILA can be configured like:

ip route add 3333:0:0:1:5555:0:2:0/128 \
   encap ila 2001:0:0:2 via 2401:db00:20:911a:face:0:27:0

ip -6 addr add 3333:0:0:1:5555:0:1:0/128 dev eth0

ip route add table local local 2001:0:0:1:5555:0:1:0/128 \
   encap ila 3333:0:0:1 dev lo

So sending to destination 3333:0:0:1:5555:0:2:0 will have destination
of 2001:0:0:2:5555:0:2:0 on the wire.

Performance results are below. With ILA we see about a 10% drop in
pps compared to non-ILA. Much of this drop can be attributed to the
loss of early demux on input (translation occurs after it is attempted).
We will address this in the next patch set. Also, the IPvlan input path
does not work with ILA since routing is bypassed; this will
be addressed in a future patch.

Performance testing:

Performing netperf TCP_RR with 200 clients:

Non-ILA baseline
  84.92% CPU utilization
  1861922.9 tps
  93/163/330 50/90/99% latencies

ILA single destination
  83.16% CPU utilization
  1679683.4 tps
  105/180/332 50/90/99% latencies

References:

Slides from netconf:
http://vger.kernel.org/netconf2015Herbert-ILA.pdf

Slides from presentation at IETF:
https://www.ietf.org/proceedings/92/slides/slides-92-nvo3-1.pdf

I-D:
https://tools.ietf.org/html/draft-herbert-nvo3-ila-00
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tom Herbert [Mon, 17 Aug 2015 20:42:27 +0000 (13:42 -0700)]

net: Identifier Locator Addressing module

Add a new module named ila. This implements ILA translation. Light
weight tunnel redirection is used to perform the translation in
the data path. This is configured by the "ip -6 route" command
using the "encap ila <locator>" option, where <locator> is the
value to set in the destination locator of the packet, e.g.

ip -6 route add 3333:0:0:1:5555:0:1:0/128 \
encap ila 2001:0:0:1 via 2401:db00:20:911a:face:0:25:0

Sets a route where 3333:0:0:1 will be overwritten by
2001:0:0:1 on output.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tom Herbert [Mon, 17 Aug 2015 20:42:26 +0000 (13:42 -0700)]

net: Add inet_proto_csum_replace_by_diff utility function

This function updates a checksum field value and skb->csum based on
a value which is the difference between the old and new checksum.
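A standalone 16-bit model of "replace by difference", following the incremental update of RFC 1624 (HC' = ~(~HC + ~m + m')). The kernel helper works on __wsum values and also updates skb->csum; the functions here are hypothetical and only show the arithmetic. Because a locator rewrite is the same for every packet hitting a given ILA mapping, such a difference could presumably be computed once and reused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones' complement accumulator into 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement difference between two runs of 16-bit words: the
 * amount to add to a checksum when `from` is replaced by `to`. */
static uint16_t csum_diff16(const uint16_t *from, const uint16_t *to, size_t n)
{
	uint32_t acc = 0;

	assert(n <= 32768);	/* keeps the 32-bit accumulator from overflowing */
	for (size_t i = 0; i < n; i++)
		acc += (uint16_t)~from[i] + to[i];
	return csum_fold16(acc);
}

/* Apply a precomputed difference to a stored checksum field. */
static uint16_t csum_replace_by_diff16(uint16_t check, uint16_t diff)
{
	return (uint16_t)~csum_fold16((uint32_t)(uint16_t)~check + diff);
}

/* Full checksum over n words, for cross-checking the incremental result. */
static uint16_t csum16(const uint16_t *w, size_t n)
{
	uint32_t acc = 0;

	assert(n <= 32768);
	for (size_t i = 0; i < n; i++)
		acc += w[i];
	return (uint16_t)~csum_fold16(acc);
}
```

Rewriting the first four words of 3333:0:0:1:5555:0:2:0 to 2001:0:0:2 and applying the diff yields the same value as recomputing csum16() over the new words.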

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tom Herbert [Mon, 17 Aug 2015 20:42:25 +0000 (13:42 -0700)]

net: Change pseudohdr argument of inet_proto_csum_replace* to be a bool

inet_proto_csum_replace4,2,16 take a pseudohdr argument which indicates
whether the checksum field covers a pseudo header. This argument should be
a boolean instead of an int.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tom Herbert [Mon, 17 Aug 2015 20:42:24 +0000 (13:42 -0700)]

lwt: Add support to redirect dst.input

This patch adds the capability to redirect dst input in the same way
that dst output is redirected by LWT.

Also, save the original dst.input and dst.output when setting up
lwtunnel redirection. These can be called by the client as a
pass-through.
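A reduced model of redirection with pass-through, using function pointers. struct toy_dst and the hook names are hypothetical; the real struct dst_entry and lwtunnel state carry much more, and the "+1000"/"+2000" stand in for whatever translation a tunnel performs.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dst;
typedef int (*toy_dst_fn)(struct toy_dst *dst, int pkt);

struct toy_dst {
	toy_dst_fn input;
	toy_dst_fn output;
	toy_dst_fn orig_input;	/* saved when the tunnel takes over */
	toy_dst_fn orig_output;
	int tunnel_hits;	/* packets that went through a tunnel hook */
};

static int default_input(struct toy_dst *dst, int pkt)
{
	(void)dst;
	return pkt;		/* deliver unchanged */
}

static int default_output(struct toy_dst *dst, int pkt)
{
	(void)dst;
	return pkt;
}

/* Tunnel hooks: do their own work, then pass through to the originals. */
static int tunnel_input(struct toy_dst *dst, int pkt)
{
	dst->tunnel_hits++;
	return dst->orig_input(dst, pkt + 1000);
}

static int tunnel_output(struct toy_dst *dst, int pkt)
{
	dst->tunnel_hits++;
	return dst->orig_output(dst, pkt + 2000);
}

/* Install the redirection, remembering the original hooks first. */
static void toy_lwt_redirect(struct toy_dst *dst)
{
	assert(dst->input != NULL && dst->output != NULL);
	dst->orig_input = dst->input;
	dst->orig_output = dst->output;
	dst->input = tunnel_input;
	dst->output = tunnel_output;
}
```

Saving the originals is what lets the tunnel hook stay small: it only does its own step and then hands the packet to the path it replaced.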

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 18 Aug 2015 04:24:59 +0000 (21:24 -0700)]

enic: Fix sparse warning in vnic_devcmd_init().

>> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: expected void *res
drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: got void [noderef] <asn:2>*

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Tue, 18 Aug 2015 04:22:26 +0000 (21:22 -0700)]

mlx5e: Fix sparse warnings in mlx5e_handle_csum().

>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: sparse: incorrect type in argument 1 (different base types)
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: expected restricted __sum16 [usertype] n
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:173:44: got restricted __be16 [usertype] check_sum

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Ahern [Sun, 16 Aug 2015 23:13:27 +0000 (17:13 -0600)]

inet: Move VRF table lookup to inlined function

Table lookup compiles out when VRF is not enabled.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Ahern [Sun, 16 Aug 2015 16:26:49 +0000 (10:26 -0600)]

net: Fix docbook warning for IFF_VRF_MASTER enum

kbuild test robot reported:
tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head: d52736e24fe2e927c26817256f8d1a3c8b5d51a0
commit: 4e3c89920cd3a6cfce22c6f537690747c26128dd [751/762] net: Introduce VRF related flags and helpers
reproduce: make htmldocs

>> Warning(include/linux/netdevice.h:1293): Enum value 'IFF_VRF_MASTER' not described in enum 'netdev_priv_flags'

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David Ahern [Sun, 16 Aug 2015 13:49:20 +0000 (07:49 -0600)]

net: Updates to netif_index_is_vrf

As Eric noted netif_index_is_vrf is not called with rcu_read_lock held,
so wrap the dev_get_by_index_rcu in rcu_read_lock and unlock.

If VRF is not enabled or oif is 0, skip the device lookup; in both cases
the index cannot be the VRF master.
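The short-circuit can be sketched as follows. TOY_CONFIG_VRF stands in for the kernel config option and toy_dev_get_by_index() for dev_get_by_index_rcu(); both are hypothetical, and a comment marks where the RCU read-side section would sit in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical build switch standing in for the VRF config option. */
#ifndef TOY_CONFIG_VRF
#define TOY_CONFIG_VRF 1
#endif

struct toy_dev {
	int ifindex;
	bool is_vrf_master;
};

static struct toy_dev toy_devs[] = { { 1, false }, { 5, true } };
static int lookups;	/* counts how often the device table is consulted */

static const struct toy_dev *toy_dev_get_by_index(int ifindex)
{
	lookups++;
	for (size_t i = 0; i < sizeof(toy_devs) / sizeof(toy_devs[0]); i++)
		if (toy_devs[i].ifindex == ifindex)
			return &toy_devs[i];
	return NULL;
}

static bool toy_index_is_vrf(int oif)
{
	const struct toy_dev *dev;

	assert(oif >= 0);
	/* Neither case can name a VRF master, so skip the lookup entirely. */
	if (!TOY_CONFIG_VRF || oif == 0)
		return false;

	/* In the kernel, the lookup and the flag test would sit between
	 * rcu_read_lock() and rcu_read_unlock(). */
	dev = toy_dev_get_by_index(oif);
	return dev != NULL && dev->is_vrf_master;
}
```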

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 22:51:36 +0000 (15:51 -0700)]

Merge branch 'mlx5e-next'

Achiad Shochat says:

====================
Driver updates 16-Aug-2015

This patchset contains bug fixes, new RSS and pause parameters ethtool
options, and support for RX CHECKSUM_COMPLETE.

Patchset was applied and tested over commit adc6310 ("Merge branch
'mv88e6xxx-switchdev-fdb'").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:52 +0000 (16:04 +0300)]

net/mlx5e: Support RX CHECKSUM_COMPLETE

Only for packets with first ethertype set to IPv4/6 for now.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:51 +0000 (16:04 +0300)]

net/mlx5e: Support ethtool get/set_pauseparam

Only rx/tx pause settings.
Autoneg setting is currently not supported.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:50 +0000 (16:04 +0300)]

net/mlx5e: Ethtool link speed setting fixes

- Port speed settings are applied by the device only upon
  port admin status transition from DOWN to UP.
  So we enforce this transition regardless of the port's
  current operational state (which may occasionally be DOWN if
  for example the network cable is disconnected).
- Fix the PORT_UP/DOWN device interface enum
- Set the local_port bit in the device PAOS register
- EXPORT the PAOS (Port Administrative and Operational Status)
  register set/query access functions.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:49 +0000 (16:04 +0300)]

net/mlx5e: HW LRO changes/fixes

- Change the maximum LRO session size from 16KB to 64KB
- Reduce the LRO session timeout from 512us to 32us in
order to reduce the TCP latency of non-LRO'ed flows.
- Fix skb_shinfo(skb)->gso_size and set skb_shinfo(skb)->gso_type.
- Fix a bug accessing an uninitialized mdev pointer.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:48 +0000 (16:04 +0300)]

net/mlx5e: Support smaller RX/TX ring sizes

We unintentionally limited the minimum ring size too much.

TX minimum ring size reduced from 128 to 64.
RX minimum ring size reduced from 128 to 2.

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:47 +0000 (16:04 +0300)]

net/mlx5e: Add ethtool RSS configuration options

- get_rxfh_key_size
- get_rxfh_indir_size
- get/set_rxfh indirection table and RSS Toeplitz hash key
- get_rxnfc

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:46 +0000 (16:04 +0300)]

net/mlx5e: Make RSS indirection table size a constant

The indirection table size was defined by a variable that
was actually assigned a constant value.
Since we do not foresee any need to make it configurable,
we simply made it a constant.

We also limit the number of channels such that the RSS indirection
table could always populate all RX rings.
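A sketch of why the channel count is capped: with a fixed-size table filled round-robin, every RX ring is referenced only if there are no more rings than table entries. The table size of 128 and all names are assumptions for illustration, not the driver's values.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed size for this sketch; the driver's actual constant may differ. */
#define TOY_INDIR_TABLE_SIZE 128

/* Cap the channel count so every RX ring can appear in the table. */
static unsigned int toy_max_channels(unsigned int hw_max)
{
	return hw_max < TOY_INDIR_TABLE_SIZE ? hw_max : TOY_INDIR_TABLE_SIZE;
}

/* Default round-robin fill: entry i steers to ring i % nch. */
static void toy_fill_indir(unsigned int table[TOY_INDIR_TABLE_SIZE],
			   unsigned int nch)
{
	assert(nch > 0 && nch <= TOY_INDIR_TABLE_SIZE);
	for (unsigned int i = 0; i < TOY_INDIR_TABLE_SIZE; i++)
		table[i] = i % nch;
}

/* True if every ring 0..nch-1 is referenced at least once. */
static bool toy_all_rings_used(const unsigned int table[TOY_INDIR_TABLE_SIZE],
			       unsigned int nch)
{
	bool seen[TOY_INDIR_TABLE_SIZE] = { false };

	for (unsigned int i = 0; i < TOY_INDIR_TABLE_SIZE; i++)
		seen[table[i]] = true;
	for (unsigned int r = 0; r < nch; r++)
		if (!seen[r])
			return false;
	return true;
}
```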

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Achiad Shochat [Sun, 16 Aug 2015 13:04:45 +0000 (16:04 +0300)]

net/mlx5e: Have a single RSS Toeplitz hash key

No need to generate a unique key per TIR.
Generate a single key per netdev and copy it to all
its TIRs.
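For reference, the Toeplitz hash that such a key feeds is sketched below. The hash depends only on the key and the input bytes, so TIRs sharing one key hash the same input fields of a flow identically. This is the standard algorithm, not mlx5e code; the hardware computes it itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard Toeplitz hash: for every set input bit (MSB first), XOR in the
 * 32-bit window of the key that starts at that bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
			      const uint8_t *data, size_t len)
{
	uint32_t hash = 0;
	uint32_t window;

	/* The key must cover 32 bits beyond the last input bit. */
	assert(key_len >= len + 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
		 ((uint32_t)key[2] << 8) | key[3];

	for (size_t i = 0; i < len; i++) {
		for (int b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window by one bit, shifting in the next key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return hash;
}
```

With the 40-byte key from the widely published Microsoft RSS verification suite, hashing source 66.9.149.187 and destination 161.142.100.80 should give 0x323e8fc2, or 0x51ccc178 with source port 2794 and destination port 1766 appended.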

Signed-off-by: Achiad Shochat <achiad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 22:41:21 +0000 (15:41 -0700)]

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2015-08-16

Here's what's likely the last bluetooth-next pull request for 4.3:

- 6lowpan/802.15.4 refactoring, cleanups & fixes
- Document 6lowpan netdev usage in Documentation/networking/6lowpan.txt
- Support for UART based QCA Bluetooth controllers
- Power management support for Broadcom Bluetooth controllers
- Change LE connection initiation to always use passive scanning first
- Support for new Silicon Wave USB ID

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 22:25:30 +0000 (15:25 -0700)]

Merge branch 'enic-devcmd2'

Govindarajulu Varadarajan says:

====================
enic: add devcmd2

This series adds new devcmd2 support. The first two patches are code
refactoring.

devcmd is an interface for the driver to communicate with the fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of service from one VM to another.
Devcmd2 uses a work queue to post the devcmds, just like the tx work queue.
This allows better flow control.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Govindarajulu Varadarajan [Sat, 15 Aug 2015 20:14:54 +0000 (01:44 +0530)]

enic: add devcmd2

devcmd is an interface for the driver to communicate with the fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of service from one VM to another.

Devcmd2 uses a work queue to post the devcmds, just like the tx work queue.
This allows better flow control.

Initialize devcmd2; if that fails, fall back to devcmd1.

Also change the driver version.

Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Govindarajulu Varadarajan [Sat, 15 Aug 2015 20:14:53 +0000 (01:44 +0530)]

enic: add devcmd2 resources

Add devcmd resources to vnic_res_type. Add data types used by devcmd.

Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Govindarajulu Varadarajan [Sat, 15 Aug 2015 20:14:52 +0000 (01:44 +0530)]

enic: use netdev_<foo> or dev_<foo> instead of pr_<foo>

pr_info does not give any details about the interface involved. This patch
uses netdev_info for printing the message. Use dev_info where netdev is not
ready.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Govindarajulu Varadarajan [Sat, 15 Aug 2015 20:14:51 +0000 (01:44 +0530)]

enic: move struct definition from .c to .h file

Some of the structure definitions are in a .c file to keep them private to
that file. This patch moves the struct definitions to the .h file so that
they are accessible from other files.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 21:37:06 +0000 (14:37 -0700)]

net: Export bpf_prog_create_from_user().

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Ian Morris [Fri, 14 Aug 2015 21:43:38 +0000 (22:43 +0100)]

ipv6: trivial whitespace fix

Change brace placement to be in line with coding standards

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Phil Sutter [Fri, 14 Aug 2015 22:37:15 +0000 (00:37 +0200)]

rhashtable-test: extend to test concurrency

After having tested insertion, lookup, table walk and removal, spawn a
number of threads running operations on the same rhashtable. Each of
them will:

1) insert its own set of objects,
2) look up every successfully inserted object and finally
3) remove objects in several rounds until all of them have been removed,
making sure the remaining ones are still found after each round.

This should put a good amount of load onto the system and, since thread
startup is synchronised via two semaphores, also cause extensive
concurrent table access.
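A user-space analogue of the three per-thread phases and the two-semaphore start barrier, using POSIX threads. A mutex-protected presence map stands in for the rhashtable (which, unlike this stand-in, allows lock-free lookups), and the halving removal schedule is an assumption; only the synchronisation pattern is meant to match the description.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

#define TOY_MAX_THREADS 16
#define TOY_OBJS_PER_THREAD 64

/* Stand-in for the shared table; thread `id` owns the keys
 * [id * TOY_OBJS_PER_THREAD, (id + 1) * TOY_OBJS_PER_THREAD). */
static bool table[TOY_MAX_THREADS * TOY_OBJS_PER_THREAD];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static sem_t ready_sem;	/* worker -> main: "parked and ready" */
static sem_t start_sem;	/* main -> workers: "go" */

struct toy_worker {
	pthread_t tid;
	int id;
	int errors;
};

static void set_key(int key, bool present)
{
	pthread_mutex_lock(&table_lock);
	table[key] = present;
	pthread_mutex_unlock(&table_lock);
}

static bool has_key(int key)
{
	bool present;

	pthread_mutex_lock(&table_lock);
	present = table[key];
	pthread_mutex_unlock(&table_lock);
	return present;
}

static void *toy_worker_fn(void *arg)
{
	struct toy_worker *w = arg;
	int base = w->id * TOY_OBJS_PER_THREAD;

	sem_post(&ready_sem);
	sem_wait(&start_sem);	/* every worker starts at the same moment */

	/* 1) insert this thread's own objects */
	for (int i = 0; i < TOY_OBJS_PER_THREAD; i++)
		set_key(base + i, true);
	/* 2) look up every inserted object */
	for (int i = 0; i < TOY_OBJS_PER_THREAD; i++)
		if (!has_key(base + i))
			w->errors++;
	/* 3) remove in rounds (every 4th, every 2nd, then all), checking
	 *    after each round that the survivors are still found */
	for (int step = 4; step >= 1; step /= 2) {
		for (int i = 0; i < TOY_OBJS_PER_THREAD; i += step)
			set_key(base + i, false);
		for (int i = 0; i < TOY_OBJS_PER_THREAD; i++)
			if (i % step != 0 && !has_key(base + i))
				w->errors++;
	}
	return NULL;
}

/* Runs `nthreads` workers concurrently; returns the total error count. */
static int toy_run_concurrent(int nthreads)
{
	struct toy_worker w[TOY_MAX_THREADS];
	int errors = 0;

	assert(nthreads > 0 && nthreads <= TOY_MAX_THREADS);
	sem_init(&ready_sem, 0, 0);
	sem_init(&start_sem, 0, 0);
	for (int i = 0; i < nthreads; i++) {
		int rc;

		w[i].id = i;
		w[i].errors = 0;
		rc = pthread_create(&w[i].tid, NULL, toy_worker_fn, &w[i]);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		sem_wait(&ready_sem);	/* wait until every worker is parked */
	for (int i = 0; i < nthreads; i++)
		sem_post(&start_sem);	/* then release them all */
	for (int i = 0; i < nthreads; i++) {
		pthread_join(w[i].tid, NULL);
		errors += w[i].errors;
	}
	sem_destroy(&ready_sem);
	sem_destroy(&start_sem);
	return errors;
}
```

A return value of 0 means every lookup and survivor check passed; the first semaphore guarantees all workers exist before any of them touches the table, and the second releases them together.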

With the default of ten threads, the test completed within half a second on
my local VM with two cores. Running 200 threads took about four seconds. If
slow systems suffer too much from this though, the default could be
lowered or even set to zero so this extended test does not run at all by
default.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 21:31:42 +0000 (14:31 -0700)]

Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge

Antonio Quartulli says:

====================
Included changes:
- avoid integer overflow in GW selection routine
- prevent race condition by making capability bit changes atomic (use
clear/set/test_bit)
- fix synchronization issue in mcast tvlv handler
- fix crash on double list removal of TT Request objects
- fix leak by purging packets enqueued for sending upon iface removal
- ensure network header pointer is set in skb
====================
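Why atomic bit operations matter for the capability-bit fix: a plain `flags |= bit` is a load, modify and store, so two CPUs updating different bits of the same word can lose one update. The C11 sketch below mirrors the semantics of the kernel's set_bit(), clear_bit() and test_bit(); the toy_ names are hypothetical and batman-adv's actual fields are not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Each update is a single atomic read-modify-write, so concurrent updates
 * to different bits of the same word cannot overwrite each other. */
static void toy_set_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_or(word, 1UL << nr);
}

static void toy_clear_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	atomic_fetch_and(word, ~(1UL << nr));
}

static bool toy_test_bit(unsigned int nr, atomic_ulong *word)
{
	assert(nr < sizeof(unsigned long) * CHAR_BIT);
	return (atomic_load(word) >> nr) & 1UL;
}
```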

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 21:25:04 +0000 (14:25 -0700)]

Merge tag 'mac80211-next-for-davem-2015-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
Another pull request for the next cycle, this time with quite
a bit of content:
* mesh fixes/improvements from Alexis, Bob, Chun-Yeow and Jesse
* TDLS higher bandwidth support (Arik)
* OCB fixes from Bertold Van den Bergh
* suspend/resume fixes from Eliad
* dynamic SMPS support for minstrel-HT (Krishna Chaitanya)
* VHT bitrate mask support (Lorenzo Bianconi)
* better regulatory support for 5/10 MHz channels (Matthias May)
* basic support for MU-MIMO to avoid the multi-vif issue (Sara Sharon)
along with a number of other cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 17 Aug 2015 21:22:48 +0000 (14:22 -0700)]

Merge branch 'bpf_fanout'

Willem de Bruijn says:

====================
packet: add cBPF and eBPF fanout modes

Allow programmable fanout modes. Support both classical BPF programs
passed directly and extended BPF programs passed by file descriptor.

One use case is packet steering by deep packet inspection, for
instance by application layer header fields.

Separate the configuration of the fanout mode and the configuration
of the program, to allow dynamic updates to the latter at runtime.

Changes
  v1 -> v2:
    - follow SO_LOCK_FILTER semantics on filter updates
    - only accept eBPF programs of type BPF_PROG_TYPE_SOCKET_FILTER
    - rename PACKET_FANOUT_BPF to PACKET_FANOUT_CBPF to match
      man 2 bpf usage: "classic" vs. "extended" BPF.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Willem de Bruijn [Sat, 15 Aug 2015 02:31:37 +0000 (22:31 -0400)]

selftests/net: test extended BPF fanout mode

Test PACKET_FANOUT_EBPF by inserting a program into the kernel
with bpf(), then attaching it to the fanout group. Observe the same
payload-based distribution as in the PACKET_FANOUT_CBPF test.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Willem de Bruijn [Sat, 15 Aug 2015 02:31:36 +0000 (22:31 -0400)]

selftests/net: test classic bpf fanout mode

Test PACKET_FANOUT_CBPF by inserting a cBPF program that selects a
socket by payload. Requires modifying the test program to send
packets with multiple payloads.

Also fix a bug in testing the return value of mmap()

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Willem de Bruijn [Sat, 15 Aug 2015 02:31:35 +0000 (22:31 -0400)]

packet: add extended BPF fanout mode

Add fanout mode PACKET_FANOUT_EBPF that accepts an extended BPF
program to select a socket.

Update the internal eBPF program by passing a file descriptor returned
by bpf() to the socket option SOL_PACKET/PACKET_FANOUT_DATA.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Beck SC1x5 Kernel
