Tom Herbert [Thu, 5 Jun 2014 00:20:02 +0000 (17:20 -0700)]
net: Support for multiple checksums with gso
When creating a GSO packet segment we may need to set more than
one checksum in the packet (for instance a TCP checksum and
UDP checksum for VXLAN encapsulation). To be efficient, we want
to do checksum calculation for any part of the packet at most once.
This patch adds csum_start offset to skb_gso_cb. This tracks the
starting offset for skb->csum which is initially set in skb_segment.
When a protocol needs to compute a transport checksum it calls
gso_make_checksum which computes the checksum value from the start
of transport header to csum_start and then adds in skb->csum to get
the full checksum. skb->csum and csum_start are then updated to reflect
the checksum of the resultant packet starting from the transport header.
This patch also adds a flag to skbuff, encap_hdr_csum, which is set
in *gso_segment functions to indicate that a tunnel protocol needs
checksum calculation.
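The "compute each part at most once" idea can be illustrated in user space. This is a minimal sketch, not the kernel's gso_make_checksum: the helper names below are hypothetical, and it assumes the header length is even so that the 16-bit words of header and payload line up. Because the Internet checksum is a ones'-complement sum, a payload sum computed earlier can be reused and only the header bytes need to be summed again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add the 16-bit ones'-complement sum of len bytes
 * on top of an existing partial sum. Intended for buffers below 64 KiB. */
uint32_t csum_partial_sketch(const uint8_t *buf, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
    if (len & 1)
        sum += (uint32_t)(buf[len - 1] << 8);   /* pad odd trailing byte */
    /* fold the carries once so the partial sum stays small */
    return (sum & 0xffff) + (sum >> 16);
}

/* Hypothetical helper: fold a partial sum to 16 bits and complement it. */
uint16_t csum_fold_sketch(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Checksum of header plus payload, reusing a payload sum computed earlier.
 * Assumes hdr_len is even. */
uint16_t make_checksum_sketch(const uint8_t *hdr, size_t hdr_len,
                              uint32_t payload_sum)
{
    return csum_fold_sketch(csum_partial_sketch(hdr, hdr_len, payload_sum));
}
```

With an outer and an inner header, the same payload sum could be passed to both calls, so the payload bytes are only walked once.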
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Thu, 5 Jun 2014 00:19:48 +0000 (17:19 -0700)]
udp: Generic functions to set checksum
Added udp_set_csum and udp6_set_csum functions to set UDP checksums
in packets. These are for simple UDP packets such as those that might
be created in UDP tunnels.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Jun 2014 22:14:17 +0000 (15:14 -0700)]
Merge branch 'bonding-macvlan'
Vlad Yasevich says:
====================
Fix support for macvlan devices on top of bonding
Currently, macvlan devices do not work well over bond interfaces.
Everything works well until a failover is triggered in the bond
device, and then macvlan becomes unreachable until arp entries
are flushed. This series adds needed functionality to
handle correct notifications and update switches with mac addresses
assigned to macvlans.
The first patch simply adds the IFF_UNICAST_FLT flag to bonds since they
already correctly manage the unicast filter list of the slaves, so
we might as well prevent the bond from needlessly going into promiscuous
mode.
The second patch adds notifier handler to macvlan to trigger correct
ARP notifications.
The third patch adds handling for TLB and RLB modes that use special
ETH_P_LOOPBACK type packets to teach switch about mac addresses.
It also allows ARPs for the macvlan mac addresses to be handled by
RLB mode.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2014 20:23:38 +0000 (16:23 -0400)]
bonding: Support macvlans on top of tlb/rlb mode bonds
To make TLB mode work, the patch allows learning packets
to be sent using mac addresses assigned to macvlan devices,
also taking into account vlans that may be between the
bond and macvlan device.
To make RLB work, all we have to do is accept ARP packets
for addresses added to the bond dev->uc list. Since RLB
mode will take care to update the peers directly with
correct mac addresses, learning packets for these addresses
do not have to be sent to the switch.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2014 20:23:37 +0000 (16:23 -0400)]
macvlan: Support bonding events
Bonding and team drivers generate specific events during failover
that trigger switch updates. When a macvlan device is configured
on top of bonding, we want switches to learn about the macvlan
devices as well. This patch adds a handler to macvlan driver to
propagate these events to all macvlan devices. We let the generic
inetdev event handler do the work.
This allows macvlan to operate correctly over active-backup
mode bond.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Wed, 4 Jun 2014 20:23:36 +0000 (16:23 -0400)]
bonding: Turn on IFF_UNICAST_FLT on bond devices
Bonding devices manage the unicast filters of the underlying
interfaces, but do not turn on IFF_UNICAST_FLT flag. Thus
anytime a unicast address is added to the bond, the bond is
placed in promiscuous mode.
Turn on IFF_UNICAST_FLT on the bond device so that the bond does
not go into promiscuous mode needlessly. If an underlying device
does not support unicast filtering, that device will automatically
enter promiscuous mode already.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A small detail (assuming CONFIG_NET_NS=y) is that while for seq_files
you could do seq_file_net() to get the net ptr, doing so for a regular
file would be wrong and would dereference an invalid pointer.
The fib_triestat lie claimed a victim, and trying to show the file would
be bad for the kernel. This patch just reverts the issue and fixes
fib_triestat, which still needs a rewrite to either be a seq_file or
stop claiming it is.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiubo Li [Wed, 4 Jun 2014 08:49:16 +0000 (16:49 +0800)]
gianfar: Fix the section mismatch warnings.
Building with CONFIG_DEBUG_SECTION_MISMATCH enabled, the following
warning occurs:
LD drivers/net/built-in.o
WARNING: drivers/net/built-in.o(.text+0xcd4c): Section mismatch in
reference from the function gfar_probe() to the function
.init.text:gfar_init_addr_hash_table()
The function gfar_probe() references
the function __init gfar_init_addr_hash_table().
This is often because gfar_probe lacks a __init
annotation or the annotation of gfar_init_addr_hash_table is wrong.
Signed-off-by: Xiubo Li <Li.Xiubo@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Jun 2014 21:48:30 +0000 (14:48 -0700)]
Merge branch 'xen-netback-netfront-multiqueue'
Wei Liu says:
====================
This is a rebased version of Andrew's V8 patch series. The original cover letter:
--------------------
xen-net{back, front}: Multiple transmit and receive queues
This patch series implements multiple transmit and receive queues (i.e.
multiple shared rings) for the xen virtual network interfaces.
The series is split up as follows:
- Patch 1 brings the 'grant_copy_op' array back into struct xenvif, in
preparation for multi-queue support. See the patch itself for more details.
- Patches 2 and 4 factor out the queue-specific data for netback and
netfront respectively, and modify the rest of the code to use these
as appropriate.
- Patches 3 and 5 introduce new XenStore keys to negotiate and use
multiple shared rings and event channels, and code to connect these
as appropriate.
- Patch 6 documents the XenStore keys required for the new feature
in include/xen/interface/io/netif.h
All other transmit and receive processing remains unchanged, i.e. there
is a kthread per queue and a NAPI context per queue.
The performance of these patches has been analysed in detail, with
results available at:
To summarise:
* Using multiple queues allows a VM to transmit at line rate on a 10
Gbit/s NIC, compared with a maximum aggregate throughput of 6 Gbit/s
with a single queue.
* For intra-host VM--VM traffic, eight queues provide 171% of the
throughput of a single queue; almost 12 Gbit/s instead of 6 Gbit/s.
* There is a corresponding increase in total CPU usage, i.e. this is a
scaling out over available resources, not an efficiency improvement.
* Results depend on the availability of sufficient CPUs, as well as the
distribution of interrupts and the distribution of TCP streams across
the queues.
Queue selection is currently achieved via an L4 hash on the packet (i.e.
TCP src/dst port, IP src/dst address) and is not negotiated between the
frontend and backend, since only one option exists. Future patches to
support other frontends (particularly Windows) will need to add some
capability to negotiate not only the hash algorithm selection, but also
allow the frontend to specify some parameters to this.
Note that queue selection is a decision by the transmitting system about
which queue to use for a particular packet. In general, the algorithm
may differ between the frontend and the backend with no adverse effects.
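As a rough illustration of transmit-side queue selection, the modulo mapping mentioned in the V7 notes could look like the sketch below. The function name is hypothetical and the flow hash is taken as an input; how the hash is computed over the L4 fields is outside this sketch.

```c
#include <stdint.h>

/* Hypothetical helper: map a flow hash onto one of num_queues queues. */
unsigned int select_queue_sketch(uint32_t flow_hash, unsigned int num_queues)
{
    if (num_queues <= 1)
        return 0;                   /* single queue: nothing to choose */
    return flow_hash % num_queues;  /* same flow always lands on the same queue */
}
```

Since the choice only affects the sender, the two ends could use different mappings without either side needing to know.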
Queue-specific XenStore entries for ring references and event channels
are stored hierarchically, i.e. under .../queue-N/... where N varies
from 0 to one less than the requested number of queues (inclusive). If
only one queue is requested, it falls back to the flat structure where
the ring references and event channels are written at the same level as
other vif information.
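The flat versus hierarchical layout can be sketched as a small path builder. This is an assumption-laden example: the function name is hypothetical, and the vif path and key name used in the usage note are placeholders, not taken from this series.

```c
#include <stdio.h>

/* Hypothetical helper: build the XenStore path for a per-queue key.
 * With a single queue the key sits directly under the vif node;
 * otherwise it goes under queue-N. Returns snprintf()'s result. */
int queue_key_path_sketch(char *out, size_t out_len, const char *vif_path,
                          unsigned int num_queues, unsigned int queue_id,
                          const char *key)
{
    if (num_queues == 1)
        return snprintf(out, out_len, "%s/%s", vif_path, key);
    return snprintf(out, out_len, "%s/queue-%u/%s", vif_path, queue_id, key);
}
```

For example, with a placeholder vif path "device/vif/0" and key "tx-ring-ref", queue 2 of 4 would yield "device/vif/0/queue-2/tx-ring-ref", while a single-queue setup would yield "device/vif/0/tx-ring-ref".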
V8:
- Squash the queue error handling code into patch 3.
- Update the documentation (patch 6) according to comments on the
equivalent patch to Xen.
V7:
- Rebase on latest net-next, which includes the netback grant mapping
patch series from Zoltan Kiss
- Reduce QUEUE_NAME_SIZE by 1 to avoid double-counting the trailing '\0'
- Simplify the queue hashing by using (hash % num_queues) instead of
multiply & shift.
- Add ratelimited warning for invalid queue selection.
- Fix error handling to correctly tear down already setup queues.
- Use dev->real_num_tx_queues instead of separately maintaining a
count of the number of queues.
V6:
- Use 'max_queues' as the module param. name for both netback and netfront.
V5:
- Fix bug in xenvif_free() that could lead to an attempt to transmit an
skb after the queue structures had been freed.
- Improve the XenStore protocol documentation in netif.h.
- Fix IRQ_NAME_SIZE double-accounting for null terminator.
- Move rx_gso_checksum_fixup stat into struct xenvif_stats (per-queue).
- Don't initialise a local variable that is set in both branches (xspath).
V4:
- Add MODULE_PARM_DESC() for the multi-queue parameters for netback
and netfront modules.
- Move del_timer_sync() in netfront to after unregister_netdev, which
restores the order in which these functions were called before applying
these patches.
V3:
- Further indentation and style fixups.
V2:
- Rebase onto net-next.
- Change queue->number to queue->id.
- Add atomic operations around the small number of stats variables that
are not queue-specific or per-cpu.
- Fixup formatting and style issues.
- XenStore protocol changes documented in netif.h.
- Default max. number of queues to num_online_cpus().
- Check requested number of queues does not exceed maximum.
--------------------
I rebased this on top of net-next. No functional change is introduced. The
patch that needed some extra care was "xen-netback: Factor queue-specific data
into queue struct" because it clashed with a fix introduced in net. A simple
test of creating guest, iperf, then shutting down guest worked as expected.
The last patch fixes a minor problem that queue name is not initialised in
xen-netfront, resulting in names like "-tx" "-rx" in /proc/interrupts.
Changes since v9 (no functional change introduced):
* include commit summary in the commit message of first patch
* fold David Vrabel's Reviewed-by into last patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
xen-net{back, front}: Document multi-queue feature in netif.h
Document the multi-queue feature in terms of XenStore keys to be written
by the backend and by the frontend.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Build on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.
Check XenStore for multi-queue support, and set up the rings and event
channels accordingly.
Write ring references and event channels to XenStore in a queue
hierarchy if appropriate, or flat when using only one queue.
Update the xennet_select_queue() function to choose the queue on which
to transmit a packet based on the skb hash result.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xen-netfront: Factor queue-specific data into queue struct.
In preparation for multi-queue support in xen-netfront, move the
queue-specific data from struct netfront_info to struct netfront_queue,
and update the rest of the code to use this.
Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_etherdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.
Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0, selecting the first (and
only) queue.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Builds on the refactoring of the previous patch to implement multiple
queues between xen-netfront and xen-netback.
Writes the maximum supported number of queues into XenStore, and reads
the values written by the frontend to determine how many queues to use.
Ring references and event channels are read from XenStore on a per-queue
basis and rings are connected accordingly.
Also adds code to handle the cleanup of any already initialised queues
if the initialisation of a subsequent queue fails.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Liu [Wed, 4 Jun 2014 09:30:42 +0000 (10:30 +0100)]
xen-netback: Factor queue-specific data into queue struct
In preparation for multi-queue support in xen-netback, move the
queue-specific data from struct xenvif into struct xenvif_queue, and
update the rest of the code to use this.
Also adds loops over queues where appropriate, even though only one is
configured at this point, and uses alloc_netdev_mq() and the
corresponding multi-queue netif wake/start/stop functions in preparation
for multiple active queues.
Finally, implements a trivial queue selection function suitable for
ndo_select_queue, which simply returns 0 for a single queue and uses
skb_get_hash() to compute the queue index otherwise.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xen-netback: Move grant_copy_op array back into struct xenvif.
This array was allocated separately in commit ac3d5ac2 ("xen-netback:
fix guest-receive-side array sizes") due to it being very large, and a
struct xenvif is allocated as the netdev_priv part of a struct
net_device, i.e. via kmalloc() but falling back to vmalloc() if the
initial alloc. fails.
In preparation for the multi-queue patches, where this array becomes
part of struct xenvif_queue and is always allocated through vzalloc(),
move this back into the struct xenvif.
Signed-off-by: Andrew J. Bennieston <andrew.bennieston@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Jun 2014 21:40:17 +0000 (14:40 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates
This series contains updates to e1000, igb and ixgbe.
Emil provides his version 2 fix for the detection of SFP+ capable interfaces.
In cases where the driver is loaded while there are no SFP+ modules in cage,
the interface was not being detected as SFP capable. Resolve the issue by
identifying interfaces with no PHY type set as SFP capable which allows the
driver to detect the SFP module when the interface is brought up. In this
version 2 of the patch, the 82599 specific check was removed since we
have 82598 devices that are SFP capable.
Jacob removes the inclusion of the export header from the ixgbe PTP core, since
it is not needed. He also renames igb_ptp_enable() to igb_ptp_feature_enable() to
better reflect the function's actual purpose.
Todd fixes the ethtool loopback test for i354 backplane devices: since we
do not know which PHY is used for these devices, MAC loopback is used for
ethtool tests. Todd also sets the packet buffer size register defaults for
i210 devices.
Yongjian Xu removes the check for skb->len being negative or zero since there
is never a case where it would be zero or negative for e1000.
Manuel Schölling updates e1000 to use the time_after() helper function.
v2: Fix indentation on wrapped line in patch 3 of the series based on
feedback from David Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Manuel Schölling [Fri, 23 May 2014 18:04:17 +0000 (18:04 +0000)]
e1000: Use time_after() for time comparison
To be future-proof and for better readability the time comparisons are modified
to use time_after() instead of plain, error-prone math.
Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 16 May 2014 07:21:13 +0000 (07:21 +0000)]
igb: rename igb_ptp_enable to igb_ptp_feature_enable
The name igb_ptp_enable is not synonymous with the purpose of this
function, so rename it to better explain its purpose.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Fri, 23 May 2014 08:18:13 +0000 (08:18 +0000)]
ixgbe: remove linux/export.h header from ixgbe_ptp.c
We don't need this header file, so we shouldn't be including it.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Thu, 15 May 2014 07:16:53 +0000 (07:16 +0000)]
ixgbe: fix detection of SFP+ capable interfaces
In cases where the driver is loaded while there are no SFP+ modules in
the cage the interface was not being detected as SFP capable. To account
for this the driver called identify_sfp in ixgbe_get_settings to make
sure the data is correct. However when there is no SFP+ module in the cage
the driver waits for the I2C reads to time out which can take more than a
second and will cause issues with tools (like net-snmp) that may poll
for that information.
This patch resolves the issue by identifying interfaces with no PHY
type set as SFP capable which allows the driver to detect the SFP module
when the interface is brought up. As a result of this we can also remove the
identify_sfp call from ixgbe_get_settings.
v2: remove the 82599 specific check since we have 82598 devices that are SFP
capable.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Sergei Shtylyov [Tue, 3 Jun 2014 19:42:26 +0000 (23:42 +0400)]
sh_eth: fix SH7619/771x support
Commit 4a55530f38e4 (net: sh_eth: modify the definitions of register) managed
to leave out the E-DMAC register entries in sh_eth_offset_fast_sh3_sh2[], thus
totally breaking SH7619/771x support. Add the missing entries using the data
from before that commit.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Dooks [Tue, 3 Jun 2014 11:21:13 +0000 (12:21 +0100)]
sh_eth: use RNC mode for packet reception
The current behaviour of the sh_eth driver is not to use the RNC bit
for the receive ring. This means that every packet received is not only
generating an IRQ but it also stops the receive ring DMA as well until
the driver re-enables it after unloading the packet.
This means that a number of the following errors are generated due to
the receive packet FIFO overflowing due to nowhere to put packets:
net eth0: Receive FIFO Overflow
Since feedback from Yoshihiro Shimoda shows that every supported LSI
for this driver should have the bit enabled it seems the best way is
to remove the RMCR default value from the per-system data and just
write it when initialising the RMCR value. This is discussed in
the message (http://www.spinics.net/lists/netdev/msg284912.html).
I have tested the RMCR_RNC configuration with NFS root filesystem and
the driver has not failed yet. There are further test reports from
Sergei Shtylyov and others for both the R8A7790 and R8A7791.
There is also feedback from Cao Minh Hiep[1] which reports the
same issue in (http://comments.gmane.org/gmane.linux.network/316285)
showing this fixes issues with losing UDP datagrams under iperf.
Tested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Acked-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 3 Jun 2014 23:40:47 +0000 (16:40 -0700)]
rtnetlink: fix a memory leak when ->newlink fails
It is possible that ->newlink() fails before registering
the device. In this case we should just free it; it is
safe to call free_netdev().
Fixes: commit 0e0eee2465df77bcec2 (net: correct error path in rtnl_newlink())
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Fix kernel crash and data miscompare after EEH recovery
A rmb() is required to ensure that the CQE is not read before it
is written by the adapter DMA. PCI ordering rules will make sure
the other fields are written before the marker at the end of struct
eth_fast_path_rx_cqe but without rmb() a weakly ordered processor can
process stale data.
Without the barrier we have observed various crashes including
bnx2x_tpa_start being called on queues not stopped (resulting in message
start of bin not in stop) and NULL pointer exceptions from bnx2x_rx_int.
Signed-off-by: Milton Miller <miltonm@us.ibm.com>
Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: driver: smsc: set NOCARRIER flag in dev at driver initialization
As smsc driver supports carrier detection, it should unset NOCARRIER
flag only after carrier state determination. By default that flag
is off so the driver should set it before starting auto-negotiation.
Signed-off-by: Balakumaran <Balakumaran.Kannan@ap.sony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The uuid structure could be managed as a const in several places.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubecek [Tue, 3 Jun 2014 08:26:06 +0000 (10:26 +0200)]
xfrm: fix race between netns cleanup and state expire notification
The xfrm_user module registers its pernet init/exit after xfrm
itself so that its net exit function xfrm_user_net_exit() is
executed before xfrm_net_exit() which calls xfrm_state_fini() to
cleanup the SA's (xfrm states). This opens a window between
zeroing net->xfrm.nlsk pointer and deleting all xfrm_state
instances which may access it (via the timer). If an xfrm state
expires in this window, xfrm_exp_state_notify() will pass null
pointer as socket to nlmsg_multicast().
As the notifications are called inside rcu_read_lock() block, it
is sufficient to retrieve the nlsk socket with rcu_dereference()
and check it for null.
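The "load the pointer once, then check it" pattern can be sketched in user space. This is only an analogy under stated assumptions: a C11 acquire load stands in for rcu_dereference(), there is no real RCU read-side section, and every name here (including the -1 return value) is hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sock_sketch {
    int sent;                       /* counts delivered notifications */
};

/* Stand-in for the per-namespace netlink socket pointer, which is
 * cleared during namespace teardown. */
static _Atomic(struct sock_sketch *) nlsk_sketch;

void set_nlsk_sketch(struct sock_sketch *sk)
{
    atomic_store_explicit(&nlsk_sketch, sk, memory_order_release);
}

/* Returns 0 if the notification was delivered, -1 if the socket is gone. */
int notify_sketch(void)
{
    /* read the pointer exactly once, then test the local copy */
    struct sock_sketch *sk =
        atomic_load_explicit(&nlsk_sketch, memory_order_acquire);

    if (!sk)
        return -1;                  /* teardown already cleared the socket */
    sk->sent++;
    return 0;
}
```

The point of the sketch is that the notifier never uses the shared pointer after the check; it only uses the local copy it tested.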
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhangfei Gao [Tue, 3 Jun 2014 05:49:37 +0000 (13:49 +0800)]
net: hisilicon: add hix5hd2 mac driver
Add support for the hix5hd2 XGMAC 1Gb ethernet device.
The controller requires two queues for tx and two queues for rx.
The controller fetches buffers from the free queue and then pushes them to the used queue.
The driver should prepare the free queue and free buffers from the used queue.
Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 3 Jun 2014 06:08:48 +0000 (23:08 -0700)]
cnic: Fix missing ISCSI_KEVENT_IF_DOWN message
The iSCSI netlink message needs to be sent before the ulp_ops is cleared
as it is sent through a function pointer in the ulp_ops. This bug
causes iscsid to not get the message when the bnx2i driver is unloaded.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 3 Jun 2014 06:08:47 +0000 (23:08 -0700)]
cnic: Don't take cnic_dev_lock in cnic_alloc_uio_rings()
We are allocating memory with GFP_KERNEL under spinlock. Since this is
the only call manipulating the cnic_udev_list and it is always under
rtnl_lock, cnic_dev_lock can be safely removed.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 3 Jun 2014 06:08:46 +0000 (23:08 -0700)]
cnic: Don't take rcu_read_lock in cnic_rcv_netevent()
Because the called function, such as bnx2fc_indicate_netevent(), can sleep,
we cannot take rcu_read_lock(). To prevent the RCU-protected ulp_ops from going
away, we use the cnic_lock mutex and set the ULP_F_CALL_PENDING flag.
The code already waits for ULP_F_CALL_PENDING flag to clear in
cnic_unregister_device().
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit cd11cf505318ff24e42f35145f9cdf8596fa1958 I accidentally
added an error message. I used it for debugging and forgot to remove
it before submitting the patch.
Signed-off-by: Christian Riesch <christian.riesch@omicron.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 3 Jun 2014 06:07:02 +0000 (23:07 -0700)]
Merge branch 'ethtool-rssh-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bwh/net-next
Ben Hutchings says:
====================
Pull request: Fixes for new ethtool RSS commands
This addresses several problems I previously identified with the new
ETHTOOL_{G,S}RSSH commands:
1. Missing validation of reserved parameters
2. Vague documentation
3. Use of unnamed magic number
4. No consolidation with existing driver operations
I don't currently have access to suitable network hardware, but have
tested these changes with a dummy driver that can support various
combinations of operations and sizes, together with (a) Debian's ethtool
3.13 (b) ethtool 3.14 with the submitted patch to use ETHTOOL_{G,S}RSSH
and minor adjustment for fixes 1 and 3.
v2: Update RSS operations in vmxnet3 too
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ben Hutchings [Thu, 15 May 2014 00:25:27 +0000 (01:25 +0100)]
ethtool: Replace ethtool_ops::{get,set}_rxfh_indir() with {get,set}_rxfh()
ETHTOOL_{G,S}RXFHINDIR and ETHTOOL_{G,S}RSSH should work for drivers
regardless of whether they expose the hash key, unless you try to
set a hash key for a driver that doesn't expose it.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
1) Unbreak zebra and other netlink apps, from Eric W Biederman.
2) Some new qmi_wwan device IDs, from Aleksander Morgado.
3) Fix info leak in DCB netlink handler of qlcnic driver, from Dan
Carpenter.
4) inet_getid() and ipv6_select_ident() do not generate monotonically
increasing ID numbers, fix from Eric Dumazet.
5) Fix memory leak in __sk_prepare_filter(), from Leon Yu.
6) Netlink leftover bytes warning message is user triggerable, rate
limit it. From Michal Schmidt.
7) Fix non-linear SKB panic in ipvs, from Peter Christensen.
8) Congestion window undo needs to be performed even if only never
retransmitted data is SACK'd, fix from Yuchung Cheng.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
net: filter: fix possible memory leak in __sk_prepare_filter()
net: ec_bhf: Add runtime dependencies
tcp: fix cwnd undo on DSACK in F-RTO
netlink: Only check file credentials for implicit destinations
ipheth: Add support for iPad 2 and iPad 3
team: fix mtu setting
net: fix inet_getid() and ipv6_select_ident() bugs
net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI
net: qmi_wwan: add additional Sierra Wireless QMI devices
bridge: Prevent insertion of FDB entry with disallowed vlan
netlink: rate-limit leftover bytes warning and print process name
bridge: notify user space after fdb update
net: qmi_wwan: add Netgear AirCard 341U
net: fix wrong mac_len calculation for vlans
batman-adv: fix NULL pointer dereferences
net/mlx4_core: Reset RoCE VF gids when guest driver goes down
emac: aggregation of v1-2 PLB errors for IER register
emac: add missing support of 10mbit in emac/rgmii
can: only rename enabled led triggers when changing the netdev name
ipvs: Fix panic due to non-linear skb
...
Fabio Estevam [Mon, 2 Jun 2014 18:44:30 +0000 (15:44 -0300)]
fec: Include pinctrl header file
Commit 5bbde4d2ec ("net: fec: use pinctrl PM helpers") caused the following
build error on m68k:
drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_open':
drivers/net/ethernet/freescale/fec_main.c:1819:2: error: implicit declaration of function 'pinctrl_pm_select_default_state' [-Werror=implicit-function-declaration]
drivers/net/ethernet/freescale/fec_main.c: In function 'fec_enet_close':
drivers/net/ethernet/freescale/fec_main.c:1863:2: error: implicit declaration of function 'pinctrl_pm_select_sleep_state' [-Werror=implicit-function-declaration]
In order to fix the build error, include the linux/pinctrl/consumer.h header
file.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu [Wed, 28 May 2014 05:39:37 +0000 (22:39 -0700)]
bridge: Add bridge ifindex to bridge fdb notify msgs
(This patch was previously posted as RFC at
http://patchwork.ozlabs.org/patch/352677/)
This patch adds NDA_MASTER attribute to neighbour attributes enum for
bridge/master ifindex. And adds NDA_MASTER to bridge fdb notify msgs.
Today bridge fdb notifications don't contain bridge information.
Userspace can derive it from the port information in the fdb
notification. However, this is tricky in some scenarios.
For example, the bridge port delete notification comes before the bridge fdb
delete notifications. We have seen problems in userspace
when using libnl, where the bridge fdb delete notification handling code
does not understand which bridge this fdb entry is part of because
the bridge and port association has already been deleted.
And these notifications (port membership and fdb) are generated on
separate rtnl groups.
Fixing the order of notifications could possibly solve the problem
for some cases (I can submit a separate patch for that).
This patch chooses to add NDA_MASTER to bridge fdb notify msgs
because it not only solves the problem described above, but also helps
userspace avoid another lookup into link msgs to derive the master index.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Leon Yu [Sun, 1 Jun 2014 05:37:25 +0000 (05:37 +0000)]
net: filter: fix possible memory leak in __sk_prepare_filter()
__sk_prepare_filter() was reworked in commit bd4cf0ed3 (net: filter:
rework/optimize internal BPF interpreter's instruction set) so that it should
have uncharged memory once things went wrong. However that work isn't complete.
Error is handled only in __sk_migrate_filter() while memory can still leak in
the error path right after sk_chk_filter().
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set") Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Tested-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 3 Jun 2014 00:04:37 +0000 (17:04 -0700)]
Merge tag 'md/3.15-fixes' of git://neil.brown.name/md
Pull two md bugfixes from Neil Brown:
"Two md bugfixes for possible corruption when restarting reshape
If a raid5/6 reshape is restarted (After stopping and re-assembling
the array) and the array is marked read-only (or read-auto), then the
reshape will appear to complete immediately, without actually moving
anything around. This can result in corruption.
There are two patches which do much the same thing in different
places. They are separate because one is an older bug and so can be
applied to more -stable kernels"
* tag 'md/3.15-fixes' of git://neil.brown.name/md:
md: always set MD_RECOVERY_INTR when interrupting a reshape thread.
md: always set MD_RECOVERY_INTR when aborting a reshape or other "resync".
This patch removes variables that are initialized with a constant,
are never updated, and are only used as the parameter of return.
Return the constant instead of using a variable.
Verified by compilation only.
The coccinelle script that finds and fixes this issue is:
// <smpl>
@@
type T;
constant C;
identifier ret;
@@
- T ret = C;
... when != ret
when strict
return
- ret
+ C
;
// </smpl>
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jean Delvare [Sat, 31 May 2014 15:32:27 +0000 (17:32 +0200)]
net: ec_bhf: Add runtime dependencies
The ec_bhf driver is specific to the Beckhoff CX embedded PC series.
These are based on Intel x86 CPU. So we can add a dependency on
X86, with COMPILE_TEST as an alternative to still allow for broader
build-testing.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Darek Marcinkiewicz <reksio@newterm.pl> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Queued trim only works for some users with MU05 firmware. Revert to
blacklisting all firmware versions.
Introduced by commit d121f7d0cbb8 ("libata: Update queued trim blacklist
for M5x0 drives") which this effectively reverts, while retaining the
blacklisting of M550.
See
https://bugzilla.kernel.org/show_bug.cgi?id=71371
for reports of trouble with MU05 firmware.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 2 Jun 2014 23:57:23 +0000 (16:57 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Peter Anvin:
"A single quite small patch that managed to get overlooked earlier, to
prevent a user space triggerable oops on systems without HPET"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, vdso: Fix an OOPS accessing the HPET mapping w/o an HPET
Linus Torvalds [Mon, 2 Jun 2014 23:56:42 +0000 (16:56 -0700)]
Merge tag 'usb-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some fixes for 3.15-rc8 that resolve a number of tiny USB
issues that have been reported, and there are some new device ids as
well.
All have been tested in linux-next"
* tag 'usb-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: delete endpoints from bandwidth list before freeing whole device
usb: pci-quirks: Prevent Sony VAIO t-series from switching usb ports
USB: cdc-wdm: properly include types.h
usb: cdc-wdm: export cdc-wdm uapi header
USB: serial: option: add support for Novatel E371 PCIe card
USB: ftdi_sio: add NovaTech OrionLXm product ID
USB: io_ti: fix firmware download on big-endian machines (part 2)
USB: Avoid runtime suspend loops for HCDs that can't handle suspend/resume
Linus Torvalds [Mon, 2 Jun 2014 23:55:18 +0000 (16:55 -0700)]
Merge tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are some staging driver fixes for 3.15.
Three are for the speakup drivers (one fixes a regression caused in
3.15-rc, and the other two resolve a tty issue found by Ben Hutchings)
The comedi and r8192e_pci driver fixes also resolve reported issues"
* tag 'staging-3.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8192e_pci: fix htons error
Staging: speakup: Update __speakup_paste_selection() tty (ab)usage to match vt
Staging: speakup: Move pasting into a work item
staging: comedi: ni_daq_700: add mux settling delay
speakup: fix incorrect perms on speakup_acntsa.c
Yuchung Cheng [Fri, 30 May 2014 22:25:59 +0000 (15:25 -0700)]
tcp: fix cwnd undo on DSACK in F-RTO
This bug was discovered through a recent F-RTO issue on the tcpm list
https://www.ietf.org/mail-archive/web/tcpm/current/msg08794.html
The bug is that currently F-RTO does not use DSACK to undo cwnd in
certain cases: upon receiving an ACK after the RTO retransmission in
F-RTO, and the ACK has DSACK indicating the retransmission is spurious,
the sender only calls tcp_try_undo_loss() if some never retransmitted
data is sacked (FLAG_ORIG_DATA_SACKED).
The correct behavior is to unconditionally call tcp_try_undo_loss so
the DSACK information is used properly to undo the cwnd reduction.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Gortmaker [Fri, 30 May 2014 19:39:30 +0000 (15:39 -0400)]
drivers/net: fix broadcom/bcmsysport.c compile fail on SPARC
To fix:
CC drivers/net/ethernet/broadcom/bcmsysport.o
In file included from drivers/net/ethernet/broadcom/bcmsysport.c:28:0:
drivers/net/ethernet/broadcom/bcmsysport.h:41:8: error: redefinition of 'struct tsb'
arch/sparc/include/asm/mmu_64.h:65:8: note: originally defined here
make[1]: *** [drivers/net/ethernet/broadcom/bcmsysport.o] Error 1
we change struct tsb to struct bcm_tsb in the broadcom driver in
order to avoid the namespace collision. For consistency, we also
change struct rsb to struct bcm_rsb, so the Rx/Tx symmetry is
maintained.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Fri, 30 May 2014 19:34:39 +0000 (13:34 -0600)]
fib_trie: use seq_file_net rather than seq->private
Make fib_triestat_seq_show consistent with other /proc/net files and
use seq_file_net.
Signed-off-by: David Ahern <dsahern@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
netlink: Only check file credentials for implicit destinations
It was possible to get a setuid root or setcap executable to write to
its stdout or stderr (which has been made a netlink socket) and
inadvertently reconfigure the networking stack.
To prevent this we check that both the creator of the socket and
the current application have permission to reconfigure the network
stack.
Unfortunately this breaks Zebra which always uses sendto/sendmsg
and creates its socket without any privileges.
To keep Zebra working don't bother checking if the creator of the
socket has privilege when a destination address is specified. Instead
rely exclusively on the privileges of the sender of the socket.
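The resulting rule can be sketched as a small truth function. This is an illustrative Python model of the decision described above, not the kernel code; the function and parameter names are hypothetical.

```python
def netlink_sender_allowed(has_explicit_dst: bool,
                           creator_capable: bool,
                           sender_capable: bool) -> bool:
    """Model of the permission rule described in the commit message.

    has_explicit_dst: the message was sent with a destination address
                      (sendto/sendmsg), rather than through an implicit one.
    creator_capable:  the process that opened the socket had the privilege.
    sender_capable:   the process sending the message has the privilege.
    """
    # The creator's credentials only matter for implicit destinations.
    creator_ok = has_explicit_dst or creator_capable
    # The sender's privilege is always required.
    return creator_ok and sender_capable
```

A setuid binary tricked into writing to an inherited netlink socket uses an implicit destination, so the unprivileged creator still blocks it, while Zebra's explicit sendto/sendmsg calls pass on the sender's privilege alone.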
Note from Andy: This is exactly Eric's code except for some comment
clarifications and formatting fixes. Neither I nor, I think, anyone
else is thrilled with this approach, but I'm hesitant to wait on a
better fix since 3.15 is almost here.
Note to stable maintainers: This is a mess. An earlier series of
patches in 3.15 fix a rather serious security issue (CVE-2014-0181),
but they did so in a way that breaks Zebra. The offending series
includes:
net: Add variants of capable for use on netlink messages
If a given kernel version is missing that series of fixes, it's
probably worth backporting it and this patch. If that series is
present, then this fix is critical if you care about Zebra.
Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Chema Gonzalez [Fri, 30 May 2014 17:15:12 +0000 (10:15 -0700)]
net: filter: fix length calculation in BPF testsuite
The current probe_filter_length() (the function that calculates the
length of a test BPF filter) behavior is to declare the end of the
filter as soon as it finds {0, *, *, 0}. This is actually a valid
insn ("ld #0"), so any filter which includes "BPF_STMT(BPF_LD | BPF_IMM, 0)"
fails (its length is cut short).
We are changing probe_filter_length() so as to start from the end, and
declare the end of the filter as the first instruction which is not
{0, *, *, 0}. This solution produces a simpler patch than the
alternative of using an explicit end-of-filter mark. It is technically
incorrect if your filter ends with "ld #0", but that should not
happen anyway.
We also add a new test (LD_IMM_0) that includes ld #0 (does not work
without this patch).
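The difference between the two scans can be sketched as follows. This is a Python model for illustration only: instructions are (code, jt, jf, k) tuples in a fixed-size array padded with empty slots, and MAX_INSNS is a hypothetical, deliberately small array size.

```python
MAX_INSNS = 8  # hypothetical size of the fixed instruction array


def old_probe_filter_length(insns):
    """Old behavior: stop at the first {0, *, *, 0} slot, scanning forward."""
    for i, (code, _jt, _jf, k) in enumerate(insns):
        if code == 0 and k == 0:
            return i
    return len(insns)


def probe_filter_length(insns):
    """New behavior: scan backward for the last slot that is not {0, *, *, 0}."""
    last = len(insns) - 1
    while last > 0:
        code, _jt, _jf, k = insns[last]
        if code != 0 or k != 0:
            break
        last -= 1
    return last + 1


def pad(insns):
    """Pad a filter with empty slots up to MAX_INSNS."""
    return insns + [(0, 0, 0, 0)] * (MAX_INSNS - len(insns))


# "ld #0" encodes as code 0 with k 0; 0x06 is "ret" with a constant.
LD_IMM_0 = pad([(0x00, 0, 0, 0), (0x06, 0, 0, 1)])
```

For the LD_IMM_0 array the forward scan stops at slot 0 and reports an empty filter, while the backward scan reports both instructions.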
Signed-off-by: Chema Gonzalez <chema@google.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset consists of different fixes and improvements in the mvneta
and mv643xx_eth drivers. The most important change is the one that allows
to support small MSS values (see patches 2 and 6).
This is done following the Solarflare driver (see commit 7e6d06f0de3f7).
While doing this some other fixes were spotted and so they are included.
Finally, notice that the TSO support introduced a wrong DMA unmapping
of the TSO header buffers, so patches 4 and 8 fix that in the
drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:11 +0000 (13:40 -0300)]
net: mv643xx_eth: Avoid unmapping the TSO header buffers
The buffers for the TSO headers belong to a DMA coherent region which is
allocated at ndo_open() time, and released at ndo_stop() time.
Therefore, and contrary to the TSO payload descriptor buffers, the TSO header
buffers don't need to be unmapped. This commit adds a check to detect a
TSO header buffer and explicitly prevent the unmap.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:10 +0000 (13:40 -0300)]
net: mv643xx_eth: Drop the NETDEV_TX_BUSY return path
After adding proper stop/wake thresholds, we can expect a queue to never
be full and drop the NETDEV_TX_BUSY return path. In any case, if the queue
cannot accommodate a TSO packet, the packet would be discarded.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:09 +0000 (13:40 -0300)]
net: mv643xx_eth: Limit the TSO segments and adjust stop/wake thresholds
Currently small MSS values may require too many TSO descriptors for
the default queue size. This commit prevents this situation by fixing
the maximum supported TSO number of segments to 100 and by setting a
minimum Tx queue size. The minimum Tx queue size is set so that at
least 2 worst-case skb can be accommodated.
In addition, the queue stop and wake thresholds values are adjusted
accordingly. The queue is stopped when there's room for only 1 worst-case
skb and waked when the number of descriptors is half that value.
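The arithmetic behind these limits can be sketched as below. This is a Python illustration under stated assumptions: only the limit of 100 segments comes from the commit message, while the two descriptors per segment, the MAX_SKB_FRAGS value, and the reading of "stop" and "wake" as counts of used descriptors are assumptions made for the example.

```python
MAX_TSO_SEGS = 100   # from the commit message
DESCS_PER_SEG = 2    # assumption: one header and one payload descriptor per segment
MAX_SKB_FRAGS = 17   # assumption: illustrative value only

# Descriptors needed by one worst-case skb.
MAX_SKB_DESCS = MAX_TSO_SEGS * DESCS_PER_SEG + MAX_SKB_FRAGS
# The ring must hold at least two worst-case skbs.
MIN_TX_RING_SIZE = 2 * MAX_SKB_DESCS


def tx_thresholds(requested_ring_size):
    """Return (ring_size, stop, wake), with thresholds in used descriptors."""
    ring_size = max(requested_ring_size, MIN_TX_RING_SIZE)
    # Stop the queue once only one worst-case skb still fits.
    stop = ring_size - MAX_SKB_DESCS
    # Wake it again when usage has dropped to half the stop value.
    wake = stop // 2
    return ring_size, stop, wake
```

Under these assumptions a requested ring of 512 descriptors stops at 295 used descriptors and wakes at 147, and a request for 100 descriptors is raised to the minimum of 434.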
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:08 +0000 (13:40 -0300)]
net: mv643xx_eth: Count dropped packets properly
This commit fixes the current dropped packet count by doing it properly,
increasing the count when a packet is discarded; i.e. the packet is not
processed and the driver returns NETDEV_TX_OK.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:07 +0000 (13:40 -0300)]
net: mvneta: Avoid unmapping the TSO header buffers
The buffers for the TSO headers belong to a DMA coherent region which is
allocated at ndo_open() time, and released at ndo_stop() time.
Therefore, and contrary to the TSO payload descriptor buffers, the TSO header
buffers don't need to be unmapped. This commit adds a check to detect a
TSO header buffer and explicitly prevent the unmap.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:06 +0000 (13:40 -0300)]
net: mvneta: Fix missing DMA region unmap
The Tx descriptor release code currently calls dma_unmap_single() and
dev_kfree_skb_any() if the descriptor is associated with a non-NULL skb.
This is true only for the last fragment of the packet.
This is wrong, however, since every descriptor buffer is DMA mapped and needs
to be unmapped.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ezequiel Garcia [Fri, 30 May 2014 16:40:05 +0000 (13:40 -0300)]
net: mvneta: Limit the TSO segments and adjust stop/wake thresholds
Currently small MSS values may require too many TSO descriptors for
the default queue size. This commit prevents this situation by fixing
the maximum supported TSO number of segments to 100 and by setting a
minimum Tx queue size. The minimum Tx queue size is set so that at
least 2 worst-case skb can be accommodated.
In addition, the queue stop and wake thresholds values are adjusted
accordingly. The queue is stopped when there's room for only 1 worst-case
skb and waked when the number of descriptors is half that value.
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kristian Evensen [Fri, 30 May 2014 10:17:00 +0000 (12:17 +0200)]
ipheth: Add support for iPad 2 and iPad 3
Each iPad model has a different product id. This patch adds support for iPad 2
(pid 0x12a2) and iPad 3 (pid 0x12a6). Note that iPad 2 must be jailbroken and a
third-party app must be used for tethering to work. On iPad 3, tethering works
out of the box (assuming your ISP is nice).
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 2 Jun 2014 23:01:37 +0000 (16:01 -0700)]
Merge branch 'cdc_ncm'
Bjørn Mork says:
====================
cdc_ncm: fixes and conversion to sysfs API
After considering the comments received after the ethtool coalesce
support was committed, I have ended up concluding that we should
remove it again, while we can, before it hits a release. The idea
was not well enough thought through, and all comments received
pointed to advantages of using a sysfs based API instead.
This series removes the ethtool coalesce support and replaces it
with sysfs attributes in a driver specific group under the netdev.
The first 3 patches are unrelated fixes:
patch 1: reducing truesize as discussed
patch 2: fixing a potential buffer overrun when changing tx_max
patch 3: prevent framing errors when changing rx_max
Changes v2:
- minor editorial changes to patch 8, as suggested by Peter Stuge
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:09 +0000 (09:31 +0200)]
net: cdc_ncm: allow tuning min_tx_pkt
The min_tx_pkt variable decides the cutoff point where the driver
will stop padding out NTBs to maximum size. The padding is a tradeoff
where we use some USB bus bandwidth to allow the device to receive
fixed size buffers. Different devices will have different optimal
settings, spanning from no padding at all to padding every NTB.
There is no way to automatically figure out which setting is best
for a specific device.
The default value is a reasonable tradeoff, calculated based on the
USB packet size and out NTB max size. This may have to be changed
along with any tx_max changes.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:08 +0000 (09:31 +0200)]
net: cdc_ncm: export NCM Transfer Block (NTB) parameters
The mandatory GetNtbParameters control request is an important part of
the host <-> device protocol negotiation in CDC NCM (and CDC MBIM). It
gives device limits which the host must obey when configuring the
protocol aggregation variables. The driver will enforce this by
rejecting attempts to set any of the tunable variables to a value
which is not supported by the device. Exporting the parameter block
helps userspace decide which values are allowed without resorting
to trial and error.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:07 +0000 (09:31 +0200)]
net: cdc_ncm: drop ethtool coalesce support
The ethtool coalesce API is not applicable for this driver. Forcing
it to fit the NCM aggregation redefined the API in a driver specific
way, which is much worse than defining a clean new API. These ethtool
coalesce functions have therefore been replaced by a new sysfs API.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:06 +0000 (09:31 +0200)]
net: cdc_ncm: use sysfs for rx/tx aggregation tuning
Attach a driver specific sysfs group to the netdev, and use it
for the rx/tx aggregation variables.
The datagram aggregation defined by the CDC NCM specification is
specific to this device class (including CDC MBIM). Using the
ethtool interrupt coalesce API as an interface to the aggregation
parameters redefined that API in a driver specific and confusing
way. A sysfs group
- makes it clear that this is a driver specific userspace API, and
- allows us to export the real values instead of some translated
version, and
- lets us include more aggregation variables which were impossible
to force into the ethtool API.
Additionally, using sysfs allows tuning the driver on space
constrained hosts where userspace tools like ethtool are undesired.
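A small sketch of how userspace might read and tune these values without ethtool, written in Python for illustration. The group name "cdc_ncm" under the netdev directory and the interface name used in the usage note are assumptions; the attribute names follow the variables mentioned in this series (rx_max, tx_max, min_tx_pkt).

```python
from pathlib import Path


def ncm_attr_path(iface, name, sysfs_root="/sys"):
    """Path of one aggregation attribute; the 'cdc_ncm' group name is assumed."""
    return Path(sysfs_root) / "class" / "net" / iface / "cdc_ncm" / name


def read_ncm_attr(iface, name, sysfs_root="/sys"):
    """Read an aggregation variable as an integer."""
    return int(ncm_attr_path(iface, name, sysfs_root).read_text().strip())


def write_ncm_attr(iface, name, value, sysfs_root="/sys"):
    """Write a new value; on a real system this would require privileges."""
    ncm_attr_path(iface, name, sysfs_root).write_text(f"{value}\n")
```

For example, read_ncm_attr("wwan0", "tx_max") would return the current value for a hypothetical interface wwan0. The sysfs_root parameter exists only so the helpers can be exercised against a scratch directory.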
Suggested-by: Peter Stuge <peter@stuge.se> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:05 +0000 (09:31 +0200)]
net: cdc_ncm: inform usbnet when rx buffers are reduced
It doesn't matter whether the buffer size goes up or down. We have to
keep usbnet and device synchronized to be able to split transfers at the
correct boundaries. The spec allows skipping short packets when using
max sized transfers. If we don't tell usbnet about our new expected rx
buffer size, then it will merge and/or split NTBs. The driver does not
support this, and the result will be lots of framing errors.
Fix by always reallocating usbnet rx buffers when the rx_max value
changes.
Fixes: 68864abf08f0 ("net: cdc_ncm: support rx_max/tx_max updates when running") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:04 +0000 (09:31 +0200)]
net: cdc_ncm: always reallocate tx_curr_skb when tx_max increases
We are calling usbnet_start_xmit() to flush any remaining data,
depending on the side effect that tx_curr_skb is set to NULL,
ensuring a new allocation using the updated tx_max. But this
side effect will only happen if there were any cached data ready
to transmit. If not, then an empty tx_curr_skb is still allocated
using the old tx_max size. Free it to avoid a buffer overrun.
Fixes: 68864abf08f0 ("net: cdc_ncm: support rx_max/tx_max updates when running") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 May 2014 07:31:03 +0000 (09:31 +0200)]
net: cdc_ncm: reduce skb truesize in rx path
Cloning the big skbs we use for USB buffering chokes up TCP and
SCTP because the socket memory limits are hitting earlier than
they should. It is better to unconditionally copy the unwrapped
packets to freshly allocated skbs.
Reported-by: Jim Baxter <jim_baxter@mentor.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
dingtianhong [Fri, 30 May 2014 06:32:49 +0000 (14:32 +0800)]
macvlan: fix the problem when mac address changes for passthru mode
In passthru mode the macvlan dev should always have the same mac address as
the lowerdev. Changing the mac address of only one of them breaks the
mechanism, so when the lowerdev or macvlan mac address changes,
we should propagate the change to the other dev.
v1->v2: Allow macvlan dev to change mac address for passthru mode and propagate to
lowerdev.
v2->v3: Don't set the mac address to the lower dev's unicast address for
passthru mode when mac address changes.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 29 May 2014 18:46:17 +0000 (20:46 +0200)]
team: fix mtu setting
Currently it is not possible to set the mtu of a team device which has a port
enslaved to it. The reason is that when team_change_mtu() calls
dev_set_mtu() for the port device, the notifier for the NETDEV_PRECHANGEMTU
event is called and team_device_event() returns NOTIFY_BAD, forbidding
the change. Fix this by returning NOTIFY_DONE here in case team is
changing the mtu in team_change_mtu().
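The guard can be modeled roughly as follows. This is a Python sketch of the control flow only; the class, the mtu_change_allowed flag, and the method names are hypothetical stand-ins for the driver's state and callbacks.

```python
NOTIFY_DONE = "NOTIFY_DONE"
NOTIFY_BAD = "NOTIFY_BAD"


class Team:
    def __init__(self, port_mtus):
        self.port_mtus = dict(port_mtus)  # port name -> current mtu
        self.mtu = None
        self.mtu_change_allowed = False   # hypothetical flag, set only by change_mtu()

    def device_event_prechangemtu(self, port):
        """Notifier model: veto port MTU changes unless the team initiated them."""
        return NOTIFY_DONE if self.mtu_change_allowed else NOTIFY_BAD

    def set_port_mtu(self, port, mtu):
        """Model of setting a port's MTU, which first consults the notifier."""
        if self.device_event_prechangemtu(port) == NOTIFY_BAD:
            return False
        self.port_mtus[port] = mtu
        return True

    def change_mtu(self, mtu):
        """Model of the team-level MTU change propagating to all ports."""
        self.mtu_change_allowed = True
        try:
            for port in self.port_mtus:
                if not self.set_port_mtu(port, mtu):
                    return False
        finally:
            self.mtu_change_allowed = False
        self.mtu = mtu
        return True
```

A direct MTU change on an enslaved port is still refused, while the same change driven from the team device goes through.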
Introduced-by: 3d249d4c "net: introduce ethernet teaming device" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 29 May 2014 15:45:14 +0000 (08:45 -0700)]
net: fix inet_getid() and ipv6_select_ident() bugs
I noticed we were sending wrong IPv4 ID in TCP flows when MTU discovery
is disabled.
Note how GSO/TSO packets do not have monotonically incrementing ID.
06:37:41.575531 IP (id 14227, proto: TCP (6), length: 4396)
06:37:41.575534 IP (id 14272, proto: TCP (6), length: 65212)
06:37:41.575544 IP (id 14312, proto: TCP (6), length: 57972)
06:37:41.575678 IP (id 14317, proto: TCP (6), length: 7292)
06:37:41.575683 IP (id 14361, proto: TCP (6), length: 63764)
It appears I introduced this bug in linux-3.1.
inet_getid() must return the old value of peer->ip_id_count,
not the new one.
Let's revert this part, and remove the prevention of
a null identification field in IPv6 Fragment Extension Header,
which is dubious and not even done properly.
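Why returning the new value breaks GSO/TSO can be seen in a small model. This Python sketch is illustrative only: the class is hypothetical, and the counter is wrapped at 16 bits as a simplification to match the width of the IPv4 ID field.

```python
class IdCounter:
    """Model of a per-peer IP ID counter."""

    def __init__(self, start):
        self.count = start

    def getid(self, more):
        """Correct: reserve 1 + more IDs and return the first reserved one."""
        old = self.count
        self.count = (self.count + more + 1) & 0xFFFF
        return old

    def getid_buggy(self, more):
        """Buggy: advance the counter, then return the new value."""
        self.count = (self.count + more + 1) & 0xFFFF
        return self.count


def ids_used(first_id, segments):
    """IDs consumed by a packet that is later split into `segments` pieces."""
    return [(first_id + i) & 0xFFFF for i in range(segments)]
```

With the correct version, a three-segment packet starting from 100 uses IDs 100 to 102 and the next packet starts at 103. With the buggy version the first packet uses 103 to 105 and the next packet gets 104, which collides.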
Fixes: 87c48fa3b463 ("ipv6: make fragment identifications less predictable") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chen-Yu Tsai [Thu, 29 May 2014 14:31:40 +0000 (22:31 +0800)]
net: stmmac: Handle different error codes from platform_get_irq_byname
The following patch moved device tree interrupt resolution into
platform_get_irq_byname:
ad69674 of/irq: do irq resolution in platform_get_irq_byname()
As a result, the function no longer only returns -ENXIO on error.
This breaks DT based probing of stmmac, as seen in test runs of
linux-next next-20140526 cubie2-sunxi_defconfig:
This patch makes the stmmac_platform probe function properly handle
error codes, such as returning for deferred probing, and other codes
returned by of_irq_get_by_name.
Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: qmi_wwan: interface #11 in Sierra Wireless MC73xx is not QMI
This interface is unusable, as the cdc-wdm character device doesn't reply to
any QMI command. Also, the out-of-tree Sierra Wireless GobiNet driver fully
skips it.
Signed-off-by: Aleksander Morgado <aleksander@aleksander.es> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Toshiaki Makita [Mon, 26 May 2014 06:15:53 +0000 (15:15 +0900)]
bridge: Prevent insertion of FDB entry with disallowed vlan
br_handle_local_finish() is allowing us to insert an FDB entry with
disallowed vlan. For example, when port 1 and 2 are communicating in
vlan 10, and even if vlan 10 is disallowed on port 3, port 3 can
interfere with their communication by spoofed src mac address with
vlan id 10.
Note: Even if it is judged that a frame should not be learned, it should
not be dropped, because it is destined not for the forwarding layer but for
a higher layer. See IEEE 802.1Q-2011 8.13.10.
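The intended behavior can be sketched as: learn only when the vlan is allowed on the ingress port, but pass the frame up either way. This is a Python model for illustration; the class, its fields, and the return value are hypothetical.

```python
class Bridge:
    def __init__(self, allowed_vlans):
        self.allowed_vlans = allowed_vlans  # port number -> set of allowed vlan ids
        self.fdb = {}                       # (src mac, vlan id) -> port number

    def handle_local_frame(self, port, src_mac, vlan):
        """Model of local frame handling with the vlan check on learning."""
        if vlan in self.allowed_vlans.get(port, set()):
            self.fdb[(src_mac, vlan)] = port
        # The frame is never dropped here: it is destined for a higher layer.
        return "pass_up"
```

In the scenario from the commit message, a spoofed frame arriving on port 3 with vlan id 10 no longer overwrites the entry learned on port 1.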
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Please pull this remaining batch of updates intended for the 3.16 stream...
For the mac80211 bits, Johannes says:
"The remainder for -next right now is mostly fixes, and a handful of
small new things like some CSA infrastructure, the regdb script mW/dBm
conversion change and sending wiphy notifications."
For the bluetooth bits, Gustavo says:
"Some more patches for 3.16. There is nothing really special here, just a
bunch of clean ups, fixes plus some small improvements. Please pull."
For the nfc bits, Samuel says:
"We have:
- Felica (Type3) tags support for trf7970a
- Type 4b tags support for port100
- st21nfca DTS typo fix
- A few sparse warning fixes"
For the atheros bits, Kalle says:
"Ben added support for setting antenna configurations. Michal improved
warm reset so that we would not need to fall back to cold reset that
often, fixed an issue where ath10k stripped the protected flag while in
monitor mode, and made module initialisation asynchronous to fix the problems
with firmware loading when the driver is linked to the kernel.
Luca removed unused channel_switch_beacon callbacks both from ath9k and
ath10k. Marek fixed Protected Management Frames (PMF) when using Action
Frames. Also we had other small fixes everywhere in the driver."
Along with that, there are a handful of updates to a variety
of drivers. This includes updates to at76c50x-usb, ath9k, b43,
brcmfmac, mwifiex, rsi, rtlwifi, and wil6210.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 2 Jun 2014 12:26:03 +0000 (05:26 -0700)]
inetpeer: get rid of ip_id_count
Ideally, we would need to generate IP ID using a per destination IP
generator.
Linux kernels used the inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.
1) each inet_peer struct consumes 192 bytes
2) inetpeer cache uses a binary tree of inet_peer structs,
with a nominal size of ~66000 elements under load.
3) lookups in this tree are hitting a lot of cache lines, as tree depth
is about 20.
4) If server deals with many tcp flows, we have a high probability of
not finding the inet_peer, allocating a fresh one, inserting it in
the tree with same initial ip_id_count, (cf secure_ip_id())
5) We garbage collect inet_peer aggressively.
IP ID generation does not have to be 'perfect'.
Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.
We simply use an array of generators, and a Jenkins hash using the dst IP
as a key.
ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)
secure_ip_id() and secure_ipv6_id() no longer are needed.
Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
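The scheme can be sketched as a fixed table of counters indexed by a hash of the destination address. This Python model is illustrative only: the table size is hypothetical, and the Jenkins one-at-a-time hash is used as a stand-in rather than the exact hash the kernel computes.

```python
import ipaddress

IP_IDENTS_SZ = 2048  # hypothetical table size


def jenkins_one_at_a_time(data: bytes) -> int:
    """Jenkins one-at-a-time hash, 32-bit; a stand-in for the kernel's hash."""
    h = 0
    for byte in data:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


class IpIdGenerators:
    def __init__(self, size=IP_IDENTS_SZ):
        self.idents = [0] * size

    def bucket(self, dst_ip):
        """Pick the generator for a destination address."""
        key = ipaddress.ip_address(dst_ip).packed
        return jenkins_one_at_a_time(key) % len(self.idents)

    def select_ident_segs(self, dst_ip, segs):
        """Reserve `segs` consecutive IDs and return the first one (16 bits)."""
        i = self.bucket(dst_ip)
        first = self.idents[i] & 0xFFFF
        self.idents[i] = (self.idents[i] + segs) & 0xFFFFFFFF
        return first
```

Destinations that hash to the same slot share a counter, which is acceptable because the goal is only to avoid reusing an ID for the same destination within a short period, not to give every destination its own sequence.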
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Mon, 2 Jun 2014 11:32:46 +0000 (13:32 +0200)]
of: of_mdio: export symbol of_mdiobus_link_phydev
Make of_mdiobus_link_phydev externally available.
This fixes CONFIG_OF_MDIO=m.
Signed-off-by: Daniel Mack <zonque@gmail.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 86f6cf41272 ("net: of_mdio: add of_mdiobus_link_phydev()") Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Mack [Mon, 2 Jun 2014 11:32:45 +0000 (13:32 +0200)]
net: of_mdio: use int type for address variable
Use int rather than u32 to fix the following warning:
drivers/of/of_mdio.c:147 of_mdiobus_register() warn: unsigned 'addr' is
never less than zero.
Signed-off-by: Daniel Mack <zonque@gmail.com> Fixes: 8f8382888cba ("net: of_mdio: factor out code to parse a phy's 'reg' property") Signed-off-by: David S. Miller <davem@davemloft.net>