Emil Tantilov [Wed, 28 Jan 2015 03:21:29 +0000 (03:21 +0000)]
ixgbevf: rewrite watchdog task to function similar to igbvf
This patch cleans up the logic dealing with link down/up by breaking down the
link detection and up/down events into separate functions - similar to how these
events are handled in other drivers.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 28 Jan 2015 03:21:24 +0000 (03:21 +0000)]
ixgbevf: Add code to check for Tx hang
This patch adds code to allow for Tx hang checking. The idea is to provide
more robust debug info in the event of a transmit unit hang, similar to the
logic in ixgbe.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 28 Jan 2015 03:21:18 +0000 (03:21 +0000)]
ixgbevf: Fix ordering of shutdown to correctly disable Rx and Tx
This patch updates the ordering of the shutdown path so that we attempt to
shut down the rings more gracefully. Basically the big changes are that we
shut down the main Rx filter in the case of Rx and we set the carrier_off
state in the case of Tx so that packets stop being delivered from outside
the driver. Then we shut down interrupts and NAPI. Finally we stop the
rings from performing DMA and clean them. This is a bit more graceful than
the previous path.
CC: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Emil Tantilov [Wed, 28 Jan 2015 03:21:13 +0000 (03:21 +0000)]
ixgbevf: set vlan_features in a single write instead of several ORs
Clean up the setting of vlan_features by enabling all features at once.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
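The vlan_features cleanup above amounts to replacing a chain of `|=` statements with one assignment. A minimal user-space sketch, assuming hypothetical FEAT_* flag names in place of the kernel's NETIF_F_* bits (the actual feature set enabled by the driver is not stated in the commit message):

```c
#include <stdint.h>

/* Hypothetical feature bits; the real driver uses the kernel's NETIF_F_* flags. */
#define FEAT_TSO     (1u << 0)
#define FEAT_TSO6    (1u << 1)
#define FEAT_IP_CSUM (1u << 2)
#define FEAT_SG      (1u << 3)

/* Before: the feature mask is accumulated one OR at a time. */
uint32_t vlan_features_with_ors(void)
{
	uint32_t f = 0;

	f |= FEAT_TSO;
	f |= FEAT_TSO6;
	f |= FEAT_IP_CSUM;
	f |= FEAT_SG;
	return f;
}

/* After: the same set of features is enabled in a single write. */
uint32_t vlan_features_single_write(void)
{
	return FEAT_TSO | FEAT_TSO6 | FEAT_IP_CSUM | FEAT_SG;
}
```

Both functions produce the same mask; the second form simply makes the full feature set visible in one place.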
Don Skidmore [Fri, 12 Dec 2014 05:37:30 +0000 (05:37 +0000)]
ixgbe: Cleanup probe to remove redundant attempt to ID PHY
We always identify the PHY in our reset_hw path anyway so there is
no need to do it in get_invariants(). The reason I even noticed this
is that for new hardware (X550em) we don't assign some methods until
later in probe and calling phy.ops.read_reg could lead to a panic.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Sat, 6 Dec 2014 05:59:21 +0000 (05:59 +0000)]
ixgbe: cleanup sparse errors in new ixgbe_x550.c file
This patch cleans up prototypes that should have been defined
as static.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Matthew Vick [Tue, 27 Jan 2015 02:33:26 +0000 (02:33 +0000)]
fm10k: Validate VLAN ID in fm10k_update_xc_addr_pf
Currently, fm10k_update_xc_addr_pf has an issue where it does not
properly drop the upper-most four bits of the VLAN ID due to type
promotion. Resolve the issue not by masking off the bits, but by
throwing an error if the VLAN ID is out-of-bounds.
Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Matthew Vick <matthew.vick@intel.com> Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
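The type promotion problem in the fm10k change above can be reproduced in user space. The sketch below is an assumption about the general shape of the issue, not the driver's code: the function names, `VLAN_ID_MAX` and `ERR_PARAM` are hypothetical. It shows why shifting a 16-bit value left and right does not clear its top bits, and what an explicit bounds check looks like instead.

```c
#include <stdint.h>

#define VLAN_ID_MAX 4096   /* VLAN IDs are 12 bits wide: valid range 0..4095 */
#define ERR_PARAM   (-2)   /* hypothetical error code for this sketch */

/*
 * Shift-based truncation that looks like it clears the top four bits.
 * vid is promoted to int before the shifts, so the bits shifted left
 * stay inside the int and come straight back on the right shift.
 */
uint16_t drop_upper_bits_broken(uint16_t vid)
{
	return (uint16_t)((vid << 4) >> 4);
}

/* Validation instead of masking: reject anything outside 0..4095. */
int check_vlan_id(uint16_t vid)
{
	if (vid >= VLAN_ID_MAX)
		return ERR_PARAM;
	return 0;
}
```

With a 32-bit int, `drop_upper_bits_broken(0xF123)` returns 0xF123 unchanged, whereas `check_vlan_id(0xF123)` reports the out-of-bounds ID to the caller.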
Lad, Prabhakar [Thu, 5 Feb 2015 13:38:07 +0000 (13:38 +0000)]
xen-netback: fix sparse warning
This patch fixes the following sparse warning:
interface.c:83:5: warning: symbol 'xenvif_poll' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
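The "was not declared. Should it be static?" warning seen in this and the following commits fires when a function or object has external linkage but no prototype is visible. A small sketch of the two usual remedies, using hypothetical names (`poll_once`, `run_poll`) rather than any of the symbols from these drivers:

```c
/*
 * Used only inside this file: marking it static gives it internal
 * linkage, so no declaration in a header is expected.
 */
static int poll_once(int budget)
{
	return budget > 0 ? 1 : 0;
}

/*
 * Meant to be called from other files: keep external linkage and make a
 * prototype visible. In a real project this line would live in a header.
 */
int run_poll(int budget);

int run_poll(int budget)
{
	return poll_once(budget);
}
```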
Lad, Prabhakar [Thu, 5 Feb 2015 16:21:07 +0000 (16:21 +0000)]
net/macb: fix sparse warning
This patch fixes the following sparse warning:
macb.c:2038:26: warning: symbol 'gem_ethtool_ops' was not declared. Should it be static?
It also drops the export of gem_ethtool_ops, as there is no need for it.
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Thu, 5 Feb 2015 15:47:17 +0000 (15:47 +0000)]
net: bnx2x: fix sparse warnings
This patch fixes the following sparse warnings:
bnx2x_main.c:9172:6: warning: symbol 'bnx2x_stop_ptp' was not declared. Should it be static?
bnx2x_main.c:13321:6: warning: symbol 'bnx2x_register_phc' was not declared. Should it be static?
bnx2x_main.c:14638:5: warning: symbol 'bnx2x_enable_ptp_packets' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Thu, 5 Feb 2015 15:34:13 +0000 (15:34 +0000)]
enic: enic_main: fix sparse warnings
This patch fixes the following sparse warnings:
enic_main.c:92:28: warning: symbol 'mod_table' was not declared. Should it be static?
enic_main.c:109:28: warning: symbol 'mod_range' was not declared. Should it be static?
enic_main.c:1306:5: warning: symbol 'enic_busy_poll' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lad, Prabhakar [Thu, 5 Feb 2015 15:06:33 +0000 (15:06 +0000)]
hyperv: fix sparse warnings
This patch fixes the following sparse warnings:
netvsc.c:688:5: warning: symbol 'netvsc_copy_to_send_buf' was not declared. Should it be static?
rndis_filter.c:627:5: warning: symbol 'rndis_filter_set_offload_params' was not declared. Should it be static?
rndis_filter.c:702:5: warning: symbol 'rndis_filter_set_rss_param' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 6 Feb 2015 00:00:06 +0000 (16:00 -0800)]
Merge branch 'tipc-next'
Jon Maloy says:
====================
tipc: resolve message disordering problem
When TIPC receives messages from multi-threaded device drivers it may
occasionally deliver messages to their destination sockets in the wrong
order. This happens despite correct resequencing at the link layer,
because the upcall path from link to socket is not protected by any
locks.
These commits solve this problem by introducing an 'input' message
queue in each link, through which messages must be delivered to the
upper layers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:44 +0000 (08:36 -0500)]
tipc: eliminate race condition at multicast reception
In a previous commit in this series we resolved a race problem during
unicast message reception.
Here, we resolve the same problem at multicast reception. We apply the
same technique: an input queue serializing the delivery of arriving
buffers. The main difference is that here we do it in two steps.
First, the broadcast link feeds arriving buffers into the tail of an
arrival queue, whose head is consumed at the socket level, and where
destination lookup is performed. Second, if the lookup is successful,
the resulting buffer clones are fed into a second queue, the input
queue. This queue is consumed at reception in the socket just like
in the unicast case. Both queues are protected by the same lock, namely
that of the input queue.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:43 +0000 (08:36 -0500)]
tipc: simplify socket multicast reception
The structure 'tipc_port_list' is used to collect port numbers
representing multicast destination socket on a receiving node.
The list is not based on a standard linked list, and is in reality
optimized for the uncommon case that there is more than one
multicast destination per node. This makes the list handling
unnecessarily complex, and as a consequence, even the socket
multicast reception becomes more complex.
In this commit, we replace 'tipc_port_list' with a new 'struct
tipc_plist', which is based on a standard list. We give the new
list stack (push/pop) semantics, something that simplifies
the implementation of the function tipc_sk_mcast_rcv().
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
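The stack (push/pop) semantics described in the tipc_plist commit above can be illustrated with a plain singly linked list. This is a user-space sketch only: `struct plist`, `plist_push()` and `plist_pop()` are hypothetical names, and the kernel version is presumably built on the kernel's own list primitives rather than on malloc().

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a port list node; not the kernel's struct tipc_plist. */
struct plist_node {
	uint32_t port;
	struct plist_node *next;
};

struct plist {
	struct plist_node *head;
};

void plist_init(struct plist *pl)
{
	pl->head = NULL;
}

/* Push a port number; returns 0 on success, -1 if allocation fails. */
int plist_push(struct plist *pl, uint32_t port)
{
	struct plist_node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->port = port;
	n->next = pl->head;
	pl->head = n;
	return 0;
}

/* Pop the most recently pushed port; returns 0 when the list is empty. */
uint32_t plist_pop(struct plist *pl)
{
	struct plist_node *n = pl->head;
	uint32_t port;

	if (!n)
		return 0;
	port = n->port;
	pl->head = n->next;
	free(n);
	return port;
}
```

A receive loop can then simply pop until 0 is returned, delivering one buffer clone per destination port, which is the kind of simplification the commit message refers to.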
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:42 +0000 (08:36 -0500)]
tipc: simplify connection abort notifications when links break
The new input message queue in struct tipc_link can be used for
delivering connection abort messages to subscribing sockets. This
makes it possible to simplify the code for such cases.
This commit removes the temporary list in tipc_node_unlock()
used for transforming abort subscriptions to messages. Instead, the
abort messages are now created at the moment of lost contact, and
then added to the last failed link's generic input queue for delivery
to the sockets concerned.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:41 +0000 (08:36 -0500)]
tipc: resolve race problem at unicast message reception
TIPC handles message cardinality and sequencing at the link layer,
before passing messages upwards to the destination sockets. During the
upcall from link to socket no locks are held. It is therefore possible,
and we see it happen occasionally, that messages arriving in different
threads and delivered in sequence still bypass each other before they
reach the destination socket. This must not happen, since it violates
the sequentiality guarantee.
We solve this by adding a new input buffer queue to the link structure.
Arriving messages are added safely to the tail of that queue by the
link, while the head of the queue is consumed, also safely, by the
receiving socket. Sequentiality is secured per socket by only allowing
buffers to be dequeued inside the socket lock. Since there may be multiple
simultaneous readers of the queue, we use a 'filter' parameter to reduce
the risk that they peek the same buffer from the queue, hence also
reducing the risk of contention on the receiving socket locks.
This solves the sequentiality problem, and seems to cause no measurable
performance degradation.
A nice side effect of this change is that lock handling in the functions
tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that
will enable future simplifications of those functions.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
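The core idea of the unicast fix above, a queue whose tail is filled by the link and whose head is drained by the socket, both under one lock, can be modelled in user space. The sketch below is a simplified assumption: `struct inputq` and its helpers are hypothetical, integers stand in for buffers, and the 'filter' parameter and the socket lock mentioned in the commit are left out.

```c
#include <pthread.h>
#include <stddef.h>

#define INPUTQ_CAP 64

/* Hypothetical per-link input queue; a user-space model, not struct tipc_link. */
struct inputq {
	pthread_mutex_t lock;
	int msgs[INPUTQ_CAP];   /* message sequence numbers stand in for buffers */
	size_t head;            /* next slot to dequeue */
	size_t count;           /* number of queued messages */
};

void inputq_init(struct inputq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = 0;
	q->count = 0;
}

/* Link side: append to the tail under the queue lock. Returns -1 if full. */
int inputq_enqueue(struct inputq *q, int msg)
{
	int rc = -1;

	pthread_mutex_lock(&q->lock);
	if (q->count < INPUTQ_CAP) {
		q->msgs[(q->head + q->count) % INPUTQ_CAP] = msg;
		q->count++;
		rc = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return rc;
}

/* Socket side: take from the head under the same lock. Returns -1 if empty. */
int inputq_dequeue(struct inputq *q, int *msg)
{
	int rc = -1;

	pthread_mutex_lock(&q->lock);
	if (q->count > 0) {
		*msg = q->msgs[q->head];
		q->head = (q->head + 1) % INPUTQ_CAP;
		q->count--;
		rc = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return rc;
}
```

Because every message passes through the same locked FIFO, two threads delivering messages can no longer overtake each other between the link and the socket.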
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:40 +0000 (08:36 -0500)]
tipc: use existing sk_write_queue for outgoing packet chain
The list for outgoing traffic buffers from a socket is currently
allocated on the stack. This forces us to initialize the queue for
each sent message, something costing extra CPU cycles in the most
critical data path. Later in this series we will introduce a new
safe input buffer queue, something that would force us to initialize
even the spinlock of the outgoing queue. A closer analysis reveals
that the queue always is filled and emptied within the same lock_sock()
session. It is therefore safe to use a queue aggregated in the socket
itself for this purpose. Since there already exists a queue for this
in struct sock, sk_write_queue, we introduce use of that queue in
this commit.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:39 +0000 (08:36 -0500)]
tipc: split up function tipc_msg_eval()
The function tipc_msg_eval() is in reality doing two related, but
different tasks. First it tries to find a new destination for named
messages, in case there was no first lookup, or if the first lookup
failed. Second, it does what its name suggests, evaluating the validity
of the message and its destination, and returning an appropriate error
code depending on the result.
This is confusing, and in this commit we choose to break it up into two
functions. A new function, tipc_msg_lookup_dest(), first attempts to find
a new destination, if the message is of the right type. If this lookup
fails, or if the message should not be subject to a second lookup, the
already existing tipc_msg_reverse() is called. This function
prepares the message for rejection, if applicable.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:38 +0000 (08:36 -0500)]
tipc: enqueue arrived buffers in socket in separate function
The code for enqueuing arriving buffers in the function tipc_sk_rcv()
contains long code lines and currently goes to two indentation levels.
As a cosmetic preparation for the next commits, we break it out into
a separate function.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:37 +0000 (08:36 -0500)]
tipc: simplify message forwarding and rejection in socket layer
Despite recent improvements, the handling of error codes and return
values at reception of messages in the socket layer is still confusing.
In this commit, we try to make it more comprehensible. First, we
distinguish between the return values coming from the functions called
by tipc_sk_rcv(), which are TIPC specific error codes, and the
return values returned by tipc_sk_rcv() itself. Second, we don't
use the returned TIPC error code as indication for whether a buffer
should be forwarded/rejected or not; instead we use the buffer pointer
passed along with filter_msg(). This separation is necessary because
we sometimes want to forward messages even when there is no error
(i.e., protocol messages and data messages that were successfully
resolved by a secondary lookup).
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 5 Feb 2015 13:36:36 +0000 (08:36 -0500)]
tipc: reduce usage of context info in socket and link
The most common usage of namespace information is when we fetch the
own node address from the net structure. This leads to a lot of
passing around of a parameter of type 'struct net *' between
functions just to make them able to obtain this address.
However, in many cases this is unnecessary. The own node address
is readily available as a member of both struct tipc_sock and
tipc_link, and can be fetched from there instead.
The fact that the vast majority of functions in socket.c and link.c
anyway are maintaining a pointer to their respective base structures
makes this option even more compelling.
In this commit, we introduce the inline functions tsk_own_node()
and link_own_node() to make it easy for functions to fetch the node
address from those structs instead of having to pass along and
dereference the namespace struct.
In particular, we make calls to the msg_xx() functions in msg.{h,c}
context independent by directly passing them the own node address
as parameter when needed. Those functions should be regarded as
leaves in the code dependency tree, and it is hence desirable to
keep them namespace unaware.
Apart from a potential positive effect on cache behavior, these
changes make it easier to introduce the changes that will follow
later in this series.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Thu, 5 Feb 2015 10:15:24 +0000 (11:15 +0100)]
hso: Use static attribute groups for sysfs entry
Pass the static attribute groups and the driver data via
tty_port_register_device_attr() instead of manual device_create_file()
and device_remove_file() calls.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endianness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
1) Stretch ACKs can kill performance with Reno and CUBIC congestion
control, largely due to LRO and GRO. Fix from Neal Cardwell.
2) Fix userland breakage because we accidentally emit zero length netlink
messages from the bridging code. From Roopa Prabhu.
3) Carry handling in generic csum_tcpudp_nofold is broken, fix from
Karl Beldan.
4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas
Dichtel.
5) Make sure PPP deflation never returns a length greater than the
output buffer, otherwise we overflow and trigger skb_over_panic().
Fix from Florian Westphal.
6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd
Bergmann.
7) Don't increase route cached MTU on datagram too big ICMPs. From Li
Wei.
8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso.
9) Fix bitmask handling regression in netlink that broke things like
acpi userland tools. From Pablo Neira Ayuso.
10) Wrong header pointer passed to param_type2af() in SCTP code, from
Saran Maruti Ramanara.
11) Stacked vlans not handled correctly by vlan_get_protocol(), from
Toshiaki Makita.
12) Add missing DMA memory barrier to xgene driver, from Iyappan
Subramanian.
13) Fix crash in rate estimators, from Eric Dumazet.
14) We've been adding various workarounds, one after another, for the
change which added the per-net tcp_sock. It was meant to reduce
socket contention but added lots of problems.
Reduce this instead to a proper per-cpu socket and that rids us of
all the daemons.
From Eric Dumazet.
15) Fix memory corruption and OOPS in mlx4 driver, from Jack
Morgenstein.
16) When we disabled UFO in the virtio_net device, it introduced some
serious performance regressions. The original problem was IPV6
fragment ID generation, so fix that properly instead. From Vlad
Yasevich.
17) sr9700 driver build breaks on xtensa because it defines macros with
the same name as those used by the arch code. Use more unique
names. From Chen Gang.
18) Fix endianness in new virtio 1.0 mode of the vhost net driver, from
Michael S Tsirkin.
19) Several sysctls were setting the maxlen attribute incorrectly, from
Sasha Levin.
20) Don't accept an FQ scheduler quantum of zero, that leads to crashes.
From Kenneth Klette Jonassen.
21) Fix dumping of non-existing actions in the packet scheduler
classifier. From Ignacy Gawędzki.
22) Return the right work_done value when doing TX work in the qlcnic
driver.
23) ip6gre_err accesses the info field with the wrong endianness, from
Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
sit: fix some __be16/u16 mismatches
ipv6: fix sparse errors in ip6_make_flowlabel()
net: remove some sparse warnings
flow_keys: n_proto type should be __be16
ip6_gre: fix endianness errors in ip6gre_err
qlcnic: Fix NAPI poll routine for Tx completion
amd-xgbe: Set RSS enablement based on hardware features
amd-xgbe: Adjust for zero-based traffic class count
cls_api.c: Fix dumping of non-existing actions' stats.
pkt_sched: fq: avoid hang when quantum 0
net: rds: use correct size for max unacked packets and bytes
vhost/net: fix up num_buffers endian-ness
gianfar: correct the bad expression while writing bit-pattern
net: usb: sr9700: Use 'SR_' prefix for the common register macros
Revert "drivers/net: Disable UFO through virtio"
Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
ipv6: Select fragment id during UFO segmentation if not set.
xen-netback: stop the guest rx thread after a fatal error
net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
isdn: off by one in connect_res()
...
Linus Torvalds [Thu, 5 Feb 2015 19:17:15 +0000 (11:17 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This patch set is fixing two serious problems which have turned up
late in the release cycle.
The first fixes a problem with 4k sector disks where the transfer
length (amount of data sent to the disk) was getting increased every
time the disk was revalidated leading to potential for overflows.
The other is a regression oops fix for some of our last merge window
code"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sd: Fix max transfer length for 4k disks
scsi: fix device handler detach oops
Linus Torvalds [Thu, 5 Feb 2015 19:11:44 +0000 (11:11 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Radeon and amdkfd fixes.
Radeon ones mostly for oops in some test/benchmark functions since
fencing changes, and one regression fix for old GPUs.
There is one cirrus regression fix, the 32bpp broke userspace, so this
hides it behind a module option for the few users who care.
I'm off for a few days, so this is probably the final pull I have, if
I see fixes from Intel I'll forward the pull as I should have email"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/cirrus: Limit modes depending on bpp option
drm/radeon: fix the crash in test functions
drm/radeon: fix the crash in benchmark functions
drm/radeon: properly set vm fragment size for TN/RL
drm/radeon: don't init gpuvm if accel is disabled (v3)
drm/radeon: fix PLLs on RS880 and older v2
drm/amdkfd: Don't create BUG due to incorrect user parameter
drm/amdkfd: max num of queues can't be 0
drm/amdkfd: Fix bug in accounting of queues
Linus Torvalds [Thu, 5 Feb 2015 19:07:25 +0000 (11:07 -0800)]
Merge tag 'spi-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A couple of driver specific fixes:
- Disable DMA mode for i.MX6DL chips due to a hardware bug.
- Don't use devm_kzalloc() outside of bind/unbind paths in the
fsl-dspi driver, fixing memory leaks"
* tag 'spi-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: imx: use pio mode for i.mx6dl
spi: spi-fsl-dspi: Remove usage of devm_kzalloc
Linus Torvalds [Thu, 5 Feb 2015 18:57:29 +0000 (10:57 -0800)]
Merge tag 'pm+acpi-3.19-fin' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI power management fix from Rafael Wysocki:
"This is a revert of an ACPI Low-power Subsystem (LPSS) driver change
that was supposed to improve power management of the LPSS DMA
controller, but introduced more serious problems.
Since fixing them turns out to be non-trivial, it is better to revert
the commit in question at this point and try to fix the original issue
differently in the next cycle"
* tag 'pm+acpi-3.19-fin' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPI / LPSS: introduce a 'proxy' device to power on LPSS for DMA"
Eric Dumazet [Wed, 4 Feb 2015 23:12:04 +0000 (15:12 -0800)]
sit: fix some __be16/u16 mismatches
Fixes the following sparse warnings:
net/ipv6/sit.c:1509:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1509:32: expected restricted __be16 [usertype] sport
net/ipv6/sit.c:1509:32: got unsigned short
net/ipv6/sit.c:1514:32: warning: incorrect type in assignment (different base types)
net/ipv6/sit.c:1514:32: expected restricted __be16 [usertype] dport
net/ipv6/sit.c:1514:32: got unsigned short
net/ipv6/sit.c:1711:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1711:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1711:38: got restricted __be16 [usertype] sport
net/ipv6/sit.c:1713:38: warning: incorrect type in argument 3 (different base types)
net/ipv6/sit.c:1713:38: expected unsigned short [unsigned] [usertype] value
net/ipv6/sit.c:1713:38: got restricted __be16 [usertype] dport
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Feb 2015 23:03:25 +0000 (15:03 -0800)]
ipv6: fix sparse errors in ip6_make_flowlabel()
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash
include/net/ipv6.h:713:22: got unsigned int
include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
include/net/ipv6.h:719:22: warning: invalid assignment: ^=
include/net/ipv6.h:719:22: left side has type restricted __be32
include/net/ipv6.h:719:22: right side has type unsigned int
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 4 Feb 2015 21:31:54 +0000 (13:31 -0800)]
flow_keys: n_proto type should be __be16
(struct flow_keys)->n_proto is in network order, use
proper type for this.
Fixes the following sparse errors:
net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto
net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto
net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: e0f31d849867 ("flow_keys: Record IP layer protocol in skb_flow_dissect()") Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Wed, 4 Feb 2015 14:56:58 +0000 (15:56 +0100)]
net: ep93xx_eth: Delete unnecessary checks before the function call "kfree"
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: David S. Miller <davem@davemloft.net>
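The same reasoning as in the ep93xx_eth cleanup above applies to free() in user space, which the C standard defines as a no-op for a null pointer. A minimal sketch with a hypothetical `struct ring`; free() stands in for kfree() here:

```c
#include <stdlib.h>

struct ring {
	int *descs;   /* may be NULL if allocation never happened */
};

/* Redundant guard: free(NULL) is defined to do nothing. */
void ring_release_guarded(struct ring *r)
{
	if (r->descs)
		free(r->descs);
	r->descs = NULL;
}

/* Equivalent and shorter: call free() unconditionally. */
void ring_release(struct ring *r)
{
	free(r->descs);
	r->descs = NULL;
}
```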
Sabrina Dubroca [Wed, 4 Feb 2015 14:25:09 +0000 (15:25 +0100)]
ip6_gre: fix endianness errors in ip6gre_err
info is in network byte order, change it back to host byte order
before use. In particular, the current code sets the MTU of the tunnel
to a wrong (too big) value.
Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
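The effect of the ip6_gre fix above can be sketched in user space with ntohl(). The helper name `mtu_from_wire` is hypothetical; only the byte-order conversion itself is taken from the commit text.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* The MTU arrives in network byte order, as it appears on the wire. */
uint32_t mtu_from_wire(uint32_t info_be)
{
	return ntohl(info_be);
}
```

On a little-endian host, an MTU of 1280 (0x00000500) read without the conversion would be interpreted as 0x00050000, i.e. 327680, which matches the "too big" value the commit message describes.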
Takashi Iwai [Wed, 4 Feb 2015 13:38:55 +0000 (14:38 +0100)]
xen-netfront: Use static attribute groups for sysfs entries
Instead of manual calls of device_create_file() and
device_remove_file(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Takashi Iwai [Wed, 4 Feb 2015 13:37:34 +0000 (14:37 +0100)]
tun: Use static attribute groups for sysfs entries
Instead of manual calls of device_create_file() and
device_remove_file(), assign the static attribute groups to netdev
groups array. This simplifies the code and avoids the possible
races.
Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
This is a pull request of 2 patches for net-next/master.
Nicholas Mc Guire contributes a patch for the janz-ican3 driver to fix
a mismatch in an assignment. Ahmed S. Darwish contributes a patch for
the kvaser_usb driver, to make the driver more robust during the
bus-off handling.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shahed Shaikh [Wed, 4 Feb 2015 10:41:25 +0000 (05:41 -0500)]
qlcnic: Fix NAPI poll routine for Tx completion
After d75b1ade567f ("net: less interrupt masking in NAPI")
driver's NAPI poll routine is expected to return
exact budget value if it wants to be re-called.
Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Fixes: d75b1ade567f ("net: less interrupt masking in NAPI") Signed-off-by: David S. Miller <davem@davemloft.net>
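The return-value contract described in the qlcnic fix above can be sketched as follows. This is a simplified model with a hypothetical `struct tx_ring` and `tx_poll()`; it is not the driver's poll routine and leaves out the NAPI completion call itself.

```c
/* Hypothetical Tx ring state for this sketch. */
struct tx_ring {
	int pending;   /* completions still waiting to be processed */
};

/*
 * Process at most 'budget' completions. Returning exactly 'budget'
 * signals that the routine wants to be polled again; a smaller value
 * means all outstanding work was finished in this call.
 */
int tx_poll(struct tx_ring *ring, int budget)
{
	int work_done = ring->pending < budget ? ring->pending : budget;

	ring->pending -= work_done;
	if (ring->pending > 0)
		return budget;   /* work left over: ask to be re-polled */
	return work_done;
}
```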
cxgb_busy_poll, corresponding to ndo_busy_poll, gets called by the socket
waiting for data.
With busy_poll enabled, improvement is seen in latency numbers as observed by
collecting netperf TCP_RR numbers.
Below are latency numbers, with and without busy-poll, in a switched environment
for a particular msg size:
netperf command: netperf -4 -H <ip> -l 30 -t TCP_RR -- -r1,1
Latency without busy-poll: ~16.25 us
Latency with busy-poll : ~08.79 us
Based on original work by Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 5 Feb 2015 05:30:40 +0000 (21:30 -0800)]
pkt_sched: fq: better control of DDOS traffic
FQ has a fast path for skb attached to a socket, as it does not
have to compute a flow hash. But for other packets, FQ being non
stochastic means that hosts exposed to random Internet traffic
can allocate millions of flow structures (104 bytes each) pretty
easily. Not only can the host OOM, but lookups in RB trees can take
too much cpu and memory resources.
This patch adds a new attribute, orphan_mask, that adds the
possibility of having a stochastic hash for orphaned skb.
Its default value is 1024 slots, to mimic SFQ behavior.
Note: This does not apply to locally generated TCP traffic,
and no locally generated traffic will share a flow structure
with another perfect or stochastic flow.
This patch also handles the specific case of SYNACK messages:
They are attached to the listener socket, and therefore all map
to a single hash bucket. If the listener has set SO_MAX_PACING_RATE,
hoping to have newly accepted sockets inherit this rate, SYNACK
might be paced and even dropped.
This is very similar to an internal patch Google has used for more
than one year.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
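The stochastic hashing of orphaned packets in the fq commit above can be pictured as folding a flow hash into a fixed number of slots. The sketch below assumes the slot count is a power of two so that a mask can be used; `ORPHAN_SLOTS`, `ORPHAN_MASK` and `orphan_slot()` are illustrative names, and how sch_fq derives the hash is not shown.

```c
#include <stdint.h>

#define ORPHAN_SLOTS 1024u                  /* default slot count from the commit text */
#define ORPHAN_MASK  (ORPHAN_SLOTS - 1u)    /* assumes a power-of-two slot count */

/* Fold an arbitrary flow hash into one of the stochastic slots. */
uint32_t orphan_slot(uint32_t flow_hash)
{
	return flow_hash & ORPHAN_MASK;
}
```

However many distinct flows random Internet traffic presents, they can occupy at most ORPHAN_SLOTS buckets, which bounds the memory spent on orphaned packets.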
Eric Dumazet [Wed, 4 Feb 2015 02:31:53 +0000 (18:31 -0800)]
tcp: do not pace pure ack packets
When we added pacing to TCP, we decided to let sch_fq take care
of actual pacing.
All TCP had to do was to compute sk->pacing_rate using a simple formula:
sk->pacing_rate = 2 * cwnd * mss / rtt
It works well for senders (bulk flows), but not very well for receivers
or even RPC:
cwnd on the receiver can be less than 10, rtt can be around 100ms, so we
can end up pacing ACK packets, slowing down the sender.
Really, only the sender should pace, according to its own logic.
Instead of adding a new bit in skb, or calling yet another flow
dissection, we tweak skb->truesize to a small value (2), and
we instruct sch_fq to use a new helper and not pace pure acks.
Note this also helps TCP small queues, as ack packets present
in qdisc/NIC do not prevent sending a data packet (RPC workload).
This helps to reduce tx completion overhead: ack packets can use regular
sock_wfree() instead of tcp_wfree() which is a bit more expensive.
This has no impact when packets are sent to the loopback interface,
as we do not coalesce ack packets (where we would detect the skb->truesize
lie).
In case netem (with a delay) is used, skb_orphan_partial() also sets
skb->truesize to 1.
This patch is a combination of two patches we used for about one year at
Google.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
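A worked instance of the pacing formula quoted in the patch above,
sk->pacing_rate = 2 * cwnd * mss / rtt, may make the receiver-side problem
concrete. This is only a sketch: the function name, the microsecond unit for
rtt and the 1460-byte mss are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pacing rate in bytes per second for a given congestion window (packets),
 * segment size (bytes) and round-trip time (microseconds). */
static uint64_t pacing_rate(uint32_t cwnd, uint32_t mss, uint32_t rtt_us)
{
    assert(rtt_us > 0);
    return 2ULL * cwnd * mss * 1000000ULL / rtt_us;
}
```

For cwnd = 10, mss = 1460 and rtt = 100 ms this gives 292000 bytes per second,
which is low enough that a receiver sending many small ACKs could end up
delayed by the pacer.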
David S. Miller [Thu, 5 Feb 2015 04:35:05 +0000 (20:35 -0800)]
Merge branch 'rhashtable-next'
Herbert Xu says:
====================
rhashtable: Add iterators and use them
The first patch fixes a potential crash with nft_hash destroying
the table during a shrinking process. The next patch adds
rhashtable iterators to replace current manual walks used by
netlink and netfilter. The final two patches make use of these
iterators in netlink and netfilter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 3 Feb 2015 20:33:25 +0000 (07:33 +1100)]
netfilter: Use rhashtable walk iterator
This patch gets rid of the manual rhashtable walk in nft_hash
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.
Note that I'm leaving nft_hash_destroy alone since it's only
invoked on shutdown and it shouldn't be affected by changes
to rhashtable internals (or at least not what I'm planning to
change).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 3 Feb 2015 20:33:24 +0000 (07:33 +1100)]
netlink: Use rhashtable walk iterator
This patch gets rid of the manual rhashtable walk in netlink
which touches rhashtable internals that should not be exposed.
It does so by using the rhashtable iterator primitives.
In fact the existing code was very buggy. Some sockets weren't
shown at all while others were shown more than once.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Tue, 3 Feb 2015 20:33:23 +0000 (07:33 +1100)]
rhashtable: Introduce rhashtable_walk_*
Some existing rhashtable users get too intimate with it by walking
the buckets directly. This prevents us from easily changing the
internals of rhashtable.
This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.
They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump. The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
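The init/start/next/stop/exit shape described in the patch above can be
sketched in user space with a toy chained hash table. Everything below
(the toy_* names, the fixed bucket count, the active flag standing in for
locking) is hypothetical and is not the rhashtable implementation; it only
shows how callers can walk a table without touching its buckets.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BUCKETS 4

struct toy_node {
    int value;
    struct toy_node *next;
};

struct toy_table {
    struct toy_node *buckets[TOY_BUCKETS];
};

/* Iterator state is small and fixed-size, so the caller can own it. */
struct toy_iter {
    const struct toy_table *tbl;
    size_t bucket;              /* bucket currently being walked */
    const struct toy_node *pos; /* next node to hand out, or NULL */
    int active;                 /* nonzero between start and stop */
};

static void toy_walk_init(struct toy_iter *it, const struct toy_table *tbl)
{
    it->tbl = tbl;
    it->bucket = 0;
    it->pos = tbl->buckets[0];
    it->active = 0;
}

static void toy_walk_start(struct toy_iter *it) { it->active = 1; }
static void toy_walk_stop(struct toy_iter *it)  { it->active = 0; }
static void toy_walk_exit(struct toy_iter *it)  { it->tbl = NULL; }

/* Return the next node, or NULL once every bucket has been visited. */
static const struct toy_node *toy_walk_next(struct toy_iter *it)
{
    const struct toy_node *n;

    assert(it->active);
    while (it->pos == NULL) {
        if (it->bucket + 1 >= TOY_BUCKETS)
            return NULL;
        it->pos = it->tbl->buckets[++it->bucket];
    }
    n = it->pos;
    it->pos = n->next;
    return n;
}

/* Callers only see the walk helpers, never the bucket array. */
static int toy_sum_demo(void)
{
    struct toy_node a = { 1, NULL }, b = { 2, &a }, c = { 3, NULL };
    struct toy_table tbl = { { &b, NULL, &c, NULL } };
    struct toy_iter it;
    const struct toy_node *n;
    int sum = 0;

    toy_walk_init(&it, &tbl);
    toy_walk_start(&it);
    while ((n = toy_walk_next(&it)) != NULL)
        sum += n->value;
    toy_walk_stop(&it);
    toy_walk_exit(&it);
    return sum;
}
```

Because the walk state lives in a caller-owned structure, the table layout
can change without touching any of the callers.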
Herbert Xu [Tue, 3 Feb 2015 20:33:22 +0000 (07:33 +1100)]
rhashtable: Fix potential crash on destroy in rhashtable_shrink
The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.
This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Set RSS enablement based on hardware features
The RSS support requires enablement based on the features reported by
the hardware. The setting of this flag is missing. Add support to
set the RSS enablement flag based on the reported hardware features.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring [Tue, 3 Feb 2015 19:12:25 +0000 (20:12 +0100)]
NetCP: Deletion of unnecessary checks before two function calls
The functions cpsw_ale_destroy() and of_dev_put() test whether their argument
is NULL and then return immediately. Thus the test around the call
is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>
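The pattern behind the cleanup above can be shown with a small user-space
analogue. The widget type and its helpers are hypothetical; the point is only
that a destroy function which returns early on NULL makes a guard at the call
site redundant.

```c
#include <assert.h>
#include <stdlib.h>

struct widget {
    int id;
};

static int widgets_alive;

static struct widget *widget_create(int id)
{
    struct widget *w = malloc(sizeof(*w));

    if (w) {
        w->id = id;
        widgets_alive++;
    }
    return w;
}

/* Like free(), returns immediately on NULL, so callers need no guard. */
static void widget_destroy(struct widget *w)
{
    if (!w)
        return;
    assert(widgets_alive > 0);
    widgets_alive--;
    free(w);
}

/* Returns the number of widgets still alive after teardown. */
static int teardown_demo(void)
{
    struct widget *present = widget_create(1);
    struct widget *absent = NULL;

    /* No "if (ptr)" test is needed around either call. */
    widget_destroy(present);
    widget_destroy(absent);
    return widgets_alive;
}
```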
Ignacy Gawędzki [Tue, 3 Feb 2015 18:05:18 +0000 (19:05 +0100)]
cls_api.c: Fix dumping of non-existing actions' stats.
In tcf_exts_dump_stats(), ensure that exts->actions is not empty before
accessing the first element of that list and calling tcf_action_copy_stats()
on it. This fixes some random segvs when adding filters of type "basic" with
no particular action.
This also fixes the dumping of those "no-action" filters, which more often
than not made calls to tcf_action_copy_stats() fail and consequently netlink
attributes added by the caller to be removed by a call to nla_nest_cancel().
Fixes: 33be62715991 ("net_sched: act: use standard struct list_head") Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
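Why the emptiness check in the fix above matters can be sketched with a
minimal circular list in user space. The types and helpers below are
simplified stand-ins, not the kernel list API, and the -1 return convention is
an assumption made for the example: on an empty list the "first element" is
the list head itself, so treating it as an action reads unrelated memory.

```c
#include <assert.h>
#include <stddef.h>

struct list_node {
    struct list_node *next, *prev;
};

struct action {
    int packets;
    struct list_node link;
};

static void list_init(struct list_node *h) { h->next = h->prev = h; }
static int list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Recover the enclosing struct action from its embedded link. */
static struct action *action_of(struct list_node *n)
{
    return (struct action *)((char *)n - offsetof(struct action, link));
}

/* Returns 0 on success, -1 when there is no action to report. */
static int dump_first_stats(struct list_node *actions, int *packets)
{
    assert(packets != NULL);
    /* On an empty list, actions->next is the head, not an action. */
    if (list_is_empty(actions))
        return -1;
    *packets = action_of(actions->next)->packets;
    return 0;
}

static int dump_demo(void)
{
    struct list_node actions;
    struct action a = { .packets = 7 };
    int packets = 0;

    list_init(&actions);
    if (dump_first_stats(&actions, &packets) != -1)
        return -1;
    list_add_tail(&a.link, &actions);
    if (dump_first_stats(&actions, &packets) != 0)
        return -2;
    return packets;
}
```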
Configuring fq with quantum 0 hangs the system, presumably because of a
non-interruptible infinite loop. Either way quantum 0 does not make sense.
Reproduce with:
sudo tc qdisc add dev lo root fq quantum 0 initial_quantum 0
ping 127.0.0.1
Signed-off-by: Kenneth Klette Jonassen <kennetkl@ifi.uio.no> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
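One plausible reading of the hang described above, sketched under the
assumption that fq grants each flow quantum bytes of credit per round in a
deficit-round-robin fashion: with quantum 0 the credit never grows, so a
packet never becomes eligible. The function names and the bounded loop are
hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical validation: reject a quantum that can never add credit. */
static int check_quantum(unsigned int quantum)
{
    return quantum > 0 ? 0 : -EINVAL;
}

/* Rounds a flow needs before a packet of pkt_len bytes may be sent;
 * returns -1 if max_rounds rounds are not enough. */
static int rounds_until_send(unsigned int quantum, unsigned int pkt_len,
                             int max_rounds)
{
    unsigned int credit = 0;
    int round;

    assert(max_rounds > 0);
    for (round = 1; round <= max_rounds; round++) {
        credit += quantum;
        if (credit >= pkt_len)
            return round;
    }
    return -1;
}
```

With a quantum of 1514 a 3000-byte packet becomes eligible after two rounds,
while a quantum of 0 exhausts any round limit, which is the unbounded loop a
validation step like check_quantum() is meant to rule out.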
Takashi Iwai [Tue, 3 Feb 2015 16:51:23 +0000 (17:51 +0100)]
drm/cirrus: Limit modes depending on bpp option
The commit [8975626ea35a: drm/cirrus: allow 32bpp framebuffers for
cirrus drm] broke the X modesetting driver because the cirrus driver still
provides the full list of modes up to 1280x1024 while 32bpp can
support only up to 800x600.
We might be able to filter out the invalid modes in the mode_valid
callback, but unfortunately the bpp in question can't be referenced
there for now (let me know if there is a better way to retrieve the
bpp for the probed fb).
So, instead, this patch adds the bpp module option to specify the
maximal bpp explicitly and limits the resolutions in get_modes
depending on its value.
The default value is set to 24 so that the existing stuff keeps
working. If you need the new 32bpp feature, specify the cirrus.bpp=32
option explicitly.
Fixes: 8975626ea35a ('drm/cirrus: allow 32bpp framebuffers for cirrus drm') Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
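The mode limiting described above can be sketched as a simple lookup. This
follows the numbers in the message (1280x1024 for the full list, 800x600 for
32bpp); treating every bpp up to 24 as allowing the full list is an
assumption, and the names below are hypothetical rather than taken from the
driver.

```c
#include <assert.h>

struct mode_limit {
    int max_width;
    int max_height;
};

/* Largest resolution offered for a given maximal bpp. */
static struct mode_limit limit_for_bpp(int bpp)
{
    assert(bpp > 0);
    if (bpp <= 24)
        return (struct mode_limit){ 1280, 1024 };
    return (struct mode_limit){ 800, 600 };
}

/* Nonzero if a mode fits within the limit for the given bpp. */
static int mode_allowed(int width, int height, int bpp)
{
    struct mode_limit lim = limit_for_bpp(bpp);

    return width <= lim.max_width && height <= lim.max_height;
}
```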
This patchset introduces some small bug fixes and code cleanups in mlx4_core,
mlx4_en and mlx5_core.
I am sending it in parallel to the patchset sent by Or Gerlitz today [1] because
this is the end of the time frame for 3.20. I also checked that there are no
conflicts between those two patchsets (Or's patchset is focused on the bonding
area while this one is focused on Mellanox drivers).
The patchset was applied on top of commit 7d37d0c ('net: sctp: Deletion of an
unnecessary check before the function call "kfree"')
[1] - [PATCH 00/10] Add HA and LAG support to mlx4 RoCE and SRIOV services
http://marc.info/?l=linux-netdev&m=142297582610254&w=2
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Shamay [Tue, 3 Feb 2015 15:57:20 +0000 (17:57 +0200)]
net/mlx4_en: Adjust RX frag strides to frag sizes
This patch improves memory utilization and therefore the packet rate
for special MTUs. Instead of setting the frag_stride to the maximal
hard coded frag_size, use the actual frag_size that is set according to
the MTU, when setting the stride of the last frag.
So, for example, for MTU 1600, where the frag_size of the 2nd frag is
86, the frag_stride is set to 128 instead of 4096. See below:
Ido Shamay [Tue, 3 Feb 2015 15:57:19 +0000 (17:57 +0200)]
net/mlx4_en: Print page allocator information
After initialization of page_alloc, print the actual allocated page
size and the number of frags it contains. Printing is done only when the
drv message level is set on the interface.
Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Tue, 3 Feb 2015 15:57:17 +0000 (17:57 +0200)]
net/mlx4_core: Fix misleading debug print on CQE stride support
We do support cache line sizes of 32 and 64 bytes without activating the
CQE stride feature. Fix a misleading print saying that these cache line
sizes aren't supported.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Tue, 3 Feb 2015 15:57:15 +0000 (17:57 +0200)]
net/mlx4_core: Fix mpt_entry initialization in mlx4_mr_rereg_mem_write()
a) Previously, mlx4_mr_rereg_mem_write() filled the MPT's start
and length with the old MPT's values.
Fixing the initialization to take the new start and length.
b) In addition access flags in mpt_status were initialized instead of
status due to bad boolean operation. Fixing the operation.
c) Initialization of pd_slave caused a protection error.
Fix - removing this initialization.
d) In resource_tracker.c: Fixing vf encoding to be one-based.
Fixes: e630664c ('mlx4_core: Add helper functions to support MR re-registration') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 5 Feb 2015 00:14:29 +0000 (16:14 -0800)]
Merge branch 'mlx4-next'
Or Gerlitz says:
====================
Add HA and LAG support to mlx4 RoCE and SRIOV services
This series takes advantage of bonding mlx4 Ethernet devices to support
a model of High-Availability and Link Aggregation for more environments.
The mlx4 driver reacts on netdev events generated by bonding when
slave state changes happen by programming a HW V2P (Virt-to-Phys)
port table. Bonding was extended to expose these state changes
through netdev events.
When an mlx4 interface such as the mlx4 IB/RoCE driver is subject to
this policy, QPs are created over virtual ports which are mapped
to one of the two physical ports. When a failure happens, the
re-programming of the V2P table allows traffic to keep flowing.
The mlx4 Ethernet driver interfaces are not subject to this
policy and act as usual.
A 2nd use-case for this model would be to add HA and Link Aggregation
support to single-ported mlx4 Ethernet VFs. In this case, the PF Ethernet
interfaces are bonded, all the VFs see single port devices (which is
supported already today), and VF QPs are subject to V2P.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Tue, 3 Feb 2015 14:48:39 +0000 (16:48 +0200)]
IB/mlx4: Load balance ports in port aggregation mode
When the mlx4 IB (RoCE) device works in link aggregation mode, it
exposes a single port to upper layers. Therefore, applications always
set '1' in port_num attribute when modifying a QP or creating an address handle.
To make sure that a node uses all available ports the mlx4 driver will
override the port_num attribute with a round robin policy.
Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Tue, 3 Feb 2015 14:48:38 +0000 (16:48 +0200)]
IB/mlx4: Create mirror flows in port aggregation mode
In port aggregation mode, flows for port #1 (the only port) should be mirrored
on port #2. This is because packets can arrive from either physical port.
Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Tue, 3 Feb 2015 14:48:33 +0000 (16:48 +0200)]
net/mlx4_core: Port aggregation upper layer interface
Supply interface functions to bond and unbond ports of mlx4 internal
interfaces. An example of such an interface is the one registered by the
mlx4 IB driver under RoCE.
There are
1. Functions to go in/out to/from bonded mode
2. Function to remap virtual ports to physical ports
The bond_mutex prevents simultaneous access to data that keeps the status of
the device in bonded mode.
An upper mlx4 interface signals to the mlx4 core module that it
wants to be subject to such bonding by setting the MLX4_INTFF_BONDING
flag. An interface which goes to/from bonded mode is re-created.
The mlx4 Ethernet driver does not set this flag when registering the
interface; the IB driver does.
Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Moni Shoua [Tue, 3 Feb 2015 14:48:29 +0000 (16:48 +0200)]
net/core: Add event for a change in slave state
Add an event which provides an indication of a change in the state
of a bonding slave. The event handler should cast the pointer to the
appropriate type (struct netdev_bonding_info) in order to get the
full info about the slave.
Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 5 Feb 2015 00:09:38 +0000 (16:09 -0800)]
Merge branch 'tipc-next'
Jon Maloy says:
====================
tipc: some small fixes
During extensive testing and analysis of running dual links between
nodes, we have encountered some issues that potentially may cause
problems. We choose to fix those proactively in this series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Tue, 3 Feb 2015 13:59:20 +0000 (08:59 -0500)]
tipc: separate link starting event from link timeout event
When a new link instance is created, it is triggered to start by
sending it a TIPC_STARTING_EVT, whereafter a regular link
reset is applied to it.
The starting event is codewise treated as a timeout event, and prompts
a link RESET message to be sent to the peer node, carrying a link
session identifier. The later link_reset() call nudges this session
identifier, whereafter all subsequent RESET messages will be sent out
with the new identifier. The latter session number overrides the former,
causing the peer to unconditionally accept it irrespective of its
current working state.
We don't think that this causes any problem, but it is not in accordance
with the protocol spec, and may cause confusion when debugging TIPC
sessions.
To avoid this, we make the starting event distinct from the
subsequent timeout events, by not allowing the former to send
out any RESET message. This eliminates the described problem.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>