Don Skidmore [Fri, 7 Nov 2014 03:53:35 +0000 (03:53 +0000)]
ixgbe: Add new support for X550 MAC's
This patch will add in the new MAC defines and fit it into the switch
cases throughout the driver. New functionality and enablement support will
be added in following patches.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 5 Nov 2014 04:52:09 +0000 (04:52 +0000)]
ixgbe: cleanup move setting PFQDE.HIDE_VLAN to support function.
Move setting of drop enable to support function. This not only makes the
code more readable but is also prep for following patches that add
additional MAC support.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 29 Oct 2014 07:23:41 +0000 (07:23 +0000)]
ixgbe: fix X540 Completion timeout
On topologies including few levels of PCIe switching X540 can run into an
unexpected completion error. We get around this by waiting after enabling
loopback a sufficient amount of time until Tx Data Fetch is sent. We then
poll the pending transaction bit to ensure we received the completion. Only
then do we go on to clear the buffers.
Signed-of-by: Don Skidmore <donald.c.skidmore@intel.com> Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Sat, 25 Oct 2014 03:24:34 +0000 (03:24 +0000)]
i40evf: don't use more queues than CPUs
It's kind of silly to configure and attempt to use a bunch of queue
pairs when you're running on a single (virtual) CPU. Instead of
unconditionally configuring all of the queues that the PF gives us,
clamp the number of queue pairs to the number of CPUs.
Change-ID: I321714c9e15072ee76de8f95ab9a81f86ed347d1 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Sat, 25 Oct 2014 03:24:33 +0000 (03:24 +0000)]
i40evf: make early init processing more robust
In early init, if we get an unexpected message from the PF (such as link
status), we just kick an error back to the init task, causing it to
restart its state machine and delaying initialization.
Make the early init AQ message receive code more robust by handling
messages in a loop, and ignoring those that we aren't interested in.
This also gets rid of some scary log messages that really didn't
indicate a problem.
Change-ID: I620e8c72e49c49c665ef33eeab2425dd10e721cf Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Sat, 25 Oct 2014 03:24:32 +0000 (03:24 +0000)]
i40e: clean up throttle rate code
The interrupt throttle rate minimum is actually 2us, so
fix that define and while we are there, remove some unused defines.
Change some strings in the function to be a bit less wrappy, and
express the correct limits.
Change-ID: I96829bbc77935e0b57c6f0fc1439fb4152b2960a Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Sat, 25 Oct 2014 10:35:25 +0000 (10:35 +0000)]
i40e: don't do link_status or stats collection on every ARQ
The ARQ events cause a service_task execution, and we do a link_status
check and full stats gathering for each service_task. However, when
there are a lot of ARQ events, such as when doing an NVM update, we end up
doing 10's if not 100's of these per second, thereby heavily abusing the
PCI bus and especially the Firmware. This patch adds a check to keep the
service_task from running these periodic tasks more than once per second,
while still allowing quick action to service the events.
Change-ID: Iec7670c37bfae9791c43fec26df48aea7f70b33e Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Signed-off-by: Patrick Lu <patrick.lu@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Kamil Krawczyk [Sat, 25 Oct 2014 03:24:30 +0000 (03:24 +0000)]
i40e: poll firmware slower
The code was polling the firmware tail register for completion every
10 microseconds, which is way faster than the firmware can respond.
This changes the poll interval to 1ms, which reduces polling CPU
utilization, and the number of times we loop.
The maximum delay is still 100ms.
Change-ID: I4bbfa6b66d802890baf8b4154061e55942b90958 Signed-off-by: Kamil Krawczyk <kamil.krawczyk@intel.com> Acked-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <jamesx.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Eric Dumazet [Mon, 10 Nov 2014 22:07:20 +0000 (14:07 -0800)]
mlx4: restore conditional call to napi_complete_done()
After commit 1a28817282 ("mlx4: use napi_complete_done()") we ended up
calling napi_complete_done() in the case NAPI poll consumed all its
budget.
This added extra interrupt pressure, this patch restores proper
behavior.
Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1a28817282 ("mlx4: use napi_complete_done()") Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series contains fixes for race-conditions in sunvnet,
that can encountered when there is a difference in latency between
producer and consumer.
Patch 1 addresses a case when the STOPPED LDC ack from a peer is
processed before vnet_start_xmit can finish updating the dr->prod
state.
Patch 2 fixes the edge-case when outgoing data and incoming
stopped-ack cross each other in flight.
Patch 3 adds a missing rcu_read_unlock(), found by code-inspection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sunvnet: vnet_ack() should check if !start_cons to send a missed trigger
As per comments in vnet_start_xmit, for the edge case
when outgoing vnet_start_xmit() data and an incoming STOPPED
ACK cross each other in flight, we may need to send the missed
START trigger from maybe_tx_wakeup() after checking for a
false value of start_cons
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
__vnet_tx_trigger for some desc X
at this point dr->prod == X
peer sends back a stopped ack for X
we process X, but X == dr->prod
so we bail out in vnet_ack with
!idx_is_pending
update dr->prod
As a result of the fact that we never processed the stopped ack for X,
the Tx path is led to incorrectly believe that the peer is still
"started" and reading, but the peer has stopped reading, which will
ultimately end in flow-control assertions.
The fix is to synchronize the above 2 paths on the netif_tx_lock.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alban Bedel [Sat, 8 Nov 2014 11:48:36 +0000 (12:48 +0100)]
8139too: Allow using the largest possible MTU
This driver allows MTU up to 1518 bytes which is not enought to run
batman-adv. Simply raise the maximum packet size up to the maximum
allowed by the transmit descriptor, 1792 bytes, giving a maximum MTU
of 1774 bytes.
Signed-off-by: Alban Bedel <albeu@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Please pull this batch of updates intended for the 3.19 stream!
For the mac80211 bits, Johannes says:
"This relatively large batch of changes is comprised of the following:
* large mac80211-hwsim changes from Ben, Jukka and a bit myself
* OCB/WAVE/11p support from Rostislav on behalf of the Czech Technical
University in Prague and Volkswagen Group Research
* minstrel VHT work from Karl
* more CSA work from Luca
* WMM admission control support in mac80211 (myself)
* various smaller fixes, spelling corrections, and minor API additions"
For the Bluetooth bits, Johan says:
"Here's the first bluetooth-next pull request for 3.19. The vast majority
of patches are for ieee802154 from Alexander Aring with various fixes
and cleanups. There are also several LE/SMP fixes as well as improved
support for handling LE devices that have lost their pairing information
(the patches from Alfonso). Jukka provides a couple of stability fixes
for 6lowpan and Szymon conformance fixes for RFCOMM. For the HCI drivers
we have one new USB ID for an Acer controller as well as a reset
handling fix for H5."
For the Atheros bits, Kalle says:
"Major changes are:
o ethtool support (Ben)
o print dev string prefix with debug hex buffers dump (Michal)
o debugfs file to read calibration data from the firmware verification
purposes (me)
o fix fw_stats debugfs file, now results are more reliable (Michal)
o firmware crash counters via debugfs (Ben&me)
o various tracing points to debug firmware (Rajkumar)
o make it possible to provide firmware calibration data via a file (me)
And we have quite a lot of smaller fixes and clean up."
For the iwlwifi bits, Emmanuel says:
"The big new thing here is netdetect which allows the
firmware to wake up the platform when a specific network
is detected. Along with that I have fixes for d3 operation.
The usual amount of rate scaling stuff - we now support STBC.
The other commit that stands out is Johannes's work on
devcoredump. He basically starts to use the standard
infrastructure he built."
Along with that are the usual sort of updates and such for ath9k,
brcmfmac, wil6210, and a handful of other bits here and there...
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Nov 2014 19:25:49 +0000 (14:25 -0500)]
Merge branch 'raw_probe_proto_opt'
Herbert Xu says:
====================
ipv4: Simplify raw_probe_proto_opt and avoid reading user iov twice
This series rewrites the function raw_probe_proto_opt in a more
readable fasion, and then fixes the long-standing bug where we
read the probed bytes twice which means that what we're using to
probe may in fact be invalid.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 7 Nov 2014 13:27:09 +0000 (21:27 +0800)]
ipv4: Avoid reading user iov twice after raw_probe_proto_opt
Ever since raw_probe_proto_opt was added it had the problem of
causing the user iov to be read twice, once during the probe for
the protocol header and once again in ip_append_data.
This is a potential security problem since it means that whatever
we're probing may be invalid. This patch plugs the hole by
firstly advancing the iov so we don't read the same spot again,
and secondly saving what we read the first time around for use
by ip_append_data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Fri, 7 Nov 2014 13:27:08 +0000 (21:27 +0800)]
ipv4: Use standard iovec primitive in raw_probe_proto_opt
The function raw_probe_proto_opt tries to extract the first two
bytes from the user input in order to seed the IPsec lookup for
ICMP packets. In doing so it's processing iovec by hand and
overcomplicating things.
This patch replaces the manual iovec processing with a call to
memcpy_fromiovecend.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
This series moves the debugfs code to a new file debugfs.c and cleans up
macros/register defines.
Various patches have ended up changing the style of the symbolic macros/register
defines and some of them used the macros/register defines that matches the
output of the script from the hardware team.
As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by five different drivers, a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent.
Will post few more series so that we can cover all the macros so that they all
follow the same style to be consistent.
The patches series is created against 'net-next' tree.
And includes patches on cxgb4, cxgb4vf, iw_cxgb4, csiostor and cxgb4i driver.
We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.
V3: Use suffix instead of prefix for macros/register defines
V2: Changes the description and cover-letter content to answer David Miller's
question
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: Cleanup macros so they follow the same style and look consistent, part 2
Various patches have ended up changing the style of the symbolic macros/register
defines to different style.
As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by different drivers a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent. This patch cleans up a part
of it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: Cleanup macros so they follow the same style and look consistent
Various patches have ended up changing the style of the symbolic macros/register
to different style.
As a result, the current kernel.org files are a mix of different macro styles.
Since this macro/register defines is used by different drivers a
few patch series have ended up adding duplicate macro/register define entries
with different styles. This makes these register define/macro files a complete
mess and we want to make them clean and consistent. This patch cleans up a part
of it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 7 Nov 2014 05:10:11 +0000 (21:10 -0800)]
mlx4: use napi_complete_done()
To enable gro_flush_timeout, a driver has to use napi_complete_done()
instead of napi_complete().
Tested:
Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
Without this feature, we send back about 305,000 ACK per second.
GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)
Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.
Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.
Eric Dumazet [Fri, 7 Nov 2014 05:09:44 +0000 (21:09 -0800)]
net: gro: add a per device gro flush timer
Tuning coalescing parameters on NIC can be really hard.
Servers can handle both bulk and RPC like traffic, with conflicting
goals : bulk flows want as big GRO packets as possible, RPC want minimal
latencies.
To reach big GRO packets on 10Gbe NIC, one can use :
ethtool -C eth0 rx-usecs 4 rx-frames 44
But this penalizes rpc sessions, with an increase of latencies, up to
50% in some cases, as NICs generally do not force an interrupt when
a packet with TCP Push flag is received.
Some NICs do not have an absolute timer, only a timer rearmed for every
incoming packet.
This patch uses a different strategy : Let GRO stack decides what do do,
based on traffic pattern.
Packets with Push flag wont be delayed.
Packets without Push flag might be held in GRO engine, if we keep
receiving data.
This new mechanism is off by default, and shall be enabled by setting
/sys/class/net/ethX/gro_flush_timeout to a value in nanosecond.
To fully enable this mechanism, drivers should use napi_complete_done()
instead of napi_complete().
Tested:
Ran 200 netperf TCP_STREAM from A to B (10Gbe mlx4 link, 8 RX queues)
Without this feature, we send back about 305,000 ACK per second.
GRO aggregation ratio is low (811/305 = 2.65 segments per GRO packet)
Setting a timer of 2000 nsec is enough to increase GRO packet sizes
and reduce number of ACK packets. (811/19.2 = 42)
Receiver performs less calls to upper stacks, less wakes up.
This also reduces cpu usage on the sender, as it receives less ACK
packets.
Note that reducing number of wakes up increases cpu efficiency, but can
decrease QPS, as applications wont have the chance to warmup cpu caches
doing a partial read of RPC requests/answers if they fit in one skb.
Dave Taht [Thu, 6 Nov 2014 16:10:14 +0000 (08:10 -0800)]
rtnetlink: add babel protocol recognition
Babel uses rt_proto 42. Add to userspace visible header file.
Signed-off-by: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Rick Jones [Thu, 6 Nov 2014 18:37:54 +0000 (10:37 -0800)]
udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts
As NIC multicast filtering isn't perfect, and some platforms are
quite content to spew broadcasts, we should not trigger an event
for skb:kfree_skb when we do not have a match for such an incoming
datagram. We do though want to avoid sweeping the matter under the
rug entirely, so increment a suitable statistic.
This incorporates feedback from David L. Stevens, Karl Neiss and Eric
Dumazet.
V3 - use bool per David Miller
Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Fri, 7 Nov 2014 14:46:42 +0000 (16:46 +0200)]
stmmac: platform: fix sparse warnings
This patch fixes the following sparse warnings. One is fixed by casting return
value to a return type of the function. The others by creating a specific
stmmac_platform.h which provides the bits related to the platform driver.
drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: warning: incorrect type in return expression (different address spaces)
drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: expected void *
drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:59:29: got void [noderef] <asn:2>*reg
drivers/net/ethernet/stmicro/stmmac/dwmac-meson.c:64:29: warning: symbol 'meson6_dwmac_data' was not declared. Should it be static?
drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:354:29: warning: symbol 'stih4xx_dwmac_data' was not declared. Should it be static?
drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:361:29: warning: symbol 'stid127_dwmac_data' was not declared. Should it be static?
drivers/net/ethernet/stmicro/stmmac/dwmac-sunxi.c:133:29: warning: symbol 'sun7i_gmac_data' was not declared. Should it be static?
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 7 Nov 2014 17:13:40 +0000 (12:13 -0500)]
Merge branch 'iov_iter'
Herbert Xu says:
====================
Replace skb_copy_datagram_const_iovec with iterator version
This patch series adds the helper skb_copy_datagram_iter, which
is meant to replace both skb_copy_datagram_iovec and its evil
twin skb_copy_datagram_const_iovec.
It then converts tun and macvtap over to the new helper and finally
removes skb_copy_datagram_const_iovec which is only used by tun
and macvtap.
The copy_to_iter return value issue pointed out by Al has now been
fixed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 7 Nov 2014 02:06:01 +0000 (18:06 -0800)]
vxlan: Fix to enable UDP checksums on interface
Add definition to vxlan nla_policy for UDP checksum. This is necessary
to enable UDP checksums on VXLAN.
In some instances, enabling UDP checksums can improve performance on
receive for devices that return legacy checksum-unnecessary for UDP/IP.
Also, UDP checksum provides some protection against VNI corruption.
Testing:
Ran 200 instances of TCP_STREAM and TCP_RR on bnx2x.
TCP_STREAM
IPv4, without UDP checksums
14.41% TX CPU utilization
25.71% RX CPU utilization
9083.4 Mbps
IPv4, with UDP checksums
13.99% TX CPU utilization
13.40% RX CPU utilization
9095.65 Mbps
TCP_RR
IPv4, without UDP checksums
94.08% TX CPU utilization
156/248/462 90/95/99% latencies
1.12743e+06 tps
IPv4, with UDP checksums
94.43% TX CPU utilization
158/250/462 90/95/99% latencies
1.13345e+06 tps
Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following series of patches fixes a couple of bugs that slipped
through my last series.
- Free channel structure after freeing the per channel interrupts
- If an skb error allocation occurs during receive processing check
whether more descriptors are associated with the packet or whether
to start on a new packet
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
amd-xgbe: Check for complete packet on skb allocation error
If the skb allocation fails during receive processing, the driver would
continue reading descriptors without first determining if there were
any more descriptors for the current packet. Update the code to check
whether more descriptors are associated with the current packet or
whether to move on to the next descriptor as a new packet.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The channel structure is freed before freeing the per channel
interrupts resulting in a kernel oops. Move the call to free
the channel structure to after the freeing of the per channel
interrupts.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
fixes: a03bb56e67c357980dae886683733dab5583dc14 ("enic: implement rx_copybreak") Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
enic: handle error condition properly in enic_rq_indicate_buf
In case of error in rx path, we free the buf->os_buf but we do not make it NULL.
In next iteration we use the skb which is already freed. This causes the
following crash.
fixes: a03bb56e67c357980dae886683733dab5583dc14 ("enic: implement rx_copybreak") Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 6 Nov 2014 10:51:22 +0000 (12:51 +0200)]
net/mlx5_core: Fix race on driver load
When events arrive at driver load, the event handler gets called even before
the spinlock and list are initialized. Fix this by moving the initialization
before EQs creation.
Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eli Cohen [Thu, 6 Nov 2014 10:51:21 +0000 (12:51 +0200)]
net/mlx5_core: Fix race in create EQ
After the EQ is created, it can possibly generate interrupts and the interrupt
handler is referencing eq->dev. It is therefore required to set eq->dev before
calling request_irq() so if an event is generated before request_irq() returns,
we will have a valid eq->dev field.
Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 6 Nov 2014 20:16:44 +0000 (15:16 -0500)]
Merge branch 'sunvnet-next'
Sowmini Varadhan says:
====================
sunvnet: bug fixes
This patch series has a coding-style fix and a bug fix.
The coding style fix (patch 1) is the extra indentation flagged by
Ben Hutchings:
http://marc.info/?l=linux-netdev&m=141529243409594&w=2
The bugfix (patch 2) is the following:
when vnet_event_napi() is called as part of napi_resume
(i.e., continuation of a previous NAPI read that was truncated
due to budget constraints), and then finds no more packets to read,
the code was trying to avoid an additional trip through ldc_rx
as an optimization. However, when this corner case happens, we would
need to reset a number of dring state bits such as rcv_nxt carefully,
which quickly becomes complex and hacky. The cleaner solution
is to just roll back to vnet_poll, re-enable interrupts and set up
dring state as was done in the pre-NAPI version of the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
sunvnet: Return from vnet_napi_event() if no packets to read
vnet_event_napi() may be called as part of the NAPI ->poll,
to resume reading descriptor rings. When no data is available,
descriptor ring state (e.g., rcv_nxt) needs to be reset
carefully to stay in lock-step with ldc_read(). In the interest
of simplicity, the best way to do this is to return from
vnet_event_napi() when there are no more packets to read.
The next trip through ldc_rx will correctly set up the dring state.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: David Stevens <david.stevens@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Wed, 5 Nov 2014 18:47:28 +0000 (19:47 +0100)]
net: dsa: slave: Fix autoneg for phys on switch MDIO bus
When the ports phys are connected to the switches internal MDIO bus,
we need to connect the phy to the slave netdev, otherwise
auto-negotiation etc, does not work.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Wed, 5 Nov 2014 19:51:51 +0000 (20:51 +0100)]
sched: fix act file names in header comment
Fixes: 4bba3925 ("[PKT_SCHED]: Prefix tc actions with act_") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Mugunthan V N [Wed, 5 Nov 2014 13:03:31 +0000 (18:33 +0530)]
drivers: net: cpsw: remove cpsw_ale_stop from cpsw_ale_destroy
when cpsw is build as modulea and simple insert and removal of module
creates a deadlock, due to delete timer. the timer is created and destroyed
in cpsw_ale_start and cpsw_ale_stop which are from device open and close.
root@am437x-evm:~# modprobe -r ti_cpsw
[ 158.505333] INFO: trying to register non-static key.
[ 158.510623] the code is fine but needs lockdep annotation.
[ 158.516448] turning off the locking correctness validator.
[ 158.522282] CPU: 0 PID: 1339 Comm: modprobe Not tainted 3.14.23-00445-gd41c88f #44
[ 158.530359] [<c0015380>] (unwind_backtrace) from [<c0012088>] (show_stack+0x10/0x14)
[ 158.538603] [<c0012088>] (show_stack) from [<c054ad70>] (dump_stack+0x78/0x94)
[ 158.546295] [<c054ad70>] (dump_stack) from [<c0088008>] (__lock_acquire+0x176c/0x1b74)
[ 158.554711] [<c0088008>] (__lock_acquire) from [<c0088944>] (lock_acquire+0x9c/0x104)
[ 158.563043] [<c0088944>] (lock_acquire) from [<c004e520>] (del_timer_sync+0x44/0xd8)
[ 158.571289] [<c004e520>] (del_timer_sync) from [<bf2eac1c>] (cpsw_ale_destroy+0x10/0x3c [ti_cpsw])
[ 158.580821] [<bf2eac1c>] (cpsw_ale_destroy [ti_cpsw]) from [<bf2eb268>] (cpsw_remove+0x30/0xa0 [ti_cpsw])
[ 158.591000] [<bf2eb268>] (cpsw_remove [ti_cpsw]) from [<c035ef44>] (platform_drv_remove+0x18/0x1c)
[ 158.600527] [<c035ef44>] (platform_drv_remove) from [<c035d8bc>] (__device_release_driver+0x70/0xc8)
[ 158.610236] [<c035d8bc>] (__device_release_driver) from [<c035e0d4>] (driver_detach+0xb4/0xb8)
[ 158.619386] [<c035e0d4>] (driver_detach) from [<c035d6e4>] (bus_remove_driver+0x4c/0x90)
[ 158.627988] [<c035d6e4>] (bus_remove_driver) from [<c00af2a8>] (SyS_delete_module+0x10c/0x198)
[ 158.637144] [<c00af2a8>] (SyS_delete_module) from [<c000e580>] (ret_fast_syscall+0x0/0x48)
[ 179.524727] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=2102 jiffies, g=1487, c=1486, q=6)
[ 179.535741] INFO: Stall ended before state dump start
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Karl Beldan [Wed, 5 Nov 2014 14:32:59 +0000 (15:32 +0100)]
net: mv643xx_eth: reclaim TX skbs only when released by the HW
ATM, txq_reclaim will dequeue and free an skb for each tx desc released
by the hw that has TX_LAST_DESC set. However, in case of TSO, each
hw desc embedding the last part of a segment has TX_LAST_DESC set,
losing the one-to-one 'last skb frag'/'TX_LAST_DESC set' correspondance,
which causes data corruption.
Fix this by checking TX_ENABLE_INTERRUPT instead of TX_LAST_DESC, and
warn when trying to dequeue from an empty txq (which can be symptomatic
of releasing skbs prematurely).
Fixes: 3ae8f4e0b98 ('net: mv643xx_eth: Implement software TSO') Reported-by: Slawomir Gajzner <slawomir.gajzner@gmail.com> Reported-by: Julien D'Ascenzio <jdascenzio@yahoo.fr> Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ian Campbell <ijc@hellion.org.uk> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
xen-netback: remove unconditional __pskb_pull_tail() in guest Tx path
Unconditionally pulling 128 bytes into the linear area is not required
for:
- security: Every protocol demux starts with pskb_may_pull() to pull
frag data into the linear area, if necessary, before looking at
headers.
- performance: Netback has already grant copied up-to 128 bytes from
the first slot of a packet into the linear area. The first slot
normally contain all the IPv4/IPv6 and TCP/UDP headers.
The unconditional pull would often copy frag data unnecessarily. This
is a performance problem when running on a version of Xen where grant
unmap avoids TLB flushes for pages which are not accessed. TLB
flushes can now be avoided for > 99% of unmaps (it was 0% before).
Grant unmap TLB flush avoidance will be available in a future version
of Xen (probably 4.6).
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Wed, 5 Nov 2014 09:45:32 +0000 (11:45 +0200)]
stmmac: fix sparse warnings
This patch fixes the following sparse warnings.
drivers/net/ethernet/stmicro/stmmac/enh_desc.c:381:30: warning: symbol 'enh_desc_ops' was not declared. Should it be static?
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:253:30: warning: symbol 'ndesc_ops' was not declared. Should it be static?
drivers/net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c:141:33: warning: symbol 'stmmac_ptp' was not declared. Should it be static?
There is no functional change.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Giuseppe CAVALLARO <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_tunnel: Add support for wildcard tunnel endpoints.
This patch adds support for tunnels with local or
remote wildcard endpoints. With this we get a
NBMA tunnel mode like we have it for ipv4 and
sit tunnels.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: Allow sending packets through tunnels with wildcard endpoints
Currently we need the IP6_TNL_F_CAP_XMIT capabiltiy to transmit
packets through an ipv6 tunnel. This capability is set when the
tunnel gets configured, based on the tunnel endpoint addresses.
On tunnels with wildcard tunnel endpoints, we need to do the
capabiltiy checking on a per packet basis like it is done in
the receive path.
This patch extends ip6_tnl_xmit_ctl() to take local and remote
addresses as parameters to allow for per packet capabiltiy
checking.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pravin B Shelar [Sun, 19 Oct 2014 19:03:40 +0000 (12:03 -0700)]
openvswitch: Avoid NULL mask check while building mask
OVS does mask validation even if it does not need to convert
netlink mask attributes to mask structure. ovs_nla_get_match()
caller can pass NULL mask structure pointer if the caller does
not need mask. Therefore NULL check is required in SW_FLOW_KEY*
macros. Following patch does not convert mask netlink attributes
if mask pointer is NULL, so we do not need these checks in
SW_FLOW_KEY* macro.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Daniele Di Proietto <ddiproietto@vmware.com> Acked-by: Andy Zhou <azhou@nicira.com>
Pravin B Shelar [Sun, 19 Oct 2014 18:19:51 +0000 (11:19 -0700)]
openvswitch: Refactor action alloc and copy api.
There are two separate API to allocate and copy actions list. Anytime
OVS needs to copy action list, it needs to call both functions.
Following patch moves action allocation to copy function to avoid
code duplication.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Chunhe Li [Mon, 8 Sep 2014 20:17:21 +0000 (13:17 -0700)]
openvswitch: Drop packets when interdev is not up
If the internal device is not up, it should drop received
packets. Sometimes it receive the broadcast or multicast
packets, and the ip protocol stack will casue more cpu
usage wasted.
Signed-off-by: Chunhe Li <lichunhe@huawei.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Joe Stringer [Mon, 8 Sep 2014 20:09:37 +0000 (13:09 -0700)]
openvswitch: Refactor ovs_flow_cmd_fill_info().
Split up ovs_flow_cmd_fill_info() to make it easier to cache parts of a
dump reply. This will be used to streamline flow_dump in a future patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Andy Zhou [Mon, 8 Sep 2014 07:35:02 +0000 (00:35 -0700)]
openvswitch: refactor do_output() to move NULL check out of fast path
skb_clone() NULL check is implemented in do_output(), as past of the
common (fast) path. Refactoring so that NULL check is done in the
slow path, immediately after skb_clone() is called.
Besides optimization, this change also improves code readability by
making the skb_clone() NULL check consistent within OVS datapath
module.
Signed-off-by: Andy Zhou <azhou@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Jesse Gross [Mon, 6 Oct 2014 12:08:38 +0000 (05:08 -0700)]
openvswitch: Additional logging for -EINVAL on flow setups.
There are many possible ways that a flow can be invalid so we've
added logging for most of them. This adds logs for the remaining
possible cases so there isn't any ambiguity while debugging.
CC: Federico Iezzi <fiezzi@enter.it> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Simon Horman [Mon, 6 Oct 2014 12:05:13 +0000 (05:05 -0700)]
openvswitch: Add basic MPLS support to kernel
Allow datapath to recognize and extract MPLS labels into flow keys
and execute actions which push, pop, and set labels on packets.
Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and Joe Stringer.
Cc: Ravi K <rkerur@gmail.com> Cc: Leo Alterman <lalterman@nicira.com> Cc: Isaku Yamahata <yamahata@valinux.co.jp> Cc: Joe Stringer <joe@wand.net.nz> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Pravin B Shelar [Wed, 5 Nov 2014 23:27:48 +0000 (15:27 -0800)]
net: Remove MPLS GSO feature.
Device can export MPLS GSO support in dev->mpls_features same way
it export vlan features in dev->vlan_features. So it is safe to
remove NETIF_F_GSO_MPLS redundant flag.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
hayeswang [Wed, 5 Nov 2014 02:17:02 +0000 (10:17 +0800)]
r8152: disable the tasklet by default
Let the tasklet only be enabled after open(), and be disabled for
the other situation. The tasklet is only necessary after open() for
tx/rx, so it could be disabled by default.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 5 Nov 2014 19:27:38 +0000 (20:27 +0100)]
ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs
It has been reported that generating an MLD listener report on
devices with large MTUs (e.g. 9000) and a high number of IPv6
addresses can trigger a skb_over_panic():
mld_newpack() skb allocations are usually requested with dev->mtu
in size, since commit 72e09ad107e7 ("ipv6: avoid high order allocations")
we have changed the limit in order to be less likely to fail.
However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
macros, which determine if we may end up doing an skb_put() for
adding another record. To avoid possible fragmentation, we check
the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
assumption as the actual max allocation size can be much smaller.
The IGMP case doesn't have this issue as commit 57e1ab6eaddc
("igmp: refine skb allocations") stores the allocation size in
the cb[].
Set a reserved_tailroom to make it fit into the MTU and use
skb_availroom() helper instead. This also allows to get rid of
igmp_skb_size().
Reported-by: Wei Liu <lw1a2.jing@gmail.com> Fixes: 72e09ad107e7 ("ipv6: avoid high order allocations") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: David L Stevens <david.stevens@oracle.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
By default the arch_fast_hash hashing function pointers are initialized
to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
updated to the CRC32 ones. This dispatching scheme incurs a function
pointer lookup and indirect call for every hashing operation.
rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing
functions in its structure, too, causing two indirect branches per
hashing operation.
Using alternative_call we can get away with one of those indirect branches.
Acked-by: Daniel Borkmann <dborkman@redhat.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>