git.karo-electronics.de Git - linux-beck.git/log

Merge branch 'fec-next'

Russell King says:

====================
Freescale ethernet driver updates (part 2)

Here's the second batch of patches for the Freescale FEC ethernet driver,
based upon the previous set of patches. One further set of 7 patches
remains.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: clean up duplex mode handling

Many places call fec_restart() with the second parameter being some kind
of previously saved duplex value, but only two places call it with some
other setting. This is at odds with how the other link settings are
handled, and used to be racy before the rtnl locks were added to
fec_restart()'s various call paths.

Clean this up so all link capabilities are handled in the same way -
saved into the fec_enet_private structure, and then fec_restart() acts
on those settings.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: quiesce packet processing when taking link down in fec_enet_adjust_link()

When the link goes down, the adjust_link method will be called, but
there is no synchronisation to ensure that we won't be processing some
last remaining packets via the NAPI handlers while performing a reset of
the device.

Add the necessary synchronisation to ensure that packet processing
is complete before we stop and reset the FEC.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: quiesce packet processing before changing features

Changing the features (receive checksumming) requires the hardware to be
reprogrammed, and also changes the checks in the receive packet
processing.

The current implementation has a race - fec_set_features() changes the
flags which alter the receive packet processing while the adapter is
active, and potentially receiving frames. Only after we've modified
the software flag do we shutdown and reconfigure the hardware.

This can lead to packets being received and marked with a valid checksum
(via CHECKSUM_UNNECESSARY) when the hardware checksum validation has not
yet been enabled.

We must quiesce the device, then change the software configuration for
this feature, and then resume the device if it was previously running.

The resulting code structure also allows us to add other configuration
features in this path without having to quiesce and resume the network
interface and device.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: quiesce packet processing before stopping device in fec_set_features()

fec_set_features() calls fec_stop() to stop the transmit ring while the
transmit queue is still active. This can lead to the transmit ring
being restarted by an intervening packet queued for transmission, or
by the tx quirk timer expiring.

Fix this by disabling NAPI (which ensures that the NAPI handlers are
not running), and then take the transmit lock while we stop and
restart the adapter (which prevents new packets being queued).

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: quiesce packet processing before stopping device in fec_suspend()

fec_suspend() calls fec_stop() to stop the transmit ring while the
transmit packet processing is still active.  This can lead to the
transmit queue being restarted by an intervening packet queued for
transmission, or by the tx quirk timer expiring.

Fix this by disabling NAPI first, which will ensure that the NAPI
handlers are not running.  Then, take the transmit lock before
detaching the netif device.  This ensures that there are no races
with the transmit path - and also ensures that the watchdog won't
fire.

We can then safely stop the ethernet device itself, knowing that the
rest of the driver is safely shut down.

On resume, we bring the device back up in reverse order - we restart
the device, reattach the device (under the tx lock), and then enable
the NAPI handlers.

We also need to adjust the close function to cope with this new
sequence, so that it's possible to cleanly close down the driver
after the hardware fails to resume (eg, due to the regulator_enable()
or pinctrl calls in the resume path returning an error.)
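
The ordering described above can be sketched as a small userspace model.
This is not the driver code: the step names and the trace structure are
hypothetical stand-ins for the real NAPI, tx lock, attach/detach and
hardware stop/restart calls, and the model only records the order in
which they would be issued.

```c
#include <stddef.h>

/* Hypothetical step names; each stands in for one real kernel call. */
enum step {
	STEP_NAPI_DISABLE,
	STEP_TX_LOCK,
	STEP_DETACH,
	STEP_TX_UNLOCK,
	STEP_HW_STOP,
	STEP_HW_RESTART,
	STEP_ATTACH,
	STEP_NAPI_ENABLE,
};

#define MAX_STEPS 16

struct trace {
	enum step steps[MAX_STEPS];
	size_t count;
};

static void record(struct trace *t, enum step s)
{
	if (t->count < MAX_STEPS)
		t->steps[t->count++] = s;
}

/* Suspend: quiesce the software paths first, touch the hardware last. */
static void model_suspend(struct trace *t)
{
	record(t, STEP_NAPI_DISABLE);	/* NAPI handlers are no longer running */
	record(t, STEP_TX_LOCK);	/* no new packets can be queued */
	record(t, STEP_DETACH);		/* detached, so the watchdog won't fire */
	record(t, STEP_TX_UNLOCK);
	record(t, STEP_HW_STOP);	/* nothing else is active by now */
}

/* Resume: restart, reattach under the tx lock, then enable NAPI. */
static void model_resume(struct trace *t)
{
	record(t, STEP_HW_RESTART);
	record(t, STEP_TX_LOCK);
	record(t, STEP_ATTACH);
	record(t, STEP_TX_UNLOCK);
	record(t, STEP_NAPI_ENABLE);
}
```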

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove inappropriate calls around fec_restart()

This is the second stage to "move calls to quiesce/resume packet
processing out of fec_restart()", where we remove calls which are not
appropriate to the call site.

In the majority of cases, there is no need to detach and reattach the
interface as we are holding the queue xmit lock across the reset.  The
exception to that is in fec_resume(), where we are already detached by
the suspend function.  Here, we can remove the call to detach the
interface.

We also do not need to stop the transmit queue.  Holding the xmit lock
is enough to ensure that the transmit packet processing is not running
while we perform our task.  However, since fec_restart() always cleans
the rings, we call netif_wake_queue() (or netif_device_attach() in the
case of resume) just before dropping the xmit lock.  This prevents the
watchdog firing.

Lastly, always call napi_enable() after the device has been reattached
in the resume path so that we know that the transmit packet processing
is already in an enabled state, so we don't call netif_wake_queue()
while detached.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: move calls to quiesce/resume packet processing out of fec_restart()

Move the calls to quiesce and resume packet processing out of
fec_restart() to its call sites. This is the first step in a two stage
clean up of this code, where we just move the calls out of fec_restart()
without changing them. Not everywhere needs to issue these calls, and
not everywhere needs all of these calls to be issued.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: only restart or stop the device if it is present and running

Avoid calling fec_restart() or fec_stop() while the device is down
or not present (iow suspended.)

Although the ndo_timeout method will only be called if the device is
present and running, we defer this to a work queue.  The work queue
can run independently, and so needs to repeat these checks to ensure
that a restart doesn't occur after the device has been taken down or
detached for suspend.  In this case, we call fec_restart() in the
resume path, so nothing is lost.

For fec_set_features, we add a call to fec_restart() in fec_enet_open()
to ensure that the hardware is appropriately programmed when the interface
is opened.  A fec_set_features() call should not occur while we're
suspended, so we don't have to worry about that case.

The adjust_link needs similar treatment - this also is called from a
work queue, which may be run independently after we have taken the
device down and detached it.  In this case, we just mark the link
down and take no further action.  We will reset things appropriately
once the device is up and running again, at which point we will receive
another adjust_link callback.
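
A minimal sketch of the repeated check in deferred work, as a userspace
model. The structure and function names are hypothetical; the real
driver tests the net_device state rather than two booleans.

```c
#include <stdbool.h>

/* Hypothetical model of the device state the deferred work must re-check. */
struct model_dev {
	bool present;	/* false while detached for suspend */
	bool running;	/* false while the interface is down */
	int restarts;	/* number of times the hardware was restarted */
};

/*
 * Deferred timeout work. It can run after the device has been taken
 * down or detached, so the state is checked again here rather than
 * trusting the check made when the work was scheduled.
 */
static bool model_timeout_work(struct model_dev *dev)
{
	if (!dev->present || !dev->running)
		return false;	/* leave the hardware alone */

	dev->restarts++;	/* stands in for the restart call */
	return true;
}
```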

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: ensure fec_enet_close() copes with resume failure

When the FEC is suspended, the device is detached. Upon resume failure,
the device is left in detached mode, possibly with some of the required
clocks not running. We don't want to be poking the device in that state
because it may cause bus errors.

If the device is marked detached, avoid calling fec_stop().

This depends upon: "net:fec: improve safety of suspend/resume paths"

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: improve safety of suspend/resume/transmit timeout paths

We should hold the rtnl lock while suspending, resuming or processing
the transmit timeout to ensure that nothing will interfere while we
bring up, take down or restart the hardware. The transmit timeout
could run if we're preempted during suspend.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next'

Amir Vadai says:

====================
Mellanox driver update Jul-08-2014

This patch set introduces some small bug fixes.
Most of the patches are small fixes to corner case bugs.
The patch by Noa ("Fix mac_hash database inconsistency") was sent in the past
[1] and was dropped because a fix to the bonding code was supposed to make it
unnecessary. After a second look at the patch, it is still needed even
after the direct access to dev_addr by the bonding code is fixed.

Patches were applied and tested over commit bd4578b
("drivers/net/hyperv/netvsc.c: remove unnecessary null test before kfree")

[1] - http://permalink.gmane.org/gmane.linux.network/315900
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix mac_hash database inconsistency

Using a local copy of dev_addr in mlx4_en_set_mac() to prevent dev_addr
from being modified during error flow or when dev_addr is modified in
another context (which is another problem that is being discussed over
the mailing list [1]).
Also fixing bad naming of priv->prev_mac into priv->current_mac.

[1] - http://patchwork.ozlabs.org/patch/351489/

Reviewed-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Do not count LLC/SNAP in MTU calculation

LLC/SNAP 8 bytes should not be added as part of header calculation.
If used, payload will be decreased accordingly. For MTU of 1500
we'll set 1522 instead of 1523.
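
One way to arrive at the 1522 figure, as a sketch only. The breakdown
into Ethernet header, one VLAN tag and FCS is an assumption, not taken
from the patch, and the macro and function names are hypothetical.

```c
/* Standard Ethernet sizes in bytes. */
#define MODEL_ETH_HDR_LEN	14	/* destination MAC, source MAC, EtherType */
#define MODEL_VLAN_TAG_LEN	4	/* one 802.1Q tag */
#define MODEL_ETH_FCS_LEN	4	/* frame check sequence */

/*
 * Frame size for a given MTU. LLC/SNAP is not added: if it is used,
 * it comes out of the payload instead.
 */
static unsigned int model_frame_size(unsigned int mtu)
{
	return mtu + MODEL_ETH_HDR_LEN + MODEL_VLAN_TAG_LEN + MODEL_ETH_FCS_LEN;
}
```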

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Do not disable vlan filter during promiscuous mode

Promiscuous mode is only for MACs.
The VLAN filter should not be disabled/enabled when entering/leaving promiscuous mode.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.co.il>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4: Verify port number in __mlx4_unregister_mac

Verify port number to avoid crashes if port number is outside the range.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Run loopback test only when port is up

Loopback can't work when port is down.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: Fix set port ratelimit for 40GE

In 40GE we can't use the default bw units for set ratelimit (100 Mbps)
since the max is 255*100 Mbps = 25 Gbps (not suited for 40GE), thus we need 1 Gbps units.
But for 10GE, 1 Gbps units might be too coarse, so we use the following solution.

For user set ratelimit <= 25 Gbps:
        use 100 Mbps units * user_ratelimit (* 10).

For user set ratelimit > 25 Gbps:
        use 1 Gbps units * user_ratelimit.

For user set unlimited ratelimit (0 Gbps):
        use 1 Gbps units * MAX_RATELIMIT_DEFAULT (57)

Note: any value > 58 will damage the FW ratelimit computation, so we allow
      a max and any higher value will be pulled down to 57.
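
The unit selection above could look roughly like this. It is a sketch
under assumptions: the type and function names are hypothetical, the
user value is taken to be in Gbps, and only the thresholds (25 and 57)
come from the text.

```c
#include <stdint.h>

#define MODEL_100M_LIMIT_GBPS	25	/* above this, 100 Mbps units overflow */
#define MAX_RATELIMIT_DEFAULT	57	/* cap in 1 Gbps units, per the text */

enum rl_units { RL_UNITS_100MBPS, RL_UNITS_1GBPS };

struct rl_setting {
	enum rl_units units;
	uint8_t value;		/* multiplier in the chosen units */
};

/* user_gbps == 0 means "unlimited". */
static struct rl_setting model_pick_ratelimit(unsigned int user_gbps)
{
	struct rl_setting s;

	if (user_gbps == 0) {
		s.units = RL_UNITS_1GBPS;
		s.value = MAX_RATELIMIT_DEFAULT;
	} else if (user_gbps <= MODEL_100M_LIMIT_GBPS) {
		s.units = RL_UNITS_100MBPS;
		s.value = (uint8_t)(user_gbps * 10);	/* at most 250 */
	} else {
		s.units = RL_UNITS_1GBPS;
		if (user_gbps > MAX_RATELIMIT_DEFAULT)
			user_gbps = MAX_RATELIMIT_DEFAULT;	/* pull down */
		s.value = (uint8_t)user_gbps;
	}
	return s;
}
```

Keeping 100 Mbps units for the low range preserves finer granularity on
10GE, while 1 Gbps units are only used where the finer unit cannot
express the rate.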

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge_batmanadv_exports'

Linus Lüssing says:

====================
bridge: multicast snooping exports #2

Some people pointed out to me that it might be helpful to add stubs for
the newly added multicast exports. That way e.g. batman-adv should continue
to compile and be usable without having to have a kernel compiled
with bridge code in the future. This is what the first patch is supposed
to do.

The second patch adds a third multicast export for the bridge which
e.g. batman-adv is supposed to use, too, soon: Just like the bridge
disables its multicast snooping activities if no querier is present,
batman-adv needs to do the same if bridges are involved.

These three exports should be the final ones needed to marry the bridge
multicast snooping with the batman-adv multicast optimizations recently
added for the 3.15 kernel, allowing these optimizations to be used in common
setups having a bridge on top of e.g. bat0, too. So far these bridged
setups would fall back to simple flooding through the batman-adv mesh
network for any multicast packet entering bat0.

More information about the batman-adv multicast optimizations currently
implemented can be found here:

http://www.open-mesh.org/projects/batman-adv/wiki/Basic-multicast-optimizations

The integration on the batman-adv side could afterwards look like this,
for instance (now including the third export):

http://git.open-mesh.org/batman-adv.git/commitdiff/61e4f6af4b7a21ed4040f2e711d50c778e5b6d93?hp=6ae4281474675fbca5bedcf768972a32db586eb6
====================

bridge: export knowledge about the presence of IGMP/MLD queriers

With this patch other modules are able to ask the bridge whether an
IGMP or MLD querier exists on the corresponding bridged link layer.

Multicast snooping can only be performed if a valid, selected querier
exists on a link.

Just like the bridge only enables its multicast snooping if a querier
exists, e.g. batman-adv too can only activate its multicast
snooping in bridged scenarios if a querier is present.

For instance this export avoids having to reimplement IGMP/MLD
querier message snooping and parsing in e.g. batman-adv, when
multicast optimizations for bridged scenarios are added in the
future.

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: adding stubs for multicast exports

Add stubs so that users (e.g. batman-adv soon) remain loadable and
runnable even if the bridge was compiled without snooping capabilities,
or even if the kernel was compiled without any bridge code at all.
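
The usual shape of such a stub, as a sketch. The configuration macro
and function name here are hypothetical, and returning false from the
stub is an assumption about what "no bridge support" should report.

```c
#include <stdbool.h>

struct net_device;	/* opaque here; only pointers are passed around */

#ifdef MODEL_HAVE_BRIDGE_SNOOPING
/* The real implementation would be provided by the bridge code. */
bool model_bridge_has_querier(struct net_device *dev, int proto);
#else
/* Stub: without snooping support, report that no querier exists. */
static inline bool model_bridge_has_querier(struct net_device *dev, int proto)
{
	(void)dev;
	(void)proto;
	return false;
}
#endif
```

Callers can then use the function unconditionally, with no #ifdef at
the call site.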

Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix a memleak when sending data

This fixes a regression bug caused by:
067608e9d019d6477fd45dd948e81af0e5bf599f ("tipc: introduce direct
iovec to buffer chain fragmentation function")

If data is sent on a nonblocking socket and the destination link
is congested, the buffer chain is leaked. We fix this by freeing
the chain in this case.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Fix issues with debug printk calls

This fixes issues with debug printk calls across the driver, normally
disabled; first compilation errors:

drivers/net/fddi/defxx.c:676:1: error: pasting "(" and ""In dfx_bus_init...\n"" does not give a valid preprocessing token
drivers/net/fddi/defxx.c:820:1: error: pasting "(" and ""In dfx_bus_uninit...\n"" does not give a valid preprocessing token

and so on, and then warnings:

drivers/net/fddi/defxx.c: In function 'dfx_driver_init':
drivers/net/fddi/defxx.c:1132: warning: format '%0X' expects type 'unsigned int', but argument 4 has type 'dma_addr_t'
drivers/net/fddi/defxx.c:1132: warning: format '%0X' expects type 'unsigned int', but argument 4 has type 'dma_addr_t'

etc. Additionally casts are removed from virtual addresses and %p used.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'defxx-next'

Maciej W. Rozycki says:

====================
defxx: Fixes for 64-bit host support

This mini patch series addresses issues with 64-bit host support for FDDI
interface boards supported by the defxx driver where DMA mapping
synchronisation is required on swiotlb systems.  While PDQ, the DMA engine
chip used with these boards, supports 48-bit addressing that would
normally suffice for typical 64-bit systems in existence, the host bus
interface chips used by individual implementations have their limitations
as follows:

* DEFTA or DEC FDDIcontroller/TURBOchannel -- there's no host bus
  interface chip, the PDQ connects to TURBOchannel directly; TURBOchannel
  supports DMA addressing of up to 16GB (34-bit addressing), however no
  TURBOchannel system has ever been made that supports more than 1GB of
  RAM, so in reality no remapping is ever required,

* DEFEA or DEC FDDIcontroller/EISA -- the ESIC EISA interface chip only
  supports 32-bit addressing, all accesses beyond 4GB have to be remapped,

* DEFPA or DEC FDDIcontroller/PCI -- the PFI PCI interface chip rev. 1 & 2
  only support 32-bit addressing, they have 32 AD lines only both on the
  PDQ and the PCI side, and consequently no Dual Address Cycle support, so
  all accesses beyond 4GB have to be remapped; the range of addressing
  supported by PFI rev. 3 is currently not certain, however the chip is
  backwards compatible with earlier revisions and will work with code that
  supports them.
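
To make the addressing limits above concrete, here is a small sketch of
a reachability check for a given address width. The helper names are
hypothetical and this is not how the driver or the DMA API decides on
remapping; it only illustrates the arithmetic (32 bits covers 4GB, 34
bits covers 16GB).

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set; valid for 1 <= bits <= 64. */
static uint64_t model_addr_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if `len` bytes starting at `addr` are reachable without remapping. */
static bool model_dma_reachable(uint64_t addr, uint64_t len, unsigned int bits)
{
	uint64_t mask = model_addr_mask(bits);

	if (addr > mask)
		return false;
	if (len == 0)
		return true;
	/* Compare against the remaining space to avoid overflow. */
	return len - 1 <= mask - addr;
}
```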

Some other issues discovered in the course of correcting 64-bit support
have been fixed as well.  Each of the patches is functionally
self-contained and can be applied independently, although there may be
mechanical dependencies making it necessary to apply patches in order.

The driver suffers from non-standard formatting and while I did my best
with these bug fixes to follow our coding style, I found some pieces
hopeless, checkpatch.pl will complain.  I plan to reformat the whole
driver, that will inevitably require factoring out some pieces into
separate functions, but that's going to be a major effort and therefore I
want to do this separately, with no functional changes made at the same
time.  If anyone has specific suggestions as to how to reformat any of the
pieces submitted here for a better layout, then I'll be happy to take them
into account.

And last but not least many thanks to Robert Coerver, who was the most
recent person to report this problem with the driver and was kind enough
to patiently try a few revisions of the driver update on his system as I
was finding and addressing issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Add missing DMA synchronisation calls

This adds DMA synchronisation calls needed in the receive path:

1. To retrieve the Receive Status word that is prepended by the PDQ DMA
   engine in the receive buffer, and provides information about the
   frame received, including its size and any errors.

2. To make data received available for copying in the small-frame case
   (size <= SKBUFF_RX_COPYBREAK) where the original DMA buffer will be
   returned to the receive descriptor ring and therefore its mapping
   retained.

   With DMA mapping error handling in place, added by the other patch,
   this may now also trigger where an attempt to map a newly allocated
   buffer for DMA has failed.  In that case data from the original buffer
   will be copied out and the buffer returned to the DMA descriptor ring.

These calls may do nothing when data is in the host DMA addressing range
of the FDDI interface, such as always on 32-bit systems, however their
absence makes frame reception stop functioning reliably on systems that
have memory beyond the low 4GB of the address space.

Reported-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Tested-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Handle DMA mapping errors

This adds error handling for DMA mapping requests; I think there isn't
much else to say about it.

A good side-effect is the mapping in the transmit path is now made with
the board lock released. Also if DMA mapping fails for a newly
allocated receive buffer, then data from the old buffer will be copied
out (as is presently done for small frames only whose size does not
exceed SKBUFF_RX_COPYBREAK) and the original buffer returned, with its
mapping unchanged, to the DMA descriptor ring.
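
The receive-side fallback can be summarised as a decision function.
This is a sketch only: the names and the threshold value are
hypothetical stand-ins (the driver uses SKBUFF_RX_COPYBREAK), and the
actual copy and DMA calls are left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RX_COPYBREAK 200	/* hypothetical threshold, in bytes */

enum rx_action {
	RX_COPY_AND_REUSE,	/* copy data out, return the old buffer to the ring */
	RX_SWAP_BUFFERS,	/* pass the old buffer up, put the new one on the ring */
};

/*
 * new_buf_mapped says whether a replacement buffer was allocated and
 * mapped for DMA successfully; it is ignored for small frames.
 */
static enum rx_action model_rx_decide(size_t frame_len, bool new_buf_mapped)
{
	if (frame_len <= MODEL_RX_COPYBREAK)
		return RX_COPY_AND_REUSE;	/* small frame: always copy */
	if (!new_buf_mapped)
		return RX_COPY_AND_REUSE;	/* mapping failed: fall back to copy */
	return RX_SWAP_BUFFERS;
}
```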

Reported-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Tested-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Use netdev_alloc_skb consistently

Switch the two remaining places across the driver that use dev_alloc_skb
to netdev_alloc_skb. Another place has already been converted to use
__netdev_alloc_skb, no idea why these two have been left behind.

Reported-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Tested-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Discard DMA maps on buffer deallocation

Prearranged receive DMA bounce buffer mappings are not released in the
card reboot/shutdown path. That does not affect frame reception, but
probably explains the random segmentation fault I observed the other day
on interface shutdown. Card is rebooted as required by the spec in the
process of ring fault recovery when a PC Trace signal has been received.

This change fixes the problem in an obvious manner.

Reported-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Tested-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

defxx: Correct the receive DMA map size

Receive DMA maps are oversized, they include EISA legacy 128-byte
alignment padding in size calculation whereas this padding is never used
for data. Worse yet, if the skb's data area has been realigned indeed,
then data beyond the end of the buffer will be synchronised from the
receive DMA bounce buffer, possibly corrupting data structures residing
in memory beyond the actual end of this data buffer.

Therefore switch to using PI_RCV_DATA_K_SIZE_MAX rather than NEW_SKB_SIZE
in DMA mapping, the value the former macro expands to is written to the
receive ring DMA descriptor of the PDQ DMA chip and determines the
maximum amount of data PDQ will ever transfer to the corresponding data
buffer, including all headers and padding.

Reported-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Tested-by: Robert Coerver <Robert.Coerver@ll.mit.edu>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sctp_command_queue'

David Laight says:

====================
net: sctp: Optimisations to sctp command queue code

These 3 patches optimise the code that processes sctp's command queue.
(A list of 'tasks' to be performed after the rest of the chunk processing.)

1) Inline all the functions from command.c
2) Remove the memset() calls used to zero a word-sized union.
3) Use pointers instead of array indexes.

The combined changes reduce the code size (amd64) by a few kb.

I'm not 100% convinced that the zeroing done in patch 2 is needed at all.
On BE systems it is likely to generate more code than on LE ones.
In fact it might be best to change the union to only contain 'long' sized
items.

Changes for v2:
- Add some missing initialisers in patch 2/3 and delete them in 3/3.
- Modify the commit message for 2/3 to point out that the union
shouldn't need to be zeroed, but the patches aren't intended to
change the behaviour even if the code is buggy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: Use pointers (not array indexes) to access sctp_cmd_seq_t.cmds[].

Using pointers into sctp_cmd_seq_t.cmds[] lets the compiler generate much
better code.
Use the last entry first to optimise the overflow check.
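
A userspace sketch of the idea, filling the array from the end so that
the overflow check is a single compare against the array base. The
type names, field layout and array size are assumptions for
illustration and do not reproduce the sctp structures.

```c
#include <stddef.h>

#define MODEL_MAX_CMDS 20	/* hypothetical capacity */

struct model_cmd {
	int verb;
};

struct model_cmd_seq {
	struct model_cmd cmds[MODEL_MAX_CMDS];
	struct model_cmd *last_used_slot;	/* lowest filled entry */
	struct model_cmd *next_cmd;		/* last entry handed out */
};

static void model_seq_init(struct model_cmd_seq *seq)
{
	/* Start one past the end; entries are added downwards. */
	seq->last_used_slot = seq->cmds + MODEL_MAX_CMDS;
	seq->next_cmd = seq->last_used_slot;
}

/* Returns 0 on success, -1 if the sequence is full. */
static int model_seq_add(struct model_cmd_seq *seq, int verb)
{
	if (seq->last_used_slot == seq->cmds)
		return -1;
	seq->last_used_slot--;
	seq->last_used_slot->verb = verb;
	return 0;
}

/* Returns commands in the order they were added, or NULL when drained. */
static struct model_cmd *model_seq_next(struct model_cmd_seq *seq)
{
	if (seq->next_cmd <= seq->last_used_slot)
		return NULL;
	return --seq->next_cmd;
}
```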

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: Optimise the way 'sctp_arg_t' values are initialised.

Even if memset() is inlined (as on x86) using it to zero the union
generates a memory word write of zero, followed by a write of the
smaller field, and then a read of the word.
As well as being a lot of instructions the sequence is unlikely to
be optimised by the store-load forward hardware so will be slow.

Instead allocate a field of the union that is the same size as the
entire union and write a zero value to it. The compiler will then
generate the required value in a register.

Zeroing the union shouldn't be necessary, but this patch series isn't
intended to have a behavioural change.
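
A sketch of the technique with a hypothetical union (not the real
sctp_arg_t). It assumes that a pointer is as wide as the union and that
a null pointer is represented as all-zero bits, which holds on the
platforms Linux supports but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>

typedef union {
	void *zero_all;		/* full-width member; set to NULL to clear */
	int32_t i32;
	uint16_t u16;
	uint8_t u8;
} model_arg_t;

/* The full-width member must really cover the whole union. */
_Static_assert(sizeof(model_arg_t) == sizeof(void *),
	       "zero_all must span the union");

static model_arg_t model_arg_u8(uint8_t v)
{
	model_arg_t arg;

	arg.zero_all = NULL;	/* one full-width store instead of memset() */
	arg.u8 = v;		/* then the narrow field */
	return arg;
}
```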

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: Inline the functions from command.c

sctp_init_cmd_seq() and sctp_next_cmd() are only called from one place.
The call sequence for sctp_add_cmd_sf() is likely to be longer than
the inlined code.
With sctp_add_cmd_sf() inlined the compiler can optimise repeated calls.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

appletalk: fix a coccinella warning in net/appletalk/ddp.c

This warning was introduced by commit 7b30600cc6 ("appletalk:
fix checkpatch error with indent"), so fix it.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
pull request: wireless-next 2014-07-03

Please pull this first batch of wireless updates intended for the
3.17 stream...

For the mac80211 bits, Johannes says:

"The biggest thing here is probably Arik's TDLS rework, beyond that we
have smaller improvements and features like David's scanning IE thing,
Luca's queue work, some CSA work, etc. Also your PID rate control
removal, of course."

For the iwlwifi bits, Emmanuel says:

"I have here a whole bunch of various things. Andy contributes
better debug prints for dvm specific flows and a module parameter to
completely disable power save for dvm. Andrei is sharing the premises
of his work on CSA - more to come. Eran and Liad keep on working
on the new devices. I have the regular amount of BT Coex stuff and
I continue to work on the firmware error report system adding more
debug capabilities. More to come on that subject too."

On top of that, there are some cleanups to the new rsi driver, some
continuing improvements to the rtl818x drivers, and the usual bundles
of updates to ath9k, b43, mwifiex, wil6210, and a few other bits here
and there.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: move load_pointer() into filter.h

load_pointer() is already a static inline function.
Let's move it into filter.h so BPF JIT implementations can reuse this
function.

Since we're exporting this function, let's also rename it to
bpf_load_pointer() for clarity.

Signed-off-by: Zi Shen Lim <zlim.lnx@gmail.com>
Reviewed-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

declance: Fix 64-bit compilation warnings

This fixes compiler warnings:

drivers/net/ethernet/amd/declance.c: In function 'lance_init_ring':
drivers/net/ethernet/amd/declance.c:478: warning: format '%8.8x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
drivers/net/ethernet/amd/declance.c:487: warning: format '%8.8x' expects type 'unsigned int', but argument 3 has type 'long unsigned int'
drivers/net/ethernet/amd/declance.c:503: warning: cast from pointer to integer of different size
drivers/net/ethernet/amd/declance.c:520: warning: cast from pointer to integer of different size

in 64-bit compilation. Where the value printed is an offset (whose range
will always fit) the cast uses a 32-bit type, otherwise, where it is a
host memory address, the pointer is output directly with %p. Also the
remaining `0x' prefix is dropped for consistency across these messages.

Tested with both 32-bit and 64-bit compilation, as well as at the run time
(with the debug messages affected enabled).
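
The two cases can be illustrated in userspace with snprintf() standing
in for printk(). The helper names are hypothetical, and the snippet
assumes the offset always fits in 32 bits, as stated above.

```c
#include <stdio.h>

/* Offset: the range always fits, so cast to a 32-bit type for %x. */
static int model_fmt_offset(char *buf, size_t len, unsigned long offset)
{
	return snprintf(buf, len, "%8.8x", (unsigned int)offset);
}

/* Host memory address: print the pointer directly with %p, no cast to int. */
static int model_fmt_addr(char *buf, size_t len, const void *addr)
{
	return snprintf(buf, len, "%p", addr);
}
```

Both forms compile without format warnings on 32-bit and 64-bit hosts,
since the argument type always matches the conversion.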

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hsr-next'

Arvid Brodin says:

====================
net/hsr: Use list_head+rcu, better frame dispatch, etc.

This patch series is meant to improve the HSR code in several ways:

* Better code readability.
* In general, make the code structure more like the net/bridge code (HSR
  operates similarly to a bridge, but uses the HSR-specific frame headers to
  break up rings, instead of the STP protocol).
* Better handling of HSR ports' net_device features.
* Use list_head and the _rcu list traversing routines instead of array of slave
  devices.
* Make it easy to support HSR Interlink devices (for future Redbox/Quadbox
  support).
* Somewhat better throughput on non-HAVE_EFFICIENT_UNALIGNED_ACCESS archs, due
  to less copying of skb data.

The code has been tested in a ring together with other HSR nodes running
unchanged code, on both avr32 and x86_64. There should only be one minor change
in behaviour from a user perspective:

* Anyone using the Netlink HSR_C_GET_NODE_LIST message to dump the internal
  node database will notice that the database now also contains the self node.

All patches pass 'checkpatch.pl --ignore CAMELCASE --max-line-length=83
--strict' with only CHECKs, each of which has been deliberately left in place.

The final code passes sparse checks with no output.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Fix NULL pointer dereference on incomplete hsr_newlink() params.

If none of the slave interfaces are specified, struct nlattr *data[] may
be NULL. Make sure to check for that.

While I'm at it, fix the horrible error messages displayed when only one
of the slave interfaces isn't specified.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Better frame dispatch

This patch removes the separate paths for frames coming from the outside, and
frames sent from the HSR device, and instead makes all frames go through
hsr_forward_skb() in hsr_forward.c. This greatly improves code readability and
also opens up the possibility for future support of the HSR Interlink device
that is the basis for HSR RedBoxes and HSR QuadBoxes, as well as VLAN
compatibility.

Other improvements:
* A reduction in the number of times an skb is copied on machines without
  HAVE_EFFICIENT_UNALIGNED_ACCESS, which improves throughput somewhat.
* Headers are now created using the standard eth_header(), and using the
  standard hard_header_len.
* Each HSR slave now gets its own private skb, so slave-specific fields can be
  correctly set.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Added SET_NETDEV_DEVTYPE and features |= NETIF_F_NETNS_LOCAL to dev_setup.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Implemented .ndo_fix_features (better device features handling).

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Use list_head (and rcu) instead of array for slave devices.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Move slave init to hsr_slave.c.

Also try to prevent some possible slave dereference race conditions. This is
finalized in the next patch, which abandons the slave array in favour of
a list_head list and list RCU.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Operstate handling cleanup.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Move to per-hsr device prune timer.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Switch from dev_add_pack() to netdev_rx_handler_register()

Also move the frame receive handler to hsr_slave.c.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/hsr: Better variable names and update of contact info.

Signed-off-by: Arvid Brodin <arvid.brodin@alten.se>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: increase the tx timeout

When the system is too busy to complete the urb, the tx timeout function
would be called. This causes the other tx urbs to be killed, too.
Increase the tx timeout to avoid this.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipconfig: add static to local variable

ic_dev_xid is only used in ipconfig.c

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: netdev@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD 10Gb Ethernet driver updates

The following series fixes some bugs and provides new/changed support
in the driver.

- Fix a debugfs backward compatibility issue introduced by a previous patch
- Write to the interrupt enablement register, not the status register when
  setting MTL interrupts
- Call netif_napi_del whenever the ndo_stop operation is called (to match
  the call to netif_napi_add on ndo_open)
- Performance enhancements:
  - Adjusted default coalescing settings
  - AXI DMA changes (burst length size and cache settings)
  - ioread/iowrite reduction during interrupt
  - Napi poll updates
- AXI DMA settings based on device tree property to account for a change in
  the ARM64 default cache operations assignment

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Base AXI DMA cache settings on device tree

The default cache operations for ARM64 were changed during 3.15.
To use coherent operations a "dma-coherent" device tree property
is required. If that property is not present in the device tree
node then the non-coherent operations are assigned for the device.

Add support to the amd-xgbe driver to assign the AXI DMA cache settings
based on whether the "dma-coherent" property is present in the device
node. If present, use settings that work with the caches. If not
present, use settings that do not look at the caches.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Performance enhancements

This patch provides some general performance enhancements for the
driver:
  - Modify the default coalescing settings (reduce usec, increase frames)
  - Change the AXI burst length to 256 bytes (default was 16 bytes which
    was smaller than a cache line)
  - Change the AXI cache settings to write-back/write-allocate which
    allocate cache entries for received packets during the DMA since the
    packet will be processed soon afterwards
  - Combine ioread/iowrite when disabling both the Tx and Rx interrupts
  - Change to processing the Tx/Rx channels in pairs
  - Only recycle the Rx descriptors when a threshold of dirty descriptors
    is reached

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Call netif_napi_del on ndo_stop operation

Currently the napi context is added using netif_napi_add each time
the ndo_open operation is called. However, there is not a
corresponding netif_napi_del call during the ndo_stop operation. If
the device ndo_open operation was called more than once an infinite
loop occurs during module unload. Add a call to netif_napi_del during
the ndo_stop operation.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Clear the proper MTL interrupt register

When initializing the MTL interrupts the interrupt status
register is written to instead of the interrupt enable register.
Since no MTL interrupts are being enabled and the default state
is for MTL interrupts to be disabled this did not cause a problem,
but needs to be fixed to target the correct register.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix debugfs compatibility change with kstrtouint

The initial change from sscanf to kstrtouint broke backward
compatibility by using a base of "0" in the kstrtouint call.
This allowed for entering decimal, hexadecimal or octal as
input where previously the sscanf always interpreted the input
as hexadecimal. Additionally, -EIO was returned on error prior
to this change, whereas now whatever error value kstrtouint
returns is passed back.

Change the base value of the kstrtouint from 0 to 16 and return
-EIO on error.
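
As a rough userspace illustration of why the base matters (a sketch only:
the C library's strtoul() is used as a stand-in for kstrtouint(), and
parse_reg_value() is a hypothetical helper, not the driver's debugfs code),
the same input string yields different values with base 0 and base 16:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse 'buf' as an unsigned int using 'base'.
 * Returns 0 on success and -EIO on any parse error, mirroring the
 * "always return -EIO" behaviour described in the commit message.
 * strtoul() stands in for the kernel's kstrtouint() here.
 */
static int parse_reg_value(const char *buf, int base, unsigned int *out)
{
	char *end;
	unsigned long val;

	errno = 0;
	val = strtoul(buf, &end, base);
	if (errno || end == buf || *end != '\0' || val > UINT_MAX)
		return -EIO;

	*out = (unsigned int)val;
	return 0;
}
```

With base 16 the string "10" parses as 16, while with base 0 it would be
taken as decimal 10, which is the kind of silent change in meaning that the
patch reverts.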

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: arcnet: Remove "#define bool int"

The header file include/linux/arcdevice.h #defines bool to int, if
bool is not already #defined. However, the files which use that header
file seem to rely on that #define (unconditionally) being in effect:
the prototypes for the functions arcrimi_reset, com20020_reset,
com90io_reset, com90xx_reset (whose addresses are assigned to the
hw.reset member of struct arcnet_local) use int explicitly.

Moreover, that #define is an accident waiting to happen (scenario:
inclusion of arcdevice.h followed by inclusion of some header which
declares function prototypes using bool). Also, #include
<linux/types.h> must appear before #include <linux/arcdevice.h> (the
compiler wouldn't like "typedef _Bool int").

Since none of the files using arcdevice.h declare variables of type
"bool", the patch is actually quite simple, unlike the commit message.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: fix return values in enic_set_coalesce

enic_set_coalesce() has two problems.

* It should return -EINVAL and not -EOPNOTSUPP for invalid coalesce values.

* In case of MSIX, enic_set_coalesce returns an error after partially applying
the requested coalescing settings. We should either apply all the requested
settings and return success, or apply none and return an error.

* This patch also simplifies the algo.
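
The "apply all or apply none" rule above can be sketched as a validate-then-
apply split. This is a minimal userspace model under stated assumptions: the
structures, the set_coalesce() name and the MAX_COALESCE_USECS limit are
hypothetical and are not taken from the enic driver.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_COALESCE_USECS 125u /* placeholder limit, not the enic value */

struct coalesce_req {
	unsigned int tx_usecs;
	unsigned int rx_usecs;
};

struct queue_cfg {
	unsigned int usecs;
};

static int set_coalesce(struct queue_cfg *txq, size_t ntx,
			struct queue_cfg *rxq, size_t nrx,
			const struct coalesce_req *req)
{
	size_t i;

	/* Phase 1: validate everything; nothing has been modified yet. */
	if (req->tx_usecs > MAX_COALESCE_USECS ||
	    req->rx_usecs > MAX_COALESCE_USECS)
		return -EINVAL;

	/* Phase 2: apply; this phase cannot fail. */
	for (i = 0; i < ntx; i++)
		txq[i].usecs = req->tx_usecs;
	for (i = 0; i < nrx; i++)
		rxq[i].usecs = req->rx_usecs;

	return 0;
}
```

Because every check happens before the first write, an -EINVAL return
guarantees that no queue was touched.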

This was introduced by
'7c2ce6e60f703 enic: Add support for adaptive interrupt coalescing'

These changes were suggested by Ben Hutchings here
http://www.spinics.net/lists/netdev/msg283972.html

Also change enic driver version.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove no longer relevant vlan warnings

These warnings are no longer relevant. Even when the last slave is
removed, there is a valid address assigned to the bond (random).
The correct functionality of vlans is ensured by maintaining the unicast
list in vlan_sync_address().

Suggested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'at86rf230-next'

Alexander Aring says:

====================
at86rf230: rework driver implementation

This patch series includes a rework of the at86rf230 driver.

There are several changes:

- Add regmap support.
- Merge at86rf212 operations with generic at86rf2xx operations; all chips
   support these operations.
- Drop the irqworker. This is a workqueue which is scheduled by an irq to
   handle synchronous spi transfers. Use asynchronous spi handling instead,
   so that no scheduler is involved in irq handling.
- Also detected some bugs when receiving frames: the CRC can be correct while
   the 802.15.4 frame length is above 127 bytes. This would crash the whole
   kernel (but should be handled by the mac layer). Another bug is the handling
   of RX_SAFE_MODE, which protects the frame buffer until a readout. This is
   currently not working because we read out the buffer twice, the first time
   to get the frame size. The solution is to always read out the whole frame
   buffer.
- Added some timing-relevant things from the datasheet for state changes and
   from the IEEE 802.15.4 standard, like interframe spacing. Interframe spacing
   is needed to insert some receive time between frame transmissions. This
   should also be handled by the MAC layer, but for now it is worked around
   inside the driver layer.
- Add some callback settings for chip-specific handling, instead of runtime
   decisions such as if (is_chip_type()). Callbacks are set only once at probe
   time.
- We don't use a forced state change anymore. A forced state change aborts
   the reception of frames when we want to transmit a new frame. This should
   decrease the drop rate of packets.
- And many other changes and bug fixes...

changes since v3:
- fix irq polarity in patch ("at86rf230: rework irq_pol setting").

changes since v2:
- add check if necessary functions are implemented when hw flags are set in patch
   ("mac802154: at86rf230: add hw flags and merge ops"). I chose the second variant.
- remove unnecessary includes for workqueue and mutex in patch
   ("at86rf230: rework transmit and receive").
- remove unnecessary cast in patch ("at86rf230: rework transmit and receive").
- activate regmap cache with REGCACHE_RBTREE in patch
   ("at86rf230: add regmap support").
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add new author

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add sleep cycle timing

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add timing for channel switch

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: rework reset to trx_off state change

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: rework state change and start/stop

This patch removes the current synchronous state change function and adds a
new function for a state assert. Change the start and stop callbacks to
use this new synchronous state change behaviour. It's a wrapper around the
async state change function.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: rework irq_pol setting

This patch reworks the irq_pol register setting for rising and falling
interrupt settings only. The default behaviour should be rising flag.

Also use IRQ_TYPE_* defines instead of IRQF_* defines. There is no
functionality change but irq_get_trigger_type returns IRQ_TYPE_* defines.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: move RX_SAFE_MODE setting to hw_init

There is no need to set this bit in the start callback, which could be
called more than once.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: rework transmit and receive handling

This patch is a complete reimplementation of transmit and receive
handling for the at86rf230 driver.

It also solves six bugs:

First:

RX_SAFE_MODE is enabled and the transceiver doesn't leave the
receive state until the framebuffer has been read by a CMD_FB command.
This is useful for reading out the frame without entering another receive
or transmit state, which would otherwise overwrite the frame.
The current driver issues CMD_FB twice, and the first call releases this
protection.

Second:

Sometimes the CRC calculation is correct and the length field is greater
than 127. The current mac802154 layer and the at86rf2xx filter don't check
for this and the kernel crashes. In this case the frame is corrupted, and we
send the whole receive buffer to the next layer, which can be useful for
sniffing.
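
The length check can be sketched as below. This is an illustration only:
rx_frame_len() and FRAME_MAX_LEN are hypothetical names, and the sketch
simply falls back to the maximum 802.15.4 frame length of 127 bytes when
the length field read from the frame buffer is out of range.

```c
#include <stdint.h>

#define FRAME_MAX_LEN 127u /* maximum 802.15.4 PHY frame length in bytes */

/*
 * Hypothetical helper: the first byte read from the frame buffer is the
 * length field. If it is out of range, do not trust it and fall back to
 * the maximum frame length, so the whole buffer is passed up.
 */
static uint8_t rx_frame_len(uint8_t len_field)
{
	if (len_field > FRAME_MAX_LEN)
		return FRAME_MAX_LEN;
	return len_field;
}
```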

Third:
There is an undocumented race condition. When we go into the
RX_AACK_ON state the transceiver could change into the RX_AACK_BUSY
state. This is normal behaviour. In this case the transceiver received
a SHR while the assert wasn't finished.

Fourth:
It also handles some more "correct" state changes. In ARET mode the
transceiver needs to go to TX_ON before it goes into
RX_AACK_ON.

Fifth:
The programming model [0] also describes error handling in ARET mode
if the trac status is different from zero. This patch adds support
for handling this.

Sixth:
In receive handling the transceiver should also get the trac status
according to [0]. The driver could use the trac status for error
statistics, but it doesn't currently do so. There may be some timing
behaviour involved, or reading this register may change some transceiver
states.

In addition the irqworker is removed. Instead we do async spi calls and
no scheduling is involved anymore. The transmit function is also
asynchronous but with wait_for_completion handling. The mac802154 layer
doesn't support asynchronous transmit handling right now.

The state change behaviour has now changed. Before, it was:

1. assert while(!STATE_TRANSITION_IN_PROGRESS)
2. state change
3. assert while(!STATE_TRANSITION_IN_PROGRESS)
4. assert once(wanted state != current state)

Sometimes an unexpected state change occurs when the assert in step 4 is violated.
The new state change behaviour is:

1. assert while(!STATE_TRANSITION_IN_PROGRESS)
2. state change
3. wait state change timing according to the datasheet
4. assert once(wanted state != current state)

This behaviour is described in the at86rf231 software programming model [0].
The state change documentation in this programming guide should also be valid
for at86rf212 and at86rf233 chips.

The transceiver doesn't do a FORCE_TX_ON when we want to transmit a PDU.
The new behaviour is a TX_ON followed by waiting for a receive time
(tFrame + tPAck). If we are still in RX_AACK_BUSY then we send a
FORCE_TX_ON as timeout handling. The difference is that FORCE_TX_ON aborts
receiving, whereas TX_ON waits until RX_AACK_BUSY is finished. This should
decrease the drop rate of packets.

[0] http://www.atmel.com/Images/AVR2022_swpm231-2.0.zip

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add support for at86rf23x desense

To set the CCA_ED_THRES register the calculation for at86rf23x differs
from that for at86rf212. This patch adds a new callback for this
calculation in the chip data struct.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: remove is212 and add driver data

This patch adds a new at86rf2xx_chip_data structure which holds device
specific attributes. Instead of runtime decisions "if (is212())" we set
callbacks/attributes during device detection.
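
The general pattern can be sketched as follows. Everything here is
hypothetical and for illustration only: the struct layout, the chip names,
the part numbers, the rssi_base values and the conversion callbacks are
placeholders and are not taken from the driver or from any datasheet.

```c
#include <stddef.h>

/* Hypothetical per-chip description, filled in once at detection time. */
struct chip_data {
	const char *name;
	int rssi_base;                  /* placeholder value */
	int (*level_to_reg)(int level); /* chip-specific conversion callback */
};

static int generic_level_to_reg(int level) { return level; }
static int scaled_level_to_reg(int level) { return level * 2; }

static const struct chip_data chip_a = { "chip-a", -90, generic_level_to_reg };
static const struct chip_data chip_b = { "chip-b", -100, scaled_level_to_reg };

/* Select the chip description from a part number read during probing. */
static const struct chip_data *detect_chip(int part_num)
{
	switch (part_num) {
	case 1:
		return &chip_a;
	case 2:
		return &chip_b;
	default:
		return NULL; /* unknown chip */
	}
}

/* Callers no longer branch on the chip type; they use the callback. */
static int set_level(const struct chip_data *chip, int level)
{
	return chip->level_to_reg(level);
}
```

The chip type is examined exactly once, in detect_chip(); every later call
site goes through the selected structure instead of an "if (is212())" test.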

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: rework detect device handling

This patch drops the current lowlevel spi calls for the detect device
function; instead we handle this via regmap. Also put the detection
in a separate function and set all device specific attributes during detection.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

at86rf230: add regmap support

This patch adds regmap support for the at86rf230 driver, drops the
lowlevel spi access functions and uses the regmap access functions instead.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mac802154: at86rf230: add hw flags and merge ops

This patch adds new mac802154 hw flags for transmit power, csma and
listen before transmit (lbt). These flags indicate that the transceiver
supports these features. If the flags are set and the driver doesn't
implement the necessary functions, then ieee802154_register_device
returns -ENOSYS "Function not implemented".

This patch also merges all at86rf230 operations into one operations structure
and sets the right hw flags for the at86rf230 transceivers.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-07-02

This series contains updates to i40e and i40evf.

Anjali fixes a possible race where we were trying to free the dummy packet
buffer in the function that created it, so cleanup the dummy packet buffer
in i40e_clean_tx_ring() instead.  Also fixes an issue where the filter
program routine was not checking if there were descriptors available for
programming a filter.

Mitch fixes unnecessary delays when sending the admin queue commands by
moving a declaration up one level so we do not dereference it out of scope.
Fixes an issue with the VF where if the admin queue interrupts get lost for
some reason, the VF communication will stall as the VFs have no way of
reaching the PF.  To alleviate this condition, go ahead and check the ARQ
every time we run the service task.  Updates i40evf to allow the watchdog
to fire vector 0 via software, which makes the driver tolerant of dropped
interrupts on that vector.

Paul fixes a shifted '1' to be unsigned to avoid shifting a signed integer.

Jesse disables TPH by default since it is currently not enabled in the
current hardware.  Also finishes the i40e implementation of get_settings
for ethtool.

Catherine adds a new variable (hw.phy.link_info.an_enabled) to track whether
auto-negotiation is enabled, along with the functionality to update the
variable.  Adds the functionality to set the requested flow control mode.
Adds i40e implementation of setpauseparam and set_settings to ethtool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fec-next'

Russell King says:

====================
Freescale ethernet driver updates

Here's the first batch of patches for the Freescale FEC ethernet driver.
They require the previously applied "net: fec: Don't clear IPV6 header
checksum field when IP accelerator enable" patch.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: fix missing kmalloc() failure check in fec_enet_alloc_buffers()

fec_enet_alloc_buffers() assumes that kmalloc() will never fail, which
is an invalid assumption. Fix this by implementing a common error
cleanup path, and use it to also clean up after failed bounce buffer
allocation.
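
A common error cleanup path of this kind can be sketched in userspace C.
This is a minimal model under stated assumptions: struct ring, RING_SIZE,
ring_alloc_hook() and the allocs_until_failure fault-injection counter are
all hypothetical, and malloc() stands in for kmalloc().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

struct ring {
	void *buf[RING_SIZE];
};

static int allocs_until_failure = -1; /* < 0 means never fail */

/* Hypothetical allocator hook so that failures can be injected. */
static void *ring_alloc_hook(size_t size)
{
	if (allocs_until_failure == 0)
		return NULL;
	if (allocs_until_failure > 0)
		allocs_until_failure--;
	return malloc(size);
}

/* Free every allocated entry and clear the pointers; safe to call twice. */
static void ring_free_buffers(struct ring *r)
{
	size_t i;

	for (i = 0; i < RING_SIZE; i++) {
		free(r->buf[i]);
		r->buf[i] = NULL;
	}
}

static int ring_alloc_buffers(struct ring *r)
{
	size_t i;

	/* Start from a known state so the cleanup path is always valid. */
	for (i = 0; i < RING_SIZE; i++)
		r->buf[i] = NULL;

	for (i = 0; i < RING_SIZE; i++) {
		r->buf[i] = ring_alloc_hook(64);
		if (!r->buf[i])
			goto err_alloc;
	}
	return 0;

err_alloc:
	/* Single cleanup path shared by every failure point. */
	ring_free_buffers(r);
	return -ENOMEM;
}
```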

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: ensure fec_enet_free_buffers() properly cleans the rings

Ensure that we do not double-free any allocations, and that any transmit
skbuffs are properly freed when we clean up the rings.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: clean up transmit descriptor setup

Avoid writing any state until we're certain we can proceed with the
transmission: this avoids writing mapping error address values to the
descriptors, or setting the skbuff pointer until we have successfully
mapped the skb.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: make rx skb handling more robust

Allocate, and then map the receive skb before writing any data to the
ring descriptor or storing the skb. When freeing the receive ring
entries, unmap and free the skb, and then clear the stored skb pointer.

This means we have ring data and skb pointer in one of two states:
either both fully setup, or nothing setup.

This simplifies the cleanup, as we can use just the skb pointer to
indicate whether the descriptor is setup, and thus avoids potentially
calling dma_unmap_single() on a DMA error value.
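
The "both fully setup, or nothing setup" invariant can be modelled roughly
as below. This is a userspace sketch, not the FEC code: struct rx_slot, the
fake_dma_map()/fake_dma_unmap() helpers, the fault-injection flag and the
buffer size are hypothetical stand-ins for the skb, the descriptor and the
DMA API.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define RX_BUF_SIZE 2048u             /* placeholder buffer size */
#define FAKE_DMA_ERROR ((uintptr_t)0) /* stand-in for a DMA mapping error */

struct rx_slot {
	void *skb;          /* NULL means "descriptor not set up" */
	uintptr_t dma_addr; /* only valid while skb is non-NULL */
};

static int fake_map_should_fail; /* fault injection for the sketch */
static unsigned int fake_unmap_calls;

static uintptr_t fake_dma_map(void *buf)
{
	return fake_map_should_fail ? FAKE_DMA_ERROR : (uintptr_t)buf;
}

static void fake_dma_unmap(uintptr_t addr)
{
	(void)addr;
	fake_unmap_calls++;
}

/* Allocate and map first; only then publish both values to the slot. */
static int rx_slot_fill(struct rx_slot *slot)
{
	void *skb = malloc(RX_BUF_SIZE);
	uintptr_t addr;

	if (!skb)
		return -ENOMEM;

	addr = fake_dma_map(skb);
	if (addr == FAKE_DMA_ERROR) {
		free(skb);
		return -ENOMEM; /* slot left untouched */
	}

	slot->dma_addr = addr;
	slot->skb = skb;
	return 0;
}

/* The skb pointer alone tells us whether there is anything to undo. */
static void rx_slot_release(struct rx_slot *slot)
{
	if (!slot->skb)
		return;

	fake_dma_unmap(slot->dma_addr);
	free(slot->skb);
	slot->skb = NULL;
}
```

Since a failed mapping never reaches the slot, rx_slot_release() cannot end
up unmapping an error value.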

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove useless fep->opened

napi_disable() waits until the NAPI processing has completed, and then
prevents any further polls.  At this point, the driver then clears
fep->opened.  The NAPI poll function uses this to stop processing in
the receive path.  Hence, it will never see this variable cleared,
because the NAPI poll has to complete before it will be cleared.

Therefore, this variable serves no purpose, so let's remove it.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: stop the phy before shutting down the MAC

When the network interface goes down, stop the phy to prevent further
link up status changes before taking the MAC or netif sections down.
This prevents further reception of link up events which could
potentially call fec_restart().

Since phy_stop() takes the mutex which adjust_link() runs under, we
also ensure that adjust_link() will not already be processing a link
up event.

We also need to do this when suspending as well - we don't want a
mis-timed phy state change to restart the MAC after we have stopped
it for suspend, and thus need to restart the phy when resuming.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: ensure that a disconnected phy isn't configured

When we disconnect from a phy, we should forget our pointer to it so we
don't accidentally try to configure it. We handle a NULL phy pointer
correctly in most places, except fec_enet_set_pauseparam(). Fix this
too.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: remove checking for NULL phy_dev in fec_enet_close()

fep->phy_dev can not be NULL here for two reasons:
- fec_enet_open() will have successfully connected the phy, or will have
failed.
- fec_enet_open() will have called phy_start(fep->phy_dev), which
unconditionally dereferences this pointer.

If it were to be NULL here, then fec_enet_open() will have already
oopsed.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: use netif_tx_disable() rather than netif_stop_queue()

We use netif_stop_queue() in several places where we want to ensure that
the start_xmit function is not running. netif_stop_queue() is not
sufficient to achieve that - it merely sets a flag to indicate that the
transmit queue(s) should not be run.

netif_tx_disable() gives this guarantee, since it takes the transmit
queue lock while marking the queue stopped. This will wait for the
transmit function to complete before returning.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: fix interrupt handling races

While running: while :; do iperf -c <HOST> -P 4; done, transmit timeouts
are regularly reported.  With the tx ring dumping in place, we can see
that all entries are in use, and the hardware has finished transmitting
these packets.  However, the driver has not reclaimed these ring
entries.

This can occur if the interrupt handler is invoked at the wrong moment -
eg:

CPU0                    CPU1
fec_enet_tx()
                        interrupt, IEVENT = FEC_ENET_TXF
                        FEC_ENET_TXF cleared
                        napi_schedule_prep()
napi_complete()

The result is that we clear the transmit interrupt, but we don't trigger
any cleaning of the transmit ring.  Instead, use a different strategy:

- When receiving a transmit or receive interrupt, disable both tx and rx
  interrupts, but do not acknowledge them.  Schedule a napi poll.  Don't
  loop.

- When we are polled, read IEVENT, acknowledging the pending transmit
  and receive interrupts, before then going on to process the
  appropriate rings.

This allows us to avoid the race, and has a number of other advantages:
- we cut down on the number of transmit interrupts we have to process.
- we only look at the rings which have pending events.
- we gain additional throughput: the iperf total bandwidth increases
  from about 180Mbps to 240Mbps:

[  3]  0.0-10.0 sec  68.1 MBytes  57.0 Mbits/sec
[  5]  0.0-10.0 sec  72.4 MBytes  60.5 Mbits/sec
[  4]  0.0-10.1 sec  76.1 MBytes  63.5 Mbits/sec
[  6]  0.0-10.1 sec  71.9 MBytes  59.9 Mbits/sec
[SUM]  0.0-10.1 sec   288 MBytes   241 Mbits/sec
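
The mask-in-the-handler, acknowledge-in-the-poll strategy described above
can be modelled in a few lines. This is a simplified single-threaded sketch:
struct nic_model, the EV_* bits and both functions are hypothetical, and
re-enabling the interrupts at the end of the poll is an assumption of the
sketch rather than something stated in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

#define EV_TXF 0x1u /* hypothetical "transmit done" event bit */
#define EV_RXF 0x2u /* hypothetical "receive done" event bit */

/* Hypothetical model of the device and driver state. */
struct nic_model {
	uint32_t ievent;      /* pending events, set by the "hardware" */
	uint32_t imask;       /* enabled interrupt sources */
	bool poll_scheduled;
	unsigned int tx_cleaned;
	unsigned int rx_cleaned;
};

/* Interrupt handler: mask, do not acknowledge, schedule the poll. */
static void nic_irq(struct nic_model *n)
{
	if (n->ievent & n->imask & (EV_TXF | EV_RXF)) {
		n->imask &= ~(EV_TXF | EV_RXF);
		n->poll_scheduled = true;
	}
}

/* Poll: read and acknowledge pending events, then service those rings. */
static void nic_poll(struct nic_model *n)
{
	uint32_t events = n->ievent & (EV_TXF | EV_RXF);

	n->ievent &= ~events; /* acknowledge what we are about to handle */
	if (events & EV_TXF)
		n->tx_cleaned++;
	if (events & EV_RXF)
		n->rx_cleaned++;

	n->poll_scheduled = false;
	n->imask |= EV_TXF | EV_RXF; /* assumed: unmask once polling is done */
}
```

Because the event bits are only cleared inside the poll, an event can no
longer be acknowledged without the corresponding ring being looked at.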

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: fix ethtool set_pauseparam duplex bug

Setting the pause parameters causes a running network interface to be
restarted. However, the restart forces the FEC into half-duplex mode,
whether or not the remote end is in half-duplex mode. Misconfigured
duplex mode is a known source of problems on a link.

Fix this by always preserving the duplex mode on configuration changes.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fec: iMX6 FEC does not support half-duplex gigabit

The iMX6 gigabit FEC does not support half-duplex gigabit operation.
Phys attached to the FEC may support this, and we currently do nothing
to disable this feature. This may result in an invalid configuration.
Mask out phy support for gigabit half-duplex operation.

Acked-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-hash-tx'

Tom Herbert says:

====================
net: Improvements and applications of packet flow hash in transmit path

This patch series includes some patches which improve and make use
of skb->hash in the transmit path.

What is included:

- Infrastructure to save a precomputed hash in the sock structure.
  For connected TCP and UDP sockets we only need to compute the
  flow hash once and not once for every packet.
- Call skb_get_hash in get_xps_queue and __skb_tx_hash. This eliminates
  the awkward access to skb->sk->sk_hash in the lower transmit path.
- Move UDP source port generation into a common function in udp.h. This
  implementation is mostly based on vxlan_src_port.
- Use non-zero IPv6 flow labels in flow_dissector as port information
  for flow hash calculation.
- Implement automatic flow label generation on transmit (per RFC 6438).
- Don't repeatedly try to compute an L4 hash in skb_get_hash if we've
  already tried to find one in software stack calculation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Only do flow_dissector hash computation once per packet

Add sw_hash flag to skbuff to indicate that skb->hash was computed
from flow_dissector. This flag is checked in skb_get_hash to avoid
repeatedly trying to compute the hash (ie. in the case that no L4 hash
can be computed).
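
The effect of the flag can be sketched with a small model. The names below
(struct pkt, dissect_and_hash(), pkt_get_hash()) are hypothetical stand-ins
for the skb and the flow dissector, and the multiplicative mixing step is a
placeholder rather than the kernel's hash.

```c
#include <stdbool.h>
#include <stdint.h>

struct pkt {
	uint32_t flow_key; /* stand-in for addresses and ports */
	uint32_t hash;
	bool l4_hash;      /* hash is a real L4 hash */
	bool sw_hash;      /* hash was already computed in software */
};

static unsigned int dissect_calls;

/* Placeholder for the expensive dissect-and-hash step. */
static uint32_t dissect_and_hash(const struct pkt *p)
{
	dissect_calls++;
	return p->flow_key * 2654435761u;
}

/* Only fall back to software hashing if nobody has tried it yet. */
static uint32_t pkt_get_hash(struct pkt *p)
{
	if (!p->l4_hash && !p->sw_hash) {
		p->hash = dissect_and_hash(p);
		p->sw_hash = true;
	}
	return p->hash;
}
```

Even when no L4 hash could be found, the second and later calls return the
stored value instead of dissecting the packet again.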

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Implement automatic flow label generation on transmit

Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.
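
As a rough sketch of the rule described above (host byte order, with the
hypothetical names make_flowlabel() and FLOWLABEL_MASK; the real code works
on network-order fields), the label is derived from the hash only when none
has been set and the feature is enabled:

```c
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu /* the IPv6 flow label is 20 bits wide */

/*
 * Hypothetical helper: keep an existing non-zero label, otherwise take
 * the low 20 bits of the packet hash if automatic labels are enabled.
 */
static uint32_t make_flowlabel(uint32_t flowlabel, uint32_t hash,
			       int autolabel)
{
	if (flowlabel == 0 && autolabel)
		flowlabel = hash & FLOWLABEL_MASK;
	return flowlabel;
}
```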

Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.

By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.

It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.

Performance impact:

Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet, which explains the slight regression. In the TCP case
the hash is saved in the socket so there is no regression.

Automatic flow labels disabled:

  TCP_RR:
    86.53% CPU utilization
    127/195/322 90/95/99% latencies
    1.40498e+06 tps

  UDP_RR:
    90.70% CPU utilization
    118/168/243 90/95/99% latencies
    1.50309e+06 tps

Automatic flow labels enabled:

  TCP_RR:
    85.90% CPU utilization
    128/199/337 90/95/99% latencies
    1.40051e+06 tps

  UDP_RR:
    92.61% CPU utilization
    115/164/236 90/95/99% latencies
    1.4687e+06 tps

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: Use IPv6 flow label in flow_dissector

This patch implements the receive side to support RFC 6438 which is to
use the flow label as an ECMP hash. If an IPv6 flow label is set
in a packet we can use this as input for computing an L4-hash. There
should be no need to parse any transport headers in this case.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: Call udp_flow_src_port

In vxlan and OVS vport-vxlan call common function to get source port
for a UDP tunnel. Removed vxlan_src_port since the functionality is
now in udp_flow_src_port.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Add function to make source port for UDP tunnels

This patch adds udp_flow_src_port function which is intended to be
a common function that UDP tunnel implementations call to set the source
port. The source port is chosen so that a hash over the outer headers
(IP addresses and UDP ports) acts as suitable hash for the flow of the
encapsulated packet. In this manner, UDP encapsulation works with RSS
and ECMP based on the inner flow.
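
One way to map a flow hash onto a port range is sketched below. This is an
illustration under stated assumptions: flow_src_port() is a hypothetical
name, the result is in host byte order, and the caller is assumed to pass
min < max.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a 32-bit flow hash onto the port range
 * [min, max) with a multiply and shift instead of a modulo.
 * Assumes min < max.
 */
static uint16_t flow_src_port(uint32_t hash, uint16_t min, uint16_t max)
{
	uint32_t range = (uint32_t)(max - min);

	return (uint16_t)((((uint64_t)hash * range) >> 32) + min);
}
```

Packets of the same inner flow carry the same hash and therefore the same
outer source port, so RSS and ECMP keep them on one path.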

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Call skb_get_hash in get_xps_queue and __skb_tx_hash

Call standard function to get a packet hash instead of taking this from
skb->sk->sk_hash or only using skb->protocol.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Save TX flow hash in sock and set in skbuf on xmit

For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where there is no socket (e.g. time-wait, syn-recv, syn-cookies).

This patch adds sk_txhash to the sock structure. The inet_set_txhash and
ip6_set_txhash functions are added; they are called from the points in
TCP and UDP where the socket moves to the established state.

skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. It is called in the UDP and TCP transmit paths when
transmitting within the context of a socket.
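
The caching pattern can be sketched in user space as follows. The struct
and function names are hypothetical stand-ins rather than the kernel
definitions, mix_tuple is a placeholder mix (not the kernel's hash), and
reserving 0 to mean "no hash stored" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct conn {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint32_t txhash;	/* 0 means "not computed yet" */
};

struct pkt {
	uint32_t hash;
	bool l4_hash;
};

/* Placeholder mix over the 4-tuple; never returns 0. */
uint32_t mix_tuple(uint32_t saddr, uint32_t daddr,
		   uint16_t sport, uint16_t dport)
{
	const uint32_t words[3] = {
		saddr, daddr, ((uint32_t)sport << 16) | dport
	};
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 3; i++)
		h = (h ^ words[i]) * 16777619u;
	return h ? h : 1;
}

/* Called once, when the connection becomes established. */
void conn_set_txhash(struct conn *c)
{
	assert(c != NULL);
	c->txhash = mix_tuple(c->saddr, c->daddr, c->sport, c->dport);
}

/* Called per packet on transmit: a copy, not a recomputation. */
void pkt_set_hash_from_conn(struct pkt *p, const struct conn *c)
{
	assert(p != NULL && c != NULL);
	if (c->txhash) {
		p->hash = c->txhash;
		p->l4_hash = true;
	}
}
```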

Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash is called on every TX packet to
create a UDP source port).

Before fix:

  95.02% CPU utilization
  154/256/505 90/95/99% latencies
  1.13042e+06 tps

  Time in functions:
    0.28% skb_flow_dissect
    0.21% __skb_get_hash

After fix:

  94.95% CPU utilization
  156/254/485 90/95/99% latencies
  1.15447e+06 tps

  Neither __skb_get_hash nor skb_flow_dissect appear in perf

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

flow_dissector: Abstract out hash computation

Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.
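
The shape of such a split can be sketched as a pure function over a
keys structure, callable both from a packet path (after parsing) and
from a path that only knows addresses and ports. The struct layout,
names, and mixing function below are assumptions for illustration and do
not reproduce the kernel's flow_keys or its hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dissected flow: no packet required. */
struct flow_keys_sketch {
	uint32_t src;
	uint32_t dst;
	uint16_t sport;
	uint16_t dport;
	uint8_t ip_proto;
};

/* Hash computed from the keys alone, using a placeholder mix. */
uint32_t hash_from_keys(const struct flow_keys_sketch *keys)
{
	assert(keys != NULL);
	const uint32_t words[4] = {
		keys->src,
		keys->dst,
		((uint32_t)keys->sport << 16) | keys->dport,
		keys->ip_proto,
	};
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < 4; i++)
		h = (h ^ words[i]) * 16777619u;
	return h;
}
```

Two callers that fill in the same keys get the same hash, whether the
keys came from parsing a packet or from connection state.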

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'systemport-next'

Florian Fainelli says:

====================
net: systemport: PM and Wake-on-LAN support

This patchset brings Power Management and Wake-on-LAN support to the
Broadcom SYSTEM PORT driver.

S2 and S3 modes are supported, while we only support Wake-on-LAN using
Magic Packets for now.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: add Wake-on-LAN support

Support for Wake-on-LAN using Magic Packet with or without SecureOn
password is implemented doing the following:

- setting the password in the relevant UniMAC registers
- flagging the device as a wakeup source for the system, as well as
  its Wake-on-LAN interrupt
- preparing the hardware for entering WoL mode
- enabling the MPD interrupt to wake us

The Device Tree binding documentation is also updated to specify the
third optional Wake-on-LAN interrupt line.
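
For context, the frame that the hardware matches on can be sketched from
the sender's side. A Magic Packet payload is six 0xFF bytes followed by
sixteen copies of the target MAC address, optionally followed by a
six-byte SecureOn password. The helper below is a hypothetical
user-space illustration and is unrelated to the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN			6
#define MAGIC_SYNC_LEN		6
#define MAGIC_MAC_REPEATS	16
#define MAGIC_BASE_LEN		(MAGIC_SYNC_LEN + MAGIC_MAC_REPEATS * MAC_LEN)
#define SECUREON_LEN		6

/* Fill buf with a Magic Packet payload for mac. password may be NULL;
 * otherwise it points at SECUREON_LEN bytes appended after the MAC
 * copies. Returns the payload length, or 0 if buf is too small. */
size_t build_magic_payload(uint8_t *buf, size_t buf_len,
			   const uint8_t *mac, const uint8_t *password)
{
	size_t need = MAGIC_BASE_LEN + (password ? SECUREON_LEN : 0);

	assert(buf != NULL && mac != NULL);
	if (buf_len < need)
		return 0;

	memset(buf, 0xFF, MAGIC_SYNC_LEN);
	for (size_t i = 0; i < MAGIC_MAC_REPEATS; i++)
		memcpy(buf + MAGIC_SYNC_LEN + i * MAC_LEN, mac, MAC_LEN);
	if (password)
		memcpy(buf + MAGIC_BASE_LEN, password, SECUREON_LEN);
	return need;
}
```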

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: rename rx_csum_en to rx_chk_en

This boolean tells us whether we are using the RXCHK hardware block,
so use a variable name that reflects that. RXCHK might be used in the
future to implement Wake-on-LAN using ARP or unicast packets.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: systemport: add suspend and resume support

Implement the hardware-recommended suspend/resume procedure for
SYSTEMPORT. We leverage the previous factoring work such that we can
logically break all suspend/resume operations into distinct RX and TX
code paths.

When the system enters S3, we will lose all register contents, so
make sure that we correctly re-program all the hardware and software
views of the RX & TX rings as well.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>