git.karo-electronics.de Git - linux-beck.git/log

sctp: rwnd_press should be cumulative

rwnd_press tracks the pressure on the recieve window. Every
timer the receive buffer overlows, we truncate the receive
window and then grow it back. However, if we don't track
the cumulative presser, it's possible to reach a situation
when receive buffer is empty, but rwnd stays truncated.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: fast recovery algorithm is per association.

SCTP fast recovery algorithm really applies per association
and impacts all transports.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: update transport initializations

Right now, sctp transports are not fully initialized and when
adding any new fields, they have to be explicitely initialized.
This is prone to mistakes. So we switch to calling kzalloc()
which makes things much simpler.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Save some room in the sctp_transport by using bitfields

Saves some room in the sctp_transport structure.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Do not force T3 timer on fast retransmissions.

We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: remove 'resent' bit from the chunk

The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks. However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra. Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Make sure we always return valid retransmit path

commit 4951feda0c60d1ef681f1a270afdd617924ab041
    sctp: Do no select unconfirmed transports for retransmissions

added code to make sure that we do not select unconfirmed paths
for data transmission.  This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable.  In that case, the next
retransmit path returned is NULL and that causes a kernel crash.

The solution is to only change retransmit paths if we found one to use.

Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: cleanup: remove duplicate assignment

This assignment isn't needed because we did it earlier already.

Also another reason to delete the assignment is because it triggers a
Smatch warning about checking for NULL pointers after a dereference.

Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: implement sctp association probing module

This patch implement sctp association probing module, the module
will be called sctp_probe.

This module allows for capturing the changes to SCTP association
state in response to incoming packets. It is used for debugging
SCTP congestion control algorithms.

Usage:
  $ modprobe sctp_probe [full=n] [port=n] [bufsize=n]
  $ cat /proc/net/sctpprobe

  The output format is:
    TIME     ASSOC     LPORT RPORT MTU    RWND  UNACK <REMOTE-ADDR   STATE  CWND   SSTHRESH  INFLIGHT  PARTIAL_BYTES_ACKED MTU> ...

  The output will be like this:
    9.226086 c4064c48  9000  8000  1500    53352     1 *192.168.0.19  1     4380    54784     1252        0     1500
    9.287195 c4064c48  9000  8000  1500    45144     5 *192.168.0.19  1     5880    54784     6500        0     1500
    9.289130 c4064c48  9000  8000  1500    42724     5 *192.168.0.19  1     7380    54784     6500        0     1500
    9.620332 c4064c48  9000  8000  1500    48284     4 *192.168.0.19  1     8880    54784     5200        0     1500
    ......

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: use sctp_chunk_is_data macro to decide a chunk is data chunk

sctp_chunk_is_data macro is defined to decide that
whether a chunk is data chunk or not.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Do no select unconfirmed transports for retransmissions

An unconfirmed transport is one that we have not been
able to reach since the beginning. There is no point in
trying to retrasnmit data on those transports. Also, the
specification forbids it due to security issues.

Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: fix to retranmit at least one DATA chunk

While doing retranmit, if control chunk exists, such as
FORWARD TSN chunk, and the DATA chunk can not be bundled with
this control chunk because of PMTU limit, no DATA chunk
will be retranmitted in the current implementation. This
patch makes sure to retranmit at least one DATA chunk in this case.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: missing set src and dest port while lookup output route

While lookup the output route, we do not set the src and dest
port. This will cause we got a wrong route if we had set the
outbund transport to IPsec with src or dst port.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: discard ABORT chunk with zero verification tag in COOKIE-WAIT state

In current implementation if ABORT chunk is received with T flag is set
and zero verification tag in COOKIE-WAIT state, the ABORT chunk will be
always accepted. This is because in COOKIE-WAIT state, the endpoint does
not know the peer's verification tag, and it's zero in the endpoint.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: assure at least one T3-rtx timer is running if a FORWARD TSN is sent

PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
C5) If a FORWARD TSN is sent, the sender MUST assure that at
least one T3-rtx timer is running.

So this patch fix to assure at least one T3-rtx timer is running
if a FORWARD TSN is or will to sent.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: send SHUTDOWN-ACK chunk back to the source.

SHUTDOWN-ACK is alaways sent to the primary path at the first time,
but should better transmit SHUTDOWN-ACK chunk to the same destination
transport address from which it received the SHUTDOWN chunk.
Based on the work from Wei Yongjun <yjwei@cn.fujitsu.com>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

sctp: Use correct address family in sctp_getsockopt_peer_addrs()

The function should use the address family of the address when
trying to determine the length of the structure.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>

ipv6: cleanup: remove unneeded null check

We dereference "sk" unconditionally elsewhere in the function.

This was left over from: b30bd282 "ip6_xmit: remove unnecessary NULL
ptr check". According to that commit message, "the sk argument to
ip6_xmit is never NULL nowadays since the skb->priority assigment
expects a valid socket."

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: potential uninitialized variable num_xfrms

potential uninitialized variable num_xfrms

fix compiler warning: 'num_xfrms' may be used uninitialized in this function.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
net/xfrm/xfrm_policy.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

net: speedup sock_recv_ts_and_drops()

sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some
profiles)

We can test all socket flags at once to make fast path fast again.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_ether: Identify MBM devices by GUID in MDLM descriptor

This patch removes vid/pid for Ericsson MBM devices from the whitelist set of
devices. The MBM devices are instead identified by GUID.

In order for cdc_ether to handle these devices the GUID in the MDLM descriptor
is tested. All MBM devices currently handled by cdc_ether as well as future
CDC Ethernet MBM devices can be identified by the GUID.

This is the same solution used in Carl Nordbeck's mbm driver,
http://kerneltrap.org/mailarchive/linux-usb/2008/11/17/4141384/thread

I post this as RFC to get feedback on however cdc_ether is the correct place to
do the binding, or if it should be done in a separate driver, e.g. zaurus.

Signed-off-by: Jonas Sjöquist <jonas.sjoquist@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

forcedeth: Stay in NAPI as long as there's work

The following does the same thing without the extra overhead
of testing all the registers. It also handles the out of memory
case.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Tested-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ip_queue_rcv_skb() helper

When queueing a skb to socket, we can immediately release its dst if
target socket do not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This to benefit from a hot cache line and avoid the receiver, possibly
on another cpu, to dirty this cache line himself.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: speedup udp receive path

Since commit 95766fff ([UDP]: Add memory accounting.),
each received packet needs one extra sock_lock()/sock_release() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now take the socket spinlock only once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: Clean up left over prototype of igb_get_hw_dev_name()

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

wireless: Fix merge.

in your merge in 5c01d5669356e13f0fb468944c1dd4c6a7e978ad you added "int
i;" into wl1271_main.c which is unused in that function.

This patch fixes the merge problem:

Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bugfix: Link selection was swapped in switch.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Bugfixes in CAIF netdevice for close and flow control

Changes:
o Bugfix: Flow control was causing the device to be destroyed.
o Bugfix: Handle CAIF channel connect failures.
o If the underlying link layer is gone the net-device is no longer removed,
but closed.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Rewritten socket implementation

Changes:
This is a complete re-write of the socket layer. Making the socket
implementation more aligned with the other socket layers and using more
of the support functions available in sock.c. Lots of code is copied
from af_unix (and some from af_irda).
Non-blocking mode should be working as well.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Disconnect without waiting for response

Changes:
o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
  disconnect, not waiting for any response from the modem. Due to this
  the function cfcnfg_linkdestroy_rsp does nothing anymore.
o Because disconnect may take down a connection before a connect response
  is received the function cfcnfg_linkup_rsp is checking if the client is
  still waiting for the response, if not a disconnect request is sent to
  the modem.
o cfctrl is no longer keeping track of pending disconnect requests.
o Added function cfctrl_cancel_req, which is used for deleting a pending
  connect request if disconnect is done before connect response is received.
o Removed unused function cfctrl_insert_req2
o Added better handling of connect reject from modem.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Add reference counting to service layer

Changes:
o Added functions cfsrvl_get and cfsrvl_put.
o Added support release_client to use by socket and net device.
o Increase reference counting for in-flight packets from cfmuxl

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Rename functions in cfcnfg and caif_dev

Changes:
o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
o Fixed typo cfcfg to cfcnfg
o Renamed linkid to channel_id
o Updated documentation in caif_dev.h
o Minor formatting changes

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Ldisc add permission check and mem-alloc error check

Changes:
   o Added permission checks for installing. CAP_SYS_ADMIN and
     CAP_SYS_TTY_CONFIG can install the ldisc.
   o Check if allocation of skb was successful.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Create multiple TX queues

Create a core TX queue and 2 hardware TX queues for each channel.
If separate_tx_channels is set, create equal numbers of RX and TX
channels instead.

Rewrite the channel and queue iteration macros accordingly.
Eliminate efx_channel::used_flags as redundant.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Test only the first pair of TX queues

This makes no immediate difference, but we definitely do not want
to test all TX queues once we allocate a pair of TX queues to each
channel.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Add Siena PHY BIST and cable diagnostic support

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Clean up efx_nic::irq_zero_count

There is no need for this to be unsigned long; make it unsigned int.
It does need a line in kernel-doc, so add that.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Add necessary parentheses to macro definitions in net_driver.h

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Break NAPI processing after one ring-full of TX completions

Currently TX completions do not count towards the NAPI budget. This
means a continuous stream of TX completions can cause the polling
function to loop indefinitely with scheduling disabled. To avoid
this, follow the common practice of reporting the budget spent after
processing one ring-full of TX completions.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Set PERIODIC_NOEVENT flag for MC_CMD_MAC_STATS

When set, an event is not sent whenever periodic MAC statistics are
raised. This avoids unnecessary wake-ups.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Update MCDI protocol definitions

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Enable IPv6 RSS using random key for Toeplitz hash

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Read MEM_STAT for SRM_PERR as well as MEM_PERR errors

Parity errors in different blocks of SRAM may set one of two different
interrupt flags.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Log specific message for failure of NVRAM self-test

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Extend the legacy interrupt workarounds

Siena has two problems with legacy interrupts:
  1. There is no synchronisation between the ISR read completion,
     and the interrupt deassert message.
  2. A downstream read at the "wrong" moment can return 0, and
     suppress generating the next interrupt.

Falcon should suffer from both of these, and it appears it does.
Enable EFX_WORKAROUND_15783 on Falcon as well.

Also, when we see queues == 0, ensure we always schedule or rearm
every event queue.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Reconfigure the XAUI serdes after an EM reset

Fix a regression introduced in d3245b28ef2a45ec4e115062a38100bd06229289
"sfc: Refactor link configuration".

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Stop masking out XGMII faults over reconfigures

The aim of this code was to avoid a spurious XGMII fault over a MAC
reconfigure. It's less relevant now that the PHY reconfigure isn't
called from the MAC reconfigure.

After applying this patch, our link stress test passed 48 hours of
testing without ever resetting the PHY.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Handle serious errors in exactly one interrupt handler

'Fatal' errors set an interrupt flag associated with a specific event
queue; only read the syndrome vector if we see that queue's flag set
(legacy interrupts) or in the interrupt handler for that queue (MSI).

Do not ignore an interrupt if the fatal error flag is set but specific
error flags are all zero. Even if we don't schedule a reset, we must
respect the queue mask and rearm the appropriate event queues.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Consistently report short MCDI responses as EIO

In some cases failing functions were returning 0 which is obviously wrong.
In other cases they were returning inappropriate error codes.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Ignore parity errors in the other port's SRAM

Siena has a separate SRAM bank for each port. On single-port boards
these can be merged together, so each port has an interrupt flag for
parity errors in the other port's SRAM. Currently we do not enable
such merging and should mask this interrupt source.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sky2: use the DMA state API instead of the pci equivalents

This replace the PCI DMA state API (include/linux/pci-dma.h) with the
DMA equivalents since the PCI DMA state API will be obsolete.

No functional change.

For further information about the background:

http://marc.info/?l=linux-netdev&m=127037540020276&w=2

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: Remove two prefetch()

1) Even on 64bit arches, sizeof(struct sk_buff) < 256
2) No need to prefetch same pointer twice.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Acked-by: Eliezer Tamir <eliezer@tamir.org.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: disable MSI-X by default on certain Cisco adapters

Due to an errata in 82598 parts MSI-X needs to be disabled
in certain ixgbe devices designed to transfer peer-to-peer
traffic on the PCIe bus. This patch sets the default
interrupt type to MSI rather than MSI-X for specific Cisco
ixgbe adapters.

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Acked-by: John Ronciak <john.ronciak@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast_flood cleanup

Move some declarations around to make it clearer which variables
are being used inside loop.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast port group RCU fix

The recently introduced bridge mulitcast port group list was only
partially using RCU correctly. It was missing rcu_dereference()
and missing the necessary barrier on deletion.

The code should have used one of the standard list methods (list or hlist)
instead of open coding a RCU based link list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast flood

Fix unsafe usage of RCU. Would never work on Alpha SMP because
of lack of rcu_dereference()

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: simplify multicast_add_router

By coding slightly differently, there are only two cases
to deal with: add at head and add after previous entry.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: add registers etc. printout code just before resetting adapters

This patch adds registers (,tx/rx rings' status and so on) printout
code just before resetting adapters. This will be helpful for detecting
the root cause of adapters reset.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: add registers etc. printout code just before resetting adapters

This patch adds registers (,tx/rx rings' status and so on) printout
code just before resetting adapters. This will be helpful for detecting
the root cause of adapters reset.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: add registers etc. printout code just before resetting adapters

This patch adds registers (,tx/rx rings' status and so on) printout
code just before resetting adapters. This will be helpful for detecting
the root cause of adapters reset.

Signed-off-by: Taku Izumi <izumi.taku@jp.fujitsu.com>
Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: Use netdev_<level>, pr_<level> and dev_<level>

This patch is an alternative to similar patch provided by Joe Perches.

Substitute DPRINTK macro for e_<level> that uses netdev_<level> and dev_<level>
similar to e1000e.
- Convert printk to pr_<level> where applicable.
- Use common #define pr_fmt for the driver.
- Use dev_<level> for displaying text in parts of the driver where the interface
name is not assigned (like e1000_param.c).
- Better align test with the new macros.

CC: Joe Perches <joe@perches.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()"

This reverts commit ff65e8275f6c96a5eda57493bd84c4555decf7b3.

As explained by Stephen Hemminger, the traversal doesn't require
RCU handling as we hold a lock.

The list addition et al. calls, on the other hand, do.

Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbevf: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgb: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igbvf: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: convert igb from using PCI DMA functions to using DMA API functions

This patch makes it so that igb now uses the DMA API functions instead of
the PCI API functions. To do this the pci_dev pointer that was in the
rings has been replaced with a device pointer, and as a result all
references to [tr]x_ring->pdev have been replaced with [tr]x_ring->dev.

This patch is based of of work originally done by Nicholas Nunley.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000: use DMA API instead of PCI DMA functions

Signed-off-by: Nicholas Nunley <nicholasx.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()

Noticed by Michał Mirosław.

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: set skb->rxhash

Implement the ->set_flags ethtool method to control NETIF_F_RXHASH and
set skb->rxhash to the HW calculated hash accordingly.

Follow Eric Dumazet's suggestion and use the hash value raw.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sb1250: setup the pdevice within the soc code

doing it within the driver does not look good.
And surely isn't how platform devices were meat to be used.

Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sb1250: remove CONFIG_SIBYTE_STANDALONE

CONFIG_SIBYTE_STANDALONE is gone since v2.6.31-rc1 ("MIPS: Sibyte:
Remove standalone kernel support")
This is a missing piece.

Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: disallow to use net_assign_generic externally

Now there's no need to use this fuction directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make this static to do not tempt potentional users.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: increase serial number length

Some boards have longer serial numbers in their VPD, up to 24 bytes.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: parse the VPD instead of relying on a static VPD layout

Some boards' VPDs contain additional keywords or have longer serial numbers,
meaning the keyword locations are variable. Ditch the static layout and
use the pci_vpd_* family of functions to parse the VPD instead.

Signed-off-by: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_add_backlog() take rmem_alloc into account

Current socket backlog limit is not enough to really stop DDOS attacks,
because user thread spend many time to process a full backlog each
round, and user might crazy spin on socket lock.

We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let user run without being slow down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on a 8 core machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: batch skb dequeueing from softnet input_pkt_queue

batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled.

Note: in the worst case, the number of packets in a softnet_data may
be double of netdev_max_backlog.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make RFS socket operations not be inet specific.

Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

ixgbe: Properly display 1 gig downshift warning for backplane

Description: When using Intel smartspeed, the patch displays a
warning when the link down shifts to 1 Gig.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: cleanup ethtool autoneg input

The way we were setting autoneg via ethtool was inconstant with that
of our other drivers.  It will change the following:

If autoneg is off:
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Before:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  off
RX:             off
TX:             off

Now:
>ethtool -A eth0 autoneg on
>ethtool -a eth0
Pause parameters for eth0:

Autonegotiate:  on
RX:             on
TX:             on

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbevf: Fix link speed display

The ixgbevf driver would always report 10Gig speeds even when the link
speed is downshifted to 1Gig. This patch fixes that problem.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: reimplement softnet_data.output_queue as a FIFO queue

reimplement softnet_data.output_queue as a FIFO queue to keep the
fairness among the qdiscs rescheduled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
----
include/linux/netdevice.h | 1 +
net/core/dev.c | 22 ++++++++++++----------
2 files changed, 13 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6

ixgb: Use pr_<level> and netdev_<level>

Convert DEBUGOUTx to pr_debug
Convert DEBUGFUNC to more commonly used ENTER
Convert mac address output to %pM
Use #define pr_fmt
Convert a few printks to pr_<level>
Improve ixgb_mc_addr_list_update: use a temporary for current mc address
Use etherdevice.h functions for mac address testing

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: ixgbe_down needs to stop dev_watchdog

There is a small race between when the tx queues are stopped
and when netif_carrier_off() is called in ixgbe_down. If the
dev_watchdog() timer fires during this time it is possible for
a false tx timeout to occur.

This patch moves the netif_carrier_off() so that it is called before
the tx queues are stopped preventing the dev_watchdog timer from
detecting false tx timeouts. The race is seen occosionally when
FCoE or DCB settings are being configured or changed.

Testing note, running ifconfig up/down will not reproduce this
issue because dev_open/dev_close call dev_deactivate() and then
dev_activate().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: add support for reporting 5GT/s during probe on PCIe Gen2

This change corrects the fact that we were not reporting Gen2 link speeds
when we were in fact connected at Gen2 rates.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: fix bug when EITR=0 causing no writebacks

writebacks can be held indefinitely by hardware if EITR=0, when
combined with TXDCTL.WTHRESH=8. When EITR=0, WTHRESH should be
set back to zero.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: enable extremely low latency

82598/82599 can support EITR == 0, which allows for the
absolutely lowest latency setting in the hardware. This disables
writeback batching and anything else that relies upon a delayed
interrupt. This patch enables the feature of "override" when a
user sets rx-usecs to zero, the driver will respect that setting
over using RSC, and automatically disable RSC. If rx-usecs is
used to set the EITR value to 0, then the driver should disable
LRO (aka RSC) internally until EITR is set to non-zero again.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igbvf: double increment nr_frags

There is no need to increment nr_frags because skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: double increment nr_frags

There is no need to increment nr_frags because skb_fill_page_desc increments
it.

Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix a lockdep rcu warning in __sk_dst_set()

__sk_dst_set() might be called while no state can be integrated in a
rcu_dereference_check() condition.

So use rcu_dereference_raw() to shutup lockdep warnings (if
CONFIG_PROVE_RCU is set)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: inet_rps_save_rxhash() argument is not const

const qualifier on sock argument is misleading, since we can modify rxhash.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

TCP: avoid to send keepalive probes if receiving data

RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

The acknowledgement packet is reseting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast router list manipulation

I prefer that the hlist be only accessed through the hlist macro
objects. Explicit twiddling of links (especially with RCU) exposes
the code to future bugs.

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: use is_multicast_ether_addr

Use existing inline function.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

Conflicts:
drivers/net/e100.c
drivers/net/e1000e/netdev.c

net: suppress RCU lockdep false positive in twsk_net()

Calls to twsk_net() are in some cases protected by reference counting
as an alternative to RCU protection.  Cases covered by reference counts
include __inet_twsk_kill(), inet_twsk_free(), inet_twdr_do_twkill_work(),
inet_twdr_twcal_tick(), and tcp_timewait_state_process().  RCU is used
by inet_twsk_purge().  Locking is used by established_get_first()
and established_get_next().  Finally, __inet_twsk_hashdance() is an
initialization case.

It appears to be non-trivial to locate the appropriate locks and
reference counts from within twsk_net(), so used rcu_dereference_raw().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb3: Wait longer for control packets on initialization

In some Power7 platforms, when using VIOS (Virtual I/O Server), we
need to wait longer for control packets to finish transfer during
initialization.
Without this change, initialization may fail prematurely.

Signed-off-by: Wen Xiong <wenxiong@us.ibm.com>
Signed-off-by: Andre Detsch <adetsch@br.ibm.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata

Prompted by a previous patch submitted by Matthew Garret <mjg@redhat.com>,
further digging into errata documentation reveals the current enabling or
disabling of ASPM L0s and L1 states for certain parts supported by this
driver are incorrect. 82571 and 82572 should always disable L1. For
standard frames, 82573/82574/82583 can enable L1 but L0s must be disabled,
and for jumbo frames 82573/82574 must disable L1. This allows for some
parts to enable L1 in certain configurations leading to better power
savings.

Also according to the same errata, Early Receive (ERT) should be disabled
on 82573 when using jumbo frames.

Cc: Matthew Garret <mjg@redhat.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>