git.karo-electronics.de Git - linux-beck.git/log

ipvs: add support for sync threads

Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: reduce sync rate with time thresholds

Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
timer that triggers new sync message. It can be used to
avoid sync messages for the specified period (or half of
the connection timeout if it is lower) if connection state
is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
sync_refresh_period/8. Useful to protect against loss of
sync messages.

Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: wakeup master thread

High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: always update some of the flags bits in backup

As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: fix ip_vs_try_bind_dest to rebind app and transmitter

Initially, when the synced connection is created we
use the forwarding method provided by master but once we
bind to destination it can be changed. As result, we must
update the application and the transmitter.

As ip_vs_try_bind_dest is called always for connections
that require dest binding, there is no need to validate the
cp and dest pointers.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: remove check for IP_VS_CONN_F_SYNC from ip_vs_bind_dest

As the IP_VS_CONN_F_INACTIVE bit is properly set
in cp->flags for all kind of connections we do not need to
add special checks for synced connections when updating
the activeconns/inactconns counters for first time. Now
logic will look just like in ip_vs_unbind_dest.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: ignore IP_VS_CONN_F_NOOUTPUT in backup server

As IP_VS_CONN_F_NOOUTPUT is derived from the
forwarding method we should get it from conn_flags just
like we do it for IP_VS_CONN_F_FWD_MASK bits when binding
to real server.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: use GFP_KERNEL allocation where possible

Use GFP_KERNEL instead of GFP_ATOMIC when registering an ipvs protocol.

This is safe since it will always run from a process context.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: SH scheduler does not need GFP_ATOMIC allocation

Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: LBLCR scheduler does not need GFP_ATOMIC allocation on init

Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: WRR scheduler does not need GFP_ATOMIC allocation

Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: DH scheduler does not need GFP_ATOMIC allocation

Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: LBLC scheduler does not need GFP_ATOMIC allocation on init

Schedulers are initialized and bound to services only
on commands.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: timeout tables do not need GFP_ATOMIC allocation

They are called only on initialization.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

netfilter: bridge: optionally set indev to vlan

if net.bridge.bridge-nf-filter-vlan-tagged sysctl is enabled, bridge
netfilter removes the vlan header temporarily and then feeds the packet
to ip(6)tables.

When the new "bridge-nf-pass-vlan-input-device" sysctl is on
(default off), then bridge netfilter will also set the
in-interface to the vlan interface; if such an interface exists.

This is needed to make iptables REDIRECT target work with
"vlan-on-top-of-bridge" setups and to allow use of "iptables -i" to
match the vlan device name.

Also update Documentation with current brnf default settings.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: use this_cpu_inc()

this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_ct_helper: allow to disable automatic helper assignment

This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_ct_ecache: refactor notifier registration

* ret variable initialization removed as useless
* similar code strings concatenated and functions code
flow became more plain

Signed-off-by: Tony Zelenoff <antonz@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

smsc75xx: let EEPROM determine GPIO/LED settings

This patch allows the GPIO/LED settings to be configured by the
EEPROM if present, and only sets the default values (LED outputs
for link/activity) when an EEPROM is not detected.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc75xx: eliminate unnecessary phy register read

Only a write is necessary to clear the interrupt status, and we
don't use the value from the preceding read operation. This
patch eliminates the unnecessary read.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

smsc75xx: replace 0xffff with PHY_INT_SRC_CLEAR_ALL

This patch defines PHY_INT_SRC_CLEAR_ALL to replace the value 0xffff
in order to be more self-documenting.

This patch should make no functional change, it is purely cosmetic.

Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/intel/e1000e/param.c
drivers/net/wireless/iwlwifi/iwl-agn-rx.c
drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell. In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr. 'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vhost-net-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Michael S. Tsirkin says:

--------------------
There are mostly bugfixes here.
I hope to merge some more patches by 3.5, in particular
vlan support fixes are waiting for Eric's ack,
and a version of tracepoint patch might be
ready in time, but let's merge what's ready so it's testable.

This includes a ton of zerocopy fixes by Jason -
good stuff but too intrusive for 3.4 and zerocopy is experimental
anyway.

virtio supported delayed interrupt for a while now
so adding support to the virtio tool made sense
--------------------

Signed-off-by: David S. Miller <davem@davemloft.net>

net: IP_MULTICAST_IF setsockopt now recognizes struct mreq

Until now, struct mreq has not been recognized and it was worked with
as with struct in_addr. That means imr_multiaddr was copied to
imr_address. So do recognize struct mreq here and copy that correctly.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev/of/phy: Add MDIO bus multiplexer driven by GPIO lines.

The GPIO pins select which sub bus is connected to the master.

Initially tested with an sn74cbtlv3253 switch device wired into the
MDIO bus.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev/of/phy: Add MDIO bus multiplexer support.

This patch adds a somewhat generic framework for MDIO bus
multiplexers.  It is modeled on the I2C multiplexer.

The multiplexer is needed if there are multiple PHYs with the same
address connected to the same MDIO bus adepter, or if there is
insufficient electrical drive capability for all the connected PHY
devices.

Conceptually it could look something like this:

                   ------------------
                   | Control Signal |
                   --------+---------
                           |
---------------   --------+------
| MDIO MASTER |---| Multiplexer |
---------------   --+-------+----
                     |       |
                     C       C
                     h       h
                     i       i
                     l       l
                     d       d
                     |       |
     ---------       A       B   ---------
     |       |       |       |   |       |
     | PHY@1 +-------+       +---+ PHY@1 |
     |       |       |       |   |       |
     ---------       |       |   ---------
     ---------       |       |   ---------
     |       |       |       |   |       |
     | PHY@2 +-------+       +---+ PHY@2 |
     |       |                   |       |
     ---------                   ---------

This framework configures the bus topology from device tree data.  The
mechanics of switching the multiplexer is left to device specific
drivers.

The follow-on patch contains a multiplexer driven by GPIO lines.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev/of/phy: New function: of_mdio_find_bus().

Add of_mdio_find_bus() which allows an mii_bus to be located given its
associated the device tree node.

This is needed by the follow-on patch to add a driver for MDIO bus
multiplexers.

The of_mdiobus_register() function is modified so that the device tree
node is recorded in the mii_bus. Then we can find it again by
iterating over all mdio_bus_class devices.

Because the OF device tree has now become an integral part of the
kernel, this can live in mdio_bus.c (which contains the needed
mdio_bus_class structure) instead of of_mdio.c.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/capi: elliminate capincci_find() in non-middleware case

If Kernel CAPI is compiled without CONFIG_ISDN_CAPI_MIDDLEWARE,
the structure retrieved via capincci_find() is never actually
used, so don't compile that function in that case.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/capi: fix readability damage

Fix up some of the readibility deterioration caused by the recent
whitespace coding style cleanup.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: unify function return values

Various functions in the Gigaset driver were using different
conventions for the meaning of their int return values.
Align them to the usual negative error numbers convention.

Inspired-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: internal function name cleanup

Functions clear_at_state and free_strings did the same thing;
drop one of them, keeping the more descriptive name.
Drop a redundant call.
Rename function dealloc_at_states to dealloc_temp_at_states
to clarify its purpose.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: fix readability damage

Fix up some of the readibility deterioration caused by the recent
whitespace coding style cleanup.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: improve error handling querying firmware version

An out-of-place "OK" response to the "AT+GMR" (get firmware version)
command turns out to be, more often than not, a delayed response to
a previous command rather than an actual error, so continue waiting
for the version number in that case.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: fix CAPI disconnect B3 handling

If DISCONNECT_B3_IND was synthesized because of a DISCONNECT_REQ
with existing logical connections, the connection state wasn't
updated accordingly. Also the emitted DISCONNECT_B3_IND message
wasn't included in the debug log as requested.
This patch fixes both of these issues.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn/gigaset: ratelimit CAPI message dumps

Introduce a global ratelimit for CAPI message dumps to protect
against possible log flood.
Drop the ratelimit for ignored messages which is now covered by the
global one.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: compare_ether_addr[_64bits]() has no ordering

Neither compare_ether_addr() nor compare_ether_addr_64bits()
(as it can fall back to the former) have comparison semantics
like memcmp() where the sign of the return value indicates sort
order. We had a bug in the wireless code due to a blind memcmp
replacement because of this.

A cursory look suggests that the wireless bug was the only one
due to this semantic difference.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

skb: Add inline helper for getting the skb end offset from head

With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head. Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb) that way
we can return the correct value without having gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skb: Drop "fastpath" variable for skb_cloned check in pskb_expand_head

Since there is now only one spot that actually uses "fastpath" there isn't
much point in carrying it. Instead we can just use a check for skb_cloned
to verify if we can perform the fast-path free for the head or not.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

skb: Drop bad code from pskb_expand_head

The fast-path for pskb_expand_head contains a check where the size plus the
unaligned size of skb_shared_info is compared against the size of the data
buffer.  This code path has two issues.  First is the fact that after the
recent changes by Eric Dumazet to __alloc_skb and build_skb the shared info
is always placed in the optimal spot for a buffer size making this check
unnecessary.  The second issue is the fact that the check doesn't take into
account the aligned size of shared info.  As a result the code burns cycles
doing a memcpy with nothing actually being shifted.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdc_ether: Ignore bogus union descriptor for RNDIS devices

Some RNDIS devices include a bogus CDC Union descriptor pointing
to non-existing interfaces. The RNDIS code is already prepared
to handle devices without a CDC Union descriptor by hardwiring
the driver to use interfaces 0 and 1, which is correct for the
devices with the bogus descriptor as well. So we can reuse the
existing workaround.

Cc: Markus Kolb <linux-201011@tower-net.de>
Cc: Iker Salmón San Millán <shaola@esdebian.org>
Cc: Jonathan Nieder <jrnieder@gmail.com>
Cc: Oliver Neukum <oliver@neukum.org>
Cc: 655387@bugs.debian.org
Cc: stable@vger.kernel.org
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnx2x: bug fix when loading after SAN boot

This is a bug fix for an "interface fails to load" issue.
The issue occurs when bnx2x driver loads after UNDI driver was previously
loaded over the chip. In such a scenario the UNDI driver is loaded and operates
in the pre-boot kernel, within its own specific host memory address range.
When the pre-boot stage is complete, the real kernel is loaded, in a new and
distinct host memory address range. The transition from pre-boot stage to boot
is asynchronous from UNDI point of view.

A race condition occurs when UNDI driver triggers a DMAE transaction to valid
host addresses in the pre-boot stage, when control is diverted to the real
kernel. This results in access to illegal addresses by our HW as the addresses
which were valid in the preboot stage are no longer considered valid.
Specifically, the 'was_error' bit in the pci glue of our device is set. This
causes all following pci transactions from chip to host to timeout (in
accordance to the pci spec).

Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: dcb: IEEE PFC stats and reset logic incorrect

PFC stats are only tabulated when PFC is enabled. However in IEEE
mode the ieee_pfc pfc_tc bits were not checked and the calculation
was aborted.

This results in statistics not being reported through ethtool and
possible a false Tx hang occurring when receiving pause frames.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: increase version number

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: clear REQ and GNT in EECD (82571 && 82572)

Clear the REQ and GNT bit in the eeprom control register (EECD).
This is required if the eeprom is to be accessed with auto read
EERD register.

After a cold reset this doesn't matter but if PBIST MAC test was
executed before booting, the register was left in a dirty state
(the 2 bits where set), which caused the read operation to time out
and returning 0.

Reference (page 312):
http://download.intel.com/design/network/manuals/316080.pdf

Reported-by: Aleksandar Igic <aleksandar.igic@dektech.com.au>
Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: enable forced master/slave on 82577

Like other supported (igp) PHYs, the driver needs to be able to force the
master/slave mode on 82577. Since the code is the same as what already
exists in the code flow for igp PHYs, move it to a new function to be
called for both flows.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

tcp: be more strict before accepting ECN negociation

It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Perry Lorier <perryl@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Wilmer van der Gaast <wilmer@google.com>
Cc: Ankur Jain <jankur@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Dave Täht <dave.taht@bufferbloat.net>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Help to identify the card

With multiple cards is hard to figure out which port caused trouble
int the layer2 routines (e.g. got a timeout).
Now we have the informations in the log output.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Layer1 statemachine fix

The timer3 and the activation delay timer need to be independent.
If timer3 fires do not reqest power up we have to send only INFO 0.
Now layer1 pass TBR3 again.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Make layer1 timer 3 value configurable

For certification test it is very useful to change the layer1
timer3 value on runtime.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: L2 timeouts need to be queued as L2 event

To be full preemptiv safe, we cannot handle a L2 timeout in the timer
context itself, we should do all actions via the D-channel thread.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Fix refcounting bug

Under some configs it was still not possible to unload the driver,
because the module use count was srewed up.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Added PH_* state info to tei manager.

Tei manager reports current layer 1 state on creation.
On state change it reports it to the socket interface.

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sched: factorize code (qdisc_drop())

Use qdisc_drop() helper where possible.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net

e1000: Silence sparse warnings by correcting type

Silence sparse warnings shown below:
...
drivers/net/ethernet/intel/e1000/e1000_main.c:3435:17: warning:
cast to restricted __le64
drivers/net/ethernet/intel/e1000/e1000_main.c:3435:17: warning:
cast to restricted __le64
...

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

igb, ixgbe: netdev_tx_reset_queue incorrectly called from tx init path

igb and ixgbe incorrectly call netdev_tx_reset_queue() from
i{gb|xgbe}_clean_tx_ring() this sort of works in most cases except
when the number of real tx queues changes. When the number of real
tx queues changes netdev_tx_reset_queue() only gets called on the
new number of queues so when we reduce the number of queues we risk
triggering the watchdog timer and repeated device resets.

So this is not only a cosmetic issue but causes real bugs. For
example enabling/disabling DCB or FCoE in ixgbe will trigger this.

CC: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: John Bishop <johnx.bishop@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Update link flow control to correctly handle multiple packet buffer DCB

This change updates the link flow control configuration so that we
correctly set the link flow control settings for DCB.  Previously we would
have to call the fc_enable call 8 times, once for each packet buffer.  If
we move that logic into the fc_enable call itself we can avoid multiple
unnecessary register writes.

This change also corrects an issue in which we were only shifting the water
marks for 82599 parts by 6 instead of 10.  This was resulting in us only
using 1/16 of the packet buffer when flow control was enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Reorder link flow control functions in ixgbe_common.c

We can avoid many of the forward declarations found in ixgbe_common.c by
just reordering things so this patch does that to help cleanup the code.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Use __free_pages instead of put_page to release pages

This change replaces the calls to put_page with calls to __free_page.

Since the FCoE code is able to access order 1 pages I thought it would be a
good idea to change things over to using __free_pages since that is the
preferred approach for freeing pages.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Make ixgbe_fc_autoneg return void and always set current_mode

This change makes it so that ixgbe_fc_autoneg is a void and always sets the
current_mode. Previously if the link was down we would return an error,
however there is no harm in simply treating a link down case as a case in
which autoneg simply failed. This allows us to rely on the return value of
the ixgbe_fc_enable call now since there should be no cases where it
returns an error that would normally be ignored.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Reorder the ring to q_vector mapping to improve performance

This change reorders the mapping of rings to q_vectors in the case that the
number of rings exceeds the number of q_vectors.  Previously we would
allocate the first R/N queues to the first q_vector where R is the number
of rings and N is the number of q_vectors.  Instead of doing this we can do
a better job of interleaving the rings to the CPUs by assigning every Nth
ring to the q_vector.

The below tables illustrate this change for the R = 16 N = 4 case.
          Before patch  After patch
q_vector:  0  1  2  3    0  1  2  3
Rings:     0  4  8 12    0  1  2  3
           1  5  9 13    4  5  6  7
           3  6 10 14    8  9 10 11
           4  7 11 15   12 13 14 15

This should improve the performance for both DCB or ATR when the number of
rings exceeds the number of q_vectors allocated by the adapter.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Track instances of buffer available but no DMA resources present

This change makes it so that we can track instances of where a packet was
dropped due to a packet being received when there are no DMA buffers
available in the ring.

For some reason this was only being enabled with RSC, however it makes
more sense to always have this feature on so that we can track any cases
where we might drop a buffer due to an Rx ring being full.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: initial support for i217

i217 is the next-generation LOM that will be available on systems with the
Lynx Point Platform Controller Hub (PCH) chipset from Intel. This patch
provides the initial support for the device.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Update driver version number

Version bump to 1.11.3-k.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge tag 'mfd-for-linus-3.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6

Pull second set of MFD fixes from Samuel Ortiz:
"This time we only have a one liner fixing an omap-usb build error."

* tag 'mfd-for-linus-3.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Fix build breakage in omap-usb-host.c

Merge branch 'efi-vars' from Matthew Garrett

* efi-vars:
efivars: Improve variable validation

efivars: Improve variable validation

Ben Hutchings pointed out that the validation in efivars was inadequate -
most obviously, an entry with size 0 would server as a DoS against the
kernel. Improve this based on his suggestions.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'tag/upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

Pull libata fixes from Jeff Garzik:

1) Fix regression that could cause a misdiagnosis, which in turn could
   lead to an erroneous 3.0 Gbps -> 1.5 downshift, particularly when hotplug
   and suspend/resume is involved.

2) Fix a regression that led to ata%d controller ids being numbered one
   larger than in <= 3.4-rc3 (oh, the horror!).  Controller ids should now be
   as expected.

3) add some DT, PCI id's

4) ata/pata_arasan_cf: minor cpp fixing/cleaning

* tag 'tag/upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  ata: ahci_platform: Add synopsys ahci controller in DT's compatible list
  ata/pata_arasan_cf: Move arasan_cf_pm_ops out of #ifdef, #endif macros
  libata: init ata_print_id to 0
  ahci: Detect Marvell 88SE9172 SATA controller
  libata: skip old error history when counting probe trials

Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux

Pull i2c embedded fixes from Wolfram Sang:
"Here are some typical i2c driver bugfixes for 3.4.  Missed clock
  handling, improper timeout fixes, hardware wrokarounds...  All
  patches have been in linux-next for a few days, too."

* 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
  i2c: mxs: disable QUEUE when sending is done
  i2c: mxs: handle spurious interrupt
  i2c-eg20t: Modify MODULE_AUTHOR's email address
  i2c-eg20t: change timeout value 50msec to 1000msec
  i2c: tegra: Add delay before resetting the controller after NACK
  i2c: pnx: Disable clk in suspend

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Just some regression fixes from Ben along with a variable that gcc
  failed to spot is uninitialised."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  nouveau: initialise has_optimus variable.
  drm/nv10/gpio: fix thinko in mask for gpio lines 2-9
  nvc0/fb: shut up PMFB interrupt after the first occurrence
  drm/nouveau/hdmi: use correct hdmi regs for nvaa/nvac
  drm/nouveau/bios: fix regression on some nv4x board

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Transfer padding was wrong for full-speed USB in ASIX driver, fix
    from Ingo van Lil.

2) Propagate the negative packet offset fix into the PowerPC BPF JIT.
    From Jan Seiffert.

3) dl2k driver's private ioctls were letting unprivileged tasks make
    MII writes and other ugly bits like that.  Fix from Jeff Mahoney.

4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund.

5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and
    Julian Anastasov.

6) Fix races and sleeping in locked context bugs in drop_monitor, from
    Neil Horman.

7) Fix link status indication in smsc95xx driver, from Paolo Pisati.

8) Fix bridge netfilter OOPS, from Peter Huang.

9) L2TP sendmsg can return on error conditions with the socket lock
    held, oops.  Fix from Sasha Levin.

10) udp_diag should return meaningful values for socket memory usage,
    from Shan Wei.

11) Eric Dumazet is so awesome he gets his own section:

       Socket memory cgroup code (I never should have applied those
       patches, grumble...) made erroneous changes to
       sk_sockets_allocated_read_positive().  It was changed to
       use percpu_counter_sum_positive (which requires BH disabling)
       instead of percpu_counter_read_positive (which does not).
       Revert back to avoid crashes and lockdep warnings.

       Adjust the default tcp_adv_win_scale and tcp_rmem[2] values
       to fix throughput regressions.  This is necessary as a result
       of our more precise skb->truesize tracking.

       Fix SKB leak in netem packet scheduler.

12) New device IDs for various bluetooth devices, from Manoj Iyer,
    AceLan Kao, and Steven Harms.

13) Fix command completion race in ipw2200, from Stanislav Yakovlev.

14) Fix rtlwifi oops on unload, from Larry Finger.

15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver.
    From Stephane Fillod.

16) ehea driver registers it's IRQ before all the necessary state is
    setup, resulting in crashes.  Fix from Thadeu Lima de Souza
    Cascardo.

17) Fix PHY connection failures in davinci_emac driver, from Anatolij
    Gustschin.

18) Missing break; in switch statement in bluetooth's
    hci_cmd_complete_evt().  Fix from Szymon Janc.

19) Fix queue programming in iwlwifi, from Johannes Berg.

20) Interrupt throttling defaults not being actually programmed into the
    hardware, fix from Jeff Kirsher and Ying Cai.

21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from
    Benjamin Poirier.

22) Fix blind status block RX producer pointer deref in TG3 driver, from
    Matt Carlson.

23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de
    Souza Cascardo.

24) Fix crashes in 6lowpan, from Alexander Smirnov.

25) tcp_complete_cwr() needs to be careful to not rewind the CWND to
    ssthresh if ssthresh has the "infinite" value.  Fix from Yuchung
    Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  sungem: Fix WakeOnLan
  tcp: change tcp_adv_win_scale and tcp_rmem[2]
  net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
  drop_monitor: prevent init path from scheduling on the wrong cpu
  usbnet: fix failure handling in usbnet_probe
  usbnet: fix leak of transfer buffer of dev->interrupt
  ucc_geth: Add 16 bytes to max TX frame for VLANs
  net: ucc_geth, increase no. of HW RX descriptors
  netem: fix possible skb leak
  sky2: fix receive length error in mixed non-VLAN/VLAN traffic
  sky2: propogate rx hash when packet is copied
  net: fix two typos in skbuff.h
  cxgb3: Don't call cxgb_vlan_mode until q locks are initialized
  ixgbe: fix calling skb_put on nonlinear skb assertion bug
  ixgbe: Fix a memory leak in IEEE DCB
  igbvf: fix the bug when initializing the igbvf
  smsc75xx: enable mac to detect speed/duplex from phy
  smsc75xx: declare smsc75xx's MII as GMII capable
  smsc75xx: fix phy interrupt acknowledge
  smsc75xx: fix phy init reset loop
  ...

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
"Fix OOPS seen in coretemp driver if the CPU core ID is too large"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (coretemp) Increase CPU core limit
hwmon: (coretemp) fix oops on cpu unplug

net/niu: remove one superfluous dma mask check

The idea here seems to be to get a 44bit DMA mask working and if this
fails it should fallback to a 32bit DMA mask. The dma_mask variable is
assigned once to 44bit and never updated. pci_set_dma_mask() and
pci_set_consistent_dma_mask() are both implemented as functions so there
is no evil macro which might update dma_mask. Looking at the assembly, I
see a call to dma_set_mask() followed by dma_supported() and then a jump
passed the second dma_set_mask(). The only way to get to second
dma_set_mask() call is by an error code in the first one.

So I hereby remove the check since it looks superfluous. Please ignore
the path if there is black magic involved.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

ata: ahci_platform: Add synopsys ahci controller in DT's compatible list

SPEAr13xx series of SoCs contain Synopsys AHCI SATA Controller which shares
ahci_platform driver with other controller versions.

This patch updates DT compatible list for ahci_platform. It also updates and
renames binding documentation to more generic name.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ata/pata_arasan_cf: Move arasan_cf_pm_ops out of #ifdef, #endif macros

#ifdef, #endif is not required in definition/usage of arasan_cf_pm_ops. So, move
this definition and its usage outside of them.

Signed-off-by: Viresh Kumar <viresh.kumar@st.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: init ata_print_id to 0

When comparing the dmesg between 3.4-rc3 and 3.4-rc4 I found the
following differences:

-ata1: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff100 irq 47
-ata2: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff180 irq 47
-ata3: DUMMY
+ata2: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff100 irq 47
+ata3: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff180 irq 47
ata4: DUMMY
ata5: DUMMY
-ata6: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff380 irq 47
+ata6: DUMMY
+ata7: SATA max UDMA/133 abar m2048@0xf9fff000 port 0xf9fff380 irq 47

The change of numbering comes from commit 85d6725b7c0d7e3f ("libata:
make ata_print_id atomic") that changed lines like

ap->print_id = ata_print_id++;
to
ap->print_id = atomic_inc_return(&ata_print_id);

As the latter behaves like ++ata_print_id, we must initialize
it to zero to start the numbering from one.

Signed-off-by: Tero Roponen <tero.roponen@gmail.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

ahci: Detect Marvell 88SE9172 SATA controller

The Marvell 88SE9172 SATA controller (PCI ID 1b4b 917a) already worked
once it was detected, but was missing an ahci_pci_tbl entry.

Boot tested on a Gigabyte Z68X-UD3H-B3 motherboard.

Signed-off-by: Matt Johnson <johnso87@illinois.edu>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

libata: skip old error history when counting probe trials

Commit d902747("[libata] Add ATA transport class") introduced
ATA_EFLAG_OLD_ER to mark entries in the error ring as cleared.

But ata_count_probe_trials_cb() didn't check this flag and it still
counts the old error history. So wrong probe trials count is returned
and it causes problem, for example, SATA link speed is slowed down from
3.0Gbps to 1.5Gbps.

Fix it by checking ATA_EFLAG_OLD_ER in ata_count_probe_trials_cb().

Cc: stable <stable@vger.kernel.org> # 2.6.37+
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

skb: Add skb_head_is_locked helper function

This patch adds support for a skb_head_is_locked helper function. It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag. If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix truesize accounting in skb_gro_receive()

GRO is very optimistic in skb truesize estimates, only taking into
account the used part of fragments.

Be conservative, and use more precise computation, so that bloated GRO
skbs can be collapsed eventually.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem

iwlwifi: fix skb truesize underestimation

By default, iwlwifi uses order-1 pages (8 KB) to store incoming frames,
but doesnt say so in skb->truesize.

This makes very possible to exhaust kernel memory since these skb evade
normal socket memory accounting.

As struct ieee80211_hdr is going to be pulled before calling IP stack,
there is no need to use dev_alloc_skb() to reserve NET_SKB_PAD bytes.
alloc_skb() is ok in this driver, allowing more tailroom.

Pull beginning of frame in skb header, in the hope we can reuse order-1
pages in the driver immediately for small frames and reduce their
truesize to the minimum (linear skbs)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

ixgbe: dcb: BIT_APP_UPCHG not set by ixgbe_copy_dcb_cfg()

After this commit:

commit aacc1bea190d731755a65cb8ec31dd756f4e263e
Author: Multanen, Eric W <eric.w.multanen@intel.com>
Date: Wed Mar 28 07:49:09 2012 +0000

ixgbe: driver fix for link flap

The BIT_APP_UPCHG bit is no longer set when ixgbe_dcbnl_set_all() is
called. This results in the FCoE app user priority never getting set
and the driver will not configure the tx_rings correctly for FCoE
packets which use the SAN MTU and FCoE offloads.

We resolve this regression by fixing ixgbe_copy_dcb_cfg() to also
check for FCoE application changes. Additionally, we can drop the
IEEE variants of get_dcb_app() because this path is never called
with the IEEE mode enabled.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Ross Brattain <ross.b.brattain@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: fix race condition with shutdown

It was possible for shutdown to pull the rug out from other driver entry
points. Now we just grab the rtnl lock before taking everything apart.
Thanks to Hariharan for noticing this tight race condition.

Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Cc: Hariharan Nagarajan <hanagara@cisco.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Update version string

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Make sure jumbo frames are set correctly after PF reset

If the Physical Function (PF) resets after the VF has set jumbo
frame MTU then the VF jumbo frame is overwritten. Make sure the
VF driver always requests proper MTU size after reset
synchronization.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbevf: Add support to recognize 100mb link speed

The X540 10Gig controller is capable of linking at 100Mbits - add
support for reporting that link speed.

Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Remove special case for 82573/82574 ASPM L1 disablement

For the 82573, ASPM L1 gets disabled wholesale so this special-case code
is not required. For the 82574 the previous patch does the same as for
the 82573, disabling L1 on the adapter. Thus, this code is no longer
required and can be removed.

Signed-off-by: Chris Boot <bootc@bootc.net>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Disable ASPM L1 on 82574

ASPM on the 82574 causes trouble. Currently the driver disables L0s for
this NIC but only disables L1 if the MTU is >1500. This patch simply
causes L1 to be disabled regardless of the MTU setting.

Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: "Wyborny, Carolyn" <carolyn.wyborny@intel.com>
Cc: Nix <nix@esperi.org.uk>
Link: https://lkml.org/lkml/2012/3/19/362
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Driver workaround for IPv6 Header Extension Erratum.

Previously, IPv6 extension header parsing was disabled for all devices
supported by e1000e when using packet split mode. However, as per a
silicon errata, only certain devices need this restriction and will need
to disable IPv6 extension header parsing for all modes.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: Resolve intermittent negotiation issue on 82574/82583.

For 82574 and 82583 devices, resolve an intermittent link issue where
the link negotiates to 100Mbps rather than 1Gbps when powering off the
PHY and powering on the PHY after several seconds.

Signed-off-by: Matthew Vick <matthew.vick@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: cleanup long [read|write]_reg_locked PHY ops function pointers

Calling the locked versions of the read/write PHY ops function pointers
often produces excessively long lines. Shorten these as is done with
the non-locked versions of the PHY register read/write functions.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

e1000e: suggest a possible workaround to a device hang on 82577/8

There is a known issue in the 82577 and 82578 device that can cause a hang
in the device hardware during traffic stress; the current workaround in the
driver is to disable transmit flow control by default. If the user enables
transmit flow control and the device hang occurs, provide a message in the
syslog suggesting to re-enable the workaround.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

nouveau: initialise has_optimus variable.

We should initialise this to 0 really to avoid getting false positives.

Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

ixgbe: Fix use after free on module remove

While testing the TCP changes I had to fix an issue in order to be able to
load and unload the module.

The recent patch that added thermal sensor support added a use after free
bug on module unload with an 82598 adapter in the system. To resolve the
issue I have updated the code so that when we free the info_kobj we set it
back to NULL.

I suspect there are likely other bugs present, but I will leave that for
another patch that can undergo more testing.

I am submitting this directly to net-next since this fixes a fairly serious
bug that will lock up the ixgbe module until the system is rebooted.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move stats merge to the end of tcp_try_coalesce

This change cleans up the last bits of tcp_try_coalesce so that we only
need one goto which jumps to the end of the function. The idea is to make
the code more readable by putting things in a linear order so that we start
execution at the top of the function, and end it at the bottom.

I also made a slight tweak to the code for handling frags when we are a
clone. Instead of making it an if (clone) loop else nr_frags = 0 I changed
the logic so that if (!clone) we just set the number of frags to 0 which
disables the for loop anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: Move code related to head frag in tcp_try_coalesce

This change reorders the code related to the use of an skb->head_frag so it
is placed before we check the rest of the frags. This allows the code to
read more linearly instead of like some sort of loop.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>