git.karo-electronics.de Git - karo-tx-linux.git/log

netfilter: nf_conntrack: don't send destroy events from iterator

Let nf_ct_delete handle delivery of the DESTROY event.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ipvs: ip_vs_sh: ip_vs_sh_get_port: check skb_header_pointer for NULL

skb_header_pointer could return NULL, so check for it as we do it
everywhere else in ipvs code. This fixes a coverity warning.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: fixed spacing at for statements

found using checkpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

netfilter: tproxy: fix build with IP6_NF_IPTABLES=n

after commit 93742cf (netfilter: tproxy: remove nf_tproxy_core.h)

CONFIG_IPV6=y
CONFIG_IP6_NF_IPTABLES=n

gives us:

net/netfilter/xt_TPROXY.c: In function 'nf_tproxy_get_sock_v6':
net/netfilter/xt_TPROXY.c:178:4: error: implicit declaration of function 'inet6_lookup_listener'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat: use per-conntrack locking for sequence number adjustments

Get rid of the global lock and use per-conntrack locks for protecting the
sequencen number adjustment data. Additionally saves one lock/unlock
operation for every TCP packet.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat: change sequence number adjustments to 32 bits

Using 16 bits is too small, when many adjustments happen the offsets might
overflow and break the connection.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_nat: fix locking in nf_nat_seq_adjust()

nf_nat_seq_adjust() needs to grab nf_nat_seqofs_lock to protect against
concurrent changes to the sequence adjustment data.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: remove duplicate code in ctnetlink

ctnetlink contains copy-paste code from death_by_timeout. In order to
avoid changing both places in upcoming event delivery patch,
export death_by_timeout functionality and use it in the ctnetlink code.

Based on earlier patch from Pablo Neira.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: tproxy: remove nf_tproxy_core.h

We've removed nf_tproxy_core.ko, so also remove its header.
The lookup helpers are split and then moved to tproxy target/socket match.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: tproxy: remove nf_tproxy_core, keep tw sk assigned to skb

The module was "permanent", due to the special tproxy skb->destructor.
Nowadays we have tcp early demux and its sock_edemux destructor in
networking core which can be used instead.

Thanks to early demux changes the input path now also handles
"skb->sk is tw socket" correctly, so this no longer needs the special
handling introduced with commit d503b30bd648b3cb4e5f50b65d27e389960cc6d9
(netfilter: tproxy: do not assign timewait sockets to skb->sk).

Thus:
- move assign_sock function to where its needed
- don't prevent timewait sockets from being assigned to the skb
- remove nf_tproxy_core.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_queue: relax NFQA_CT attribute check

Allow modifying attributes of the conntrack associated with a packet
without first requesting ct data via CFG_F_CONNTRACK or extra
nfnetlink_conntrack socket.

Also remove unneded rcu_read_lock; the entire function is already
protected by rcu.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: connlabels: remove unneeded includes

leftovers from the (never merged) v1 patch.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: constify sk_buff argument to nf_ct_attach()

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_conntrack: remove net_ratelimit() for LOG_INVALID()

Logging of invalid packets has to be explicitly enabled. Rate-limiting these
messages is inconsistent with other netfilter logging features and makes
debugging harder.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_addrtype: fix trivial typo

Fix typo in error message.

Signed-off-by: Phil Oester <kernel@linuxace.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

drivers/net/ethernet/stmicro/stmmac: don't check resource with devm_ioremap_resource

devm_ioremap_resource does sanity checks on the given resource. No need to
duplicate this in the driver.

Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make devnet_rename_seq static

No users outside net/core/dev.c.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: TCP_NOTSENT_LOWAT socket option

Idea of this patch is to add optional limitation of number of
unsent bytes in TCP sockets, to reduce usage of kernel memory.

TCP receiver might announce a big window, and TCP sender autotuning
might allow a large amount of bytes in write queue, but this has little
performance impact if a large part of this buffering is wasted :

Write queue needs to be large only to deal with large BDP, not
necessarily to cope with scheduling delays (incoming ACKS make room
for the application to queue more bytes)

For most workloads, using a value of 128 KB or less is OK to give
applications enough time to react to POLLOUT events in time
(or being awaken in a blocking sendmsg())

This patch adds two ways to set the limit :

1) Per socket option TCP_NOTSENT_LOWAT

2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets
not using TCP_NOTSENT_LOWAT socket option (or setting a zero value)
Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect.

This changes poll()/select()/epoll() to report POLLOUT
only if number of unsent bytes is below tp->nosent_lowat

Note this might increase number of sendmsg()/sendfile() calls
when using non blocking sockets,
and increase number of context switches for blocking sockets.

Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is
defined as :
Specify the minimum number of bytes in the buffer until
the socket layer will pass the data to the protocol)

Tested:

netperf sessions, and watching /proc/net/protocols "memory" column for TCP

With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory
used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458)

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   45458   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   45458   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols
TCPv6     1880      2   20567   no     208   yes  ipv6        y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
TCP       1696    508   20567   no     208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y

Using 128KB has no bad effect on the throughput or cpu usage
of a single flow, although there is an increase of context switches.

A bonus is that we hold socket lock for a shorter amount
of time and should improve latencies of ACK processing.

lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1651584     6291456     16384  20.00   17447.90   10^6bits/s  3.13  S      -1.00  U      0.353   -1.000  usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

           412,514 context-switches

     200.034645535 seconds time elapsed

lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat
lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf.
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method
1593240     6291456     16384  20.00   17321.16   10^6bits/s  3.35  S      -1.00  U      0.381   -1.000  usec/KB

Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3':

         2,675,818 context-switches

     200.029651391 seconds time elapsed

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-By: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add sk_stream_is_writeable() helper

Several call sites use the hardcoded following condition :

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Lets use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: trivial: add uapi/linux/sctp.h into maintainers

After this file has moved to the uapi section, we also need to update
this in the maintainers file.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: trivial: update mailing list address

The SCTP mailing list address to send patches or questions
to is linux-sctp@vger.kernel.org and not
lksctp-developers@lists.sourceforge.net anymore. Therefore,
update all occurences.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: add support to show hw stats via ethtool

Add support to show CPSW hardware statistics to user via ethtool
so user can find if there were any error reported by hardware or
the system is over loaded duing high data rate transfer.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: Fixed up a error "do not initialise statics to 0 or NULL" in bond_main.c

The error is found by the checkpatch.pl tools.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add rtnl protection for bonding_store_fail_over_mac

We need rtnl protection while reading slave_cnt and updating
the .fail_over_mac, and it also follows the logic "don't change
anything slave-related without rtnl". :)

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: bond_sysfs.c checkpatch cleanup

net/bonding/bond_sysfs.c:1302: ERROR: else should follow close brace '}'
net/bonding/bond_sysfs.c:1314: ERROR: else should follow close brace '}'

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: don't call slave_xxx_netpoll under spinlocks

The slave_xxx_netpoll will call synchronize_rcu_bh(),
so the function may schedule and sleep, it should't be
called under spinlocks.

bond_netpoll_setup() and bond_netpoll_cleanup() are always
protected by rtnl lock, it is no need to take the read lock,
as the slave list couldn't be changed outside rtnl lock.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jay Vosburgh <fubar@us.ibm.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers/net: enic: Move ethtool code to a separate file

This patch moves all enic ethtool hooks from enic_main.c to a new file
enic_ethtool.c

Signed-off-by: Neel Patel <neepatel@cisco.com>
Signed-off-by: Christian Benvenuti <benve@cisco.com>
Signed-off-by: Nishank Trivedi <nistrive@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: trans_rdma: remove unused function

This patch gets rid of the following warning:

net/9p/trans_rdma.c:594:12: warning: ‘rdma_cancelled’ defined but not used [-Wunused-function]
static int rdma_cancelled(struct p9_client *client, struct p9_req_t *req)

The rdma_cancelled function is not called anywhere in the kernel

Signed-off-by: Andi Shyti <andi@etezian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'be2net'

Sathya Perla says:

====================
The following patches are mostly for providing MAC
filtering ability for VFs. Pls apply. Thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: delete primary MAC address while unloading

Currently the UC-list is being deleted from the HW MAC table, but the primary
MAC is not.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: use SET/GET_MAC_LIST for SH-R

On SH-R and Lancer-R, GET_MAC_LIST cmd is better supported
(instead of NTWK_MAC_QUERY cmd) to query provisioned MAC addresses.
Similiarly, (on SH-R and Lancer-R) SET_MAC_LIST must be used by the PF to
provision a permanent MAC addresses to the VF.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: refactor MAC-addr setup code

The code to configure the permanent MAC in be_setup() has become quite
complicated, with different FW cmds being used for BEx, SH-R and Lancer.
Simplify the logic by moving some of this complexity to be_cmds.c. This
makes the code in be_setup() a little more readable.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix pmac_id for BE3 VFs

For BE3 VFs, the permanent MAC is added by its PF. The VF can retrieve its
pmac_id only via the IFACE_CREATE cmd. This is not true for Lancer and SH-R
VFs which get the pmac_id by issuing a ADD_IFACE_MAC cmd. So, use this
hack only for BE3 VFs.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: allow VFs to program MAC and VLAN filters

In the current design VFs were not allowed to program MAC/VLAN filters.
Only the PF driver was allowed to configure/provision MAC and transparent
VLANs to a VF. Change this to support MAC/VLAN filtering on a VF by a VM.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

be2net: fix MAC address modification for VF

Currently, the VFs by default don't have the privilege to modify MAC address.
This will change in a subsequent fix wherein VFs will have the ability to
modify MAC/VLAN filters.

Fix be_mac_addr_set() logic to support MAC address modification on a
privileged VF too.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: Add support for r8a7790 SoC

This is a copy of support for r8a7778/9 with the .rmiimode mode bit
of struct sh_eth_cpu_data set.

Also update R8A7779 to R8A777x.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

sh_eth: add support for RMIIMODE register

This register is prsent on the r8a7790 SoC.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv6 eliminate parameter "int addrlen" in function fib6_add_1

The "int addrlen" in fib6_add_1 is rebundant, as we can get it from
parameter "struct in6_addr *addr" once we modified its type.
And also fix some coding style issues in fib6_add_1

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net ipv6: Remove rebundant rt6i_nsiblings initialization

Seting rt->rt6i_nsiblings to zero is rebundant, because above memset
zeroed the rest of rt excluding the first dst memember.

Signed-off-by: Fan Du <fan.du@windriver.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

vti: switch to new ip tunnel code

GRE tunnel and IPIP tunnel already switched to the new
ip tunnel code, VTI tunnel can use it too.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: change the prototype of ip6_mr_forward().

This patch changes the prototpye of the ip6_mr_forward() method to return void
instead of int.

The ip6_mr_forward() method always returns 0; moreover, the return value of this
method is not checked anywhere.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr: change the prototype of ip_mr_forward().

This patch changes the prototpye of the ip_mr_forward() method to return void
instead of int.

The ip_mr_forward() method always returns 0; moreover, the return value of this
method is not checked anywhere.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'team' ("add support for peer notifications and igmp rejoins for team")

Jiri Pirko says:

====================
The middle patch adjusts core infrastructure so the bonding code can be
generalized and reused by team.

v1->v2: using msecs_to_jiffies() as suggested by Eric

Jiri Pirko (3):
  team: add peer notification
  net: convert resend IGMP to notifier event
  team: add support for sending multicast rejoins
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

team: add support for sending multicast rejoins

Similar to what is implemented in bonding. User is able to ask team
driver to send IGMP rejoins in case port is enabled or disabled. Using
previously introduced netdev notifier.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: convert resend IGMP to notifier event

Until now, bond_resend_igmp_join_requests() looks for vlans attached to
bonding device, bridge where bonding act as port manually. It does not
care of other scenarios, like stacked bonds or team device above. Make
this more generic and use netdev notifier to propagate the event to
upper devices and to actually call ip_mc_rejoin_groups().

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: add peer notification

When port is enabled or disabled, allow to notify peers by unsolicitated
NAs or gratuitous ARPs. Disabled by default.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvlan fdb replace support

Add support for iproute2 command 'bridge fdb replace ...'.
The rtnletlink call back function ndo_fdb_add will be called
with the NLM_F_REPLACE flag set.
Simply return -EOPNOTSUP.

Resubmitted because net-next was closed last week.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan fdb replace an existing entry

Add support to replace an existing entry found in the
vxlan fdb database. The entry in question is identified
by its unicast mac address and the destination information
is changed. If the entry is not found, it is added in the
forwarding database. This is similar to changing an entry
in the neighbour table.

Multicast mac addresses can not be changed with the replace
option.

This is useful for virtual machine migration when the
destination of a target virtual machine changes. The replace
feature can be used instead of delete followed by add.

Resubmitted because net-next was closed last week.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tcp'

Yuchung Cheng says:

====================
This patch series improve RTT sampling in three ways:
1. Sample RTT during fast recovery and reordering events.
2. Favor ack-based RTT to timestamps because of broken TS ECR fields
3. Consolidate the RTT measurement logic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: use RTT from SACK for RTO

If RTT is not available because Karn's check has failed or no
new packet is acked, use the RTT measured from SACK to estimate
the RTO. The sender can continue to estimate the RTO during loss
recovery or reordering event upon receiving non-partial ACKs.

This also changes when the RTO is re-armed. Previously it is
only re-armed when some data is cummulatively acknowledged (i.e.,
SND.UNA advances), but now it is re-armed whenever RTT estimator
is updated. This feature is particularly useful to reduce spurious
timeout for buffer bloat including cellular carriers [1], and
RTT estimation on reordering events.

[1] "An In-depth Study of LTE: Effect of Network Protocol and
Application Behavior on Performance", In Proc. of SIGCOMM 2013

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: measure RTT from new SACK

Take RTT sample if an ACK selectively acks some sequences that
have never been retransmitted. The Karn's algorithm does not apply
even if that ACK (s)acks other retransmitted sequences, because it
must been generated by an original but perhaps out-of-order packet.
There is no ambiguity. In case when multiple blocks are newly
sacked because of ACK losses the earliest block is used to
measure RTT, similar to cummulative ACKs.

Such RTT samples allow the sender to estimate the RTO during loss
recovery and packet reordering events. It is still useful even with
TCP timestamps. That's because during these events the SND.UNA may
not advance preventing RTT samples from TS ECR (thus the FLAG_ACKED
check before calling tcp_ack_update_rtt()). Therefore this new
RTT source is complementary to existing ACK and TS RTT mechanisms.

This patch does not update the RTO. It is done in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: prefer packet timing to TS-ECR for RTT

Prefer packet timings to TS-ecr for RTT measurements when both
sources are available. That's because broken middle-boxes and remote
peer can return packets with corrupted TS ECR fields. Similarly most
congestion controls that require RTT signals favor timing-based
sources as well. Also check for bad TS ECR values to avoid RTT
blow-ups. It has happened on production Web servers.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: consolidate SYNACK RTT sampling

The first patch consolidates SYNACK and other RTT measurement to use a
central function tcp_ack_update_rtt(). A (small) bonus is now SYNACK
RTT measurement happens after PAWS check, potentially reducing the
impact of RTO seeding on bad TCP timestamps values.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'fec'

Fabio Estevam says:

====================
This series improves clock handling in the driver by not enabling/disabling
the optional ptp and enet_out clocks unconditionally, check for the return value
of clk_prepare_enable and also handle clk_ptp in suspend/resume.

Remove an unneeded check in platform_get_resource() and also use
devm_request_irq() that can help to simplify the code.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Frank Li <lznuaa@gmail.com>

fec: Use devm_request_irq()

Using devm_request_irq() can make the code smaller and cleaner.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Remove unneeded check in platform_get_resource()

As devm_ioremap_resource() is used, there is no need to explicitely check the
return value from platform_get_resource(), as this is something that
devm_ioremap_resource() takes care by itself.

Also, place platform_get_resource() prior to devm_ioremap_resource() for
better code readability.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Check the return value from clk_prepare_enable()

clk_prepare_enable() may fail, so let's check its return value and propagate it
in the case of error.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Enable/disable clk_ptp in suspend/resume

clk_ptp should also be enabled in fec_resume() and disabled in fec_suspend().

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Fix the order for enabling/disabling the clocks

On fec_probe the clocks are enabled in the following order:

clk_ahb -> clk_ipg -> clk_enet_out -> clk_ptp

, so in the error and remove paths we should disabled them in the opposite
order.

Also fix the order in the suspend/resume functions.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fec: Do not enable/disable optional clocks unconditionally

clk_enet_out and clk_ptp are optional clocks, so we should not enable/disable
them unconditionally.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tun: Support software transmit time stamping.

This patch adds transmit time stamping to the tun/tap driver. Similar
support already exists for UDP, can, and raw packets.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Provide a generic socket error queue delivery method for Tx time stamps.

This patch moves the private error queue delivery function from the
af_packet code to the core socket method. In this way, network layers
only needing the error queue for transmit time stamping can share common
code.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/velocity: add poll controller function for velocity nic

Add poll controller function for velocity nic.

Signed-off-by: Amit Uttamchandani <auttamchandani@logicube.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: replace sum of bitmasks with OR operation.

Suggested by coccinelle and manually verified.

Signed-off-by: Alexandru Juncu <alexj@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/irda: fixed style issues in irttp

Applied error fixes suggested by checpatch.pl

Signed-off-by: Dragos Foianu <dragos.foianu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
  bits here and there:

   1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
      Tim Gardner.

   2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
      address printing helper.

   3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
      device can't do checksumming offloads then it shouldn't say it can
      do SG either.  From Haiyang Zhang.

   4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.

   5) Don't leak DMA mappings on mapping failures, from Neil Horman.

   6) We need to reset the transport header of SKBs in ipv4 before we
      attempt to perform early socket demux, just like ipv6 does.  From
      Eric Dumazet.

   7) Add missing locking on vxlan device removal, from Stephen
      Hemminger.

   8) xen-netfront has to make two passes over an SKB to prepare it for
      transfer.  One pass calculates the number of slots needed, the
      second massages the SKB and fills the slots.  Unfortunately, the
      first pass doesn't calculate the number of slots properly so we
      can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
      work out so well.  Fix from Jan Beulich with help and discussion
      with several others.

   9) Fix a similar problem in tun and macvtap, which have to split up
      scatter-gather elements at PAGE_SIZE boundaries.  Don't do
      zerocopy if it would result in a > MAX_SKB_FRAGS skb.  Fixes from
      Jason Wang.

  10) On receive, once we've decoded the VLAN state completely, clear
      skb->vlan_tci.  Otherwise demuxed tunnels underneath can trigger
      the VLAN code again, corrupting the packet.  Fix from Eric
      Dumazet"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  vlan: fix a race in egress prio management
  vlan: mask vlan prio bits
  macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
  pkt_sched: sch_qfq: remove a source of high packet delay/jitter
  xen-netfront: pull on receive skb may need to happen earlier
  vxlan: add necessary locking on device removal
  hyperv: Fix the NETIF_F_SG flag setting in netvsc
  net: Fix sysfs_format_mac() code duplication.
  be2net: Fix to avoid hardware workaround when not needed
  macvtap: do not assume 802.1Q when send vlan packets
  macvtap: fix the missing ret value of TUNSETQUEUE
  ipv4: set transport header earlier
  mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
  bgmac: add dependency to phylib
  net/irda: fixed style issues in irlan_eth
  ethtool: fixed trailing statements in ethtool
  ndisc: bool initializations should use true and false
  atl1e: unmap partially mapped skb on dma error and free skb

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
"Trying again to get the fixes queue, including the fixed IDT alignment
  patch.

  The UEFI patch is by far the biggest issue at hand: it is currently
  causing quite a few machines to boot.  Which is sad, because the only
  reason they would is because their BIOSes touch memory that has
  already been freed.  The other major issue is that we finally have
  tracked down the root cause of a significant number of machines
  failing to suspend/resume"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Make sure IDT is page aligned
  x86, suspend: Handle CPUs which fail to #GP on RDMSR
  x86/platform/ce4100: Add header file for reboot type
  Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
  efivars: check for EFI_RUNTIME_SERVICES

Merge tag 'md-3.11-fixes' of git://neil.brown.name/md

Pull md bug fixes from NeilBrown:
"Sorry boss, back at work now boss.  Here's them nice shiny patches ya
  wanted.  All nicely tagged and justified for -stable and everyfing:

  Three bug fixes for md in 3.10

  3.10 wasn't a good release for md.  The bio changes left a couple of
  bugs, and an md "fix" created another one.

  These three patches appear to fix the issues and have been tagged for
  -stable"

* tag 'md-3.11-fixes' of git://neil.brown.name/md:
  md/raid1: fix bio handling problems in process_checks()
  md: Remove recent change which allows devices to skip recovery.
  md/raid10: fix two problems with RAID10 resync.

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"You'll be terribly disappointed in this, I'm not trying to sneak any
  features in or anything, its mostly radeon and intel fixes, a couple
  of ARM driver fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (34 commits)
  drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
  drm/radeon/dpm/atom: fix broken gcc harder
  drm/radeon/dpm/atom: restructure logic to work around a compiler bug
  drm/radeon/dpm: fix atom vram table parsing
  drm/radeon: fix an endian bug in atom table parsing
  drm/radeon: add a module parameter to disable aspm
  drm/rcar-du: Use the GEM PRIME helpers
  drm/shmobile: Use the GEM PRIME helpers
  uvesafb: Really allow mtrr being 0, as documented and warn()ed
  radeon kms: do not flush uninitialized hotplug work
  drm/radeon/dpm/sumo: handle boost states properly when forcing a perf level
  drm/radeon: align VM PTBs (Page Table Blocks) to 32K
  drm/radeon: allow selection of alignment in the sub-allocator
  drm/radeon: never unpin UVD bo v3
  drm/radeon: fix UVD fence emit
  drm/radeon: add fault decode function for CIK
  drm/radeon: add fault decode function for SI (v2)
  drm/radeon: add fault decode function for cayman/TN (v2)
  drm/radeon: use radeon device for request firmware
  drm/radeon: add missing ttm_eu_backoff_reservation to radeon_bo_list_validate
  ...

vlan: fix a race in egress prio management

egress_priority_map[] hash table updates are protected by rtnl,
and we never remove elements until device is dismantled.

We have to make sure that before inserting an new element in hash table,
all its fields are committed to memory or else another cpu could
find corrupt values and crash.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vlan: mask vlan prio bits

In commit 48cc32d38a52d0b68f91a171a8d00531edc6a46e
("vlan: don't deliver frames for unknown vlans to protocols")
Florian made sure we set pkt_type to PACKET_OTHERHOST
if the vlan id is set and we could find a vlan device for this
particular id.

But we also have a problem if prio bits are set.

Steinar reported an issue on a router receiving IPv6 frames with a
vlan tag of 4000 (id 0, prio 2), and tunneled into a sit device,
because skb->vlan_tci is set.

Forwarded frame is completely corrupted : We can see (8100:4000)
being inserted in the middle of IPv6 source address :

16:48:00.780413 IP6 2001:16d8:8100:4000:ee1c:0:9d9:bc87 >
9f94:4d95:2001:67c:29f4::: ICMP6, unknown icmp6 type (0), length 64
       0x0000:  0000 0029 8000 c7c3 7103 0001 a0ae e651
       0x0010:  0000 0000 ccce 0b00 0000 0000 1011 1213
       0x0020:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223
       0x0030:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233

It seems we are not really ready to properly cope with this right now.

We can probably do better in future kernels :
vlan_get_ingress_priority() should be a netdev property instead of
a per vlan_dev one.

For stable kernels, lets clear vlan_tci to fix the bugs.

Reported-by: Steinar H. Gunderson <sesse@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS

We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.

Solve this problem by calculate the pages needed for iov before trying to do
zerocopy and switch to use copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.

This is done through introducing a new helper to count the pages for iov, and
call uarg->callback() manually when switching from zerocopy to copy to notify
vhost.

We can do further optimization on top.

This bug were introduced from b92946e2919134ebe2a4083e4302236295ea2a73
(macvtap: zerocopy: validate vectors before building skb).

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS

We try to linearize part of the skb when the number of iov is greater than
MAX_SKB_FRAGS. This is not enough since each single vector may occupy more than
one pages, so zerocopy_sg_fromiovec() may still fail and may break the guest
network.

Solve this problem by calculate the pages needed for iov before trying to do
zerocopy and switch to use copy instead of zerocopy if it needs more than
MAX_SKB_FRAGS.

This is done through introducing a new helper to count the pages for iov, and
call uarg->callback() manually when switching from zerocopy to copy to notify
vhost.

We can do further optimization on top.

The bug were introduced from commit 0690899b4d4501b3505be069b9a687e68ccbe15b
(tun: experimental zero copy tx support)

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: sch_qfq: remove a source of high packet delay/jitter

QFQ+ inherits from QFQ a design choice that may cause a high packet
delay/jitter and a severe short-term unfairness. As QFQ, QFQ+ uses a
special quantity, the system virtual time, to track the service
provided by the ideal system it approximates. When a packet is
dequeued, this quantity must be incremented by the size of the packet,
divided by the sum of the weights of the aggregates waiting to be
served. Tracking this sum correctly is a non-trivial task, because, to
preserve tight service guarantees, the decrement of this sum must be
delayed in a special way [1]: this sum can be decremented only after
that its value would decrease also in the ideal system approximated by
QFQ+. For efficiency, QFQ+ keeps track only of the 'instantaneous'
weight sum, increased and decreased immediately as the weight of an
aggregate changes, and as an aggregate is created or destroyed (which,
in its turn, happens as a consequence of some class being
created/destroyed/changed). However, to avoid the problems caused to
service guarantees by these immediate decreases, QFQ+ increments the
system virtual time using the maximum value allowed for the weight
sum, 2^10, in place of the dynamic, instantaneous value. The
instantaneous value of the weight sum is used only to check whether a
request of weight increase or a class creation can be satisfied.

Unfortunately, the problems caused by this choice are worse than the
temporary degradation of the service guarantees that may occur, when a
class is changed or destroyed, if the instantaneous value of the
weight sum was used to update the system virtual time. In fact, the
fraction of the link bandwidth guaranteed by QFQ+ to each aggregate is
equal to the ratio between the weight of the aggregate and the sum of
the weights of the competing aggregates. The packet delay guaranteed
to the aggregate is instead inversely proportional to the guaranteed
bandwidth. By using the maximum possible value, and not the actual
value of the weight sum, QFQ+ provides each aggregate with the worst
possible service guarantees, and not with service guarantees related
to the actual set of competing aggregates. To see the consequences of
this fact, consider the following simple example.

Suppose that only the following aggregates are backlogged, i.e., that
only the classes in the following aggregates have packets to transmit:
one aggregate with weight 10, say A, and ten aggregates with weight 1,
say B1, B2, ..., B10. In particular, suppose that these aggregates are
always backlogged. Given the weight distribution, the smoothest and
fairest service order would be:
A B1 A B2 A B3 A B4 A B5 A B6 A B7 A B8 A B9 A B10 A B1 A B2 ...

QFQ+ would provide exactly this optimal service if it used the actual
value for the weight sum instead of the maximum possible value, i.e.,
11 instead of 2^10. In contrast, since QFQ+ uses the latter value, it
serves aggregates as follows (easy to prove and to reproduce
experimentally):
A B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 A A A A A A A A A A B1 B2 ... B10 A A ...

By replacing 10 with N in the above example, and by increasing N, one
can increase at will the maximum packet delay and the jitter
experienced by the classes in aggregate A.

This patch addresses this issue by just using the above
'instantaneous' value of the weight sum, instead of the maximum
possible value, when updating the system virtual time. After the
instantaneous weight sum is decreased, QFQ+ may deviate from the ideal
service for a time interval in the order of the time to serve one
maximum-size packet for each backlogged class. The worst-case extent
of the deviation exhibited by QFQ+ during this time interval [1] is
basically the same as of the deviation described above (but, without
this patch, QFQ+ suffers from such a deviation all the time). Finally,
this patch modifies the comment to the function qfq_slot_insert, to
make it coherent with the fact that the weight sum used by QFQ+ can
now be lower than the maximum possible value.

[1] P. Valente, "Extending WF2Q+ to support a dynamic traffic mix",
Proceedings of AAA-IDEA'05, June 2005.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg KH:
"Here are some driver core patches for 3.11-rc2.  They aren't really
  bugfixes, but a bunch of new helper macros for drivers to properly
  create attribute groups, which drivers and subsystems need to fix up a
  ton of race issues with incorrectly creating sysfs files (binary and
  normal) after userspace has been told that the device is present.

  Also here is the ability to create binary files as attribute groups,
  to solve that race condition, which was impossible to do before this,
  so that's my fault the drivers were broken.

  The majority of the .c changes is indenting and moving code around a
  bit.  It affects no existing code, but allows the large backlog of 70+
  patches that I already have created to start flowing into the
  different subtrees, instead of having to live in my driver-core tree,
  causing merge nightmares in linux-next for the next few months.

  These were finalized too late for the -rc1 merge window, which is why
  they were didn't make that pull request, testing and review from
  others didn't happen until a few weeks ago, and then there's the whole
  distraction of the past few days, which prevented these from getting
  to you sooner, sorry about that.

  Oh, and there's a bugfix for the documentation build warning in here
  as well.  All of these have been in linux-next this week, with no
  reported problems"

* tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  driver-core: fix new kernel-doc warning in base/platform.c
  sysfs: use file mode defines from stat.h
  sysfs: add more helper macro's for (bin_)attribute(_groups)
  driver core: add default groups to struct class
  driver core: Introduce device_create_groups
  sysfs: prevent warning when only using binary attributes
  sysfs: add support for binary attributes in groups
  driver core: device.h: add RW and RO attribute macros
  sysfs.h: add BIN_ATTR macro
  sysfs.h: add ATTRIBUTE_GROUPS() macro
  sysfs.h: add __ATTR_RW() macro

Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fix from Guenter Roeck:
"Single patch to staticize a local variable"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (abx500) Staticize abx500_temp_attributes

Merge branch 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux

Pull phase two of __cpuinit removal from Paul Gortmaker:
"With the __cpuinit infrastructure removed earlier, this group of
  commits only removes the function/data tagging that was done with the
  various (now no-op) __cpuinit related prefixes.

  Now that the dust has settled with yesterday's v3.11-rc1, there
  hopefully shouldn't be any new users leaking back in tree, but I think
  we can leave the harmless no-op stubs there for a release as a
  courtesy to those who still have out of tree stuff and weren't paying
  attention.

  Although the commits are against the recent tag to allow for minor
  context refreshes for things like yesterday's v3.11-rc1~ slab content,
  the patches have been largely unchanged for weeks, aside from such
  trivial updates.

  For detail junkies, the largely boring and mostly irrelevant history
  of the patches can be viewed at:

    http://git.kernel.org/cgit/linux/kernel/git/paulg/cpuinit-delete.git

  If nothing else, I guess it does at least demonstrate the level of
  involvement required to shepherd such a treewide change to completion.

  This is the same repository of patches that has been applied to the
  end of the daily linux-next branches for the past several weeks"

* 'cpuinit_phase2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (28 commits)
  block: delete __cpuinit usage from all block files
  drivers: delete __cpuinit usage from all remaining drivers files
  kernel: delete __cpuinit usage from all core kernel files
  rcu: delete __cpuinit usage from all rcu files
  net: delete __cpuinit usage from all net files
  acpi: delete __cpuinit usage from all acpi files
  hwmon: delete __cpuinit usage from all hwmon files
  cpufreq: delete __cpuinit usage from all cpufreq files
  clocksource+irqchip: delete __cpuinit usage from all related files
  x86: delete __cpuinit usage from all x86 files
  score: delete __cpuinit usage from all score files
  xtensa: delete __cpuinit usage from all xtensa files
  openrisc: delete __cpuinit usage from all openrisc files
  m32r: delete __cpuinit usage from all m32r files
  hexagon: delete __cpuinit usage from all hexagon files
  frv: delete __cpuinit usage from all frv files
  cris: delete __cpuinit usage from all cris files
  metag: delete __cpuinit usage from all metag files
  tile: delete __cpuinit usage from all tile files
  sh: delete __cpuinit usage from all sh files
  ...

Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Except for a slightly big OMAP changes, all rest are small, mostly
  boring changes; all either 3.11 regression fixes or stable materials.

   - ASoC OMAP fixes due to non-DT OMAP4 removals
   - Other ASoC driver changes (sglt5000, wm8978, wm8948, samsung)
   - Fix missing locking for snd_pcm_stop() calls in many drivers
   - Fix the blocking request_module() in OSS sequencer
   - Fix old OSS vwsnd driver builds
   - Add a new HD-audio HDMI codec ID"

* tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (23 commits)
  ALSA: seq-oss: Initialize MIDI clients asynchronously
  ALSA: hda - Add new GPU codec ID to snd-hda
  staging: line6: Fix unlocked snd_pcm_stop() call
  [media] saa7134: Fix unlocked snd_pcm_stop() call
  ASoC: s6000: Fix unlocked snd_pcm_stop() call
  ASoC: atmel: Fix unlocked snd_pcm_stop() call
  ALSA: pxa2xx: Fix unlocked snd_pcm_stop() call
  ALSA: usx2y: Fix unlocked snd_pcm_stop() call
  ALSA: ua101: Fix unlocked snd_pcm_stop() call
  ALSA: 6fire: Fix unlocked snd_pcm_stop() call
  ALSA: atiixp: Fix unlocked snd_pcm_stop() call
  ALSA: asihpi: Fix unlocked snd_pcm_stop() call
  sound: oss/vwsnd: Always define vwsnd_mutex
  sound: oss/vwsnd: Add missing inclusion of linux/delay.h
  ASoC: wm8978: enable symmetric rates
  ASoC: omap-mcbsp: Use different method for DMA request when booted with DT
  ASoC: omap-dmic: Do not use platform_get_resource_byname() for DMA
  ASoC: omap-mcpdm: Do not use platform_get_resource_byname() for DMA
  ASoC: omap-pcm: Request the DMA channel differently when DT is involved
  ASoC: Samsung: Set RFS and BFS in slave mode
  ...

Merge branch 'drm/3.11/fixes' of git://linuxtv.org/pinchartl/fbdev into drm-fixes

Fixes builds
* 'drm/3.11/fixes' of git://linuxtv.org/pinchartl/fbdev:
drm/rcar-du: Use the GEM PRIME helpers
drm/shmobile: Use the GEM PRIME helpers

md/raid1: fix bio handling problems in process_checks()

Recent change to use bio_copy_data() in raid1 when repairing
an array is faulty.

The underlying may have changed the bio in various ways using
bio_advance and these need to be undone not just for the 'sbio' which
is being copied to, but also the 'pbio' (primary) which is being
copied from.

So perform the reset on all bios that were read from and do it early.

This also ensure that the sbio->bi_io_vec[j].bv_len passed to
memcmp is correct.

This fixes a crash during a 'check' of a RAID1 array. The crash was
introduced in 3.10 so this is suitable for 3.10-stable.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Remove recent change which allows devices to skip recovery.

commit 7ceb17e87bde79d285a8b988cfed9eaeebe60b86
    md: Allow devices to be re-added to a read-only array.

allowed a bit more than just that.  It also allows devices to be added
to a read-write array and to end up skipping recovery.

This patch removes the offending piece of code pending a rewrite for a
subsequent release.

More specifically:
If the array has a bitmap, then the device will still need a bitmap
based resync ('saved_raid_disk' is set under different conditions
is a bitmap is present).
If the array doesn't have a bitmap, then this is correct as long as
nothing has been written to the array since the metadata was checked
by ->validate_super.  However there is no locking to ensure that there
was no write.

Bug was introduced in 3.10 and causes data corruption so
patch is suitable for 3.10-stable.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Joe Lawrence <joe.lawrence@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md/raid10: fix two problems with RAID10 resync.

1/ When an different between blocks is found, data is copied from
   one bio to the other.  However bv_len is used as the length to
   copy and this could be zero.  So use r10_bio->sectors to calculate
   length instead.
   Using bv_len was probably always a bit dubious, but the introduction
   of bio_advance made it much more likely to be a problem.

2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
   except on bios that we schedule for a read.  This ensures that
   missing/failed devices don't confuse the loop at the top of
   sync_request write.
   Commit 8be185f2c9d54d6 "raid10: Use bio_reset()"
   removed a loop which set BIO_UPTDATE on all appropriate bios.
   So we need to re-add that flag.

These bugs were introduced in 3.10, so this patch is suitable for
3.10-stable, and can remove a potential for data corruption.

Cc: stable@vger.kernel.org (3.10)
Reported-by: Brassow Jonathan <jbrassow@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.de>

Merge branch 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux

more DPM fixes for radeon.

* 'drm-fixes-3.11' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)
  drm/radeon/dpm/atom: fix broken gcc harder
  drm/radeon/dpm/atom: restructure logic to work around a compiler bug
  drm/radeon/dpm: fix atom vram table parsing
  drm/radeon: fix an endian bug in atom table parsing
  drm/radeon: add a module parameter to disable aspm

drm/radeon/dpm: add debugfs support for RS780/RS880 (v3)

This allows you to look at the current DPM state via
debugfs.

Due to the way the hardware works on these asics, there's
no way to look up exactly what power state we are in, so
we make the best guess we can based on the current sclk.

v2: Anthoine's version
v3: fix ref div

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge branch 'for-3.11' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfixes from Bruce Fields:
"Just three minor bugfixes"

* 'for-3.11' of git://linux-nfs.org/~bfields/linux:
  svcrdma: underflow issue in decode_write_list()
  nfsd4: fix minorversion support interface
  lockd: protect nlm_blocked access in nlmsvc_retry_blocked

drm/radeon/dpm/atom: fix broken gcc harder

See bugs:
https://bugs.freedesktop.org/show_bug.cgi?id=66932
https://bugs.freedesktop.org/show_bug.cgi?id=66972
https://bugs.freedesktop.org/show_bug.cgi?id=66945

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

xen-netfront: pull on receive skb may need to happen earlier

Due to commit 3683243b ("xen-netfront: use __pskb_pull_tail to ensure
linear area is big enough on RX") xennet_fill_frags() may end up
filling MAX_SKB_FRAGS + 1 fragments in a receive skb, and only reduce
the fragment count subsequently via __pskb_pull_tail(). That's a
result of xennet_get_responses() allowing a maximum of one more slot to
be consumed (and intermediately transformed into a fragment) if the
head slot has a size less than or equal to RX_COPY_THRESHOLD.

Hence we need to adjust xennet_fill_frags() to pull earlier if we
reached the maximum fragment count - due to the described behavior of
xennet_get_responses() this guarantees that at least the first fragment
will get completely consumed, and hence the fragment count reduced.

In order to not needlessly call __pskb_pull_tail() twice, make the
original call conditional upon the pull target not having been reached
yet, and defer the newly added one as much as possible (an alternative
would have been to always call the function right before the call to
xennet_fill_frags(), but that would imply more frequent cases of
needing to call it twice).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@vger.kernel.org (3.6 onwards)
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: add necessary locking on device removal

The socket management is now done in workqueue (outside of RTNL)
and protected by vn->sock_lock. There were two possible bugs, first
the vxlan device was removed from the VNI hash table per socket without
holding lock. And there was a race when device is created and the workqueue
could run after deletion.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

drm/radeon/dpm/atom: restructure logic to work around a compiler bug

It seems gcc 4.8.1 generates bogus code for the old logic causing
part of the function to get skipped.

Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=66932
https://bugs.freedesktop.org/show_bug.cgi?id=66972
https://bugs.freedesktop.org/show_bug.cgi?id=66945

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon/dpm: fix atom vram table parsing

Parsing the table in incorrectly led to problems with
certain asics with mclk switching.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix an endian bug in atom table parsing

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: add a module parameter to disable aspm

Can cause hangs when enabled in certain motherboards.
Set radeon.aspm=0 to disable aspm.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/rcar-du: Use the GEM PRIME helpers

The GEM CMA PRIME import/export helpers have been removed in favor of
generic GEM PRIME helpers with GEM CMA low-level operations. Fix the
driver accordingly.

Reported-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Mark Brown <broonie@linaro.org>

drm/shmobile: Use the GEM PRIME helpers

The GEM CMA PRIME import/export helpers have been removed in favor of
generic GEM PRIME helpers with GEM CMA low-level operations. Fix the
driver accordingly.

Reported-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Mark Brown <broonie@linaro.org>

ALSA: seq-oss: Initialize MIDI clients asynchronously

We've got bug reports that the module loading stuck on Debian system
with 3.10 kernel.  The debugging session revealed that the initial
registration of OSS sequencer clients stuck at module loading time,
which involves again with request_module() at the init phase.  This is
triggered only by special --install stuff Debian is using, but it's
still not good to have such loops.

As a workaround, call the registration part asynchronously.  This is a
better approach irrespective of the hang fix, in anyway.

Reported-and-tested-by: Philipp Matthias Hahn <pmhahn@pmhahn.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

hyperv: Fix the NETIF_F_SG flag setting in netvsc

SG mode is not currently supported by netvsc, so remove this flag for now.
Otherwise, it will be unconditionally enabled by commit ec5f0615642
"Kill link between CSUM and SG features"
Previously, the SG feature is disabled because CSUM is not set here.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

driver-core: fix new kernel-doc warning in base/platform.c

Fix new kernel-doc warning in drivers/base/platform.c:

Warning(drivers/base/platform.c:528): No description found for parameter 'owner'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

linked-list: Remove __list_for_each

__list_for_each used to be the non prefetch() aware list walking
primitive. When we removed the prefetch macros from the list routines,
it became redundant. Given it does exactly the same thing as
list_for_each now, we might as well remove it and call list_for_each
directly.

All users of __list_for_each have been converted to list_for_each calls
in the current merge window.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

net: Fix sysfs_format_mac() code duplication.

It's just a duplicate implementation of "%*phC". Thanks to Joe
Perches for showing that we had exactly this support in the
lib/vsprintf.c code already.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel

One feature latecomer, I've forgotten to merge the patch to reeanble the
Haswell power well feature now that the audio interaction is fixed up.
Since that was the only unfixed issue with it I've figured I could throw
it in a bit late, and it's trivial to revert in case I'm wrong.

Otherwise all bug/regression fixes:
- Fix status page reinit after gpu hangs, spotted by more paranoid igt
  checks.
- Fix object list walking fumble regression in the shrinker (only the
  counting part, the actual shrinking code was correct so no Oops
  potential), from Xiong Zhang.
- Fix DP 1.2 bw limits (Imre).
- Restore legacy forcewake on ivb, too many broken biosen out there. We
  dump a warn though that recent userspace might fall over with that
  config (Guenter Roeck).
- Patch up the gen2 cs tlb w/a.
- Improve the fence coherency w/a now that we have a better understanding
  what's going on. The removed wbinvd+ipi should make -rt folks happy. Big
  thanks to Jon Bloomfield for figuring this out, patches from Chris.
- Fix write-read race when switching ring (Chris). Spotted with code
  inspection, but now we also have an igt for it.

There's an ugly regression we're still working on introduced between
3.10-rc7 and 3.10.0. Unfortunately we can't just revert the offender since
that one fixes another regression :( I've asked Steven to include my
-fixes branch into linux-next to prevent such fallout in the future,
hopefully.

* tag 'drm-intel-fixes-2013-07-11' of git://people.freedesktop.org/~danvet/drm-intel:
  Revert "drm/i915: Workaround incoherence between fences and LLC across multiple CPUs"
  drm/i915: Fix incoherence with fence updates on Sandybridge+
  drm/i915: Fix write-read race with multiple rings
  Partially revert "drm/i915: unconditionally use mt forcewake on hsw/ivb"
  drm/i915: fix lane bandwidth capping for DP 1.2 sinks
  drm/i915: fix up ring cleanup for the i830/i845 CS tlb w/a
  drm/i915: Correct obj->mm_list link to dev_priv->dev_priv->mm.inactive_list
  drm/i915: switch disable_power_well default value to 1
  drm/i915: reinit status page registers after gpu reset