git.karo-electronics.de Git - linux-beck.git/log

ipv4: rcu cleanup in ip_ra_control()

Remove one sparse warning :
net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces)
net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>*next
net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain *[assigned] ra

And replace one rcu_assign_ptr() by RCU_INIT_POINTER() where applicable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: mcast: remove dead debugging defines

It's not used anywhere, so just remove these.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: vlsi_ir: use %*ph specifier

Instead of looping in the code let's use kernel extension to dump small
buffers.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: use usleep_range

Replace mdelay with usleep_range to avoid busy loop.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: optimize sock_tx_timestamp default path

Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: sfq: remove unused macro

not used anymore since ddecf0f
(net_sched: sfq: add optional RED on top of SFQ).

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: Convert the normal transmit complete path to dev_consume_skb_any()

Convert the normal transmit completion path from dev_kfree_skb_any()
to dev_consume_skb_any() to help keep dropped packet profiling
meaningful.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bond_lock_removal'

Nikolay Aleksandrov says:

====================
bonding: get rid of bond->lock

This patch-set removes the last users of bond->lock and converts the places
that needed it for sync to use curr_slave_lock or RCU as appropriate.
I've run this with lockdep and have stress-tested it via loading/unloading
and enslaving/releasing in parallel while outputting bond's proc, I didn't
see any issues. Please pay special attention to the procfs change, I've
done about an hour of stress-testing on it and have checked that the event
that causes the bonding to delete its proc entry (NETDEV_UNREGISTER) is
called before ndo_uninit() and the freeing of the dev so any readers will
sync with that. Also ran sparse checks and there were no splats.

v2: Add patch 0001/cxgb4 bond->lock removal, RTNL should be held in the
notifier call, the other patches are the same. Also tested with
allmodconfig to make sure there're no more users of bond->lock.
Changes from the RFC:
use RCU in procfs instead of RTNL since RTNL might lead to a deadlock with
unloading and also is much slower. The bond destruction syncs with proc
via the proc locks. There's one new patch that converts primary_slave to
use RCU as it was necessary to fix a longstanding bugs in sysfs and
procfs and to make it easy to migrate bond's procfs to RCU. And of course
rebased on top of net-next current.

This is the first patch-set in a series that should simplify the bond's
locking requirements and will make it easier to define the locking
conditions necessary for the various paths. The goal is to rely on RTNL
and rcu alone, an extra lock would be needed in a few special cases that
would be documented very well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: remove last users of bond->lock and bond->lock itself

The usage of bond->lock in bond_main.c was completely unnecessary as it
didn't help to sync with anything, most of the spots already had RTNL.
Since there're no more users of bond->lock, remove it.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: options: remove bond->lock usage

We're safe to remove the bond->lock use from the arp targets because
arp_rcv_probe no longer acquires bond->lock, only rcu_read_lock.
Also setting the primary slave is safe because noone uses the bond->lock
as a syncing mechanism for that anymore.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: procfs: clean bond->lock usage and use RCU

Use RCU to protect against slave release, the proc show function will sync
with the bond destruction by the proc locks and the fact that the bond is
released after NETDEV_UNREGISTER which causes the bonding to remove the
proc entry.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: convert primary_slave to use RCU

This is necessary mainly for two bonding call sites: procfs and
sysfs as it was dereferenced without any real protection.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: alb: clean bond->lock

We can remove the lock/unlock as it's no longer necessary since
RTNL should be held while calling bond_alb_set_mac_address().

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: 3ad: use curr_slave_lock instead of bond->lock

In 3ad mode the only syncing needed by bond->lock is for the wq
and the recv handler, so change them to use curr_slave_lock.
There're no locking dependencies here as 3ad doesn't use
curr_slave_lock at all.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: remove bond->lock

RTNL should be already held in the notifier call so the slave list can
be traversed without a problem, remove the unnecessary bond->lock.

CC: Hariprasad S <hariprasad@chelsio.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: Enable emac node on the rk3188-radxarock boards

This enables EMAC Rockchip support on radxa rock boards.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ARM: dts: Add emac nodes to the rk3188 device tree

This adds support for EMAC Rockchip driver on RK3188 SoCs.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: Document EMAC Rockchip

This adds the necessary binding documentation for the EMAC Rockchip platform
driver found in RK3066 and RK3188 SoCs.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: arc: Add support for Rockchip SoC layer device tree bindings

This patch defines a platform glue layer for Rockchip SoCs which
support arc-emac driver. It ensures that regulator for the rmii is on
before trying to connect to the ethernet controller. It applies right
speed and mode changes to the grf when ethernet settings change.

Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-next'

Daniel Borkmann says:

====================
BPF updates

[ Set applies on top of current net-next but also on top of
  Alexei's latest patches. Please see individual patches for
  more details. ]

Changelog:
v1->v2:
  - Removed paragraph in 1st commit message
  - Rest stays the same
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: bpf: be friendly to kmemcheck

Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.

We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.

As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bpf: arm: address randomize and write protect JIT code

This is the ARM variant for 314beb9bcab ("x86: bpf_jit_comp: secure bpf
jit against spraying attacks").

It is now possible to implement it due to commits 75374ad47c64 ("ARM: mm:
Define set_memory_* functions for ARM") and dca9aa92fc7c ("ARM: add
DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for
this facility.

Thus, this patch makes sure the BPF generated JIT code is marked RO, as
other kernel text sections, and also lets the generated JIT code start
at a pseudo random offset instead on a page boundary. The holes are filled
with illegal instructions.

JIT tested on armv7hl with BPF test suite.

Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Mircea Gherzan <mgherzan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bpf: consolidate JIT binary allocator

Introduced in commit 314beb9bcabf ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in aa2d2c73c21f
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.

Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.

This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.

JIT tested on x86_64 and s390x with BPF test suite.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove dst refcount false sharing for prequeue mode

Alexander Duyck reported high false sharing on dst refcount in tcp stack
when prequeue is used. prequeue is the mechanism used when a thread is
blocked in recvmsg()/read() on a TCP socket, using a blocking model
rather than select()/poll()/epoll() non blocking one.

We already try to use RCU in input path as much as possible, but we were
forced to take a refcount on the dst when skb escaped RCU protected
region. When/if the user thread runs on different cpu, dst_release()
will then touch dst refcount again.

Commit 093162553c33 (tcp: force a dst refcount when prequeue packet)
was an example of a race fix.

It turns out the only remaining usage of skb->dst for a packet stored
in a TCP socket prequeue is IP early demux.

We can add a logic to detect when IP early demux is probably going
to use skb->dst. Because we do an optimistic check rather than duplicate
existing logic, we need to guard inet_sk_rx_dst_set() and
inet6_sk_rx_dst_set() from using a NULL dst.

Many thanks to Alexander for providing a nice bug report, git bisection,
and reproducer.

Tested using Alexander script on a 40Gb NIC, 8 RX queues.
Hosts have 24 cores, 48 hyper threads.

echo 0 >/proc/sys/net/ipv4/tcp_autocorking

for i in `seq 0 47`
do
  for j in `seq 0 2`
  do
     netperf -H $DEST -t TCP_STREAM -l 1000 \
             -c -C -T $i,$i -P 0 -- \
             -m 64 -s 64K -D &
  done
done

Before patch : ~6Mpps and ~95% cpu usage on receiver
After patch : ~9Mpps and ~35% cpu usage on receiver.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ath5k: Add missing vmalloc.h include.

After merging the wireless-next tree, today's linux-next build (powerpc
allyesconfig) failed like this:

drivers/net/wireless/ath/ath5k/debug.c: In function 'open_file_eeprom':
drivers/net/wireless/ath/ath5k/debug.c:933:2: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
  buf = vmalloc(eesize);
  ^
drivers/net/wireless/ath/ath5k/debug.c:933:6: warning: assignment makes pointer from integer without a cast
  buf = vmalloc(eesize);
      ^
drivers/net/wireless/ath/ath5k/debug.c:960:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  vfree(buf);
  ^

Caused by commit db906eb2101b ("ath5k: added debugfs file for dumping
eeprom").  Also reported by Guenter Roeck.

I have used Geert Uytterhoeven's suggested fix of including vmalloc.h
and so added this patch for today:

From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Mon, 8 Sep 2014 18:39:23 +1000
Subject: [PATCH] ath5k: fix debugfs addition

Reported-by: Guenter Roeck <linux@roeck-us.net>
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: ti: remove unwanted THIS_MODULE macro

It removes the owner field updation of driver structure.
It will be automatically updated by module_platform_driver()

Signed-off-by: Varka Bhadram <varkab@cdac.in>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: change the data type of error status to atomic_long_t

Change the date type of error status from u64 to atomic_long_t, and use atomic
operation, then remove the lock which is used to protect the error status.

The operation of atomic maybe faster than spin lock.

Cc: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Cleanup of unncessary check.

This patch removes an unncessary check in the br_afspec() method of
br_netlink.c.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bridge_rtnl_link'

Jiri Pirko says:

====================
bridge: implement rtnl_link options for getting and setting bridge options

So far, only sysfs is complete interface for getting and setting bridge
options. This patchset follows-up on the similar bonding code and
allows userspace to get/set bridge master/port options using Netlink
IFLA_INFO_DATA/IFLA_INFO_SLAVE_DATA attr.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: implement rtnl_link_ops->changelink

Allow rtnetlink users to set bridge master info via IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info

Allow rtnetlink users to get bridge master info in IFLA_INFO_DATA attr
This initial part implements forward_delay, hello_time, max_age options.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: implement rtnl_link_ops->slave_changelink

Allow rtnetlink users to set port info via IFLA_INFO_SLAVE_DATA attr

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info

Allow rtnetlink users to get port info in IFLA_INFO_SLAVE_DATA attr

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: switch order of rx_handler reg and upper dev link

The thing is that netdev_master_upper_dev_link calls
call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates rtnl
link message and during that, rtnl_link_ops->fill_slave_info is called.
But with current ordering, rx_handler and IFF_BRIDGE_PORT are not set
yet so there would have to be check for that in fill_slave_info callback.

Resolve this by reordering to similar what bonding and team does to
avoid the check.

Also add removal of IFF_BRIDGE_PORT flag into error path.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv4: bind ip_nonlocal_bind to current netns

net.ipv4.ip_nonlocal_bind sysctl was global to all network
namespaces. This patch allows to set a different value for each
network namespace.

Signed-off-by: Vincent Bernat <vincent@bernat.im>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ebpf'

Alexei Starovoitov says:

====================
load imm64 insn and uapi/linux/bpf.h

V9->V10
- no changes, added Daniel's ack

Note they're on top of Hannes's patch in the same area [1]

V8 thread with 'why' reasoning and end goal [2]

Original set [3] of ~28 patches I'm planning to present in 4 stages:

I. this 2 patches to fork off llvm upstreaming
II. bpf syscall with manpage and map implementation
III. bpf program load/unload with verifier testsuite (1st user of
instruction macros from bpf.h and 1st user of load imm64 insn)
IV. tracing, etc

[1] http://patchwork.ozlabs.org/patch/385266/
[2] https://lkml.org/lkml/2014/8/27/628
[3] https://lkml.org/lkml/2014/8/26/859
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: split filter.h and expose eBPF to user space

allow user space to generate eBPF programs

uapi/linux/bpf.h: eBPF instruction set definition

linux/filter.h: the rest

This patch only moves macro definitions, but practically it freezes existing
eBPF instruction set, though new instructions can still be added in the future.

These eBPF definitions cannot go into uapi/linux/filter.h, since the names
may conflict with existing applications.

Full eBPF ISA description is in Documentation/networking/filter.txt

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: filter: add "load 64-bit immediate" eBPF instruction

add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register.
All previous instructions were 8-byte. This is first 16-byte instruction.
Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction:
insn[0].code = BPF_LD | BPF_DW | BPF_IMM
insn[0].dst_reg = destination register
insn[0].imm = lower 32-bit
insn[1].code = 0
insn[1].imm = upper 32-bit
All unused fields must be zero.

Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
which loads 32-bit immediate value into a register.

x64 JITs it as single 'movabsq %rax, imm64'
arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn

Note that old eBPF programs are binary compatible with new interpreter.

It helps eBPF programs load 64-bit constant into a register with one
instruction instead of using two registers and 4 instructions:
BPF_MOV32_IMM(R1, imm32)
BPF_ALU64_IMM(BPF_LSH, R1, 32)
BPF_MOV32_IMM(R2, imm32)
BPF_ALU64_REG(BPF_OR, R1, R2)

User space generated programs will use this instruction to load constants only.

To tell kernel that user space needs a pointer the _pseudo_ variant of
this instruction may be added later, which will use extra bits of encoding
to indicate what type of pointer user space is asking kernel to provide.
For example 'off' or 'src_reg' fields can be used for such purpose.
src_reg = 1 could mean that user space is asking kernel to validate and
load in-kernel map pointer.
src_reg = 2 could mean that user space needs readonly data section pointer
src_reg = 3 could mean that user space needs a pointer to per-cpu local data
All such future pseudo instructions will not be carrying the actual pointer
as part of the instruction, but rather will be treated as a request to kernel
to provide one. The kernel will verify the request_for_a_pointer, then
will drop _pseudo_ marking and will store actual internal pointer inside
the instruction, so the end result is the interpreter and JITs never
see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
loads 64-bit immediate into a register. User space never operates on direct
pointers and verifier can easily recognize request_for_pointer vs other
instructions.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'master-2014-09-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next

John W. Linville says:

====================
pull request: wireless-next 2014-09-08

Please pull this batch of updates intended for the 3.18 stream...

For the mac80211 bits, Johannes says:

"Not that much content this time. Some RCU cleanups, crypto
performance improvements, and various patches all over,
rather than listing them one might as well look into the
git log instead."

For the Bluetooth bits, Gustavo says:

"The changes consists of:

        - Coding style fixes to HCI drivers
        - Corrupted ack value fix for the H5 HCI driver
        - A couple of Enhanced L2CAP fixes
        - Conversion of SMP code to use common L2CAP channel API
        - Page scan optimizations when using the kernel-side whitelist
        - Various mac802154 and and ieee802154 6lowpan cleanups
        - One new Atheros USB ID"

For the iwlwifi bits, Emmanuel says:

"We have a new big thing coming up which is called Dynamic Queue
Allocation (or DQA).  This is a completely new way to work with the
Tx queues and it requires major refactoring.  This is being done by
Johannes and Avri.  Besides this, Johannes disables U-APSD by default
because of APs that would disable A-MPDU if the association supports
U-ASPD.  Luca contributed to the power area which he was cleaning
up on the way while working on CSA.  A few more random things here
and there."

For the Atheros bits, Kalle says:

"For ath6kl we had two small fixes and a new SDIO device id.

For ath10k the bigger changes are:

* support for new firmware version 10.2 (Michal)

* spectral scan support (Simon, Sven & Mathias)

* export a firmware crash dump file (Ben & me)

* cleaning up of pci.c (Michal)

* print pci id in all messages, which causes most of the churn (Michal)"

Beyond that, we have the usual collection of various updates to ath9k,
b43, mwifiex, and wil6210, as well as a few other bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

inet: remove dead inetpeer sequence code

inetpeer sequence numbers are no longer incremented, so no need to
check and flush the tree. The function that increments the sequence
number was already dead code and removed in in "ipv4: remove unused
function" (068a6e18). Remove the code that checks for a change, too.

Verifying that v4_seq and v6_seq are never incremented and thus that
flush_check compares bp->flush_seq to 0 is trivial.

The second part of the change removes flush_check completely even
though bp->flush_seq is exactly !0 once, at initialization. This
change is correct because the time this branch is true is when
bp->root == peer_avl_empty_rcu, in which the branch and
inetpeer_invalidate_tree are a NOOP.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

drivers: net: cpsw: Add support for pause frames

CPSW supports both rx and tx pause frames for flow control.

Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hp100: Convert the normal skb free path to dev_consume_skb_any()

A bit of floor sweeping in a dusty old corner. Convert the "normal"
skb free calls to dev_consume_skb_any() so packet drop tracing will
be more sane.

Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Fix GRE RX to use skb_transport_header for GRE header offset

GRE assumes that the GRE header is at skb_network_header +
ip_hrdlen(skb). It is more general to use skb_transport_header
and this allows the possbility of inserting additional header
between IP and GRE (which is what we will done in Generic UDP
Encapsulation for GRE).

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dp83640: Make use of skb_queue_purge instead of reimplementing the code

This change makes it so that dp83640_remove can use skb_queue_purge
instead of looping through itself to flush any entries out of the queue.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"This pull request includes Alban's patch to disallow '\n' in cgroup
  names.

  Two other patches from Li to fix a possible oops when cgroup
  destruction races against other file operations and one from Vivek to
  fix a unified hierarchy devel behavior"

* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: check cgroup liveliness before unbreaking kernfs
  cgroup: delay the clearing of cgrp->kn->priv
  cgroup: Display legacy cgroup files on default hierarchy
  cgroup: reject cgroup names with '\n'

Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu

Pull percpu fixes from Tejun Heo:
"One patch to fix a failure path in the alloc path.  The bug is
  dangerous but probably not too likely to actually trigger in the wild
  given that there hasn't been any report yet.

  The other two are low impact fixes"

* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  percpu: free percpu allocation info for uniprocessor system
  percpu: perform tlb flush after pcpu_map_pages() failure
  percpu: fix pcpu_alloc_pages() failure path

Merge branch 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata

Pull libata fixes from Tejun Heo:
"Two patches are to add PCI IDs for ICH9 and all others are device
  specific fixes.  Nothing too interesting"

* 'for-3.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.
  ahci_xgene: Skip the PHY and clock initialization if already configured by the firmware.
  ahci: add pcid for Marvel 0x9182 controller
  ata: Disabling the async PM for JMicron chip 363/361
  ata_piix: Add Device IDs for Intel 9 Series PCH
  ahci: Add Device IDs for Intel 9 Series PCH
  ata: ahci_tegra: Read calibration fuse

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix skb leak in mac802154, from Martin Townsend

2) Use select not depends on NF_NAT for NFT_NAT, from Pablo Neira
    Ayuso

3) Fix union initializer bogosity in vxlan, from Gerhard Stenzel

4) Fix RX checksum configuration in stmmac driver, from Giuseppe
    CAVALLARO

5) Fix TSO with non-accelerated VLANs in e1000, e1000e, bna, ehea,
    i40e, i40evf, mvneta, and qlge, from Vlad Yasevich

6) Fix capability checks in phy_init_eee(), from Giuseppe CAVALLARO

7) Try high order allocations more sanely for SKBs, specifically if a
    high order allocation fails, fall back directly to zero order pages
    rather than iterating down one order at a time.  From Eric Dumazet

8) Fix a memory leak in openvswitch, from Li RongQing

9) amd-xgbe initializes wrong spinlock, from Thomas Lendacky

10) RTNL locking was busted in setsockopt for anycast and multicast, fix
    from Sabrina Dubroca

11) Fix peer address refcount leak in ipv6, from Nicolas Dichtel

12) DocBook typo fixes, from Masanari Iida

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (101 commits)
  ipv6: restore the behavior of ipv6_sock_ac_drop()
  amd-xgbe: Enable interrupts for all management counters
  amd-xgbe: Treat certain counter registers as 64 bit
  greth: moved TX ring cleaning to NAPI rx poll func
  cnic : Cleanup CONFIG_IPV6 & VLAN check
  net: treewide: Fix typo found in DocBook/networking.xml
  bnx2x: Fix link problems for 1G SFP RJ45 module
  3c59x: avoid panic in boomerang_start_xmit when finding page address:
  netfilter: add explicit Kconfig for NETFILTER_XT_NAT
  ipv6: use addrconf_get_prefix_route() to remove peer addr
  ipv6: fix a refcnt leak with peer addr
  net-timestamp: only report sw timestamp if reporting bit is set
  drivers/net/fddi/skfp/h/skfbi.h: Remove useless PCI_BASE_2ND macros
  l2tp: fix race while getting PMTU on PPP pseudo-wire
  ipv6: fix rtnl locking in setsockopt for anycast and multicast
  VMXNET3: Check for map error in vmxnet3_set_mc
  openvswitch: distinguish between the dropped and consumed skb
  amd-xgbe: Fix initialization of the wrong spin lock
  openvswitch: fix a memory leak
  netfilter: fix missing dependencies in NETFILTER_XT_TARGET_LOG
  ...

net: phy: mdio-sun4i: don't select REGULATOR

The mdio-sun4i driver automatically selects REGULATOR and
REGULATOR_FIXED_VOLTAGE because it uses the regulator API. But a
driver selecting a subsystem increases the chance of generating
circular Kconfig dependencies, especially when other drivers depend on
the selected symbol.

Since the regulator API functions are replaced with no-ops when
REGULATOR is disabled, the driver can be built successfully even
without regulator support and so those 'select' dependencies can be
safely dropped.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'master-2014-09-04' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless

John W. Linville says:

====================
pull request: wireless 2014-09-05

Please pull this batch of fixes intended for the 3.17 stream...

For the mac80211 bits, Johannes says:

"Here are a few fixes for mac80211. One has been discussed for a while
and adds a terminating NUL-byte to the alpha2 sent to userspace, which
shouldn't be necessary but since many places treat it as a string we
couldn't move to just sending two bytes.

In addition to that, we have two VLAN fixes from Felix, a mesh fix, a
fix for the recently introduced RX aggregation offload, a revert for
a broken patch (that luckily didn't really cause any harm) and a small
fix for alignment in debugfs."

For the iwlwifi bits, Emmanuel says:

"I revert a patch that disabled CTS to self in dvm because users
reported issues. The revert is CCed to stable since the offending
patch was sent to stable too. I also bump the firmware API versions
since a new firmware is coming up. On top of that, Marcel fixes a
bug I introduced while fixing a bug in our Kconfig file."

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: restore the behavior of ipv6_sock_ac_drop()

It is possible that the interface is already gone after joining
the list of anycast on this interface as we don't hold a refcount
for the device, in this case we are safe to ignore the error.

What's more important, for API compatibility we should not
change this behavior for applications even if it were correct.

Fixes: commit a9ed4a2986e13011 ("ipv6: fix rtnl locking in setsockopt for anycast and multicast")
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Linux 3.17-rc4

rose: use %*ph specifier

Instead of dereference each byte let's use %*ph specifier in the printk()
calls.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Documentation: new page link in SubmittingPatches

new link for - How to piss off a Linux kernel subsystem maintainer

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: NFS/RDMA: Document separate Kconfig symbols

The NFS/RDMA Kconfig symbol was split into separate options for client
and server in commit 2e8c12e1b765 ("xprtrdma: add separate Kconfig
options for NFSoRDMA client and server support").

Update the documentation to reflect this split.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "J. Bruce Fields" <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: misc-devices: Rename freefall.c from hpfall.c in lis2lv02d

hpfall.c was renamed to freefall.c in 3.16, but this file still refer to
hpfall.c instead of freefall.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: i2c: rename variable "register" to "reg"

The example code provided with the i2c device interface documentation
won't compile since it uses the reserved word "register" to name a
variable.

The compiler fails with this error message:

error: expected identifier or '(' before '=' token
__u8 register = 0x20; /* Device register to access */
^

Rename the variable "register" to simply "reg" in the example code.

Another couple of typos has been fixed as well.
[Change "! =" to "!=".]

Signed-off-by: Jose Alarcon Roldan <jose.alarcon.roldan@gmail.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Documentation: seq_file: Document seq_open_private(), seq_release_private()

Despite the fact that these functions have been around for years, they
are little used (only 15 uses in 13 files at the preseht time) even
though many other files use work-arounds to achieve the same result.

By documenting them, hopefully they will become more widely used.

Signed-off-by: Rob Jones <rob.jones@codethink.co.uk>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
"These are regression fixes (ACPI sysfs, ACPI video, suspend test),
  ACPI cpuidle deadlock fix, missing runtime validation of ACPI _DSD
  output, a fix and a new CPU ID for the RAPL driver, new blacklist
  entry for the ACPI EC driver and a couple of trivial cleanups
  (intel_pstate and generic PM domains).

  Specifics:

   - Fix for recently broken test_suspend= command line argument (Rafael
     Wysocki).

   - Fixes for regressions related to the ACPI video driver caused by
     switching the default to native backlight handling in 3.16 from
     Hans de Goede.

   - Fix for a sysfs attribute of ACPI device objects that returns stale
     values sometimes due to the fact that they are cached instead of
     executing the appropriate method (_SUN) every time (broken in
     3.14).  From Yasuaki Ishimatsu.

   - Fix for a deadlock between cpuidle_lock and cpu_hotplug.lock in the
     ACPI processor driver from Jiri Kosina.

   - Runtime output validation for the ACPI _DSD device configuration
     object missing from the support for it that has been introduced
     recently.  From Mika Westerberg.

   - Fix for an unuseful and misleading RAPL (Running Average Power
     Limit) domain detection message in the RAPL driver from Jacob Pan.

   - New Intel Haswell CPU ID for the RAPL driver from Jason Baron.

   - New Clevo W350etq blacklist entry for the ACPI EC driver from Lan
     Tianyu.

   - Cleanup for the intel_pstate driver and the core generic PM domains
     code from Gabriele Mazzotta and Geert Uytterhoeven"

* tag 'pm+acpi-3.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
  ACPI / scan: not cache _SUN value in struct acpi_device_pnp
  cpufreq: intel_pstate: Remove unneeded variable
  powercap / RAPL: change domain detection message
  powercap / RAPL: add support for CPU model 0x3f
  PM / domains: Make generic_pm_domain.name const
  PM / sleep: Fix test_suspend= command line option
  ACPI / EC: Add msi quirk for Clevo W350etq
  ACPI / video: Disable native_backlight on HP ENVY 15 Notebook PC
  ACPI / video: Add a disable_native_backlight quirk
  ACPI / video: Fix use_native_backlight selection logic
  ACPICA: ACPI 5.1: Add support for runtime validation of _DSD package.

amd-xgbe-phy: Fix build break for missing declaration

A previous patch inadvertently deleted a declaration in the
amd_xgbe_an_tx_training function causing the build to fail.

Add the declaration for 'priv' back to the function.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull filesystem fixes from Al Viro:
"Several bugfixes (all of them -stable fodder).

  Alexey's one deals with double mutex_lock() in UFS (apparently, nobody
  has tried to test "ufs: sb mutex merge + mutex_destroy" on something
  like file creation/removal on ufs).  Mine deal with two kinds of
  umount bugs, in umount propagation and in handling of automounted
  submounts, both resulting in bogus transient EBUSY from umount"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ufs: fix deadlocks introduced by sb mutex merge
  fix EBUSY on umount() from MNT_SHRINKABLE
  get rid of propagate_umount() mistakenly treating slaves as busy.

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU fix from Ingo Molnar:
"A boot hang fix for the offloaded callback RCU model (RCU_NOCB_CPU=y
&& (TREE_CPU=y || TREE_PREEMPT_RC)) in certain bootup scenarios"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcu: Make nocb leader kthreads process pending callbacks after spawning

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
"Three fixlets from the timer departement:

   - Update the timekeeper before updating vsyscall and pvclock.  This
     fixes the kvm-clock regression reported by Chris and Paolo.

   - Use the proper irq work interface from NMI.  This fixes the
     regression reported by Catalin and Dave.

   - Clarify the compat_nanosleep error handling mechanism to avoid
     future confusion"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Update timekeeper before updating vsyscall and pvclock
  compat: nanosleep: Clarify error handling
  nohz: Restore NMI safe local irq work for local nohz kick

ufs: fix deadlocks introduced by sb mutex merge

Commit 0244756edc4b ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.

The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.

Found by Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"A smattering of bug fixes across most architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  powerpc/kvm/cma: Fix panic introduces by signed shift operation
  KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
  KVM: s390/mm: Fix storage key corruption during swapping
  arm/arm64: KVM: Complete WFI/WFE instructions
  ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU
  KVM: s390/mm: try a cow on read only pages for key ops
  KVM: s390: Fix user triggerable bug in dead code

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Kevin Hilman:
"Another round of fixes from arm-soc land, which are mostly DT fixes
  for:

   - OMAP: handful of DT fixes devices on newly supported hardware
   - davinci: fix 2nd EDMA channel
   - ux500: extend previous pinctrl fix to another board
   - at91: clock registration fixes, compatibility string precision

  And one more fix for event cleanup in drivers/bus/arm-ccn"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  bus: arm-ccn: Move event cleanup routine
  ARM: at91/dt: rm9200: fix usb clock definition
  ARM: at91: rm9200: fix clock registration
  ARM: at91/dt: sam9g20: set at91sam9g20 pllb driver
  ARM: dts: dra7-evm: Add vtt regulator support
  ARM: dts: dra7-evm: Fix spi1 mux documentation
  ARM: dts: am43x-epos-evm: Disable QSPI to prevent conflict with GPMC-NAND
  ARM: OMAP2+: gpmc: Don't complain if wait pin is used without r/w monitoring
  ARM: dts: am43xx-epos-evm: Don't use read/write wait monitoring
  ARM: dts: am437x-gp-evm: Don't use read/write wait monitoring
  ARM: dts: am437x-gp-evm: Use BCH16 ECC scheme instead of BCH8
  ARM: dts: am43x-epos-evm: Use BCH16 ECC scheme instead of BCH8
  ARM: dts: am4372: fix USB regs size
  ARM: dts: am437x-gp: switch i2c0 to 100KHz
  ARM: dts: dra7-evm: Fix 8th NAND partition's name
  ARM: dts: dra7-evm: Fix i2c3 pinmux and frequency
  ARM: ux500: disable msp2 node on Snowball
  ARM: edma: Fix configuration parsing for SoCs with multiple eDMA3 CC
  ARM: dts: set 'ti,set-rate-parent' for dpll4_m5x2 clock

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-09-06

This series contains updates to e1000 and igb.

Krzysztof provides a patch to cleanup the coding style in e1000 to quiet
checkpatch.pl warnings.

Todd adds two boolean flags to igb to allow for changes in the
advertised EEE speeds from ethtool.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove obsolete comment about TCP_SKB_CB(skb)->when in tcp_fragment()

The TCP_SKB_CB(skb)->when field no longer exists as of recent change
7faee5c0d514 ("tcp: remove TCP_SKB_CB(skb)->when"). And in any case,
tcp_fragment() is called on already-transmitted packets from the
__tcp_retransmit_skb() call site, so copying timestamps of any kind
in this spot is quite sensible.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs

Pull xfs fixes from Dave Chinner:
"The fixes all address recently discovered data corruption issues.

  The original Direct IO issue was discovered by Chris Mason @ Facebook
  on a production workload which mixed buffered reads with direct reads
  and writes IO to the same file.  The fix for that exposed other issues
  with page invalidation (exposed by millions of fsx operations) failing
  due to dirty buffers beyond EOF.

  Finally, the collapse_range code could also cause problems due to
  racing writeback changing the extent map while it was being shifted
  around.  The commits for that problem are simple mitigation fixes that
  prevent the problem from occuring.  A more robust fix for 3.18 that
  addresses the underlying problem is currently being worked on by
  Brian.

  Summary of fixes:
   - a direct IO read/buffered read data corruption
   - the associated fallout from the DIO data corruption fix
   - collapse range bugs that are potential data corruption issues"

* tag 'xfs-for-linus-3.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
  xfs: trim eofblocks before collapse range
  xfs: xfs_file_collapse_range is delalloc challenged
  xfs: don't log inode unless extent shift makes extent modifications
  xfs: use ranged writeback and invalidation for direct IO
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't zero partial page cache pages during O_DIRECT writes
  xfs: don't dirty buffers beyond EOF

Merge tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd

Pull mtd fixes from Brian Norris:
"Two trivial MTD updates for 3.17-rc4:

   - a tiny comment tweak, to kill a bunch of DocBook warnings added
     during the merge window

   - a small fixup to the OTP routines' error handling"

* tag 'for-linus-20140905' of git://git.infradead.org/linux-mtd:
  mtd: nand: fix DocBook warnings on nand_sdr_timings doc
  mtd: cfi_cmdset_0002: check return code for get_chip()

igb: add flags to set eee advertisement mode

Change e1000_set_eee and e1000_set_eee_i35(0|4) to allow
changes in the advertised EEE speeds from ethtool. Adds two boolean
flags to e1000_set_eee_i35(0|4) to pass in advertised speed data.

Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

timekeeping: Update timekeeper before updating vsyscall and pvclock

The update_walltime() code works on the shadow timekeeper to make the
seqcount protected region as short as possible. But that update to the
shadow timekeeper does not update all timekeeper fields because it's
sufficient to do that once before it becomes life. One of these fields
is tkr.base_mono. That stays stale in the shadow timekeeper unless an
operation happens which copies the real timekeeper to the shadow.

The update function is called after the update calls to vsyscall and
pvclock. While not correct, it did not cause any problems because none
of the invoked update functions used base_mono.

commit cbcf2dd3b3d4 (x86: kvm: Make kvm_get_time_and_clockread()
nanoseconds based) changed that in the kvm pvclock update function, so
the stale mono_base value got used and caused kvm-clock to malfunction.

Put the update where it belongs and fix the issue.

Reported-by: Chris J Arges <chris.j.arges@canonical.com>
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409050000570.3333@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

compat: nanosleep: Clarify error handling

The error handling in compat_sys_nanosleep() is correct, but
completely non obvious. Document it and restrict it to the
-ERESTART_RESTARTBLOCK return value for clarity.

Reported-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

e1000: e1000_ethertool.c coding style fixes

Fixed many errors/warnings and checks in e1000_ethtool.c reported
by checkpatch.pl. Suggestions from Joe Perches and Alexander Duyck
applied as well

Signed-off-by: Krzysztof Majzerowicz-Jaszcz <cristos@vipserv.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge branch 'amd-xgbe-net'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver fixes 2014-09-05

The following series of patches includes fixes to the driver.

- Proper access to 64 bit management counter registers
- Enable all management counter registers to generate an interrupt when
the counter threshold is reached

This patch series is based on net.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Enable interrupts for all management counters

As the management counters reach a threshold they will generate an
interrupt so the value can be saved and the counter reset. The
current code does not enable this interrupt on all counters. This
can result in inaccurate statistics.

Update the code to enable all the counters to generate an interrupt
when its threshold is exceeded.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Treat certain counter registers as 64 bit

Even if the management counters are configured to be 32 bit register
values, the [rt]xoctetcount_gb and [rt]xoctetcount_g counters are
always 64 bit counter registers. Since they are not being treated as
64 bit values, these statistics are being reported incorrectly (ifconfig,
ethtool, etc.).

Update the routines used to read the registers to access the "hi"
register (an offset of 4 from the "lo" register) to create a 64 bit
value for these 64 bit counters.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4: only pull headers into skb head

Use the new fancy eth_get_headlen() to pull exactly the headers
into skb->head.

This speeds up GRE traffic (or more generally tunneled traffuc),
as GRO can aggregate up to 17 MSS per GRO packet instead of 8.

(Pulling too much data was forcing GRO to keep 2 frags per MSS)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: remove DSP_NEVER_DEFINED and adjust code identation

The DSP_NEVER_DEFINED #ifdef is confusing, it slips in an
extra } which is not required because the previous code is
indented incorrectly. Correct the identation and remove the
extraneous DSP_NEVER_DEFINED

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

greth: moved TX ring cleaning to NAPI rx poll func

This patch does not affect the 10/100 GRETH MAC.

Before all GBit GRETH TX descriptor ring cleaning was done in
start_xmit(), when descriptor list became full it activated
TX interrupt to start the NAPI rx poll function to do TX ring
cleaning.

With this patch the TX descriptor ring is always cleaned from
the NAPI rx poll function, triggered via TX or RX interrupt.
Otherwise we could end up in TX frames being sent but not
reported to the stack being sent. On the 10/100 GRETH this
is not an issue since the SKB is copied&aligned into private
buffers so that the SKB can be freed directly on start_xmit()

Signed-off-by: Daniel Hellstrom <daniel@gaisler.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: add slave netlink policy and put slave-related ops together

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cnic : Cleanup CONFIG_IPV6 & VLAN check

The cnic module needs to ensure that if ipv6 support is compiled as a module,
then the cnic module cannot be compiled as built-in as it depends on ipv6.
Made this check cleaner via Kconfig

Use simpler IS_ENABLED for CONFIG_VLAN_8021Q check

Signed-off-by: Anish Bhatt <anish@chelsio.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ahci_xgene: Fix the link down in first attempt for the APM X-Gene SoC AHCI SATA host controller driver.

Due to HW errata the APM X-Gene AHCI SATA host controller reports link
down even if the device presence is detected. This issue is due to speed
negotiation failure. This patch implements the algorithm to retry the
COMRESET if PxSTAT register reports device presence detected but
PHY communication not established. The maximum retry attempts are 3.

This patch also fixes the code to match the algorithm for the printing
a warning message if the disparity error still exists after link up.

Signed-off-by: Loc Ho <lho@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

ahci_xgene: Skip the PHY and clock initialization if already configured by the firmware.

This patch implements the feature to skip the PHY and clock
initialization if it is already configured by the firmware.

Signed-off-by: Loc Ho <lho@apm.com>
Signed-off-by: Suman Tripathi <stripathi@apm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'tcp'

Eric Dumazet says:

====================
tcp: deduplicate TCP_SKB_CB(skb)->when

TCP_SKB_CB(skb)->when has different meaning in output and input paths.

In output path, it contains a timestamp.
In input path, it contains an ISN, chosen by tcp_timewait_state_process()

Its usage in output path is obsolete after usec timestamping.
Lets simplify and clean this.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove TCP_SKB_CB(skb)->when

After commit 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution"),
we no longer need to maintain timestamps in two different fields.

TCP_SKB_CB(skb)->when can be removed, as same information sits in skb_mstamp.stamp_jiffies

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: introduce TCP_SKB_CB(skb)->tcp_tw_isn

TCP_SKB_CB(skb)->when has different meaning in output and input paths.

In output path, it contains a timestamp.
In input path, it contains an ISN, chosen by tcp_timewait_state_process()

Lets add a different name to ease code comprehension.

Note that 'when' field will disappear in following patch,
as skb_mstamp already contains timestamp, the anonymous
union will promptly disappear as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'eth_get_headlen'

Alexander Duyck says:

====================
net: Drop get_headlen functions in favor of generic function

This series replaces the igb_get_headlen and ixgbe_get_headlen functions
with a generic function named eth_get_headlen.

I have done some performance testing on ixgbe with 258 byte frames since
the calls are only used on frames larger than 256 bytes and have seen no
significant difference in CPU utilization.

v2: renamed __skb_get_poff to skb_get_poff
renamed ___skb_get_poff to __skb_get_poff
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ixgbe: use new eth_get_headlen interface

Update ixgbe to drop the ixgbe_get_headlen function in favor of eth_get_headlen.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

igb: use new eth_get_headlen interface

Update igb to drop the igb_get_headlen function in favor of eth_get_headlen.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add function for parsing the header length out of linear ethernet frames

This patch updates some of the flow_dissector api so that it can be used to
parse the length of ethernet buffers stored in fragments. Most of the
changes needed were to __skb_get_poff as it needed to be updated to support
sending a linear buffer instead of a skb.

I have split __skb_get_poff into two functions, the first is skb_get_poff
and it retains the functionality of the original __skb_get_poff. The other
function is __skb_get_poff which now works much like __skb_flow_dissect in
relation to skb_flow_dissect in that it provides the same functionality but
works with just a data buffer and hlen instead of needing an skb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'timestamping'

Alexander Duyck says:

====================
This change makes it so that the core path for the phy timestamping logic
is shared between skb_tx_tstamp and skb_complete_tx_timestamp.  In addition
it provides a means of using the same skb clone type path in non phy
timestamping drivers.

The main motivation for this is to enable non-phy drivers to be able to
manipulate tx timestamp skbs for such things as putting them in lists or
setting aside buffer in the context block.

v2: Incorporated suggested changes from Willem de Bruijn and Eric Dumazet
     dropped uneeded comment
     restored order of hwtstamp vs swtstamp
     added destructor for skb
    Dropped usage of skb_complete_tx_timestamp as a kfree_skb w/ destructor

v3: Updated destructor handling and dealt with socket reference counting issues

v4: Split out combining destructors into a separate patch
====================

net: merge cases where sock_efree and sock_edemux are the same function

Since sock_efree and sock_demux are essentially the same code for non-TCP
sockets and the case where CONFIG_INET is not defined we can combine the
code or replace the call to sock_edemux in several spots. As a result we
can avoid a bit of unnecessary code or code duplication.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: Make the clone operation stand-alone from phy timestamping

The phy timestamping takes a different path than the regular timestamping
does in that it will create a clone first so that the packets needing to be
timestamped can be placed in a queue, or the context block could be used.

In order to support these use cases I am pulling the core of the code out
so it can be used in other drivers beyond just phy devices.

In addition I have added a destructor named sock_efree which is meant to
provide a simple way for dropping the reference to skb exceptions that
aren't part of either the receive or send windows for the socket, and I
have removed some duplication in spots where this destructor could be used
in place of sock_edemux.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: Merge shared code between phy and regular timestamping

This change merges the shared bits that exist between skb_tx_tstamp and
skb_complete_tx_timestamp. By doing this we can avoid the two diverging as
there were already changes pushed into skb_tx_tstamp that hadn't made it
into the other function.

In addition this resolves issues with the fact that
skb_complete_tx_timestamp was included in linux/skbuff.h even though it was
only compiled in if phy timestamping was enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: harden fnhe_hashfun()

Lets make this hash function a bit secure, as ICMP attacks are still
in the wild.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: treewide: Fix typo found in DocBook/networking.xml

This patch fix spelling typo found in DocBook/networking.xml.
It is because the neworking.xml is generated from comments
in the source, I have to fix typo in comments within the source.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-timestamp: fix allocation error in test

A buffer is incorrectly zeroed to the length of the pointer. If
cfg_payload_len < sizeof(void *) this can overwrites unrelated memory.
The buffer contents are never read, so no need to zero.

Fixes: 8fe2f761cae9 ("net-timestamp: expand documentation")
Reported-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>