Jon Paul Maloy [Thu, 19 Nov 2015 19:30:45 +0000 (14:30 -0500)]
tipc: narrow down exposure of struct tipc_node
In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.
In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to new location, and made static whenever possible.
There are no functional changes in this commit.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 19 Nov 2015 19:30:44 +0000 (14:30 -0500)]
tipc: convert node lock to rwlock
According to the node FSM a node in state SELF_UP_PEER_UP cannot
change state inside a lock context, except when a TUNNEL_PROTOCOL
(SYNCH or FAILOVER) packet arrives. However, the node's individual
links may still change state.
Since each link now is protected by its own spinlock, we finally have
the conditions in place to convert the node spinlock to an rwlock_t.
If the node state and arriving packet type are rigth, we can let the
link directly receive the packet under protection of its own spinlock
and the node lock in read mode. In all other cases we use the node
lock in write mode. This enables full concurrent execution between
parallel links during steady-state traffic situations, i.e., 99+ %
of the time.
This commit implements this change.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 19 Nov 2015 19:30:43 +0000 (14:30 -0500)]
tipc: introduce per-link spinlock
As a preparation to allow parallel links to work more independently
from each other we introduce a per-link spinlock, to be stored in the
struct nodes's link entry area. Since the node lock still is a regular
spinlock there is no increase in parallellism at this stage.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 19 Nov 2015 19:30:42 +0000 (14:30 -0500)]
tipc: reduce code dependency between binding table and node layer
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subcscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 19 Nov 2015 19:30:41 +0000 (14:30 -0500)]
tipc: small cleanup of function tipc_node_check_state()
The function tipc_node_check_state() contains the core logics
for handling link synchronization and failover. For this reason,
it is important to keep it as comprehensible as possible.
In this commit, we make three small cleanups.
1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
packet confirms that the peer has lost contact, there will be no
further action in this function. To make this clearer, we return
from the function directly after the state change.
2) Since commit 0f8b8e28fb3241f9fd ("tipc: eliminate risk of stalled
link synchronization") only the logically first TUNNEL_PROTO/SYNCH
packet can alter the link state and set the synch point,
independently of arrival order. Hence, there is not any longer any
need to adjust the synch value in case such packets arrive in
disorder. We remove this adjustment.
3) It is the intention that any message arriving on any of the links
may trig a check for and possible termination of a node SYNCH state.
A redundant and unnoticed check for tipc_link_is_synching() obviously
beats this purpose, with the effect that only packets arriving on the
synching link may currently end the synch state. We remove this check.
This change will further shorten the synchronization period between
parallel links.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 19 Nov 2015 19:30:40 +0000 (14:30 -0500)]
tipc: move linearization of buffers to generic code
In commit 5cbb28a4bf65c7e4 ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily appliable to 'net' and 'stable'.
We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.
Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Nov 2015 17:14:53 +0000 (12:14 -0500)]
Merge branch 'bnx2x-stats'
Yuval Mintz says:
====================
bnx2x: Statistics patch series
This series contains 2 small statistics-related patches,
first adding a new SW statistics and the other exposing port stats
for multi-function devices.
Please consider applying this series to `net-next'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 19 Nov 2015 15:04:36 +0000 (17:04 +0200)]
bnx2x: Show port statistics in Multi-function
Today, port statistics are being presented when using `ethool -S' only
for single-function devices, but there are some port statistics which are
crucial for analyzing bottle-necks. E.g., HW Rx discards due to lack of
buffer space [when device isn't handling ingress traffic fast enough].
Judging the pros and cons, it was decided that in-order to better support
automatic dump-gathering tools, bnx2x should no longer hide those stats.
This leaves only VFs lacking the port statistics.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 19 Nov 2015 15:04:35 +0000 (17:04 +0200)]
bnx2x: Add new SW stat 'tx_exhaustion_events'
Driver already has an internal counter for number of times a given queue
had to be stopped due to Tx ring exhaustion.
This add the counter to the statistics presented by driver, e.g., by using
`ethtool -S'.
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Nov 2015 16:31:27 +0000 (11:31 -0500)]
Merge branch 'ppp-kill-zombie-state'
Guillaume Nault says:
====================
ppp: Remove PPPOX_ZOMBIE socket state
Several issues have been found lately wrt. the PPPOX_ZOMBIE socket
state. This state is now only set upon reception of a PADT to stop
further transmissions. However this is redundant with the PADT
workqueue mechanism introduced by 287f3a943fef ("pppoe: Use workqueue
to die properly when a PADT is received").
We can thus simplify pppox socket state handling by getting rid of
PPPOX_ZOMBIE entirely.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 19 Nov 2015 11:52:56 +0000 (12:52 +0100)]
ppp: don't set sk_state to PPPOX_ZOMBIE in pppoe_disc_rcv()
Since 287f3a943fef ("pppoe: Use workqueue to die properly when a PADT
is received"), pppoe_disc_rcv() disconnects the socket by scheduling
pppoe_unbind_sock_work(). This is enough to stop socket transmission
and makes the PPPOX_ZOMBIE state uncessary.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 19 Nov 2015 11:27:40 +0000 (12:27 +0100)]
mlxsw: spectrum: Add error paths to __mlxsw_sp_port_vlans_add
The operation of adding VLANs on a port via switchdev ops can fail and
we need to be prepared for it. If we do not rollback hardware operations
following a failure, hardware and software will remain in an
inconsistent state.
Solve that by adding suitable error paths to __mlxsw_sp_port_vlans_add.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 19 Nov 2015 11:27:39 +0000 (12:27 +0100)]
mlxsw: spectrum: Unify setting of HW VLAN filters
When adding or deleting VLANs from a bridged port, HW VLAN filters must be
set accordingly. Instead of having the same code in both add and delete
functions, just wrap it in a function and call it with the appropriate
parameters.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 19 Nov 2015 10:56:22 +0000 (11:56 +0100)]
bpf: add show_fdinfo handler for maps
Add a handler for show_fdinfo() to be used by the anon-inodes
backend for eBPF maps, and dump the map specification there. Not
only useful for admins, but also it provides a minimal way to
compare specs from ELF vs pinned object.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Nov 2015 21:17:43 +0000 (16:17 -0500)]
Merge branch 'net-generic-busy-polling'
Eric Dumazet says:
====================
net: extend busy polling support
This patch series extends busy polling range to tunnels devices,
and adds busy polling generic support to all NAPI drivers.
No need to provide ndo_busy_poll() method and extra synchronization
between ndo_busy_poll() and normal napi->poll() method.
This was proven very difficult and bug prone.
mlx5 driver is changed to support busy polling using this new method,
and a second mlx5 patch adds napi_complete_done() support and proper
SNMP accounting.
bnx2x and mlx4 drivers are converted to new infrastructure,
reducing kernel bloat and improving performance.
Latest patch, adding generic support, adds a new requirement :
-free_netdev() and netif_napi_del() must be called from process context.
Since this might not be the case in some drivers, we might have to
either : fix the non conformant drivers (by disabling busy polling on them)
or revert this last patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:31:03 +0000 (06:31 -0800)]
net: provide generic busy polling to all NAPI drivers
NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()
This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.
Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.
Once this patch proves to not bring serious regressions,
we will cleanup drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:31:02 +0000 (06:31 -0800)]
net: napi_hash_del() returns a boolean status
napi_hash_del() will soon be used from both drivers (if they want)
or core networking stack.
Callers are responsibles to ensure an RCU grace period is respected
before freeing napi structure : napi_hash_del() can signal if
this RCU grace period is needed or not.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:30:59 +0000 (06:30 -0800)]
net: move skb_mark_napi_id() into core networking stack
We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.
skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().
Few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to drivers
maintainers.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
lpk51:~# ./netperf -H 192.168.4.52 -t TCP_RR -l 10
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.4.52 () port 0 AF_INET : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
16384 87380 1 1 10.00 97530.92
16384 87380
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:30:54 +0000 (06:30 -0800)]
net: network drivers no longer need to implement ndo_busy_poll()
Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.
Busy polling gains are mainly coming from polling itself,
not on exact details on how we poll the device.
ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.
We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.
Before this is done, a driver still needs to call :
- skb_mark_napi_id() for each skb provided to the stack.
- napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- Make sure RCU grace period is respected after napi_hash_del() before
memory containing napi structure is freed.
Followup patch implements busy poll for mlx5 driver as an example.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:30:53 +0000 (06:30 -0800)]
net: allow BH servicing in sk_busy_loop()
Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.
This has many benefits.
1) allow tunneled traffic to use busy poll as well as native traffic.
Tunnels handlers usually call netif_rx() and depend on net_rx_action()
being run (from sofirq handler)
2) allow RFS/RPS being used (sending IPI to other cpus if needed)
3) use the 'lets burn cpu cycles' budget to do useful work
(like TX completions, timers, RCU callbacks...)
4) reduce BH latencies, making busy poll a better citizen.
Tested:
Tested with SIT tunnel
lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size Request Resp. Elapsed Trans.
Send Recv Size Size Time Rate
bytes Bytes bytes bytes secs. per sec
16384 87380 1 1 10.00 58314.77
16384 87380
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 18 Nov 2015 14:30:50 +0000 (06:30 -0800)]
net: better skb->sender_cpu and skb->napi_id cohabitation
skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.
We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()
As suggested by Alexei, we could split the space so that
these errors can not happen.
0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.
This will allow proper busy polling support over tunnels.
Signed-off-by: Eric Dumazet <edumazet@google.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Wed, 18 Nov 2015 13:06:34 +0000 (14:06 +0100)]
be2net: remove local variable 'status'
The lancer_cmd_get_file_len() uses lancer_cmd_read_object() to get
the current size of registers for ethtool registers dump. Returned status
value is stored but not checked. The check itself is not necessary as
the data_read output variable is initialized to 0 and status variable
can be removed.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Wed, 18 Nov 2015 02:08:00 +0000 (10:08 +0800)]
net: hisilicon: fix binding document of mdio
This patch explains the occasion of "hisilcon,mdio" and
"hisilicon,hns-mdio" according to Arnd's comments.
and reformat it according to comments from Rob<robh@kernel.org>.
Signed-off-by: huangdaode <huangdaode@hisilicon.com> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 18 Nov 2015 16:59:29 +0000 (08:59 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Assorted bug fixes, the mlock2 system call gets added, and one
improvement. The boot from dasd devices is now possible from a wider
range of devices"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: remove SALIPL loader
s390: wire up mlock2 system call
s390: remove g5 elf platform support
s390: avoid cache aliasing under z/VM and KVM
s390/sclp: _sclp_wait_int(): retain full PSW mask
s390/zcrypt: Fix initialisation when zcrypt is built-in
s390/zcrypt: Fix kernel crash on systems without AP bus support
s390: add support for ipl devices in subchannel sets > 0
s390/ipl: fix out of bounds access in scpdata_write
s390/pci_dma: improve debugging of errors during dma map
s390/pci_dma: handle dma table failures
s390/pci_dma: unify label of invalid translation table entries
s390/syscalls: remove system call number calculation
s390/cio: simplify css_generate_pgid
s390/diag: add a s390 prefix to the diagnose trace point
s390/head: fix error message on unsupported hardware
Linus Torvalds [Wed, 18 Nov 2015 16:43:29 +0000 (08:43 -0800)]
Merge tag 'hwmon-for-linus-v4.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
"Fix build issues in scpi and ina2xx drivers, update scpi driver to
support recent firmware, and fix an uninitialized variable warning in
applesmc driver"
1) Fix list tests in netfilter ingress support, from Florian Westphal.
2) Fix reversal of input and output interfaces in ingress hook
invocation, from Pablo Neira Ayuso.
3) We have a use after free in r8169, caught by Dave Jones, fixed by
Francois Romieu.
4) Splice use-after-free fix in AF_UNIX frmo Hannes Frederic Sowa.
5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
a) Don't create clone routes not managed by the fib6 tree
b) Don't forget to check expiration of DST_NOCACHE routes.
c) Handle rt->dst.from == NULL properly.
6) Several AF_PACKET fixes wrt transport header setting and SKB
protocol setting, from Daniel Borkmann.
7) Fix thunder driver crash on shutdown, from Pavel Fedin.
8) Several Mellanox driver fixes (max MTU calculations, use of correct
DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.
9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
certain registers, etc.) from Neil Armstrong.
10) Make sure to disable preemption while updating per-cpu stats of ip
tunnels, from Jason A. Donenfeld.
11) Various ARM64 bpf JIT fixes, from Yang Shi.
12) Flush icache properly in ARM JITs, from Daniel Borkmann.
13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
Nagai.
14) Fix netdev feature propagation for devices not implementing
->ndo_set_features(). From Nikolay Aleksandrov.
15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.
16) RAW socket code increments incorrect SNMP counters, fix from Ben
Cartwright-Cox.
17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.
18) Fix handling of VLAN headers on stacked devices when REORDER is
disabled. From Vlad Yasevich.
19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
MAINTAINERS: Update Mellanox's Eth NIC driver entries
net/core: revert "net: fix __netdev_update_features return.." and add comment
af_unix: take receive queue lock while appending new skb
rtnetlink: fix frame size warning in rtnl_fill_ifinfo
net: use skb_clone to avoid alloc_pages failure.
packet: Use PAGE_ALIGNED macro
packet: Don't check frames_per_block against negative values
net: phy: Use interrupts when available in NOLINK state
phy: marvell: Add support for 88E1540 PHY
arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
macvlan: fix leak in macvlan_handle_frame
ipvlan: fix use after free of skb
ipvlan: fix leak in ipvlan_rcv_frame
vlan: Do not put vlan headers back on bridge and macvlan ports
vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
via-velocity: unconditionally drop frames with bad l2 length
ipg: Remove ipg driver
dl2k: Add support for IP1000A-based cards
snmp: Remove duplicate OUTMCAST stat increment
net: thunder: Check for driver data in nicvf_remove()
...
net/core: revert "net: fix __netdev_update_features return.." and add comment
This reverts commit 00ee59271777 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix: take receive queue lock while appending new skb
While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.
For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.
Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support") Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rtnetlink: fix frame size warning in rtnl_fill_ifinfo
Fix the following warning:
CC net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
}
^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.
Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Zhang [Tue, 17 Nov 2015 12:49:30 +0000 (20:49 +0800)]
net: use skb_clone to avoid alloc_pages failure.
1. new skb only need dst and ip address(v4 or v6).
2. skb_copy may need high order pages, which is very rare on long running server.
Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com> Signed-off-by: Martin Zhang <martinbj2008@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Mon, 16 Nov 2015 22:36:46 +0000 (23:36 +0100)]
net: phy: Use interrupts when available in NOLINK state
The NOLINK state will poll the phy once a second to see if the link
has come up. If the phy has an interrupt line, this polling can be
skipped, since the phy should interrupt when the link returns.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The 88E1540 can be found embedded in the Marvell 88E6352 switch. It
is compatible with the 88E1510, so add support for it, using the 88E1510 specific functions.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 16 Nov 2015 22:35:35 +0000 (14:35 -0800)]
arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
Save and restore FP/LR in BPF prog prologue and epilogue, save SP to FP
in prologue in order to get the correct stack backtrace.
However, ARM64 JIT used FP (x29) as eBPF fp register, FP is subjected to
change during function call so it may cause the BPF prog stack base address
change too.
Use x25 to replace FP as BPF stack base register (fp). Since x25 is callee
saved register, so it will keep intact during function call.
It is initialized in BPF prog prologue when BPF prog is started to run
everytime. Save and restore x25/x26 in BPF prologue and epilogue to keep
them intact for the outside of BPF. Actually, x26 is unnecessary, but SP
requires 16 bytes alignment.
CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 16 Nov 2015 21:54:20 +0000 (22:54 +0100)]
macvlan: fix leak in macvlan_handle_frame
Reset pskb in macvlan_handle_frame in case skb_share_check returned a
clone.
Fixes: 8a4eb5734e8d ("net: introduce rx_handler results and logic around that") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 16 Nov 2015 21:44:53 +0000 (22:44 +0100)]
ipvlan: fix use after free of skb
ipvlan_handle_frame is a rx_handler, and when it returns a value other
than RX_HANDLER_CONSUMED (here, NET_RX_DROP aka RX_HANDLER_ANOTHER),
__netif_receive_skb_core expects that the skb still exists and will
process it further, but we just freed it.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Mon, 16 Nov 2015 21:34:26 +0000 (22:34 +0100)]
ipvlan: fix leak in ipvlan_rcv_frame
Pass a **skb to ipvlan_rcv_frame so that if skb_share_check returns a
new skb, we actually use it during further processing.
It's safe to ignore the new skb in the ipvlan_xmit_* functions, because
they call ipvlan_rcv_frame with local == true, so that dev_forward_skb
is called and always takes ownership of the skb.
Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Nov 2015 19:38:36 +0000 (14:38 -0500)]
Merge branch 'vlan-reorder'
Vladislav Yasevich says:
====================
Fix issues with vlans without REORDER_HEADER
A while ago Phil Sutter brought up an issue with vlans without
REORDER_HEADER and bridges. The problem was that if a vlan
without REORDER_HEADER was a port in the bridge, the bridge ended
up forwarding corrupted packets that still contained the vlan header.
The same issue exists for bridge mode macvlan/macvtap devices.
An additional issue with vlans without REORDER_HEADER is that stacking
them also doesn't work. The reason here is that skb_reorder_vlan_header()
function assumes that it on ETH_HLEN bytes deep into the packet. That
is not the case, when you a vlan without REORRDER_HEADER flag set.
This series attempts to correct these 2 issues.
1) To solve the stacked vlans problem, the patch simply use
skb->mac_len as an offset to start copying mac addresses that
is part of header reordering.
2) To fix the issue with bridge/macvlan/macvtap, the second patch
simply doesn't write the vlan header back to the packet if the
vlan device is either a bridge or a macvlan port. This ends up
being the simplest and least performance intrussive solution.
I've considered extending patch 2 to all stacked devices (essentially
checked for the presense of rx_handler), but that feels like a broader
restriction and _may_ break existing uses.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Mon, 16 Nov 2015 20:43:45 +0000 (15:43 -0500)]
vlan: Do not put vlan headers back on bridge and macvlan ports
When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This posses a problem for bridge and macvlan ports. The packets
passed to those device may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.
With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Yasevich [Mon, 16 Nov 2015 20:43:44 +0000 (15:43 -0500)]
vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presense of the header. Then, the subsequent
untag operation, for the next level vlan, always use VLAN_ETH_HLEN
to locate the begining of the ethernet header and that ends up
being a multiple of 4 bytes short of the actuall beginning
of the mac header (the multiple depending on the how many vlan
encapsulations ethere are).
As a reslult, if there are multiple levles of vlan devices
with REODER_HEADER being off, the recevied packets end up
being dropped.
To solve this, we use skb->mac_len as the offset. The value
is always set on receive path and starts out as a ETH_HLEN.
The value is also updated when the vlan header manupations occur
so we know it will be correct.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Timo Teräs [Mon, 16 Nov 2015 12:36:32 +0000 (14:36 +0200)]
via-velocity: unconditionally drop frames with bad l2 length
By default the driver allowed incorrect frames to be received. What is
worse the code does not handle very short frames correctly. The FCS
length is unconditionally subtracted, and the underflow can cause
skb_put to be called with large number after implicit cast to unsigned.
And indeed, an skb_over_panic() was observed with via-velocity.
This removes the module parameter as it does not work in it's
current state, and should be implemented via NETIF_F_RXALL if needed.
Suggested-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 17 Nov 2015 18:11:08 +0000 (10:11 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A fs-cache regression fix, and adding a warning about obnoxiou^W
moderation of list given in MAINTAINERS"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
MAINTAINERS: linux-cachefs@redhat.com is moderated for non-subscribers
FS-Cache: Add missing initialization of ret in cachefiles_write_page()
FS-Cache: Add missing initialization of ret in cachefiles_write_page()
fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
this function
If the jump to label "error" is taken, "ret" will indeed be
uninitialized, and random stack data may be printed by the debug code.
Fixes: 102f4d900c9c8f5e ("FS-Cache: Handle a write to the page immediately beyond the EOF marker") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Ondrej Zary [Sun, 15 Nov 2015 21:36:11 +0000 (22:36 +0100)]
dl2k: Add support for IP1000A-based cards
Add support for IP1000A chips to dl2k driver.
IP1000A chip looks like a TC9020 with integrated PHY.
This allows IP1000A chips to work reliably because the ipg driver is
buggy - it loses packets under load and then completely stops
transmitting data.
Tested with Asus NX1101 v2.0 at 10, 100 and 1000Mbps:
vendor=0x13f0 device=0x1023 (rev 0x41)
subsystem vendor=0x1043 device=0x8180
MAC address registers access needed to be changed from 8-bit to 16-bit
because 8-bit does not work on IP1000A. 8-bit access is not even
allowed in the TC9020 datasheet (although it worked). 16-bit access
works on both.
Tested that it does not break D-Link DGE-550T (DL-2000 chip, probably
a rebranded TC9020):
vendor=0x1186 device=0x4000 (rev 0x0c)
subsystem vendor=0x1186 device=0x4000
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 16 Nov 2015 18:09:10 +0000 (13:09 -0500)]
snmp: Remove duplicate OUTMCAST stat increment
the OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path. Remove the mcast bump, as its
not needed
Validated by the reporter, with good results
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Claus Jensen <claus.jensen@microsemi.com> CC: Claus Jensen <claus.jensen@microsemi.com> CC: David Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Fedin [Mon, 16 Nov 2015 14:51:34 +0000 (17:51 +0300)]
net: thunder: Check for driver data in nicvf_remove()
In some cases the crash is caused by nicvf_remove() being called from
outside. For example, if we try to feed the device to vfio after the
probe has failed for some reason. So, move the check to better place.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The ":" is supposed to be preceded by a driver name, but in this
case it is an empty string since the device has no parent.
There are many types of network devices without a parent. The
anonymous warnings for these devices can be hard to debug. Log
the network device name instead in these cases to assist further
debugging.
This is mostly similar to how __netdev_printk() handles orphan
devices.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix: don't append consumed skbs to sk_receive_queue
In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.
This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.
Fixes: 869e7c62486e ("net: af_unix: implement stream sendpage support") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dragos Tatulea [Mon, 16 Nov 2015 09:52:48 +0000 (10:52 +0100)]
net: switchdev: fix return code of fdb_dump stub
rtnl_fdb_dump always expects an index to be returned by the ndo_fdb_dump op,
but when CONFIG_NET_SWITCHDEV is off, it returns an error.
Fix that by returning the given unmodified idx.
A similar fix was 0890cf6cb6ab ("switchdev: fix return value of
switchdev_port_fdb_dump in case of error") but for the CONFIG_NET_SWITCHDEV=y
case.
Fixes: 45d4122ca7cd ("switchdev: add support for fdb add/del/dump via switchdev_port_obj ops.") Signed-off-by: Dragos Tatulea <dragos@endocode.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Sun, 15 Nov 2015 13:02:16 +0000 (15:02 +0200)]
bnx2x: Fix VLANs null-pointer for 57710, 57711
Commit 05cc5a39ddb7 "bnx2x: add vlan filtering offload" introduced
a regression in regard for vlans for 57710, 57711 adapters -
Loading 8021q module on a machine with such an adapter would cause
a null pointer dereference, as the driver mistakenly publishes it
has capabilities for vlan CTAG filtering.
Reported-by: Otto Sabart <osabart@redhat.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Masaru Nagai [Sun, 15 Nov 2015 12:34:42 +0000 (21:34 +0900)]
ravb: remove unhandle int cause
This driver does not handle the AVB-DMAC Receive FIFO Warning interrupt
now, so the interrupt should not be enabled.
Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
raw: increment correct SNMP counters for ICMP messages
Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header. This is fixed by adding the IP Header Length to
the casting into a icmphdr struct.
Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: fix __netdev_update_features return on ndo_set_features failure
If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.
Fixes: 6cb6a27c45ce ("net: Call netdev_features_change() from netdev_update_features()") CC: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: fix feature changes on devices without ndo_set_features
When __netdev_update_features() was updated to ensure some features are
disabled on new lower devices, an error was introduced for devices which
don't have the ndo_set_features() method set. Before we'll just set the
new features, but now we return an error and don't set them. Fix this by
returning the old behaviour and setting err to 0 when ndo_set_features
is not present.
Fixes: e7868a85e1b2 ("net/core: ensure features get disabled on new lower devs") CC: Jarod Wilson <jarod@redhat.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Ido Schimmel <idosch@mellanox.com> CC: Sander Eikelenboom <linux@eikelenboom.it> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com> Reviewed-by: Jarod Wilson <jarod@redhat.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Dave Young <dyoung@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Fri, 13 Nov 2015 10:36:57 +0000 (11:36 +0100)]
be2net: remove unused local rsstable array
Remove rsstable array and its initialization from be_set_rss_hash_opts().
The array became unused after "e255787 be2net: Support for configurable
RSS hash key". The initial RSS table is now filled and stored for later
usage during Rx queue creation.
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Sathya Perla <sathya.perla@avagotech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Masaru Nagai [Fri, 13 Nov 2015 10:24:49 +0000 (19:24 +0900)]
ravb: Fix int mask value overwritten issue
When RX/TX interrupt for Network Control queue and Best Effort queue
is issued at the same time, the interrupt mask of Network Control
queue will be reset when the mask of Best Effort queue is set.
This patch fixes this problem.
Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com> Signed-off-by: Yoshihiro Kaneko <ykaneko0929@gmail.com> Acked-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Fedin [Fri, 13 Nov 2015 06:46:59 +0000 (09:46 +0300)]
net: smsc911x: Reset PHY during initialization
On certain hardware after software reboot the chip may get stuck and fail
to reinitialize during reset. This can be fixed by ensuring that PHY is
reset too.
Old PHY resetting method required operational MDIO interface, therefore
the chip should have been already set up. In order to be able to function
during probe, it is changed to use PMT_CTRL register.
The problem could be observed on SMDK5410 board.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 14 Nov 2015 00:16:18 +0000 (01:16 +0100)]
bpf, arm64: start flushing icache range from header
While recently going over ARM64's BPF code, I noticed that the icache
range we're flushing should start at header already and not at ctx.image.
Reason is that after b569c1c622c5 ("net: bpf: arm64: address randomize
and write protect JIT code"), we also want to make sure to flush the
random-sized trap in front of the start of the actual program (analogous
to x86). No operational differences from user side.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Zi Shen Lim <zlim.lnx@gmail.com> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Sat, 14 Nov 2015 00:26:53 +0000 (01:26 +0100)]
bpf, arm: start flushing icache range from header
During review I noticed that the icache range we're flushing should
start at header already and not at ctx.image.
Reason is that after 55309dd3d4cd ("net: bpf: arm: address randomize
and write protect JIT code"), we also want to make sure to flush the
random-sized trap in front of the start of the actual program (analogous
to x86). No operational differences from user side.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Nicolas Schichan <nschichan@freebox.fr> Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Thu, 12 Nov 2015 22:07:46 +0000 (14:07 -0800)]
bpf: samples: exclude asm/sysreg.h for arm64
commit 338d4f49d6f7114a017d294ccf7374df4f998edc
("arm64: kernel: Add support for Privileged Access Never") includes sysreg.h
into futex.h and uaccess.h. But, the inline assembly used by asm/sysreg.h is
incompatible with llvm so it will cause BPF samples build failure for ARM64.
Since sysreg.h is useless for BPF samples, just exclude it from Makefile via
defining __ASM_SYSREG_H.
Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Thu, 12 Nov 2015 21:57:00 +0000 (13:57 -0800)]
arm64: bpf: fix JIT frame pointer setup
BPF fp should point to the top of the BPF prog stack. The original
implementation made it point to the bottom incorrectly.
Move A64_SP to fp before reserve BPF prog stack space.
CC: Zi Shen Lim <zlim.lnx@gmail.com> CC: Xi Wang <xi.wang@gmail.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Måns Rullgård [Thu, 12 Nov 2015 18:41:12 +0000 (18:41 +0000)]
net: phy: vitesse: add support for VSC8601
This adds support for the Vitesse VSC8601 PHY. Generic functions are
used for everything except interrupt handling.
Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Måns Rullgård [Thu, 12 Nov 2015 17:40:20 +0000 (17:40 +0000)]
net: phy: at803x: support interrupt on 8030 and 8035
Commit 77a993942 "phy/at8031: enable at8031 to work on interrupt mode"
added interrupt support for the 8031 PHY but left out the other two
chips supported by this driver.
This patch sets the .ack_interrupt and .config_intr functions for the
8030 and 8035 drivers as well.
Signed-off-by: Mans Rullgard <mans@mansr.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ip_tunnel: disable preemption when updating per-cpu tstats
Drivers like vxlan use the recently introduced
udp_tunnel_xmit_skb/udp_tunnel6_xmit_skb APIs. udp_tunnel6_xmit_skb
makes use of ip6tunnel_xmit, and ip6tunnel_xmit, after sending the
packet, updates the struct stats using the usual
u64_stats_update_begin/end calls on this_cpu_ptr(dev->tstats).
udp_tunnel_xmit_skb makes use of iptunnel_xmit, which doesn't touch
tstats, so drivers like vxlan, immediately after, call
iptunnel_xmit_stats, which does the same thing - calls
u64_stats_update_begin/end on this_cpu_ptr(dev->tstats).
While vxlan is probably fine (I don't know?), calling a similar function
from, say, an unbound workqueue, on a fully preemptable kernel causes
real issues:
The solution would be to protect the whole
this_cpu_ptr(dev->tstats)/u64_stats_update_begin/end blocks with
disabling preemption and then reenabling it.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sudeep Holla [Wed, 28 Oct 2015 17:17:31 +0000 (17:17 +0000)]
hwmon: (scpi) skip unsupported sensors properly
Currently it's assumed that firmware exports only the class of sensors
supported by the driver. However with newer firmware or SCPI protocol
revision, support for newer classes of sensors can be present.
The driver fails to probe with the following warning if an unsupported
class of sensor is encountered in the firmware.
sysfs: cannot create duplicate filename
'/devices/platform/scpi/scpi:sensors/hwmon/hwmon0/'
------------[ cut here ]------------
WARNING: at fs/sysfs/dir.c:31
Modules linked in:
CPU: 0 PID: 6 Comm: kworker/u12:0 Not tainted 4.3.0-rc7 #137
Hardware name: ARM Juno development board (r0) (DT)
Workqueue: deferwq deferred_probe_work_func
PC is at sysfs_warn_dup+0x54/0x78
LR is at sysfs_warn_dup+0x54/0x78
This patch fixes the above issue by skipping through the unsupported
class of SCPI sensors.
Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors") Fixes: ea98b29a05e9 ("hwmon: Support sensors exported via ARM SCP interface") Cc: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Arnd Bergmann [Mon, 16 Nov 2015 16:56:39 +0000 (17:56 +0100)]
hwmon: (scpi) add thermal-of dependency
The newly added scpi thermal support is broken when the scpi driver
is built-in but the thermal driver is a loadable module:
drivers/built-in.o: In function `scpi_hwmon_probe':
(.text+0x444d70): undefined reference to `thermal_zone_of_sensor_unregister'
(.text+0x444d94): undefined reference to `thermal_zone_of_sensor_register'
drivers/built-in.o: In function `scpi_hwmon_remove':
(text+0x444e6c): undefined reference to `thermal_zone_of_sensor_unregister'
This uses the same Kconfig trick that we have in a couple of other
drivers already to ensure we can only select the driver in valid
configurations when either THERMAL_OF is disabled, or when with a
dependency on CONFIG_THERMAL that can force SCPI to be a loadable
module in the case I was hitting.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 68acc77a2d51 ("hwmon: Support thermal zones registration for SCP temperature sensors") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Heiko Carstens [Fri, 13 Nov 2015 13:17:14 +0000 (14:17 +0100)]
s390: remove SALIPL loader
There is no known user, therefore remove the code.
Acked-by: Rob Van Der Heij <robvdheij@nl.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
commit 1f6b83e5e4d3 ("s390: avoid z13 cache aliasing") checks for the
machine type to optimize address space randomization and zero page
allocation to avoid cache aliases.
This check might fail under a hypervisor with migration support.
z/VMs "Single System Image and Live Guest Relocation" facility will
"fake" the machine type of the oldest system in the group. For example
in a group of zEC12 and Z13 the guest appears to run on a zEC12
(architecture fencing within the relocation domain)
Remove the machine type detection and always use cache aliasing
rules that are known to work for all machines. These are the z13
aliasing rules.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix the following "maybe used uninitialized" warnings by
initializing the variables to keep the compiler quiet.
There is no "used uninitialized" in this case.
CC [M] drivers/hwmon/applesmc.o
drivers/hwmon/applesmc.c: In function ‘applesmc_init_smcreg’:
drivers/hwmon/applesmc.c:595:43: warning: ‘right_light_sensor’
may be used uninitialized in this function [-Wmaybe-uninitialized]
s->num_light_sensors = left_light_sensor + right_light_sensor;
^
drivers/hwmon/applesmc.c:540:26: note: ‘right_light_sensor’ was
declared here
bool left_light_sensor, right_light_sensor;
^
drivers/hwmon/applesmc.c:595:43: warning: ‘left_light_sensor’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
s->num_light_sensors = left_light_sensor + right_light_sensor;
^
drivers/hwmon/applesmc.c:540:7: note: ‘left_light_sensor’ was
declared here
bool left_light_sensor, right_light_sensor;
^
Li Yang [Thu, 5 Nov 2015 20:18:17 +0000 (14:18 -0600)]
hwmon: (ina2xx) Fix build issue by selecting REGMAP_I2C
Since a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap")
the driver requires REGMAP_I2C to build. Select it by default
in Kconfig.
Reported-by: Guo Chunrong <B40290@freescale.com> Cc: Marc Titinger <mtitinger@baylibre.com> Signed-off-by: Li Yang <leoli@freescale.com> Fixes: a0de56c81fcf ("hwmon: (ina2xx) convert driver to using regmap") Signed-off-by: Guenter Roeck <linux@roeck-us.net>
David S. Miller [Mon, 16 Nov 2015 01:16:26 +0000 (20:16 -0500)]
Merge branch 'mv88e6060-fixes'
Neil Armstrong says:
====================
net: dsa: mv88e6060: cleanup and fix setup
This patchset introduces some fixes and a registers addressing cleanup for
the mv88e6060 DSA driver.
The first patch removes the poll_link as mv88e6xxx.
The 3 following patches fixes the setup in regards of the datasheet.
The 2 last patches introduces a clean header and replaces all magic values.
To align with the mv88e6xxx code, add a similar header file
with all the register defines.
The file is based on the mv88e6xxx header for coherency.
Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Armstrong [Tue, 10 Nov 2015 15:51:32 +0000 (16:51 +0100)]
net: dsa: mv88e6060: use the correct bit shift for mac0
According to the mv88e6060 datasheet, the first mac byte must
be at position 9 instead of 8 since the bit 8 is used to select
if the mac address must differ for each port for Pause frames.
Use the correct shift and set the same mac address for all port.
Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>