The legacy sysfs interface to modify interfaces belonging to batman-adv
is run inside a region holding s_lock. And to add a net_device, it has
to also get the rtnl_lock. This is exactly the other way around than in
other virtual net_devices and conflicts with netdevice notifier which
executes inside rtnl_lock.
The inverted lock situation is currently solved by executing the removal
of netdevices via workqueue. The workqueue isn't executed inside
rtnl_lock and thus can independently get the s_lock and the rtnl_lock.
But this workaround fails when the netdevice notifier creates events in
quick succession and the earlier triggered removal of a net_device isn't
processed in the workqueue before the adding of the new netdevice (with
same name) event is issued.
Instead the legacy sysfs interface store events have to be enqueued in
a workqueue to loose the s_lock. The worker is then free to get the
required locks and the deadlock is avoided.
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Fri, 10 Jun 2016 21:00:56 +0000 (23:00 +0200)]
batman-adv: Use rtnl link in device creation example
The standard kernel API to add new virtual interfaces and attach other
interfaces to it is rtnl-link. batman-adv supports it since v3.10. This
functionality should be used instead of the legacy batman-adv-only sysfs
interface.
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Fri, 10 Jun 2016 21:00:55 +0000 (23:00 +0200)]
batman-adv: Define module rtnl link name
The batman-adv module can automatically be loaded when operations over the
rtnl link are triggered. This requires only the correct rtnl link name in
the module header.
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Sven Eckelmann [Tue, 7 Jun 2016 20:44:53 +0000 (22:44 +0200)]
batman-adv: Document optional batadv_algo_ops
Some operations in batadv_algo_ops are optional and marked as such in the
kerneldoc. But some of them miss the "(optional)" in their kerneldoc. These
have to also be marked to give an implementor of an algorithm the correct
background information without looking in the code calling these function
pointers.
Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Xin Long [Sun, 7 Aug 2016 06:15:13 +0000 (14:15 +0800)]
sctp: use event->chunk when it's valid
Commit 52253db924d1 ("sctp: also point GSO head_skb to the sk when
it's available") used event->chunk->head_skb to get the head_skb in
sctp_ulpevent_set_owner().
But at that moment, the event->chunk was NULL, as it cloned the skb
in sctp_ulpevent_make_rcvmsg(). Therefore, that patch didn't really
work.
This patch is to move the event->chunk initialization before calling
sctp_ulpevent_receive_data() so that it uses event->chunk when it's
valid.
Fixes: 52253db924d1 ("sctp: also point GSO head_skb to the sk when it's available") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Sat, 6 Aug 2016 00:45:37 +0000 (17:45 -0700)]
net: vxlan: lwt: Fix vxlan local traffic.
vxlan driver has bypass for local vxlan traffic, but that
depends on information about all VNIs on local system in
vxlan driver. This is not available in case of LWT.
Therefore following patch disable encap bypass for LWT
vxlan traffic.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Reported-by: Jakub Libosvar <jlibosva@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Sat, 6 Aug 2016 00:45:36 +0000 (17:45 -0700)]
net: vxlan: lwt: Use source ip address during route lookup.
LWT user can specify destination as well as source ip address
for given tunnel endpoint. But vxlan is ignoring given source
ip address. Following patch uses both ip address to route the
tunnel packet. This consistent with other LWT implementations,
like GENEVE and GRE.
Fixes: ee122c79d42 ("vxlan: Flow based tunneling"). Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 4 Aug 2016 22:11:13 +0000 (00:11 +0200)]
bpf: fix checksum for vlan push/pop helper
When having skbs on ingress with CHECKSUM_COMPLETE, tc BPF programs don't
push rcsum of mac header back in and after BPF run back pull out again as
opposed to some other subsystems (ovs, for example).
For cases like q-in-q, meaning when a vlan tag for offloading is already
present and we're about to push another one, then skb_vlan_push() pushes the
inner one into the skb, increasing mac header and skb_postpush_rcsum()'ing
the 4 bytes vlan header diff. Likewise, for the reverse operation in
skb_vlan_pop() for the case where vlan header needs to be pulled out of the
skb, we're decreasing the mac header and skb_postpull_rcsum()'ing the 4 bytes
rcsum of the vlan header that was removed.
However mangling the rcsum here will lead to hw csum failure for BPF case,
since we're pulling or pushing data that was not part of the current rcsum.
Changing tc BPF programs in general to push/pull rcsum around BPF_PROG_RUN()
is also not really an option since current behaviour is ABI by now, but apart
from that would also mean to do quite a bit of useless work in the sense that
usually 12 bytes need to be rcsum pushed/pulled also when we don't need to
touch this vlan related corner case. One way to fix it would be to push the
necessary rcsum fixup down into vlan helpers that are (mostly) slow-path
anyway.
Fixes: 4e10df9a60d9 ("bpf: introduce bpf_skb_vlan_push/pop() helpers") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 4 Aug 2016 22:11:12 +0000 (00:11 +0200)]
bpf: fix checksum fixups on bpf_skb_store_bytes
bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
Where we ran into an issue with bpf_skb_store_bytes() is when we did a
single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
underlying issue has been tracked down to a buffer alignment issue.
Meaning, that csum_partial() computations via skb_postpull_rcsum() and
skb_postpush_rcsum() pair invoked had a wrong result since they operated on
an odd address for the hoplimit, while other computations were done on an
even address. This mix doesn't work as-is with skb_postpull_rcsum(),
skb_postpush_rcsum() pair as it always expects at least half-word alignment
of input buffers, which is normally the case. Thus, instead of these helpers
using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
csum_block_add(), respectively. For unaligned offsets, they rotate the sum
to align it to a half-word boundary again, otherwise they work the same as
csum_sub() and csum_add().
Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
use a 0 constant for offset so that the compiler optimizes the offset & 1
test away and generates the same code as with csum_sub()/_add().
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 4 Aug 2016 22:11:11 +0000 (00:11 +0200)]
bpf: also call skb_postpush_rcsum on xmit occasions
Follow-up to commit f8ffad69c9f8 ("bpf: add skb_postpush_rcsum and fix
dev_forward_skb occasions") to fix an issue for dev_queue_xmit() redirect
locations which need CHECKSUM_COMPLETE fixups on ingress.
For the same reasons as described in f8ffad69c9f8 already, we of course
also need this here, since dev_queue_xmit() on a veth device will let us
end up in the dev_forward_skb() helper again to cross namespaces.
Latter then calls into skb_postpull_rcsum() to pull out L2 header, so
that netif_rx_internal() sees CHECKSUM_COMPLETE as it is expected. That
is, CHECKSUM_COMPLETE on ingress covering L2 _payload_, not L2 headers.
Also here we have to address bpf_redirect() and bpf_clone_redirect().
Fixes: 3896d655f4d4 ("bpf: introduce bpf_clone_redirect() helper") Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
...leading to the following warning for !DEBUG builds:
drivers/net/ethernet/tundra/tsi108_eth.c:169:13: warning: 'dump_eth_one' defined but not used [-Wunused-function]
static void dump_eth_one(struct net_device *dev)
^
...when using the arch/powerpc/configs/mpc7448_hpc2_defconfig
Put the function definition under the same #ifdef as the call site
to avoid the warning.
Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 4 Aug 2016 14:36:22 +0000 (17:36 +0300)]
mlxsw: spectrum: Add missing DCB rollback in error path
We correctly execute mlxsw_sp_port_dcb_fini() when port is removed, but
I missed its rollback in the error path of port creation, so add it.
Fixes: f00817df2b42 ("mlxsw: spectrum: Introduce support for Data Center Bridging (DCB)") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 4 Aug 2016 14:36:21 +0000 (17:36 +0300)]
mlxsw: spectrum: Do not override PAUSE settings
The PFCC register is used to configure both PAUSE and PFC frames.
Therefore, when PFC frames are disabled we must make sure we don't
mistakenly also disable PAUSE frames (which might be enabled).
Fix this by packing the PFCC register with the current PAUSE settings.
Note that this register is also accessed via ethtool ops, but there we
are guaranteed to have PFC disabled.
Fixes: d81a6bdb87ce ("mlxsw: spectrum: Add IEEE 802.1Qbb PFC support") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 1aa661f5c3df1 ("rhashtable-test: Measure time to insert, remove & traverse entries") Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 4 Aug 2016 10:11:57 +0000 (12:11 +0200)]
sctp_diag: Respect ss adding TCPF_CLOSE to idiag_states
Since 'ss' always adds TCPF_CLOSE to idiag_states flags, sctp_diag can't
rely upon TCPF_LISTEN flag solely being present when listening sockets
are requested.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Sutter [Thu, 4 Aug 2016 10:11:56 +0000 (12:11 +0200)]
sctp_diag: Fix T3_rtx timer export
The asoc's timer value is not kept in asoc->timeouts array but in it's
primary transport instead.
Furthermore, we must export the timer only if it is pending, otherwise
the value will underrun when stored in an unsigned variable and
user space will only see a very large timeout value.
Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 7 Aug 2016 00:52:00 +0000 (20:52 -0400)]
Merge tag 'mac80211-for-davem-2016-08-05' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
First set of fixes for the current cycle:
* fix 80+80 bandwidth warning
* fix powersave with mac80211 TXQ implementation
* use correct way to free SKBs from multicast buffering
* mesh: fix operation ordering to work with all drivers
* mesh: end service period even when peer goes away
* mesh: correct HT opmode validity checks
* pass hw pointer from mac80211 to driver in TPT method,
fixing a bug (in a bit the wrong way, but that's what
we have right now)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The introduction of pre-allocated hash elements inadvertently broke
the behavior of bpf hash maps where users expected to call
bpf_map_update_elem() without considering that the map can be full.
Some programs do:
old_value = bpf_map_lookup_elem(map, key);
if (old_value) {
... prepare new_value on stack ...
bpf_map_update_elem(map, key, new_value);
}
Before pre-alloc the update() for existing element would work even
in 'map full' condition. Restore this behavior.
The above program could have updated old_value in place instead of
update() which would be faster and most programs use that approach,
but sometimes the values are large and the programs use update()
helper to do atomic replacement of the element.
Note we cannot simply update element's value in-place like percpu
hash map does and have to allocate extra num_possible_cpu elements
and use this extra reserve when the map is full.
Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: b53: Add missing ULL suffix for 64-bit constant
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1):
drivers/net/dsa/b53/b53_common.c: In function ‘b53_arl_read’:
drivers/net/dsa/b53/b53_common.c:1072: warning: integer constant is too large for ‘long’ type
Fixes: 1da6df85c6fbed8f ("net: dsa: b53: Implement ARL add/del/dump operations") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dave Forster <dforster@brocade.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Howells [Wed, 3 Aug 2016 13:11:40 +0000 (14:11 +0100)]
rxrpc: Fix races between skb free, ACK generation and replying
Inside the kafs filesystem it is possible to occasionally have a call
processed and terminated before we've had a chance to check whether we need
to clean up the rx queue for that call because afs_send_simple_reply() ends
the call when it is done, but this is done in a workqueue item that might
happen to run to completion before afs_deliver_to_call() completes.
Further, it is possible for rxrpc_kernel_send_data() to be called to send a
reply before the last request-phase data skb is released. The rxrpc skb
destructor is where the ACK processing is done and the call state is
advanced upon release of the last skb. ACK generation is also deferred to
a work item because it's possible that the skb destructor is not called in
a context where kernel_sendmsg() can be invoked.
To this end, the following changes are made:
(1) kernel_rxrpc_data_consumed() is added. This should be called whenever
an skb is emptied so as to crank the ACK and call states. This does
not release the skb, however. kernel_rxrpc_free_skb() must now be
called to achieve that. These together replace
rxrpc_kernel_data_delivered().
(2) kernel_rxrpc_data_consumed() is wrapped by afs_data_consumed().
This makes afs_deliver_to_call() easier to work as the skb can simply
be discarded unconditionally here without trying to work out what the
return value of the ->deliver() function means.
The ->deliver() functions can, via afs_data_complete(),
afs_transfer_reply() and afs_extract_data() mark that an skb has been
consumed (thereby cranking the state) without the need to
conditionally free the skb to make sure the state is correct on an
incoming call for when the call processor tries to send the reply.
(3) rxrpc_recvmsg() now has to call kernel_rxrpc_data_consumed() when it
has finished with a packet and MSG_PEEK isn't set.
(4) rxrpc_packet_destructor() no longer calls rxrpc_hard_ACK_data().
Because of this, we no longer need to clear the destructor and put the
call before we free the skb in cases where we don't want the ACK/call
state to be cranked.
(5) The ->deliver() call-type callbacks are made to return -EAGAIN rather
than 0 if they expect more data (afs_extract_data() returns -EAGAIN to
the delivery function already), and the caller is now responsible for
producing an abort if that was the last packet.
(6) There are many bits of unmarshalling code where:
ret = afs_extract_data(call, skb, last, ...);
switch (ret) {
case 0: break;
case -EAGAIN: return 0;
default: return ret;
}
is to be found. As -EAGAIN can now be passed back to the caller, we
now just return if ret < 0:
ret = afs_extract_data(call, skb, last, ...);
if (ret < 0)
return ret;
(7) Checks for trailing data and empty final data packets has been
consolidated as afs_data_complete(). So:
if (skb->len > 0)
return -EBADMSG;
if (!last)
return 0;
becomes:
ret = afs_data_complete(call, skb, last);
if (ret < 0)
return ret;
(8) afs_transfer_reply() now checks the amount of data it has against the
amount of data desired and the amount of data in the skb and returns
an error to induce an abort if we don't get exactly what we want.
Without these changes, the following oops can occasionally be observed,
particularly if some printks are inserted into the delivery path:
This doesn't have an immediate effect, but can mess up later
LL_RESERVED_SPACE calculations, such as done in
net/ipv6/mcast.c:mld_newpack. For reference, this issue was found
from a skb_panic raised there after the length calculations had given
the wrong result.
Note the other current users of this interface
(drivers/net/tun.c:tun_set_headroom and
drivers/net/veth.c:veth_set_rx_headroom) are both checking this
correctly thus need no modification.
Thanks to Ben for some pointers from the crash dumps!
Cc: Benjamin Poirier <bpoirier@suse.com> Cc: Paolo Abeni <pabeni@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1361414 Signed-off-by: Ian Wienand <iwienand@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maxim Altshul [Thu, 4 Aug 2016 12:43:04 +0000 (15:43 +0300)]
mac80211: Add ieee80211_hw pointer to get_expected_throughput
The variable is added to allow the driver an easy access to
it's own hw->priv when the op is invoked.
This fixes a crash in wlcore because it was relying on a
station pointer that wasn't initialized yet. It's the wrong
way to fix the crash, but it solves the problem for now and
it does make sense to have the hw pointer here.
Masashi Honma [Tue, 2 Aug 2016 08:16:57 +0000 (17:16 +0900)]
mac80211: End the MPSP even if EOSP frame was not acked
If QoS frame with EOSP (end of service period) subfield=1 sent by local
peer was not acked by remote peer, local peer did not end the MPSP. This
prevents local peer from going to DOZE state. And if the remote peer
goes away without closing connection, local peer continues AWAKE state
and wastes battery.
Signed-off-by: Masashi Honma <masashi.honma@gmail.com> Acked-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Felix Fietkau [Tue, 2 Aug 2016 09:13:41 +0000 (11:13 +0200)]
mac80211: fix purging multicast PS buffer queue
The code currently assumes that buffered multicast PS frames don't have
a pending ACK frame for tx status reporting.
However, hostapd sends a broadcast deauth frame on teardown for which tx
status is requested. This can lead to the "Have pending ack frames"
warning on module reload.
Fix this by using ieee80211_free_txskb/ieee80211_purge_tx_queue.
Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Manish Chopra [Wed, 3 Aug 2016 08:02:02 +0000 (04:02 -0400)]
qlcnic: fix data structure corruption in async mbx command handling
This patch fixes a data structure corruption bug in the SRIOV VF mailbox
handler code. While handling mailbox commands from the atomic context,
driver is accessing and updating qlcnic_async_work_list_struct entry fields
in the async work list. These fields could be concurrently accessed by the
work function resulting in data corruption.
This patch restructures async mbx command handling by using a separate
async command list instead of using a list of work_struct structures.
A single work_struct is used to schedule and handle the async commands
with proper locking mechanism.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com> Signed-off-by: Sony Chacko <sony.chacko@qlogic.com> Signed-off-by: Manish Chopra <manish.chopra@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tg3: Report the correct number of RSS queues through tg3_get_rxnfc
This patch remove the wrong substraction from info->data in
tg3_get_rxnfc function. Without this patch, the number of RSS
queues reported is less by one.
Reported-by: Michal Soltys <soltys@ziu.info> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the rx coalescing time is 0, interrupts
are not generated from the controller and rx path hangs.
To avoid this rx hang, updating the driver to not allow
rx coalescing time to be 0.
Signed-off-by: Satish Baddipadige <satish.baddipadige@broadcom.com> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We need to get a UNKNOWN_VALUE with imm to force id
generation so lines 0-5 make r7 a valid packet pointer.
We then read two different bytes from the packet and
add them to copies of the constructed packet pointer.
r8 (line 9) and r9 (line 11) will get the same id of 1,
independently. When either of them is validated (line
13) - find_good_pkt_pointers() will also mark the other
as safe. This leads to access on line 14 being mistakenly
considered safe.
Fixes: 969bf05eb3ce ("bpf: direct packet access") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 2 Aug 2016 10:23:29 +0000 (12:23 +0200)]
net: xgene: fix maybe-uninitialized variable
Building with -Wmaybe-uninitialized shows a potential use of
an uninitialized variable:
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 'xgene_enet_phy_connect':
drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:802:23: warning: 'phy_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
Although the compiler correctly identified this based on the function,
the current code is still safe as long dev->of_node is non-NULL
for the case of CONFIG_ACPI=n, which is currently the case.
The warning is now disabled by default, but still appears when
building with W=1, and other build test tools should be able to
detect it as well. Adding an #else clause here makes the code
more robust and makes it clear to the compiler that this cannot
happen.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility") Signed-off-by: David S. Miller <davem@davemloft.net>
Jarno Rajahalme [Tue, 2 Aug 2016 02:36:07 +0000 (19:36 -0700)]
openvswitch: Remove incorrect WARN_ONCE().
ovs_ct_find_existing() issues a warning if an existing conntrack entry
classified as IP_CT_NEW is found, with the premise that this should
not happen. However, a newly confirmed, non-expected conntrack entry
remains IP_CT_NEW as long as no reply direction traffic is seen. This
has resulted into somewhat confusing kernel log messages. This patch
removes this check and warning.
Fixes: 289f2253 ("openvswitch: Find existing conntrack entry after upcall.") Suggested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 3 Aug 2016 16:50:06 +0000 (12:50 -0400)]
Merge tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few updates and fixes:
- move the suppressing of the __builtin_return_address >0 warning to
the tracing directory only.
- metag recordmcount fix for newer glibc's
- two tracing histogram fixes that were reported by KASAN"
* tag 'trace-v4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix use-after-free in hist_register_trigger()
tracing: Fix use-after-free in hist_unreg_all/hist_enable_unreg_all
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
ftrace/recordmcount: Work around for addition of metag magic but not relocations
mac80211: mesh: flush stations before beacons are stopped
Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to AP flow.
Update ieee80211_stop_mesh() flow accordingly.
As peers can be removed dynamically, this would not impact other drivers.
Linus Torvalds [Tue, 2 Aug 2016 23:47:06 +0000 (19:47 -0400)]
Merge tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux
Pull orangefs update from Martin Brandenburg:
"Kernel side caching and executable bugfix
This allows OrangeFS to utilize the dcache and adds an in kernel
attribute cache. We previously used the user side client for this
purpose.
We see a modest performance increase on small file operations. For
example, without the cache, compiling coreutils takes about 17
minutes. With the patch and a 50 millisecond timeout for
dcache_timeout_msecs and getattr_timeout_msecs (the default),
compiling coreutils takes about 6 minutes 20 seconds. On the same
hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
We see similar improvements with mdtest and a test involving writing,
reading, and deleting a large number of small files.
Interested parties can review more data at the following URL.
The eventual goal of this is to allow getdents to turn into a
readdirplus to the OrangeFS server. The cache will be filled then,
which should provide a performance benefit to the common case of
readdir followed by getattr on each entry (i.e. ls -l).
This also fixes a bug. When orangefs_inode_permission was added, it
did not collect i_size from the OrangeFS server, since this presses an
unnecessary load on the OrangeFS server. However, it left a case
where i_size is never initialized. Then running an executable could
fail.
With this patch, size is always collected to be inserted into the
cache. Thus the bug disappears. If this patch is not accepted during
this merge window, we will send a one-line band-aid for this bug
instead"
* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
Orangefs: update orangefs.txt
orangefs: Account for jiffies wraparound.
orangefs: Change default dcache and getattr timeout to 50 msec.
orangefs: Allow dcache and getattr cache time to be configured.
orangefs: Cache getattr results.
orangefs: Use d_time to avoid excessive lookups
Linus Torvalds [Tue, 2 Aug 2016 23:39:09 +0000 (19:39 -0400)]
Merge tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client
Pull Ceph updates from Ilya Dryomov:
"The highlights are:
- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported
- A large rework of the MDS cap flushing code (Zheng Yan)
- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU
On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"
* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...
Vegard Nossum [Tue, 2 Aug 2016 21:07:30 +0000 (14:07 -0700)]
kcov: allow more fine-grained coverage instrumentation
For more targeted fuzzing, it's better to disable kernel-wide
instrumentation and instead enable it on a per-subsystem basis. This
follows the pattern of UBSAN and allows you to compile in the kcov
driver without instrumenting the whole kernel.
To instrument a part of the kernel, you can use either
# for a single file in the current directory
KCOV_INSTRUMENT_filename.o := y
or
# for all the files in the current directory (excluding subdirectories)
KCOV_INSTRUMENT := y
or
# (same as above)
ccflags-y += $(CFLAGS_KCOV)
or
# for all the files in the current directory (including subdirectories)
subdir-ccflags-y += $(CFLAGS_KCOV)
init/Kconfig: add clarification for out-of-tree modules
It doesn't trim just symbols that are totally unused in-tree - it trims
the symbols unused by any in-tree modules actually built. If you've
done a 'make localmodconfig' and only build a hundred or so modules,
it's pretty likely that your out-of-tree module will come up lacking
something...
Hopefully this will save the next guy from a Homer Simpson "D'oh!"
moment.
Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Michal Marek <mmarek@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rob Herring [Tue, 2 Aug 2016 21:07:24 +0000 (14:07 -0700)]
config: add android config fragments
Copy the config fragments from the AOSP common kernel android-4.4
branch. It is becoming possible to run mainline kernels with Android,
but the kernel defconfigs don't work as-is and debugging missing config
options is a pain. Adding the config fragments into the kernel tree,
makes configuring a mainline kernel as simple as:
make ARCH=arm multi_v7_defconfig android-base.config android-recommended.config
The following non-upstream config options were removed:
Alexey Dobriyan [Tue, 2 Aug 2016 21:07:21 +0000 (14:07 -0700)]
init/Kconfig: ban CONFIG_LOCALVERSION_AUTO with allmodconfig
Doing patches with allmodconfig kernel compiled and committing stuff
into local tree have unfortunate consequence: kernel version changes (as
it should) leading to recompiling and relinking of several files even if
they weren't touched (or interesting at all). This and "git-whatever"
figuring out current version slow down compilation for no good reason.
But lets face it, "allmodconfig" kernels don't care about kernel
version, they are simply compile check guinea pigs.
Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
allmodconfig .config.
Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Michal Marek <mmarek@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akash Goel [Tue, 2 Aug 2016 21:07:18 +0000 (14:07 -0700)]
relay: add global mode support for buffer-only channels
Commit 20d8b67c06fa ("relay: add buffer-only channels; useful for early
logging") added support to use channels with no associated files.
This is useful when the exact location of relay file is not known or the
the parent directory of relay file is not available, while creating the
channel and the logging has to start right from the boot.
But there was no provision to use global mode with buffer-only channels,
which is added by this patch, without modifying the interface where
initially there will be a dummy invocation of create_buf_file callback
through which kernel client can convey the need of a global buffer.
For the use case where drivers/kernel clients want a simple interface
for the userspace, which enables them to capture data/logs from relay
file inorder & without any post processing, support of Global buffer
mode is warranted.
Modules, like i915, using relay_open() in early init would have to later
register their buffer-only relays, once debugfs is available, by calling
relay_late_setup_files(). Hence relay_late_setup_files() symbol also
needs to be exported.
Link: http://lkml.kernel.org/r/1468404563-11653-1-git-send-email-akash.goel@intel.com Signed-off-by: Akash Goel <akash.goel@intel.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prarit Bhargava [Tue, 2 Aug 2016 21:07:15 +0000 (14:07 -0700)]
init: allow blacklisting of module_init functions
sprint_symbol_no_offset() returns the string "function_name
[module_name]" where [module_name] is not printed for built in kernel
functions. This means that the blacklisting code will fail when
comparing module function names with the extended string.
This patch adds the functionality to block a module's module_init()
function by finding the space in the string and truncating the
comparison to that length.
Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <yang.shi@linaro.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Kees Cook <keescook@chromium.org> Cc: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit e93762bbf681 ("w1: masters: omap_hdq: add support for 1-wire
mode") added a statement to clear the hdq_irqstatus flags in
hdq_read_byte().
If the hdq reading process is scheduled slowly or interrupts are
disabled for a while the hardware read activity might already be
finished on entry of hdq_read_byte(). And hdq_isr() already has set the
hdq_irqstatus to 0x6 (can be seen in debug mode) denoting that both, the
TXCOMPLETE and RXCOMPLETE interrupts occurred in parallel.
This means there is no need to wait and the hdq_read_byte() can just
read the byte from the hdq controller.
By resetting hdq_irqstatus to 0 the read process is forced to be always
waiting again (because the if statement always succeeds) but the
hardware will not issue another RXCOMPLETE interrupt. This results in a
false timeout.
Andrew F. Davis [Tue, 2 Aug 2016 21:07:09 +0000 (14:07 -0700)]
w1: add helper macro module_w1_family
The helper macro module_w1_family can be used in module drivers that
only register a w1 driver in their module init functions. Add this
macro and use it in all applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-2-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew F. Davis [Tue, 2 Aug 2016 21:07:06 +0000 (14:07 -0700)]
w1: remove need for ida and use PLATFORM_DEVID_AUTO
PLATFORM_DEVID_AUTO can be used to have the platform core assign a
unique ID instead of manually creating one with IDA. Do this in all
applicable drivers.
Link: http://lkml.kernel.org/r/20160531204313.20979-1-afd@ti.com Signed-off-by: Andrew F. Davis <afd@ti.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement changes made in RapidIO specification rev.3 to LP-Serial Physical
Layer register definitions:
- use per-port register offset calculations based on LP-Serial Extended
Features Block (EFB) Register Map type (I or II) with different
per-port offset step (0x20 vs 0x40 respectfully).
- remove deprecated Parallel Physical layer definitions and related
code.
Current definition of map_inb() mport operations callback uses u32 type
to specify required inbound window (IBW) size. This is limiting factor
because existing hardware - tsi721 and fsl_rio, both support IBW size up
to 16GB.
Changing type of size parameter to u64 to allow IBW size configurations
larger than 4GB.
[alexandre.bounine@idt.com: remove compiler warning about size of constant] Link: http://lkml.kernel.org/r/20160802184856.2566-1-alexandre.bounine@idt.com Link: http://lkml.kernel.org/r/1469125134-16523-11-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: advance queue processing from transfer submit call
Add advancing transfer queue immediately from transfer submit call. DMA
performance improvement: This will start transfer without waiting for
'issue_pending' command if there is no DMA transfer in progress.
Link: http://lkml.kernel.org/r/1469125134-16523-8-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add module parameter to allow load time configuration of available
RapidIO messaging mailboxes (MBOX1 - MBOX4).
Having a messaging MBOX selector mask allows to define which MBOXes are
controlled by the mport device driver and reserve some of them for
direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-7-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add PCIe Maximum Read Request Size (MRRS) adjustment parameter to allow
users to override configuration register value set during PCIe bus
initialization.
Performance of Tsi721 device as PCIe bus master can be improved if MRRS
is set to its maximum value (4096 bytes). Some platforms have
limitations for supported MRRS and therefore the default value should be
preserved, unless it is known that given platform supports full set of
MRRS values defined by PCI Express specification.
Link: http://lkml.kernel.org/r/1469125134-16523-6-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio/tsi721_dma: add channel mask and queue size parameters
Add module parameters to allow load time configuration of DMA channels.
Depending on application, performance of DMA data transfers can benefit
from adjusted sizes of buffer descriptor ring and/or transaction
requests queue.
Having HW DMA channel selector mask allows to define which channels
(from seven available) are controlled by the mport device driver and
reserve some of them for direct use by other drivers.
Link: http://lkml.kernel.org/r/1469125134-16523-5-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Barry Wood <barry.wood@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rapidio: fix return value description for dma_prep functions
Update return value description for rio_dma_prep_... functions to
include error-valued pointer that can be returned by HW mport device
drivers. Return values from these functions must be checked using
IS_ERR_OR_NULL macro.
This patch is applicable to kernel versions starting from v4.6-rc1.
Link: http://lkml.kernel.org/r/1469125134-16523-4-git-send-email-alexandre.bounine@idt.com Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 2 Aug 2016 21:06:28 +0000 (14:06 -0700)]
rapidio: remove unnecessary 0x prefixes before %pa extension uses
Patch series "RapidIO subsystem updates".
This set of patches contains RapidIO subsystem fixes and updates that
have been made since kernel v4.6. The most significant update brings
changes related to the latest revision of RapidIO specification
(rev.3.x) and introduction of next generation of RapidIO switches by IDT
(RXS1632 and RXS2448).
This patch (of 13):
This is RapidIO part of the original patch submitted by Joe Perches.
(see: https://lkml.org/lkml/2016/3/5/19)
Since commit 3cab1e711297 ("lib/vsprintf: refactor duplicate code
to special_hex_number()") %pa uses have been output with a 0x prefix.
These 0x prefixes in the formats are unnecessary.
Link: http://lkml.kernel.org/r/1469125134-16523-2-git-send-email-alexandre.bounine@idt.com Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Barry Wood <barry.wood@idt.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add channelized messaging driver to support native RapidIO messaging
exchange between multiple senders/recipients on devices that use kernel
RapidIO subsystem services.
This device driver is the result of collaboration within the RapidIO.org
Software Task Group (STG) between Texas Instruments, Prodrive
Technologies, Nokia Networks, BAE and IDT. Additional input was
received from other members of RapidIO.org.
The objective was to create a character mode driver interface which
exposes messaging capabilities of RapidIO endpoint devices (mports)
directly to applications, in a manner that allows the numerous and
varied RapidIO implementations to interoperate.
This char mode device driver allows user-space applications to setup
messaging communication channels using single shared RapidIO messaging
mailbox.
By default this driver uses RapidIO MBOX_1 (MBOX_0 is reserved for use by
RIONET Ethernet emulation driver).
zhong jiang [Tue, 2 Aug 2016 21:06:22 +0000 (14:06 -0700)]
kexec: add restriction on kexec_load() segment sizes
I hit the following issue when run trinity in my system. The kernel is
3.4 version, but mainline has the same issue.
The root cause is that the segment size is too large so the kerenl
spends too long trying to allocate a page. Other cases will block until
the test case quits. Also, OOM conditions will occur.
Petr Tesarik [Tue, 2 Aug 2016 21:06:19 +0000 (14:06 -0700)]
kexec: allow kdump with crash_kexec_post_notifiers
If a crash kernel is loaded, do not crash the running domain. This is
needed if the kernel is loaded with crash_kexec_post_notifiers, because
panic notifiers are run before __crash_kexec() in that case, and this
Xen hook prevents its being called later.
[akpm@linux-foundation.org: build fix: unconditionally include kexec.h] Link: http://lkml.kernel.org/r/20160713122000.14969.99963.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Petr Tesarik [Tue, 2 Aug 2016 21:06:16 +0000 (14:06 -0700)]
kexec: add a kexec_crash_loaded() function
Provide a wrapper function to be used by kernel code to check whether a
crash kernel is loaded. It returns the same value that can be seen in
/sys/kernel/kexec_crash_loaded by userspace programs.
I'm exporting the function, because it will be used by Xen, and it is
possible to compile Xen modules separately to enable the use of PV
drivers with unmodified bare-metal kernels.
Link: http://lkml.kernel.org/r/20160713121955.14969.69080.stgit@hananiah.suse.cz Signed-off-by: Petr Tesarik <ptesarik@suse.com> Cc: Juergen Gross <jgross@suse.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Eric Biederman <ebiederm@xmission.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hidehiro Kawai [Tue, 2 Aug 2016 21:06:13 +0000 (14:06 -0700)]
kexec: use core_param for crash_kexec_post_notifiers boot option
crash_kexec_post_notifiers ia a boot option which controls whether the
1st kernel calls panic notifiers or not before booting the 2nd kernel.
However, there is no need to limit it to being modifiable only at boot
time. So, use core_param instead of early_param.
Link: http://lkml.kernel.org/r/20160705113327.5864.43139.stgit@softrs Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:10 +0000 (14:06 -0700)]
ARM: kexec: fix kexec for Keystone 2
Provide kexec with the boot view of memory by overriding the normal
kexec translation functions added in a previous patch. We also need to
fix a call to memblock in machine_kexec_prepare() so that we provide it
with a running-view physical address rather than a boot- view physical
address.
Link: http://lkml.kernel.org/r/E1b8koa-0004Hl-Ey@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit adds definition for cpu_on, cpu_off and cpu_suspend
commands. These definitions must match the corresponding PSCI
definitions in boot monitor.
Having those command and corresponding PSCI support in boot monitor
allows run time CPU hot plugin.
Link: http://lkml.kernel.org/r/E1b8koV-0004Hf-2j@rmk-PC.armlinux.org.uk Signed-off-by: Keerthy <j-keerthy@ti.com> Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Pratyush Anand <panand@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:04 +0000 (14:06 -0700)]
kexec: allow architectures to override boot mapping
kexec physical addresses are the boot-time view of the system. For
certain ARM systems (such as Keystone 2), the boot view of the system
does not match the kernel's view of the system: the boot view uses a
special alias in the lower 4GB of the physical address space.
To cater for these kinds of setups, we need to translate between the
boot view physical addresses and the normal kernel view physical
addresses. This patch extracts the current transation points into
linux/kexec.h, and allows an architecture to override the functions.
Due to the translations required, we unfortunately end up with six
translation functions, which are reduced down to four that the
architecture can override.
[akpm@linux-foundation.org: kexec.h needs asm/io.h for phys_to_virt()] Link: http://lkml.kernel.org/r/E1b8koP-0004HZ-Vf@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:06:00 +0000 (14:06 -0700)]
kdump: arrange for paddr_vmcoreinfo_note() to return phys_addr_t
On PAE systems (eg, ARM LPAE) the vmcore note may be located above 4GB
physical on 32-bit architectures, so we need a wider type than "unsigned
long" here. Arrange for paddr_vmcoreinfo_note() to return a
phys_addr_t, thereby allowing it to be located above 4GB.
This makes no difference for kexec-tools, as they already assume a
64-bit type when reading from this file.
Link: http://lkml.kernel.org/r/E1b8koK-0004HS-K9@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:57 +0000 (14:05 -0700)]
kexec: ensure user memory sizes do not wrap
Ensure that user memory sizes do not wrap around when validating the
user input, which can lead to the following input validation working
incorrectly.
[akpm@linux-foundation.org: fix it for kexec-return-error-number-directly.patch] Link: http://lkml.kernel.org/r/E1b8koF-0004HM-5x@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:54 +0000 (14:05 -0700)]
kexec: don't invoke OOM-killer for control page allocation
If we are unable to find a suitable page when allocating the control
page, do not invoke the OOM-killer: killing processes probably isn't
going to help.
Link: http://lkml.kernel.org/r/E1b8ko9-0004HG-R5@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Russell King [Tue, 2 Aug 2016 21:05:51 +0000 (14:05 -0700)]
ARM: kexec: advertise location of bootable RAM
Advertise the location of bootable RAM to kexec-tools. kexec needs to
know where it can place the kernel in RAM, and so be executable when the
system needs to jump into it.
Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.
Link: http://lkml.kernel.org/r/E1b8ko4-0004HA-GF@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Reviewed-by: Pratyush Anand <panand@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Advertise a resource which describes where the crash kernel is located
in the boot view of RAM. This allows kexec-tools to have this vital
information.
Link: http://lkml.kernel.org/r/E1b8knz-0004H4-Bd@rmk-PC.armlinux.org.uk Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Baoquan He <bhe@redhat.com> Cc: Keerthy <j-keerthy@ti.com> Cc: Pratyush Anand <panand@redhat.com> Cc: Vitaly Andrianov <vitalya@ti.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minfei Huang [Tue, 2 Aug 2016 21:05:45 +0000 (14:05 -0700)]
kexec: return error number directly
This is a cleanup patch to make kexec more clear to return error number
directly. The variable result is useless, because there is no other
function's return value assignes to it. So remove it.
Link: http://lkml.kernel.org/r/1464179273-57668-1-git-send-email-mnghuan@gmail.com Signed-off-by: Minfei Huang <mnghuan@gmail.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Atsushi Kumagai <ats-kumagai@wm.jp.nec.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Lutomirski [Tue, 2 Aug 2016 21:05:36 +0000 (14:05 -0700)]
signal: consolidate {TS,TLF}_RESTORE_SIGMASK code
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org Signed-off-by: Andy Lutomirski <luto@kernel.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dmitry Safonov <dsafonov@virtuozzo.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Mahoney [Tue, 2 Aug 2016 21:05:33 +0000 (14:05 -0700)]
reiserfs: fix "new_insert_key may be used uninitialized ..."
new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.
Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2-b4a2-f162f105efed@suse.com Signed-off-by: Jeff Mahoney <jeffm@suse.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jan Kara <jack@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:30 +0000 (14:05 -0700)]
nilfs2: move ioctl interface and disk layout to uapi separately
The header file "include/linux/nilfs2_fs.h" is composed of parts for
ioctl and disk format, and both are intended to be shared with user
space programs.
This moves them to the uapi directory "include/uapi/linux" splitting the
file to "nilfs2_api.h" and "nilfs2_ondisk.h". The following minor
changes are accompanied by this migration:
- nilfs_direct_node struct in nilfs2/direct.h is converged to
nilfs2_ondisk.h because it's an on-disk structure.
- inline functions nilfs_rec_len_from_disk() and
nilfs_rec_len_to_disk() are moved to nilfs2/dir.c.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:25 +0000 (14:05 -0700)]
nilfs2: fix misuse of a semaphore in sysfs code
Variables ns_seg_seq, ns_segnum, ns_nextnum, ns_pseg_offset, ns_cno,
ns_ctime, ns_nongc_ctime, and ns_ndirtyblks, are protected by
ns_segctor_sem, but ns_sem is wrongly used by the nilfs sysfs code when
reading these variables. This fixes the misuse and clarifies which
semaphore protects them in the comment of the_nilfs struct.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:22 +0000 (14:05 -0700)]
nilfs2: refactor parser of snapshot mount option
Move parser of snapshot mount option to a separate function
nilfs_parse_snapshot_option(), replace simple_strtoull() with
kstrtoull() to avoid checkpatch.pl warning "WARNING: simple_strtoull is
obsolete, use kstrtoull instead", and refine the error message of the
parser.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:19 +0000 (14:05 -0700)]
nilfs2: do not use yield()
Use cond_resched() instead of yield() in the loop of
nilfs_transaction_lock() since the usage corresponds to the "be nice for
others" case that the comment of yield() says.
This removes the following checkpatch.pl warning:
"WARNING: Using yield() is generally wrong. See yield() kernel-doc
(sched/core.c)"
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:17 +0000 (14:05 -0700)]
nilfs2: emit error message when I/O error is detected
When nilfs returned -EIO as an error code, it's not always clear if it
came from the underlying block device or not. This will mend the issue
by having low level I/O routines of nilfs output an error message when
they detected an I/O error.
Ryusuke Konishi [Tue, 2 Aug 2016 21:05:14 +0000 (14:05 -0700)]
nilfs2: replace nilfs_warning() with nilfs_msg()
Use nilfs_msg() to output warning messages and get rid of
nilfs_warning() function. This also removes function names from the
messages unless we embed them explicitly in format strings. Instead,
some messages are revised to clarify the context.