Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Wed, 19 Apr 2017 18:26:07 +0000 (21:26 +0300)]
gso: Validate assumption of frag_list segementation
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:26 +0000 (09:59 -0700)]
kaweth: use skb_cow_head() to deal with cloned skbs
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:25 +0000 (09:59 -0700)]
ch9200: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:24 +0000 (09:59 -0700)]
lan78xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:23 +0000 (09:59 -0700)]
sr9700: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:22 +0000 (09:59 -0700)]
cx82310_eth: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:21 +0000 (09:59 -0700)]
smsc75xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Wed, 19 Apr 2017 14:10:19 +0000 (16:10 +0200)]
ipv6: sr: fix double free of skb after handling invalid SRH
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.
CVE-2017-7979 Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies") Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix issue in populating the PFC config paramters.
Change ieee_setpfc() callback implementation to populate traffic class
count with the user provided value.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix sending an invalid PFC error mask to MFW.
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible error in populating max_tc field.
Some adapters may not publish the max_tc value. Populate the default
value for max_tc field in case the mfw didn't provide one.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Hughes [Wed, 19 Apr 2017 10:13:40 +0000 (11:13 +0100)]
smsc95xx: Use skb_cow_head to deal with cloned skbs
The driver was failing to check that the SKB wasn't cloned
before adding checksum data.
Replace existing handling to extend/copy the header buffer
with skb_cow_head.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sekhar Nori [Wed, 19 Apr 2017 08:38:24 +0000 (14:08 +0530)]
MAINTAINERS: update entry for TI's CPSW driver
Mugunthan V N, who was reviewing TI's CPSW driver patches is
not working for TI anymore and wont be reviewing patches for
that driver.
Drop Mugunthan as the maintiainer for this driver.
Grygorii continues to be a reviewer. Dave Miller applies the
patches directly and adding a maintainer is actually
misleading since get_maintainer.pl script stops suggesting
that Dave Miller be copied.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 18 Apr 2017 19:14:26 +0000 (22:14 +0300)]
dp83640: don't recieve time stamps twice
This patch is prompted by a static checker warning about a potential
use after free. The concern is that netif_rx_ni() can free "skb" and we
call it twice.
When I look at the commit that added this, it looks like some stray
lines were added accidentally. It doesn't make sense to me that we
would recieve the same data two times. I asked the author but never
recieved a response.
I can't test this code, but I'm pretty sure my patch is correct.
Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 18 Apr 2017 15:59:49 +0000 (17:59 +0200)]
ipv6: sr: fix out-of-bounds access in SRH validation
This patch fixes an out-of-bounds access in seg6_validate_srh() when the
trailing data is less than sizeof(struct sr6_tlv).
Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Tue, 18 Apr 2017 15:14:16 +0000 (11:14 -0400)]
selftests/net: Fixes psock_fanout CBPF test case
'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
psock_lib: harden socket filter used by psock tests"). That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW. But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.
Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame. Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.
Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.
Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock tests") Signed-off-by: 'Mike Maloney <maloneykernel@gmail.com>' Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Thu, 20 Apr 2017 19:32:16 +0000 (21:32 +0200)]
mac80211: reject ToDS broadcast data frames
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
--
Dave, I didn't want to send you a new pull request for a single
commit yet again - can you apply this one patch as is? Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Apr 2017 17:36:42 +0000 (13:36 -0400)]
Merge tag 'mac80211-for-davem-2017-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A single fix, for the MU-MIMO monitor mode, that fixes
bad SKB accesses if the SKB was paged, which is the case
for the only driver supporting this - iwlwifi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Mon, 17 Apr 2017 12:55:22 +0000 (15:55 +0300)]
sh_eth: unmap DMA buffers when freeing rings
The DMA API debugging (when enabled) causes:
WARNING: CPU: 0 PID: 1445 at lib/dma-debug.c:519 add_dma_entry+0xe0/0x12c
DMA-API: exceeded 7 overlapping mappings of cacheline 0x01b2974d
to be printed after repeated initialization of the Ether device, e.g.
suspend/resume or 'ifconfig' up/down. This is because DMA buffers mapped
using dma_map_single() in sh_eth_ring_format() and sh_eth_start_xmit() are
never unmapped. Resolve this problem by unmapping the buffers when freeing
the descriptor rings; in order to do it right, we'd have to add an extra
parameter to sh_eth_txfree() (we rename this function to sh_eth_tx_free(),
while at it).
Based on the commit a47b70ea86bd ("ravb: unmap descriptors when freeing
rings").
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) BPF tail call handling bug fixes from Daniel Borkmann.
2) Fix allowance of too many rx queues in sfc driver, from Bert
Kenward.
3) Non-loopback ipv6 packets claiming src of ::1 should be dropped,
from Florian Westphal.
4) Statistics requests on KSZ9031 can crash, fix from Grygorii
Strashko.
5) TX ring handling fixes in mediatek driver, from Sean Wang.
6) ip_ra_control can deadlock, fix lock acquisition ordering to fix,
from Cong WANG.
7) Fix use after free in ip_recv_error(), from Willem de Buijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix checking xdp_adjust_head on tail calls
bpf: fix cb access in socket filter programs on tail calls
ipv6: drop non loopback packets claiming to originate from ::1
net: ethernet: mediatek: fix inconsistency of port number carried in TXD
net: ethernet: mediatek: fix inconsistency between TXD and the used buffer
net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
net: thunderx: Fix set_max_bgx_per_node for 81xx rgx
net-timestamp: avoid use-after-free in ip_recv_error
ipv4: fix a deadlock in ip_ra_control
sfc: limit the number of receive queues
Make sure the start adderess is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Jordan [Mon, 10 Apr 2017 15:50:52 +0000 (11:50 -0400)]
sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.
A 4.10 kernel fails to boot with the console output
Kernel: Using 8 locked TLB entries for main kernel image.
hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
Program terminated
To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.
Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.
Fixes: e6b5f1be7afe ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace testcase update from Steven Rostedt:
"While testing my development branch, without the fix for the pid use
after free bug, the selftest that Namhyung added triggers it. I
figured it would be good to add the test for the bug after the fix,
such that it does not exist without the fix.
I added another patch that lets the test only test part of the pid
filtering, and ignores the function-fork (filtering on children as
well) if the function-fork feature does not exist. This feature is
added by Namhyung just before he added this test. But since the test
tests both with and without the feature, it would be good to let it
not fail if the feature does not exist"
* tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests: ftrace: Add check for function-fork before running pid filter test
selftests: ftrace: Add a testcase for function PID filter
selftests: ftrace: Add check for function-fork before running pid filter test
Have the func-filter-pid test check for the function-fork option before
testing it. It can still test the pid filtering, but will stop before
testing the function-fork option for children inheriting the pids.
This allows the test to be added before the function-fork feature, but after
a bug fix that triggers one of the bugs the test can cause.
Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Merge tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Namhyung Kim discovered a use after free bug. It has to do with adding
a pid filter to function tracing in an instance, and then freeing the
instance"
* tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function pid filter on instances
Namhyung Kim [Mon, 17 Apr 2017 02:44:30 +0000 (11:44 +0900)]
selftests: ftrace: Add a testcase for function PID filter
Like event pid filtering test, add function pid filtering test with the
new "function-fork" option. It also tests it on an instance directory
so that it can verify the bug related pid filtering on instances.
Herbert Xu [Thu, 13 Apr 2017 10:35:59 +0000 (18:35 +0800)]
af_key: Fix sadb_x_ipsecrequest parsing
The parsing of sadb_x_ipsecrequest is broken in a number of ways.
First of all we're not verifying sadb_x_ipsecrequest_len. This
is needed when the structure carries addresses at the end. Worse
we don't even look at the length when we parse those optional
addresses.
The migration code had similar parsing code that's better but
it also has some deficiencies. The length is overcounted first
of all as it includes the header itself. It also fails to check
the length before dereferencing the sa_family field.
This patch fixes those problems in parse_sockaddr_pair and then
uses it in parse_ipsecrequest.
Namhyung Kim [Mon, 17 Apr 2017 02:44:27 +0000 (11:44 +0900)]
ftrace: Fix function pid filter on instances
When function tracer has a pid filter, it adds a probe to sched_switch
to track if current task can be ignored. The probe checks the
ftrace_ignore_pid from current tr to filter tasks. But it misses to
delete the probe when removing an instance so that it can cause a crash
due to the invalid tr pointer (use-after-free).
David S. Miller [Mon, 17 Apr 2017 19:51:58 +0000 (15:51 -0400)]
Merge branch 'bpf-fixes'
Daniel Borkmann says:
====================
Two BPF fixes
The set fixes cb_access and xdp_adjust_head bits in struct bpf_prog,
that are used for requirement checks on the program rather than f.e.
heuristics. Thus, for tail calls, we cannot make any assumptions and
are forced to set them.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Apr 2017 01:12:07 +0000 (03:12 +0200)]
bpf: fix checking xdp_adjust_head on tail calls
Commit 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog")
added the xdp_adjust_head bit to the BPF prog in order to tell drivers
that the program that is to be attached requires support for the XDP
bpf_xdp_adjust_head() helper such that drivers not supporting this
helper can reject the program. There are also drivers that do support
the helper, but need to check for xdp_adjust_head bit in order to move
packet metadata prepended by the firmware away for making headroom.
For these cases, the current check for xdp_adjust_head bit is insufficient
since there can be cases where the program itself does not use the
bpf_xdp_adjust_head() helper, but tail calls into another program that
uses bpf_xdp_adjust_head(). As such, the xdp_adjust_head bit is still
set to 0. Since the first program has no control over which program it
calls into, we need to assume that bpf_xdp_adjust_head() helper is used
upon tail calls. Thus, for the very same reasons in cb_access, set the
xdp_adjust_head bit to 1 when the main program uses tail calls.
Fixes: 17bedab27231 ("bpf: xdp: Allow head adjustment in XDP prog") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Mon, 17 Apr 2017 01:12:06 +0000 (03:12 +0200)]
bpf: fix cb access in socket filter programs on tail calls
Commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
added a fix for socket filter programs such that in i) AF_PACKET the
20 bytes of skb->cb[] area gets zeroed before use in order to not leak
data, and ii) socket filter programs attached to TCP/UDP sockets need
to save/restore these 20 bytes since they are also used by protocol
layers at that time.
The problem is that bpf_prog_run_save_cb() and bpf_prog_run_clear_cb()
only look at the actual attached program to determine whether to zero
or save/restore the skb->cb[] parts. There can be cases where the
actual attached program does not access the skb->cb[], but the program
tail calls into another program which does access this area. In such
a case, the zero or save/restore is currently not performed.
Since the programs we tail call into are unknown at verification time
and can dynamically change, we need to assume that whenever the attached
program performs a tail call, that later programs could access the
skb->cb[], and therefore we need to always set cb_access to 1.
Fixes: ff936a04e5f2 ("bpf: fix cb access in socket filter programs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: drop non loopback packets claiming to originate from ::1
We lack a saddr check for ::1. This causes security issues e.g. with acls
permitting connections from ::1 because of assumption that these originate
from local machine.
Assuming a source address of ::1 is local seems reasonable.
RFC4291 doesn't allow such a source address either, so drop such packets.
Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Apr 2017 17:33:59 +0000 (13:33 -0400)]
Merge branch 'mediatek-tx-bugs'
Sean Wang says:
====================
mediatek: Fix crash caused by reporting inconsistent skb->len to BQL
Changes since v1:
- fix inconsistent enumeration which easily causes the potential bug
The series fixes kernel BUG caused by inconsistent SKB length reported
into BQL. The reason for inconsistent length comes from hardware BUG which
results in different port number carried on the TXD within the lifecycle of
SKB. So patch 2) is proposed for use a software way to track which port
the SKB involving instead of hardware way. And patch 1) is given for another
issue I found which causes TXD and SKB inconsistency that is not expected
in the initial logic, so it is also being corrected it in the series.
The log for the kernel BUG caused by the issue is posted as below.
[ 120.825955] kernel BUG at ... lib/dynamic_queue_limits.c:26!
[ 120.837684] Internal error: Oops - BUG: 0 [#1] SMP ARM
[ 120.842778] Modules linked in:
[ 120.845811] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc1-191576-gdbcef47 #35
[ 120.853488] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 120.859012] task: c1007480 task.stack: c1000000
[ 120.863510] PC is at dql_completed+0x108/0x17c
[ 120.867915] LR is at 0x46
[ 120.870512] pc : [<c03c19c8>] lr : [<00000046>] psr: 80000113
[ 120.870512] sp : c1001d58 ip : c1001d80 fp : c1001d7c
[ 120.881895] r10: 0000003e r9 : df6b3400 r8 : 0ed86506
[ 120.887075] r7 : 00000001 r6 : 00000001 r5 : 0ed8654c r4 : df0135d8
[ 120.893546] r3 : 00000001 r2 : df016800 r1 : 0000fece r0 : df6b3480
[ 120.900018] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 120.907093] Control: 10c5387d Table: 9e27806a DAC: 00000051
[ 120.912789] Process swapper/0 (pid: 0, stack limit = 0xc1000218)
[ 120.918744] Stack: (0xc1001d58 to 0xc1002000)
Sean Wang [Fri, 14 Apr 2017 03:19:12 +0000 (11:19 +0800)]
net: ethernet: mediatek: fix inconsistency of port number carried in TXD
Fix port inconsistency on TXD due to hardware BUG that would cause
different port number is carried on the same TXD between tx_map()
and tx_unmap() with the iperf test. It would cause confusing BQL
logic which leads to kernel panic when dual GMAC runs concurrently.
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
Now the command:
ethtool --phy-statistics eth0
will cause system crash with meassage "Unable to handle kernel NULL pointer
dereference at virtual address 00000010" from:
(kszphy_get_stats) from [<c069f1d8>] (ethtool_get_phy_stats+0xd8/0x210)
(ethtool_get_phy_stats) from [<c06a0738>] (dev_ethtool+0x5b8/0x228c)
(dev_ethtool) from [<c06b5484>] (dev_ioctl+0x3fc/0x964)
(dev_ioctl) from [<c0679f7c>] (sock_ioctl+0x170/0x2c0)
(sock_ioctl) from [<c02419d4>] (do_vfs_ioctl+0xa8/0x95c)
(do_vfs_ioctl) from [<c02422c4>] (SyS_ioctl+0x3c/0x64)
(SyS_ioctl) from [<c0107d60>] (ret_fast_syscall+0x0/0x44)
The reason: phy_driver structure for KSZ9031 phy has no .probe() callback
defined. As result, struct phy_device *phydev->priv pointer will not be
initializes (null).
This issue will affect also following phys:
KSZ8795, KSZ886X, KSZ8873MLL, KSZ9031, KSZ9021, KSZ8061, KS8737
Fix it by:
- adding .probe() = kszphy_probe() callback to KSZ9031, KSZ9021
phys. The kszphy_probe() can be re-used as it doesn't do any phy specific
settings.
- removing statistic callbacks from other phys (KSZ8795, KSZ886X,
KSZ8873MLL, KSZ8061, KS8737) as they doesn't have corresponding
statistic counters.
Fixes: 2b2427d06426 ("phy: micrel: Add ethtool statistics counters") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 13 Apr 2017 16:57:15 +0000 (10:57 -0600)]
net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
Only need 1 l3mdev FIB rule. Fix setting NLM_F_EXCL in the nlmsghdr.
Fixes: 1aa6c4f6b8cd8 ("net: vrf: Add l3mdev rules on first device create") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
George Cherian [Thu, 13 Apr 2017 07:25:01 +0000 (07:25 +0000)]
net: thunderx: Fix set_max_bgx_per_node for 81xx rgx
Add the PCI_SUBSYS_DEVID_81XX_RGX and use the same to set
the max bgx per node count.
This fixes the issue intoduced by following commit 78aacb6f6 net: thunderx: Fix invalid mac addresses for node1 interfaces
With this commit the max_bgx_per_node for 81xx is set as 2 instead of 3
because of which num_vfs is always calculated as zero.
Signed-off-by: George Cherian <george.cherian@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Wed, 12 Apr 2017 23:24:35 +0000 (19:24 -0400)]
net-timestamp: avoid use-after-free in ip_recv_error
Syzkaller reported a use-after-free in ip_recv_error at line
info->ipi_ifindex = skb->dev->ifindex;
This function is called on dequeue from the error queue, at which
point the device pointer may no longer be valid.
Save ifindex on enqueue in __skb_complete_tx_timestamp, when the
pointer is valid or NULL. Store it in temporary storage skb->cb.
It is safe to reference skb->dev here, as called from device drivers
or dev_queue_xmit. The exception is when called from tcp_ack_tstamp;
in that case it is NULL and ifindex is set to 0 (invalid).
Do not return a pktinfo cmsg if ifindex is 0. This maintains the
current behavior of not returning a cmsg if skb->dev was NULL.
On dequeue, the ipv4 path will cast from sock_exterr_skb to
in_pktinfo. Both have ifindex as their first element, so no explicit
conversion is needed. This is by design, introduced in commit 0b922b7a829c ("net: original ingress device index in PKTINFO"). For
ipv6 ip6_datagram_support_cmsg converts to in6_pktinfo.
Fixes: 829ae9d61165 ("net-timestamp: allow reading recv cmsg on errqueue with origin tstamp") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this by always locking RTNL first on all setsockopt() paths.
Note, after this patch ip_ra_lock is no longer needed either.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The number of rx queues is determined by the rss_cpus parameter
or the cpu topology. If that is higher than EFX_MAX_RX_QUEUES the
driver can corrupt state.
Fixes: 8ceee660aacb ("New driver "sfc" for Solarstorm SFC4000 controller.") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"Again, a batch that's been sitting a couple of weeks, mostly because
I anticipated a bit more material but it didn't show up -- which is
good.
These are all your garden variety fixes for ARM platforms.
The most visible issue fixed here is probably the SMP reset issue on
OMAP, the rest are minor stuff"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: allwinner: a64: add pmu0 regs for USB PHY
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
reset: add exported __reset_control_get, return NULL if optional
ARM: orion5x: only call into phylib when available
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
ARM: sun8i: a33: add operating-points-v2 property to all nodes
ARM: sun8i: a33: remove highest OPP to fix CPU crashes
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Four small fixes.
Three of them fix the same error in NVMe, in loop, fc, and rdma
respectively. The last fix from Ming fixes a regression in this
series, where our bvec gap logic was wrong and causes an oops on
NVMe for certain conditions"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix bio_will_gap() for first bvec with offset
nvme-fc: Fix sqsize wrong assignment based on ctrl MQES capability
nvme-rdma: Fix sqsize wrong assignment based on ctrl MQES capability
nvme-loop: Fix sqsize wrong assignment based on ctrl MQES capability
Olof Johansson [Sun, 16 Apr 2017 18:52:26 +0000 (11:52 -0700)]
Merge tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes
Regression fix for omap interconnect code for deferred probe.
Without this fix we can get PM related warnings for devices that
use deferred probe. If necessary, this fix can wait for the
v4.12 merge window no problem.
* tag 'omap-for-v4.11/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: omap_device: Sync omap_device and pm_runtime after probe defer
ARM: omap2+: Revert omap-smp.c changes resetting CPU1 during boot
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
ARM: dts: ti: fix PCI bus dtc warnings
ARM: dts: am335x-baltos: disable EEE for Atheros 8035 PHY
ARM: dts: OMAP3: Fix MFG ID EEPROM
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
"Unfortunately, the commit to fix the cgroup mount race in the previous
pull request can lead to hangs.
The original bug has been around for a while and isn't too likely to
be triggered in usual use cases. Revert the commit for now"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
Revert "cgroup: avoid attaching a cgroup root to two different superblocks"
Merge tag 'tty-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fix from Greg KH:
"Here is a single tty core revert for a patch that was reported to
cause problems.
The original issue is one that we have lived with for decades, so
trying to scramble to fix the fix in time for 4.11-final does not make
sense due to the fragility of the tty ldisc layer. Just reverting it
makes sense for now"
* tag 'tty-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: don't panic on OOM in tty_set_ldisc()"
Merge tag 'trace-v4.11-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"While rewriting the function probe code, I stumbled over a long
standing bug. This bug has been there sinc function tracing was added
way back when. But my new development depends on this bug being fixed,
and it should be fixed regardless as it causes ftrace to disable
itself when triggered, and a reboot is required to enable it again.
The bug is that the function probe does not disable itself properly if
there's another probe of its type still enabled. For example:
The above registers two traceoff probes (one for schedule and one for
do_IRQ, and then removes do_IRQ.
But since there still exists one for schedule, it is not done
properly. When adding do_IRQ back, the breakage in the accounting is
noticed by the ftrace self tests, and it causes a warning and disables
ftrace"
* tag 'trace-v4.11-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix removing of second function probe
Andrei reports CRIU test hangs with the patch applied. The bug fixed
by the patch isn't too likely to trigger in actual uses. Revert the
patch for now.
parisc: Fix get_user() for 64-bit value on 32-bit kernel
This fixes a bug in which the upper 32-bits of a 64-bit value which is
read by get_user() was lost on a 32-bit kernel.
While touching this code, split out pre-loading of %sr2 space register
and clean up code indent.
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm fixes from Dan Williams:
"A small crop of lockdep, sleeping while atomic, and other fixes /
band-aids in advance of the full-blown reworks targeting the next
merge window. The largest change here is "libnvdimm: fix blk free
space accounting" which deletes a pile of buggy code that better
testing would have caught before merging. The next change that is
borderline too big for a late rc is switching the device-dax locking
from rcu to srcu, I couldn't think of a smaller way to make that fix.
The __copy_user_nocache fix will have a full replacement in 4.12 to
move those pmem special case considerations into the pmem driver. The
"libnvdimm: band aid btt vs clear poison locking" commit admits that
our error clearing support for btt went in broken, so we just disable
it in 4.11 and -stable. A replacement / full fix is in the pipeline
for 4.12
Some of these would have been caught earlier had DEBUG_ATOMIC_SLEEP
been enabled on my development station. I wonder if we should have:
config DEBUG_ATOMIC_SLEEP
default PROVE_LOCKING
...since I mistakenly thought I got both with PROVE_LOCKING=y.
These have received a build success notification from the 0day robot,
and some have appeared in a -next release with no reported issues"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions
device-dax: switch to srcu, fix rcu_read_lock() vs pte allocation
libnvdimm: band aid btt vs clear poison locking
libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
libnvdimm: fix blk free space accounting
acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is seven small fixes which are all for user visible issues that
fortunately only occur in rare circumstances.
The most serious is the sr one in which QEMU can cause us to read
beyond the end of a buffer (I don't think it's exploitable, but just
in case).
The next is the sd capacity fix which means all non 512 byte sector
drives greater than 2TB fail to be correctly sized.
The rest are either in new drivers (qedf) or on error legs"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ipr: do not set DID_PASSTHROUGH on CHECK CONDITION
scsi: aacraid: fix PCI error recovery path
scsi: sd: Fix capacity calculation with 32-bit sector_t
scsi: qla2xxx: Add fix to read correct register value for ISP82xx.
scsi: qedf: Fix crash due to unsolicited FIP VLAN response.
scsi: sr: Sanity check returned mode data
scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
Merge branch 'parisc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fix from Helge Deller:
"Mikulas Patocka fixed a few bugs in our new pa_memcpy() assembler
function, e.g. one bug made the kernel unbootable if source and
destination address are the same"
* 'parisc-4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix bugs in pa_memcpy
[ 1337.483798] ================================================
[ 1337.483999] [ BUG: lock held when returning to user space! ]
[ 1337.484252] 4.11.0-rc6 #19 Not tainted
[ 1337.484423] ------------------------------------------------
[ 1337.484626] mount/14766 is leaving the kernel with locks still held!
[ 1337.484841] 1 lock held by mount/14766:
[ 1337.485017] #0: (&type->s_umount_key#33/1){+.+.+.}, at: [<ffffffff8124171f>] sget_userns+0x2af/0x520
Caught by xfstests generic/413 which tried to mount with the unsupported
mount option dax. Then xfstests generic/422 ran sync which deadlocks.
Signed-off-by: Martin Brandenburg <martin@omnibond.com> Acked-by: Mike Marshall <hubcap@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Normal pathname lookup doesn't allow empty pathnames, but using
AT_EMPTY_PATH (with name_to_handle_at() or fstatat(), for example) you
can trigger an empty pathname lookup.
And not only is the RCU lookup in that case entirely unnecessary
(because we'll obviously immediately finalize the end result), it is
actively wrong.
Why? An empth path is a special case that will return the original
'dirfd' dentry - and that dentry may not actually be RCU-free'd,
resulting in a potential use-after-free if we were to initialize the
path lazily under the RCU read lock and depend on complete_walk()
finalizing the dentry.
The patch 554bfeceb8a22d448cd986fc9efce25e833278a1 ("parisc: Fix access
fault handling in pa_memcpy()") reimplements the pa_memcpy function.
Unfortunatelly, it makes the kernel unbootable. The crash happens in the
function ide_complete_cmd where memcpy is called with the same source
and destination address.
This patch fixes a few bugs in pa_memcpy:
* When jumping to .Lcopy_loop_16 for the first time, don't skip the
instruction "ldi 31,t0" (this bug made the kernel unbootable)
* Use the COND macro when comparing length, so that the comparison is
64-bit (a theoretical issue, in case the length is greater than
0xffffffff)
* Don't use the COND macro after the "extru" instruction (the PA-RISC
specification says that the upper 32-bits of extru result are undefined,
although they are set to zero in practice)
* Fix exception addresses in .Lcopy16_fault and .Lcopy8_fault
* Rename .Lcopy_loop_4 to .Lcopy_loop_8 (so that it is consistent with
.Lcopy8_fault)
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Just a small update to xpad driver to recognize yet another gamepad,
and another change making sure userio.h is exported"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add support for Razer Wildcat gamepad
uapi: add missing install of userio.h
Pull networking fixes from David Miller:
"Things seem to be settling down as far as networking is concerned,
let's hope this trend continues...
1) Add iov_iter_revert() and use it to fix the behavior of
skb_copy_datagram_msg() et al., from Al Viro.
2) Fix the protocol used in the synthetic SKB we cons up for the
purposes of doing a simulated route lookup for RTM_GETROUTE
requests. From Florian Larysch.
3) Don't add noop_qdisc to the per-device qdisc hashes, from Cong
Wang.
4) Don't call netdev_change_features with the team lock held, from
Xin Long.
5) Revert TCP F-RTO extension to catch more spurious timeouts because
it interacts very badly with some middle-boxes. From Yuchung
Cheng.
6) Fix the loss of error values in l2tp {s,g}etsockopt calls, from
Guillaume Nault.
7) ctnetlink uses bit positions where it should be using bit masks,
fix from Liping Zhang.
8) Missing RCU locking in netfilter helper code, from Gao Feng.
9) Avoid double frees and use-after-frees in tcp_disconnect(), from
Eric Dumazet.
10) Don't do a changelink before we register the netdevice in
bridging, from Ido Schimmel.
11) Lock the ipv6 device address list properly, from Rabin Vincent"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
netfilter: ipt_CLUSTERIP: Fix wrong conntrack netns refcnt usage
netfilter: nft_hash: do not dump the auto generated seed
drivers: net: usb: qmi_wwan: add QMI_QUIRK_SET_DTR for Telit PID 0x1201
ipv6: Fix idev->addr_list corruption
net: xdp: don't export dev_change_xdp_fd()
bridge: netlink: register netdevice before executing changelink
bridge: implement missing ndo_uninit()
bpf: reference may_access_skb() from __bpf_prog_run()
tcp: clear saved_syn in tcp_disconnect()
netfilter: nf_ct_expect: use proper RCU list traversal/update APIs
netfilter: ctnetlink: skip dumping expect when nfct_help(ct) is NULL
netfilter: make it safer during the inet6_dev->addr_list traversal
netfilter: ctnetlink: make it safer when checking the ct helper name
netfilter: helper: Add the rcu lock when call __nf_conntrack_helper_find
netfilter: ctnetlink: using bit to represent the ct event
netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
net: tcp: Increase TCP_MIB_OUTRSTS even though fail to alloc skb
l2tp: don't mask errors in pppol2tp_getsockopt()
l2tp: don't mask errors in pppol2tp_setsockopt()
tcp: restrict F-RTO to work-around broken middle-boxes
...
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of small fixes for x86:
- fix locking in RDT to prevent memory leaks and freeing in use
memory
- prevent setting invalid values for vdso32_enabled which cause
inconsistencies for user space resulting in application crashes.
- plug a race in the vdso32 code between fork and sysctl which causes
inconsistencies for user space resulting in application crashes.
- make MPX signal delivery work in compat mode
- make the dmesg output of traps and faults readable again"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel_rdt: Fix locking in rdtgroup_schemata_write()
x86/debug: Fix the printk() debug output of signal_fault(), do_trap() and do_general_protection()
x86/vdso: Plug race between mapping and ELF header setup
x86/vdso: Ensure vdso32_enabled gets set to valid values only
x86/signals: Fix lower/upper bound reporting in compat siginfo
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"The irq department provides:
- two fixes for the CPU affinity spread infrastructure to prevent
unbalanced spreading in corner cases which leads to horrible
performance, because interrupts are rather aggregated than spread
- add a missing spinlock initializer in the imx-gpcv2 init code"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-imx-gpcv2: Fix spinlock initialization
irq/affinity: Fix extra vecs calculation
irq/affinity: Fix CPU spread for unbalanced nodes
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Thomas Gleixner:
"Three fixes from EFI land:
- prevent accessing a Graphic Output Device (GOP) which the kernel
does not know to handle
- prevent PCI reconfiguration to modify a BAR which covers the
framebuffer because that's already in use through the EFI GOP
interface
- avoid reserving EFI runtime regions as this results in bogus memory
mappings"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Don't try to reserve runtime regions
efi/fb: Avoid reconfiguration of BAR that covers the framebuffer
efi/libstub: Skip GOP with PIXEL_BLT_ONLY format
Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"Dave Sterba collected a few more fixes for the last rc.
These aren't marked for stable, but I'm putting them in with a batch
were testing/sending by hand for this release"
* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix potential use-after-free for cloned bio
Btrfs: fix segmentation fault when doing dio read
Btrfs: fix invalid dereference in btrfs_retry_endio
btrfs: drop the nossd flag when remounting with -o ssd
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull more CIFS fixes from Steve French:
"As promised, here is the remaining set of cifs/smb3 fixes for stable
(and a fix for one regression) now that they have had additional
review and testing"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Fix SMB3 mount without specifying a security mechanism
CIFS: store results of cifs_reopen_file to avoid infinite wait
CIFS: remove bad_network_name flag
CIFS: reconnect thread reschedule itself
CIFS: handle guest access errors to Windows shares
CIFS: Fix null pointer deref during read resp processing
When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.
This is easily reproducable by adding two probes, removing one, and then
adding it back again.
Cc: stable@vger.kernel.org Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Ming Lei [Fri, 14 Apr 2017 19:58:29 +0000 (13:58 -0600)]
block: fix bio_will_gap() for first bvec with offset
Commit 729204ef49ec("block: relax check on sg gap") allows us to merge
bios, if both are physically contiguous. This change can merge a huge
number of small bios, through mkfs for example, mkfs.ntfs running time
can be decreased to ~1/10.
But if one rq starts with a non-aligned buffer (the 1st bvec's bv_offset
is non-zero) and if we allow the merge, it is quite difficult to respect
sg gap limit, especially the max segment size, or we risk having an
unaligned virtual boundary. This patch tries to avoid the issue by
disallowing a merge, if the req starts with an unaligned buffer.
Also add comments to explain why the merged segment can't end in
unaligned virt boundary.
Fixes: 729204ef49ec ("block: relax check on sg gap") Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com>
Rewrote parts of the commit message and comments.
Merge tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux
Pull fbdev fixes from Bartlomiej Zolnierkiewicz:
- fix probing time checks in omapfb driver (regression fix)
- fix optional VBAT support in ssd1307fb driver (regression fix)
- fix connecting to backend in xen-fbfront driver
* tag 'fbdev-v4.11-rc6' of git://github.com/bzolnier/linux:
fbdev: omapfb: delete check_required_callbacks()
xen, fbfront: fix connecting to backend
fbdev/ssd1307fb: fix optional VBAT support
Merge tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a cpufreq core regression related to CPU online/offline and
several issues in the turbostat and cpupower utilities.
Specifics:
- Allow CPUs to be put back online even if the cpufreq driver is
unable to work with them (eg. due to missing information from
platform firmware), which was the previous behavior expected by
users, but changed in the 4.9 time frame (Chen Yu).
- Fix a few minor issues in the turbostat utility, introduced mostly
during the recent update of it (Len Brown, Doug Smythies).
- Fix a cpupower utility bug causing it to report incorrect values
for turbo frequencies in some cases (Ben Hutchings)"
* tag 'pm-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
cpufreq: Bring CPUs up even if cpufreq_online() failed
tools/power turbostat: update version number
tools/power turbostat: fix impossibly large CPU%c1 value
tools/power turbostat: turbostat.8 add missing column definitions
tools/power turbostat: update HWP dump to decimal from hex
tools/power turbostat: enable package THERM_INTERRUPT dump
tools/power turbostat: show missing Core and GFX power on SKL and KBL
tools/power turbostat: bugfix: GFXMHz column not changing
Merge tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert a recent ACPICA commit that turned out to be problematic
and fix a device enumeration breakage from the 4.8 cycle.
Specifics:
- Revert a recent ACPICA commit targeted at catching firmware bugs
which promptly did that and caused functional problems to appear
(Rafael Wysocki).
- Fix a device enumeration problem introduced in the 4.8 time frame
which caused the ACPI docking station driver to report incorrect
status via sysfs among other things (Rafael Wysocki)"
* tag 'acpi-4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Revert "ACPICA: Resources: Not a valid resource if buffer length too long"
ACPI / scan: Set the visited flag for all enumerated devices
Merge tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CONFIG_STRICT_DEVMEM fix from Kees Cook:
"Fixes /dev/mem to read back zeros for System RAM areas in the 1MB
exception area on x86 to avoid exposing RAM or tripping hardened
usercopy"
* tag 'devmem-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
mm: Tighten x86 /dev/mem with zeroing reads
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael S. Tsirkin:
"virtio oops fixes
The virtio pci rework using shared interrupts caused a lot of issues.
We tried to fix them but run out of time. Revert for now, and revisit
the issue for the next kernel.
Luckily we are able to do this without loosing automatic interrupt
NUMA affinity which was the main motivator for the rework"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio-pci: Remove affinity hint before freeing the interrupt
Revert "virtio_pci: remove struct virtio_pci_vq_info"
Revert "virtio_pci: use shared interrupts for virtqueues"
Revert "virtio_pci: don't duplicate the msix_enable flag in struct pci_dev"
Revert "virtio_pci: simplify MSI-X setup"
Revert "virtio_pci: fix out of bound access for msix_names"
MAINTAINERS: fix virtio file pattern
virtio_console: fix uninitialized variable use
virtio_net: clear MTU when out of range
virtio: allow drivers to validate features
virtio_net: enable big packets for large MTU values
Aaro Koskinen [Fri, 14 Apr 2017 11:38:32 +0000 (13:38 +0200)]
fbdev: omapfb: delete check_required_callbacks()
Commit 561eb9d09a93 ("fbdev: omap/lcd: Make callbacks optional") made
panel callbacks optional but forgot to update check_required_callbacks().
As a result many (all?) OMAP systems using omapfb will crash at boot.
Fix by deleting the whole function.
Fixes: 561eb9d09a93 ("fbdev: omap/lcd: Make callbacks optional") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Merge branches 'pm-cpufreq-fixes' and 'pm-tools-fixes'
* pm-cpufreq-fixes:
cpufreq: Bring CPUs up even if cpufreq_online() failed
* pm-tools-fixes:
cpupower: Fix turbo frequency reporting for pre-Sandy Bridge cores
tools/power turbostat: update version number
tools/power turbostat: fix impossibly large CPU%c1 value
tools/power turbostat: turbostat.8 add missing column definitions
tools/power turbostat: update HWP dump to decimal from hex
tools/power turbostat: enable package THERM_INTERRUPT dump
tools/power turbostat: show missing Core and GFX power on SKL and KBL
tools/power turbostat: bugfix: GFXMHz column not changing
The presence of 'thp: reduce indentation level in change_huge_pmd()'
is unfortunate. But the patchset had been decently reviewed and tested
before we decided it was needed in -stable and I felt it best not to
churn things at the last minute"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mailmap: add Martin Kepplinger's email
zsmalloc: expand class bit
zram: do not use copy_page with non-page aligned address
zram: fix operator precedence to get offset
hugetlbfs: fix offset overflow in hugetlbfs mmap
thp: fix MADV_DONTNEED vs clear soft dirty race
thp: fix MADV_DONTNEED vs. MADV_FREE race
mm: drop unused pmdp_huge_get_and_clear_notify()
thp: fix MADV_DONTNEED vs. numa balancing race
thp: reduce indentation level in change_huge_pmd()
z3fold: fix page locking in z3fold_alloc()
Minchan Kim [Thu, 13 Apr 2017 21:56:40 +0000 (14:56 -0700)]
zsmalloc: expand class bit
Now 64K page system, zsamlloc has 257 classes so 8 class bit is not
enough. With that, it corrupts the system when zsmalloc stores
65536byte data(ie, index number 256) so that this patch increases class
bit for simple fix for stable backport. We should clean up this mess
soon.
Minchan Kim [Thu, 13 Apr 2017 21:56:37 +0000 (14:56 -0700)]
zram: do not use copy_page with non-page aligned address
The copy_page is optimized memcpy for page-alinged address. If it is
used with non-page aligned address, it can corrupt memory which means
system corruption. With zram, it can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removes it by "same page merge" feature so
it's hard to backport this patch to -stable tree.
I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()") Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Thu, 13 Apr 2017 21:56:35 +0000 (14:56 -0700)]
zram: fix operator precedence to get offset
In zram_rw_page, the logic to get offset is wrong by operator precedence
(i.e., "<<" is higher than "&"). With wrong offset, zram can corrupt
the user's data. This patch fixes it.
Fixes: 8c7f01025 ("zram: implement rw_page operation of zram") Link: http://lkml.kernel.org/r/1492042622-12074-1-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Kravetz [Thu, 13 Apr 2017 21:56:32 +0000 (14:56 -0700)]
hugetlbfs: fix offset overflow in hugetlbfs mmap
If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start. Offset could be a negative value when
represented as a loff_t. The offset plus length will be used to update
the file size (i_size) which is also a loff_t.
Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.
Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied. Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.
Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem).
It's critical to not clear pmd intermittently while handling MADV_FREE
to avoid race with MADV_DONTNEED:
CPU0: CPU1:
madvise_free_huge_pmd()
pmdp_huge_get_and_clear_full()
madvise_dontneed()
zap_pmd_range()
pmd_trans_huge(*pmd) == 0 (without ptl)
// skip the pmd
set_pmd_at();
// pmd is re-established
It results in MADV_DONTNEED skipping the pmd, leaving it not cleared.
It violates MADV_DONTNEED interface and can result is userspace
misbehaviour.
Basically it's the same race as with numa balancing in
change_huge_pmd(), but a bit simpler to mitigate: we don't need to
preserve dirty/young flags here due to MADV_FREE functionality.
[kirill.shutemov@linux.intel.com: Urgh... Power is special again] Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@black.fi.intel.com Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In case prot_numa, we are under down_read(mmap_sem). It's critical to
not clear pmd intermittently to avoid race with MADV_DONTNEED which is
also under down_read(mmap_sem):
CPU0: CPU1:
change_huge_pmd(prot_numa=1)
pmdp_huge_get_and_clear_notify()
madvise_dontneed()
zap_pmd_range()
pmd_trans_huge(*pmd) == 0 (without ptl)
// skip the pmd
set_pmd_at();
// pmd is re-established
The race makes MADV_DONTNEED miss the huge pmd and don't clear it
which may break userspace.
Found by code analysis, never saw triggered.
Link: http://lkml.kernel.org/r/20170302151034.27829-3-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stress testing of the current z3fold implementation on a 8-core system
revealed it was possible that a z3fold page deleted from its unbuddied
list in z3fold_alloc() would be put on another unbuddied list by
z3fold_free() while z3fold_alloc() is still processing it. This has
been introduced with commit 5a27aa822 ("z3fold: add kref refcounting")
due to the removal of special handling of a z3fold page not on any list
in z3fold_free().
To fix this, the z3fold page lock should be taken in z3fold_alloc()
before the pool lock is released. To avoid deadlocking, we just try to
lock the page as soon as we get a hold of it, and if trylock fails, we
drop this page and take the next one.
Signed-off-by: Vitaly Wool <vitalywool@gmail.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: <Oleksiy.Avramchenko@sony.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>