Martin Habets [Mon, 2 Nov 2015 12:51:31 +0000 (12:51 +0000)]
sfc: push partner queue for skb->xmit_more
When the IP stack passes SKBs the sfc driver puts them in 2 different TX
queues (called partners), one for checksummed and one for not checksummed.
If the SKB has xmit_more set the driver will delay pushing the work to the
NIC.
When later it does decide to push the buffers this patch ensures it also
pushes the partner queue, if that also has any delayed work. Before this
fix the work in the partner queue would be left for a long time and cause
a netdev watchdog.
Fixes: 70b33fb ("sfc: add support for skb->xmit_more") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Martin Habets <mhabets@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 3 Nov 2015 01:08:19 +0000 (17:08 -0800)]
sit: fix sit0 percpu double allocations
sit0 device allocates its percpu storage twice :
- One time in ipip6_tunnel_init()
- One time in ipip6_fb_tunnel_init()
Thus we leak 48 bytes per possible cpu per network namespace dismantle.
ipip6_fb_tunnel_init() can be much simpler and does not
return an error, and should be called after register_netdev()
Note that ipip6_tunnel_clone_6rd() also needs to be called
after register_netdev() (calling ipip6_tunnel_init())
Fixes: ebe084aafb7e ("sit: Use ipip6_tunnel_init as the ndo_init function.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 2 Nov 2015 15:50:07 +0000 (07:50 -0800)]
net: avoid NULL deref in inet_ctl_sock_destroy()
Under low memory conditions, tcp_sk_init() and icmp_sk_init()
can both iterate on all possible cpus and call inet_ctl_sock_destroy(),
with eventual NULL pointer.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed source
There are other error values besides ip6_null_entry that can be returned by
ip6_route_redirect(): fib6_rule_action() can also result in
ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed.
Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt()
to be called with rt->rt6i_table == NULL in these cases, making the kernel
crash.
Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently slhc_init() treats out-of-range values of rslots and tslots
as equivalent to 0, except that if tslots is too large it will
dereference a null pointer (CVE-2015-7799).
Add a range-check at the top of the function and make it return an
ERR_PTR() on error instead of NULL. Change the callers accordingly.
Compile-tested only.
Reported-by: 郭永刚 <guoyonggang@360.cn>
References: http://article.gmane.org/gmane.comp.security.oss.general/17908 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Sun, 1 Nov 2015 00:34:50 +0000 (01:34 +0100)]
qmi_wwan: fix entry for HP lt4112 LTE/HSPA+ Gobi 4G Module
The lt4112 is a HP branded Huawei me906e modem. Like other Huawei
modems, it does not have a fixed interface to function mapping.
Instead it uses a Huawei specific scheme: functions are mapped by
subclass and protocol.
However, the HP vendor ID is used for modems from many different
manufacturers using different schemes, so we cannot apply a generic
vendor rule like we do for the Huawei vendor ID.
Replace the previous lt4112 entry pointing to an arbitrary interface
number with a device specific subclass + protocol match.
Reported-and-tested-by: Muri Nicanor <muri+libqmi@immerda.ch> Tested-by: Martin Hauke <mardnh@gmx.de> Fixes: bb2bdeb83fb1 ("qmi_wwan: Add support for HP lt4112 LTE/HSPA+ Gobi 4G Modem") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Fri, 30 Oct 2015 23:05:56 +0000 (02:05 +0300)]
sh_eth: fix uninitialized arrays in sh_eth_ring_init()
sh_eth_ring_free() called in the sh_eth_ring_init()'s error path expects
the arrays pointed to by 'sh_eth_private::[rt]x_skbuff' to be initialized
with NULLs but they are allocated with just kmalloc_array() and so are left
filled with random data. Use kcalloc() instead.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Phil Reid [Fri, 30 Oct 2015 08:43:55 +0000 (16:43 +0800)]
stmmac: Correctly report PTP capabilities.
priv->hwts_*_en indicate if timestamping is enabled/disabled at run
time. But priv->dma_cap.time_stamp and priv->dma_cap.atime_stamp
indicates HW is support for PTPv1/PTPv2.
Signed-off-by: Phil Reid <preid@electromag.com.au> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 1 Nov 2015 21:57:44 +0000 (16:57 -0500)]
Merge branch 'ipv4_link_down'
Julian Anastasov says:
====================
ipv4: fix problems from the RTNH_F_LINKDOWN introduction
Fix two problems from the change that introduced RTNH_F_LINKDOWN
flag. The first patch deals with the removal of local route on
DOWN event. The second patch makes sure the RTNH_F_LINKDOWN
flag is properly updated on UP event because the DOWN event
sets it in all cases.
v2->v3:
- use bool for force var
v1->v2:
- forgot to add ifconfig dummy0 down in the test case
- split to 2 patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Fri, 30 Oct 2015 08:23:34 +0000 (10:23 +0200)]
ipv4: update RTNH_F_LINKDOWN flag on UP event
When nexthop is part of multipath route we should clear the
LINKDOWN flag when link goes UP or when first address is added.
This is needed because we always set LINKDOWN flag when DEAD flag
was set but now on UP the nexthop is not dead anymore. Examples when
LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:
- link goes down (LINKDOWN is set), then link goes UP and device
shows carrier OK but LINKDOWN remains set
- last address is deleted (LINKDOWN is set), then address is
added and device shows carrier OK but LINKDOWN remains set
Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
here add a multipath route where one nexthop is for dummy0:
ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
ifconfig dummy0 down
ifconfig dummy0 up
now ip route shows nexthop that is not dead. Now set the sysctl var:
now ip route will show a dead nexthop because the forgotten
RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD.
Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Anastasov [Fri, 30 Oct 2015 08:23:33 +0000 (10:23 +0200)]
ipv4: fix to not remove local route on link down
When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by returning back the variable 'force'.
Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ifconfig dummy0 down
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0 proto kernel scope host src 192.168.168.1
Fixes: 8a3d03166f19 ("net: track link-status of ipv4 nexthops") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 30 Oct 2015 01:11:35 +0000 (18:11 -0700)]
net: bcmgenet: Software reset EPHY after power on
The EPHY on GENET v1->v3 is extremely finicky, and will show occasional
failures based on the timing and reset sequence, ranging from duplicate
packets, to extremely high latencies.
Perform an additional software reset, and re-configuration to make sure it is
in a consistent and working state.
Fixes: 6ac3ce8295e6 ("net: bcmgenet: Remove excessive PHY reset") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pavel Fedin [Thu, 29 Oct 2015 06:45:22 +0000 (09:45 +0300)]
net: smsc911x: Fix crash if loopback test fails
On certain hardware in certain situations loopback test fails and the
driver gets removed. During mdiobus_unregister() instance of PHY driver
gets disposed. But by this time it has already been started using
phy_connect_direct().
PHY driver uses DELAYED_WORK in order to maintain its state. Attempting
to dispose the driver without calling phy_disconnect() causes deallocation
of DELAYED_WORK being active. This shortly causes a bad crash in timer
code.
The problem can be discovered by enabling CONFIG_DEBUG_OBJECTS_TIMERS and
CONFIG_DEBUG_OBJECTS_FREE
Signed-off-by: Pavel Fedin <p.fedin@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Wed, 28 Oct 2015 17:09:53 +0000 (13:09 -0400)]
tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.
We now linearize all such buffers before they are delivered up to the
generic reception layer.
In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().
Fixes: commit d0f91938bede ("tipc: add ip/udp media type") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 31 Oct 2015 22:19:36 +0000 (15:19 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
"This sets the stable pages flag on the RBD block device when we have
CRCs enabled. (This is necessary since the default assumption for
block devices changed in 3.9)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: require stable pages if message data CRCs are enabled
Linus Torvalds [Sat, 31 Oct 2015 21:49:19 +0000 (14:49 -0700)]
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs bug fixes from Miklos Szeredi:
"This contains fixes for bugs that appeared in earlier kernels (all are
marked for -stable)"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: free lower_mnt array in ovl_put_super
ovl: free stack of paths in ovl_fill_super
ovl: fix open in stacked overlay
ovl: fix dentry reference leak
ovl: use O_LARGEFILE in ovl_copy_up()
1) Fix two regressions in ipv6 route lookups, particularly wrt output
interface specifications in the lookup key. From David Ahern.
2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from
Herbert Xu.
3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon
Arlott.
4) Some smsc phys misbehave with energy detect mode enabled, so add a
DT property and disable it on such switches. From Heiko Schocher.
5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer.
6) Fix regression added by removal of openvswitch vport stats, from
James Morse.
7) Vendor Kconfig options should be bool, not tristate, from Andreas
Schwab.
8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we
barf during TCP REPAIR operations.
9) Fix various bugs in openvswitch conntrack support, from Joe
Stringer.
10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann.
11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen
Hansen.
12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten
Keil.
13) Add some device IDs to qmi_wwan, from Bjorn Mork.
14) Fix ovs egress tunnel information when using lwtunnel devices, from
Pravin B Shelar.
15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason
Wang.
16) Fix incorrect handling of throw routes when the result of the throw
cannot find a match, from Xin Long.
17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic
Sowa.
18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan
Sullivan.
19) Add missing memory barries in descriptor accesses or xgbe driver,
from Thomas Lendacky.
20) Fix release conditon test in pppoe_release(), from Guillaume Nault.
21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil.
22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei
Shtylyov.
23) Fixing missing of_node_put() calls in various places around the
networking, from Julia Lawall.
24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander
Duyck.
25) RDS doesn't check pskb_pull()/pskb_trim() return values, from
Sowmini Varadhan.
26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
Revert "Merge branch 'ipv6-overflow-arith'"
net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes
net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present
vhost: fix performance on LE hosts
bpf: sample: define aarch64 specific registers
amd-xgbe: Fix race between access of desc and desc index
RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
forcedeth: fix unilateral interrupt disabling in netpoll path
openvswitch: Fix skb leak using IPv6 defrag
ipv6: Export nf_ct_frag6_consume_orig()
openvswitch: Fix double-free on ip_defrag() errors
fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
net: mv643xx_eth: add missing of_node_put
ath6kl: add missing of_node_put
net: phy: mdio: add missing of_node_put
netdev/phy: add missing of_node_put
net: netcp: add missing of_node_put
net: thunderx: add missing of_node_put
ipv6: gre: support SIT encapsulation
...
Linus Torvalds [Sat, 31 Oct 2015 01:49:44 +0000 (18:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
- a change to the ALPS driver where we had limit the quirk for
trackstick handling from being active on all Dells to just a few
models
- a fix for a build dependency issue in the sur40 driver
- a small clock handling fixup in the LPC32xx touchscreen driver
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits
Input: sur40 - add dependency on VIDEO_V4L2
Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock
Linus Torvalds [Sat, 31 Oct 2015 01:47:18 +0000 (18:47 -0700)]
Merge tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fix from Bjorn Helgaas:
"Sorry for this last-minute update; it's been in -next for quite a
while, but I forgot about it until I started getting ready for the
merge window.
It's small and fixes a way a user could cause a panic via sysfs, so I
think it's worth getting it in v4.3.
NUMA:
- Prevent out of bounds access in sysfs numa_node override (Sasha Levin)"
* tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Prevent out of bounds access in numa_node override
Linus Torvalds [Fri, 30 Oct 2015 23:57:55 +0000 (16:57 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Apologies for this being so late, but we've uncovered a few nasty
issues on arm64 which didn't settle down until yesterday and the fixes
all look suitable for 4.3. Of the four patches, three of them are
Cc'd to stable, with the remaining patch fixing an issue that only
took effect during the merge window.
Summary:
- Fix corruption in SWP emulation when STXR fails due to contention
- Fix MMU re-initialisation when resuming from a low-power state
- Fix stack unwinding code to match what ftrace expects
- Fix relocation code in the EFI stub when DRAM base is not 2MB aligned"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/efi: do not assume DRAM base is aligned to 2 MB
Revert "ARM64: unwind: Fix PC calculation"
arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
arm64: compat: fix stxr failure case in SWP emulation
Ronny Hegewald [Thu, 15 Oct 2015 18:50:46 +0000 (18:50 +0000)]
rbd: require stable pages if message data CRCs are enabled
rbd requires stable pages, as it performs a crc of the page data before
they are send to the OSDs.
But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0
"mm: only enforce stable page writes if the backing device requires
it") it is not assumed anymore that block devices require stable pages.
This patch sets the necessary flag to get stable pages back for rbd.
In a ceph installation that provides multiple ext4 formatted rbd
devices "bad crc" messages appeared regularly (ca 1 message every 1-2
minutes on every OSD that provided the data for the rbd) in the
OSD-logs before this patch. After this patch this messages are pretty
much gone (only ca 1-2 / month / OSD).
Ard Biesheuvel [Thu, 29 Oct 2015 14:07:25 +0000 (15:07 +0100)]
arm64/efi: do not assume DRAM base is aligned to 2 MB
The current arm64 Image relocation code in the UEFI stub assumes that
the dram_base argument it receives is always a multiple of 2 MB. In
reality, it is simply the lowest start address of all RAM entries in
the UEFI memory map, which means it could be any multiple of 4 KB.
Since the arm64 kernel Image needs to reside TEXT_OFFSET bytes beyond
a 2 MB aligned base, or it will fail to boot, make sure we round dram_base
to 2 MB before using it to calculate the relocation address.
Fixes: e38457c361b30c5a ("arm64: efi: prefer AllocatePages() over efi_low_alloc() for vmlinux") Reported-by: Timur Tabi <timur@codeaurora.org> Tested-by: Timur Tabi <timur@codeaurora.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.
Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
This is the second version of the patch which doesn't use the
overflow_usub function, which got reverted for now.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch applied, we were the only architecture making this sort
of adjustment to the PC calculation in the unwinder. This causes
problems for ftrace, where the PC values are matched against the
contents of the stack frames in the callchain and fail to match any
records after the address adjustment.
Whilst there has been some effort to change ftrace to workaround this,
those patches are not yet ready for mainline and, since we're the odd
architecture in this regard, let's just step in line with other
architectures (like arch/arm/) for now.
Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap
Commit dd006da21646 ("arm64: mm: increase VA range of identity map")
introduced a mechanism to extend the virtual memory map range
to support arm64 systems with system RAM located at very high offset,
where the identity mapping used to enable/disable the MMU requires
additional translation levels to map the physical memory at an equal
virtual offset.
The kernel detects at boot time the tcr_el1.t0sz value required by the
identity mapping and sets-up the tcr_el1.t0sz register field accordingly,
any time the identity map is required in the kernel (ie when enabling the
MMU).
After enabling the MMU, in the cold boot path the kernel resets the
tcr_el1.t0sz to its default value (ie the actual configuration value for
the system virtual address space) so that after enabling the MMU the
memory space translated by ttbr0_el1 is restored as expected.
Commit dd006da21646 ("arm64: mm: increase VA range of identity map")
also added code to set-up the tcr_el1.t0sz value when the kernel resumes
from low-power states with the MMU off through cpu_resume() in order to
effectively use the identity mapping to enable the MMU but failed to add
the code required to restore the tcr_el1.t0sz to its default value, when
the core returns to the kernel with the MMU enabled, so that the kernel
might end up running with tcr_el1.t0sz value set-up for the identity
mapping which can be lower than the value required by the actual virtual
address space, resulting in an erroneous set-up.
This patchs adds code in the resume path that restores the tcr_el1.t0sz
default value upon core resume, mirroring this way the cold boot path
behaviour therefore fixing the issue.
Cc: <stable@vger.kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: dd006da21646 ("arm64: mm: increase VA range of identity map") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
Will Deacon [Thu, 15 Oct 2015 12:55:53 +0000 (13:55 +0100)]
arm64: compat: fix stxr failure case in SWP emulation
If the STXR instruction fails in the SWP emulation code, we leave *data
overwritten with the loaded value, therefore corrupting the data written
by a subsequent, successful attempt.
This patch re-jigs the code so that we only write back to *data once we
know that the update has happened.
Cc: <stable@vger.kernel.org> Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm") Reported-by: Shengjiu Wang <shengjiu.wang@freescale.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
powerpc/dma: dma_set_coherent_mask() should not be GPL only
When turning this from inline to an exported function I was a bit
over-eager and made it GPL only. This prevents the use of pretty much
all non-GPL PCI driver which is a bit over the top. Let's bring it
back in line with other architecture.
Fixes: 817820b0226a ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Carol L Soto [Tue, 27 Oct 2015 15:36:20 +0000 (17:36 +0200)]
net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes
When doing memcpy/memset of EQEs, we should use sizeof struct
mlx4_eqe as the base size and not caps.eqe_size which could be bigger.
If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
data in the master context.
When using a 64 byte stride, the memcpy copied over 63 bytes to the
slave_eq structure. This resulted in copying over the entire eqe of
interest, including its ownership bit -- and also 31 bytes of garbage
into the next WQE in the slave EQ -- which did NOT include the ownership
bit (and therefore had no impact).
However, once the stride is increased to 128, we are overwriting the
ownership bits of *three* eqes in the slave_eq struct. This results
in an incorrect ownership bit for those eqes, which causes the eq to
seem to be full. The issue therefore surfaced only once 128-byte EQEs
started being used in SRIOV and (overarchitectures that have 128/256
byte cache-lines such as PPC) - e.g after commit 77507aa249ae
"net/mlx4_core: Enable CQE/EQE stride support".
Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 27 Oct 2015 15:36:19 +0000 (17:36 +0200)]
net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present
We do not set the ins_vlan field to zero when no vlan id is present in the packet.
Since WQEs in the TX ring are not zeroed out between uses, this oversight
could result in having vlan flags present in the WQE ctrl segment when no
vlan is preset.
Fixes: e38af4faf01d ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan') Reported-by: Gideon Naim <gideonn@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian
support for legacy devices") introduced a minor regression: even with
cross-endian disabled, and even on LE host, vhost_is_little_endian is
checking is_le flag so there's always a branch.
To fix, simply check virtio_legacy_is_little_endian first.
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Mon, 26 Oct 2015 22:13:54 +0000 (17:13 -0500)]
amd-xgbe: Fix race between access of desc and desc index
During Tx cleanup it's still possible for the descriptor data to be
read ahead of the descriptor index. A memory barrier is required between
the read of the descriptor index and the start of the Tx cleanup loop.
This allows a change to a lighter-weight barrier in the Tx transmit
routine just before updating the current descriptor index.
Since the memory barrier does result in extra overhead on arm64, keep
the previous change to not chase the current descriptor value. This
prevents the execution of the barrier for each loop performed.
Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini Varadhan [Mon, 26 Oct 2015 16:46:37 +0000 (12:46 -0400)]
RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv
Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.
Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 26 Oct 2015 16:24:22 +0000 (12:24 -0400)]
forcedeth: fix unilateral interrupt disabling in netpoll path
Forcedeth currently uses disable_irq_lockdep and enable_irq_lockdep, which in
some configurations simply calls local_irq_disable. This causes errant warnings
in the netpoll path as in netpoll_send_skb_on_dev, where we disable irqs using
local_irq_save, leading to the following warning:
Fix it by modifying the forcedeth code to use
disable_irq_nosync_lockdep_irqsavedisable_irq_nosync_lockdep_irqsave instead,
which saves and restores irq state properly. This also saves us a little code
in the process
Tested by the reporter, with successful restuls
Patch applies to the head of the net tree
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> Reported-by: Vasily Averin <vvs@sw.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Mon, 26 Oct 2015 03:21:50 +0000 (20:21 -0700)]
openvswitch: Fix skb leak using IPv6 defrag
nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the
reassembly is successful, expects the caller to free all of the original
skbs using nf_ct_frag6_consume_orig(). This call was previously missing,
meaning that the original fragments were never freed (with the exception
of the last fragment to arrive).
Fix this by ensuring that all original fragments except for the last
fragment are freed via nf_ct_frag6_consume_orig(). The last fragment
will be morphed into the head, so it must not be freed yet. Furthermore,
retain the ->next pointer for the head after skb_morph().
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Mon, 26 Oct 2015 03:21:49 +0000 (20:21 -0700)]
ipv6: Export nf_ct_frag6_consume_orig()
This is needed in openvswitch to fix an skb leak in the next patch.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Stringer [Mon, 26 Oct 2015 03:21:48 +0000 (20:21 -0700)]
openvswitch: Fix double-free on ip_defrag() errors
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
freed. When handle_fragments() passes this back up to
do_execute_actions(), it will be freed again. Prevent this double free
by never freeing the skb in do_execute_actions() for errors returned by
ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Florian Westphal <fw@strlen.de> Signed-off-by: Joe Stringer <joestringer@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Tue, 27 Oct 2015 22:06:45 +0000 (15:06 -0700)]
fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode. As a result
we were getting incorrect index values that would cause us to skip over
some children.
To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.
Fixes: 8be33e955cb9 ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf") Reported-by: Brian Rak <brak@gameservers.com> Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ming Lin [Thu, 22 Oct 2015 16:59:42 +0000 (09:59 -0700)]
block: re-add discard_granularity and alignment checks
In commit b49a087("block: remove split code in
blkdev_issue_{discard,write_same}"), discard_granularity and alignment
checks were removed. Ideally, with bio late splitting, the upper layers
shouldn't need to depend on device's limits.
Christoph reported a discard regression on the HGST Ultrastar SN100 NVMe
device when mkfs.xfs. We have not found the root cause yet.
This patch re-adds discard_granularity and alignment checks by reverting
the related changes in commit b49a087. The good thing is now we can
remove the 2G discard size cap and just use UINT_MAX to avoid bi_size
overflow.
Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Linus Torvalds [Tue, 27 Oct 2015 22:24:53 +0000 (07:24 +0900)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Two fixes for ARM and one for clkdev:
- Fix another build issue with vdsomunge on non-glibc systems
- Fix a randconfig build error caused by an invalid configuration
- Fix a clkdev problem causing the Nokia n700 to no longer boot"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
clkdev: fix clk_add_alias() with a NULL alias device name
ARM: 8445/1: fix vdsomunge not to depend on glibc specific byteswap.h
ARM: make RiscPC depend on MMU
Linus Torvalds [Tue, 27 Oct 2015 22:22:15 +0000 (07:22 +0900)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull blkcg fix from Jens Axboe:
"One final fix that should go into 4.3. It's a simple 2x1 liner,
fixing a blkcg accounting issue. It was using the wrong bio member to
look at the sync and write bits..."
* 'for-linus' of git://git.kernel.dk/linux-block:
blkcg: fix incorrect read/write sync/async stat accounting
Linus Torvalds [Tue, 27 Oct 2015 22:20:10 +0000 (07:20 +0900)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This fixes a problem in the Crypto API that may cause spurious errors
when signals are received by the process that made the orignal system
call into the kernel"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: api - Only abort operations on fatal signal
Linus Torvalds [Tue, 27 Oct 2015 22:17:50 +0000 (07:17 +0900)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module preemption fix from Rusty Russell:
"Turns out we should have always been disabling preemption here;
someone finally caught it thanks to Peter Z's additional checks"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: Fix locking in symbol_put_addr()
Tejun Heo [Thu, 22 Oct 2015 00:27:12 +0000 (09:27 +0900)]
blkcg: fix incorrect read/write sync/async stat accounting
While unifying how blkcg stats are collected, 77ea733884eb ("blkcg:
move io_service_bytes and io_serviced stats into blkcg_gq")
incorrectly used bio->flags instead of bio->rw to tell the IO type.
This made IOs to be accounted as the wrong type. Fix it.
Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 77ea733884eb ("blkcg: move io_service_bytes and io_serviced stats into blkcg_gq") Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>
David S. Miller [Tue, 27 Oct 2015 05:08:22 +0000 (22:08 -0700)]
Merge branch 'net_of_node_put'
Julia Lawall says:
====================
add missing of_node_put
The various for_each device_node iterators performs an of_node_get on each
iteration, so a break out of the loop requires an of_node_put.
The complete semantic patch that fixes this problem is
(http://coccinelle.lip6.fr):
// <smpl>
@r@
local idexpression n;
expression e1,e2;
iterator name for_each_node_by_name, for_each_node_by_type,
for_each_compatible_node, for_each_matching_node,
for_each_matching_node_and_match, for_each_child_of_node,
for_each_available_child_of_node, for_each_node_with_property;
iterator i;
statement S;
expression list [n1] es;
@@
(
(
for_each_node_by_name(n,e1) S
|
for_each_node_by_type(n,e1) S
|
for_each_compatible_node(n,e1,e2) S
|
for_each_matching_node(n,e1) S
|
for_each_matching_node_and_match(n,e1,e2) S
|
for_each_child_of_node(e1,n) S
|
for_each_available_child_of_node(e1,n) S
|
for_each_node_with_property(n,e1) S
)
&
i(es,n,...) S
)
@@
local idexpression r.n;
iterator r.i;
expression e;
expression list [r.n1] es;
@@
Eric Dumazet [Sat, 24 Oct 2015 12:47:44 +0000 (05:47 -0700)]
ipv6: gre: support SIT encapsulation
gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.
Fixes: 61c1db7fae21e ("ipv6: sit: add GSO/TSO support") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hans de Goede [Mon, 26 Oct 2015 08:50:28 +0000 (01:50 -0700)]
Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits
commit 92bac83dd79e ("Input: alps - non interleaved V2 dualpoint has
separate stick button bits") assumes that all alps v2 non-interleaved
dual point setups have the separate stick button bits.
Later we limited this to Dell laptops only because of reports that this
broke things on non Dell laptops. Now it turns out that this breaks things
on the Dell Latitude D600 too. So it seems that only the Dell Latitude
D420/430/620/630, which all share the same touchpad / stick combo,
have these separate bits.
This patch limits the checking of the separate bits to only these models
fixing regressions with other models.
Reported-and-tested-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: stable@vger.kernel.org Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-By: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Linus Torvalds [Mon, 26 Oct 2015 22:44:13 +0000 (07:44 +0900)]
Merge tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Two late fixes for the AMD IOMMU driver:
- add an additional check to the io page-fault handler to avoid a
BUG_ON being hit in handle_mm_fault()
- fix a problem with devices writing to the system management area
and were blocked by the IOMMU because the driver wrongly cleared
out the DTE flags allowing that access"
* tag 'iommu-fixes-v4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Don't clear DTE flags when modifying it
iommu/amd: Fix BUG when faulting a PROT_NONE VMA
Linus Torvalds [Mon, 26 Oct 2015 22:41:48 +0000 (07:41 +0900)]
Merge tag 'md/4.3-rc6-fixes' of git://neil.brown.name/md
Pull md fixes from Neil Brown:
"Some raid1/raid10 fixes.
I meant to get this to you before -rc7, but what with all the travel
plans..
Two fixes for bugs that are in both raid1 and raid10. Both related to
bad-block-lists and at least one needs to be back ported to 3.1.
Also a revision for the "new" layout in raid10. This "new" code
(which aims to improve robustness) actually reduces robustness in some
cases. It probably isn't in use at all as not public user-space code
makes use of these new layouts. However just in case someone has
their own code, it would be good to get the WARNing out for them
sooner"
* tag 'md/4.3-rc6-fixes' of git://neil.brown.name/md:
md/raid10: fix the 'new' raid10 layout to work correctly.
md/raid10: don't clear bitmap bit when bad-block-list write fails.
md/raid1: don't clear bitmap bit when bad-block-list write fails.
md/raid10: submit_bio_wait() returns 0 on success
md/raid1: submit_bio_wait() returns 0 on success
Linus Torvalds [Mon, 26 Oct 2015 22:40:01 +0000 (07:40 +0900)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Last fixes from me: one amdgpu/radeon suspend resume and one leak fix,
along with one vmware fix for some issues when command submission
fails"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: don't try to recreate sysfs entries on resume
drm/radeon: don't try to recreate sysfs entries on resume
drm/amdgpu: stop leaking page flip fence
drm/vmwgfx: Stabilize the command buffer submission code
David S. Miller [Mon, 26 Oct 2015 01:28:23 +0000 (18:28 -0700)]
Merge branch 'gianfar-fixes'
Claudiu Manoil says:
====================
gianfar: Misc. fixes and updates
Various fixes for some older issues, including having a
MAINTAINERS entry for this driver.
I'd recommend applying them on top of net, thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:42:00 +0000 (11:42 +0300)]
gianfar: Fix Rx BSY error handling
The Rx BSY error interrupt indicates that a frame was
received and discarded due to lack of buffers, so it's
a rx ring overflow condition and has nothing to do with
with bad rx packets. Use the right counter.
BSY conditions happen when the SoC is under performance
stress. Doing *more* work in stress situations by trying
to schedule NAPI is not a good idea as the stressed system
becomes still more stressed. The Rx interrupt is already
at work making sure the NAPI is scheduled.
So calling gfar_receive() here does not help. This issue
was present since day 1.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 23 Oct 2015 08:41:59 +0000 (11:41 +0300)]
gianfar: Don't enable the Filer w/o the Parser
Under one unusual circumstance it's possible to wrongly set
FILREN without enabling PRSDEP as well in the RCTRL register,
against the hardware specifications. With the default config
this does not happen because the default Rx offloads (Rx csum
and Rx VLAN) properly enable PRSDEP. But if anyone disables
all these offloads (via ethtool), we get a wrong configuration
were the Rx flow classification and hashing, and other Filer
based features (e.g. wake-on-filer interrupt) won't work.
This patch fixes the issue.
Also, account for Rx FCB insertion which happens every time
PRSDEP is set.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 26 Oct 2015 01:13:37 +0000 (18:13 -0700)]
Merge branch 'thunderx-fixes'
David Daney says:
====================
net: thunderx: Support pass-2 revision hardware.
With the availability of a new revision of the ThunderX NIC hardware a
few changes to the driver are required. With these, the driver works
on all currently available hardware revisions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: thunderx: Incorporate pass2 silicon CPI index configuration changes
Add support for ThunderX pass2 CPI and MPI configuration changes.
MPI_ALG is not enabled i.e MCAM parsing is disabled.
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Sat, 24 Oct 2015 00:14:09 +0000 (17:14 -0700)]
net: thunderx: Rewrite silicon revision tests.
The test for pass-1 silicon was incorrect, it should be for all
revisions less than 8. Also the revision is already present in the
pci_dev, so there is no need to read and keep a private copy.
Remove rev_id and code to read it from struct nicpf. Create new
static inline function pass1_silicon() to be used to testing the
silicon version. Use pass1_silicon() for revision checks, this will
be more widely used in follow on patches.
Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sat, 24 Oct 2015 00:14:08 +0000 (17:14 -0700)]
net: thunderx: Fix incorrect subsystem devid of VF on pass2 silicon
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sat, 24 Oct 2015 00:14:07 +0000 (17:14 -0700)]
net: thunderx: Remove PF soft reset.
In some silicon revisions, the soft reset clobbers PCI config space,
so quit doing the reset.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Sat, 24 Oct 2015 19:02:33 +0000 (05:02 +1000)]
Merge branch 'vmwgfx-fixes-4.3' of git://people.freedesktop.org/~thomash/linux
I'm not sure whether this patch comes in too late, but it would be good to
have it in. It stabilizes command submission in case of command buffer errors.
* 'vmwgfx-fixes-4.3' of git://people.freedesktop.org/~thomash/linux:
drm/vmwgfx: Stabilize the command buffer submission code
NeilBrown [Thu, 22 Oct 2015 02:20:15 +0000 (13:20 +1100)]
md/raid10: fix the 'new' raid10 layout to work correctly.
In Linux 3.9 we introduce a new 'far' layout for RAID10 which was
supposed to rotate the replicas differently and so provide better
resilience. In particular it could survive more combinations of 2
drive failures.
Unfortunately. due to a coding error, this some did what was wanted,
sometimes improved less than we hoped, and sometimes - in very
unlikely circumstances - put multiple replicas on the same device so
the redundancy was harmed.
No public user-space tool has created arrays using this layout so it
is very unlikely that zero-redundancy arrays actually exist. Probably
no arrays using any form of the new layout exist. But we cannot be
certain.
So use another bit in the 'layout' number and introduce a bug-fixed
version of the layout.
Also when assembling an array, if it has a zero-redundancy layout,
give a warning.
NeilBrown [Sat, 24 Oct 2015 05:23:48 +0000 (16:23 +1100)]
md/raid10: don't clear bitmap bit when bad-block-list write fails.
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R10BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
NeilBrown [Sat, 24 Oct 2015 05:02:16 +0000 (16:02 +1100)]
md/raid1: don't clear bitmap bit when bad-block-list write fails.
When a write fails and a bad-block-list is present, we can
update the bad-block-list instead of writing the data. If
this succeeds then it is OK clear the relevant bitmap-bit as
no further 'sync' of the block is needed.
However if writing the bad-block-list fails then we need to
treat the write as failed and particularly must not clear
the bitmap bit. Otherwise the device can be re-added (after
any hardware connection issues are resolved) and because the
relevant bit in the bitmap is clear, that block will not be
resynced. This leads to data corruption.
We already delay the final bio_endio() on the write until
the bad-block-list is written so that when the write
returns: either that data is safe, the bad-block record is
safe, or the fact that the device is faulty is safe.
However we *don't* delay the clearing of the bitmap, so the
bitmap bit can be recorded as cleared before we know if the
bad-block-list was written safely.
So: delay that until the write really is safe.
i.e. move the call to close_write() until just before
calling bio_endio(), and recheck the 'is array degraded'
status before making that call.
This bug goes back to v3.1 when bad-block-lists were
introduced, though it only affects arrays created with
mdadm-3.3 or later as only those have bad-block lists.
Backports will require at least
Commit: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
as well. I'll send that to 'stable' separately.
Note that of the two tests of R1BIO_WriteError that this
patch adds, the first is certain to fail and the second is
certain to succeed. However doing it this way makes the
patch more obviously correct. I will tidy the code up in a
future merge window.
Linus Torvalds [Fri, 23 Oct 2015 22:52:59 +0000 (07:52 +0900)]
Merge tag 'usb-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are three xhci driver fixes for reported issues for 4.3-rc7
All have been in linux-next for a while with no problems"
* tag 'usb-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: Add spurious wakeup quirk for LynxPoint-LP controllers
xhci: handle no ping response error properly
xhci: don't finish a TD if we get a short transfer event mid TD
Linus Torvalds [Fri, 23 Oct 2015 22:52:09 +0000 (07:52 +0900)]
Merge tag 'tty-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two fixes that resolve reported issues, one with the 8250
driver, and the other with the generic fbcon driver.
Both have been in linux-next for a while"
* tag 'tty-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
fbcon: initialize blink interval before calling fb_set_par
Revert "serial: 8250_dma: don't bother DMA with small transfers"
Linus Torvalds [Fri, 23 Oct 2015 22:51:13 +0000 (07:51 +0900)]
Merge tag 'staging-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are four iio driver fixes for 4.3-rc7, fixing some reported
issues. All of these have been in linux-next for a while"
* tag 'staging-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: mxs-lradc: Fix temperature offset
iio: accel: sca3000: memory corruption in sca3000_read_first_n_hw_rb()
iio: st_accel: fix interrupt handling on LIS3LV02
iio: adc: twl4030: Fix ADC[3:6] readings
Linus Torvalds [Fri, 23 Oct 2015 22:28:05 +0000 (07:28 +0900)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull infiniband fixes from Doug Ledford:
"It's late in the game, I know, but these fixes seemed important enough
to warrant a late pull request. They all involve oopses or use after
frees or corruptions.
Six serious fixes:
- Hold the mutex around the find and corresponding update of our gid
- The ifa list is rcu protected, copy its contents under rcu to avoid
using a freed structure
- On error, netdev might be null, so check it before trying to
release it
- On init, if workqueue alloc fails, fail init
- The new demux patches exposed a bug in mlx5 and ipath drivers, we
need to use the payload P_Key to determine the P_Key the packet
arrived on because the hardware doesn't tell us the truth
- Due to a couple convoluted error flows, it is possible for the CM
to trigger a use_after_free and a double_free of rb nodes. Add two
checks to prevent that. This code has worked for 10+ years. It is
likely that some of the recent changes have caused this issue to
surface. The current patch will protect us from nasty events for
now while we track down why this is just now showing up"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/cm: Fix rb-tree duplicate free and use-after-free
IB/cma: Use inner P_Key to determine netdev
IB/ucma: check workqueue allocation before usage
IB/cma: Potential NULL dereference in cma_id_from_event
IB/core: Fix use after free of ifa
IB/core: Fix memory corruption in ib_cache_gid_set_default_gid
Linus Torvalds [Fri, 23 Oct 2015 22:23:52 +0000 (07:23 +0900)]
Merge tag 'dm-4.3-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Three stable fixes (two in btree code used by DM thinp and one to
properly store flags in DM cache metadata's superblock)"
* tag 'dm-4.3-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: the CLEAN_SHUTDOWN flag was not being set
dm btree: fix leak of bufio-backed block in btree_split_beneath error path
dm btree remove: fix a bug when rebalancing nodes after removal
Linus Torvalds [Fri, 23 Oct 2015 22:20:57 +0000 (07:20 +0900)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A final set of fixes for 4.3.
It is (again) bigger than I would have liked, but it's all been
through the testing mill and has been carefully reviewed by multiple
parties. Each fix is either a regression fix for this cycle, or is
marked stable. You can scold me at KS. The pull request contains:
- Three simple fixes for NVMe, fixing regressions since 4.3. From
Arnd, Christoph, and Keith.
- A single xen-blkfront fix from Cathy, fixing a NULL dereference if
an error is returned through the staste change callback.
- Fixup for some bad/sloppy code in nbd that got introduced earlier
in this cycle. From Markus Pargmann.
- A blk-mq tagset use-after-free fix from Junichi.
- A backing device lifetime fix from Tejun, fixing a crash.
- And finally, a set of regression/stable fixes for cgroup writeback
from Tejun"
* 'for-linus' of git://git.kernel.dk/linux-block:
writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()
NVMe: Fix memory leak on retried commands
block: don't release bdi while request_queue has live references
nvme: use an integer value to Linux errno values
blk-mq: fix use-after-free in blk_mq_free_tag_set()
nvme: fix 32-bit build warning
writeback: fix incorrect calculation of available memory for memcg domains
writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions
writeback: bdi_writeback iteration must not skip dying ones
writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration
nbd: Add locking for tasks
xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)
Linus Torvalds [Fri, 23 Oct 2015 22:19:33 +0000 (07:19 +0900)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"Two fixes.
One is a stopgap to prevent a stack blowout when users have a deep
chain of image clones. (We'll rewrite this code to be non-recursive
for the next window, but in the meantime this is a simple fix that
avoids a crash.)
The second fixes a refcount underflow"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: prevent kernel stack blow up on rbd map
rbd: don't leak parent_spec in rbd_dev_probe_parent()
Linus Torvalds [Fri, 23 Oct 2015 22:17:58 +0000 (07:17 +0900)]
Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I have two more small fixes this week:
Qu's fix avoids unneeded COW during fallocate, and Christian found a
memory leak in the error handling of an earlier fix"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix possible leak in btrfs_ioctl_balance()
btrfs: Avoid truncate tailing page if fallocate range doesn't exceed inode size
i2c: pnx: fix runtime warnings caused by enabling unprepared clock
The driver can not be used on a platform with common clock framework
until clk_prepare/clk_unprepare calls are added, otherwise clk_enable
calls will fail and a WARN is generated.
Signed-off-by: Vladimir Zapolskiy <vz@mleia.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Joe Thornber [Thu, 22 Oct 2015 17:10:55 +0000 (18:10 +0100)]
dm cache: the CLEAN_SHUTDOWN flag was not being set
If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
blocks are marked as dirty and a full writeback occurs.
__commit_transaction() is responsible for setting/clearing
CLEAN_SHUTDOWN (based the flags_mutator that is passed in).
Fix this issue, of the cache's on-disk flags being wrong, by making sure
__commit_transaction() does not reset the flags after the mutator has
altered the flags in preparation for them being serialized to disk.
Mike Snitzer [Thu, 22 Oct 2015 14:56:40 +0000 (10:56 -0400)]
dm btree: fix leak of bufio-backed block in btree_split_beneath error path
btree_split_beneath()'s error path had an outstanding FIXME that speaks
directly to the potential for _not_ cleaning up a previously allocated
bufio-backed block.
Fix this by releasing the previously allocated bufio block using
unlock_block().
Reported-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <thornber@redhat.com> Cc: stable@vger.kernel.org
Joe Thornber [Wed, 21 Oct 2015 17:36:49 +0000 (18:36 +0100)]
dm btree remove: fix a bug when rebalancing nodes after removal
Commit 4c7e309340ff ("dm btree remove: fix bug in redistribute3") wasn't
a complete fix for redistribute3().
The redistribute3 function takes 3 btree nodes and shares out the entries
evenly between them. If the three nodes in total contained
(MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
the center.
Fix this issue by being more careful about calculating the target number
of entries for the left and right nodes.
Unit tested in userspace using this program:
https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org