Dave Ertman [Tue, 30 Aug 2016 00:38:26 +0000 (17:38 -0700)]
i40e: Fix kernel panic on enable/disable LLDP
If DCB is configured on the link partner switch with an
unsupported traffic class configuration (e.g. non-contiguous TCs),
the driver is flagging DCB as disabled. But, for future DCB
LLDPDUs, the driver was checking if the interface was DCB capable
instead of enabled. This was causing a kernel panic when LLDP
was enabled/disabled on the link partner switch.
This patch corrects the situation by having the LLDP event handler
check the correct flag in the pf structure. It also cleans up the
setting and clearing of the enabled flag for other checks.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: fix random link resets while adding a second bearer
In a dual bearer configuration, if the second tipc link becomes
active while the first link still has pending nametable "bulk"
updates, it randomly leads to reset of the second link.
When a link is established, the function named_distribute(),
fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
with NAME_DISTRIBUTOR message for each PUBLICATION.
However, the function named_distribute() allocates the buffer by
increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
This consumes the space allocated for TUNNEL_PROTOCOL.
When establishing the second link, the link shall tunnel all the
messages in the first link queue including the "bulk" update.
As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
the link mtu the transmission fails (-EMSGSIZE).
Thus, the synch point based on the message count of the tunnel
packets is never reached leading to link timeout.
In this commit, we adjust the size of name distributor message so that
they can be tunnelled.
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 1 Sep 2016 09:28:59 +0000 (11:28 +0200)]
tg3: Fix for disallow tx coalescing time to be 0
The recent commit 087d7a8c9174 "tg3: Fix for diasllow rx coalescing
time to be 0" disallow to set Rx coalescing time to be 0 as this stops
generating interrupts for the incoming packets. I found the zero
Tx coalescing time stops generating interrupts for outgoing packets
as well and fires Tx watchdog later. To avoid this, don't allow to set
Tx coalescing time to 0 and also remove subsequent checks that become
senseless.
Cc: satish.baddipadige@broadcom.com Cc: siva.kallam@broadcom.com Cc: michael.chan@broadcom.com Signed-off-by: Ivan Vecera <ivecera@redhat.com> Acked-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Acked-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/ethernet/qlogic/qed/qed_dcbx.c:1230:13-20: WARNING: kzalloc should be used for dcbx_info, instead of kmalloc/memset
drivers/net/ethernet/qlogic/qed/qed_dcbx.c:1192:13-20: WARNING: kzalloc should be used for dcbx_info, instead of kmalloc/memset
Use kzalloc rather than kmalloc followed by memset with 0
This considers some simple cases that are common and easy to validate
Note in particular that there are no ...s in the rule, so all of the
matched code has to be contiguous
mlxsw: spectrum: Use existing flood setup when adding VLANs
When a VLAN is added on a bridge port we should use the existing unicast
flood configuration of the port instead of assuming it's enabled.
Fixes: 0293038e0c36 ("mlxsw: spectrum: Add support for flood control") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Don't take multiple references on a FID
In commit 14d39461b3f4 ("mlxsw: spectrum: Use per-FID struct for the
VLAN-aware bridge") I added a per-FID struct, which member ports can
take a reference on upon VLAN membership configuration.
However, sometimes only the VLAN flags (e.g. egress untagged) are
toggled without changing the VLAN membership. In these cases we
shouldn't take another reference on the FID.
Fixes: 14d39461b3f4 ("mlxsw: spectrum: Use per-FID struct for the VLAN-aware bridge") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Fix error path in mlxsw_sp_module_init
Add forgotten notifier unregister.
Fixes: 99724c18fc66 ("mlxsw: spectrum: Introduce support for router interfaces") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Originally, I expected that there would be needed to call update
operation in case RALUE record action is changed. However, that is not
needed since write operation takes care of that nicely. Remove prepared
construct and always call the write operation.
Fixes: 61c503f976b5 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_router: Fix failure caused by double fib removal from HW
In mlxsw we squash tables 254 and 255 together into HW. Kernel adds/dels
/32 ip to/from both 254 and 255. On del path, that causes the same
prefix being removed twice. Fix this by introducing reference counting
for private mlxsw fib entries. That required a bit of code reshuffle.
Also put dev into fib entry key so the same prefix could be represented
once per every router interface.
Fixes: 61c503f976b5 ("mlxsw: spectrum_router: Implement fib4 add/del switchdev obj ops") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch takes care of clearing the uninitialized buffer before using it.
1. pfc pri-enable bitmap need to be cleared before setting the requested
enable bits. Without this, the un-touched values will be merged with
requested values and sent to MFW.
2. The data in app-entry field need to be cleared before using it.
3. Clear the output data buffer used in qed_dcbx_query_params().
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Set selection-field while configuring the app entry in ieee mode.
Management firmware requires the selection-field (SF) to be set for
configuring the application/protocol entry in IEEE mode. Without this
setting, the app entry will be configured incorrectly in MFW.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed*: Disallow dcbx configuration for VF interfaces.
Dcbx configuration is not supported for VF interfaces. Hence don't populate
the callbacks for VFs and also fail the dcbx-query for VFs.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is because on the error path, after we install
the new socket file, we call sock_release() to clean
up the socket, which leaves the fd pointing to a freed
socket. Fix this by calling sys_close() on that fd
directly.
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Sep 2016 03:53:49 +0000 (20:53 -0700)]
Merge branch 'mediatek-fixes'
Sean Wang says:
====================
net: ethernet: mediatek: a couple of fixes
a couple of fixes come out from integrating with linux-4.8 rc1
they all are verified and workable on linux-4.8 rc1
Changes since v1:
- usage of loops to work out if all required clock are ready instead
of tedious coding
- remove redundant pinctrl setup that is already done by core driver
thanks for careful and patient reviewing by Andrew Lunn
- splitting distinct changes into the separate patches
- change variable naming from err to ret for readable coding
Changes since v2:
- restore to original clock disabling sequence that is changed
accidentally in the last version
- refine the commit log that would cause misunderstanding what has
been done in the changes
- refine the commit log that would cause footnote losing due to
improper delimiter use
Changes since v3:
- fix git rejects caused by mixing a change from net-next, so
remake the patch set based on the current net branch again.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Return -ENODEV if the MDIO bus is disabled in the device tree.
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: John Crispin <john@phrozen.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Thu, 1 Sep 2016 02:47:34 +0000 (10:47 +0800)]
net: ethernet: mediatek: use devm_mdiobus_alloc instead of mdiobus_alloc inside mtk_mdio_init
a lot of parts in the driver uses devm_* APIs to gain benefits from the
device resource management, so devm_mdiobus_alloc is also used instead
of mdiobus_alloc to have more elegant code flow.
Using common code provided by the devm_* helps to
1) have simplified the code flow as [1] says
2) decrease the risk of incorrect error handling by human
3) only a few drivers used it since it was proposed on linux 3.16,
so just hope to promote for this.
Sean Wang [Thu, 1 Sep 2016 02:47:30 +0000 (10:47 +0800)]
net: ethernet: mediatek: remove redundant free_irq for devm_request_irq allocated irq
these irqs are not used for shared irq and disabled during ethernet stops.
irq requested by devm_request_irq is safe to be freed automatically on
driver detach.
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Thu, 1 Sep 2016 02:47:28 +0000 (10:47 +0800)]
net: ethernet: mediatek: fix incorrect return value of devm_clk_get with EPROBE_DEFER
1) If the return value of devm_clk_get is EPROBE_DEFER, we should
defer probing the driver. The change is verified and works based
on 4.8-rc1 staying with the latest clk-next code for MT7623.
2) Changing with the usage of loops to work out if all clocks
required are fine
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Wang [Thu, 1 Sep 2016 02:47:27 +0000 (10:47 +0800)]
net: ethernet: mediatek: fix fails from TX housekeeping due to incorrect port setup
which net device the SKB is complete for depends on the forward port
on txd4 on the corresponding TX descriptor, but the information isn't
set up well in case of SKB fragments that would lead to watchdog timeout
from the upper layer, so fix it up.
Signed-off-by: Sean Wang <sean.wang@mediatek.com> Acked-by: John Crispin <john@phrozen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Wed, 31 Aug 2016 12:16:44 +0000 (14:16 +0200)]
bridge: re-introduce 'fix parsing of MLDv2 reports'
commit bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with
INCLUDE and no sources as a leave") seems to have accidentally reverted
commit 47cc84ce0c2f ("bridge: fix parsing of MLDv2 reports"). This
commit brings back a change to br_ip6_multicast_mld2_report() where
parsing of MLDv2 reports stops when the first group is successfully
added to the MDB cache.
Fixes: bc8c20acaea1 ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 31 Aug 2016 04:34:48 +0000 (21:34 -0700)]
Merge tag 'mac80211-for-davem-2016-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Three little fixes:
* revert a recent wext patch, which Ben Hutchings noticed was
wrong, and it turns out not to be necessary for any driver
* fix an infinite loop that can occur under certain conditions
in mac80211's TDLS code (depending on regulatory information)
* add a cfg80211_get_station() static inline when cfg80211 isn't
built, to allow other modules to not have to depend on it for it
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We have already use skb_header_pointer to get the ip header pointer,
so there's no need to use ip_hdr again. Moreover, in NETDEV INGRESS
hook, ip header maybe not linear, so use ip_hdr is not appropriate,
remove it.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Linus Torvalds [Tue, 30 Aug 2016 02:12:35 +0000 (19:12 -0700)]
Merge tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fix from Guenter Roeck:
"Add missing sysfs attribute group terminator to it87 driver"
* tag 'hwmon-for-linus-v4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (it87) Add missing sysfs attribute group terminator
Linus Torvalds [Mon, 29 Aug 2016 19:37:11 +0000 (12:37 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Fix bugs that could cause kernel deadlocks or file system corruption
while moving xattrs to expand the extended inode.
Also add some sanity checks to the block group descriptors to make
sure we don't end up overwriting the superblock"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: avoid deadlock when expanding inode size
ext4: properly align shifted xattrs when expanding inodes
ext4: fix xattr shifting when expanding inodes part 2
ext4: fix xattr shifting when expanding inodes
ext4: validate that metadata blocks do not overlap superblock
ext4: reserve xattr index for the Hurd
1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.
2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.
3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.
4) Fix use-after-free in tcp_xmit_retransmit_queue().
5) Userspace header fixups (use of __u32, missing includes, etc.) from
Mikko Rapeli.
6) Further refinements to fragmentation wrt gso and tunnels, from
Shmulik Ladkani.
7) Trigger poll correctly for zero length UDP packets, from Eric
Dumazet.
8) TCP window scaling fix, also from Eric Dumazet.
9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.
10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.
11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.
12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.
13) Add new device ID to alx driver, from Owen Lin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
Add Killer E2500 device ID in alx driver.
net: smc91x: fix SMC accesses
Documentation: networking: dsa: Remove platform device TODO
net/mlx5: Increase number of ethtool steering priorities
net/mlx5: Add error prints when validate ETS failed
net/mlx5e: Fix memory leak if refreshing TIRs fails
net/mlx5e: Add ethtool counter for TX xmit_more
net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
net/mlx5e: Don't wait for SQ completions on close
net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
net/mlx5e: Don't wait for RQ completions on close
net/mlx5e: Limit UMR length to the device's limitation
rhashtable: fix a memory leak in alloc_bucket_locks()
sfc: fix potential stack corruption from running past stat bitmask
team: loadbalance: push lacpdus to exact delivery
net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
8139cp: Fix one possible deadloop in cp_rx_poll
i40e: Change some init flow for the client
Revert "phy: IRQ cannot be shared"
net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
...
Linus Torvalds [Mon, 29 Aug 2016 19:20:22 +0000 (12:20 -0700)]
Merge tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86
Pull x86 platform driver fixes from Darren Hart:
"Remove module related code from two drivers that are only configurable
as built-in: intel_pmic_gpio and platform/olpc"
* tag 'platform-drivers-x86-v4.8-4' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
intel_pmic_gpio: Make explicitly non-modular
platform/olpc: Make ec explicitly non-modular
Linus Torvalds [Mon, 29 Aug 2016 19:12:15 +0000 (12:12 -0700)]
Merge tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Ben Herrenschmidt:
"This was meant to be sent early last week, but I has a change pending
on one of the fixes and other things made me forget all about. Ugh.
We have some misc fixes for powerpc 4.8. Some trivial bits and some
regressions, and a trivial cleanup or two that I saw no point in
letting rot in patchwork"
* tag 'powerpc-4.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: signals: Discard transaction state from signal frames
powerpc/powernv : Drop reference added by kset_find_obj()
powerpc/tm: do not use r13 for tabort_syscall
powerpc: move hmi.c to arch/powerpc/kvm/
powerpc: sysdev: cpm: fix gpio save_regs functions
powerpc/pseries: PACA save area fix for MCE vs MCE
powerpc/pseries: PACA save area fix for general exception vs MCE
powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
powerpc, hotplug: Avoid to touch non-existent cpumasks.
powerpc: migrate exception table users off module.h and onto extable.h
powerpc/powernv/pci: fix iterator signedness
powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb)
cxl: use pcibios_free_controller_deferred() when removing vPHBs
powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner"
powerpc/512x: Delete unnecessary assignment for the field "owner"
drivers/macintosh: Delete owner assignment
powerpc: cputhreads: Add missing include file
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.
We delete the MODULE_LICENSE tag etc. since all that information
was (or is now) contained at the top of the file in the comments.
We don't replace module.h with init.h since the file already has that.
Cc: Alek Du <alek.du@intel.com> Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Paul Gortmaker [Mon, 15 Aug 2016 22:25:17 +0000 (18:25 -0400)]
platform/olpc: Make ec explicitly non-modular
The Kconfig entry controlling compilation of this code is:
arch/x86/Kconfig:config OLPC
arch/x86/Kconfig: bool "One Laptop Per Child support"
...meaning that it currently is not being built as a module by anyone.
Lets remove the couple traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.
We delete the MODULE_LICENSE tag etc. since all that information
was (or is now) contained at the top of the file in the comments.
Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Russell King [Sat, 27 Aug 2016 16:33:03 +0000 (17:33 +0100)]
net: smc91x: fix SMC accesses
Commit b70661c70830 ("net: smc91x: use run-time configuration on all ARM
machines") broke some ARM platforms through several mistakes. Firstly,
the access size must correspond to the following rule:
(a) at least one of 16-bit or 8-bit access size must be supported
(b) 32-bit accesses are optional, and may be enabled in addition to
the above.
Secondly, it provides no emulation of 16-bit accesses, instead blindly
making 16-bit accesses even when the platform specifies that only 8-bit
is supported.
Reorganise smc91x.h so we can make use of the existing 16-bit access
emulation already provided - if 16-bit accesses are supported, use
16-bit accesses directly, otherwise if 8-bit accesses are supported,
use the provided 16-bit access emulation. If neither, BUG(). This
exactly reflects the driver behaviour prior to the commit being fixed.
Since the conversion incorrectly cut down the available access sizes on
several platforms, we also need to go through every platform and fix up
the overly-restrictive access size: Arnd assumed that if a platform can
perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
size needed to be specified - not so, all available access sizes must
be specified.
This likely fixes some performance regressions in doing this: if a
platform does not support 8-bit accesses, 8-bit accesses have been
emulated by performing a 16-bit read-modify-write access.
Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
accesses, which was broken by the original commit.
Fixes: b70661c70830 ("net: smc91x: use run-time configuration on all ARM machines") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 27 Aug 2016 22:34:20 +0000 (15:34 -0700)]
Documentation: networking: dsa: Remove platform device TODO
Since commit 83c0afaec7b7 ("net: dsa: Add new binding implementation"),
the shortcomings of the dsa platform device have been addressed, remove
that TODO item.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
This series contains some bug fixes for the mlx5 core and mlx5
ethernet driver.
From Saeed, Fix UMR to consider hardware translation table field
size limitation when calculating the maximum number of MTTs required
by the driver. Three patches to speed-up netdevice close time by
serializing channel (SQs & RQs) destruction rather than issuing and
waiting for hardware interrupts to free them.
From Eran, Fix ethtool ring parameter reporting for striding RQ layout.
Add error prints on ETS validation failure.
From Kamal, Fix memory leak on error flow.
From Maor, Fix ethtool steering priorities number.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Sun, 28 Aug 2016 22:13:50 +0000 (01:13 +0300)]
net/mlx5: Increase number of ethtool steering priorities
Ethtool has 11 flow tables, each flow table has its own priority.
Increase the number of priorities to be aligned with the number of flow
tables.
Fixes: 1174fce8d141 ('net/mlx5e: Support l3/l4 flow type specs in ethtool flow steering') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Sun, 28 Aug 2016 22:13:49 +0000 (01:13 +0300)]
net/mlx5: Add error prints when validate ETS failed
Upon set ETS failure due to user invalid input, add error prints to
specify the exact error to the user.
Fixes: cdcf11212b22 ('net/mlx5e: Validate BW weight values of ETS') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kamal Heib [Sun, 28 Aug 2016 22:13:48 +0000 (01:13 +0300)]
net/mlx5e: Fix memory leak if refreshing TIRs fails
Free 'in' command object also when mlx5_core_modify_tir fails.
Fixes: 724b2aa15126 ("net/mlx5e: TIRs management refactoring") Signed-off-by: Kamal Heib <kamalh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Sun, 28 Aug 2016 22:13:47 +0000 (01:13 +0300)]
net/mlx5e: Add ethtool counter for TX xmit_more
Add a counter in ethtool for the number of times that
TX xmit_more was used.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eran Ben Elisha [Sun, 28 Aug 2016 22:13:46 +0000 (01:13 +0300)]
net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
The driver RQ has two possible configurations: striding RQ and
non-striding RQ. Until this patch, the driver always reported the
number of hardware WQEs (ring descriptors). For non striding RQ
configuration, this was OK since we have one WQE per pending packet
For striding RQ, multiple packets can fit into one WQE. For better
user experience we normalize the rx_pending parameter (size of wqe/mtu)
as the average ring size in case of striding RQ.
Fixes: 461017cb006a ('net/mlx5e: Support RX multi-packet WQE ...') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 28 Aug 2016 22:13:45 +0000 (01:13 +0300)]
net/mlx5e: Don't wait for SQ completions on close
Instead of asking the firmware to flush the SQ (Send Queue) via
asynchronous completions when moved to error, we handle SQ flush
manually (mlx5e_free_tx_descs) same as we did when SQ flush got
timed out or on tx_timeout.
This will reduce SQs flush time and speedup interface down procedure.
Moved mlx5e_free_tx_descs to the end of en_tx.c for tx
critical code locality.
Fixes: 29429f3300a3 ('net/mlx5e: Timeout if SQ doesn't flush during close') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 28 Aug 2016 22:13:44 +0000 (01:13 +0300)]
net/mlx5e: Don't post fragmented MPWQE when RQ is disabled
ICO (Internal control operations) SQ (Send Queue) is closed/disabled
after RQ (Receive Queue). After RQ is closed an ICO SQ completion
might post a fragmented MPWQE (Multi Packet Work Queue Element) into
that RQ.
As on regular RQ post, check if we are allowed to post to that
RQ (RQ is enabled). Cleanup in-progress UMR MPWQE on mlx5e_free_rx_descs
if needed.
Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 28 Aug 2016 22:13:43 +0000 (01:13 +0300)]
net/mlx5e: Don't wait for RQ completions on close
This will significantly reduce receive queue flush time on interface
down.
Instead of asking the firmware to flush the RQ (Receive Queue) via
asynchronous completions when moved to error, we handle RQ flush
manually (mlx5e_free_rx_descs) same as we did when RQ flush got timed
out.
This will reduce RQs flush time and speedup interface down procedure
(ifconfig down) from 6 sec to 0.3 sec on a 48 cores system.
Moved mlx5e_free_rx_descs en_main.c where it is needed, to keep en_rx.c
free form non critical data path code for better code locality.
Fixes: 6cd392a082de ('net/mlx5e: Handle RQ flush in error cases') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Saeed Mahameed [Sun, 28 Aug 2016 22:13:42 +0000 (01:13 +0300)]
net/mlx5e: Limit UMR length to the device's limitation
ConnectX-4 UMR (User Memory Region) MTT translation table offset in WQE
is limited to U16_MAX, before this patch we ignored that limitation and
requested the maximum possible UMR translation length that the netdev
might need (MAX channels * MAX pages per channel).
In case of a system with #cores > 32 and when linear WQE allocation fails,
falling back to using UMR WQEs will cause the RQ (Receive Queue) to get
stuck.
Here we limit UMR length to min(U16_MAX, max required pages) (while
considering the required alignments) on driver load, by default U16_MAX is
sufficient since the default RX rings value guarantees that we are in
range, dynamically (on set_ringparam/set_channels) we will check if the
new required UMR length (num mtts) is still in range, if not, fail the
request.
Fixes: bc77b240b3c5 ('net/mlx5e: Add fragmented memory support for RX multi packet WQE') Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cyril Bur [Tue, 23 Aug 2016 00:46:17 +0000 (10:46 +1000)]
powerpc: signals: Discard transaction state from signal frames
Userspace can begin and suspend a transaction within the signal
handler which means they might enter sys_rt_sigreturn() with the
processor in suspended state.
sys_rt_sigreturn() wants to restore process context (which may have
been in a transaction before signal delivery). To do this it must
restore TM SPRS. To achieve this, any transaction initiated within the
signal frame must be discarded in order to be able to restore TM SPRs
as TM SPRs can only be manipulated non-transactionally..
>From the PowerPC ISA:
TM Bad Thing Exception [Category: Transactional Memory]
An attempt is made to execute a mtspr targeting a TM register in
other than Non-transactional state.
It isn't clear exactly if there is really a use case for userspace
returning with a suspended transaction, however, doing so doesn't (on
its own) constitute a bad frame. As such, this patch simply discards
the transactional state of the context calling the sigreturn and
continues.
Mukesh Ojha [Mon, 22 Aug 2016 06:47:44 +0000 (12:17 +0530)]
powerpc/powernv : Drop reference added by kset_find_obj()
In a situation, where Linux kernel gets notified about duplicate error log
from OPAL, it is been observed that kernel fails to remove sysfs entries
(/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is because,
we currently search the error log/dump kobject in the kset list via
'kset_find_obj()' routine. Which eventually increment the reference count
by one, once it founds the kobject.
So, unless we decrement the reference count by one after it found the kobject,
we would not be able to release the kobject properly later.
This patch adds the 'kobject_put()' which was missing earlier.
Nicholas Piggin [Mon, 25 Jul 2016 04:26:51 +0000 (14:26 +1000)]
powerpc/tm: do not use r13 for tabort_syscall
tabort_syscall runs with RI=1, so a nested recoverable machine
check will load the paca into r13 and overwrite what we loaded
it with, because exceptions returning to privileged mode do not
restore r13.
Fixes: b4b56f9ecab4 (powerpc/tm: Abort syscalls in active transactions) Cc: stable@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@gmail.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Linus Torvalds [Sun, 28 Aug 2016 21:31:36 +0000 (14:31 -0700)]
Merge tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A bunch of fixes covering i915, amdgpu, one tegra and some core DRM
ones. Nothing too strange at this point"
* tag 'drm-fixes-for-4.8-rc4' of git://people.freedesktop.org/~airlied/linux: (21 commits)
drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
drm: Protect fb_defio in drivers with CONFIG_KMS_FBDEV_EMULATION
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix lru size grouping v2
drm/tegra: dsi: Enhance runtime power management
drm/i915: Fix botched merge that downgrades CSR versions.
drm/i915/skl: Ensure pipes with changed wms get added to the state
drm/i915/gen9: Only copy WM results for changed pipes to skl_hw
drm/i915/skl: Add support for the SAGV, fix underrun hangs
drm/i915/gen6+: Interpret mailbox error flags
drm/i915: Reattach comment, complete type specification
drm/i915: Unconditionally flush any chipset buffers before execbuf
drm/i915/gen9: Drop invalid WARN() during data rate calculation
drm/i915/gen9: Initialize intel_state->active_crtcs during WM sanitization (v2)
drm: Reject page_flip for !DRIVER_MODESET
drm/amdgpu: fix timeout value check in amd_sched_job_recovery
drm/amdgpu: fix sdma_v2_4_ring_test_ib
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/radeon: fix radeon_move_blit on 32bit systems
...
Mario Kleiner [Fri, 26 Aug 2016 23:02:28 +0000 (01:02 +0200)]
drm/atomic: Don't potentially reset color_mgmt_changed on successive property updates.
Due to assigning the 'replaced' value instead of or'ing it,
if drm_atomic_crtc_set_property() gets called multiple times,
the last call will define the color_mgmt_changed flag, so
a non-updating call to a property can reset the flag and
prevent actual hw state updates required by preceding
property updates.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Cc: <stable@vger.kernel.org> # v4.6+ Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Sun, 28 Aug 2016 17:02:23 +0000 (10:02 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A few fixes from the perf departement
- prevent a imbalanced preemption disable in the events teardown code
- prevent out of bound acces in perf userspace
- make perf tools compile with UCLIBC again
- a fix for the userspace unwinder utility"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Use this_cpu_ptr() when stopping AUX events
perf evsel: Do not access outside hw cache name arrays
tools lib: Reinstate strlcpy() header guard with __UCLIBC__
perf unwind: Use addr_location::addr instead of ip for entries
Linus Torvalds [Sun, 28 Aug 2016 16:52:40 +0000 (09:52 -0700)]
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"This lot provides:
- plug a hotplug race in the new affinity infrastructure
- a fix for the trigger type of chained interrupts
- plug a potential memory leak in the core code
- a few fixes for ARM and MIPS GICs"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Implement activate op for device domain
irqchip/mips-gic: Cleanup chip and handler setup
genirq/affinity: Use get/put_online_cpus around cpumask operations
genirq: Fix potential memleak when failing to get irq pm
irqchip/gicv3-its: Disable the ITS before initializing it
irqchip/gicv3: Remove disabling redistributor and group1 non-secure interrupts
irqchip/gic: Allow self-SGIs for SMP on UP configurations
genirq: Correctly configure the trigger on chained interrupts
Linus Torvalds [Sun, 28 Aug 2016 16:03:05 +0000 (09:03 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"A few updates for timers & co:
- prevent a livelock in the timekeeping code when debugging is
enabled
- prevent out of bounds access in the timekeeping debug code
- various fixes in clocksource drivers
- a new maintainers entry"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
drivers/clocksource/pistachio: Fix memory corruption in init
clocksource/drivers/timer-atmel-pit: Enable mck clock
clocksource/drivers/pxa: Fix include files for compilation
MAINTAINERS: Add ARM ARCHITECTED TIMER entry
timekeeping: Cap array access in timekeeping_debug
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
Linus Torvalds [Sat, 27 Aug 2016 22:51:50 +0000 (15:51 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"ARM:
- fixes for ITS init issues, error handling, IRQ leakage, race
conditions
- an erratum workaround for timers
- some removal of misleading use of errors and comments
- a fix for GICv3 on 32-bit guests
MIPS:
- fix for where the guest could wrongly map the first page of
physical memory
x86:
- nested virtualization fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
MIPS: KVM: Check for pfn noslot case
kvm: nVMX: fix nested tsc scaling
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
arm64: KVM: report configured SRE value to 32-bit world
arm64: KVM: remove misleading comment on pmu status
KVM: arm/arm64: timer: Workaround misconfigured timer interrupt
arm64: Document workaround for Cortex-A72 erratum #853709
KVM: arm/arm64: Change misleading use of is_error_pfn
KVM: arm64: ITS: avoid re-mapping LPIs
KVM: arm64: check for ITS device on MSI injection
KVM: arm64: ITS: move ITS registration into first VCPU run
KVM: arm64: vgic-its: Make updates to propbaser/pendbaser atomic
KVM: arm64: vgic-its: Plug race in vgic_put_irq
KVM: arm64: vgic-its: Handle errors from vgic_add_lpi
KVM: arm64: ITS: return 1 on successful MSI injection
Linus Torvalds [Sat, 27 Aug 2016 06:01:09 +0000 (23:01 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Round one of 4.8 rc fixes.
This should be the bulk of the -rc fixes for 4.8. I only have a few
things that are still outstanding (two ipoib bugs for which the
solution is not yet fully known, and a few queued items that came in
after my last push and I didn't want to delay this pull request for
late comers again).
Even though the patch count is kind of high, everything is minor fixes
so the overall churn is pretty low.
Summary:
- minor fixes to cxgb4
- minor fixes to mlx4
- one minor fix each to core, rxe, isert, srpt, mlx5, ocrdma, and usnic
- six or so fixes to i40iw fixes
- the rest are hfi1 fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (34 commits)
i40iw: Send last streaming mode message for loopback connections
IB/srpt: Update sport->port_guid with each port refresh
RDMA/ocrdma: Fix the max_sge reported from FW
i40iw: Avoid writing to freed memory
i40iw: Fix double free of allocated_buffer
IB/mlx5: Remove superfluous include of io-mapping.h
i40iw: Do not set self-referencing pointer to NULL after kfree
i40iw: Add missing NULL check for MPA private data
iw_cxgb4: Fix cxgb4 arm CQ logic w/IB_CQ_REPORT_MISSED_EVENTS
i40iw: Add missing check for interface already open
i40iw: Protect req_resource_num update
i40iw: Change mem_resources pointer to a u8
IB/core: Use memdup_user() rather than duplicating its implementation
IB/qib: Use memdup_user() rather than duplicating its implementation
iw_cxgb4: use the MPA initiator's IRD if < our ORD
iw_cxgb4: limit IRD/ORD advertised to ULP by device max.
IB/hfi1: Fix mm_struct use after free
IB/rdmvat: Fix double vfree() in rvt_create_qp() error path
IB/hfi1: Improve J_KEY generation
IB/hfi1: Return invalid field for non-QSFP CableInfo queries
...
Linus Torvalds [Sat, 27 Aug 2016 05:53:21 +0000 (22:53 -0700)]
Merge tag 'sound-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Here are a bunch of fixes as you can see in diffstat.
One core change in ASoC is about the unexpected unbinding error, and
another about debugfs cleanup.
The rest are wide-spread driver-specific fixes: a series of LINE6 USB
fixes, a HD-audio quirk, and various ASoC fixes including OMAP boot
fixes and Intel SKL fixes"
* tag 'sound-4.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (22 commits)
ALSA: hda/realtek - fix headset mic detection for MSI MS-B120
ASoC: omap-mcpdm: Fix irq resource handling
ASoC: max98371: Add terminate entry for i2c_device_id tables
ALSA: line6: Fix POD sysfs attributes segfault
ALSA: line6: Give up on the lock while URBs are released.
ALSA: line6: Remove double line6_pcm_release() after failed acquire.
ASoC: omap-abe-twl6040: Correct dmic-codec device registration
ASoC: core: Clean up DAPM before the card debugfs
ASoC: omap-mcpdm: Drop pdmclk clock handling
ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
ASoC: compress: Fix leak of a widget list in soc_compr_open_fe
ASoC: Intel: Skylake: Fix error return code in skl_probe()
ASoC: wm2000: Fix return of uninitialised varible
ASoC: Fix leak of rtd in soc_bind_dai_link
ASoC: da7213: Default to 64 BCLKs per WCLK to support all formats
ASoC: nau8825: fix static check error about semaphone control
ASoC: nau8825: fix bug in playback when suspend
ASoC: samsung: Fix clock handling in S3C24XX_UDA134X card
ASoC: simple-card-utils: add missing MODULE_xxx()
ASoC: Intel: Skylake: Check list empty while getting module info
...
Eric Dumazet [Fri, 26 Aug 2016 15:51:39 +0000 (08:51 -0700)]
rhashtable: fix a memory leak in alloc_bucket_locks()
If vmalloc() was successful, do not attempt a kmalloc_array()
Fixes: 4cf0b354d92e ("rhashtable: avoid large lock-array allocations") Reported-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Rybchenko [Fri, 26 Aug 2016 10:19:34 +0000 (11:19 +0100)]
sfc: fix potential stack corruption from running past stat bitmask
On 32-bit systems, mask is only an array of 3 longs, not 4, so don't try
to write to mask[3].
Also include build-time checks in case the size of the bitmask changes.
Fixes: 3c36a2aded8c ("sfc: display vadaptor statistics for all interfaces") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 27 Aug 2016 03:22:01 +0000 (20:22 -0700)]
Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"We've queued up a few different fixes in here. These range from
enospc corners to fsync and quota fixes, and a few targeted at error
handling for corrupt metadata/fuzzing"
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix lockdep warning on deadlock against an inode's log mutex
Btrfs: detect corruption when non-root leaf has zero item
Btrfs: check btree node's nritems
btrfs: don't create or leak aliased root while cleaning up orphans
Btrfs: fix em leak in find_first_block_group
btrfs: do not background blkdev_put()
Btrfs: clarify do_chunk_alloc()'s return value
btrfs: fix fsfreeze hang caused by delayed iputs deal
btrfs: update btrfs_space_info's bytes_may_use timely
btrfs: divide btrfs_update_reserved_bytes() into two functions
btrfs: use correct offset for reloc_inode in prealloc_file_extent_cluster()
btrfs: qgroup: Fix qgroup incorrectness caused by log replay
btrfs: relocation: Fix leaking qgroups numbers on data extents
btrfs: qgroup: Refactor btrfs_qgroup_insert_dirty_extent()
btrfs: waiting on qgroup rescan should not always be interruptible
btrfs: properly track when rescan worker is running
btrfs: flush_space: treat return value of do_chunk_alloc properly
Btrfs: add ASSERT for block group's memory leak
btrfs: backref: Fix soft lockup in __merge_refs function
Btrfs: fix memory leak of reloc_root
Linus Torvalds [Sat, 27 Aug 2016 03:15:32 +0000 (20:15 -0700)]
Merge tag 'dm-4.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- another stable fix for DM flakey (that tweaks the previous fix that
didn't factor in expected 'drop_writes' behavior for read IO).
- a dm-log bio operation flags fix for the broader block changes that
were merged during the 4.8 merge window.
* tag 'dm-4.8-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm log: fix unitialized bio operation flags
dm flakey: fix reads to be issued if drop_writes configured
Linus Torvalds [Sat, 27 Aug 2016 03:12:35 +0000 (20:12 -0700)]
Merge tag 'iommu-fixes-v4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Fixes from Will Deacon:
- fix a couple of thinkos in the CMDQ error handling and
short-descriptor page table code that have been there since day one
- disable stalling faults, since they may result in hardware deadlock
- fix an accidental BUG() when passing disable_bypass=1 on the
cmdline"
* tag 'iommu-fixes-v4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass
iommu/arm-smmu: Disable stalling faults for all endpoints
iommu/arm-smmu: Fix CMDQ error handling
iommu/io-pgtable-arm-v7s: Fix attributes when splitting blocks
Linus Torvalds [Sat, 27 Aug 2016 01:50:07 +0000 (18:50 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Here's a set of block fixes for the current 4.8-rc release. This
contains:
- a fix for a secure erase regression, from Adrian.
- a fix for an mmc use-after-free bug regression, also from Adrian.
- potential zero pointer deference in bdev freezing, from Andrey.
- a race fix for blk_set_queue_dying() from Bart.
- a set of xen blkfront fixes from Bob Liu.
- three small fixes for bcache, from Eric and Kent.
- a fix for a potential invalid NVMe state transition, from Gabriel.
- blk-mq CPU offline fix, preventing us from issuing and completing a
request on the wrong queue. From me.
- revert two previous floppy changes, since they caused a user
visibile regression. A better fix is in the works.
- ensure that we don't send down bios that have more than 256
elements in them. Fixes a crash with bcache, for example. From
Ming.
- a fix for deferencing an error pointer with cgroup writeback.
Fixes a regression. From Vegard"
* 'for-linus' of git://git.kernel.dk/linux-block:
mmc: fix use-after-free of struct request
Revert "floppy: refactor open() flags handling"
Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"
fs/block_dev: fix potential NULL ptr deref in freeze_bdev()
blk-mq: improve warning for running a queue on the wrong CPU
blk-mq: don't overwrite rq->mq_ctx
block: make sure a big bio is split into at most 256 bvecs
nvme: Fix nvme_get/set_features() with a NULL result pointer
bdev: fix NULL pointer dereference
xen-blkfront: free resources if xlvbd_alloc_gendisk fails
xen-blkfront: introduce blkif_set_queue_limits()
xen-blkfront: fix places not updated after introducing 64KB page granularity
bcache: pr_err: more meaningful error message when nr_stripes is invalid
bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
bcache: register_bcache(): call blkdev_put() when cache_alloc() fails
block: Fix race triggered by blk_set_queue_dying()
block: Fix secure erase
nvme: Prevent controller state invalid transition
MSI:
- Use positive flags in pci_alloc_irq_vectors() (Christoph Hellwig)
- Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors() (Christoph Hellwig)
* tag 'pci-v4.8-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
x86/PCI: VMD: Fix infinite loop executing irq's
PCI: Call pci_intx() when using legacy interrupts in pci_alloc_irq_vectors()
PCI: Use positive flags in pci_alloc_irq_vectors()
PCI: Update "pci=resource_alignment" documentation
Ross Zwisler [Thu, 25 Aug 2016 22:17:17 +0000 (15:17 -0700)]
mm: silently skip readahead for DAX inodes
For DAX inodes we need to be careful to never have page cache pages in
the mapping->page_tree. This radix tree should be composed only of DAX
exceptional entries and zero pages.
ltp's readahead02 test was triggering a warning because we were trying
to insert a DAX exceptional entry but found that a page cache page had
already been inserted into the tree. This page was being inserted into
the radix tree in response to a readahead(2) call.
Readahead doesn't make sense for DAX inodes, but we don't want it to
report a failure either. Instead, we just return success and don't do
any work.
Link: http://lkml.kernel.org/r/20160824221429.21158-1-ross.zwisler@linux.intel.com Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: Jeff Moyer <jmoyer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jan Kara <jack@suse.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Williams [Thu, 25 Aug 2016 22:17:14 +0000 (15:17 -0700)]
dax: fix device-dax region base
The data offset for a dax region needs to account for a reservation in
the resource range. Otherwise, device-dax is allowing mappings directly
into the memmap or device-info-block area with crash signatures like the
following:
Vegard Nossum [Thu, 25 Aug 2016 22:17:11 +0000 (15:17 -0700)]
fs/seq_file: fix out-of-bounds read
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.
I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address: ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
This seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.
2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.
Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Reported-by: Dave Jones <davej@codemonkey.org.uk> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnd Bergmann [Thu, 25 Aug 2016 22:17:08 +0000 (15:17 -0700)]
mm: memcontrol: avoid unused function warning
A bugfix in v4.8-rc2 introduced a harmless warning when
CONFIG_MEMCG_SWAP is disabled but CONFIG_MEMCG is enabled:
mm/memcontrol.c:4085:27: error: 'mem_cgroup_id_get_online' defined but not used [-Werror=unused-function]
static struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg)
This moves the function inside of the #ifdef block that hides the
calling function, to avoid the warning.
Fixes: 1f47b61fb407 ("mm: memcontrol: fix swap counter leak on swapout from offline cgroup") Link: http://lkml.kernel.org/r/20160824113733.2776701-1-arnd@arndb.de Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 25 Aug 2016 22:17:05 +0000 (15:17 -0700)]
mm: clarify COMPACTION Kconfig text
The current wording of the COMPACTION Kconfig help text doesn't
emphasise that disabling COMPACTION might cripple the page allocator
which relies on the compaction quite heavily for high order requests and
an unexpected OOM can happen with the lack of compaction. Make sure we
are vocal about that.
Link: http://lkml.kernel.org/r/20160823091726.GK23577@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Mel Gorman <mgorman@suse.de> Cc: Joonsoo Kim <js1304@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Thu, 25 Aug 2016 22:17:02 +0000 (15:17 -0700)]
treewide: replace config_enabled() with IS_ENABLED() (2nd round)
Commit 97f2645f358b ("tree-wide: replace config_enabled() with
IS_ENABLED()") mostly killed config_enabled(), but some new users have
appeared for v4.8-rc1. They are all used for a boolean option, so can
be replaced with IS_ENABLED() safely.
Link: http://lkml.kernel.org/r/1471970749-24867-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Kees Cook <keescook@chromium.org> Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Iooss [Thu, 25 Aug 2016 22:17:00 +0000 (15:17 -0700)]
printk: fix parsing of "brl=" option
Commit bbeddf52adc1 ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**. In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).
Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.
Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files") Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrea Arcangeli [Thu, 25 Aug 2016 22:16:57 +0000 (15:16 -0700)]
soft_dirty: fix soft_dirty during THP split
While adding proper userfaultfd_wp support with bits in pagetable and
swap entry to avoid false positives WP userfaults through swap/fork/
KSM/etc, I've been adding a framework that mostly mirrors soft dirty.
So I noticed in one place I had to add uffd_wp support to the pagetables
that wasn't covered by soft_dirty and I think it should have.
Example: in the THP migration code migrate_misplaced_transhuge_page()
pmd_mkdirty is called unconditionally after mk_huge_pmd.
That sets soft dirty too (it's a false positive for soft dirty, the soft
dirty bit could be more finegrained and transfer the bit like uffd_wp
will do.. pmd/pte_uffd_wp() enforces the invariant that when it's set
pmd/pte_write is not set).
However in the THP split there's no unconditional pmd_mkdirty after
mk_huge_pmd and pte_swp_mksoft_dirty isn't called after the migration
entry is created. The code sets the dirty bit in the struct page
instead of setting it in the pagetable (which is fully equivalent as far
as the real dirty bit is concerned, as the whole point of pagetable bits
is to be eventually flushed out of to the page, but that is not
equivalent for the soft-dirty bit that gets lost in translation).
This was found by code review only and totally untested as I'm working
to actually replace soft dirty and I don't have time to test potential
soft dirty bugfixes as well :).
Transfer the soft_dirty from pmd to pte during THP splits.
This fix avoids losing the soft_dirty bit and avoids userland memory
corruption in the checkpoint.
Fixes: eef1b3ba053aa6 ("thp: implement split_huge_pmd()") Link: http://lkml.kernel.org/r/1471610515-30229-2-git-send-email-aarcange@redhat.com Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
sysctl: handle error writing UINT_MAX to u32 fields
We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels. An entry which we write to here is
xfrm_aevent_rseqth which is u32.
Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs. However, this does not apply to unsigned integers.
Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX. This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax. Alternatively,
we would need to change the datatype of the entry to 64 bit.
static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
{
i = (unsigned long *) data; //This cast is causing to read beyond the size of data (u32)
vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.
Introduce a new proc handler proc_douintvec. Individual proc entries
will need to be updated to use the new handler.
[akpm@linux-foundation.org: coding-style fixes] Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int") Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Heinrich Schuchardt <xypron.glpk@gmx.de> Cc: Kees Cook <keescook@chromium.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Pirko [Thu, 25 Aug 2016 16:30:52 +0000 (18:30 +0200)]
team: loadbalance: push lacpdus to exact delivery
When team is in bridge and LACP is utilized, LACPDU packets are pushed
to userspace using raw socket and there they are processed. However,
since 8626c56c8279b, LACPDU skbs are dropped by bridge rx_handler so
they never reach packet handlers in rx path. Fix this by explicity treat
LACPDUs to be pushed to exact delivery in team rx_handler.
Reported-by: Ido Schimmel <idosch@mellanox.com> Fixes: 8626c56c8279b ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 25 Aug 2016 06:51:10 +0000 (07:51 +0100)]
net: hns: dereference ppe_cb->ppe_common_cb if it is non-null
ppe_cb->ppe_common_cb is being dereferenced before a null check is
being made on it. If ppe_cb->ppe_common_cb is null then we end up
with a null pointer dereference when assigning dsaf_dev. Fix this
by moving the initialisation of dsaf_dev once we know
ppe_cb->ppe_common_cb is OK to dereference.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Yisen Zhuang <yisen.zhuang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is because table_open() depends on file->f_op to tell which
seq_file ops should be passed down. But, the original file ops in
file->f_op is replaced by "debugfs_full_proxy_file_operations" with
commit 49d200deaa68 ("debugfs: prevent access to removed files'
private data").
Currently, I can think up 2 solutions: 1st, replace
debugfs_create_file() with debugfs_create_file_unsafe();
2nd, make different table_open#() accordingly. The 1st one
is neat, but I don't thoroughly understand its risk. Maybe
someone has a better one.
Signed-off-by: Eric Ren <zren@suse.com> Signed-off-by: David Teigland <teigland@redhat.com>
Chen-Yu Tsai [Thu, 25 Aug 2016 06:26:59 +0000 (14:26 +0800)]
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
The bootloader (U-boot) sometimes uses this timer for various delays.
It uses it as a ongoing counter, and does comparisons on the current
counter value. The timer counter is never stopped.
In some cases when the user interacts with the bootloader, or lets
it idle for some time before loading Linux, the timer may expire,
and an interrupt will be pending. This results in an unexpected
interrupt when the timer interrupt is enabled by the kernel, at
which point the event_handler isn't set yet. This results in a NULL
pointer dereference exception, panic, and no way to reboot.
Clear any pending interrupts after we stop the timer in the probe
function to avoid this.
Cc: stable@vger.kernel.org Signed-off-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
drivers/clocksource/pistachio: Fix memory corruption in init
Driver init code incorrectly uses the block base address and as a result
clears clocksource structure's fields instead of the hardware registers.
Commit 09a998201649 ("timekeeping: Lift clocksource cacheline
restriction") has changed the offsets within pistachio_clocksource
structure and what has previously gone unnoticed now leads to a kernel
panic during boot.
Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Gao Feng [Thu, 25 Aug 2016 01:45:39 +0000 (09:45 +0800)]
8139cp: Fix one possible deadloop in cp_rx_poll
When cp_rx_poll does not get enough packet, it will check the rx
interrupt status again. If so, it will jumpt to rx_status_loop again.
But the goto jump resets the rx variable as zero too.
As a result, it causes one possible deadloop. Assume this case,
rx_status_loop only gets the packet count which is less than budget,
and (cpr16(IntrStatus) & cp_rx_intr_mask) condition is always true.
It causes the deadloop happens and system is blocked.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change makes a common flow for Client instance open during init
and reset path. The Client subtask can handle both the cases instead of
making a separate notify_client_of_open call.
Also it may fix a bug during reset where the service task was leaking
some memory and causing issues.
Change-Id: I7232a32fd52b82e863abb54266fa83122f80a0cd Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>