git.karo-electronics.de Git - linux-beck.git/log
Gabriel Krisman Bertazi [Wed, 28 Sep 2016 03:24:24 +0000 (00:24 -0300)]
blk-mq: Always schedule hctx->next_cpu

commit c02ebfdddbafa9a6a0f52fbd715e6bfa229af9d3 upstream.

Commit 0e87e58bf60e ("blk-mq: improve warning for running a queue on the
wrong CPU") attempts to avoid triggering the WARN_ON in
__blk_mq_run_hw_queue when the expected CPU is dead.  The problem is that,
in the last batch execution before round robin, blk_mq_hctx_next_cpu can
schedule a dead CPU and also update next_cpu to the next alive CPU in
the mask, which triggers the WARN_ON despite the previous
workaround.

The following patch fixes this scenario by always scheduling the value
in hctx->next_cpu.  This changes the moment when we round-robin the CPU
running the hctx, but it really doesn't matter, since it still executes
BLK_MQ_CPU_WORK_BATCH times in a row before switching to another CPU.
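The sketch below is a minimal user-space model of the round-robin selection as described above. It assumes a simplified context in which `struct hw_ctx`, `hctx_next_cpu()` and `WORK_BATCH` are hypothetical stand-ins for `struct blk_mq_hw_ctx`, `blk_mq_hctx_next_cpu()` and `BLK_MQ_CPU_WORK_BATCH`. The CPU mask is modelled as a plain array, and CPUs going offline are not modelled. What it shows is that the function advances first and then always returns the stored `next_cpu`, so the CPU handed to the caller and `next_cpu` cannot disagree.

```c
#include <stddef.h>

#define WORK_BATCH 8  /* hypothetical stand-in for BLK_MQ_CPU_WORK_BATCH */

/* Simplified, hypothetical model of a hardware context. */
struct hw_ctx {
	const int *cpus;   /* CPUs this context may run on */
	size_t nr_cpus;    /* number of entries in cpus[] */
	size_t idx;        /* position of next_cpu in cpus[] */
	int next_cpu;      /* CPU the next run should be scheduled on */
	int batch;         /* runs left before moving to another CPU */
};

static void hw_ctx_init(struct hw_ctx *h, const int *cpus, size_t nr)
{
	h->cpus = cpus;
	h->nr_cpus = nr;
	h->idx = 0;
	h->next_cpu = cpus[0];
	h->batch = WORK_BATCH;
}

/*
 * Move to the next CPU once the batch is used up, then always return
 * the stored next_cpu, so the scheduled CPU equals next_cpu.
 */
static int hctx_next_cpu(struct hw_ctx *h)
{
	if (--h->batch <= 0) {
		h->idx = (h->idx + 1) % h->nr_cpus;  /* wrap to the first CPU */
		h->next_cpu = h->cpus[h->idx];
		h->batch = WORK_BATCH;
	}
	return h->next_cpu;
}
```

Apart from the first, shorter batch, each CPU is still returned WORK_BATCH times in a row. Only the point at which the switch happens moves.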

Fixes: 0e87e58bf60e ("blk-mq: improve warning for running a queue on the wrong CPU")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrew F. Davis [Fri, 4 Nov 2016 18:33:13 +0000 (13:33 -0500)]
power: supply: bq27xxx_battery: Fix register map for BQ27510 and BQ27520

commit 3bee9ea1de687925d116670f036599cbed8b66b0 upstream.

The BQ27510 and BQ27520 use a slightly different register map than the
BQ27500, so add a new type enum and add these gauges to it.

Fixes: d74534c27775 ("power: bq27xxx_battery: Add support for additional bq27xxx family devices")
Based-on-patch-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tony Lindgren [Tue, 15 Nov 2016 03:38:31 +0000 (19:38 -0800)]
bq24190_charger: Fix PM runtime use for bq24190_battery_set_property

commit 075eb5719d53e8bb4a406ad87e1de99319aa50f0 upstream.

There's a typo: it should do pm_runtime_get_sync, not put.

Fixes: d7bf353fd0aa3 ("bq24190_charger: Add support for TI BQ24190 Battery Charger")
Cc: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Sebastian Reichel <sre@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Yongjun [Sat, 17 Sep 2016 00:41:37 +0000 (00:41 +0000)]
iw_cxgb4: Fix error return code in c4iw_rdev_open()

commit 15f7e3c21b76598bc6e5816d2577ce843b2b963f upstream.

Return the error code -ENOMEM from the __get_free_page() error
handling case instead of 0, as is done elsewhere in this function.
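The bug class fixed here is a `ret` variable that still holds 0 from an earlier successful step when a later allocation fails. Below is a minimal user-space sketch of the corrected pattern. `struct rdev_model`, `setup_queues()` and `rdev_open()` are hypothetical names, and `malloc()` stands in for `__get_free_page()`.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device state; status_page models __get_free_page(). */
struct rdev_model {
	void *status_page;
};

/* Hypothetical earlier step that succeeds and leaves ret == 0. */
static int setup_queues(struct rdev_model *r)
{
	(void)r;
	return 0;
}

static int rdev_open(struct rdev_model *r, int fail_alloc)
{
	int ret;

	ret = setup_queues(r);
	if (ret)
		return ret;

	/* fail_alloc lets a caller simulate the allocation failing */
	r->status_page = fail_alloc ? NULL : malloc(4096);
	if (!r->status_page) {
		ret = -ENOMEM;  /* without this, ret would still be 0 */
		goto err;
	}
	return 0;

err:
	/* real code would undo setup_queues() here */
	return ret;
}
```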

Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jacob Pan [Mon, 28 Nov 2016 21:53:11 +0000 (13:53 -0800)]
powercap/intel_rapl: fix and tidy up error handling

commit cb43f81b8489dcb87555e16c17453f0a9fa690f2 upstream.

Commit e1399ba20eee ("powercap / RAPL: handle missing MSRs") added the
contraint_to_pl() function to return an index into an array. However, it
can return -EINVAL if the powercap layer sends an out-of-range
constraint ID. This patch adds a sanity check.
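The sketch below shows the kind of sanity check described. It assumes a hypothetical lookup, `constraint_to_index()`, that behaves like `contraint_to_pl()`: it returns either an array index or -EINVAL. The caller must reject the negative value before using it as an index. The struct, array size and values are illustrative only.

```c
#include <errno.h>

#define NR_POWER_LIMITS 2  /* hypothetical size of the per-domain array */

/* Hypothetical per-domain data indexed by power-limit number. */
struct domain_model {
	int constraint_id[NR_POWER_LIMITS];  /* constraint ID for each slot */
	long limit_uw[NR_POWER_LIMITS];      /* power limit in microwatts */
};

/* Return the array index for constraint cid, or -EINVAL if unknown. */
static int constraint_to_index(const struct domain_model *d, int cid)
{
	for (int i = 0; i < NR_POWER_LIMITS; i++)
		if (d->constraint_id[i] == cid)
			return i;
	return -EINVAL;
}

static int get_power_limit(const struct domain_model *d, int cid, long *out)
{
	int id = constraint_to_index(d, cid);

	if (id < 0)          /* the sanity check: never index with -EINVAL */
		return id;
	*out = d->limit_uw[id];
	return 0;
}
```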

The unnecessary RAPL domain pointer check is removed, since the domain
must be initialized before rapl_unit_xlate() is called.

Fixes: e1399ba20eee ("powercap / RAPL: handle missing MSRs")
Reported-by: Odzioba, Lukasz <lukasz.odzioba@intel.com>
Reported-by: Koss, Marcin <marcin.koss@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Prarit Bhargava [Wed, 30 Nov 2016 13:19:39 +0000 (08:19 -0500)]
ACPI / APEI: Fix NMI notification handling

commit a545715d2dae8d071c5b06af947b07ffa846b288 upstream.

When removing and adding CPU 0 on a system with GHES NMI, the following
stack trace is seen when re-adding the CPU:

WARNING: CPU: 0 PID: 0 at arch/x86/kernel/apic/apic.c:1349 setup_local_APIC+
Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache coretemp intel_ra
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.0-rc6+ #2
Call Trace:
 dump_stack+0x63/0x8e
 __warn+0xd1/0xf0
 warn_slowpath_null+0x1d/0x20
 setup_local_APIC+0x275/0x370
 apic_ap_setup+0xe/0x20
 start_secondary+0x48/0x180
 set_init_arg+0x55/0x55
 early_idt_handler_array+0x120/0x120
 x86_64_start_reservations+0x2a/0x2c
 x86_64_start_kernel+0x13d/0x14c

During the cpu bringup, wakeup_cpu_via_init_nmi() is called and issues an
NMI on CPU 0.  The GHES NMI handler, ghes_notify_nmi(), runs the
ghes_proc_irq_work work queue, which ends up setting IRQ_WORK_VECTOR
(0xf6).  The "faulty" IR line set at arch/x86/kernel/apic/apic.c:1349 is also
0xf6 (specifically APIC IRR for irqs 255 to 224 is 0x400000) which confirms
that something has set the IRQ_WORK_VECTOR line prior to the APIC being
initialized.
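The IRR value quoted above can be checked directly. The local APIC exposes its 256 vectors as eight 32-bit IRR registers, so vector v sits in register v / 32 at bit v % 32. For IRQ_WORK_VECTOR (0xf6 = 246), that is register 7, which covers vectors 224 to 255, at bit 22, giving 0x400000. The two helpers below are hypothetical and only perform this arithmetic.

```c
#include <stdint.h>

/* Which of the eight 32-bit IRR registers holds a vector. */
static unsigned int irr_reg(unsigned int vector)
{
	return vector / 32;
}

/* Bit pattern that a pending vector sets within its IRR register. */
static uint32_t irr_bit(unsigned int vector)
{
	return UINT32_C(1) << (vector % 32);
}
```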

Commit 2383844d4850 ("GHES: Elliminate double-loop in the NMI handler")
incorrectly modified the behavior such that the handler returns
NMI_HANDLED only if an error was processed, and incorrectly runs the ghes
work queue for every NMI.

This patch makes ghes_proc_irq_work() run as it did prior to
2383844d4850 ("GHES: Elliminate double-loop in the NMI handler") by
properly returning NMI_HANDLED and only queueing the work if
NMI_HANDLED has been set.
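The sketch below is a user-space model of the control flow described: claim the NMI, and queue deferred work, only when an error source actually reported something. `NMI_DONE` and `NMI_HANDLED` are redefined locally. `pending[]`, `source_has_error()` and `queue_deferred_work()` are hypothetical placeholders for the per-source status check and the irq_work queueing.

```c
#include <stdbool.h>

enum { NMI_DONE = 0, NMI_HANDLED = 1 };  /* local stand-ins */

#define NR_SOURCES 3

static bool pending[NR_SOURCES];  /* hypothetical per-source error flags */
static int work_queued;           /* times deferred work was queued */

static bool source_has_error(int i) { return pending[i]; }
static void queue_deferred_work(void) { work_queued++; }

static int notify_nmi(void)
{
	int ret = NMI_DONE;

	for (int i = 0; i < NR_SOURCES; i++) {
		if (!source_has_error(i))
			continue;
		pending[i] = false;  /* "process" the error record */
		ret = NMI_HANDLED;
	}

	/* Only kick deferred processing if this NMI was actually ours. */
	if (ret == NMI_HANDLED)
		queue_deferred_work();

	return ret;
}
```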

Fixes: 2383844d4850 (GHES: Elliminate double-loop in the NMI handler)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Thu, 10 Nov 2016 16:16:37 +0000 (11:16 -0500)]
block: cfq_cpd_alloc() should use @gfp

commit ebc4ff661fbe76781c6b16dfb7b754a5d5073f8e upstream.

cfq_cpd_alloc(), which is the cpd_alloc_fn implementation for cfq, was
incorrectly hard-coding GFP_KERNEL instead of using the mask specified
through the @gfp parameter.  This currently doesn't cause any actual
issues because all current callers specify GFP_KERNEL.  Fix it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e4a9bde9589f ("blkcg: replace blkcg_policy->cpd_size with ->cpd_alloc/free_fn() methods")
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tobias Klauser [Fri, 18 Nov 2016 14:16:06 +0000 (15:16 +0100)]
block: Change extern inline to static inline

commit 9a05e7541c39680d28ecf91892338e074738d5fd upstream.

With compilers which follow the C99 standard (like modern versions of
gcc and clang), "extern inline" does the opposite thing from older
versions of gcc (emits code for an externally linkable version of the
inline function).

"static inline" has the intended behavior in all cases.

Description taken from commit 6d91857d4826 ("staging, rtl8192e,
LLVMLinux: Change extern inline to static inline").

This also fixes the following GCC warning when building with CONFIG_PM
disabled:

  ./include/linux/blkdev.h:1143:20: warning: no previous prototype for 'blk_set_runtime_active' [-Wmissing-prototypes]
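The header pattern this change relies on can be sketched as follows: a real prototype when the feature is built in, and a `static inline` no-op stub otherwise. In this self-contained sketch, `FEATURE_PM`, `struct queue` and `set_runtime_active()` are hypothetical stand-ins for `CONFIG_PM`, `struct request_queue` and `blk_set_runtime_active()`.

```c
/* Would normally live in a header included by many .c files. */
struct queue {
	int rpm_status;  /* hypothetical runtime-PM state */
};

#ifdef FEATURE_PM
/* Built in: an ordinary external declaration, defined in one .c file. */
void set_runtime_active(struct queue *q);
#else
/*
 * Built out: a static inline stub. Each includer gets a private copy
 * (normally optimized away), so no external symbol is emitted and
 * -Wmissing-prototypes does not apply, under C99 and gnu89 alike.
 */
static inline void set_runtime_active(struct queue *q)
{
	(void)q;
}
#endif
```

With `extern inline` in the `#else` branch instead, a C99 compiler would treat the stub as an external definition. That is the behavior the message describes.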

Fixes: d07ab6d11477 ("block: Add blk_set_runtime_active()")
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Wed, 30 Nov 2016 19:22:54 +0000 (22:22 +0300)]
ACPI / CPPC: set an error code on probe error path

commit 501634759d55a5b56967de6d9465acf02bbc3565 upstream.

We should return -EINVAL (instead of 0) if get_cpu_device() fails.

Fixes: 158c998ea44b (ACPI / CPPC: add sysfs support to compute delivered performance)
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Charles Keepax [Thu, 10 Nov 2016 10:45:18 +0000 (10:45 +0000)]
regulators: helpers: Fix handling of bypass_val_on in get_bypass_regmap

commit 85b037442e3f0e84296ab1010fd6b057eee18496 upstream.

The handling of bypass_val_on that was added in
regulator_get_bypass_regmap is done unconditionally; however,
several drivers don't define a value for bypass_val_on. This
results in those drivers reporting bypass as enabled when
it is not. In regulator_set_bypass_regmap we use bypass_mask
if bypass_val_on is zero. This patch adds similar handling in
regulator_get_bypass_regmap.
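The sketch below models the read-side logic described. It assumes a hypothetical `struct bypass_desc` carrying the two fields named in the message. A zero `bypass_val_on` means the driver did not set it, so the whole `bypass_mask` is treated as the "on" value, mirroring the set path.

```c
#include <stdbool.h>

/* Hypothetical subset of a regulator descriptor. */
struct bypass_desc {
	unsigned int bypass_mask;    /* register bits that control bypass */
	unsigned int bypass_val_on;  /* value meaning "on"; 0 if not set */
};

/* Decode the bypass state from a raw register value. */
static bool bypass_enabled(const struct bypass_desc *d, unsigned int reg)
{
	unsigned int val_on = d->bypass_val_on;

	if (!val_on)  /* driver left it unset: fall back to the mask */
		val_on = d->bypass_mask;

	return (reg & d->bypass_mask) == val_on;
}
```

Without the fallback, a descriptor with `bypass_val_on == 0` would report bypass as enabled whenever the masked bits are clear, which is the bug described above.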

Fixes: commit dd1a571daee7 ("regulator: helpers: Ensure bypass register field matches ON value")
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Denis Kirjanov [Tue, 8 Nov 2016 10:39:28 +0000 (05:39 -0500)]
cpufreq: powernv: Disable preemption while checking CPU throttling state

commit 8a10c06a20ec8097a68fd7a4a1c0e285095b4d2f upstream.

With preemption enabled, we can read an incorrect throttling state
if we are migrated to a CPU on a different chip while reading it.

 BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343
 caller is .powernv_cpufreq_throttle_check+0x2c/0x710
 CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
 Call Trace:
 [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable)
 [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150
 [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
 [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360
 [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0
 [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0
 [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0
 [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100
 [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0
 [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260
 [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0
 [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230
 [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100
 [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc

Fixes: 09a972d16209 (cpufreq: powernv: Report cpu frequency throttling)
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paul Mackerras [Fri, 11 Nov 2016 05:55:03 +0000 (16:55 +1100)]
powerpc/64: Simplify adaptation to new ISA v3.00 HPTE format

commit 6b243fcfb5f1e16bcf732e6f86a63f8af5b59a9f upstream.

This changes the way that we support the new ISA v3.00 HPTE format.
Instead of adapting everything that uses HPTE values to handle either
the old format or the new format, depending on which CPU we are on,
we now convert explicitly between old and new formats if necessary
in the low-level routines that actually access HPTEs in memory.
This limits the amount of code that needs to know about the new
format and makes the conversions explicit.  This is OK because the
old format contains all the information that is in the new format.

This also fixes operation under a hypervisor, because the H_ENTER
hypercall (and other hypercalls that deal with HPTEs) will continue
to require the HPTE value to be supplied in the old format.  At
present the kernel will not boot in HPT mode on POWER9 under a
hypervisor.

This fixes and partially reverts commit 50de596de8be
("powerpc/mm/hash: Add support for Power9 Hash", 2016-04-29).

Fixes: 50de596de8be ("powerpc/mm/hash: Add support for Power9 Hash")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Yongjun [Mon, 17 Oct 2016 16:23:35 +0000 (16:23 +0000)]
remoteproc: st: Fix error return code in st_rproc_probe()

commit 1d701d3dd8caf6660ff33c3c23a115b4649c5cdb upstream.

Return a negative error code from the st_rproc_state() error
handling case instead of 0, as is done elsewhere in this function.

Fixes: 63edb0310a5c ("remoteproc: Supply controller driver for ST's Remote Processors")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bjorn Andersson [Fri, 4 Nov 2016 02:37:25 +0000 (19:37 -0700)]
remoteproc: qcom_wcnss: Fix circular module dependency

commit 6de1a507c46bf22ed97043495b9ab96e4d5c213b upstream.

The tie between the main WCNSS driver and the IRIS driver causes a
circular dependency between the two modules. Neither part makes sense
on its own, so let's merge them into one module.

For the sake of picking up the clock and regulator resources described
in the iris of_node we need an associated struct device. But, to keep
the size of the patch down we continue to represent the IRIS part as its
own platform_driver, within the same module, rather than setting up a
dummy device.

Fixes: aed361adca9f ("remoteproc: qcom: Introduce WCNSS peripheral image loader")
Reported-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chris Wilson [Wed, 30 Nov 2016 20:51:26 +0000 (20:51 +0000)]
drm: Initialise drm_mm.head_node.allocated

commit cc98e6ce6abe1c0103cbd7aff1ee586622a9361e upstream.

commit 202b52b7fbf7 ("drm: Track drm_mm nodes with an interval tree")
introduced a requirement that the special drm_mm.head_node was
initialised and marked as not being allocated. It is a very special node
that has no size but has a hole that represents the drm_mm address
space, and holds the list of nodes. Since it is not a real node, it is
not part of the node rbtree and we detect this as it being unallocated.
This presumed that drm_mm_init() was initialising it to zero. It happens
that i915 kzallocs its objects and so it was accidentally setting it,
but for generic use we cannot make that assumption.
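The minimal sketch below shows why the explicit initialisation matters. It uses hypothetical, much-reduced `struct mm_node` and `struct mm` types. If the containing object comes from a non-zeroing allocator, the head node's `allocated` flag holds whatever was in memory unless the init function clears it itself.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, much-reduced versions of drm_mm_node / drm_mm. */
struct mm_node {
	bool allocated;
};

struct mm {
	struct mm_node head_node;   /* sentinel, never a real allocation */
	unsigned long start, size;  /* managed address range */
};

static void mm_init(struct mm *mm, unsigned long start, unsigned long size)
{
	mm->start = start;
	mm->size = size;
	/* Mark the sentinel unallocated explicitly instead of relying on
	 * the caller having zeroed *mm (as a kzalloc'd object would be). */
	mm->head_node.allocated = false;
}

/* Allocate without zeroing, like kmalloc() rather than kzalloc(). */
static struct mm *mm_create(unsigned long start, unsigned long size)
{
	struct mm *mm = malloc(sizeof(*mm));

	if (mm)
		mm_init(mm, start, size);
	return mm;
}
```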

[   22.981519] general protection fault: 0000 [#1] SMP
[   22.981521] Modules linked in: test_drm_mm(+) ctr ccm arc4 rt2800usb rt2x00usb rt2800lib rt2x00lib crc_ccitt mac80211 cmac rfcomm bnep snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel dcdbas snd_hda_codec x86_pkg_temp_thermal intel_powerclamp btusb snd_hda_core coretemp crct10dif_pclmul cfg80211 btrtl btbcm btintel bluetooth crc32_pclmul ghash_clmulni_intel aesni_intel snd_pcm i2c_hid aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd snd_timer hid_multitouch snd joydev serio_raw lpc_ich mfd_core i2c_designware_platform i2c_designware_core 8250_dw binfmt_misc soundcore acpi_pad nls_iso8859_1 usbhid hid psmouse ahci libahci [last unloaded: test_drm_mm]
[   22.981544] CPU: 1 PID: 2088 Comm: drm_mm Tainted: G        W       4.9.0-rc7+ #234
[   22.981545] Hardware name: Dell Inc. XPS 13 9343/0310JH, BIOS A07 11/11/2015
[   22.981546] task: ffff88020c971cc0 task.stack: ffffc90001728000
[   22.981547] RIP: 0010:[<ffffffff814050f0>]  [<ffffffff814050f0>] drm_mm_interval_tree_add_node+0xa0/0xd0
[   22.981551] RSP: 0018:ffffc9000172ba98  EFLAGS: 00010202
[   22.981552] RAX: 0f0000c69cf63d80 RBX: ffff88020be00000 RCX: ffff88020be00000
[   22.981553] RDX: 0000000000000fff RSI: ffffc9000172bc48 RDI: ffffffff810ac4df
[   22.981553] RBP: ffffc9000172bb08 R08: ffffc9000172bc70 R09: 0000000000000fff
[   22.981554] R10: ffffffff810ac4d7 R11: 4dc04d8b4cffffe5 R12: 0000000000001000
[   22.981555] R13: ffffc9000172bbd0 R14: ffffc9000172bbe0 R15: 0000000002000000
[   22.981556] FS:  00007f80c9fab740(0000) GS:ffff88021f480000(0000) knlGS:0000000000000000
[   22.981557] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.981558] CR2: 00007f80c9fd5000 CR3: 000000020c191000 CR4: 00000000003406e0
[   22.981559] Stack:
[   22.981560]  ffffffff81405d09 ffff88020be00000 ffffc9000172bbe0 000000000172bb08
[   22.981562]  ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
[   22.981563]  0000000002000000 0000000002000000 ffffffffa02f3000 ffff88020be00000
[   22.981565] Call Trace:
[   22.981568]  [<ffffffff81405d09>] ? drm_mm_insert_node_generic+0x229/0x310
[   22.981570]  [<ffffffffa02f3000>] ? 0xffffffffa02f3000
[   22.981572]  [<ffffffffa02903c1>] __subtest_insert_range.constprop.7+0xd1/0x5b0 [test_drm_mm]
[   22.981575]  [<ffffffff81081222>] ? default_wake_function+0x12/0x20
[   22.981576]  [<ffffffff81096905>] ? __wake_up_common+0x55/0x90
[   22.981578]  [<ffffffff81085f42>] ? sched_clock_cpu+0x72/0xa0
[   22.981581]  [<ffffffff811308ad>] ? irq_work_queue+0xd/0x80
[   22.981582]  [<ffffffff810abcc4>] ? wake_up_klogd+0x34/0x40
[   22.981584]  [<ffffffff810ac19d>] ? console_unlock+0x4cd/0x530
[   22.981585]  [<ffffffff810ac4d7>] ? vprintk_emit+0x2d7/0x490
[   22.981587]  [<ffffffff810ac82f>] ? vprintk_default+0x1f/0x30
[   22.981589]  [<ffffffff81146e1c>] ? printk+0x4d/0x4f
[   22.981590]  [<ffffffffa02f3000>] ? 0xffffffffa02f3000
[   22.981592]  [<ffffffffa02908b5>] subtest_insert_range+0x15/0x80 [test_drm_mm]
[   22.981594]  [<ffffffffa02f3088>] test_drm_mm_init+0x88/0x1000 [test_drm_mm]
[   22.981597]  [<ffffffff8100043d>] do_one_initcall+0x3d/0x150
[   22.981600]  [<ffffffff8119dfbf>] ? kfree+0x13f/0x180
[   22.981602]  [<ffffffff811471f2>] do_init_module+0x60/0x1f1
[   22.981606]  [<ffffffff810db878>] load_module+0x2228/0x2790
[   22.981608]  [<ffffffff810d8590>] ? __symbol_put+0x40/0x40
[   22.981612]  [<ffffffff811c52b1>] ? kernel_read+0x41/0x60
[   22.981614]  [<ffffffff810dbfb6>] SYSC_finit_module+0x96/0xd0
[   22.981617]  [<ffffffff810dc00e>] SyS_finit_module+0xe/0x10
[   22.981620]  [<ffffffff816e7aa4>] entry_SYSCALL_64_fastpath+0x17/0x98
[   22.981622] Code: c7 41 30 00 00 00 00 48 89 e5 48 89 3a 48 c7 c2 20 4e 40 81 e8 b2 a1 f0 ff 5d c3 48 8d 56 78 45 31 d2 48 89 d6 eb 25 48 8b 51 58 <48> 39 50 38 73 04 48 89 50 38 4c 8b 58 28 4c 39 59 48 48 8d 50
[   22.981651] RIP  [<ffffffff814050f0>] drm_mm_interval_tree_add_node+0xa0/0xd0
[   22.981655]  RSP <ffffc9000172ba98>

Testcase: igt/drm_mm
Fixes: 202b52b7fbf7 ("drm: Track drm_mm nodes with an interval tree")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: <drm-intel-fixes@lists.freedesktop.org> # v4.9-rc1+
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161130205126.31106-1-chris@chris-wilson.co.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ville Syrjälä [Tue, 20 Dec 2016 15:39:02 +0000 (17:39 +0200)]
drm/i915: Move the min_pixclk[] handling to the end of readout

commit 00b2b7288299a8c73c0c37b531a075ba5c849e67 upstream.

Trying to determine the pixel rate of the pipe can't be done until we
know the clock, which means it can't be done until the encoder
.get_config() hooks have been called. So let's move the min_pixclk[]
stuff to the end of intel_modeset_readout_hw_state() when we actually
have gathered all the required information.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Fixes: 565602d7501a ("drm/i915: Do not acquire crtc state to check clock during modeset, v4.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20161220153902.15621-1-ville.syrjala@linux.intel.com
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
(cherry picked from commit aca1ebf491518910df156f3dab6a66306bb52e28)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen-Yu Tsai [Mon, 24 Oct 2016 13:21:15 +0000 (21:21 +0800)]
drm/panel: simple: Check against num_timings when setting preferred for timing

commit 230c5b44233ff0543c0b5ccf4ff9400057010fbe upstream.

In the loop on .timings, we should check .num_timings to see if it's the
only mode specified, not .num_modes, which should be used with .modes.

Fixes: cda553725c92 ("drm/panel: simple: Set appropriate mode type")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Mon, 17 Oct 2016 22:13:39 +0000 (00:13 +0200)]
drm: avoid uninitialized timestamp use in wait_vblank

commit cff52e5fc4cfc978b7df898dc14a0492c7ef0ae8 upstream.

gcc warns about the timestamp in drm_wait_vblank being possibly
used without an initialization:

drivers/gpu/drm/drm_irq.c: In function 'drm_crtc_send_vblank_event':
drivers/gpu/drm/drm_irq.c:992:24: error: 'now.tv_usec' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/gpu/drm/drm_irq.c:1069:17: note: 'now.tv_usec' was declared here
drivers/gpu/drm/drm_irq.c:991:23: error: 'now.tv_sec' may be used uninitialized in this function [-Werror=maybe-uninitialized]

This can happen if drm_vblank_count_and_time() returns 0 in its
error path. To sanitize the error case, I'm changing that function
to return a zero timestamp when it fails.
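The sketch below illustrates the "always initialise the output, even on failure" pattern. It uses a hypothetical `vblank_count_and_time()`; the success-path values are placeholders. Because the error path writes a zeroed `struct timeval`, callers never copy uninitialised stack data, and the maybe-uninitialized warning has nothing to report.

```c
#include <stdbool.h>
#include <sys/time.h>

/*
 * Hypothetical stand-in for drm_vblank_count_and_time(): returns the
 * vblank count and fills *t. On error it returns 0 and still writes a
 * zero timestamp, so callers may use *t unconditionally.
 */
static unsigned int vblank_count_and_time(bool ok, struct timeval *t)
{
	if (!ok) {
		t->tv_sec = 0;
		t->tv_usec = 0;
		return 0;
	}
	t->tv_sec = 1000;  /* placeholder timestamp */
	t->tv_usec = 500;
	return 42;         /* placeholder vblank count */
}
```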

Fixes: e6ae8687a87b ("drm: idiot-proof vblank")
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20161017221355.1861551-6-arnd@arndb.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Imre Deak [Mon, 5 Dec 2016 16:27:38 +0000 (18:27 +0200)]
drm/i915/gen9: Fix PCODE polling during SAGV disabling

commit dccf82ad1775f2b9c36ec85e25e39d88c7e86818 upstream.

According to the previous patch, it is currently possible that we call
intel_do_sagv_disable() only once during the 1ms period and time out if
that call fails. In contrast, the spec says that we need to keep
retrying this request for a 1ms duration, so let's do this similarly to
the CDCLK change notification request.
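The user-space sketch below implements "keep re-issuing the request for the whole timeout window" rather than trying once. `request_fn`, `clock_fn` and `retry_request()` are hypothetical. The clock is injected so that the behavior can be exercised with a simulated time source. `fake_dev` and its callbacks are demonstration fakes, not a model of the real PCODE interface.

```c
#include <stdbool.h>

typedef bool (*request_fn)(void *ctx);     /* true once the request is acked */
typedef long long (*clock_fn)(void *ctx);  /* current time in microseconds */

/*
 * Re-issue req until it succeeds or timeout_us elapses; 0 on success,
 * -1 on timeout. One final attempt is made after the deadline so that
 * a request succeeding right at the boundary is not reported as failed.
 */
static int retry_request(request_fn req, clock_fn now, void *ctx,
			 long long timeout_us)
{
	long long deadline = now(ctx) + timeout_us;

	while (now(ctx) < deadline) {
		if (req(ctx))
			return 0;
	}
	return req(ctx) ? 0 : -1;
}

/* Demonstration fake: acks after ack_after attempts and advances a
 * simulated clock by 100 us on every clock read. */
struct fake_dev {
	int attempts, ack_after;
	long long t_us;
};

static bool fake_req(void *ctx)
{
	struct fake_dev *d = ctx;

	return ++d->attempts >= d->ack_after;
}

static long long fake_now(void *ctx)
{
	struct fake_dev *d = ctx;

	d->t_us += 100;
	return d->t_us;
}
```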

v4-5:
- Rebased on the reply_mask, reply change.
v6:
- Remove w/s change. (Lyude)
- Rebased on the timeout_base argument change.

Cc: Lyude <cpaul@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 656d1b89e5ff ("drm/i915/skl: Add support for the SAGV, fix underrun hangs")
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Lyude <lyude@redhat.com> (v4)
Link: http://patchwork.freedesktop.org/patch/msgid/1480955258-26311-2-git-send-email-imre.deak@intel.com
(cherry picked from commit b3b8e99984a4eace91bc097e8f8cec71441cae16)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Russell King [Sat, 17 Dec 2016 12:10:56 +0000 (12:10 +0000)]
i2c: mux: pca954x: fix i2c mux selection caching

commit 7f638c1cb0a1112dbe0b682a42db30521646686b upstream.

SMBus functions return a negative value on error and 0 on success.  However,
__i2c_transfer() has a different return convention: a negative value on
error, or the number of buffers transferred (which may be zero or greater).

The upshot of this is that the sense of the test is reversed when using
the mux on a bus supporting the master_xfer method: we cache the value
and never retry if we fail to transfer any buffers, but if we succeed,
we clear the cached value.

Fix this by making pca954x_reg_write() return a negative error code for
all failure cases.
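The sketch below normalises the two return conventions. It assumes a hypothetical `transfer_fn` that, like `__i2c_transfer()`, returns a negative errno or the number of messages transferred. Any result other than "exactly one message sent" becomes a negative error, so callers can treat 0 as success as they would for an SMBus call. Using `-EIO` for a short transfer is an assumption for illustration. The `xfer_*` functions are demonstration fakes.

```c
#include <errno.h>

/* Returns a negative errno, or the number of messages transferred. */
typedef int (*transfer_fn)(int nmsgs);

static int reg_write(transfer_fn xfer)
{
	int ret = xfer(1);  /* send a single message */

	if (ret < 0)
		return ret;     /* already a negative error code */
	if (ret != 1)
		return -EIO;    /* short transfer: report it as an error */
	return 0;           /* success, same convention as SMBus */
}

/* Demonstration fakes for the three possible outcomes. */
static int xfer_ok(int n)   { return n; }
static int xfer_none(int n) { (void)n; return 0; }
static int xfer_nak(int n)  { (void)n; return -ENXIO; }
```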

Fixes: 463e8f845cbf ("i2c: mux: pca954x: retry updating the mux selection on failure")
Acked-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Mon, 19 Dec 2016 00:19:31 +0000 (11:19 +1100)]
NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success.

commit cfd278c280f997cf2fe4662e0acab0fe465f637b upstream.

Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessarily true when the process receives a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d532b4 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trond Myklebust [Sat, 19 Nov 2016 15:54:55 +0000 (10:54 -0500)]
NFS: Fix a performance regression in readdir

commit 79f687a3de9e3ba2518b4ea33f38ca6cbe9133eb upstream.

Ben Coddington reports that commit 311324ad1713, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir(), causes a performance regression when
the directory is being modified.

If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.

The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad1713 ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Trond Myklebust [Fri, 18 Nov 2016 20:21:30 +0000 (15:21 -0500)]
pNFS: Fix race in pnfs_wait_on_layoutreturn

commit ee284e35d8c71bf5d4d807eaff6f67a17134b359 upstream.

We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.

Fixes: 500d701f336b ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Yongjun [Fri, 28 Oct 2016 14:37:02 +0000 (14:37 +0000)]
NFS: fix typo in parameter description

commit f36ab161bebe464d33b998294eff29b17a9c8918 upstream.

Fix typo in parameter description.

Fixes: 5405fc44c337 ("NFSv4.x: Add kernel parameter to control the callback server")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Neil Armstrong [Tue, 6 Dec 2016 14:08:16 +0000 (15:08 +0100)]
pinctrl: meson: fix gpio request disabling other modes

commit f24d311f92b516a8aadef5056424ccabb4068e7b upstream.

pinctrl_gpio_request() is called with the "full" GPIO number, which
already contains the base, and meson_pmx_request_gpio() is then called
with the final pin number.
Remove the base addition when calling meson_pmx_disable_other_groups.

Fixes: 6ac730951104 ("pinctrl: add driver for Amlogic Meson SoCs")
CC: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Kevin Hilman <khilman@baylibre.com>
Acked-by: Beniamino Galvani <b.galvani@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Mahoney [Tue, 20 Dec 2016 18:28:27 +0000 (13:28 -0500)]
btrfs: fix error handling when run_delayed_extent_op fails

commit aa7c8da35d1905d80e840d075f07d26ec90144b5 upstream.

In __btrfs_run_delayed_refs, the error path when run_delayed_extent_op
fails sets locked_ref->processing = 0 but doesn't re-increment
delayed_refs->num_heads_ready.  As a result, we end up triggering
the WARN_ON in btrfs_select_ref_head.

Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Reported-by: Jon Nelson <jnelson-suse@jamponi.net>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jeff Mahoney [Tue, 20 Dec 2016 18:28:28 +0000 (13:28 -0500)]
btrfs: fix locking when we put back a delayed ref that's too new

commit d0280996437081dd12ed1e982ac8aeaa62835ec4 upstream.

In __btrfs_run_delayed_refs, when we put back a delayed ref that's too
new, we have already dropped the lock on locked_ref when we set
->processing = 0.

This patch keeps the lock to cover that assignment.

Fixes: d7df2c796d7 (Btrfs: attach delayed ref updates to delayed ref heads)
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guilherme G. Piccoli [Thu, 29 Dec 2016 00:13:15 +0000 (22:13 -0200)]
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too

commit b5a10c5f7532b7473776da87e67f8301bbc32693 upstream.

Commit 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter
readiness") introduced a quirk for adapters that cannot read the bit
NVME_CSTS_RDY right after register NVME_REG_CC is set; these adapters
need a delay, or else reading the bit NVME_CSTS_RDY could somehow
corrupt the adapter's register state, from which it never recovers.

When this quirk was added, we checked ctrl->tagset in order to avoid
applying the quirk at probe time, assuming such a delay would never be
required during probe. That was too optimistic: we do need this quirk
at probe time in some cases, such as after a kexec.

In some experiments, after an abnormal shutdown of the machine (e.g. the
power cord being unplugged), we booted into our bootloader on Power, which
is a Linux kernel, and kexec'ed into another distro. If this kexec happens
too quickly, we reach the probe of the NVMe adapter in that distro while
the adapter is still in a bad state (not fully initialized by our
bootloader). What happens next is that nvme_wait_ready() is unable to
complete unless the quirk is enabled.

So, this patch removes the original ctrl->tagset check in order to
enable the quirk at probe time as well.
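
The change in the condition can be sketched as below. The quirk bit value, `struct ctrl`, and both helper functions are hypothetical; in the driver, a true result would mean sleeping before checking readiness.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical quirk bit and controller state, for illustration only. */
#define QUIRK_DELAY_BEFORE_CHK_RDY (1u << 0)

struct ctrl {
    unsigned int quirks;
    void *tagset;          /* NULL during probe, set once I/O queues exist */
};

/* Before the fix: the delay was skipped while probing (tagset == NULL). */
static bool needs_delay_old(const struct ctrl *c)
{
    return c->tagset != NULL && (c->quirks & QUIRK_DELAY_BEFORE_CHK_RDY);
}

/* After the fix: the quirk alone decides, so it also applies at probe time. */
static bool needs_delay_new(const struct ctrl *c)
{
    return (c->quirks & QUIRK_DELAY_BEFORE_CHK_RDY) != 0;
}
```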

Fixes: 54adc01055b7 ("nvme/quirk: Add a delay before checking for adapter readiness")
Reported-by: Andrew Byrne <byrneadw@ie.ibm.com>
Reported-by: Jaime A. H. Gomez <jahgomez@mx1.ibm.com>
Reported-by: Zachary D. Myers <zdmyers@us.ibm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Acked-by: Jeffrey Lien <Jeff.Lien@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agox86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command...
Lukasz Odzioba [Wed, 28 Dec 2016 13:55:40 +0000 (14:55 +0100)]
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option

commit dd853fd216d1485ed3045ff772079cc8689a9a4a upstream.

A negative number can be specified on the command line, which will then be
used as the setup_clear_cpu_cap() argument. With that, we can clear/set some
bit in memory preceding boot_cpu_data/cpu_caps_cleared, which may cause the
kernel to misbehave. This patch adds a lower-bound check to
setup_disablecpuid().
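
A sketch of the range check, assuming a userspace setting: the kernel parses the option with its own helpers, whereas this uses `strtol()`, and the `NCAPINTS` value and `clear_cpuid_arg()` are illustrative assumptions. A bit index is accepted only if 0 <= bit < NCAPINTS * 32, so neither a negative nor an oversized value can index outside the bitmap.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed size of the capability bitmap: NCAPINTS 32-bit words.
 * The value here is illustrative, not the kernel's. */
#define NCAPINTS 18
#define NBITS (NCAPINTS * 32)

static unsigned int caps_cleared[NCAPINTS];

/* Parse a "clearcpuid=<bit>" argument and clear the bit only if it lies
 * inside the bitmap.  Without the lower-bound check a negative value
 * would index memory before caps_cleared[]. */
static bool clear_cpuid_arg(const char *arg)
{
    char *end;
    long bit = strtol(arg, &end, 0);

    if (end == arg || *end != '\0')
        return false;             /* not a number */
    if (bit < 0 || bit >= NBITS)
        return false;             /* out of range: ignore it */
    caps_cleared[bit / 32] |= 1u << (bit % 32);
    return true;
}
```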

Boris Petkov reproduced a crash:

  [    1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540
  [    1.236535] IP: memcpy_erms+0x6/0x10

Signed-off-by: Lukasz Odzioba <lukasz.odzioba@intel.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: andi.kleen@intel.com
Cc: bp@alien8.de
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: slaoub@gmail.com
Fixes: ac72e7888a61 ("x86: add generic clearcpuid=... option")
Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoi2c: piix4: Avoid race conditions with IMC
Ricardo Ribalda Delgado [Wed, 11 Jan 2017 09:11:44 +0000 (10:11 +0100)]
i2c: piix4: Avoid race conditions with IMC

commit 701dc207bf551d9fe6defa36e84a911e880398c3 upstream.

On AMD's SB800 and upwards, the SMBus is shared with the Integrated
Micro Controller (IMC).

The platform provides a hardware semaphore to avoid race conditions
between them. (See page 288 of the SB800-Series Southbridges Register
Reference Guide: http://support.amd.com/TechDocs/45482.pdf)

Without this patch, many accesses to the SMBus end with an invalid
transaction or even with the bus stalled.
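
The usual pattern for such a semaphore is request, read back, and retry a bounded number of times. The sketch below simulates the register in memory; the 0x10 bit, the refusal model, releasing by clearing the bit, and all names are assumptions for illustration, and the register reference guide cited above is the authority on the real layout.

```c
#include <stdbool.h>

/* Hypothetical model of a shared hardware semaphore register: the host sets
 * HOST_OWN to request the bus and reads it back; the bit only sticks if the
 * other agent (here, the IMC) is not holding the semaphore. */
#define HOST_OWN 0x10u

struct sem_reg {
    unsigned int value;
    bool imc_busy;      /* simulated: the IMC currently owns the SMBus */
};

static void sem_write(struct sem_reg *r, unsigned int v)
{
    if ((v & HOST_OWN) && r->imc_busy)
        v &= ~HOST_OWN;          /* request refused by the "hardware" */
    r->value = v;
}

static unsigned int sem_read(const struct sem_reg *r)
{
    return r->value;
}

/* Try to take the semaphore a bounded number of times; give up (a driver
 * would report -EBUSY) if the IMC keeps the bus. */
static bool acquire_smbus(struct sem_reg *r, int retries)
{
    while (retries-- > 0) {
        sem_write(r, sem_read(r) | HOST_OWN);
        if (sem_read(r) & HOST_OWN)
            return true;
        /* a real driver would sleep briefly here before retrying */
    }
    return false;
}

static void release_smbus(struct sem_reg *r)
{
    sem_write(r, sem_read(r) & ~HOST_OWN);
}
```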

Reported-by: Alexandre Desnoyers <alex@qtec.com>
Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/mlx5: Only cancel recovery work when cleaning up device
Daniel Jurgens [Tue, 10 Jan 2017 20:33:39 +0000 (22:33 +0200)]
net/mlx5: Only cancel recovery work when cleaning up device

commit 5e44fca5047054f1762813751626b5245e0da022 upstream.

Do not attempt to drain the health workqueue when unloading the device in
the recovery flow; this can cause a deadlock when the recovery work
tries to cancel itself synchronously.

Because the work is no longer unconditionally canceled when unloading, it
must be explicitly canceled in the AER flow.
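
The control flow can be sketched with a `cleanup` flag on the shared unload path. `struct dev`, `unload_one()`, `aer_error_detected()` and the counter are hypothetical stand-ins, not the mlx5 functions; the counter only makes the cancellations observable.

```c
#include <stdbool.h>

/* Hypothetical device state; the counter stands in for real work items. */
struct dev {
    int recovery_cancels;   /* how often recovery work was cancelled */
};

static void cancel_recovery_work_sync(struct dev *d)
{
    d->recovery_cancels++;
}

/* Unload is shared by normal removal and by the recovery work itself.
 * Only a full cleanup cancels the recovery work: when called from inside
 * the recovery work, a synchronous cancel would wait for itself. */
static void unload_one(struct dev *d, bool cleanup)
{
    if (cleanup)
        cancel_recovery_work_sync(d);
    /* ... tear down the rest of the device ... */
}

/* The AER path no longer gets the cancel for free from unload_one(),
 * so it cancels explicitly before unloading. */
static void aer_error_detected(struct dev *d)
{
    cancel_recovery_work_sync(d);
    unload_one(d, false);
}
```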

Fixes: 689a248df83b ("net/mlx5: Cancel recovery work in remove flow")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoUSB: serial: ch341: fix modem-control and B0 handling
Johan Hovold [Fri, 6 Jan 2017 18:15:12 +0000 (19:15 +0100)]
USB: serial: ch341: fix modem-control and B0 handling

commit 030ee7ae52a46a2be52ccc8242c4a330aba8d38e upstream.

The modem-control signals are managed by the tty layer during open and
should not be asserted prematurely when set_termios is called from
driver open.

Also make sure that the signals are asserted only when changing speed
from B0.
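
The rule can be sketched as a pure function over the old and new settings. The `MCR_*` bit values, the use of speed 0 for B0, and `update_mcr()` are assumptions for illustration, not the ch341 driver's code.

```c
#include <stdbool.h>

/* Hypothetical modem-control bits; a speed of 0 stands in for B0. */
#define MCR_DTR 0x01u
#define MCR_RTS 0x02u

/* Compute new DTR/RTS state on a termios change.  Nothing is asserted when
 * there is no previous setting (the call from driver open), since the tty
 * layer raises the lines itself.  Lines drop on a change to B0 and are
 * raised only on a change away from B0. */
static unsigned int update_mcr(unsigned int mcr, bool have_old,
                               unsigned int old_speed, unsigned int new_speed)
{
    if (new_speed == 0)
        mcr &= ~(MCR_DTR | MCR_RTS);
    else if (have_old && old_speed == 0)
        mcr |= MCR_DTR | MCR_RTS;
    return mcr;
}
```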

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/amdgpu: drop verde dpm quirks
Alex Deucher [Thu, 5 Jan 2017 18:02:37 +0000 (13:02 -0500)]
drm/amdgpu: drop verde dpm quirks

commit 7192c54a68013f6058b1bb505645fcd07015191c upstream.

Port of radeon change to amdgpu.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/amdgpu: update si kicker smc firmware
Flora Cui [Thu, 15 Dec 2016 05:43:59 +0000 (13:43 +0800)]
drm/amdgpu: update si kicker smc firmware

commit 5165484b02f2cbedb5bf3a41ff5e8ae16069016c upstream.

Use the appropriate smc firmware for each chip revision.
Using the wrong one can cause stability issues.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: drop verde dpm quirks
Alex Deucher [Thu, 5 Jan 2017 17:39:01 +0000 (12:39 -0500)]
drm/radeon: drop verde dpm quirks

commit 8a08403bcb39f5d0e733bcf59a8a74f16b538f6e upstream.

fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=98897
https://bugs.launchpad.net/bugs/1651981

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Adrian Fiergolski <A.Fiergolski@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/radeon: update smc firmware selection for SI
Alex Deucher [Thu, 5 Jan 2017 17:15:52 +0000 (12:15 -0500)]
drm/radeon: update smc firmware selection for SI

commit 6458bd4dfd9414cba5804eb9907fe2a824278c34 upstream.

Use the appropriate smc firmware for each chip revision.
Using the wrong one can cause stability issues.

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm: Clean up planes in atomic commit helper failure path
Laurent Pinchart [Mon, 2 Jan 2017 23:14:27 +0000 (01:14 +0200)]
drm: Clean up planes in atomic commit helper failure path

commit aebe55c2d4b998741c0847ace1b4af47d73c763b upstream.

If waiting for fences fails for blocking commits, planes must be cleaned
up before returning.
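
This is the usual goto-unwind pattern. A minimal sketch, where `prepare_planes()`, `wait_for_fences()`, `cleanup_planes()` and `struct commit` are hypothetical stand-ins for the DRM helper steps:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical commit state; the flag lets us observe plane cleanup. */
struct commit {
    int planes_prepared;
    bool fences_ok;
};

static int prepare_planes(struct commit *c) { c->planes_prepared = 1; return 0; }
static void cleanup_planes(struct commit *c) { c->planes_prepared = 0; }
static int wait_for_fences(struct commit *c)
{
    return c->fences_ok ? 0 : -EINTR;   /* e.g. interrupted by a signal */
}

/* Blocking commit: every failure after prepare_planes() must unwind it. */
static int commit_blocking(struct commit *c)
{
    int ret = prepare_planes(c);
    if (ret)
        return ret;

    ret = wait_for_fences(c);
    if (ret)
        goto err_cleanup;   /* don't return with planes still prepared */

    /* ... swap state and commit to hardware ... */
    return 0;

err_cleanup:
    cleanup_planes(c);
    return ret;
}
```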

Fixes: f6ce410a59a4 ("drm/fence: allow fence waiting to be interrupted by userspace")
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170102231427.7192-1-laurent.pinchart@ideasonboard.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrm/i915/gen9: Fix PCODE polling timeout in stable backport
Imre Deak [Mon, 16 Jan 2017 16:59:46 +0000 (18:59 +0200)]
drm/i915/gen9: Fix PCODE polling timeout in stable backport

The backport of
2c7d0602c ("Fix PCODE polling during CDCLK change notification")
to the 4.9 stable tree used an incorrect timeout value. Fix this up
so the backport matches the upstream commit.

Reported-by: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agonet/af_iucv: don't use paged skbs for TX on HiperSockets
Julian Wiedmann [Tue, 10 Jan 2017 16:10:34 +0000 (17:10 +0100)]
net/af_iucv: don't use paged skbs for TX on HiperSockets

commit dc5367bcc556e97555fc94a32cd1aadbebdff47e upstream.

With commit e53743994e21
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.

cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: e53743994e21 ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosysctl: Drop reference added by grab_header in proc_sys_readdir
Zhou Chengming [Fri, 6 Jan 2017 01:32:32 +0000 (09:32 +0800)]
sysctl: Drop reference added by grab_header in proc_sys_readdir

commit 93362fa47fe98b62e4a34ab408c4a418432e7939 upstream.

This fixes CVE-2016-9191: proc_sys_readdir() doesn't drop the reference
taken by grab_header() when it returns early on the !dir_emit_dots()
path. As a result, any path that calls unregister_sysctl_table() can
wait forever.

The calltrace of CVE-2016-9191:

[ 5535.960522] Call Trace:
[ 5535.963265]  [<ffffffff817cdaaf>] schedule+0x3f/0xa0
[ 5535.968817]  [<ffffffff817d33fb>] schedule_timeout+0x3db/0x6f0
[ 5535.975346]  [<ffffffff817cf055>] ? wait_for_completion+0x45/0x130
[ 5535.982256]  [<ffffffff817cf0d3>] wait_for_completion+0xc3/0x130
[ 5535.988972]  [<ffffffff810d1fd0>] ? wake_up_q+0x80/0x80
[ 5535.994804]  [<ffffffff8130de64>] drop_sysctl_table+0xc4/0xe0
[ 5536.001227]  [<ffffffff8130de17>] drop_sysctl_table+0x77/0xe0
[ 5536.007648]  [<ffffffff8130decd>] unregister_sysctl_table+0x4d/0xa0
[ 5536.014654]  [<ffffffff8130deff>] unregister_sysctl_table+0x7f/0xa0
[ 5536.021657]  [<ffffffff810f57f5>] unregister_sched_domain_sysctl+0x15/0x40
[ 5536.029344]  [<ffffffff810d7704>] partition_sched_domains+0x44/0x450
[ 5536.036447]  [<ffffffff817d0761>] ? __mutex_unlock_slowpath+0x111/0x1f0
[ 5536.043844]  [<ffffffff81167684>] rebuild_sched_domains_locked+0x64/0xb0
[ 5536.051336]  [<ffffffff8116789d>] update_flag+0x11d/0x210
[ 5536.057373]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.064186]  [<ffffffff81167acb>] ? cpuset_css_offline+0x1b/0x60
[ 5536.070899]  [<ffffffff810fce3d>] ? trace_hardirqs_on+0xd/0x10
[ 5536.077420]  [<ffffffff817cf61f>] ? mutex_lock_nested+0x2df/0x450
[ 5536.084234]  [<ffffffff8115a9f5>] ? css_killed_work_fn+0x25/0x220
[ 5536.091049]  [<ffffffff81167ae5>] cpuset_css_offline+0x35/0x60
[ 5536.097571]  [<ffffffff8115aa2c>] css_killed_work_fn+0x5c/0x220
[ 5536.104207]  [<ffffffff810bc83f>] process_one_work+0x1df/0x710
[ 5536.110736]  [<ffffffff810bc7c0>] ? process_one_work+0x160/0x710
[ 5536.117461]  [<ffffffff810bce9b>] worker_thread+0x12b/0x4a0
[ 5536.123697]  [<ffffffff810bcd70>] ? process_one_work+0x710/0x710
[ 5536.130426]  [<ffffffff810c3f7e>] kthread+0xfe/0x120
[ 5536.135991]  [<ffffffff817d4baf>] ret_from_fork+0x1f/0x40
[ 5536.142041]  [<ffffffff810c3e80>] ? kthread_create_on_node+0x230/0x230

One cgroup maintainer mentioned that "cgroup is trying to offline
a cpuset css, which takes place under cgroup_mutex.  The offlining
ends up trying to drain active usages of a sysctl table which apparently
is not happening."
The real reason is that proc_sys_readdir doesn't drop the reference taken
by grab_header when it returns from the !dir_emit_dots path, so this
cpuset offline path waits here forever.
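
The fix follows the single-exit pattern: every return after the reference is taken goes through the label that drops it. A sketch with a hypothetical refcounted `struct header`; `grab_header()` and `sysctl_head_finish()` here are simplified stand-ins, and `emit_dots` models the result of dir_emit_dots().

```c
#include <stdbool.h>

/* Hypothetical refcounted header, standing in for the sysctl table header. */
struct header {
    int refs;
};

static struct header *grab_header(struct header *h) { h->refs++; return h; }
static void sysctl_head_finish(struct header *h) { h->refs--; }

/* emit_dots stands in for dir_emit_dots(); false means "stop early". */
static int readdir_fixed(struct header *table, bool emit_dots)
{
    struct header *head = grab_header(table);

    if (!emit_dots)
        goto out;        /* the buggy version returned 0 here, leaking a ref */

    /* ... emit the directory entries ... */
out:
    sysctl_head_finish(head);
    return 0;
}
```

Because the reference count returns to zero on both paths, a waiter that drains active users (as unregister_sysctl_table() does) is no longer blocked.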

See here for details: http://www.openwall.com/lists/oss-security/2016/11/04/13

Fixes: f0c3b5093add ("[readdir] convert procfs")
Reported-by: CAI Qian <caiqian@redhat.com>
Tested-by: Yang Shukui <yangshukui@huawei.com>
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoClearing FIFOs in RS485 emulation mode causes subsequent transmits to break
Daniel Jedrychowski [Sun, 11 Dec 2016 22:18:28 +0000 (09:18 +1100)]
Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break

commit 2bed8a8e70729f996af92042d3ad0f11870acc1f upstream.

When in RS485 emulation mode, __do_stop_tx_rs485() calls
serial8250_clear_fifos().  This not only clears the FIFOs, but also sets
all bits in their control register (UART_FCR) to 0.

One of the effects of this is the disabling of the FIFOs, which turns
them into single-byte holding registers.  The rest of the driver doesn't
know this, which results in the lion's share of the characters passed
into a write call being dropped.

(I can supply logic analyzer screenshots if necessary)

This fix replaces the serial8250_clear_fifos() call with
serial8250_clear_and_reinit_fifos(). This prevents the "dropped
characters" issue from manifesting again while retaining the requirement
of clearing the RX FIFO after transmission if the SER_RS485_RX_DURING_TX
flag is disabled.
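
A simplified model of the difference between the two helpers. It assumes the FCR bit values from `include/uapi/linux/serial_reg.h` and tracks the last value written to the (write-only) FCR in a struct; the real helpers also check the port's FIFO capability and go through serial_out(), which this sketch omits.

```c
#include <stdint.h>

/* FCR bit values as in include/uapi/linux/serial_reg.h. */
#define UART_FCR_ENABLE_FIFO 0x01
#define UART_FCR_CLEAR_RCVR  0x02
#define UART_FCR_CLEAR_XMIT  0x04

/* Simulated port: 'fcr_reg' is the last value written to the FCR,
 * 'fcr' is the configuration the driver chose at startup. */
struct port {
    uint8_t fcr_reg;
    uint8_t fcr;
};

static void write_fcr(struct port *p, uint8_t v) { p->fcr_reg = v; }

/* Clear both FIFOs, then write 0: this leaves the FIFOs disabled. */
static void clear_fifos(struct port *p)
{
    write_fcr(p, UART_FCR_ENABLE_FIFO);
    write_fcr(p, UART_FCR_ENABLE_FIFO | UART_FCR_CLEAR_RCVR | UART_FCR_CLEAR_XMIT);
    write_fcr(p, 0);
}

/* Clear, then restore the driver's configuration so the FIFOs stay enabled. */
static void clear_and_reinit_fifos(struct port *p)
{
    clear_fifos(p);
    write_fcr(p, p->fcr);
}
```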

Signed-off-by: Daniel Jedrychowski <avistel@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoextcon: return error code on failure
Pan Bian [Sat, 3 Dec 2016 08:56:49 +0000 (16:56 +0800)]
extcon: return error code on failure

commit 5b11ebedd6a8bb4271b796e498cd15c0fe1133b6 upstream.

Function get_zeroed_page() returns 0 (a NULL address) if there is not
enough memory. In function extcon_sync(), 0 is returned if the call to
get_zeroed_page() fails. In this context a return value of 0 indicates
success, which is inconsistent with the actual execution status. This
patch fixes the bug by returning -ENOMEM.
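
The pattern in userspace terms, assuming `calloc()` as a stand-in for get_zeroed_page() and a hypothetical `sync_state()` in place of extcon_sync():

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical sync routine; calloc() stands in for get_zeroed_page(). */
static int sync_state(size_t bufsize)
{
    char *buf = calloc(1, bufsize);

    if (!buf)
        return -ENOMEM;     /* report the failure instead of returning 0 */

    /* ... format the uevent strings into buf and send them ... */
    free(buf);
    return 0;
}
```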

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=188611

Signed-off-by: Pan Bian <bianpan2016@163.com>
Fixes: a580982f0836e
Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosysrq: attach sysrq handler correctly for 32-bit kernel
Akinobu Mita [Thu, 5 Jan 2017 17:14:16 +0000 (02:14 +0900)]
sysrq: attach sysrq handler correctly for 32-bit kernel

commit 802c03881f29844af0252b6e22be5d2f65f93fd0 upstream.

The sysrq input handler should be attached to any input device that has
a left alt key.

On 32-bit kernels, some input devices that have a left alt key cannot
attach the sysrq handler, because the keybit bitmap in struct
input_device_id for sysrq is not correctly initialized: KEY_LEFTALT is
56, which is greater than BITS_PER_LONG on 32-bit kernels.

I found this problem when using a matrix keypad device which defines
a KEY_LEFTALT (56) but doesn't have a KEY_O (24 == 56%32).
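
The arithmetic can be reproduced by simulating 32-bit bitmap words with `uint32_t`. The macros below are local redefinitions modelled on the kernel's BIT_WORD()/BIT_MASK() for a 32-bit word size; KEY_O (24) and KEY_LEFTALT (56) match the values quoted above. Placing the mask in word 0 actually selects bit 24, while the designated initializer puts it in the word that holds bit 56.

```c
#include <stdint.h>

/* Simulate a 32-bit kernel: bitmaps are arrays of 32-bit words. */
#define BITS_PER_WORD 32
#define BIT_WORD(nr) ((nr) / BITS_PER_WORD)
#define BIT_MASK(nr) (UINT32_C(1) << ((nr) % BITS_PER_WORD))

#define KEY_O       24
#define KEY_LEFTALT 56
#define KEY_WORDS   4   /* enough words for this example */

/* Buggy: the mask lands in word 0, so it really selects bit 24 (KEY_O). */
static const uint32_t keybit_buggy[KEY_WORDS] = { BIT_MASK(KEY_LEFTALT) };

/* Fixed: the mask is placed in the word that holds bit 56. */
static const uint32_t keybit_fixed[KEY_WORDS] = {
    [BIT_WORD(KEY_LEFTALT)] = BIT_MASK(KEY_LEFTALT),
};

static int test_bit32(unsigned int nr, const uint32_t *map)
{
    return (map[BIT_WORD(nr)] & BIT_MASK(nr)) != 0;
}
```

On a 64-bit kernel, 56 < BITS_PER_LONG, so word 0 is the right word and the unindexed initializer happens to work, which is why the problem only shows up on 32-bit builds.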

Cc: Jiri Slaby <jslaby@suse.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoorinoco: Use shash instead of ahash for MIC calculations
Andrew Lutomirski [Mon, 12 Dec 2016 20:55:55 +0000 (12:55 -0800)]
orinoco: Use shash instead of ahash for MIC calculations

commit 570b90fa230b8021f51a67fab2245fe8df6fe37d upstream.

Eric Biggers pointed out that the orinoco driver pointed scatterlists
at the stack.

Fix it by switching from ahash to shash.  The result should be
simpler, faster, and more correct.

kvalo: cherry picked from commit 1fef293b8a9850cfa124a53c1d8878d355010403 as I
accidentally applied this patch to wireless-drivers-next when I was supposed to
apply it to wireless-drivers

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoibmvscsis: Fix srp_transfer_data fail return code
Bryant G. Ly [Mon, 9 Jan 2017 16:21:20 +0000 (10:21 -0600)]
ibmvscsis: Fix srp_transfer_data fail return code

commit 7c9d8d0c41b3e24473ac7648a7fc2d644ccf08ff upstream.

If srp_transfer_data fails within ibmvscsis_write_pending, then
the most likely scenario is that the client timed out the op and
removed the TCE mapping. Returning EAGAIN then makes the target loop
forever, retrying an op that is all but guaranteed to keep failing.
EIO is a better return code than EAGAIN.
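
Why the return code matters can be sketched with a hypothetical caller that retries on -EAGAIN and fails the command on anything else. `run_write_pending()` and the two transfer stubs are illustrative names, and the `attempts` bound exists only so the simulation terminates.

```c
#include <errno.h>

/* Hypothetical caller: -EAGAIN means "try again", any other error fails
 * the command.  'attempts' bounds the simulation only. */
static int run_write_pending(int (*write_pending)(void), int attempts, int *tries)
{
    int ret;

    *tries = 0;
    do {
        ret = write_pending();
        (*tries)++;
    } while (ret == -EAGAIN && *tries < attempts);
    return ret;
}

/* The TCE mapping is gone, so the transfer can never succeed. */
static int transfer_fails_eagain(void) { return -EAGAIN; }
static int transfer_fails_eio(void)    { return -EIO; }
```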

Reported-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Tested-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Signed-off-by: Bryant G. Ly <bgly@us.ibm.com>
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
Richard Genoud [Tue, 13 Dec 2016 16:27:56 +0000 (17:27 +0100)]
tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx

commit 89d8232411a85b9a6b12fd5da4d07d8a138a8e0c upstream.

If we don't disable the transmitter in atmel_stop_tx, the DMA buffer
continues to send data until it is emptied.
This causes problems with the flow control (CTS is asserted and data are
still sent).

So, disabling the transmitter in atmel_stop_tx is a sane thing to do.

Tested on at91sam9g35-cm(DMA)
Tested for regressions on sama5d2-xplained(Fifo) and at91sam9g20ek(PDC)

Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agotty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
Richard Genoud [Tue, 6 Dec 2016 12:05:33 +0000 (13:05 +0100)]
tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done

commit b389f173aaa1204d6dc1f299082a162eb0491545 upstream.

When using RS485 in half duplex, RX should be enabled when TX is
finished, and stopped when TX starts.

Before commit 0058f0871efe7b01c6 ("tty/serial: atmel: fix RS485 half
duplex with DMA"), RX was not disabled in atmel_start_tx() if the DMA
was used. So, collisions could happen.

But disabling RX in atmel_start_tx() uncovered another bug:
RX was enabled again in the wrong place (in atmel_tx_dma) instead of
being enabled when TX is finished (in atmel_complete_tx_dma), so the
transmission simply stopped.

This bug was not triggered before commit 0058f0871efe7b01c6
("tty/serial: atmel: fix RS485 half duplex with DMA") because RX was
never disabled before.

Moving atmel_start_rx() into atmel_complete_tx_dma() corrects the problem.

Reported-by: Gil Weber <webergil@gmail.com>
Fixes: 0058f0871efe7b01c6 ("tty/serial: atmel: fix RS485 half duplex with DMA")
Tested-by: Gil Weber <webergil@gmail.com>
Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovirtio_blk: avoid DMA to stack for the sense buffer
Christoph Hellwig [Mon, 9 Jan 2017 15:56:23 +0000 (08:56 -0700)]
virtio_blk: avoid DMA to stack for the sense buffer

commit a14d749fcebe97ddf6af6db3d1f6ece85c9ddcb9 upstream.

Most users of BLOCK_PC requests allocate the sense buffer on the stack,
so, to avoid DMA to the stack, copy them to a field in the heap-allocated
virtblk_req structure.  Without that, any attempt at SCSI passthrough I/O,
including the SG_IO ioctl from userspace, will crash the kernel.  Note that
this includes running tools like hdparm even when the host does not have
SCSI passthrough enabled.
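
A sketch of the bounce-buffer idea: the device only ever writes into a buffer embedded in the heap-allocated request, and the driver copies the result back to the caller's buffer, wherever that lives. `struct blk_req`, the 96-byte size (the usual SCSI sense buffer size), and both helpers are assumptions for illustration; the device write is simulated with `memcpy()`.

```c
#include <string.h>

#define SENSE_SIZE 96   /* assumed sense buffer size */

/* Heap-allocated per-request structure: the device writes into 'sense',
 * never into the caller's (possibly on-stack) buffer. */
struct blk_req {
    unsigned char sense[SENSE_SIZE];
    unsigned int sense_len;
};

/* Simulated device completion writing sense data into the request. */
static void device_complete(struct blk_req *req, const unsigned char *data,
                            unsigned int len)
{
    if (len > SENSE_SIZE)
        len = SENSE_SIZE;
    memcpy(req->sense, data, len);
    req->sense_len = len;
}

/* On completion the driver copies the sense data back to the caller. */
static unsigned int copy_sense_to_caller(const struct blk_req *req,
                                         unsigned char *caller_buf,
                                         unsigned int caller_len)
{
    unsigned int n = req->sense_len < caller_len ? req->sense_len : caller_len;

    memcpy(caller_buf, req->sense, n);
    return n;
}
```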

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodmaengine: omap-dma: Fix dynamic lch_map allocation
Peter Ujfalusi [Mon, 2 Jan 2017 10:07:37 +0000 (12:07 +0200)]
dmaengine: omap-dma: Fix dynamic lch_map allocation

commit 836c3ce2566fb8c1754f8d7c9534cad9bc8a6879 upstream.

The original patch did not do what it was supposed to do and, even
worse, it broke legacy boot (OMAP1).

The lch_map size should be the number of available logical channels in sDMA
and the od->dma_requests should store the number of available DMA request
lines usable in sDMA.

In legacy mode we do not have a way to get the DMA request count; in that
case we use OMAP_SDMA_REQUESTS (127), despite the fact that OMAP1510 has
only 31 DMA request lines.
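
A sketch of keeping the two quantities separate. `struct sdma_info`, `struct omap_dmadev` and `omap_dma_setup()` are simplified hypothetical stand-ins; only the fallback value 127 comes from the text above.

```c
#include <stdlib.h>

#define OMAP_SDMA_REQUESTS 127   /* fallback value quoted in the text */

/* Hypothetical device description; 0 means "not provided" (legacy boot). */
struct sdma_info {
    unsigned int lch_count;     /* logical channels */
    unsigned int dma_requests;  /* request lines, 0 if unknown */
};

struct omap_dmadev {
    unsigned int dma_requests;
    void **lch_map;             /* one slot per logical channel */
    unsigned int lch_map_len;
};

static int omap_dma_setup(struct omap_dmadev *od, const struct sdma_info *info)
{
    /* Request lines and logical channels are different quantities. */
    od->dma_requests = info->dma_requests ? info->dma_requests
                                          : OMAP_SDMA_REQUESTS;

    /* lch_map is sized by the logical channel count, not the request count. */
    od->lch_map = calloc(info->lch_count, sizeof(*od->lch_map));
    if (!od->lch_map)
        return -1;
    od->lch_map_len = info->lch_count;
    return 0;
}
```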

Fixes: 2d1a9a946fae ("dmaengine: omap-dma: Dynamically allocate memory for lch_map")
Reported-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodrivers: char: mem: Fix thinkos in kmem address checks
Robin Murphy [Thu, 5 Jan 2017 17:15:01 +0000 (17:15 +0000)]
drivers: char: mem: Fix thinkos in kmem address checks

commit 488debb9971bc7d0edd6d8080ba78ca02a04f6c4 upstream.

When borrowing the pfn_valid() check from mmap_kmem(), somebody managed
to get physical and virtual addresses spectacularly muddled up, such
that we've ended up with checks for one being the other. Whilst this
does indeed prevent out-of-bounds accesses crashing, on most systems
it also prevents the more desirable use-case of working at all ever.

Check the *virtual* offset correctly for what it is. Furthermore, do
so in the right place - a read or write may span multiple pages, so a
single up-front check is insufficient. High memory accesses already
have a similar validity check just before the copy_to_user() call, so
just make the low memory path fully consistent with that.
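
A sketch of checking every page an access touches rather than only the first. Addresses are modelled as plain integers in a simulated address space; `VALID_START`, `VALID_END`, `virt_addr_valid_sim()` and `read_kmem_sim()` are hypothetical, and the real code uses the kernel's own validity check and copy_to_user().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* Simulated address space: only [VALID_START, VALID_END) is backed.
 * Both bounds are page-aligned assumptions for this example. */
#define VALID_START 0x10000u
#define VALID_END   (VALID_START + 2 * PAGE_SIZE)
static unsigned char backing[VALID_END - VALID_START];

static bool virt_addr_valid_sim(size_t addr)
{
    return addr >= VALID_START && addr < VALID_END;
}

/* Read 'count' bytes starting at virtual address 'addr' in page-bounded
 * chunks, validating each chunk's address before copying it.  Returns
 * the bytes copied, or -1 if the very first chunk is invalid. */
static long read_kmem_sim(unsigned char *dst, size_t addr, size_t count)
{
    size_t done = 0;

    while (done < count) {
        size_t sz = PAGE_SIZE - (addr % PAGE_SIZE);   /* up to page end */

        if (sz > count - done)
            sz = count - done;
        if (!virt_addr_valid_sim(addr))
            return done ? (long)done : -1;
        memcpy(dst + done, backing + (addr - VALID_START), sz);
        done += sz;
        addr += sz;
    }
    return (long)done;
}
```

A read that starts 16 bytes before the end of valid memory copies those 16 bytes and stops, instead of running past the end as a single up-front check would allow.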

Reported-by: Jason A. Donenfeld <Jason@zx2c4.com>
Fixes: 148a1bc84398 ("drivers: char: mem: Check {read,write}_kmem() addresses")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomnt: Protect the mountpoint hashtable with mount_lock
Eric W. Biederman [Tue, 3 Jan 2017 01:18:43 +0000 (14:18 +1300)]
mnt: Protect the mountpoint hashtable with mount_lock

commit 3895dbf8985f656675b5bde610723a29cbce3fa7 upstream.

Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire, at which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen at the same time.

Krister Johansen <kjlx@templeofstupid.com> reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint.  This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock.  Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisoned by another thread.

Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.

I have taken Al's suggested patch, moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and replaced
new_mountpoint with get_mountpoint, a function that does the hash table
lookup and addition under the mount_lock.  The introduction of
get_mountpoint ensures that only the mount_lock is needed to manipulate
the mountpoint hashtable.

d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set.  This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.
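
A heavily simplified userspace model of "look up or create under one lock". It assumes one dentry, at most one mountpoint per dentry, a `mounted` flag standing in for DCACHE_MOUNTED, and a pthread mutex standing in for mount_lock; the structures and both functions are illustrative, not the VFS code.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model of a dentry and its mountpoint. */
struct dentry {
    bool mounted;                 /* stands in for DCACHE_MOUNTED */
    struct mountpoint *mp;        /* stands in for the hash-table entry */
};

struct mountpoint {
    struct dentry *dentry;
    int count;
};

static pthread_mutex_t mount_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up or create the mountpoint for 'd' entirely under mount_lock.
 * The "mounted" flag is set only once, so exactly one caller allocates. */
static struct mountpoint *get_mountpoint(struct dentry *d)
{
    struct mountpoint *mp;

    pthread_mutex_lock(&mount_lock);
    if (d->mounted) {
        mp = d->mp;
        mp->count++;
    } else {
        mp = malloc(sizeof(*mp));
        if (mp) {
            mp->dentry = d;
            mp->count = 1;
            d->mp = mp;
            d->mounted = true;
        }
    }
    pthread_mutex_unlock(&mount_lock);
    return mp;
}

/* Drop a reference under the same lock; the last one frees the entry. */
static void put_mountpoint(struct mountpoint *mp)
{
    pthread_mutex_lock(&mount_lock);
    if (--mp->count == 0) {
        mp->dentry->mounted = false;
        mp->dentry->mp = NULL;
        free(mp);
    }
    pthread_mutex_unlock(&mount_lock);
}
```

Because both the lookup-or-insert and the final removal happen under the same lock, two concurrent put_mountpoint() callers can no longer both try to unlink and free the same entry.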

Fixes: ce07d891a089 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agopid: fix lockdep deadlock warning due to ucount_lock
Andrei Vagin [Thu, 5 Jan 2017 03:28:14 +0000 (19:28 -0800)]
pid: fix lockdep deadlock warning due to ucount_lock

commit add7c65ca426b7a37184dd3d2172394e23d585d6 upstream.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
4.10.0-rc2-00024-g4aecec9-dirty #118 Tainted: G        W
---------------------------------------------------------
swapper/1/0 just changed the state of lock:
 (&(&sighand->siglock)->rlock){-.....}, at: [<ffffffffbd0a1bc6>] __lock_task_sighand+0xb6/0x2c0
but this lock took another, HARDIRQ-unsafe lock in the past:
 (ucounts_lock){+.+...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Chain exists of:                 &(&sighand->siglock)->rlock --> &(&tty->ctrl_lock)->rlock --> ucounts_lock
 Possible interrupt unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(ucounts_lock);
                               local_irq_disable();
                               lock(&(&sighand->siglock)->rlock);
                               lock(&(&tty->ctrl_lock)->rlock);
  <Interrupt>
    lock(&(&sighand->siglock)->rlock);

 *** DEADLOCK ***

This patch removes a dependency between rlock and ucount_lock.

Fixes: f333c700c610 ("pidns: Add a limit on the number of pid namespaces")
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agovme: Fix wrong pointer utilization in ca91cx42_slave_get
Augusto Mecking Caringi [Tue, 10 Jan 2017 10:45:00 +0000 (10:45 +0000)]
vme: Fix wrong pointer utilization in ca91cx42_slave_get

commit c8a6a09c1c617402cc9254b2bc8da359a0347d75 upstream.

In ca91cx42_slave_get function, the value pointed by vme_base pointer is
set through:

*vme_base = ioread32(bridge->base + CA91CX42_VSI_BS[i]);

So it must be dereferenced to be used in calculation of pci_base:

*pci_base = (dma_addr_t)*vme_base + pci_offset;

This bug was caught thanks to the following gcc warning:

drivers/vme/bridges/vme_ca91cx42.c: In function ‘ca91cx42_slave_get’:
drivers/vme/bridges/vme_ca91cx42.c:467:14: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
*pci_base = (dma_addr_t)vme_base + pci_offset;

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Acked-By: Martyn Welch <martyn@welchs.me.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoRevert "tty: serial: 8250: add CON_CONSDEV to flags"
Herbert Xu [Sun, 11 Dec 2016 02:05:49 +0000 (10:05 +0800)]
Revert "tty: serial: 8250: add CON_CONSDEV to flags"

commit 6741f551a0b26479de2532ffa43a366747e6dbf3 upstream.

This commit needs to be reverted because it prevents people from
using the serial console as a secondary console with input being
directed to tty0.

IOW, if you boot with console=ttyS0 console=tty0 then all kernels
prior to this commit will produce output on both ttyS0 and tty0
but input will only be taken from tty0.  With this patch the serial
console will always be the primary console instead of tty0,
potentially preventing people from getting into their machines in
emergency situations.

Fixes: d03516df8375 ("tty: serial: 8250: add CON_CONSDEV to flags")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoASoC: hdmi-codec: use unsigned type to structure members with bit-field
Takashi Sakamoto [Fri, 16 Dec 2016 09:26:54 +0000 (18:26 +0900)]
ASoC: hdmi-codec: use unsigned type to structure members with bit-field

commit 9e4d59ada4d602e78eee9fb5f898ce61fdddb446 upstream.

This is a fix for Linux 4.10-rc1.

In the C language specification, a bit-field is interpreted as a signed or
unsigned integer type consisting of the specified number of bits.

According to the GCC manual, the range of a signed bit-field of N bits is
from -(2^N) / 2 to ((2^N) / 2) - 1
https://www.gnu.org/software/gnu-c-manual/gnu-c-manual.html#Bit-Fields

Therefore, a 1-bit bit-field with a signed type can represent only -1
and 0.
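
To see the problem concretely, the sketch below stores values into a signed and an unsigned 1-bit field and reads them back. The struct and function names are made up for illustration. The field is declared `signed int` explicitly because the signedness of a plain `int` bit-field is itself implementation-defined (GCC treats it as signed by default).

```c
/* A signed 1-bit field holds only -1 and 0; an unsigned one holds 0 and 1. */
struct flags_signed {
    signed int enabled : 1;
};

struct flags_unsigned {
    unsigned int enabled : 1;
};

/* Store 'v' into each kind of field and read it back. */
static int roundtrip_signed(int v)
{
    struct flags_signed f;

    f.enabled = v;          /* 1 is out of range: implementation-defined */
    return f.enabled;
}

static unsigned int roundtrip_unsigned(unsigned int v)
{
    struct flags_unsigned f;

    f.enabled = v;          /* well defined: reduced modulo 2 */
    return f.enabled;
}
```

With GCC, `roundtrip_signed(1)` returns -1, so a check such as `if (flag == 1)` is never true for the signed field, while the unsigned field behaves as a plain boolean.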

The snd-soc-hdmi-codec module includes a structure with signed 1-bit
bit-field members, and the module's code assigns 0 and 1 to those
members. This seems to result in implementation-dependent behaviour.

As of the v4.10-rc1 merge window, outside of the sound subsystem, this
structure is referenced by the following GPU modules:
 - tda998x
 - sti-drm
 - mediatek-drm-hdmi
 - msm

As far as I can tell from reviewing their code, the structure members
are used only in conditional statements and printk formats. My proposed
change is slightly intrusive to the printk formats, but this should be
acceptable.

Overall, it's reasonable to use an unsigned type for the structure
members. This bug was detected by Sparse, a static code analyzer, with
the warnings below.

./include/sound/hdmi-codec.h:39:26: error: dubious one-bit signed bitfield
./include/sound/hdmi-codec.h:40:28: error: dubious one-bit signed bitfield
./include/sound/hdmi-codec.h:41:29: error: dubious one-bit signed bitfield
./include/sound/hdmi-codec.h:42:31: error: dubious one-bit signed bitfield

Fixes: 09184118a8ab ("ASoC: hdmi-codec: Add hdmi-codec for external HDMI-encoders")
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Acked-by: Arnaud Pouliquen <arnaud.pouliquen@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agobtrfs: fix crash when tracepoint arguments are freed by wq callbacks
David Sterba [Fri, 6 Jan 2017 13:12:51 +0000 (14:12 +0100)]
btrfs: fix crash when tracepoint arguments are freed by wq callbacks

commit ac0c7cf8be00f269f82964cf7b144ca3edc5dbc4 upstream.

Enabling btrfs tracepoints leads to an instant crash, as reported. The wq
callbacks could free the memory, and the tracepoints had started to
dereference its members to get to fs_info.

The proposed fix https://marc.info/?l=linux-btrfs&m=148172436722606&w=2
removed the tracepoints but we could preserve them by passing only the
required data in a safe way.
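
One way to pass "only the required data in a safe way" is sketched below: capture what the tracepoint needs before the callback runs, and keep the work item's address only as an opaque integer tag that is never dereferenced. `struct work`, `run_work()` and `trace_work_done()` are hypothetical names, not the btrfs workqueue code.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins: a work item pointing at its fs_info. */
struct fs_info { int id; };
struct work { struct fs_info *fs_info; };

static int last_traced_id = -1;
static uintptr_t last_traced_tag;

/* The tracepoint receives the fs_info and an opaque tag; it never
 * dereferences the (possibly freed) work item. */
static void trace_work_done(const struct fs_info *fs_info, uintptr_t tag)
{
    last_traced_id = fs_info->id;
    last_traced_tag = tag;
}

static void run_work(struct work *w, void (*callback)(struct work *))
{
    struct fs_info *fs_info = w->fs_info;   /* capture before the callback */
    uintptr_t tag = (uintptr_t)w;           /* address kept only as a value */

    callback(w);                            /* may free w */
    trace_work_done(fs_info, tag);          /* does not touch w */
}

static void free_self(struct work *w) { free(w); }
```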

Fixes: bc074524e123 ("btrfs: prefix fsid to all trace events")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoxhci: fix deadlock at host remove by running watchdog correctly
Mathias Nyman [Wed, 11 Jan 2017 15:10:34 +0000 (17:10 +0200)]
xhci: fix deadlock at host remove by running watchdog correctly

commit d6169d04097fd9ddf811e63eae4e5cd71e6666e2 upstream.

If a URB is killed while the host is removed we can end up in a situation
where the hub thread takes the roothub device lock, and waits for
the URB to be given back by xhci-hcd, blocking the host remove code.

xhci-hcd tries to stop the endpoint and give back the URB, but can't,
as the host is being removed from the PCI bus at the same time, preventing
the normal way of giving back the URB.

Instead we need to rely on the stop command timeout function to give back
the URB. This xhci_stop_endpoint_command_watchdog() timeout function
used the XHCI_STATE_DYING flag to indicate whether the timeout function
was already running, but this flag was later also taken into use in other
places to mark that xhci is dying.

Remove the checks for XHCI_STATE_DYING in xhci_urb_dequeue. We still
check that reading from PCI state does not return 0xffffffff and that
the host is not halted before trying to stop the endpoint.

This whole area of stopping endpoints, giving back URBs, and the watchdog
timeout needs rework; this fix focuses on solving a specific deadlock
issue that we can then send to stable before any major rework.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Sun, 15 Jan 2017 00:33:08 +0000 (19:33 -0500)]
fix a fencepost error in pipe_advance()

commit b9dc6f65bc5e232d1c05fe34b5daadc7e8bbf1fb upstream.

The logic in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers.  If that happened, none of them had been released,
leaving the pipe full.  Worse, it was trivial to trigger, and we ended up
with a pipe full of uninitialized pages.  IOW, it's an infoleak.
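
The fencepost can be illustrated with a simplified ring buffer. This is a hypothetical model, not the actual pipe_advance() code. If a release loop walks slot positions from a start index to an end index computed modulo the ring size, then releasing exactly all slots makes start equal end, and nothing is released. Iterating over a count of slots avoids the ambiguity. `NBUFS` stands in for pipe->buffers.

```c
#include <stdbool.h>

#define NBUFS 4  /* hypothetical ring capacity (stands in for pipe->buffers) */

struct ring {
    unsigned head;      /* index of the first occupied slot */
    unsigned count;     /* number of occupied slots */
    bool used[NBUFS];
};

/* Buggy: walks positions from the first slot to drop up to the end slot.
 * When all NBUFS slots must be dropped, start == end and the loop body
 * never runs. */
static unsigned release_tail_buggy(struct ring *r, unsigned keep)
{
    unsigned released = 0;
    unsigned i = (r->head + keep) % NBUFS;
    unsigned end = (r->head + r->count) % NBUFS;

    while (i != end) {
        r->used[i] = false;
        i = (i + 1) % NBUFS;
        released++;
    }
    r->count -= released;
    return released;
}

/* Fixed: iterate over the number of slots to drop, not over positions. */
static unsigned release_tail_fixed(struct ring *r, unsigned keep)
{
    unsigned n = r->count - keep;
    unsigned i = (r->head + keep) % NBUFS;

    for (unsigned k = 0; k < n; k++) {
        r->used[i] = false;
        i = (i + 1) % NBUFS;
    }
    r->count = keep;
    return n;
}
```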

Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vlad Tsyrklevich [Mon, 9 Jan 2017 15:53:36 +0000 (22:53 +0700)]
i2c: fix kernel memory disclosure in dev interface

commit 30f939feaeee23e21391cfc7b484f012eb189c3c upstream.

i2c_smbus_xfer() does not always fill an entire block, allowing
kernel stack memory disclosure through the temp variable. Clear
it before it is used.
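
The pattern is sketched below under stated assumptions. The union stands in for the temp variable and has a 34-byte block member. The hypothetical `partial_xfer()` fills only part of it, and the whole union is then copied out. Clearing the union first guarantees that the unfilled bytes are zero rather than stale stack contents.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the SMBus data union (block size assumed). */
#define BLOCK_MAX 34

union smbus_data {
    unsigned char byte;
    unsigned short word;
    unsigned char block[BLOCK_MAX];
};

/* Hypothetical transfer that fills only the first 'len' bytes. */
static void partial_xfer(union smbus_data *d, size_t len)
{
    for (size_t i = 0; i < len && i < BLOCK_MAX; i++)
        d->block[i] = 0xAB;
}

/* Copy the whole union out, as if returning it to user space.
 * The memset ensures bytes the transfer did not write are zero. */
static void read_block(unsigned char out[BLOCK_MAX], size_t len)
{
    union smbus_data temp;

    memset(&temp, 0, sizeof(temp));
    partial_xfer(&temp, len);
    memcpy(out, temp.block, BLOCK_MAX);
}
```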

Signed-off-by: Vlad Tsyrklevich <vlad@tsyrklevich.net>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Garry [Fri, 6 Jan 2017 11:02:57 +0000 (19:02 +0800)]
i2c: print correct device invalid address

commit 6f724fb3039522486fce2e32e4c0fbe238a6ab02 upstream.

In of_i2c_register_device(), when the check for
device address validity fails we print the info.addr,
which has not been assigned properly.

Fix this by printing the actual invalid address.

Signed-off-by: John Garry <john.garry@huawei.com>
Reviewed-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Fixes: b4e2f6ac1281 ("i2c: apply DT flags when probing")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guenter Roeck [Thu, 5 Jan 2017 22:14:54 +0000 (14:14 -0800)]
Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data

commit 1c3415a06b1016a596bfe59e0cfee56c773aa958 upstream.

The following crash may be seen if bad data is received from the
touchscreen.

[ 2189.425150] elants_i2c i2c-ELAN0001:00: unknown packet ff ff ff ff
[ 2189.430738] divide error: 0000 [#1] PREEMPT SMP
[ 2189.434679] gsmi: Log Shutdown Reason 0x03
[ 2189.434689] Modules linked in: ip6t_REJECT nf_reject_ipv6 rfcomm evdi
uinput uvcvideo cmac videobuf2_vmalloc videobuf2_memops snd_hda_codec_hdmi
i2c_dev videobuf2_core snd_soc_sst_cht_bsw_rt5645 snd_hda_intel
snd_intel_sst_acpi btusb btrtl btbcm btintel bluetooth snd_soc_sst_acpi
snd_hda_codec snd_intel_sst_core snd_hwdep snd_soc_sst_mfld_platform
snd_hda_core snd_soc_rt5645 memconsole_x86_legacy memconsole zram snd_soc_rl6231
fuse ip6table_filter iwlmvm iwlwifi iwl7000_mac80211 cfg80211 iio_trig_sysfs
joydev cros_ec_sensors cros_ec_sensors_core industrialio_triggered_buffer
kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq
snd_seq_device ppp_async ppp_generic slhc tun
[ 2189.434866] CPU: 0 PID: 106 Comm: irq/184-ELAN000 Tainted: G        W
3.18.0-13101-g57e8190 #1
[ 2189.434883] Hardware name: GOOGLE Ultima, BIOS Google_Ultima.7287.131.43 07/20/2016
[ 2189.434898] task: ffff88017a0b6d80 ti: ffff88017a2bc000 task.ti: ffff88017a2bc000
[ 2189.434913] RIP: 0010:[<ffffffffbecc48d5>]  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.434937] RSP: 0018:ffff88017a2bfd98  EFLAGS: 00010293
[ 2189.434948] RAX: 0000000000000000 RBX: ffff88017a967828 RCX: ffff88017a9678e8
[ 2189.434962] RDX: 0000000000000000 RSI: 0000000000000246 RDI: 0000000000000000
[ 2189.434975] RBP: ffff88017a2bfdd8 R08: 00000000000003e8 R09: 0000000000000000
[ 2189.434989] R10: 0000000000000000 R11: 000000000044a2bd R12: ffff88017a991800
[ 2189.435001] R13: ffffffffbe8a2a53 R14: ffff88017a0b6d80 R15: ffff88017a0b6d80
[ 2189.435011] FS:  0000000000000000(0000) GS:ffff88017fc00000(0000) knlGS:0000000000000000
[ 2189.435022] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2189.435030] CR2: 00007f678d94b000 CR3: 000000003f41a000 CR4: 00000000001007f0
[ 2189.435039] Stack:
[ 2189.435044]  ffff88017a2bfda8 ffff88017a9678e8 646464647a2bfdd8 0000000006e09574
[ 2189.435060]  0000000000000000 ffff88017a088b80 ffff88017a921000 ffffffffbe8a2a53
[ 2189.435074]  ffff88017a2bfe08 ffffffffbe8a2a73 ffff88017a0b6d80 0000000006e09574
[ 2189.435089] Call Trace:
[ 2189.435101]  [<ffffffffbe8a2a53>] ? irq_thread_dtor+0xa9/0xa9
[ 2189.435112]  [<ffffffffbe8a2a73>] irq_thread_fn+0x20/0x40
[ 2189.435123]  [<ffffffffbe8a2be1>] irq_thread+0x14e/0x222
[ 2189.435135]  [<ffffffffbee8cbeb>] ? __schedule+0x3b3/0x57a
[ 2189.435145]  [<ffffffffbe8a29aa>] ? wake_threads_waitq+0x2d/0x2d
[ 2189.435156]  [<ffffffffbe8a2a93>] ? irq_thread_fn+0x40/0x40
[ 2189.435168]  [<ffffffffbe87c385>] kthread+0x10e/0x116
[ 2189.435178]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435189]  [<ffffffffbee900ac>] ret_from_fork+0x7c/0xb0
[ 2189.435199]  [<ffffffffbe87c277>] ? __kthread_parkme+0x67/0x67
[ 2189.435208] Code: ff ff eb 73 0f b6 bb c1 00 00 00 83 ff 03 7e 13 49 8d 7c
24 20 ba 04 00 00 00 48 c7 c6 8a cd 21 bf eb 4d 0f b6 83 c2 00 00 00 99 <f7> ff
83 f8 37 75 15 48 6b f7 37 4c 8d a3 c4 00 00 00 4c 8d ac
[ 2189.435312] RIP  [<ffffffffbecc48d5>] elants_i2c_irq+0x190/0x200
[ 2189.435323]  RSP <ffff88017a2bfd98>
[ 2189.435350] ---[ end trace f4945345a75d96dd ]---
[ 2189.443841] Kernel panic - not syncing: Fatal exception
[ 2189.444307] Kernel Offset: 0x3d800000 from 0xffffffff81000000
(relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 2189.444519] gsmi: Log Shutdown Reason 0x02

The problem was seen with a 3.18 based kernel, but there is no reason
to believe that the upstream code is safe.
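
The commit does not say which packet field ends up as the divisor, so the sketch below uses a hypothetical layout: byte 0 is a payload length and byte 1 is an item count. It shows only the general defensive pattern. A divisor taken from device data is validated before the division, and a bad packet is rejected instead of crashing the IRQ thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical packet layout: byte 0 = payload length, byte 1 = item count. */
static int report_item_size(const uint8_t *pkt, unsigned *item_size)
{
    unsigned len = pkt[0];
    unsigned n_items = pkt[1];

    /* Bad data from the device must never reach the division. */
    if (n_items == 0)
        return -EINVAL;

    *item_size = len / n_items;
    return 0;
}
```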

Fixes: 66aee90088da2 ("Input: add support for Elan eKTH I2C touchscreens")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Fri, 6 Jan 2017 18:15:11 +0000 (19:15 +0100)]
USB: serial: ch341: fix open and resume after B0

commit a20047f36e2f6a1eea4f1fd261aaa55882369868 upstream.

The private baud_rate variable is used to configure the port at open and
reset-resume and must never be set to (and left at) zero or reset-resume
and all further open attempts will fail.
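
The rule can be sketched as follows. A speed of 0 (B0) is never stored, so open and reset-resume always have a usable rate to program. The struct, the helper names, and the 9600 fallback are assumptions for illustration, not the ch341 driver's code.

```c
/* Hypothetical private state: the rate reused at open and reset-resume. */
struct port_priv {
    unsigned baud_rate;
};

#define DEFAULT_BAUD 9600u  /* assumed fallback rate */

/* Apply a new termios speed; B0 (speed 0) is not remembered. */
static void set_termios_speed(struct port_priv *priv, unsigned speed)
{
    if (speed == 0)
        return;  /* keep the last usable rate for later open/resume */
    priv->baud_rate = speed;
}

/* Rate to program at open or reset-resume; never zero. */
static unsigned rate_for_open(const struct port_priv *priv)
{
    return priv->baud_rate ? priv->baud_rate : DEFAULT_BAUD;
}
```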

Fixes: aa91def41a7b ("USB: ch341: set tty baud speed according to tty struct")
Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Fri, 6 Jan 2017 18:15:18 +0000 (19:15 +0100)]
USB: serial: ch341: fix control-message error handling

commit 2d5a9c72d0c4ac73cf97f4b7814ed6c44b1e49ae upstream.

A short control transfer would currently fail to be detected, something
which could lead to stale buffer data being used as valid input.

Check for short transfers, and make sure to log any transfer errors.

Note that this also avoids leaking heap data to user space (TIOCMGET)
and the remote device (break control).
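
The check is sketched below. It assumes a transfer function that follows the usb_control_msg() return convention: the number of bytes transferred, or a negative errno. A short read is turned into -EIO, and any failure is logged. The function-pointer type and the stub devices are hypothetical, for illustration only.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical transfer: returns bytes transferred or a negative errno. */
typedef int (*xfer_fn)(unsigned char *buf, int size);

/* Read exactly 'size' bytes; a short transfer is an I/O error, so stale
 * buffer contents are never used as valid input. */
static int control_in(xfer_fn xfer, unsigned char *buf, int size)
{
    int ret = xfer(buf, size);

    if (ret < size) {
        fprintf(stderr, "control transfer failed: %d\n", ret);
        if (ret >= 0)
            ret = -EIO;
        return ret;
    }
    return 0;
}

/* Illustration stubs: a full read, a short read, and an error. */
static int full_xfer(unsigned char *buf, int size)
{
    for (int i = 0; i < size; i++)
        buf[i] = 0x5a;
    return size;
}

static int short_xfer(unsigned char *buf, int size)
{
    (void)size;
    buf[0] = 0;
    return 1;
}

static int err_xfer(unsigned char *buf, int size)
{
    (void)buf;
    (void)size;
    return -EPIPE;
}
```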

Fixes: 6ce76104781a ("USB: Driver for CH341 USB-serial adaptor")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Fri, 6 Jan 2017 18:15:13 +0000 (19:15 +0100)]
USB: serial: ch341: fix open error handling

commit f2950b78547ffb8475297ada6b92bc2d774d5461 upstream.

Make sure to stop the interrupt URB before returning on errors during
open.

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Fri, 6 Jan 2017 18:15:14 +0000 (19:15 +0100)]
USB: serial: ch341: fix resume after reset

commit ce5e292828117d1b71cbd3edf9e9137cf31acd30 upstream.

Fix reset-resume handling which failed to resubmit the read and
interrupt URBs, thereby leaving a port that was open before suspend in a
broken state until closed and reopened.

Fixes: 1ded7ea47b88 ("USB: ch341 serial: fix port number changed after resume")
Fixes: 2bfd1c96a9fb ("USB: serial: ch341: remove reset_resume callback")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Fri, 6 Jan 2017 18:15:10 +0000 (19:15 +0100)]
USB: serial: ch341: fix initial modem-control state

commit 4e2da44691cffbfffb1535f478d19bc2dca3e62b upstream.

DTR and RTS will be asserted by the tty-layer when the port is opened
and deasserted on close (if HUPCL is set). Make sure the initial state
is not-asserted before the port is first opened as well.

Fixes: 664d5df92e88 ("USB: usb-serial ch341: support for DTR/RTS/CTS")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Tue, 10 Jan 2017 11:05:37 +0000 (12:05 +0100)]
USB: serial: kl5kusb105: fix line-state error handling

commit 146cc8a17a3b4996f6805ee5c080e7101277c410 upstream.

The current implementation failed to detect short transfers when
attempting to read the line state, and also, to make things worse,
logged the content of the uninitialised heap transfer buffer.

Fixes: abf492e7b3ae ("USB: kl5kusb105: fix DMA buffers on stack")
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bin Liu [Tue, 10 Jan 2017 16:46:00 +0000 (10:46 -0600)]
usb: musb: fix runtime PM in debugfs

commit 7b6c1b4c0e1e44544aa18161dba6a741c080a7ef upstream.

The MUSB driver now has runtime PM support, but the debugfs code lacks
the PM _get/_put() calls, which could cause MUSB register access
failures.

Acked-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andy Lutomirski [Wed, 14 Dec 2016 02:50:13 +0000 (18:50 -0800)]
wusbcore: Fix one more crypto-on-the-stack bug

commit 620f1a632ebcc9811c2f8009ba52297c7006f805 upstream.

The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it.  This doesn't work with virtual
stacks.  Use ZERO_PAGE instead.

Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Borislav Petkov [Thu, 5 Jan 2017 09:26:38 +0000 (10:26 +0100)]
x86/CPU/AMD: Fix Bulldozer topology

commit a33d331761bc5dd330499ca5ceceb67f0640a8e6 upstream.

The following commit:

  8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id")

... broke the initial strategy for Bulldozer-based cores' topology,
where we consider each thread of a compute unit a standalone core
and not a HT or SMT thread.

Revert to the firmware-supplied core_id numbering, and do not make
them thread siblings, as we don't treat them as such even if they
technically are, more or less.

Reported-and-tested-by: Brice Goglin <Brice.Goglin@inria.fr>
Tested-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 8196dab4fc15 ("x86/cpu: Get rid of compute_unit_id")
Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Gleixner [Fri, 9 Dec 2016 18:29:09 +0000 (19:29 +0100)]
x86/bugs: Separate AMD E400 erratum and C1E bug

commit 3344ed30791af66dbbad5f375008f3d1863b6c99 upstream.

The workaround for the AMD Erratum E400 (Local APIC timer stops in C1E
state) is a two step process:

 - Selection of the E400 aware idle routine

 - Detection whether the platform is affected

The idle routine selection happens for possibly affected CPUs depending on
family/model/stepping information. This range of CPUs is not necessarily
affected, as the decision whether to enable the C1E feature is made by the
firmware. Unfortunately there is no way to query this at early boot.

The current implementation polls an MSR in the E400 aware idle routine to
detect whether the CPU is affected. This is inefficient on non-affected
CPUs because every idle entry has to do the MSR read.

There is a better way to detect this before going idle for the first time,
which requires separating the bug flags:

  X86_BUG_AMD_E400  - Selects the E400 aware idle routine and
     enables the detection

  X86_BUG_AMD_APIC_C1E  - Set when the platform is affected by E400

Replace the current X86_BUG_AMD_APIC_C1E usage by the new X86_BUG_AMD_E400
bug bit to select the idle routine which currently does an unconditional
detection poll. X86_BUG_AMD_APIC_C1E is going to be used in later patches
to remove the MSR polling and simplify the handling of this misfeature.
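
The intended division of roles between the two flags can be sketched as below. Only the flag names come from the commit. The bit values and helper names are hypothetical, and the `c1e_enabled_by_firmware` argument stands in for whatever detection (currently the MSR poll) reports that the firmware actually enabled C1E.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the names come from the commit. */
#define X86_BUG_AMD_E400      (1u << 0)  /* possibly affected: E400-aware idle */
#define X86_BUG_AMD_APIC_C1E  (1u << 1)  /* platform confirmed affected */

enum idle_routine { IDLE_DEFAULT, IDLE_AMD_E400 };

/* Selection depends only on family/model/stepping (the E400 flag). */
static enum idle_routine select_idle(unsigned bugs)
{
    return (bugs & X86_BUG_AMD_E400) ? IDLE_AMD_E400 : IDLE_DEFAULT;
}

/* Detection: mark the platform affected only if C1E is really enabled. */
static unsigned detect_c1e(unsigned bugs, bool c1e_enabled_by_firmware)
{
    if ((bugs & X86_BUG_AMD_E400) && c1e_enabled_by_firmware)
        bugs |= X86_BUG_AMD_APIC_C1E;
    return bugs;
}
```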

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20161209182912.2726-3-bp@alien8.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yazen Ghannam [Tue, 8 Nov 2016 15:30:54 +0000 (16:30 +0100)]
x86/cpu/AMD: Clean up cpu_llc_id assignment per topology feature

commit b6a50cddbcbda7105355898ead18f1a647c22520 upstream.

These changes do not affect current hw - just a cleanup:

Currently, we assume that a system has a single Last Level Cache (LLC)
per node, and that the cpu_llc_id is thus equal to the node_id. This no
longer applies since Fam17h can have multiple last level caches within a
node.

So group the cpu_llc_id assignment by topology feature and family to
make the computation of cpu_llc_id on the different families clearer.

Here is how the LLC ID is being computed on the different families:

The NODEID_MSR feature only applies to Fam10h in which case the LLC is
at the node level.

The TOPOEXT feature is used on families 15h, 16h and 17h. So far we only
see multiple last level caches if L3 caches are available. Otherwise,
the cpu_llc_id will default to be the phys_proc_id.

We have L3 caches only on families 15h and 17h:

 - on Fam15h, the LLC is at the node level.

 - on Fam17h, the LLC is at the core complex level and can be found by
   right shifting the APIC ID. Also, keep the family checks explicit so that
   new families will fall back to the default, which will be node_id for
   TOPOEXT systems.

Single node systems in families 10h and 15h will have a Node ID of 0
which will be the same as the phys_proc_id, so we don't need to check
for multiple nodes before using the node_id.
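
The rules described above can be restated as a small decision function. The struct and field names are hypothetical. The commit only says that the Fam17h APIC ID is right-shifted, so the shift of 3 used here is an assumption.

```c
#include <stdbool.h>

/* Inputs describing one CPU; field names are hypothetical. */
struct cpu_topo {
    unsigned family;        /* e.g. 0x10, 0x15, 0x16, 0x17 */
    bool has_nodeid_msr;    /* NODEID_MSR feature */
    bool has_topoext;       /* TOPOEXT feature */
    bool has_l3;            /* L3 cache present */
    unsigned node_id;
    unsigned phys_proc_id;
    unsigned apicid;
};

#define CCX_SHIFT 3  /* assumed core-complex shift for Fam17h */

static unsigned compute_llc_id(const struct cpu_topo *c)
{
    if (c->has_topoext) {
        if (c->has_l3) {
            if (c->family == 0x17)
                return c->apicid >> CCX_SHIFT;  /* core complex level */
            return c->node_id;                  /* Fam15h, new families */
        }
    } else if (c->has_nodeid_msr) {
        return c->node_id;                      /* Fam10h: node level */
    }
    return c->phys_proc_id;                     /* default */
}
```

Keeping the family test explicit means an unknown family with TOPOEXT and L3 falls back to node_id, as the message describes.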

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Rewrote the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20161108153054.bs3sajbyevq6a6uu@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Artur Molchanov [Fri, 30 Dec 2016 16:46:36 +0000 (19:46 +0300)]
bridge: netfilter: Fix dropping packets that moving through bridge interface

commit 14221cc45caad2fcab3a8543234bb7eda9b540d5 upstream.

Problem:
br_nf_pre_routing_finish() calls itself instead of
br_nf_pre_routing_finish_bridge(). Due to this bug, the reverse path filter
drops packets that go through the bridge interface.

User impact:
Local docker containers with bridge network cannot communicate with each
other.

Fixes: c5136b15ea36 ("netfilter: bridge: add and use br_nf_hook_thresh")
Signed-off-by: Artur Molchanov <artur.molchanov@synesis.ru>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Kara [Wed, 11 Jan 2017 18:20:04 +0000 (10:20 -0800)]
xfs: Timely free truncated dirty pages

commit 0a417b8dc1f10b03e8f558b8a831f07ec4c23795 upstream.

Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started
to skip dirty pages in xfs_vm_releasepage(), which also has the effect
that if a dirty page is truncated, it does not get freed by
block_invalidatepage() and lingers in the LRU list waiting for reclaim.
So a simple loop like:

    while true; do
        dd if=/dev/zero of=file bs=1M count=100
        rm file
    done

will keep using more and more memory until we hit low watermarks and
start pagecache reclaim, which will eventually also reclaim the truncated
pages. Keeping these truncated (and thus never usable) pages in memory
is just a waste of memory, unnecessarily stresses page cache reclaim,
and reportedly also leads to anonymous mmap(2) returning ENOMEM
prematurely.

So instead of just skipping dirty pages in xfs_vm_releasepage(), return
to the old behavior of skipping them only if they have delalloc or
unwritten buffers, and fix the spurious warnings by warning only if the
page is clean.
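
The restored decision can be sketched as a pure function over a hypothetical summary of the page's state. It is not the XFS code. Pages with delalloc or unwritten buffers are kept, and the warning fires only for a clean page in that state. A dirty page with no such buffers, such as a truncated one, may be released again.

```c
#include <stdbool.h>

/* Hypothetical per-page state summarised from its buffers. */
struct page_state {
    bool dirty;
    bool delalloc;   /* has delayed-allocation buffers */
    bool unwritten;  /* has unwritten-extent buffers */
};

/* Returns true if the buffers may be released; sets *warn when a clean
 * page still carries delalloc/unwritten buffers. */
static bool may_release(const struct page_state *p, bool *warn)
{
    *warn = false;
    if (p->delalloc || p->unwritten) {
        *warn = !p->dirty;   /* only clean pages trigger the warning */
        return false;
    }
    return true;             /* e.g. a truncated dirty page can be freed */
}
```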

CC: Brian Foster <bfoster@redhat.com>
CC: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Petr Tůma <petr.tuma@d3s.mff.cuni.cz>
Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Mon, 19 Dec 2016 17:29:23 +0000 (18:29 +0100)]
gpio: Move freeing of GPIO hogs before numbing of the device

commit 5018ada69a04c8ac21d74bd682fceb8e42dc0f96 upstream.

When removing a gpiochip that uses GPIO hogging (e.g. by unloading the
chip's DT overlay), a warning is printed:

    gpio gpiochip8: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED

This happens because gpiochip_free_hogs() is called after the gdev->chip
pointer is reset to NULL. Hence __gpiod_free() cannot determine the
chip in use, and cannot clear flags nor call the optional chip-specific
.free() callback.

Move the call to gpiochip_free_hogs() up to fix this.
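
The ordering constraint can be shown with a toy model. The structures and helper names are hypothetical, not gpiolib's. Freeing hogs needs a valid chip pointer, so it must run before the device is "numbed" by clearing that pointer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the gpiolib structures. */
struct gpio_chip { const char *label; };

struct gpio_device {
    struct gpio_chip *chip;
    int hogs_requested;
    bool freed_with_chip;   /* did freeing see a valid chip? */
};

/* Freeing a hog needs gdev->chip to clear flags and call chip->free(). */
static void free_hogs(struct gpio_device *gdev)
{
    gdev->freed_with_chip = (gdev->chip != NULL);
    gdev->hogs_requested = 0;
}

/* Fixed ordering: free the hogs first, then detach the chip. */
static void remove_chip(struct gpio_device *gdev)
{
    free_hogs(gdev);
    gdev->chip = NULL;
}
```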

Fixes: ff2b135922992756 ("gpio: make the gpiochip a real device")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Berg [Thu, 5 Jan 2017 09:57:14 +0000 (10:57 +0100)]
nl80211: fix sched scan netlink socket owner destruction

commit 753aacfd2e95df6a0caf23c03dc309020765bea9 upstream.

A single netlink socket might own multiple interfaces *and* a
scheduled scan request (which might belong to another interface),
so when it goes away both may need to be destroyed.

Remove the schedule_scan_stop indirection to fix this - it's only
needed for interface destruction because of the way this works
right now, with a single work taking care of all interfaces.

Fixes: 93a1e86ce10e4 ("nl80211: Stop scheduled scan if netlink client disappears")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolai Stange [Thu, 5 Jan 2017 12:51:29 +0000 (13:51 +0100)]
x86/efi: Don't allocate memmap through memblock after mm_init()

commit 20b1e22d01a4b0b11d3a1066e9feb04be38607ec upstream.

With the following commit:

  4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")

...  efi_bgrt_init() calls into the memblock allocator through
efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called.

Indeed, KASAN reports a bad read access later on in efi_free_boot_services():

  BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c
            at addr ffff88022de12740
  Read of size 4 by task swapper/0/0
  page:ffffea0008b78480 count:0 mapcount:-127
  mapping:          (null) index:0x1 flags: 0x5fff8000000000()
  [...]
  Call Trace:
   dump_stack+0x68/0x9f
   kasan_report_error+0x4c8/0x500
   kasan_report+0x58/0x60
   __asan_load4+0x61/0x80
   efi_free_boot_services+0xae/0x24c
   start_kernel+0x527/0x562
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0x157/0x17a
   start_cpu+0x5/0x14

The instruction at the given address is the first read from the memmap's
memory, i.e. the read of md->type in efi_free_boot_services().

Note that the writes earlier in efi_arch_mem_reserve() don't splat because
they're done through early_memremap()ed addresses.

So, after memblock is gone, allocations should be done through the "normal"
page allocator. Introduce a helper, efi_memmap_alloc() for this. Use
it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake
of consistency, from efi_fake_memmap() as well.

Note that for the latter, the memmap allocations cease to be page aligned.
This isn't needed though.

Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Fixes: 4bc9f92e64c8 ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data")
Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Jones [Mon, 12 Dec 2016 23:42:28 +0000 (18:42 -0500)]
efi/x86: Prune invalid memory map entries and fix boot regression

commit 0100a3e67a9cef64d72cd3a1da86f3ddbee50363 upstream.

Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.

These machines fail to boot after the following commit,

  commit 8e80632fb23f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")

Fix this by removing such bogus entries from the memory map.

Furthermore, currently the log output for this case (with efi=debug)
looks like:

 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)

This is clearly wrong, and also not as informative as it could be.  This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries.  It also detects
overflow in the address range calculation used for display, so the new output is:

 [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)

It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:

 [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
 [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)

It then removes these entries from the memory map.
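
The two invalid cases above are an empty entry and an end address that overflows. One way to express these checks is sketched below. It is not the kernel's exact code. It uses 4 KiB EFI pages (EFI_PAGE_SHIFT of 12) and fixed-width arithmetic, so the overflow is detected rather than silently wrapped.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_PAGE_SHIFT 12   /* EFI pages are 4 KiB regardless of PAGE_SIZE */

/* Returns true and stores the inclusive end address if the entry is sane. */
static bool efi_entry_valid(uint64_t phys_addr, uint64_t num_pages,
                            uint64_t *end)
{
    uint64_t size;

    if (num_pages == 0)
        return false;                              /* empty entry */
    if (num_pages > (UINT64_MAX >> EFI_PAGE_SHIFT))
        return false;                              /* size overflows */

    size = num_pages << EFI_PAGE_SHIFT;
    if (phys_addr + (size - 1) < phys_addr)
        return false;                              /* end wraps past 2^64 */

    *end = phys_addr + (size - 1);
    return true;
}
```

With the example from the message (phys_addr=0xfffffffffffff000, num_pages=0x0200000000000001), the page count alone already exceeds what fits in 64 bits of bytes, so the entry is rejected.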

Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ard Biesheuvel [Sat, 24 Dec 2016 13:59:23 +0000 (13:59 +0000)]
efi/libstub/arm*: Pass latest memory map to the kernel

commit abfb7b686a3e5be27bf81db62f9c5c895b76f5d1 upstream.

As reported by James Morse, the current libstub code involving the
annotated memory map only works somewhat correctly by accident, due
to the fact that a pool allocation happens to be reused immediately,
retaining its former contents on most implementations of the
UEFI boot services.

Instead of juggling memory maps, which makes the code more complex than
it needs to be, simply put placeholder values into the FDT for the memory
map parameters, and only write the actual values after ExitBootServices()
has been called.

Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Jeffrey Hugo <jhugo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-efi@vger.kernel.org
Fixes: ed9cc156c42f ("efi/libstub: Use efi_exit_boot_services() in FDT")
Link: http://lkml.kernel.org/r/1482587963-20183-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steve Rutherford [Thu, 12 Jan 2017 02:28:29 +0000 (18:28 -0800)]
KVM: x86: Introduce segmented_write_std

commit 129a72a0d3c8e139a04512325384fe5ac119e74d upstream.

Introduces segmented_write_std.

Switches from emulated reads/writes to standard reads/writes in fxsave,
fxrstor, sgdt, and sidt.  This fixes CVE-2017-2584, a longstanding
kernel memory leak.

Since commit 283c95d0e389 ("KVM: x86: emulate FXSAVE and FXRSTOR",
2016-11-09), which is luckily not yet in any final release, this would
also be an exploitable kernel memory *write*!

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 96051572c819194c37a8367624b285be10297eca
Fixes: 283c95d0e3891b64087706b344a4b545d04a6e62
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Rutherford <srutherford@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Radim Krčmář [Wed, 9 Nov 2016 18:07:06 +0000 (19:07 +0100)]
KVM: x86: emulate FXSAVE and FXRSTOR

commit 283c95d0e3891b64087706b344a4b545d04a6e62 upstream.

Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
Old Intels don't have unrestricted_guest, so we have to emulate them.

The patch takes advantage of the hardware implementation.

AMD and Intel differ in saving and restoring other fields in the first 32
bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MXCSR
in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
and executed fxsave:

  Intel (Nehalem):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
    ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
  Intel (Haswell -- deprecated FPU CS and FPU DS):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
    ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
  AMD (Opteron 2300-series):
    7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
    ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00

fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
much to improve the situation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Radim Krčmář [Tue, 8 Nov 2016 19:54:18 +0000 (20:54 +0100)]
KVM: x86: add asm_safe wrapper

commit aabba3c6abd50b05b1fc2c6ec44244aa6bcda576 upstream.

Move the existing exception handling for inline assembly into a macro
and switch its return values to X86EMUL type.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Radim Krčmář [Tue, 8 Nov 2016 19:54:16 +0000 (20:54 +0100)]
KVM: x86: add Align16 instruction flag

commit d3fe959f81024072068e9ed86b39c2acfd7462a9 upstream.

Needed for FXSAVE and FXRSTOR.
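
FXSAVE and FXRSTOR use a 512-byte memory area that must be 16-byte aligned. A misaligned operand raises #GP. The sketch below shows how an emulator might act on such a decode flag. The flag value and result codes are hypothetical stand-ins, not KVM's definitions.

```c
#include <stdint.h>

/* Hypothetical decode flag and result codes for this sketch. */
#define FLAG_ALIGN16  (1u << 0)

enum emul_result { EMUL_CONTINUE, EMUL_GP_FAULT };

/* Instructions flagged Align16 must fault on a misaligned operand
 * instead of being emulated. */
static enum emul_result check_alignment(unsigned insn_flags,
                                        uint64_t linear_addr)
{
    if ((insn_flags & FLAG_ALIGN16) && (linear_addr & 15))
        return EMUL_GP_FAULT;
    return EMUL_CONTINUE;
}
```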

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wanpeng Li [Wed, 4 Jan 2017 02:56:19 +0000 (18:56 -0800)]
KVM: x86: fix NULL deref in vcpu_scan_ioapic

commit 546d87e5c903a7f3ee7b9f998949a94729fbc65b upstream.

Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
    IP: _raw_spin_lock+0xc/0x30
    PGD 3e28eb067
    PUD 3f0ac6067
    PMD 0
    Oops: 0002 [#1] SMP
    CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
    Call Trace:
     ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
     kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
     ? pick_next_task_fair+0xe1/0x4e0
     ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
     kvm_vcpu_ioctl+0x33a/0x600 [kvm]
     ? hrtimer_try_to_cancel+0x29/0x130
     ? do_nanosleep+0x97/0xf0
     do_vfs_ioctl+0xa1/0x5d0
     ? __hrtimer_init+0x90/0x90
     ? do_nanosleep+0x5b/0xf0
     SyS_ioctl+0x79/0x90
     do_syscall_64+0x6e/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0

The syzkaller folks reported a NULL pointer dereference due to
ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
synthetic interrupt controller is activated, resulting in a
wrong request to rescan the ioapic and a NULL pointer dereference.

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <linux/kvm.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #ifndef KVM_CAP_HYPERV_SYNIC
    #define KVM_CAP_HYPERV_SYNIC 123
    #endif

    /* Racing thread: enable Hyper-V SynIC on the vCPU fd passed in arg. */
    void *thr(void *arg)
    {
        struct kvm_enable_cap cap;

        memset(&cap, 0, sizeof(cap));
        cap.flags = 0;
        cap.cap = KVM_CAP_HYPERV_SYNIC;
        ioctl((long)arg, KVM_ENABLE_CAP, &cap);
        return 0;
    }

    int main(void)
    {
        /* One page of guest memory; its first byte is HLT (0xf4). */
        uint8_t *host_mem = mmap(0, 0x1000, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        int kvmfd = open("/dev/kvm", 0);
        int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
        struct kvm_userspace_memory_region memreg;
        memreg.slot = 0;
        memreg.flags = 0;
        memreg.guest_phys_addr = 0;
        memreg.memory_size = 0x1000;
        memreg.userspace_addr = (unsigned long)host_mem;
        host_mem[0] = 0xf4;
        ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
        int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
        struct kvm_sregs sregs;
        ioctl(cpufd, KVM_GET_SREGS, &sregs);
        sregs.cr0 = 0;
        sregs.cr4 = 0;
        sregs.efer = 0;
        sregs.cs.selector = 0;
        sregs.cs.base = 0;
        ioctl(cpufd, KVM_SET_SREGS, &sregs);
        struct kvm_regs regs = { .rflags = 2 };
        ioctl(cpufd, KVM_SET_REGS, &regs);
        ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
        pthread_t th;
        pthread_create(&th, 0, thr, (void *)(long)cpufd);
        usleep(rand() % 10000);
        ioctl(cpufd, KVM_RUN, 0);
        pthread_join(th, 0);
        return 0;
    }

This patch fixes it by making ENABLE_CAP fail when there is no in-kernel irqchip.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: 5c919412fe61 (kvm/x86: Hyper-V synthetic interrupt controller)
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: flush pending lapic jump label updates on module unload
David Matlack [Fri, 16 Dec 2016 22:30:36 +0000 (14:30 -0800)]
KVM: x86: flush pending lapic jump label updates on module unload

commit cef84c302fe051744b983a92764d3fcca933415d upstream.

KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.

Use the new static_key_deferred_flush() API to flush pending updates on
module unload.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agojump_labels: API for flushing deferred jump label updates
David Matlack [Fri, 16 Dec 2016 22:30:35 +0000 (14:30 -0800)]
jump_labels: API for flushing deferred jump label updates

commit b6416e61012429e0277bd15a229222fd17afc1c1 upstream.

Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.

Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: eventfd: fix NULL deref irqbypass consumer
Wanpeng Li [Fri, 6 Jan 2017 01:39:42 +0000 (17:39 -0800)]
KVM: eventfd: fix NULL deref irqbypass consumer

commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.

Reported by syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference caused by
unregistering a consumer whose registration had previously failed.
syzkaller occasionally creates two VMs with the same eventfd, so the
second VM fails to register its irqbypass consumer.  The irqfd is then
marked inactive, and a work item is queued to shut down the irqfd and
unregister the irqbypass consumer when the eventfd is closed.  However,
the second consumer has been initialized even though its registration
failed, so its token (the same as the first VM's) is used to unregister
the consumer from the workqueue.  The first VM's consumer is found and
unregistered instead, and a NULL dereference then occurs on the path
that deletes the consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
look up the consumer entry by the consumer pointer itself instead of by
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoKVM: x86: fix emulation of "MOV SS, null selector"
Paolo Bonzini [Thu, 12 Jan 2017 14:02:32 +0000 (15:02 +0100)]
KVM: x86: fix emulation of "MOV SS, null selector"

commit 33ab91103b3415e12457e3104f0e4517ce12d0f3 upstream.

This is CVE-2017-2583.  On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
Fixes: 79d5b4c3cd809c770d4bf9812635647016c56011
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/hugetlb.c: fix reservation race when freeing surplus pages
Mike Kravetz [Wed, 11 Jan 2017 00:58:27 +0000 (16:58 -0800)]
mm/hugetlb.c: fix reservation race when freeing surplus pages

commit e5bbc8a6c992901058bc09e2ce01d16c111ff047 upstream.

return_unused_surplus_pages() decrements the global reservation count,
and frees any unused surplus pages that were backing the reservation.

Commit 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in
return_unused_surplus_pages()") added a call to cond_resched_lock in the
loop freeing the pages.

As a result, the hugetlb_lock could be dropped, and someone else could
use the pages that will be freed in subsequent iterations of the loop.
This could result in inconsistent global hugetlb page state, application
API failures (such as mmap failures) or application crashes.

When dropping the lock in return_unused_surplus_pages, make sure that
the global reservation count (resv_huge_pages) remains sufficiently
large to prevent someone else from claiming pages about to be freed.

Analyzed by Paul Cassella.

Fixes: 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Paul Cassella <cassella@cray.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm/slab.c: fix SLAB freelist randomization duplicate entries
John Sperbeck [Wed, 11 Jan 2017 00:58:24 +0000 (16:58 -0800)]
mm/slab.c: fix SLAB freelist randomization duplicate entries

commit c4e490cf148e85ead0d1b1c2caaba833f1d5b29f upstream.

This patch fixes a bug in the freelist randomization code.  When a high
random number is used, the freelist will contain duplicate entries, which
results in different allocations sharing the same chunk.

This leads to odd behaviours and crashes.  It should be uncommon, but how
often it happens depends on the machine; we saw it happen more often on
some machines (every few hours of running tests).

Fixes: c7ce4f60ac19 ("mm: SLAB freelist randomization")
Link: http://lkml.kernel.org/r/20170103181908.143178-1-thgarnie@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm: support anonymous stable page
Minchan Kim [Wed, 11 Jan 2017 00:58:15 +0000 (16:58 -0800)]
mm: support anonymous stable page

commit f05714293a591038304ddae7cb0dd747bb3786cc upstream.

During development of zram-swap asynchronous writeback, I found strange
corruption of compressed pages, resulting in:

  Modules linked in: zram(E)
  CPU: 3 PID: 1520 Comm: zramd-1 Tainted: G            E   4.8.0-mm1-00320-ge0d4894c9c38-dirty #3274
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  task: ffff88007620b840 task.stack: ffff880078090000
  RIP: set_freeobj.part.43+0x1c/0x1f
  RSP: 0018:ffff880078093ca8  EFLAGS: 00010246
  RAX: 0000000000000018 RBX: ffff880076798d88 RCX: ffffffff81c408c8
  RDX: 0000000000000018 RSI: 0000000000000000 RDI: 0000000000000246
  RBP: ffff880078093cb0 R08: 0000000000000000 R09: 0000000000000000
  R10: ffff88005bc43030 R11: 0000000000001df3 R12: ffff880076798d88
  R13: 000000000005bc43 R14: ffff88007819d1b8 R15: 0000000000000001
  FS:  0000000000000000(0000) GS:ffff88007e380000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fc934048f20 CR3: 0000000077b01000 CR4: 00000000000406e0
  Call Trace:
    obj_malloc+0x22b/0x260
    zs_malloc+0x1e4/0x580
    zram_bvec_rw+0x4cd/0x830 [zram]
    page_requests_rw+0x9c/0x130 [zram]
    zram_thread+0xe6/0x173 [zram]
    kthread+0xca/0xe0
    ret_from_fork+0x25/0x30

Investigation revealed that stable pages currently do not support
anonymous pages.  IOW, reuse_swap_page can reuse a page without waiting
for writeback to complete, so it can overwrite a page that zram is still
compressing.

Unfortunately, zram has used the per-cpu stream feature since v4.7.
It aims to increase the cache hit ratio of the scratch buffer used for
compression.  The downside of that approach is that zram must allocate
memory for the compressed page in per-cpu context, which requires
restrictive gfp flags, so the allocation can fail.  If it does, zram
retries the allocation outside per-cpu context, where it can get memory
this time, then compresses the data again and copies it to the
allocated space.

In this scenario, zram assumes the data never changes, but that is not
true without stable page support.  If the data changes under us, the
second compression can produce a larger result than the first one, and
zram blindly copies that bigger object into the smaller buffer, which is
a buffer overrun.  The overrun breaks the zsmalloc free object chaining,
so the system crashes as shown above.

I think the following is the same problem:
https://bugzilla.suse.com/show_bug.cgi?id=997574

Unfortunately, reuse_swap_page must be atomic, so we cannot wait on
writeback there.  The approach in this patch is therefore simply to
return false if we find the page needs to be stable.  Although this
increases the memory footprint temporarily, it happens rarely and the
memory should be easy to reclaim when it does.  It is also better than
waiting for I/O completion, which is on the critical path for
application latency.

Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/20161120233015.GA14113@bbox
Link: http://lkml.kernel.org/r/1482366980-3782-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm, memcg: fix the active list aging for lowmem requests when memcg is enabled
Michal Hocko [Wed, 11 Jan 2017 00:58:04 +0000 (16:58 -0800)]
mm, memcg: fix the active list aging for lowmem requests when memcg is enabled

commit b4536f0c829c8586544c94735c343f9b5070bd01 upstream.

Nils Holland and Klaus Ethgen have reported unexpected OOM killer
invocations with 32-bit kernels starting with 4.8:

kworker/u4:5 invoked oom-killer: gfp_mask=0x2400840(GFP_NOFS|__GFP_NOFAIL), nodemask=0, order=0, oom_score_adj=0
kworker/u4:5 cpuset=/ mems_allowed=0
CPU: 1 PID: 2603 Comm: kworker/u4:5 Not tainted 4.9.0-gentoo #2
[...]
Mem-Info:
active_anon:58685 inactive_anon:90 isolated_anon:0
 active_file:274324 inactive_file:281962 isolated_file:0
 unevictable:0 dirty:649 writeback:0 unstable:0
 slab_reclaimable:40662 slab_unreclaimable:17754
 mapped:7382 shmem:202 pagetables:351 bounce:0
 free:206736 free_pcp:332 free_cma:0
Node 0 active_anon:234740kB inactive_anon:360kB active_file:1097296kB inactive_file:1127848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:29528kB dirty:2596kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 184320kB anon_thp: 808kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
DMA free:3952kB min:788kB low:984kB high:1180kB active_anon:0kB inactive_anon:0kB active_file:7316kB inactive_file:0kB unevictable:0kB writepending:96kB present:15992kB managed:15916kB mlocked:0kB slab_reclaimable:3200kB slab_unreclaimable:1408kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
lowmem_reserve[]: 0 813 3474 3474
Normal free:41332kB min:41368kB low:51708kB high:62048kB active_anon:0kB inactive_anon:0kB active_file:532748kB inactive_file:44kB unevictable:0kB writepending:24kB present:897016kB managed:836248kB mlocked:0kB slab_reclaimable:159448kB slab_unreclaimable:69608kB kernel_stack:1112kB pagetables:1404kB bounce:0kB free_pcp:528kB local_pcp:340kB free_cma:0kB
lowmem_reserve[]: 0 0 21292 21292
HighMem free:781660kB min:512kB low:34356kB high:68200kB active_anon:234740kB inactive_anon:360kB active_file:557232kB inactive_file:1127804kB unevictable:0kB writepending:2592kB present:2725384kB managed:2725384kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:800kB local_pcp:608kB free_cma:0kB

The OOM killer is clearly premature because there is still a lot of
page cache in zone Normal, which should satisfy this lowmem request.
Further debugging has shown that reclaim cannot make any forward
progress because the page cache is hidden in the active list, which
doesn't get rotated because inactive_list_is_low is not memcg aware.

The code simply subtracts the per-zone highmem counters from the
respective memcg's lru sizes, which doesn't make any sense.  We can
easily end up always seeing resulting active and inactive counts of 0
and returning false.  This issue is not limited to 32-bit kernels, but
in practice the effect on systems without CONFIG_HIGHMEM would be much
harder to notice because we do not invoke the OOM killer for allocation
requests targeting < ZONE_NORMAL.

Fix the issue by tracking per-zone lru page counts in mem_cgroup_per_node
and subtracting per-memcg highmem counts when memcg is enabled.
Introduce the helper lruvec_zone_lru_size, which redirects to either the
zone counters or mem_cgroup_get_zone_lru_size when appropriate.

We lose the "empty LRU but non-zero lru size" detection introduced by
ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size") because
of the inherent zone vs. node discrepancy.

Fixes: f8d1a31163fc ("mm: consider whether to decivate based on eligible zones inactive ratio")
Link: http://lkml.kernel.org/r/20170104100825.3729-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Nils Holland <nholland@tisys.org>
Tested-by: Nils Holland <nholland@tisys.org>
Reported-by: Klaus Ethgen <Klaus@Ethgen.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoocfs2: fix crash caused by stale lvb with fsdlm plugin
Eric Ren [Wed, 11 Jan 2017 00:57:33 +0000 (16:57 -0800)]
ocfs2: fix crash caused by stale lvb with fsdlm plugin

commit e7ee2c089e94067d68475990bdeed211c8852917 upstream.

The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.

The crash backtrace is below:

   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
   ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: Beginning quota recovery on device (253,18) for slot 2
   ocfs2: Finishing quota recovery on device (253,18) for slot 2
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
   ------------[ cut here ]------------
   kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
   Supported: No, Unsupported modules are loaded
   CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
   task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
   RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
   RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
   RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
   RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
   RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
   R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
   R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
   FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
   Call Trace:
     ocfs2_setattr+0x698/0xa90 [ocfs2]
     notify_change+0x1ae/0x380
     do_truncate+0x5e/0x90
     do_sys_ftruncate.constprop.11+0x108/0x160
     entry_SYSCALL_64_fastpath+0x12/0x6d
   Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
   RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

It's because ocfs2_inode_lock() gives us a stale LVB in which the i_size
is not equal to the disk i_size.  We mistakenly trust the LVB because the
underlying fsdlm dlm_lock() doesn't properly set DLM_SBF_VALNOTVALID in
lkb_sbflags for us.  But why?

The current code downconverts the lock without the DLM_LKF_VALBLK flag
to tell o2cb not to update the RSB's LVB if it's a PR->NULL conversion,
even if the lock resource type needs an LVB.  This is not the right
approach for fsdlm.

The fsdlm plugin behaves differently with DLM_LKF_VALBLK: it depends on
DLM_LKF_VALBLK to decide whether we care about the LVB in the LKB.  If
DLM_LKF_VALBLK is not set, fsdlm skips this LKB when recovering the RSB's
LVB after a node failure, and so it cannot set DLM_SBF_VALNOTVALID
appropriately.

The following diagram briefly illustrates how this crash happens:

RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

The 1st round:

             Node1                                    Node2
RSB1: PR
                                                  RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
               with no DLM_LKF_VALBLK)

RSB1: NULL                                        RSB1: EX
                                                  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
 * no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
 */

 if (!(lkb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
           return;                     * invalidate the LVB here.
                                       */

The 2nd round:

         Node 1                                Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
    /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
    ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
  ocfs2_truncate_file()
      mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward.  We keep setting the DLM_LKF_VALBLK
flag for dlm_lock() if the lock resource type needs an LVB and the fsdlm
plugin is used.

Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
Dan Williams [Wed, 11 Jan 2017 00:57:36 +0000 (16:57 -0800)]
mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.

Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
        pud = (pud_t *)pgd_page_vaddr(*pgd);
        paddr_last = phys_pud_init(pud, __pa(vaddr),
                                   __pa(vaddr_end),
                                   page_size_mask);
        continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
                               page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization.  This leads
to crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
    task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
    RIP: memcpy_erms+0x6/0x10
    [..]
    Call Trace:
      ? pmem_do_bvec+0x205/0x370 [nd_pmem]
      ? blk_queue_enter+0x3a/0x280
      pmem_rw_page+0x38/0x80 [nd_pmem]
      bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Fixes: 41e94a851304 ("add devm_memremap_pages")
Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agomm: pmd dirty emulation in page fault handler
Minchan Kim [Wed, 11 Jan 2017 00:57:51 +0000 (16:57 -0800)]
mm: pmd dirty emulation in page fault handler

commit 20f664aabeb88d582b623a625f83b0454fa34f07 upstream.

Andreas reported that [1] made a jemalloc test hang in THP mode on arm64:

  http://lkml.kernel.org/r/mvmmvfy37g1.fsf@hawking.suse.de

The problem is that the page fault handler currently doesn't support
dirty-bit emulation of pmds for architectures without a hardware dirty
bit, so the application gets stuck until the VM marks the pmd dirty.

How the emulation works depends on the architecture.  In the case of
arm64, when it first sets up a pte, it sets PTE_RDONLY so that it gets a
chance to mark the pte dirty via a page fault when a store access
happens.  Once the page fault occurs, the VM marks the pmd dirty and the
arch code for setting the pmd clears PTE_RDONLY so the application can
proceed.

IOW, if the VM doesn't mark the pmd dirty, the application hangs forever
on repeated faults (i.e., a store op while the pmd is PTE_RDONLY).

This patch enables pmd dirty-bit emulation for those architectures.

[1] b8d3c4c3009d, mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called

Fixes: b8d3c4c3009d ("mm/huge_memory.c: don't split THP page when MADV_FREE syscall is called")
Link: http://lkml.kernel.org/r/1482506098-6149-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Andreas Schwab <schwab@suse.de>
Tested-by: Andreas Schwab <schwab@suse.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <je@fb.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agodax: fix deadlock with DAX 4k holes
Ross Zwisler [Wed, 11 Jan 2017 00:57:15 +0000 (16:57 -0800)]
dax: fix deadlock with DAX 4k holes

commit 965d004af54088d138f806d04d803fb60d441986 upstream.

Currently in DAX if we have three read faults on the same hole address we
can end up with the following:

Thread 0                  Thread 1                  Thread 2
--------                  --------                  --------
dax_iomap_fault
 grab_mapping_entry
  lock_slot
   <locks empty DAX entry>

                          dax_iomap_fault
                           grab_mapping_entry
                            get_unlocked_mapping_entry
                             <sleeps on empty DAX entry>

                                                    dax_iomap_fault
                                                     grab_mapping_entry
                                                      get_unlocked_mapping_entry
                                                       <sleeps on empty DAX entry>
  dax_load_hole
   find_or_create_page
   ...
    page_cache_tree_insert
     dax_wake_mapping_entry_waiter
      <wakes one sleeper>
     __radix_tree_replace
      <swaps empty DAX entry with 4k zero page>

                          <wakes>
                          get_page
                          lock_page
                          ...
                          put_locked_mapping_entry
                          unlock_page
                          put_page

                                                    <sleeps forever on the DAX
                                                     wait queue>

The crux of the problem is that once we insert a 4k zero page, all
locking from then on is done in terms of that 4k zero page and any
additional threads sleeping on the empty DAX entry will never be woken.

Fix this by waking all sleepers when we replace the DAX radix tree entry
with a 4k zero page.  This will allow all sleeping threads to
successfully transition from locking based on the DAX empty entry to
locking on the 4k zero page.

With the test case reported by Xiong this happens very regularly in my
test setup, with some runs resulting in 9+ threads in this deadlocked
state.  With this fix I've been able to run that same test dozens of
times in a loop without issue.

Fixes: ac401cc78242 ("dax: New fault locking")
Link: http://lkml.kernel.org/r/1483479365-13607-1-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Xiong Zhou <xzhou@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agozram: support BDI_CAP_STABLE_WRITES
Minchan Kim [Wed, 11 Jan 2017 00:58:21 +0000 (16:58 -0800)]
zram: support BDI_CAP_STABLE_WRITES

commit b09ab054b69b07077bd3292f67e777861ac796e5 upstream.

zram has used the per-cpu stream feature since v4.7.  It aims to
increase the cache hit ratio of the scratch buffer used for compression.
The downside of that approach is that zram must allocate memory for the
compressed page in per-cpu context, which requires restrictive gfp
flags, so the allocation can fail.  If it does, zram retries the
allocation outside per-cpu context, where it can get memory this time,
then compresses the data again and copies it to the allocated space.

In this scenario, zram assumes the data never changes, but that is not
true without stable page support.  If the data changes under us, zram
can overrun the buffer, breaking the zsmalloc free object chain and
crashing the system, as in:

   https://bugzilla.suse.com/show_bug.cgi?id=997574

This patch adds BDI_CAP_STABLE_WRITES to zram to declare "I am a block
device needing *stable writes*".

Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-4-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agozram: revalidate disk under init_lock
Minchan Kim [Wed, 11 Jan 2017 00:58:18 +0000 (16:58 -0800)]
zram: revalidate disk under init_lock

commit e7ccfc4ccb703e0f033bd4617580039898e912dd upstream.

Commit b4c5c60920e3 ("zram: avoid lockdep splat by revalidate_disk")
moved revalidate_disk call out of init_lock to avoid lockdep
false-positive splat.  However, commit 08eee69fcf6b ("zram: remove
init_lock in zram_make_request") removed init_lock in IO path so there
is no worry about lockdep splat.  So, let's restore it.

This patch is needed to set BDI_CAP_STABLE_WRITES atomically in the next
patch.

Fixes: da9556a2367c ("zram: user per-cpu compression streams")
Link: http://lkml.kernel.org/r/1482366980-3782-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hyeoncheol Lee <cheol.lee@lge.com>
Cc: <yjay.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoselftests: do not require bash for the generated test
Rolf Eike Beer [Wed, 14 Dec 2016 10:59:34 +0000 (11:59 +0100)]
selftests: do not require bash for the generated test

commit a2b1e8a20c992b01eeb76de00d4f534cbe9f3822 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoselftests: do not require bash to run netsocktests testcase
Rolf Eike Beer [Wed, 14 Dec 2016 10:59:57 +0000 (11:59 +0100)]
selftests: do not require bash to run netsocktests testcase

commit 3659f98b5375d195f1870c3e508fe51e52206839 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>