This patch tries to fix a PEBS warning found in my stress test. The
following perf command can easily trigger the pebs warning or spurious
NMI error on Skylake/Broadwell/Haswell platforms:
sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a
Also the NMI watchdog must be enabled.
In this case, the number of events is larger than the number of counters,
so perf has to multiplex them.
In perf_mux_hrtimer_handler(), perf does perf_pmu_disable(), schedules out
the old events, calls rotate_ctx(), schedules in the new events and finally
calls perf_pmu_enable().
If the old events include a precise event, MSR_IA32_PEBS_ENABLE
should be cleared when perf_pmu_disable() is called, and it should
stay 0 until perf_pmu_enable() is called and the new event is a
precise event.
However, there is a corner case which can restore PEBS_ENABLE to a
stale value during this window. In perf_pmu_disable(), GLOBAL_CTRL
is set to 0 to stop overflows and the PMIs that follow them. But there may
be a pending PMI from an earlier overflow, which cannot be stopped. So even
with GLOBAL_CTRL cleared, the kernel may still get a PMI. At
the end of the PMI handler, __intel_pmu_enable_all() will be called,
which restores the stale values if the old events haven't been scheduled
out yet.
Once the stale PEBS value has been written, it cannot be corrected if
the new events are non-precise, because pebs_enabled is then set
to 0 and x86_pmu.enable_all() ignores the MSR_IA32_PEBS_ENABLE
setting. As a result, a following NMI with the stale PEBS_ENABLE
triggers the PEBS warning.
The pending PMI after enabled=0 becomes harmless if the NMI handler
does not change the state. This patch checks cpuc->enabled in the PMI
handler and only restores the state when the PMU is active.
This patch fixes an issue with the GLOBAL_OVERFLOW_STATUS bits on
Haswell, Broadwell and Skylake processors when using PEBS.
The SDM stipulates that when the PEBS interrupt threshold is crossed,
an interrupt is posted and the kernel is interrupted. The kernel will
find GLOBAL_OVF_STATUS bit 62 set, indicating there are PEBS records to
drain. But the bits corresponding to the actual counters should NOT be
set. The kernel follows the SDM and assumes that all PEBS events are
processed in the drain_pebs() callback. The kernel then checks for
remaining overflows on any other (non-PEBS) events and processes these
in the for_each_bit_set(&status) loop.
As it turns out, under certain conditions on HSW and later processors,
on PEBS buffer interrupt, bit 62 is set but the counter bits may be
set as well. In that case, the kernel drains PEBS and generates
SAMPLES with the EXACT tag, then it processes the counter bits, and
generates normal (non-EXACT) SAMPLES.
I ran into this problem by trying to understand why on HSW sampling on
a PEBS event was sometimes returning SAMPLES without the EXACT tag.
This should not happen for user level code because HSW has the
eventing_ip, which always points to the instruction that caused the
event.
The workaround in this patch simply ensures that the bits for the
counters used for PEBS events are cleared after the PEBS buffer has
been drained. With this fix 100% of the PEBS samples on my user code
report the EXACT tag.
Before:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x2): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is missing
After:
$ perf record -e cpu/event=0xd0,umask=0x81/upp ./multichase
$ perf report -D | fgrep SAMPLES
PERF_RECORD_SAMPLE(IP, 0x4002): 11775/11775: 0x406de5 period: 73469 addr: 0 exact=Y
\--- EXACT tag is set
The problem tends to appear more often when multiple PEBS events are used.
Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: namhyung@kernel.org Link: http://lkml.kernel.org/r/1457034642-21837-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:
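The delta computation in question looks roughly like this (a sketch; variable
names assumed from steal_account_process_tick(), 64-bit unsigned arithmetic):

	u64 steal;

	steal  = paravirt_steal_clock(smp_processor_id());
	steal -= this_rq()->prev_steal_time;	/* underflows if prev_steal_time is stale and larger */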
So if steal is smaller than rq->prev_steal_time we end up with an insanely
large value which then gets added to rq->prev_steal_time, resulting in
permanent wreckage of the accounting. As a consequence the per CPU stats
in /proc/stat become stale.
Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.
None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.
Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Rik van Riel <riel@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Glauber Costa <glommer@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time" Fixes: commit aa483808516c "sched: Remove irq time from available CPU power" Fixes: commit e6e6685accfa "KVM guest: Steal time accounting" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
For fixed-format sense data the information field is only 32 bits, so we need
to truncate the information field to avoid clobbering the sense code.
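For illustration, a hedged sketch of the idea (helper names are assumptions,
not the exact libata code): the 64-bit value is masked down before it is
stored in the 4-byte information field at bytes 3-6 of the fixed sense buffer,

	put_unaligned_be32(lower_32_bits(information), &sb[3]);

so the upper bytes can no longer spill into the adjacent sense fields.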
Fixes: a1524f226a02 ("libata-eh: Set 'information' field for autosense") Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When suspending to RAM, waking up and later suspending to disk,
we gratuitously runtime resume devices after the thaw phase.
This does not occur if we always suspend to RAM or always to disk.
pm_complete_with_resume_check(), which gets called from
pci_pm_complete() among others, schedules a runtime resume
if PM_SUSPEND_FLAG_FW_RESUME is set. The flag is set during
a suspend-to-RAM cycle. It is cleared at the beginning of
the suspend-to-RAM cycle but not afterwards and it is not
cleared during a suspend-to-disk cycle at all. Fix it.
Fixes: ef25ba047601 (PM / sleep: Add flags to indicate platform firmware involvement) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 5942ddbc500d ("mtd: introduce mtd_block_markbad interface")
incorrectly changed onenand_block_markbad() to call mtd_block_markbad
instead of onenand_chip's block_markbad function. As a result the function
will now recurse and deadlock. Fix by reverting the change.
Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:
> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal: 204800 kB
> CmaFree: 195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal: 204800 kB
> CmaFree: 6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal: 16342016 kB
> MemFree: 22367268 kB
> MemAvailable: 22370528 kB
Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:
> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> a partially isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.
This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
but since it doesn't hold the zone->lock between pageblocks, a race
window does exist.
It's also likely that unexpected merging can occur between
MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in
__free_one_page() since commit 3c605096d315 ("mm/page_alloc: restrict
max order of merging on isolated pageblock"). However, we only check
the migratetype of the pageblock where buddy merging has been initiated,
not the migratetype of the buddy pageblock (or group of pageblocks)
which can be MIGRATE_ISOLATE.
Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath
and bloat-o-meter has shown significant code bloat (the function is
inline).
This patch reduces the bloat at some expense of more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded
to pageblock_order and has no migratetype checks. The checks are
placed outside, bumping max_order if merging is allowed, and
returning to the while-loop with a statement which can't possibly be
considered harmful.
This fixes the accounting bug and also removes the arguably weird state
in the original commit 3c605096d315 where buddies could be left
unmerged.
Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock") Link: https://lkml.org/lkml/2016/3/2/280 Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: Hanjun Guo <guohanjun@huawei.com> Tested-by: Hanjun Guo <guohanjun@huawei.com> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Debugged-by: Laura Abbott <labbott@redhat.com> Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When the master handles a convert request, it queues the ast first and then
returns the status. So it may happen that the ast is sent before the request
status, because the two messages are sent by different threads. If, right
after the ast is sent, the master goes down, it may trigger the BUG in
dlm_move_lockres_to_recovery_list on the requesting node, because the ast
handler moves the lock to the grant list without clearing
lock->convert_pending. So remove the BUG_ON statement and check whether the
ast has been processed in dlmconvert_remote.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There is a race window between dlmconvert_remote and
dlm_move_lockres_to_recovery_list, which will cause a lock with
OCFS2_LOCK_BUSY in grant list, thus system hangs.
status = dlm_send_remote_convert_request();
>>>>>> race window: the master has queued the ast and returned DLM_NORMAL,
       and then goes down before sending the ast.
       This node detects that the master is down and calls
       dlm_move_lockres_to_recovery_list, which reverts the
       lock to the grant list.
Then OCFS2_LOCK_BUSY won't be cleared, as the new master won't
send the ast any more because it thinks the request has already been
authorized.
In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING bit
set (res is still recovering) or the res master has changed (the new master
has finished recovery), and if so reset the status to DLM_RECOVERING; then
the convert will be retried.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Reported-by: Yiwen Jiang <jiangyiwen@huawei.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Tariq Saeed <tariq.x.saeed@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The ati_remote2 driver expects at least two interfaces with one
endpoint each. If given a malicious descriptor that specifies only one
interface or no endpoints, it will crash in the probe function.
Ensure there are at least two interfaces and one endpoint for each
interface before using them.
The full disclosure: http://seclists.org/bugtraq/2016/Mar/90
When cgroup writeback is in use, there can be multiple wb's
(bdi_writeback's) per bdi and an inode may switch among them
dynamically. In a couple places, the wrong wb was used leading to
performing operations on the wrong list under the wrong lock
corrupting the io lists.
* writeback_single_inode() was taking @wb parameter and used it to
remove the inode from io lists if it becomes clean after writeback.
The callers of this function were always passing in the root wb
regardless of the actual wb that the inode was associated with,
which could also change while writeback is in progress.
Fix it by dropping the @wb parameter and using
inode_to_wb_and_lock_list() to determine and lock the associated wb.
* After writeback_sb_inodes() writes out an inode, it re-locks @wb and
inode to remove it from or move it to the right io list. It assumes
that the inode is still associated with @wb; however, the inode may
have switched to another wb while writeback was in progress.
Fix it by using inode_to_wb_and_lock_list() to determine and lock
the associated wb after writeback is complete. As the function
requires the original @wb->list_lock locked for the next iteration,
in the unlikely case where the inode has changed association, switch
the locks.
Kudos to Tahsin for pinpointing these subtle breakages.
locked_inode_to_wb_and_lock_list() wb_get()'s the wb associated with
the target inode, unlocks inode, locks the wb's list_lock and verifies
that the inode is still associated with the wb. To prevent the wb
going away between dropping inode lock and acquiring list_lock, the wb
is pinned while inode lock is held. The wb reference is put right
after acquiring list_lock citing that the wb won't be dereferenced
anymore.
This isn't true. If the inode is still associated with the wb, the
inode has reference and it's safe to return the wb; however, if inode
has been switched, the wb still needs to be unlocked which is a
dereference and can lead to use-after-free if it races with wb
destruction.
Fix it by putting the reference after releasing list_lock.
Commit 58a1fbbb2ee8 ("PM / PCI / ACPI: Kick devices that might have been
reset by firmware") added a runtime resume for devices that were runtime
suspended when the system entered suspend-to-RAM.
Briefly, the motivation was to ensure that devices did not remain in a
reset-power-on state after resume, potentially preventing deep SoC-wide
low-power states from being entered on idle.
Currently we're not doing the same when leaving suspend-to-disk and this
asymmetry is a problem if drivers rely on the automatic resume triggered
by pm_complete_with_resume_check(). Fix it.
Fixes: 58a1fbbb2ee8 (PM / PCI / ACPI: Kick devices that might have been reset by firmware) Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
nfsd_lookup_dentry exits with the parent filehandle locked. fh_put also
unlocks if necessary (nfsd filehandle locking is probably too lenient),
so it gets unlocked eventually, but if the following op in the compound
needs to lock it again, we can deadlock.
A fuzzer ran into this; normal clients don't send a secinfo followed by
a readdir in the same compound.
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A number of spots in the xdr decoding follow a pattern like
n = be32_to_cpup(p++);
READ_BUF(n + 4);
where n is a u32. The only bounds checking is done in READ_BUF itself,
but since it's checking (n + 4), it won't catch cases where n is very
large, (u32)(-4) or higher. I'm not sure exactly what the consequences
are, but we've seen crashes soon after.
Instead, just break these up into two READ_BUF()s.
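A sketch of the intended pattern (READ_BUF() and the p decode pointer as used
in fs/nfsd/nfs4xdr.c; the actual call sites vary):

	READ_BUF(4);
	n = be32_to_cpup(p++);
	READ_BUF(n);

This way the length word is bounds-checked on its own first, and n is then
checked as a plain length without the overflow-prone n + 4 addition.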
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When we receive an event that triggers connection termination,
we have a a couple of things we may want to do:
1. In case we are already terminating, bailout early
2. In case we are connected but not bound, disconnect and schedule
a connection cleanup silently (don't reinstate)
3. In case we are connected and bound, disconnect and reinstate the connection
This rework fixes a bug that was detected against a misbehaving
initiator which rejected our rdma_cm accept; at this stage the
isert_conn is not bound and the reinstatement caused a bogus dereference.
What's great about this is that we don't need the
post_recv_buf_count anymore, so get rid of it.
We need an indication that the isert_conn->iscsi_conn binding has
happened so we'll know not to invoke a connection reinstatement
on an unbound connection, which would lead to a bogus isert_conn->conn
dereference.
Once a connection request is accepted, one rx descriptor
is posted to receive the login request. This descriptor has rx type,
but is outside the main pool of rx descriptors, and thus
was mistreated as tx type.
This patch fixes an active I/O shutdown bug for fabric
drivers using target_wait_for_sess_cmds(), where se_cmd
descriptor shutdown would result in hung tasks waiting
indefinitely for se_cmd->cmd_wait_comp to complete().
To address this bug, drop the incorrect list_del_init()
usage in target_wait_for_sess_cmds() and always complete()
during se_cmd target_release_cmd_kref() put, in order to
let caller invoke the final fabric release callback
into se_cmd->se_tfo->release_cmd() code.
Our dividers weren't being set successfully because CM_PASSWORD wasn't
included in the register write. It looks easier to just compute the
divider to write ourselves than to update clk-divider for the ability
to OR in some arbitrary bits on write.
Fixes about half of the video modes on my HDMI monitor (everything
except 720x400).
Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Michael Turquette <mturquette@baylibre.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hclk_cpubus needs to keep running because it is needed for devices like
the rom, i2s0 or spdif to be accessible by the CPU. Without it, all
accesses to those devices (readl/writel) return wrong data. So add it
to the list of critical clocks.
The vdpu and vepu clocks can also be parented to the npll, and the current
parent list is also wrong, as it would use the npll as the "usbphy" source,
so adapt the parent list to the correct one.
Similar to commit 9880d4277f6a ("clk: rockchip: fix rk3288 cpuclk core
dividers"), it seems the cpuclk dividers are one too high on the rk3368
as well.
And again similar to the previous fix, we opt to make the divider list
contain the values to be written to use the same paradigm for them on all
supported socs.
Both clusters have their mux bit in bit 7 of their respective register.
For whatever reason the big cluster currently lists bit 15 which is
definitely wrong.
Normally the timeout clock frequency is read from the capabilities
register. It is also possible to set the value prior to calling
sdhci_add_host() in which case that value will override the
capabilities register value. However that was being done after
calculating max_busy_timeout so that max_busy_timeout was being
calculated using the wrong value of timeout_clk.
Fix that by moving the override before max_busy_timeout is
calculated.
The result is that the max_busy_timeout and max_discard
increase for BSW devices so that, for example, the time for
mkfs.ext4 on a 64GB eMMC drops from about 1 minute 40 seconds
to about 20 seconds.
Note, in the future, the capabilities setting will be tidied up
and this override won't be used anymore. However this fix is
needed for stable.
The calculation for the timeout based on the number of card clocks is
incorrect. The calculation assumed:
timeout in microseconds = clock cycles / clock in Hz
which is clearly several orders of magnitude wrong. Fix this by
multiplying the clock cycles by 1000000 prior to dividing by the Hz
based clock. Also, as per part 1, ensure that the division rounds
up.
As this needs 64-bit math via do_div(), avoid it if the clock cycle
count is zero.
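As a rough illustration of the corrected conversion (field and variable names
are assumptions, not the verbatim sdhci code):

	if (data->timeout_clks) {
		unsigned long long val = 1000000ULL * data->timeout_clks;

		if (do_div(val, host->clock))	/* do_div() returns the remainder */
			val++;			/* round up */
		target_timeout += val;
	}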
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The data timeout gives the minimum amount of time that should be
waited before timing out if no data is received from the card.
Simply dividing the nanosecond part by 1000 does not give this
required guarantee, since such a division rounds down. Use
DIV_ROUND_UP() to give the desired timeout.
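For example (a sketch; target_timeout in microseconds, data->timeout_ns as
supplied by the core):

	target_timeout = DIV_ROUND_UP(data->timeout_ns, 1000);

so a 1500 ns requirement becomes 2 us instead of being rounded down to 1 us.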
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
# dmesg | grep mmc
mmc_spi spi32766.0: SD/MMC host mmc0, no WP, no poweroff, cd polling
mmc0: host does not support reading read-only switch, assuming write-enable
mmc0: new SDHC card on SPI
mmcblk0: mmc0:0000 SU04G 3.69 GiB
mmcblk0: p1
With this patch applied the "cd polling" portion above disappears.
If mmc_blk_ioctl returns -EINVAL, blkdev_ioctl continues to
work without returning the error to user space. But now we check
CAP_SYS_RAWIO first, so we return -EPERM to blkdev_ioctl,
which makes blkdev_ioctl return -EPERM to user space directly.
This breaks all ioctls using BLKROSET. We found Android adb
suffering from it, with the following log:
remount of /system failed;
couldn't make block device writable: Operation not permitted
openat(AT_FDCWD, "/dev/block/platform/ff420000.dwmmc/by-name/system", O_RDONLY) = 3
ioctl(3, BLKROSET, 0) = -1 EPERM (Operation not permitted)
Fixes: a5f5774c55a2 ("mmc: block: Add new ioctl to send multi commands") Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some Lenovo ideapad models lack a physical rfkill switch.
On Lenovo models ideapad Y700 Touch-15ISK and ideapad Y700-15ISK,
ideapad-laptop would wrongly report all radios as blocked by
hardware which caused wireless network connections to fail.
Add these models without an rfkill switch to the no_hw_rfkill list.
mkspec copies the built kernel to the temporary location
/boot/vmlinuz-$KERNELRELEASE-rpm
and runs installkernel on it. However, this directly leads to a grub2
menuentry for this suffixed binary being generated as well during the run
of the installkernel script.
Later in the process the temporary -rpm suffixed files are removed, and
therefore we end up with spurious (and non-functional) grub2 menu entries
for each installed kernel RPM.
Fix that by using a different temporary name (prefixed by '.'), so that
the binary is not recognized as an actual kernel binary and no menuentry
is created for it.
Signed-off-by: Jiri Kosina <jkosina@suse.cz> Fixes: 3c9c7a14b627 ("rpm-pkg: add %post section to create initramfs and grub hooks") Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Documentation/Changes still lists this as the minimal required version,
so it ought to remain usable for the time being.
Fixes: d2036f30cf ("scripts/kconfig/Makefile: Allow KBUILD_DEFCONFIG to be a target") Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Michal Marek <mmarek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
__clear_bit_unlock() is a special little snowflake. While it carries the
non-atomic '__' prefix, it is specifically documented to pair with
test_and_set_bit() and therefore should be 'somewhat' atomic.
Therefore the generic implementation of __clear_bit_unlock() cannot use
the fully non-atomic __clear_bit() as a default.
If an arch is able to do better, it must provide an implementation of
__clear_bit_unlock() itself.
Specifically, this came up as a result of hackbench livelock'ing in
slab_lock() on ARC with SMP + SLUB + !LLSC.
The non-serializing __clear_bit() was getting "lost":

80543b8e:	ld_s       r2,[r13,0]  <--- (A) Finds PG_locked is set
80543b90:	or         r3,r2,1     <--- (B) other core unlocks right here
80543b94:	st_s       r3,[r13,0]  <--- (C) sets PG_locked (overwrites unlock)
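A sketch of the kind of generic default this calls for (asm-generic style,
assuming the usual clear_bit()/smp_mb__before_atomic() helpers instead of the
plain non-atomic __clear_bit()):

	#define __clear_bit_unlock(nr, addr)		\
	do {						\
		smp_mb__before_atomic();		\
		clear_bit(nr, addr);			\
	} while (0)

Using the atomic clear_bit() makes the read-modify-write on the word safe
against a concurrent update of another bit in the same word, which is exactly
the store that was getting lost above.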
The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).
If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.
Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If tracing contains data and the trace_pipe file is read with sendfile(),
then it can trigger a NULL pointer dereference and various BUG_ON within the
VM code.
There's a patch to fix this in the splice_to_pipe() code, but it's also a
good idea to not let that happen from trace_pipe either.
Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.
Joel explained that he wanted to see functions where interrupts are enabled
but preemption was disabled. The expected output he wanted:
There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check so the function is
also not recorded when preemption is enabled, as there would be no race in
that case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers" Reported-by: Joel Fernandes <agnel.joel@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A narrow race-condition window still exists between the
multicast join thread and the *dev_flush workers.
A kernel crash caused by prolonged erratic link state changes
was observed (most likely faulty cabling):
[167275.656270] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[167275.665973] IP: [<ffffffffa05f8f2e>] ipoib_mcast_join+0xae/0x1d0 [ib_ipoib]
[167275.674443] PGD 0
[167275.677373] Oops: 0000 [#1] SMP
...
[167275.977530] Call Trace:
[167275.982225] [<ffffffffa05f92f0>] ? ipoib_mcast_free+0x200/0x200 [ib_ipoib]
[167275.992024] [<ffffffffa05fa1b7>] ipoib_mcast_join_task+0x2a7/0x490
[ib_ipoib]
[167276.002149] [<ffffffff8109d5fb>] process_one_work+0x17b/0x470
[167276.010754] [<ffffffff8109e3cb>] worker_thread+0x11b/0x400
[167276.019088] [<ffffffff8109e2b0>] ? rescuer_thread+0x400/0x400
[167276.027737] [<ffffffff810a5aef>] kthread+0xcf/0xe0
Here is the hit spot:

ipoib_mcast_join() {
	..............
	rec.qkey      = priv->broadcast->mcmember.qkey;
	                ^^^^^^^^^^^^^^^
	.....
}
The proposed patch prevents the multicast join task from continuing
if a link state change is detected.
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Changes from v4:
- as suggested by Doug Ledford, optimized spinlock usage,
i.e. ipoib_mcast_join() is called with lock held.
Changes from v3:
- sync with priv->lock before flag check.
Changes from v2:
- Move check for OPER_UP flag state to mcast_join() to
ensure no event worker is in progress.
- minor style fixes.
Changes from v1:
- No need to lock again if error detected. Signed-off-by: Doug Ledford <dledford@redhat.com> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ken Wang <Qingqing.Wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some PX laptops don't provide an ACPI method to control dGPU power. On
those systems, the driver is responsible for handling the dGPU power
state. Disable runtime PM on them until support for this is implemented.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As observed on Apple iMac10,1, DCE-3.2, RV-730,
a link rate of 2.7 GHz is not selected, because
the args.v1.ucConfig flag setting for 2.7 GHz
gets overwritten by a following assignment of
the transmitter to use.
Move link rate setup a few lines down to fix this.
In practice this didn't have any positive or
negative effect on display setup on the tested
iMac10,1, so I don't know if backporting to stable
makes sense or not.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some PX laptops don't provide an ACPI method to control dGPU power. On
those systems, the driver is responsible for handling the dGPU power
state. Disable runtime PM on them until support for this is implemented.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the opmode is stopped and started again we did not free
the paging buffers. Fix that.
In addition when freeing the firmware's paging download
buffer, set the pointer to NULL.
Signed-off-by: Matti Gottlieb <matti.gottlieb@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") removed
the end of line handling when storing the update_fw sysfs attribute.
This changed the userspace API because it started refusing writes
terminated by a line feed, which broke the update tools we already have.
This patch re-adds that handling, so both a write terminated by a line
feed or not can make it through with the update.
Fixes: d63c7dd5bcb9 ("ipr: Fix out-of-bounds null overwrite") Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Insu Yun <wuninsu@gmail.com> Acked-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The return value of snprintf is not bounded by the size value, its 2nd argument
(https://www.kernel.org/doc/htmldocs/kernel-api/API-snprintf.html).
The return value is the number of chars that would have been printed, which
can be larger than the 2nd argument. Therefore, using it as an index can
write a null byte out of bounds of the buffer.
Since snprintf already NUL-terminates within the given size, there is no need
to write an additional null byte.
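As an illustration of the pattern being fixed (hypothetical buffer and input,
not the actual ipr code):

	char buf[16];
	int len = snprintf(buf, sizeof(buf), "%s", input);

	buf[len] = '\0';	/* out of bounds whenever input needs >= 16 bytes */

snprintf() already NUL-terminates within the given size, so the extra store is
unnecessary; if the written length is needed, scnprintf() returns the
truncated length instead.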
Signed-off-by: Insu Yun <wuninsu@gmail.com> Reviewed-by: Shane Seymour <shane.seymour@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix deadlocking during concurrent receive and transmit operations on SMP
platforms caused by the use of the wrong lock: on transmit the 'tx_lock'
spinlock should be used instead of 'lock', which is used for the receive
operation.
This fix is applicable to kernel versions starting from v2.15.
Signed-off-by: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:
- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)
Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.
To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.
Signed-off-by: Jann Horn <jann@thejh.net> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The 'reqs' member of fuse_io_priv serves two purposes. First is to track
the number of outstanding async requests to the server and to signal that
the io request is completed. The second is to be a reference count on the
structure to know when it can be freed.
For sync io requests these purposes can be at odds. fuse_direct_IO() wants
to block until the request is done, and since the signal is sent when
'reqs' reaches 0 it cannot keep a reference to the object. Yet it needs to
use the object after the userspace server has completed processing
requests. This leads to some handshaking and special casing that is
needlessly complicated and responsible for at least one race condition.
It's much cleaner and safer to maintain a separate reference count for the
object lifecycle and to let 'reqs' just be a count of outstanding requests
to the userspace server. Then we can know for sure when it is safe to free
the object without any handshaking or special cases.
The catch here is that most of the time these objects are stack allocated
and should not be freed. Initializing these objects with a single reference
that is never released prevents accidental attempts to free the objects.
There's a race in fuse_direct_IO(), whereby is_sync_kiocb() is called on an
iocb that could have been freed if async io has already completed. The fix
in this case is simple and obvious: cache the result before starting io.
It was discovered by KASan:
kernel: ==================================================================
kernel: BUG: KASan: use after free in fuse_direct_IO+0xb1a/0xcc0 at addr ffff88036c414390
Signed-off-by: Robert Doebbelin <robert@quobyte.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: bcba24ccdc82 ("fuse: enable asynchronous processing direct IO") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Inside multipath_make_request(), multipath maps the incoming
bio into the low level device's bio, but it is totally wrong to
copy the bio into the mapped bio via '*mapped_bio = *bio'. For
example, .__bi_remaining is kept in the copy, especially if the
incoming bio is chained via bio splitting, so .bi_end_io
can't be called for the mapped bio at all in the completion path
in this kind of situation.
This patch fixes the issue by using clone style.
Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.
It doesn't preserve or copy STRIPE_PREREAD_ACTIVE. This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.
However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another. This does occasionally happen.
md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes. When
break_stripe_batch_list clears STRIPE_PREREAD_ACTIVE inadvertently,
this count becomes incorrect and will never again return to zero.
md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero. So when the above-mentioned race
happens, those stripe_heads become blocked and never progress,
resulting in writes to the array hanging.
So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.
URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz Reported-by: Martin Svec <martin.svec@zoner.cz> (and others) Tested-by: Tom Weber <linux@junkyard.4t2.com> Fixes: 1b956f7a8f9a ("md/raid5: be more selective about distributing flags across batch.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Revert commit e9e4c377e2f563 ("md/raid5: per hash value and exclusive wait_for_stripe").
The problem is raid5_get_active_stripe waits on
conf->wait_for_stripe[hash]. Assume hash is 0. My test releases stripes
in this order:
- release all stripes with hash 0
- raid5_get_active_stripe still sleeps since active_stripes >
max_nr_stripes * 3 / 4
- release all stripes with hash other than 0. active_stripes becomes 0
- raid5_get_active_stripe still sleeps, since nobody wakes up
wait_for_stripe[0]
The system livelocks. The problem is that active_stripes isn't a per-hash
count. Reverting the patch makes the livelock go away.
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Cc: NeilBrown <neilb@suse.de> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
check_reshape() is called from the raid5d thread. The raid5d thread shouldn't
call mddev_suspend(), because mddev_suspend() waits for all IO to finish,
but IO is handled in the raid5d thread itself, so we could easily deadlock here.
This issue was introduced by commit 738a273 ("md/raid5: fix allocation of 'scribble' array.").
Reported-and-tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
'max_discard_sectors' is in sectors, while 'stripe' is in bytes.
This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.
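In other words, something along these lines (a sketch; the real check lives in
the raid5 discard setup code):

	if (mddev->queue->limits.max_discard_sectors >= (stripe >> 9) &&
	    mddev->queue->limits.discard_granularity >= stripe)
		discard_supported = true;	/* compare sectors to sectors */

i.e. convert the byte-sized stripe to 512-byte sectors before comparing it
against max_discard_sectors.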
Fixes: 620125f2bf8 ("MD: raid5 trim support") Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.
This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.
If a later entry on the retry_list triggers a call to handle_read_error,
freeze array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.
To prevent the hang, include bio_end_io_list writes in nr_queued.
There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.
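A sketch of the idea where retried writes are queued (field names as in
struct r1conf; locking shown for context only):

	spin_lock_irq(&conf->device_lock);
	list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
	conf->nr_queued++;	/* now counted, so freeze_array() can make progress */
	spin_unlock_irq(&conf->device_lock);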
I'm happy to supply the script I used to repro this hang.
Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.) Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When dqget() in __dquot_initialize() fails e.g. due to IO error,
__dquot_initialize() will pass an array of uninitialized pointers to
dqput_all() and thus can lead to dereferencing random data. Fix the
problem by properly initializing the array.
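A minimal sketch of the fix (MAXQUOTAS and struct dquot per the quota headers;
surrounding logic omitted):

	struct dquot *got[MAXQUOTAS] = {};	/* previously left uninitialized */

With the array zeroed, the error path only ever hands NULL or valid dquot
pointers to dqput_all().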
Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The read{l,w}() and write{l,w}() primitives should use le{16,32}_to_cpu() and
cpu_to_le{16,32}() respectively to ensure device registers are read
correctly in a Big Endian CPU configuration.
Per Arnd Bergmann
| Most drivers using readl() or readl_relaxed() expect those to perform byte
| swaps on big-endian architectures, as the registers tend to be fixed endian
This was needed for getting UART to work correctly on a Big Endian ARC.
The ARC accessors were originally fine, and the bug got introduced
inadvertently by commit b8a033023994 ("ARCv2: barriers").
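An illustrative shape of the accessors (barriers and the exact ARC wrappers
omitted; this is not the verbatim code):

	#define readl(c)	({ u32 __v = le32_to_cpu((__force __le32)__raw_readl(c)); __v; })
	#define writel(v, c)	__raw_writel((__force u32)cpu_to_le32(v), (c))

On little-endian builds the swaps compile away; on big-endian they make the
(little-endian) device registers read and write correctly.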
Simulator stdin may be connected to a file; when its end is reached, the
kernel hangs in an infinite loop inside rs_poll, because simc_poll always
signals that descriptor 0 is readable and simc_read always returns 0.
Check the simc_read return value and exit the loop if it's not positive. Also
don't rewind the polling timer if it's zero.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(busybox's cat uses sendfile(2), unlike the coreutils version)
This is because tracing_splice_read_pipe() can call splice_to_pipe()
with spd->nr_pages == 0. spd_pages underflows in splice_to_pipe() and
we fill the page pointers and the other fields of the pipe_buffers with
garbage.
All other callers of splice_to_pipe() avoid calling it when nr_pages ==
0, and we could make tracing_splice_read_pipe() do that too, but it
seems reasonable to have splice_to_pipe() handle this condition
gracefully.
Signed-off-by: Rabin Vincent <rabin@rab.in> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
early_init_dt_alloc_reserved_memory_arch passes end as 0 to
__memblock_alloc_base, when limits are not specified. But
__memblock_alloc_base takes end value of 0 as MEMBLOCK_ALLOC_ACCESSIBLE
and limits the end to memblock.current_limit. This results in regions
never being placed in the HIGHMEM area, e.g. for CMA.
Let __memblock_alloc_base allocate from anywhere in memory if limits are
not specified.
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The function eth_prepare_mac_addr_change() is called as part of a MAC
address change. This function checks if the interface is running.
To enable changing the MAC address while the interface is running, the
IFF_LIVE_ADDR_CHANGE flag must be set in the dev->priv_flags field.
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP
network unit") Signed-off-by: Dmitri Epshtein <dima@marvell.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Before 2e91fa7f6d45 ("cgroup: keep zombies associated with their
original cgroups"), all dead tasks were associated with init_css_set.
If a zombie task is requested for migration, while migration prep
operations would still be performed on init_css_set, the actual
migration would ignore zombie tasks. As init_css_set is always valid,
this worked fine.
However, after 2e91fa7f6d45, zombie tasks stay with the css_set they were
associated with at the time of death. Let's say a task T is associated
with cgroup A on hierarchy H-1 and cgroup B on hierarchy H-2. After T
becomes a zombie, it would still remain associated with A and B. If A
only contains zombie tasks, it can be removed. On removal, A gets
marked offline but stays pinned until all zombies are drained. At
this point, if migration is initiated on T to a cgroup C on hierarchy
H-2, migration path would try to prepare T's css_set for migration and
trigger the following.
It doesn't make sense to prepare migration for css_sets pointing to
dead cgroups as they are guaranteed to contain only zombies which are
ignored later during migration. This patch makes cgroup destruction
path mark all affected css_sets as dead and updates the migration path
to ignore them during preparation.
Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 2e91fa7f6d45 ("cgroup: keep zombies associated with their original cgroups") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Add Advertising command handler does the appropriate checks for
the AD and Scan Response data, however fails to take into account the
general length of the mgmt command itself, which could lead to
potential buffer overflows. This patch adds the necessary check that
the mgmt command length is consistent with the given ad and scan_rsp
lengths.
Calling return copy_to_user(...) in an ioctl will not do the right thing
if there's a pagefault: copy_to_user returns the number of bytes not
copied in this case.
Fix up watchdog/rc32434_wdt to do
	return copy_to_user(...) ? -EFAULT : 0;
instead.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wim Van Sebroeck <wim@iguana.be> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While working on a script to restore all sysctl params before a series of
tests I found that writing any value into the
/proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
causes them to call proc_watchdog_update(), with each write logging:
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
There doesn't appear to be a reason for doing this work every time a write
occurs, so only do it when the values change.
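One way to picture the change (a sketch using the generic sysctl handler; the
actual watchdog code routes these through its own wrappers):

	old = READ_ONCE(*param);
	err = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
	if (!err && write && old != READ_ONCE(*param))
		proc_watchdog_update();

i.e. the update, and the resulting dmesg noise, only happens when a write
actually changed the value.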
All architectures now need ioremap_uc(); ia64 seems to define this already
through its ioremap_nocache(), which already ensures it *only* uses UC.
This is needed since v4.3 to complete an allyesconfig compile on ia64;
there were other archs that needed this, and this one seems to have
fallen through the cracks.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Setting the original memory.limit_in_bytes hardlimit is subject to a
race condition when the desired value is below the current usage. The
code tries a few times to first reclaim and then see if the usage has
dropped to where we would like it to be, but there is no locking, and
the workload is free to continue making new charges up to the old limit.
Thus, attempting to shrink a workload relies on pure luck and hope that
the workload happens to cooperate.
To fix this in the cgroup2 memory.max knob, do it the other way round:
set the limit first, then try enforcement. And if reclaim is not able
to succeed, trigger OOM kills in the group. Keep going until the new
limit is met, we run out of OOM victims and there's only unreclaimable
memory left, or the task writing to memory.max is killed. This allows
users to shrink groups reliably, and the behavior is consistent with
what happens when new charges are attempted in excess of memory.max.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high. This can cause
groups to stay in excess of their memory.high indefinitely.
To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When bch_cache_set_alloc() fails to kzalloc the cache_set, the
asynchronous closure handling tries to dereference a cache_set that
hadn't yet been allocated inside of cache_set_flush() which is called
by __cache_set_unregister() during cleanup. This appears to happen only
during an OOM condition on bcache_register.
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The bch_writeback_thread might BUG_ON in read_dirty() if
dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
its related initialization. This patch downs the dc->writeback_lock until
after initialization is complete, thus preventing bch_writeback_thread
from proceeding prematurely.
See this thread:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453
Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net> Tested-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Fix null pointer dereference by changing register_cache() to return an int
instead of being void. This allows it to return -ENOMEM or -ENODEV and
enables upper layers to handle the OOM case without NULL pointer issues.
See this thread:
http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521
Fixes this error:
gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
Let the target core check task existence instead of the SRP target
driver. Additionally, let the target core check the validity of the
task management request instead of the ib_srpt driver.
On the umount path, jbd2_journal_destroy() writes the latest transaction ID
(->j_tail_sequence) to be used at the next mount.
The bug is that ->j_tail_sequence does not hold the latest transaction ID
in some cases. So, at the next mount, there is a chance of conflicting with
remaining (not yet overwritten) transactions.
mount (id=10)
write transaction (id=11)
write transaction (id=12)
umount (id=10) <= the bug doesn't write latest ID
mount (id=10)
write transaction (id=11)
crash
mount
[recovery process]
transaction (id=11)
transaction (id=12) <= valid transaction ID, but old commit
must not replay
As above, this bug can cause recovery failure or FS
corruption.
So why doesn't ->j_tail_sequence point to the latest ID?
Because if checkpoint transactions were reclaimed due to memory pressure
(i.e. bdev_try_to_free_page()), then ->j_tail_sequence is not updated.
(Another case is __jbd2_journal_clean_checkpoint_list() being called
with an empty transaction.)
So in the above cases, ->j_tail_sequence does not point to the latest
transaction ID on the umount path. In addition, the REQ_FLUSH for the
checkpoint is not done either.
So, to fix this problem with minimum changes, this patch updates
->j_tail_sequence and issues REQ_FLUSH. (With more complex changes,
some optimizations would be possible, for example to avoid unnecessary
REQ_FLUSH.)
Use the local uapi headers to keep in sync with "recently" added #define's
(e.g. VSS_OP_REGISTER1).
Fixes: 3eb2094c59e8 ("Adding makefile for tools/hv") Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cirrus HD-audio driver may adjust GPIO pins for EAPD dynamically
depending on the jack plug state. This works fine for the auto-mute
mode where the speaker gets muted upon the HP jack plug. OTOH, when
the auto-mute mode is off, this turns off the EAPD unexpectedly
depending on the jack state, which results in the silent speaker
output.
This patch fixes the silent speaker output issue by setting GPIO bits
constantly when the auto-mute mode is off.
This Lenovo ThinkCentre AIO also uses Line2 as mic mute button and
uses GPIO2 to control the mic mute led, so applying this quirk can
make both the button and led work.
The current Intel HDMI codec driver supports only three fixed ports
from port B to port D. However, i915 driver may assign a DP on other
ports, e.g. port A, when no eDP is used. This incompatibility is
caught later at pin_nid_to_pin_index() and results in a warning
message like "HDMI: pin nid 4 not registered" at each time.
This patch filters out such invalid events beforehand, so that the
kernel won't grumble about them.
The commit [d507941beb1e: ALSA: pcm: Correct PCM BUG error message]
made the warning prefix back to "BUG:" due to its previous wrong
prefix. But a kernel message containing "BUG:" seems to be wrongly taken
as an Oops message by some brain-dead daemons, and it annoys users in the
end. Instead of teaching daemons, change the string again to a more
reasonable one.
Allow device initialization to finish gracefully when the device is in
the FTL rebuild failure state. Also, recover the device out of this state
after successfully secure-erasing it.
after successfully secure erasing it.