]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
7 years agokprobes/x86: Consolidate insn decoder users for copying code
Masami Hiramatsu [Wed, 29 Mar 2017 05:05:06 +0000 (14:05 +0900)]
kprobes/x86: Consolidate insn decoder users for copying code

Consolidate x86 instruction decoder users on the path of
copying original code for kprobes.

Kprobes decodes the same instruction a maximum of 3 times when
preparing the instruction buffer:

 - The first time for getting the length of the instruction,
 - the 2nd for adjusting displacement,
 - and the 3rd for checking whether the instruction is boostable or not.

For each time, the actual decoding target address is slightly
different (1st is original address or recovered instruction buffer,
2nd and 3rd are pointing to the copied buffer), but all have
the same instruction.

Thus, this patch also changes the target address to the copied
buffer at first and reuses the decoded "insn" for displacement
adjusting and checking boostability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076389643.22469.13151892839998777373.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Use probe_kernel_read() instead of memcpy()
Masami Hiramatsu [Wed, 29 Mar 2017 05:03:56 +0000 (14:03 +0900)]
kprobes/x86: Use probe_kernel_read() instead of memcpy()

Use probe_kernel_read() for avoiding unexpected faults while
copying kernel text in __recover_probed_insn(),
__recover_optprobed_insn() and __copy_instruction().

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076382624.22469.10091613887942958518.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Set kprobes pages read-only
Masami Hiramatsu [Wed, 29 Mar 2017 05:02:46 +0000 (14:02 +0900)]
kprobes/x86: Set kprobes pages read-only

Set the pages which is used for kprobes' singlestep buffer
and optprobe's trampoline instruction buffer to readonly.
This can prevent unexpected (or unintended) instruction
modification.

This also passes rodata_test as below.

Without this patch, rodata_test shows a warning:

  WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:235 note_page+0x7a9/0xa20
  x86/mm: Found insecure W+X mapping at address ffffffffa0000000/0xffffffffa0000000

With this fix, no W+X pages are found:

  x86/mm: Checked W+X mappings: passed, no W+X pages found.
  rodata_test: all tests were successful

Reported-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076375592.22469.14174394514338612247.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Make boostable flag boolean
Masami Hiramatsu [Wed, 29 Mar 2017 05:01:35 +0000 (14:01 +0900)]
kprobes/x86: Make boostable flag boolean

Make arch_specific_insn.boostable to boolean, since it has
only 2 states, boostable or not. So it is better to use
boolean from the viewpoint of code readability.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076368566.22469.6322906866458231844.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Do not modify singlestep buffer while resuming
Masami Hiramatsu [Wed, 29 Mar 2017 05:00:25 +0000 (14:00 +0900)]
kprobes/x86: Do not modify singlestep buffer while resuming

Do not modify singlestep execution buffer (kprobe.ainsn.insn)
while resuming from single-stepping, instead, modifies
the buffer to add a jump back instruction at preparing
buffer.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076361560.22469.1610155860343077495.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Use instruction decoder for booster
Masami Hiramatsu [Wed, 29 Mar 2017 04:59:15 +0000 (13:59 +0900)]
kprobes/x86: Use instruction decoder for booster

Use x86 instruction decoder for checking whether the probed
instruction is able to boost or not, instead of hand-written
code.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076354563.22469.13379472209338986858.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Fix the description of __copy_instruction()
Masami Hiramatsu [Wed, 29 Mar 2017 04:58:06 +0000 (13:58 +0900)]
kprobes/x86: Fix the description of __copy_instruction()

Fix the description comment of __copy_instruction() function
since it has already been changed to return the length of the
copied instruction.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076347582.22469.3775133607244923462.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agokprobes/x86: Fix kprobe-booster not to boost far call instructions
Masami Hiramatsu [Wed, 29 Mar 2017 04:56:56 +0000 (13:56 +0900)]
kprobes/x86: Fix kprobe-booster not to boost far call instructions

Fix the kprobe-booster not to boost far call instruction,
because a call may store the address in the single-step
execution buffer to the stack, which should be modified
after single stepping.

Currently, this instruction will be filtered as not
boostable in resume_execution(), so this is not a
critical issue.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David S . Miller <davem@davemloft.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ye Xiaolong <xiaolong.ye@intel.com>
Link: http://lkml.kernel.org/r/149076340615.22469.14066273186134229909.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge tag 'perf-core-for-mingo-4.12-20170411' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Wed, 12 Apr 2017 05:29:13 +0000 (07:29 +0200)]
Merge tag 'perf-core-for-mingo-4.12-20170411' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

perf/core improvements and fixes:

User visible changes:

- Support s390 jump instructions in perf annotate (Christian Borntraeger)

- When failing to setup multiple events (e.g. '-e irq_vectors:*'), state
  which one caused the failure (Yao Jin)

- Various fixes for pipe mode, where the output of 'perf record' is
  written to stdout instead of to a perf.data file, fixing workloads
  such as: (David Carrillo-Cisneros)

    $ perf record -o - noploop | perf inject -b > perf.data

    $ perf record -o - noploop | perf annotate

Infrastructure changes:

- Simplify ltrim() implementation (Arnaldo Carvalho de Melo)

- Use ltrim() and rtrim() in places where ad-hoc equivalents were being
  used (Taeung Song)

 Conflicts:
tools/perf/util/annotate.c

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge branch 'perf/urgent' into perf/core, to pick up fixes
Ingo Molnar [Wed, 12 Apr 2017 05:28:25 +0000 (07:28 +0200)]
Merge branch 'perf/urgent' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge tag 'perf-urgent-for-mingo-4.11-20170411' of git://git.kernel.org/pub/scm/linux...
Ingo Molnar [Tue, 11 Apr 2017 19:41:39 +0000 (21:41 +0200)]
Merge tag 'perf-urgent-for-mingo-4.11-20170411' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull 'perf annotate' fix for s390:

- The move to support cross arch annotation introduced per arch
  initialization requirements, fullfill them for s/390 (Christian Borntraeger)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoperf annotate: Use stripped line instead of raw disassemble line
Taeung Song [Sat, 8 Apr 2017 00:52:25 +0000 (09:52 +0900)]
perf annotate: Use stripped line instead of raw disassemble line

When parsing disassemble lines for source line number, use a stripped
line instead of raw line.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1491612748-1605-3-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf annotate: Refactor the code to parse disassemble lines with {l,r}trim()
Taeung Song [Sat, 8 Apr 2017 00:52:24 +0000 (09:52 +0900)]
perf annotate: Refactor the code to parse disassemble lines with {l,r}trim()

When parsing disassemble lines, use ltrim() and rtrim() to strip them,
not using just while loop and isspace().

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1491612748-1605-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf tools: Do not print missing features in pipe-mode
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:32 +0000 (13:14 -0700)]
perf tools: Do not print missing features in pipe-mode

Pipe-mode has no perf.data header, hence no upfront knowledge of presend
and missing features, hence, do not print missing features in pipe-mode.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-8-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf session: Don't rely on evlist in pipe mode
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:30 +0000 (13:14 -0700)]
perf session: Don't rely on evlist in pipe mode

Session sets a number parameters that rely on evlist. These parameters
are not used in pipe-mode and should not be set, since evlist is
unavailable. Fix that.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-6-davidcc@google.com
[ Check if file != NULL in perf_session__new(), like when used by builtin-top.c ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf annotate: Process attr and build_id records
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:29 +0000 (13:14 -0700)]
perf annotate: Process attr and build_id records

perf annotate did not get some love for pipe-mode, and did not have
.attr and .buil_id setup (while record and inject did. Fix that.

It can easily be reproduced by:

  perf record -o - noploop | perf annotate

that in my system shows:
    0xd8 [0x28]: failed to process type: 9

Committer Testing:

Before:

  $ perf record -o - stress -t 2 -c 2 | perf annotate --stdio
  stress: info: [11060] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
  0x4470 [0x28]: failed to process type: 9
  $ stress: info: [11060] successful run completed in 2s

  $

After:

  $ perf record -o - stress -t 2 -c 2 | perf annotate --stdio
  stress: info: [11871] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
  stress: info: [11871] successful run completed in 2s
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  no symbols found in /usr/bin/stress, maybe install a debug package?
   Percent |      Source code & Disassembly of libc-2.24.so for cycles:uhH (6117 samples)
  ---------------------------------------------------------------------------------------
           :
           :      Disassembly of section .text:
           :
           :      000000000003b050 <random_r>:
           :      __random_r():
     10.56 :        3b050:       test   %rdi,%rdi
      0.00 :        3b053:       je     3b0d0 <random_r+0x80>
      0.34 :        3b055:       test   %rsi,%rsi
      0.00 :        3b058:       je     3b0d0 <random_r+0x80>
      0.46 :        3b05a:       mov    0x18(%rdi),%eax
     12.44 :        3b05d:       mov    0x10(%rdi),%r8
      0.18 :        3b061:       test   %eax,%eax
      0.00 :        3b063:       je     3b0b0 <random_r+0x60>
<SNIP>

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-5-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf tools: Describe pipe mode in perf.data-file-fomat.txt
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:28 +0000 (13:14 -0700)]
perf tools: Describe pipe mode in perf.data-file-fomat.txt

Add a minimal description of pipe's data format.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-4-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf inject: Copy events when reordering events in pipe mode
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:27 +0000 (13:14 -0700)]
perf inject: Copy events when reordering events in pipe mode

__perf_session__process_pipe_events reuses the same memory buffer to
process all events in the pipe.

When reordering is needed (e.g. -b option), events are not immediately
flushed, but kept around until reordering is possible, causing
memory corruption.

The problem is usually observed by a "Unknown sample error" output. It
can easily be reproduced by:

  perf record -o - noploop | perf inject -b > output

Committer testing:

Before:

  $ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null
  stress: info: [8297] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
  stress: info: [8297] successful run completed in 2s
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  Warning:
  Found 1 unknown events!

  Is this an older tool processing a perf.data file generated by a more recent tool?

  If that is not the case, consider reporting to linux-kernel@vger.kernel.org.

  $

After:

  $ perf record -o - stress -t 2 -c 2 | perf inject -b > /dev/null
  stress: info: [9027] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
  stress: info: [9027] successful run completed in 2s
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 0.000 MB - ]
  no symbols found in /usr/bin/stress, maybe install a debug package?
  no symbols found in /usr/bin/stress, maybe install a debug package?
  $

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170410201432.24807-3-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf inject: Don't proceed if perf_session__process_event() fails
David Carrillo-Cisneros [Mon, 10 Apr 2017 20:14:26 +0000 (13:14 -0700)]
perf inject: Don't proceed if perf_session__process_event() fails

All paths following perf_session__process_event() in __cmd_inject() are
useless if __cmd_inject() is to fail, some depend on a correct
session->evlist.

First commit to add code that depends on session->evlist without checking
error was commmit e558a5bd8b ("perf inject: Work with files"). It has
grown since then.

Change __cmd_inject() to fail immediately after
perf_session__process_event() fails.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: He Kuang <hekuang@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: e558a5bd8b74 ("perf inject: Work with files")
Link: http://lkml.kernel.org/r/20170410201432.24807-2-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf annotate s390: Implement jump types for perf annotate
Christian Borntraeger [Thu, 6 Apr 2017 07:51:52 +0000 (09:51 +0200)]
perf annotate s390: Implement jump types for perf annotate

Implement simple detection for all kind of jumps and branches.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Link: http://lkml.kernel.org/r/1491465112-45819-3-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf annotate s390: Fix perf annotate error -95 (4.10 regression)
Christian Borntraeger [Thu, 6 Apr 2017 07:51:51 +0000 (09:51 +0200)]
perf annotate s390: Fix perf annotate error -95 (4.10 regression)

since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b51844d ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Fixes: 786c1b51844 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf string: Simplify ltrim() implementation
Arnaldo Carvalho de Melo [Fri, 7 Apr 2017 15:19:38 +0000 (12:19 -0300)]
perf string: Simplify ltrim() implementation

We don't need to use strlen(), a var, or check for the end explicitely,
isspace('\0') is false:

  [acme@jouet c]$ cat ltrim.c
  #include <ctype.h>
  #include <stdio.h>

  static char *ltrim(char *s)
  {
  while (isspace(*s))
  ++s;
  return s;
  }

  int main(void)
  {
  printf("ltrim(\"\")='%s'\n", ltrim(""));
  return 0;
  }
  [acme@jouet c]$ ./ltrim
  ltrim("")=''
  [acme@jouet c]$

Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/n/tip-w3nk0x3pai2vojk2ab6kdvaw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf tools: Refactor the code to strip command name with {l,r}trim()
Taeung Song [Fri, 7 Apr 2017 14:24:21 +0000 (23:24 +0900)]
perf tools: Refactor the code to strip command name with {l,r}trim()

After reading command name from /proc/<pid>/status, use ltrim() and
rtrim() to strip command name, not using just while loop, isspace() and
etc.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-6-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf pmu: Refactor wordwrap() with ltrim()
Taeung Song [Fri, 7 Apr 2017 14:24:20 +0000 (23:24 +0900)]
perf pmu: Refactor wordwrap() with ltrim()

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-5-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf ui browser: Refactor the code to parse color configs with ltrim()
Taeung Song [Fri, 7 Apr 2017 14:24:19 +0000 (23:24 +0900)]
perf ui browser: Refactor the code to parse color configs with ltrim()

When parsing {fore, back} ground color configs, use ltrim() instead of
just while loop and isspace().

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-4-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf stat: Refactor the code to strip csv output with ltrim()
Taeung Song [Fri, 7 Apr 2017 14:24:18 +0000 (23:24 +0900)]
perf stat: Refactor the code to strip csv output with ltrim()

To strip csv output, use ltrim() instead of just while loop and
isspace() at print_metric_{only}_csv().

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1491575061-704-3-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf evsel: Return exact sub event which failed with EPERM for wildcards
Jin Yao [Fri, 7 Apr 2017 12:08:52 +0000 (20:08 +0800)]
perf evsel: Return exact sub event which failed with EPERM for wildcards

The kernel has a special check for a specific irq_vectors trace event.

TRACE_EVENT_PERF_PERM(irq_work_exit,
is_sampling_event(p_event) ? -EPERM : 0);

The perf-record fails for this irq_vectors event when it is present,
like when using a wildcard:

  root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
  Error:
  You may not have permission to collect system-wide stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
  which controls use of the performance events system by
  unprivileged users (without CAP_SYS_ADMIN).

  The current value is 2:

    -1: Allow use of (almost) all events by all users
  >= 0: Disallow raw tracepoint access by users without CAP_IOC_LOCK
  >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
  >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

  To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

        kernel.perf_event_paranoid = -1

This patch prints out the exact sub event that failed with EPERM for
wildcards to help in understanding what went wrong when this event is
present:

After the patch:

  root@skl:/tmp# perf record -a -e irq_vectors:* sleep 2
  Error:
  No permission to enable irq_vectors:irq_work_exit event.

  You may not have permission to collect system-wide stats.
  ......

Committer notes:

So we have a lot of irq_vectors events:

  [root@jouet ~]# perf list irq_vectors:*

  List of pre-defined events (to be used in -e):

    irq_vectors:call_function_entry                    [Tracepoint event]
    irq_vectors:call_function_exit                     [Tracepoint event]
    irq_vectors:call_function_single_entry             [Tracepoint event]
    irq_vectors:call_function_single_exit              [Tracepoint event]
    irq_vectors:deferred_error_apic_entry              [Tracepoint event]
    irq_vectors:deferred_error_apic_exit               [Tracepoint event]
    irq_vectors:error_apic_entry                       [Tracepoint event]
    irq_vectors:error_apic_exit                        [Tracepoint event]
    irq_vectors:irq_work_entry                         [Tracepoint event]
    irq_vectors:irq_work_exit                          [Tracepoint event]
    irq_vectors:local_timer_entry                      [Tracepoint event]
    irq_vectors:local_timer_exit                       [Tracepoint event]
    irq_vectors:reschedule_entry                       [Tracepoint event]
    irq_vectors:reschedule_exit                        [Tracepoint event]
    irq_vectors:spurious_apic_entry                    [Tracepoint event]
    irq_vectors:spurious_apic_exit                     [Tracepoint event]
    irq_vectors:thermal_apic_entry                     [Tracepoint event]
    irq_vectors:thermal_apic_exit                      [Tracepoint event]
    irq_vectors:threshold_apic_entry                   [Tracepoint event]
    irq_vectors:threshold_apic_exit                    [Tracepoint event]
    irq_vectors:x86_platform_ipi_entry                 [Tracepoint event]
    irq_vectors:x86_platform_ipi_exit                  [Tracepoint event]
  #

And some may be sampled:

  [root@jouet ~]# perf record -e irq_vectors:local* sleep 20s
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.020 MB perf.data (2 samples) ]
  [root@jouet ~]# perf report -D | egrep 'stats:|events:'
  Aggregated stats:
             TOTAL events:        155
              MMAP events:        144
              COMM events:          2
              EXIT events:          1
            SAMPLE events:          2
             MMAP2 events:          4
    FINISHED_ROUND events:          1
         TIME_CONV events:          1
  irq_vectors:local_timer_entry stats:
             TOTAL events:          1
            SAMPLE events:          1
  irq_vectors:local_timer_exit stats:
             TOTAL events:          1
            SAMPLE events:          1
  [root@jouet ~]#

But, as shown in the tracepoint definition at the start of this message,
some, like "irq_vectors:irq_work_exit", may not be sampled, just counted,
i.e. if we try to sample, as when using 'perf record', we get an error:

  [root@jouet ~]# perf record -e irq_vectors:irq_work_exit
  Error:
  You may not have permission to collect system-wide stats.

  Consider tweaking /proc/sys/kernel/perf_event_paranoid,
<SNIP>

The error message is misleading, this patch will help in pointing out
what is the event causing such an error, but the error message needs
improvement, i.e. we need to figure out a way to check if a tracepoint
is counting only, like this one, when all we can do is to count it with
'perf stat', at most printing the delta using interval printing, as in:

   [root@jouet ~]# perf stat -I 5000 -e irq_vectors:irq_work_*
  #           time             counts unit events
       5.000168871                  0      irq_vectors:irq_work_entry
       5.000168871                  0      irq_vectors:irq_work_exit
      10.000676730                  0      irq_vectors:irq_work_entry
      10.000676730                  0      irq_vectors:irq_work_exit
      15.001122415                  0      irq_vectors:irq_work_entry
      15.001122415                  0      irq_vectors:irq_work_exit
      20.001298051                  0      irq_vectors:irq_work_entry
      20.001298051                  0      irq_vectors:irq_work_exit
      25.001485020                  1      irq_vectors:irq_work_entry
      25.001485020                  1      irq_vectors:irq_work_exit
      30.001658706                  0      irq_vectors:irq_work_entry
      30.001658706                  0      irq_vectors:irq_work_exit
  ^C    32.045711878                  0      irq_vectors:irq_work_entry
      32.045711878                  0      irq_vectors:irq_work_exit

  [root@jouet ~]#

But at least, when we use a wildcard, this patch helps a bit.

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1491566932-503-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf script: Use strtok_r() when parsing output field list
Arnaldo Carvalho de Melo [Wed, 5 Apr 2017 14:43:41 +0000 (11:43 -0300)]
perf script: Use strtok_r() when parsing output field list

Just avoiding non-reentrant functions.

Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/n/tip-eqytykipd74epzl9aexvppcg@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf callchains: Switch from strtok() to strtok_r() when parsing options
Arnaldo Carvalho de Melo [Wed, 5 Apr 2017 14:38:10 +0000 (11:38 -0300)]
perf callchains: Switch from strtok() to strtok_r() when parsing options

Trying to keep everything reentrant.

Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/n/tip-rdce0p2k9e1b4qnrb8ki9mtf@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoperf/amd/uncore: Fix pr_fmt() prefix
Borislav Petkov [Mon, 10 Apr 2017 12:20:47 +0000 (14:20 +0200)]
perf/amd/uncore: Fix pr_fmt() prefix

Make it "perf/amd/uncore: ", i.e., something more specific than "perf: ".

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-4-bp@alien8.de
[ Changed it to perf/amd/uncore/ ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoperf/amd/uncore: Clean up per-family setup
Borislav Petkov [Mon, 10 Apr 2017 12:20:46 +0000 (14:20 +0200)]
perf/amd/uncore: Clean up per-family setup

Fam16h is the same as the default one, remove it. Turn the switch-case
into a simple if-else.

No functionality change.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-3-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoperf/amd/uncore: Do feature check first, before assignments
Borislav Petkov [Mon, 10 Apr 2017 12:20:45 +0000 (14:20 +0200)]
perf/amd/uncore: Do feature check first, before assignments

... and save some unnecessary work. Remove now unused label while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170410122047.3026-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge tag 'v4.11-rc6' into perf/core, to pick up fixes
Ingo Molnar [Tue, 11 Apr 2017 06:42:47 +0000 (08:42 +0200)]
Merge tag 'v4.11-rc6' into perf/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
7 years agoMerge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Mon, 10 Apr 2017 16:37:43 +0000 (09:37 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a number of bugs in the caam driver:

   - device creation fails after release

   - error-path NULL-pointer dereference

   - spurious hardware error in RNG deinstantiation"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: caam - fix RNG deinstantiation error checking
  crypto: caam - fix invalid dereference in caam_rsa_init_tfm()
  crypto: caam - fix JR platform device subsequent (re)creations

7 years agoLinux 4.11-rc6 v4.11-rc6
Linus Torvalds [Sun, 9 Apr 2017 16:49:44 +0000 (09:49 -0700)]
Linux 4.11-rc6

7 years agoMerge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 9 Apr 2017 16:10:02 +0000 (09:10 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

Pull CIFS fixes from Steve French:
 "This is a set of CIFS/SMB3 fixes for stable.

  There is another set of four SMB3 reconnect fixes for stable in
  progress but they are still being reviewed/tested, so didn't want to
  wait any longer to send these five below"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  Reset TreeId to zero on SMB2 TREE_CONNECT
  CIFS: Fix build failure with smb2
  Introduce cifs_copy_file_range()
  SMB3: Rename clone_range to copychunk_range
  Handle mismatched open calls

7 years agoMerge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 9 Apr 2017 16:05:25 +0000 (09:05 -0700)]
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "A number of ARM fixes:

   - prevent oopses caused by dma_get_sgtable() and declared DMA
     coherent memory

   - fix boot failure on nommu caused by ID_PFR1 access

   - a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
  ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
  arm: kprobes: Align stack to 8-bytes in test code
  arm: kprobes: Fix the return address of multiple kretprobes
  arm: kprobes: Skip single-stepping in recursing path if possible
  arm: kprobes: Allow to handle reentered kprobe on single-stepping

7 years agoMerge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 9 Apr 2017 16:03:51 +0000 (09:03 -0700)]
Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are 3 small fixes for 4.11-rc6.

  One resolves a reported issue with sysfs files that NeilBrown found,
  one is a documenatation fix for the stable kernel rules, and the last
  is a small MAINTAINERS file update for kernfs"

* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  MAINTAINERS: separate out kernfs maintainership
  sysfs: be careful of error returns from ops->show()
  Documentation: stable-kernel-rules: fix stable-tag format

7 years agoMerge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 9 Apr 2017 16:02:31 +0000 (09:02 -0700)]
Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver rfixes from Greg KH:
 "Here are a number of small IIO and staging driver fixes for 4.11-rc6.
  Nothing big here, just iio fixes for reported issues, and an ashmem
  fix for a very old bug that has been reported by a number of Android
  vendors"

* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
  iio: hid-sensor-attributes: Fix sensor property setting failure.
  iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
  iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
  iio: st_pressure: initialize lps22hb bootime
  iio: bmg160: reset chip when probing
  iio: cros_ec_sensors: Fix return value to get raw and calibbias data.

7 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Linus Torvalds [Sun, 9 Apr 2017 15:26:21 +0000 (08:26 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS fixes from Al Viro:
 "statx followup fixes and a fix for stack-smashing on alpha"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  alpha: fix stack smashing in old_adjtimex(2)
  statx: Include a mask for stx_attributes in struct statx
  statx: Reserve the top bit of the mask for future struct expansion
  xfs: report crtime and attribute flags to statx
  ext4: Add statx support
  statx: optimize copy of struct statx to userspace
  statx: remove incorrect part of vfs_statx() comment
  statx: reject unknown flags when using NULL path
  Documentation/filesystems: fix documentation for ->getattr()

7 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 8 Apr 2017 18:56:58 +0000 (11:56 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Here's a pull request for 4.11-rc, fixing a set of issues mostly
  centered around the new scheduling framework. These have been brewing
  for a while, but split up into what we absolutely need in 4.11, and
  what we can defer until 4.12. These are well tested, on both single
  queue and multiqueue setups, and with and without shared tags. They
  fix several hangs that have happened in testing.

  This is obviously larger than I would have preferred at this point in
  time, but I don't think we can shave much off this and still get the
  desired results.

  In detail, this pull request contains:

   - a set of five fixes for NVMe, mostly from Christoph and one from
     Roland.

   - a series from Bart, fixing issues with dm-mq and SCSI shared tags
     and scheduling. Note that one of those patches commit messages may
     read like an optimization, but it is in fact an important fix for
     queue restarts in particular.

   - a series from Omar, most importantly fixing a hang with multiple
     hardware queues when we fail to get a driver tag. Another important
     fix in there is for resizing hardware queues, which nbd does when
     handling multiple sockets for one connection.

   - fixing an imbalance in putting the ctx for hctx request allocations
     from Minchan"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: Restart a single queue if tag sets are shared
  dm rq: Avoid that request processing stalls sporadically
  scsi: Avoid that SCSI queues get stuck
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  blk-mq: remap queues when adding/removing hardware queues
  blk-mq-sched: fix crash in switch error path
  blk-mq-sched: set up scheduler tags when bringing up new queues
  blk-mq-sched: refactor scheduler initialization
  blk-mq: use the right hctx when getting a driver tag fails
  nvmet: fix byte swap in nvmet_parse_io_cmd
  nvmet: fix byte swap in nvmet_execute_write_zeroes
  nvmet: add missing byte swap in nvmet_get_smart_log
  nvme: add missing byte swap in nvme_setup_discard
  nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
  block: do not put mq context in blk_mq_alloc_request_hctx

7 years agoMerge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw...
Linus Torvalds [Sat, 8 Apr 2017 18:43:38 +0000 (11:43 -0700)]
Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
 "This late fix for pin control is hopefully the last I send this cycle.

  The problem was detected early in the v4.11 release cycle and there
  has been some back and forth on how to solve it. Sadly the proper fix
  arrives late, but at least not too late.

  An issue was detected with pin control on the Freescale i.MX after the
  refactorings for more general group and function handling.

  We now have the proper fix for this"

* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()

7 years agoMerge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sat, 8 Apr 2017 18:06:12 +0000 (11:06 -0700)]
Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.11:

  Headed to stable:

   - disable HFSCR[TM] if TM is not supported, fixes a potential host
     kernel crash triggered by a hostile guest, but only in
     configurations that no one uses

   - don't try to fix up misaligned load-with-reservation instructions

   - fix flush_(d|i)cache_range() called from modules on little endian
     kernels

   - add missing global TLB invalidate if cxl is active

   - fix missing preempt_disable() in crc32c-vpmsum

  And a fix for selftests build changes that went in this release:

   - selftests/powerpc: Fix standalone powerpc build

  Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
  Paul Mackerras"

* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
  powerpc/mm: Add missing global TLB invalidate if cxl is active
  powerpc/64: Fix flush_(d|i)cache_range() called from modules
  powerpc: Don't try to fix up misaligned load-with-reservation instructions
  powerpc: Disable HFSCR[TM] if TM is not supported
  selftests/powerpc: Fix standalone powerpc build

7 years agomm/mempolicy.c: fix error handling in set_mempolicy and mbind.
Chris Salls [Sat, 8 Apr 2017 06:48:11 +0000 (23:48 -0700)]
mm/mempolicy.c: fix error handling in set_mempolicy and mbind.

In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.

Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
Liping Zhang [Fri, 7 Apr 2017 15:51:07 +0000 (23:51 +0800)]
sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec

Currently, inputting the following command will succeed but actually the
value will be truncated:

  # echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat

This is not friendly to the user, so instead, we should report error
when the value is larger than UINT_MAX.

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMAINTAINERS: separate out kernfs maintainership
Tejun Heo [Thu, 23 Mar 2017 17:34:47 +0000 (13:34 -0400)]
MAINTAINERS: separate out kernfs maintainership

Separate out kernfs from driver core and add myself as a
co-maintainer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agosysfs: be careful of error returns from ops->show()
NeilBrown [Mon, 3 Apr 2017 01:30:34 +0000 (11:30 +1000)]
sysfs: be careful of error returns from ops->show()

ops->show() can return a negative error code.
Commit 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.

Commit 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with extremely large count, and is likely to crash the machine is bizarre ways.

This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by the ->show()
After this, the ->show() won't be called.

I can reproduce it reliably only by putting delay like
usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
  echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state

The bug can be triggered without the usleep.

Fixes: 65da3484d9be ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f8068 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoDocumentation: stable-kernel-rules: fix stable-tag format
Johan Hovold [Mon, 3 Apr 2017 13:53:34 +0000 (15:53 +0200)]
Documentation: stable-kernel-rules: fix stable-tag format

A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, and was neither present when the patch in question
was first submitted (it was added in v2 without any comment).

Drop the minus sign to avoid any confusion.

Fixes: fdc81b7910ad ("stable_kernel_rules: Add clause about specification of kernel versions to patch.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agostaging: android: ashmem: lseek failed due to no FMODE_LSEEK.
Shuxiao Zhang [Thu, 6 Apr 2017 14:30:29 +0000 (22:30 +0800)]
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.

vfs_llseek will check whether the file mode has
FMODE_LSEEK, no return failure. But ashmem can be
lseek, so add FMODE_LSEEK to ashmem file.

Comment From Greg Hackmann:
ashmem_llseek() passes the llseek() call through to the backing
shmem file.  91360b02ab48 ("ashmem: use vfs_llseek()") changed
this from directly calling the file's llseek() op into a VFS
layer call.  This also adds a check for the FMODE_LSEEK bit, so
without that bit ashmem_llseek() now always fails with -ESPIPE.

Fixes: 91360b02ab48 ("ashmem: use vfs_llseek()")
Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Cc: stable <stable@vger.kernel.org> # 3.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Linus Torvalds [Sat, 8 Apr 2017 08:42:05 +0000 (01:42 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

Pull sparc fixes from David Miller:
 "Several fixes here, mostly having to due with either build errors or
  memory corruptions depending upon whether you have THP enabled or not"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: remove unused wp_works_ok macro
  sparc32: Export vac_cache_size to fix build error
  sparc64: Fix memory corruption when THP is enabled
  sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
  arch/sparc: Avoid DCTI Couples
  sparc64: kern_addr_valid regression
  sparc64: Add support for 2G hugepages
  sparc64: Fix size check in huge_pte_alloc

7 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sat, 8 Apr 2017 08:39:43 +0000 (01:39 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - Fix a problem with GICv3 userspace save/restore
   - Clarify GICv2 userspace save/restore ABI
   - Be more careful in clearing GIC LRs
   - Add missing synchronization primitive to our MMU handling code

  PPC:
   - Check for a NULL return from kzalloc

  s390:
   - Prevent translation exception errors on valid page tables for the
     instruction-exection-protection support

  x86:
   - Fix Page-Modification Logging when running a nested guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
  KVM: nVMX: initialize PML fields in vmcs02
  KVM: nVMX: do not leak PML full vmexit to L1
  KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
  KVM: arm64: Ensure LRs are clear when they should be
  kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  KVM: s390: remove change-recording override support
  arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
  arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm

7 years agoMerge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Linus Torvalds [Sat, 8 Apr 2017 08:37:25 +0000 (01:37 -0700)]
Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit

Pull audit cleanup from Paul Moore:
 "A week later than I had hoped, but as promised, here is the audit
  uninline-fix we talked about during the last audit pull request.

  The patch is slightly different than what we originally discussed as
  it made more sense to keep the audit_signal_info() function in
  auditsc.c rather than move it and bunch of other related
  variables/definitions into audit.c/audit.h.

  At some point in the future I need to look at how the audit code is
  organized across kernel/audit*, I suspect we could do things a bit
  better, but it doesn't seem like a -rc release is a good place for
  that ;)

  Regardless, this patch passes our tests without problem and looks good
  for v4.11"

* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
  audit: move audit_signal_info() into kernel/auditsc.c

7 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 8 Apr 2017 08:35:32 +0000 (01:35 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: move pcp and lru-pcp draining into single wq
  mailmap: update Yakir Yang email address
  mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
  dax: fix radix tree insertion race
  mm, thp: fix setting of defer+madvise thp defrag mode
  ptrace: fix PTRACE_LISTEN race corrupting task->state
  vmlinux.lds: add missing VMLINUX_SYMBOL macros
  mm/page_alloc.c: fix print order in show_free_areas()
  userfaultfd: report actual registered features in fdinfo
  mm: fix page_vma_mapped_walk() for ksm pages

7 years agomm: move pcp and lru-pcp draining into single wq
Michal Hocko [Fri, 7 Apr 2017 23:05:05 +0000 (16:05 -0700)]
mm: move pcp and lru-pcp draining into single wq

We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches.  This seems more than necessary because both can run
on a single WQ.  Both do not block on locks requiring a memory
allocation nor perform any allocations themselves.  We will save one
rescuer thread this way.

On the other hand drain_all_pages() queues work on the system wq which
doesn't have rescuer and so this depend on memory allocation (when all
workers are stuck allocating and new ones cannot be created).

Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:

: 4.11-rc has been giving me hangs after hours of swapping load.  At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().

This worker should be using WQ_RECLAIM as well in order to guarantee a
forward progress.  We can reuse the same one as for lru draining and
vmstat.

Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomailmap: update Yakir Yang email address
Jeffy Chen [Fri, 7 Apr 2017 23:05:02 +0000 (16:05 -0700)]
mailmap: update Yakir Yang email address

Set current email address to replace previous employers email addresses.

Link: http://lkml.kernel.org/r/1491450722-6633-1-git-send-email-jeffy.chen@rock-chips.com
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
David Rientjes [Fri, 7 Apr 2017 23:05:00 +0000 (16:05 -0700)]
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()

We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.

Reschedule when needed.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agodax: fix radix tree insertion race
Ross Zwisler [Fri, 7 Apr 2017 23:04:57 +0000 (16:04 -0700)]
dax: fix radix tree insertion race

While running generic/340 in my test setup I hit the following race.  It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.

Thread 1 Thread 2
-------- --------
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    'entry' is NULL, can't call lock_slot()
    spin_unlock_irq()
    radix_tree_preload()
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    ...
    lock_slot()
    spin_unlock_irq()
  dax_pmd_insert_mapping()
    <inserts a PMD mapping>
    spin_lock_irq()
    __radix_tree_insert() fails with -EEXIST
    <fall back to 4k fault, and die horribly
     when inserting a 4k entry where a PMD exists>

The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index.  For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.

So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.

This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop.  It used to fail within the first 10 iterations.

Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm, thp: fix setting of defer+madvise thp defrag mode
David Rientjes [Fri, 7 Apr 2017 23:04:54 +0000 (16:04 -0700)]
mm, thp: fix setting of defer+madvise thp defrag mode

Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().

Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".

Fixes: 21440d7eb904 ("mm, thp: add new defer+madvise defrag option")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoptrace: fix PTRACE_LISTEN race corrupting task->state
bsegall@google.com [Fri, 7 Apr 2017 23:04:51 +0000 (16:04 -0700)]
ptrace: fix PTRACE_LISTEN race corrupting task->state

In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED.  If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED.  This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.

Oleg said:
 "The kernel can crash or this can lead to other hard-to-debug problems.
  In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
  assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
  contract. Obviusly it is very wrong to manipulate task->state if this
  task is already running, or WAKING, or it sleeps again"

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agovmlinux.lds: add missing VMLINUX_SYMBOL macros
Jessica Yu [Fri, 7 Apr 2017 23:04:48 +0000 (16:04 -0700)]
vmlinux.lds: add missing VMLINUX_SYMBOL macros

When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:

  kernel/extable.c:169: undefined reference to `__start_ro_after_init'
  kernel/extable.c:169: undefined reference to `__end_ro_after_init'

The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.

Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.

Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm/page_alloc.c: fix print order in show_free_areas()
Alexander Polakov [Fri, 7 Apr 2017 23:04:45 +0000 (16:04 -0700)]
mm/page_alloc.c: fix print order in show_free_areas()

Fixes: 11fb998986a72a ("mm: move most file-based accounting to the node")
Link: http://lkml.kernel.org/r/1490377730.30219.2.camel@beget.ru
Signed-off-by: Alexander Polyakov <apolyakov@beget.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agouserfaultfd: report actual registered features in fdinfo
Mike Rapoport [Fri, 7 Apr 2017 23:04:42 +0000 (16:04 -0700)]
userfaultfd: report actual registered features in fdinfo

fdinfo for userfault file descriptor reports UFFD_API_FEATURES.  Up
until recently, the UFFD_API_FEATURES was defined as 0, therefore
corresponding field in fdinfo always contained zero.  Now, with
introduction of several additional features, UFFD_API_FEATURES is not
longer 0 and it seems better to report actual features requested for the
userfaultfd object described by the fdinfo.

First, the applications that were using userfault will still see zero at
the features field in fdinfo.  Next, reporting actual features rather
than available features, gives clear indication of what userfault
features are used by an application.

Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agomm: fix page_vma_mapped_walk() for ksm pages
Hugh Dickins [Fri, 7 Apr 2017 23:04:39 +0000 (16:04 -0700)]
mm: fix page_vma_mapped_walk() for ksm pages

Doug Smythies reports oops with KSM in this backtrace, I've been seeing
the same:

  page_vma_mapped_walk+0xe6/0x5b0
  page_referenced_one+0x91/0x1a0
  rmap_walk_ksm+0x100/0x190
  rmap_walk+0x4f/0x60
  page_referenced+0x149/0x170
  shrink_active_list+0x1c2/0x430
  shrink_node_memcg+0x67a/0x7a0
  shrink_node+0xe1/0x320
  kswapd+0x34b/0x720

Just as observed in commit 4b0ece6fa016 ("mm: migrate: fix
remove_migration_pte() for ksm pages"), you cannot use page->index
calculations on ksm pages.

page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
can lead it off the end of the page table, and into whatever nonsense is
in the next page, ending as an oops inside check_pte()'s pte_page().

KSM tells page_vma_mapped_walk() exactly where to look for the page, it
does not need any page->index calculation: and that's so also for all
the normal and file and anon pages - just not for THPs and their
subpages.  Get out early in most cases: instead of a PageKsm test, move
down the earlier not-THP-page test, as suggested by Kirill.

I'm also slightly worried that this loop can stray into other vmas, so
added a vm_end test to prevent surprises; though I have not imagined
anything worse than a very contrived case, in which a page mlocked in
the next vma might be reclaimed because it is not mlocked in this vma.

Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoorangefs: move features validation to fix filesystem hang
Martin Brandenburg [Thu, 6 Apr 2017 22:11:00 +0000 (18:11 -0400)]
orangefs: move features validation to fix filesystem hang

Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).

The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked.  When the userspace component restarts while the filesystem
is mounted, it sends a ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two.  While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.

This is only half of the bugfix.  The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns.  Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores.  The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.

Mike Marshall says:
 "I've tested this patch against the fixed userspace part. This patch is
  real important, I hope it can make it into 4.11...

  Here's what happens when the userspace daemon is restarted, without
  the patch:

    =============================================
    [ INFO: possible recursive locking detected ]
    [   4.10.0-00007-ge98bdb3 #1 Not tainted    ]
    ---------------------------------------------
    pvfs2-client-co/29032 is trying to acquire lock:
     (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
                  but task is already holding lock:
     (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]

    CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
    Call Trace:
     __lock_acquire+0x7eb/0x1290
     lock_acquire+0xe8/0x1d0
     mutex_lock_killable_nested+0x6f/0x6e0
     service_operation+0x3c7/0x7b0 [orangefs]
     orangefs_remount+0xea/0x150 [orangefs]
     dispatch_ioctl_command+0x227/0x330 [orangefs]
     orangefs_devreq_ioctl+0x29/0x70 [orangefs]
     do_vfs_ioctl+0xa3/0x6e0
     SyS_ioctl+0x79/0x90"

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoMerge tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaa...
Linus Torvalds [Fri, 7 Apr 2017 19:26:36 +0000 (12:26 -0700)]
Merge tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - fix ThunderX legacy firmware resources

 - fix ARTPEC-6 and DesignWare platform driver NULL pointer dereferences

 - fix HiSilicon link error

* tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: dwc: Fix dw_pcie_ops NULL pointer dereference
  PCI: dwc: Select PCI_HOST_COMMON for hisi
  PCI: thunder-pem: Fix legacy firmware PEM-specific resources

7 years agoblk-mq: Restart a single queue if tag sets are shared
Bart Van Assche [Fri, 7 Apr 2017 18:40:09 +0000 (12:40 -0600)]
blk-mq: Restart a single queue if tag sets are shared

To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agodm rq: Avoid that request processing stalls sporadically
Bart Van Assche [Fri, 7 Apr 2017 18:16:54 +0000 (11:16 -0700)]
dm rq: Avoid that request processing stalls sporadically

While running the srp-test software I noticed that request
processing stalls sporadically at the beginning of a test, namely
when mkfs is run against a dm-mpath device. Every time when that
happened the following command was sufficient to resume request
processing:

    echo run >/sys/kernel/debug/block/dm-0/state

This patch avoids that such request processing stalls occur. The
test I ran is as follows:

    while srp-test/run_tests -d -r 30 -t 02-mq; do :; done

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoscsi: Avoid that SCSI queues get stuck
Bart Van Assche [Fri, 7 Apr 2017 18:16:53 +0000 (11:16 -0700)]
scsi: Avoid that SCSI queues get stuck

If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.

commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no other functions than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: commit 7e79dadce222 ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq: Introduce blk_mq_delay_run_hw_queue()
Bart Van Assche [Fri, 7 Apr 2017 18:16:52 +0000 (11:16 -0700)]
blk-mq: Introduce blk_mq_delay_run_hw_queue()

Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

This function will be used in the next patch in this series.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoMerge tag 'dm-4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device...
Linus Torvalds [Fri, 7 Apr 2017 17:47:20 +0000 (10:47 -0700)]
Merge tag 'dm-4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - two stable fixes for the verity target's FEC support

 - a stable fix for raid target's raid1 support (when no bitmap is used)

 - a 4.11 cache metadata v2 format fix to properly test blocks are clean

* tag 'dm-4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm verity fec: fix bufio leaks
  dm raid: fix NULL pointer dereference for raid1 without bitmap
  dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty
  dm verity fec: limit error correction recursion

7 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 7 Apr 2017 17:43:22 +0000 (10:43 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "We've got a regression fix for the signal raised when userspace makes
  an unsupported unaligned access and a revert of the contiguous
  (hugepte) support for hugetlb, which has once again been found to be
  broken. One day, maybe, we'll get it right.

  Summary:

   - restore previous SIGBUS behaviour for unhandled unaligned user
     accesses

   - revert broken support for the contiguous bit in hugetlb (again...)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""
  arm64: mm: unaligned access by user-land should be received as SIGBUS

7 years agoMerge tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 7 Apr 2017 17:11:53 +0000 (10:11 -0700)]
Merge tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull metag usercopy fixes from James Hogan:
 "Metag usercopy fault handling fixes

  These patches fix a bunch of longstanding (some over a decade old)
  metag user copy fault handling bugs. Thanks go to Al Viro for spotting
  some of the questionable code in the first place"

* tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag/usercopy: Add missing fixups
  metag/usercopy: Fix src fixup in from user rapf loops
  metag/usercopy: Set flags before ADDZ
  metag/usercopy: Zero rest of buffer from copy_from_user
  metag/usercopy: Add early abort to copy_to_user
  metag/usercopy: Fix alignment error checking
  metag/usercopy: Drop unused macros

7 years agoMerge tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 7 Apr 2017 17:01:45 +0000 (10:01 -0700)]
Merge tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "This fixes a core device enumeration code change made in 4.10, in
  order to address a reported issue, that went too far.

  Specifics:

   - Refine the check for the existence of _HID in find_child_checks()
     so that it doesn't trigger for device objects with device IDs made
     up by the kernel (Rafael Wysocki)"

* tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / scan: Prefer devices without _HID for _ADR matching

7 years agoMerge tag 'for-linus-4.11b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 7 Apr 2017 16:58:01 +0000 (09:58 -0700)]
Merge tag 'for-linus-4.11b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fix from Juergen Gross:
 "A fix for error path cleanup in the xenbus handler"

* tag 'for-linus-4.11b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xenbus: remove transaction holder from list before freeing

7 years agosysctl: don't print negative flag for proc_douintvec
Liping Zhang [Fri, 7 Apr 2017 15:51:06 +0000 (23:51 +0800)]
sysctl: don't print negative flag for proc_douintvec

I saw some very confusing sysctl output on my system:
  # cat /proc/sys/net/core/xfrm_aevent_rseqth
  -2
  # cat /proc/sys/net/core/xfrm_aevent_etime
  -10
  # cat /proc/sys/net/ipv4/tcp_notsent_lowat
  -4294967295

Because we forget to set the *negp flag in proc_douintvec, so it will
become a garbage value.

Since the value related to proc_douintvec is always an unsigned integer,
so we can set *negp to false explictily to fix this issue.

Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agosysctl: add sanity check for proc_douintvec
Liping Zhang [Fri, 7 Apr 2017 15:51:05 +0000 (23:51 +0800)]
sysctl: add sanity check for proc_douintvec

Commit e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32
fields") introduced the proc_douintvec helper function, but it forgot to
add the related sanity check when doing register_sysctl_table.  So add
it now.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7 years agoperf annotate s390: Fix perf annotate error -95 (4.10 regression)
Christian Borntraeger [Thu, 6 Apr 2017 07:51:51 +0000 (09:51 +0200)]
perf annotate s390: Fix perf annotate error -95 (4.10 regression)

since 4.10 perf annotate exits on s390 with an "unknown error -95".
Turns out that commit 786c1b51844d ("perf annotate: Start supporting
cross arch annotation") added a hard requirement for architecture
support when objdump is used but only provided x86 and arm support.
Meanwhile power was added so lets add s390 as well.

While at it make sure to implement the branch and jump types.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: stable@kernel.org # v4.10+
Fixes: 786c1b51844 "perf annotate: Start supporting cross arch annotation"
Link: http://lkml.kernel.org/r/1491465112-45819-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
7 years agoblk-mq: remap queues when adding/removing hardware queues
Omar Sandoval [Fri, 7 Apr 2017 14:53:11 +0000 (08:53 -0600)]
blk-mq: remap queues when adding/removing hardware queues

blk_mq_update_nr_hw_queues() used to remap hardware queues, which is the
behavior that drivers expect. However, commit 4e68a011428a changed
blk_mq_queue_reinit() to not remap queues for the case of CPU
hotplugging, inadvertently making blk_mq_update_nr_hw_queues() not remap
queues as well. This breaks, for example, NBD's multi-connection mode,
leaving the added hardware queues unused. Fix it by making
blk_mq_update_nr_hw_queues() explicitly remap the queues.

Fixes: 4e68a011428a ("blk-mq: don't redistribute hardware queues on a CPU hotplug event")
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq-sched: fix crash in switch error path
Omar Sandoval [Fri, 7 Apr 2017 14:52:27 +0000 (08:52 -0600)]
blk-mq-sched: fix crash in switch error path

In elevator_switch(), if blk_mq_init_sched() fails, we attempt to fall
back to the original scheduler. However, at this point, we've already
torn down the original scheduler's tags, so this causes a crash. Doing
the fallback like the legacy elevator path is much harder for mq, so fix
it by just falling back to none, instead.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq-sched: set up scheduler tags when bringing up new queues
Omar Sandoval [Wed, 5 Apr 2017 19:01:31 +0000 (12:01 -0700)]
blk-mq-sched: set up scheduler tags when bringing up new queues

If a new hardware queue is added at runtime, we don't allocate scheduler
tags for it, leading to a crash. This hooks up the scheduler framework
to blk_mq_{init,exit}_hctx() to make sure everything gets properly
initialized/freed.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq-sched: refactor scheduler initialization
Omar Sandoval [Wed, 5 Apr 2017 19:01:30 +0000 (12:01 -0700)]
blk-mq-sched: refactor scheduler initialization

Preparation cleanup for the next couple of fixes, push
blk_mq_sched_setup() and e->ops.mq.init_sched() into a helper.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoblk-mq: use the right hctx when getting a driver tag fails
Omar Sandoval [Fri, 7 Apr 2017 14:56:26 +0000 (08:56 -0600)]
blk-mq: use the right hctx when getting a driver tag fails

While dispatching requests, if we fail to get a driver tag, we mark the
hardware queue as waiting for a tag and put the requests on a
hctx->dispatch list to be run later when a driver tag is freed. However,
blk_mq_dispatch_rq_list() may dispatch requests from multiple hardware
queues if using a single-queue scheduler with a multiqueue device. If
blk_mq_get_driver_tag() fails, it doesn't update the hardware queue we
are processing. This means we end up using the hardware queue of the
previous request, which may or may not be the same as that of the
current request. If it isn't, the wrong hardware queue will end up
waiting for a tag, and the requests will be on the wrong dispatch list,
leading to a hang.

The fix is twofold:

1. Make sure we save which hardware queue we were trying to get a
   request for in blk_mq_get_driver_tag() regardless of whether it
   succeeds or not.
2. Make blk_mq_dispatch_rq_list() take a request_queue instead of a
   blk_mq_hw_queue to make it clear that it must handle multiple
   hardware queues, since I've already messed this up on a couple of
   occasions.

This didn't appear in testing with nvme and mq-deadline because nvme has
more driver tags than the default number of scheduler tags. However,
with the blk_mq_update_nr_hw_queues() fix, it showed up with nbd.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
7 years agoReset TreeId to zero on SMB2 TREE_CONNECT
Jan-Marek Glogowski [Mon, 20 Feb 2017 11:25:58 +0000 (12:25 +0100)]
Reset TreeId to zero on SMB2 TREE_CONNECT

Currently the cifs module breaks the CIFS specs on reconnect as
described in http://msdn.microsoft.com/en-us/library/cc246529.aspx:

"TreeId (4 bytes): Uniquely identifies the tree connect for the
command. This MUST be 0 for the SMB2 TREE_CONNECT Request."

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
7 years agoCIFS: Fix build failure with smb2
Tobias Regnery [Thu, 30 Mar 2017 10:34:14 +0000 (12:34 +0200)]
CIFS: Fix build failure with smb2

I saw the following build error during a randconfig build:

fs/cifs/smb2ops.c: In function 'smb2_new_lease_key':
fs/cifs/smb2ops.c:1104:2: error: implicit declaration of function 'generate_random_uuid' [-Werror=implicit-function-declaration]

Explicit include the right header to fix this issue.

Signed-off-by: Tobias Regnery <tobias.regnery@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
7 years agoIntroduce cifs_copy_file_range()
Sachin Prabhu [Fri, 10 Feb 2017 10:33:51 +0000 (16:03 +0530)]
Introduce cifs_copy_file_range()

The earlier changes to copy range for cifs unintentionally disabled the more
common form of server side copy.

The patch introduces the file_operations helper cifs_copy_file_range()
which is used by the syscall copy_file_range. The new file operations
helper allows us to perform server side copies for SMB2.0 and 2.1
servers as well as SMB 3.0+ servers which do not support the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE.

The new helper uses the ioctl FSCTL_SRV_COPYCHUNK_WRITE to perform
server side copies. The helper is called by vfs_copy_file_range() only
once an attempt to clone the file using the ioctl
FSCTL_DUPLICATE_EXTENTS_TO_FILE has failed.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
7 years agoSMB3: Rename clone_range to copychunk_range
Sachin Prabhu [Tue, 4 Apr 2017 07:12:04 +0000 (02:12 -0500)]
SMB3: Rename clone_range to copychunk_range

Server side copy is one of the most important mechanisms smb2/smb3
supports and it was unintentionally disabled for most use cases.

Renaming calls to reflect the underlying smb2 ioctl called. This is
similar to the name duplicate_extents used for a similar ioctl which is
also used to duplicate files by reusing fs blocks. The name change is to
avoid confusion.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
7 years agoHandle mismatched open calls
Sachin Prabhu [Fri, 3 Mar 2017 23:41:38 +0000 (15:41 -0800)]
Handle mismatched open calls

A signal can interrupt a SendReceive call which result in incoming
responses to the call being ignored. This is a problem for calls such as
open which results in the successful response being ignored. This
results in an open file resource on the server.

The patch looks into responses which were cancelled after being sent and
in case of successful open closes the open fids.

For this patch, the check is only done in SendReceive2()

RH-bz: 1403319

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: Stable <stable@vger.kernel.org>
7 years agoMerge branch 'acpi-scan-fixes'
Rafael J. Wysocki [Fri, 7 Apr 2017 11:48:26 +0000 (13:48 +0200)]
Merge branch 'acpi-scan-fixes'

* acpi-scan-fixes:
  ACPI / scan: Prefer devices without _HID for _ADR matching

7 years agoRevert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""
Will Deacon [Fri, 31 Mar 2017 11:23:43 +0000 (12:23 +0100)]
Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""

The use of the contiguous bit by our hugetlb implementation violates
the break-before-make requirements of the architecture and can lead to
silent data corruption or TLB conflict aborts. Once again, disable these
hugetlb sizes whilst it gets worked out.

This reverts commit ab2e1b89230fa80328262c91d2d0a539a2790d6f.

Conflicts:
arch/arm64/mm/hugetlbpage.c

Signed-off-by: Will Deacon <will.deacon@arm.com>
7 years agopowerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
Michael Ellerman [Thu, 6 Apr 2017 13:34:38 +0000 (23:34 +1000)]
powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()

In crc32c_vpmsum() we call enable_kernel_altivec() without first
disabling preemption, which is not allowed:

  WARNING: CPU: 9 PID: 2949 at ../arch/powerpc/kernel/process.c:277 enable_kernel_altivec+0x100/0x120
  Modules linked in: dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto ...
  CPU: 9 PID: 2949 Comm: docker Not tainted 4.11.0-rc5-compiler_gcc-6.3.1-00033-g308ac7563944 #381
  ...
  NIP [c00000000001e320] enable_kernel_altivec+0x100/0x120
  LR [d000000003df0910] crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum]
  Call Trace:
    0xc138fd09 (unreliable)
    crc32c_vpmsum+0x108/0x150 [crc32c_vpmsum]
    crc32c_vpmsum_update+0x3c/0x60 [crc32c_vpmsum]
    crypto_shash_update+0x88/0x1c0
    crc32c+0x64/0x90 [libcrc32c]
    dm_bm_checksum+0x48/0x80 [dm_persistent_data]
    sb_check+0x84/0x120 [dm_thin_pool]
    dm_bm_validate_buffer.isra.0+0xc0/0x1b0 [dm_persistent_data]
    dm_bm_read_lock+0x80/0xf0 [dm_persistent_data]
    __create_persistent_data_objects+0x16c/0x810 [dm_thin_pool]
    dm_pool_metadata_open+0xb0/0x1a0 [dm_thin_pool]
    pool_ctr+0x4cc/0xb60 [dm_thin_pool]
    dm_table_add_target+0x16c/0x3c0
    table_load+0x184/0x400
    ctl_ioctl+0x2f0/0x560
    dm_ctl_ioctl+0x38/0x50
    do_vfs_ioctl+0xd8/0x920
    SyS_ioctl+0x68/0xc0
    system_call+0x38/0xfc

It used to be sufficient just to call pagefault_disable(), because that
also disabled preemption. But the two were decoupled in commit 8222dbe21e79
("sched/preempt, mm/fault: Decouple preemption from the page fault
logic") in mid 2015.

So add the missing preempt_disable/enable(). We should also call
disable_kernel_fp(), although it does nothing by default, there is a
debug switch to make it active and all enables should be paired with
disables.

Fixes: 6dd7a82cc54e ("crypto: powerpc - Add POWER8 optimised crc32c")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
7 years agopinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
Tony Lindgren [Thu, 30 Mar 2017 16:16:39 +0000 (09:16 -0700)]
pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()

Recent pinctrl changes to allow dynamic allocation of pins exposed one
more issue with the pinctrl pins claimed early by the controller itself.
This caused a regression for IMX6 pinctrl hogs.

Before enabling the pin controller driver we need to wait until it has
been properly initialized, then claim the hogs, and only then enable it.

To fix the regression, split the code into pinctrl_claim_hogs() and
pinctrl_enable(). And then let's require that pinctrl_enable() is always
called by the pin controller driver when ready after calling
pinctrl_register_and_init().

Depends-on: 950b0d91dc10 ("pinctrl: core: Fix regression caused by delayed
work for hogs")
Fixes: df61b366af26 ("pinctrl: core: Use delayed work for hogs")
Fixes: e566fc11ea76 ("pinctrl: imx: use generic pinctrl helpers for
managing groups")
Cc: Haojian Zhuang <haojian.zhuang@linaro.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Nishanth Menon <nm@ti.com>
Cc: Shawn Guo <shawnguo@kernel.org>
Cc: Stefan Agner <stefan@agner.ch>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Gary Bisson <gary.bisson@boundarydevices.com>
Tested-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
7 years agoMerge tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Thu, 6 Apr 2017 21:42:05 +0000 (14:42 -0700)]
Merge tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull XFS fixes from Darrick Wong:
 "Here are three more fixes for 4.11.

  The first one reworks the inline directory verifier to check the
  working copy of the directory metadata and to avoid triggering a
  periodic crash in xfs/348. The second patch fixes a regression in hole
  punching at EOF that corrupts files; and the third patch closes a
  kernel memory disclosure bug.

  Summary:

   - rework the inline directory verifier to avoid crashes on disk
     corruption

   - don't change file size when punching holes w/ KEEP_SIZE

   - close a kernel memory exposure bug"

* tag 'xfs-4.11-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix kernel memory exposure problems
  xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files
  xfs: rework the inline directory verifiers

7 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Thu, 6 Apr 2017 20:16:34 +0000 (13:16 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "Lantiq:
    - Fix adding xbar resoures causing a panic

  Loongson3:
    - Some Loongson 3A don't identify themselves as having an FTLB so
      hardwire that knowledge into CPU probing.
    - Handle Loongson 3 TLB peculiarities in the fast path of the RDHWR
      emulation.
    - Fix invalid FTLB entries with huge page on VTLB+FTLB platforms
    - Add missing calculation of S-cache and V-cache cache-way size

  Ralink:
    - Fix typos in rt3883 pinctrl data

  Generic:
    - Force o32 fp64 support on 32bit MIPS64r6 kernels
    - Yet another build fix after the linux/sched.h changes
    - Wire up statx system call
    - Fix stack unwinding after introduction of IRQ stack
    - Fix spinlock code to build even for microMIPS with recent binutils

  SMP-CPS:
    - Fix retrieval of VPE mask on big endian CPUs"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: IRQ Stack: Unwind IRQ stack onto task stack
  MIPS: c-r4k: Fix Loongson-3's vcache/scache waysize calculation
  MIPS: Flush wrong invalid FTLB entry for huge page
  MIPS: Check TLB before handle_ri_rdhwr() for Loongson-3
  MIPS: Add MIPS_CPU_FTLB for Loongson-3A R2
  MIPS: Lantiq: fix missing xbar kernel panic
  MIPS: smp-cps: Fix retrieval of VPE mask on big endian CPUs
  MIPS: Wire up statx system call
  MIPS: Include asm/ptrace.h now linux/sched.h doesn't
  MIPS: ralink: Fix typos in rt3883 pinctrl
  MIPS: End spinlocks with .insn
  MIPS: Force o32 fp64 support on 32bit MIPS64r6 kernels

7 years agoMerge tag 'trace-v4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Thu, 6 Apr 2017 20:12:12 +0000 (13:12 -0700)]
Merge tag 'trace-v4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Wei Yongjun fixed a long standing bug in the ring buffer startup test.

  If for some unknown reason, the kthread that is created fails to be
  created, the return from kthread_create() is an PTR_ERR and not a
  NULL. The test incorrectly checks for NULL instead of an error"

* tag 'trace-v4.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Fix return value check in test_ringbuffer()

7 years agosparc: remove unused wp_works_ok macro
Mathias Krause [Wed, 5 Apr 2017 19:24:20 +0000 (21:24 +0200)]
sparc: remove unused wp_works_ok macro

It's unused for ages, used to be required for ksyms.c back in the v1.1
times.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc32: Export vac_cache_size to fix build error
Guenter Roeck [Sat, 1 Apr 2017 20:47:44 +0000 (13:47 -0700)]
sparc32: Export vac_cache_size to fix build error

sparc32:allmodconfig fails to build with the following error.

ERROR: "vac_cache_size" [drivers/infiniband/sw/rxe/rdma_rxe.ko] undefined!

Fixes: cb8864559631 ("infiniband: Fix alignment of mmap cookies ...")
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: Fix memory corruption when THP is enabled
Nitin Gupta [Fri, 31 Mar 2017 22:48:53 +0000 (15:48 -0700)]
sparc64: Fix memory corruption when THP is enabled

The memory corruption was happening due to incorrect
TLB/TSB flushing of hugepages.

Reported-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agosparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
Tom Hromatka [Fri, 31 Mar 2017 22:31:42 +0000 (16:31 -0600)]
sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()

This commit moves sparc64's prototype of pmd_write() outside
of the CONFIG_TRANSPARENT_HUGEPAGE ifdef.

In 2013, commit a7b9403f0e6d ("sparc64: Encode huge PMDs using PTE
encoding.") exposed a path where pmd_write() could be called without
CONFIG_TRANSPARENT_HUGEPAGE defined.  This can result in the panic below.

The diff is awkward to read, but the changes are straightforward.
pmd_write() was moved outside of #ifdef CONFIG_TRANSPARENT_HUGEPAGE.
Also, __HAVE_ARCH_PMD_WRITE was defined.

kernel BUG at include/asm-generic/pgtable.h:576!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
oracle_8114_cdb(8114): Kernel bad sw trap 5 [#1]
CPU: 120 PID: 8114 Comm: oracle_8114_cdb Not tainted
4.1.12-61.7.1.el6uek.rc1.sparc64 #1
task: fff8400700a24d60 ti: fff8400700bc4000 task.ti: fff8400700bc4000
TSTATE: 0000004411e01607 TPC: 00000000004609f8 TNPC: 00000000004609fc Y:
00000005    Not tainted
TPC: <gup_huge_pmd+0x198/0x1e0>
g0: 000000000001c000 g1: 0000000000ef3954 g2: 0000000000000000 g3: 0000000000000001
g4: fff8400700a24d60 g5: fff8001fa5c10000 g6: fff8400700bc4000 g7: 0000000000000720
o0: 0000000000bc5058 o1: 0000000000000240 o2: 0000000000006000 o3: 0000000000001c00
o4: 0000000000000000 o5: 0000048000080000 sp: fff8400700bc6ab1 ret_pc: 00000000004609f0
RPC: <gup_huge_pmd+0x190/0x1e0>
l0: fff8400700bc74fc l1: 0000000000020000 l2: 0000000000002000 l3: 0000000000000000
l4: fff8001f93250950 l5: 000000000113f800 l6: 0000000000000004 l7: 0000000000000000
i0: fff8400700ca46a0 i1: bd0000085e800453 i2: 000000026a0c4000 i3: 000000026a0c6000
i4: 0000000000000001 i5: fff800070c958de8 i6: fff8400700bc6b61 i7: 0000000000460dd0
I7: <gup_pud_range+0x170/0x1a0>
Call Trace:
 [0000000000460dd0] gup_pud_range+0x170/0x1a0
 [0000000000460e84] get_user_pages_fast+0x84/0x120
 [00000000006f5a18] iov_iter_get_pages+0x98/0x240
 [00000000005fa744] do_direct_IO+0xf64/0x1e00
 [00000000005fbbc0] __blockdev_direct_IO+0x360/0x15a0
 [00000000101f74fc] ext4_ind_direct_IO+0xdc/0x400 [ext4]
 [00000000101af690] ext4_ext_direct_IO+0x1d0/0x2c0 [ext4]
 [00000000101af86c] ext4_direct_IO+0xec/0x220 [ext4]
 [0000000000553bd4] generic_file_read_iter+0x114/0x140
 [00000000005bdc2c] __vfs_read+0xac/0x100
 [00000000005bf254] vfs_read+0x54/0x100
 [00000000005bf368] SyS_pread64+0x68/0x80

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
7 years agoMerge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus...
Radim Krčmář [Thu, 6 Apr 2017 12:41:39 +0000 (14:41 +0200)]
Merge branch 'kvm-ppc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc

From: Paul Mackerras <paulus@ozlabs.org>

A check for a NULL return from kzalloc in recently-added code.

7 years agoKVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
Dan Carpenter [Fri, 17 Mar 2017 20:41:14 +0000 (23:41 +0300)]
KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl

kzalloc() won't actually fail because sizeof(*resize) is small, but
static checkers complain.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>