perf/x86/intel/pt: Add support for PTWRITE and power event tracing
The Intel PT facility grew some new functionality:
* PTWRITE packet carries the payload of the new PTWRITE instruction
that can be used to instrument Intel PT traces with user-supplied
data. Packets of this type are only generated if 'ptwrite' capability
is set and PTWEn bit is set in the event attribute's config. Flow
update packets (FUP) can be generated on PTWRITE packets if FUPonPTW
config bit is set. Setting these bits is not allowed if 'ptwrite'
capability is not set.
* PWRE, PWRX, MWAIT, EXSTOP packets communicate core power management
events. These depend on 'power_event_tracing' capability and are
enabled by setting PwrEvtEn bit in the event attribute.
Extend the driver capabilities and provide the proper sanity checks in the
event validation function.
[ tglx: Massaged changelog ]
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: vince@deater.net Cc: eranian@google.com Cc: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/20160916134819.1978-1-alexander.shishkin@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Kan Liang [Tue, 16 Aug 2016 20:09:50 +0000 (16:09 -0400)]
perf/x86/intel/uncore: Add Skylake server uncore support
This patch implements the uncore monitoring driver for Skylake server.
The uncore subsystem in Skylake server is similar to previous
server. There are some differences in config register encoding and pci
device IDs. Besides, Skylake introduces many new boxes to reflect the
MESH architecture changes.
The control registers for IIO and UPI have been extended to 64 bit. This
patch also introduces event_mask_ext to handle the high 32 bit mask.
The CHA box number could vary for different machines. This patch gets
the CHA box number by counting the CHA register space during
initialization at runtime.
Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1471378190-17276-3-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Harry Pan [Thu, 8 Sep 2016 09:08:57 +0000 (17:08 +0800)]
perf/x86/rapl: Enable Apollo Lake RAPL support
This patch enables RAPL counters (energy consumption counters)
support for Intel Apollo Lake (Goldmont) processors (Model 92):
RAPL of Goldmont, unlikes ESU increment of Silvermont/Airmont,
it likes the Haswell microarchitecture in 1/2^ESU joules and
supports power domains in PP0/PP1/PKG/RAM.
ESU and power domains refer to Intel Software Developers' Manual,
Vol. 3C, Order No. 325384, Table 35-12.
Usage example:
$ perf list
$ perf stat -a -e power/energy-cores/,power/energy-pkg/ sleep 10
Signed-off-by: Harry Pan <harry.pan@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: bp@alien8.de Cc: gs0622@gmail.com Cc: hpa@zytor.com Cc: srinivas.pandruvada@linux.intel.com Link: http://lkml.kernel.org/r/1473325738-730-1-git-send-email-harry.pan@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
At the moment, intel_bts will WARN() out if there is more than one
event writing to the same ring buffer, via SET_OUTPUT, and will only
send data from one event to a buffer.
There is no reason to have this warning in, so kill it.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-6-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Since BTS doesn't have a dedicated PMI status bit, the driver needs to
take extra care to check for the condition that triggers it to avoid
spurious NMI warnings.
Regardless of the local BTS context state, the only way of knowing that
the NMI is ours is to compare the write pointer against the interrupt
threshold.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-5-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel/bts: Fix confused ordering of PMU callbacks
The intel_bts driver is using a CPU-local 'started' variable to order
callbacks and PMIs and make sure that AUX transactions don't get messed
up. However, the ordering rules in regard to this variable is a complete
mess, which recently resulted in perf_fuzzer-triggered warnings and
panics.
The general ordering rule that is patch is enforcing is that this
cpu-local variable be set only when the cpu-local AUX transaction is
active; consequently, this variable is to be checked before the AUX
related bits can be touched.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/core: Fix aux_mmap_count vs aux_refcount order
The order of accesses to ring buffer's aux_mmap_count and aux_refcount
has to be preserved across the users, namely perf_mmap_close() and
perf_aux_output_begin(), otherwise the inversion can result in the latter
holding the last reference to the aux buffer and subsequently free'ing
it in atomic context, triggering a warning.
perf/core: Fix a race between mmap_close() and set_output() of AUX events
In the mmap_close() path we need to stop all the AUX events that are
writing data to the AUX area that we are unmapping, before we can
safely free the pages. To determine if an event needs to be stopped,
we're comparing its ->rb against the one that's getting unmapped.
However, a SET_OUTPUT ioctl may turn up inside an AUX transaction
and swizzle event::rb to some other ring buffer, but the transaction
will keep writing data to the old ring buffer until the event gets
scheduled out. At this point, mmap_close() will skip over such an
event and will proceed to free the AUX area, while it's still being
used by this event, which will set off a warning in the mmap_close()
path and cause a memory corruption.
To avoid this, always stop an AUX event before its ->rb is updated;
this will release the (potentially) last reference on the AUX area
of the buffer. If the event gets restarted, its new ring buffer will
be used. If another SET_OUTPUT comes and switches it back to the
old ring buffer that's getting unmapped, it's also fine: this
ring buffer's aux_mmap_count will be zero and AUX transactions won't
start any more.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160906132353.19887-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
The resent conversion of the cpu hotplug support in the uncore driver
introduced a regression due to the way the callbacks are invoked at
initialization time.
The old code called the prepare/starting/online function on each online cpu
as a block. The new code registers the hotplug callbacks in the core for
each state. The core invokes the callbacks at each registration on all
online cpus.
The code implicitely relied on the prepare/starting/online callbacks being
called as combo on a particular cpu, which was not obvious and completely
undocumented.
The resulting subtle wreckage happens due to the way how the uncore code
manages shared data structures for cpus which share an uncore resource in
hardware. The sharing is determined in the cpu starting callback, but the
prepare callback allocates per cpu data for the upcoming cpu because
potential sharing is unknown at this point. If the starting callback finds
a online cpu which shares the hardware resource it takes a refcount on the
percpu data of that cpu and puts the own data structure into a
'free_at_online' pointer of that shared data structure. The online callback
frees that.
With the old model this worked because in a starting callback only one non
unused structure (the one of the starting cpu) was available. The new code
allocates the data structures for all cpus when the prepare callback is
registered.
Now the starting function iterates through all online cpus and looks for a
data structure (skipping its own) which has a matching hardware id. The id
member of the data structure is initialized to 0, but the hardware id can
be 0 as well. The resulting wreckage is:
CPU0 finds a matching id on CPU1, takes a refcount on CPU1 data and puts
its own data structure into CPU1s data structure to be freed.
CPU1 skips CPU0 because the data structure is its allegedly unsued own.
It finds a matching id on CPU2, takes a refcount on CPU1 data and puts
its own data structure into CPU2s data structure to be freed.
....
Now the online callbacks are invoked.
CPU0 has a pointer to CPU1s data and frees the original CPU0 data. So
far so good.
CPU1 has a pointer to CPU2s data and frees the original CPU1 data, which
is still referenced by CPU0 ---> Booom
So there are two issues to be solved here:
1) The id field must be initialized at allocation time to a value which
cannot be a valid hardware id, i.e. -1
This prevents the above scenario, but now CPU1 and CPU2 both stick their
own data structure into the free_at_online pointer of CPU0. So we leak
CPU1s data structure.
2) Fix the memory leak described in #1
Instead of having a single pointer, use a hlist to enqueue the
superflous data structures which are then freed by the first cpu
invoking the online callback.
Ideally we should know the sharing _before_ invoking the prepare callback,
but that's way beyond the scope of this bug fix.
[ tglx: Rewrote changelog ]
Fixes: 96b2bd3866a0 ("perf/x86/amd/uncore: Convert to hotplug state machine") Reported-and-tested-by: Eric Sandeen <sandeen@sandeen.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20160909160822.lowgmkdwms2dheyv@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge tag 'perf-core-for-mingo-20160908' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Add branch stack / basic block info to 'perf annotate --stdio', where for
each branch, we add an asm comment after the instruction with information on
how often it was taken and predicted. See example with color output at:
- Only open an evsel in CPUs in its cpu map, fixing some use cases in
systems with multiple PMUs with different CPU maps (Mark Rutland)
- Fix handling of huge TLB maps, recognizing it as anonymous (Wang Nan)
Infrastructure changes:
- Remove the symbol filtering code, i.e. the callbacks passed to all functions
that could end up loading a DSO symtab, simplifying the code, eventually
allowing what we should have had since day one: removing the 'map' parameter
from dso__load() functions (Arnaldo Carvalho de Melo)
Arch specific build fixes:
- Fix detached tarball build on powerpc, where we were still accessing a
file outside tools/ (Ravi Bangoria)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ravi Bangoria [Wed, 31 Aug 2016 08:03:11 +0000 (13:33 +0530)]
perf powerpc: Fix build-test failure
'make -C tools/perf build-test' is failing with below log for poewrpc.
In file included from /tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf.h:15:0,
from util/cpumap.h:8,
from util/env.c:1:
/tmp/tmp.3eEwmGlYaF/perf-4.8.0-rc4/tools/perf/perf-sys.h:23:56:
fatal error: ../../arch/powerpc/include/uapi/asm/unistd.h: No such file or directory
compilation terminated.
I bisected it and found it's failing from commit ad430729ae00 ("Remove:
kernel unistd*h files from perf's MANIFEST, not used").
Header file '../../arch/powerpc/include/uapi/asm/unistd.h' is included
only for powerpc in tools/perf/perf-sys.h.
By looking closly at commit history, I found little weird thing:
Commit f2d9cae9ea9e ("perf powerpc: Use uapi/unistd.h to fix build
error") replaced 'asm/unistd.h' with 'uapi/asm/unistd.h'
Commit d2709c7ce4c5 ("perf: Make perf build for x86 with UAPI
disintegration applied") removes all arch specific 'uapi/asm/unistd.h'
for all archs and adds generic <asm/unistd.h>.
Commit f0b9abfb0446 ("Merge branch 'linus' into perf/core") again
includes 'uapi/asm/unistd.h' for powerpc. Don't know how exactly this
happened as this change is not part of commit also.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1472630591-5089-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Fixes: ad430729ae00 ("Remove: kernel unistd*h files from perf's MANIFEST, not used") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mark Rutland [Thu, 8 Sep 2016 10:21:52 +0000 (11:21 +0100)]
perf pmu: Support alternative sysfs cpumask
The perf tools can read a cpumask file for a PMU, describing a subset of
CPUs which that PMU covers. So far this has only been used to cater for
uncore PMUs, which in practice happen to only have a single CPU
described in the mask.
Until recently, the perf tools only correctly handled cpumask containing
a single CPU, and only when monitoring in system-wide mode. For example,
prior to commit 00e727bb389359c8 ("perf stat: Balance opening and
reading events"), a mask with more than a single CPU could cause perf
stat to hang. When a CPU PMU covers a subset of CPUs, but lacks a
cpumask, perf record will fail to open events (on the cores the PMU does
not support), and gives up.
For systems with heterogeneous CPUs such as ARM big.LITTLE systems, this
presents a problem. We have a PMU for each microarchitecture (e.g. a big
PMU and a little PMU), and would like to expose a cpumask for each (so
as to allow perf record and other tools to do the right thing). However,
doing so kernel-side will cause old perf binaries to not function (e.g.
hitting the issue solved by 00e727bb389359c8), and thus commits the
cardinal sin of breaking (existing) userspace.
To address this chicken-and-egg problem, this patch adds support got a
new file, cpus, which is largely identical to the existing cpumask file.
A kernel can expose this file, knowing that new perf binaries will
correctly support it, while old perf binaries will not look for it (and
thus will not be broken).
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1473330112-28528-8-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mark Rutland [Thu, 8 Sep 2016 10:21:51 +0000 (11:21 +0100)]
perf evlist: Only open events on CPUs an evsel permits
In systems with heterogeneous CPU PMUs, it's possible for each evsel to
cover a distinct set of CPUs, and hence the cpu_map associated with each
evsel may have a distinct idx<->id mapping. Any of these may be distinct
from the evlist's cpu map.
Events can be tied to the same fd so long as they use the same per-cpu
ringbuffer (i.e. so long as they are on the same CPU). To acquire the
correct FDs, we must compare the Linux logical IDs rather than the evsel
or evlist indices.
This path adds logic to perf_evlist__mmap_per_evsel to handle this,
translating IDs as required. As PMUs may cover a subset of CPUs from the
evlist, we skip the CPUs a PMU cannot handle.
Without this patch, perf record may try to mmap erroneous FDs on
heterogeneous systems, and will bail out early rather than running the
workload.
Signed-off-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1473330112-28528-7-git-send-email-mark.rutland@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Peter Zijlstra [Mon, 5 Sep 2016 19:08:12 +0000 (16:08 -0300)]
perf annotate: Add branch stack / basic block
I wanted to know the hottest path through a function and figured the
branch-stack (LBR) information should be able to help out with that.
The below uses the branch-stack to create basic blocks and generate
statistics from them.
from to branch_i
* ----> *
|
| block
v
* ----> *
from to branch_i+1
The blocks are broken down into non-overlapping ranges, while tracking
if the start of each range is an entry point and/or the end of a range
is a branch.
Each block iterates all ranges it covers (while splitting where required
to exactly match the block) and increments the 'coverage' count.
For the range including the branch we increment the taken counter, as
well as the pred counter if flags.predicted.
Using these number we can find if an instruction:
- had coverage; given by:
br->coverage / br->sym->max_coverage
This metric ensures each symbol has a 100% spot, which reflects the
observation that each symbol must have a most covered/hottest
block.
- is a branch target: br->is_target && br->start == add
- for targets, how much of a branch's coverages comes from it:
target->entry / branch->coverage
- is a branch: br->is_branch && br->end == addr
- for branches, how often it was taken:
br->taken / br->coverage
after all, all execution that didn't take the branch would have
incremented the coverage and continued onward to a later branch.
- for branches, how often it was predicted:
br->pred / br->taken
The coverage percentage is used to color the address and asm sections;
for low (<1%) coverage we use NORMAL (uncolored), indicating that these
instructions are not 'important'. For high coverage (>75%) we color the
address RED.
For each branch, we add an asm comment after the instruction with
information on how often it was taken and predicted.
Output looks like (sans color, which does loose a lot of the
information :/)
(Note: the --branch-filter u,any was used to avoid spurious target and
branch points due to interrupts/faults, they show up as very small -/+
annotations on 'weird' locations)
Wang Nan [Tue, 6 Sep 2016 04:58:27 +0000 (04:58 +0000)]
perf tools: Recognize hugetlb mapping as anon mapping
Hugetlbfs mapping should be recognized as anon mapping so user has a
chance to create /tmp/perf-<pid>.map file for symbol resolving. This
patch utilizes MAP_HUGETLB to identify hugetlb mapping.
After this patch, if perf is started before a program starts using huge
pages (so perf gets MMAP2 events from kernel), perf is able to recognize
hugetlb mapping as anon mapping.
Signed-off-by: Wang Nan <wangnan0@huawei.com> Cc: He Kuang <hekuang@huawei.com> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1473137909-142064-2-git-send-email-wangnan0@huawei.com Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The reason is that we currently allow to init mbm event
even if mbm support is not detected. Adding checks for
both cqm and mbm events and support into cqm's event_init.
Fixes: 33c3cc7acfd9 ("perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init") Reported-by: Yanqiu Zhang <yanqzhan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1473089407-21857-1-git-send-email-jolsa@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We're not using it anymore, few users were, but we really could do
without it, simplify lots of functions by removing it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1zng8wdznn00iiz08bb7q3vn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test vmlinux: Remove dead symbol_filter_t code
We don't need to initialize that area as we're not using it afterwards,
leftover, ditch it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-jb2un8buy4rqawz73mcdm1sn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf machine: Remove machine->symbol_filter and friends
Including machines__set_symbol_filter(), not used anymore.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7o1qgmrpvzuis4a9f0t8mnri@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Not needed, we already have code to prune aliases.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-1ysyce7qjgui93gi1efbjwhf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf symbols: Mark if a symbol is idle in the library
This was being done just in 'perf top', but grouping idle symbols should
be useful in other places as well, so remove one more symbol_filter_t
user by moving this to the symbol library.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-5r7xitjkzjr9jak1zy3d8u5l@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-20160901' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Support generating cross arch probes, i.e. if you specify a vmlinux
file for different arch than the one in the host machine,
$ perf probe --definition function_name args
will generate the probe definition string needed to append to the
target machine /sys/kernel/debug/tracing/kprobes_events file, using
scripting (Masami Hiramatsu).
- Make 'perf probe' skip the function prologue in uprobes if program
compiled without optimization, using the same strategy as gdb and
systemtap uses, fixing a bug where:
$ perf probe -x ./test 'foo i'
When 'foo(42)' was used on the "./test" executable would produce i=0
instead of the expected i=42 (Ravi Bangoria)
- Demangle symbols for synthesized @plt entries too (Millian Wolff)
Documentation changes:
- Show default report configuration in 'perf config' example
and docs (Millian Wolff)
Infrastructure changes:
- Make 'perf test vmlinux' tolerate the symbol aliasing pruning done when
loading kallsyms and vmlinux (Arnaldo Carvalho de Melo)
- Improve output of 'perf test vmlinux' test, to help identify on the verbose
output which lines are warning and which are errors (Arnaldo Carvalho de Melo)
- Prep work to stop having to pass symbol_filter_t to lots of functions,
simplifying symtab loading routines (Arnaldo Carvalho de Melo)
- Honor symbol_conf.allow_aliases when loading kallsyms as well, it was using
it only when loading vmlinux files (Arnaldo Carvalho de Melo)
- Fixup symbol->end before doing alias pruning when loading symbol tables
(Arnaldo Carvalho de Melo)
Will Deacon [Mon, 15 Aug 2016 10:42:45 +0000 (11:42 +0100)]
perf/core: Don't pass PERF_EF_START to the PMU ->start callback
PERF_EF_START is a flag to indicate to the PMU ->add() callback that, as
well as claiming the PMU resources required by the event being added,
it should also start the PMU.
Passing this flag to the ->start() callback doesn't make sense, because
->start() always tries to start the PMU. Remove it.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: mark.rutland@arm.com Link: http://lkml.kernel.org/r/1471257765-29662-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Kan Liang [Tue, 16 Aug 2016 20:09:48 +0000 (16:09 -0400)]
perf/x86/intel/uncore: Remove hard-coded implementation for Node ID mapping location
The method to build PCI bus to socket mapping is similar among
platforms. However, the PCI location which stores Node ID mapping could
vary between different platforms. For example, the Node ID mapping address
on Skylake server is different from the previous platform. Also, to
build the mapping for the PCI bus without UBOX, it has to start from
bus 0 on Skylake server.
This patch removes the current hardcoded implementation and adds
three parameters for snbep_pci2phy_map_init(). This way the Node ID mapping
address and bus searching direction can be configured according to
different platforms.
Signed-off-by: Kan Liang <kan.liang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Nilay Vaish <nilayvaish@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1471378190-17276-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- a stable fix in both DM crypt and DM log-writes for too large bios
(as generated by bcache)
- two other stable fixes for DM log-writes
- a stable fix for a DM crypt bug that could result in freeing pointers
from uninitialized memory in the tfm allocation error path
- a DM bufio cleanup to discontinue using create_singlethread_workqueue()
* tag 'dm-4.8-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm bufio: remove use of deprecated create_singlethread_workqueue()
dm crypt: fix free of bad values after tfm allocation failure
dm crypt: fix error with too large bios
dm log writes: fix check of kthread_run() return value
dm log writes: fix bug with too large bios
dm log writes: move IO accounting earlier to fix error path
Merge branch 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
"I'm still prepping a set of fixes for btrfs fsync, just nailing down a
hard to trigger memory corruption. For now, these are tested and ready."
* 'for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
Btrfs: fix endless loop in balancing block groups
Btrfs: kill invalid ASSERT() in process_all_refs()
Merge tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a number of small driver fixes for 4.8-rc5.
The largest thing here is deleting an obsolete driver,
drivers/misc/bh1780gli.c, as the functionality of it was replaced by
an iio driver a while ago.
The other fixes are things that have been reported, or reverts of
broken stuff (the binder change). All of these changes have been in
linux-next for a while with no reported issues"
* tag 'char-misc-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
thunderbolt: Don't declare Falcon Ridge unsupported
thunderbolt: Add support for INTEL_FALCON_RIDGE_2C controller.
thunderbolt: Fix resume quirk for Falcon Ridge 4C.
lkdtm: Mark lkdtm_rodata_do_nothing() notrace
mei: me: disable driver on SPT SPS firmware
Revert "android: binder: fix dangling pointer comparison"
drivers/iio/light/Kconfig: SENSORS_BH1780 cleanup
android: binder: fix dangling pointer comparison
misc: delete bh1780 driver
Merge tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are three small fixes for 4.8-rc5.
One for sysfs, one for kernfs, and one documentation fix, all for
reported issues. All of these have been in linux-next for a while"
* tag 'driver-core-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: correctly handle read offset on PREALLOC attrs
documentation: drivers/core/of: fix name of of_node symlink
kernfs: don't depend on d_find_any_alias() when generating notifications
Merge tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull serial driver fixes from Greg KH:
"Here are some small serial driver fixes for 4.8-rc5. One fixes an
oft-reported build issue with the fintek driver, another reverts a
patch that was causing problems, one fixes a crash, and some new
device ids were added.
All of these have been in linux-next for a while"
* tag 'tty-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: 8250: added acces i/o products quad and octal serial cards
serial: 8250_mid: fix divide error bug if baud rate is 0
Revert "tty/serial/8250: use mctrl_gpio helpers"
8250/fintek: rename IRQ_MODE macro
Merge tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/PHY fixes from Greg KH:
"Here are some USB and PHY driver fixes for 4.8-rc5
Nothing major, lots of little fixes for reported bugs, and a build fix
for a missing .h file that the phy drivers needed. All of these have
been in linux-next for a while with no reported issues"
* tag 'usb-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (24 commits)
usb: musb: Fix locking errors for host only mode
usb: dwc3: gadget: always decrement by 1
usb: dwc3: debug: fix ep name on trace output
usb: gadget: udc: core: don't starve DMA resources
USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
USB: avoid left shift by -1
USB: fix typo in wMaxPacketSize validation
usb: gadget: Add the gserial port checking in gs_start_tx()
usb: dwc3: gadget: don't rely on jiffies while holding spinlock
usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
usb: gadget: function: f_rndis: socket buffer may be NULL
usb: gadget: function: f_eem: socket buffer may be NULL
usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
usb: dwc2: Add reset control to dwc2
usb: dwc3: core: allow device to runtime_suspend several times
usb: dwc3: pci: runtime_resume child device
USB: serial: option: add WeTelecom WM-D200
usb: chipidea: udc: don't touch DP when controller is in host mode
USB: serial: mos7840: fix non-atomic allocation in write path
USB: serial: mos7720: fix non-atomic allocation in write path
...
devpts: return NULL pts 'priv' entry for non-devpts nodes
In commit 8ead9dd54716 ("devpts: more pty driver interface cleanups") I
made devpts_get_priv() just return the dentry->fs_data directly. And
because I thought it wouldn't happen, I added a warning if you ever saw
a pts node that wasn't on devpts.
And no, that warning never triggered under any actual real use, but you
can trigger it by creating nonsensical pts nodes by hand.
So just revert the warning, and make devpts_get_priv() return NULL for
that case like it used to.
Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org # 4.6+ Cc: Eric W Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes for the nvme over fabrics code"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-rdma: Get rid of redundant defines
nvme-rdma: Get rid of duplicate variable
nvme: fabrics drivers don't need the nvme-pci driver
nvme-fabrics: get a reference when reusing a nvme_host structure
nvme-fabrics: change NQN UUID to big-endian format
nvme-loop: set sqsize to 0-based value, per spec
nvme-rdma: fix sqsize/hsqsize per spec
fabrics: define admin sqsize min default, per spec
nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize
nvmet-rdma: Fix use after free
nvme-rdma: initialize ret to zero to avoid returning garbage
Jarkko Sakkinen [Thu, 1 Sep 2016 23:36:58 +0000 (02:36 +0300)]
tpm: invalid self test error message
The driver emits invalid self test error message even though the init
succeeds.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Fixes: cae8b441fc20 ("tpm: Factor out common startup code") Reviewed-by: James Morris <james.l.morris@oracle.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
Merge tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes ffrom Rafael Wysocki:
"Two stable-candidate fixes for the ACPI early device probing code
added during the 4.4 cycle, one fixing a typo in a stub macro used
when CONFIG_ACPI is unset and one that prevents sleeping functions
from being called under a spinlock (Lorenzo Pieralisi)"
* tag 'acpi-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
Merge tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"This includes a stable-candidate cpufreq-dt driver problem fix and
annotations of tracepoints in the runtime PM framework.
Specifics:
- Fix the definition of the cpufreq-dt driver's machines table
introduced during the 4.7 cycle that should be NULL-terminated, but
the termination entry is missing from it (Wei Yongjun).
- Annotate tracepoints in the runtime PM framework's core so as to
allow the functions containing them to be called from the idle code
path without causing RCU to complain about illegal usage (Paul
McKenney)"
* tag 'pm-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
cpufreq: dt: Add terminate entry for of_device_id tables
Merge branches 'pm-cpufreq-fixes' and 'pm-core-fixes'
* pm-cpufreq-fixes:
cpufreq: dt: Add terminate entry for of_device_id tables
* pm-core-fixes:
PM / runtime: Add _rcuidle suffix to allow rpm_idle() use from idle
PM / runtime: Add _rcuidle suffix to allow rpm_resume() to be called from idle
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
Commit e647b532275b ("ACPI: Add early device probing infrastructure")
introduced code that allows inserting driver specific
struct acpi_probe_entry probe entries into ACPI linker sections
(one per-subsystem, eg irqchip, clocksource) that are then walked
to retrieve the data and function hooks required to probe the
respective kernel components.
Probing for all entries in a section is triggered through
the __acpi_probe_device_table() function, that in turn, according
to the table ID a given probe entry reports parses the table
with the function retrieved from the respective section structures
(ie struct acpi_probe_entry). Owing to the current ACPI table
parsing implementation, the __acpi_probe_device_table() function
has to share global variables with the acpi_match_madt() function, so
in order to guarantee mutual exclusion locking is required
between the two functions.
Current kernel code implements the locking through the acpi_probe_lock
spinlock; this has the side effect of requiring all code called
within the lock (ie struct acpi_probe_entry.probe_{table/subtbl} hooks)
not to sleep.
However, kernel subsystems that make use of the early probing
infrastructure are relying on kernel APIs that may sleep (eg
irq_domain_alloc_fwnode(), among others) in the function calls
pointed at by struct acpi_probe_entry.{probe_table/subtbl} entries
(eg gic_v2_acpi_init()), which is a bug.
Since __acpi_probe_device_table() is called from context
that is allowed to sleep the acpi_probe_lock spinlock can be replaced
with a mutex; this fixes the issue whilst still guaranteeing
mutual exclusion.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532275b (ACPI: Add early device probing infrastructure) Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
When the ACPI_DECLARE_PROBE_ENTRY macro was added in
commit e647b532275b ("ACPI: Add early device probing infrastructure"),
a stub macro adding an unused entry was added for the !CONFIG_ACPI
Kconfig option case to make sure kernel code making use of the
macro did not require to be guarded within CONFIG_ACPI in order to
be compiled.
The stub macro was never used since all kernel code that defines
ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
CONFIG_ACPI; it contains a typo that should be nonetheless fixed.
Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
macro so that it can actually be used if needed.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: e647b532275b (ACPI: Add early device probing infrastructure) Cc: 4.4+ <stable@vger.kernel.org> # 4.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
x86/AMD: Apply erratum 665 on machines without a BIOS fix
AMD F12h machines have an erratum which can cause DIV/IDIV to behave
unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
there is no BIOS update containing that workaround so let's do it
ourselves unconditionally. It is simple enough.
Steven Rostedt [Wed, 25 May 2016 17:47:26 +0000 (13:47 -0400)]
x86/paravirt: Do not trace _paravirt_ident_*() functions
Łukasz Daniluk reported that on a RHEL kernel that his machine would lock up
after enabling function tracer. I asked him to bisect the functions within
available_filter_functions, which he did and it came down to three:
_paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
It was found that this is only an issue when noreplace-paravirt is added
to the kernel command line.
This means that those functions are most likely called within critical
sections of the funtion tracer, and must not be traced.
In newer kenels _paravirt_nop() is defined within gcc asm(), and is no
longer an issue. But both _paravirt_ident_{32,64}() causes the
following splat when they are traced:
Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Most of this is regression fixes for posix acl behavior introduced in
4.8-rc1 (these were caught by the pjd-fstest suite). The are also
miscellaneous fixes marked as stable material and cleanups.
Other than overlayfs code, it touches <linux/fs.h> to add a constant
with which to disable posix acl caching. No changes needed to the
actual caching code, it automatically does the right thing, although
later we may want to optimize this case.
I'm now testing overlayfs with the following test suites to catch
regressions:
- unionmount-testsuite
- xfstests
- pjd-fstest"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: update doc
ovl: listxattr: use strnlen()
ovl: Switch to generic_getxattr
ovl: copyattr after setting POSIX ACL
ovl: Switch to generic_removexattr
ovl: Get rid of ovl_xattr_noacl_handlers array
ovl: Fix OVL_XATTR_PREFIX
ovl: fix spelling mistake: "directries" -> "directories"
ovl: don't cache acl on overlay layer
ovl: use cached acl on underlying layer
ovl: proper cleanup of workdir
ovl: remove posix_acl_default from workdir
ovl: handle umask and posix_acl_default correctly on creation
ovl: don't copy up opaqueness
James Morse [Fri, 26 Aug 2016 15:03:42 +0000 (16:03 +0100)]
arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1
Changes to make the resume from cpu_suspend() code behave more like
secondary boot caused debug exceptions to be unmasked early by
__cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(),
potentially taking break or watch points based on uninitialised registers.
Mask debug exceptions in cpu_do_resume(), which is specific to resume
from cpu_suspend(). Debug exceptions will be restored to their original
state by local_dbg_restore() in cpu_suspend(), which runs after
hw_breakpoint_restore() has re-initialised the other registers.
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Fixes: cabe1c81ea5b ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va") Cc: <stable@vger.kernel.org> # 4.7+ Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Stefan Wahren [Sat, 27 Aug 2016 16:19:50 +0000 (16:19 +0000)]
drivers/perf: arm_pmu: Fix NULL pointer dereference during probe
Patch 7f1d642fbb5c ("drivers/perf: arm-pmu: Fix handling of SPI lacking
interrupt-affinity property") unintended also fixes perf_event support
for bcm2835 which doesn't have PMU interrupts. Unfortunately this change
introduce a NULL pointer dereference on bcm2835, because irq_is_percpu
always expected to be called with a valid IRQ. So fix this regression
by validating the IRQ before.
Tested-by: Kevin Hilman <khilman@baylibre.com> Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Fixes: 7f1d642fbb5c ("drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property") Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Merge tag 'dmaengine-fix-4.8-rc5' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"The fixes this time are all in drivers:
- possible NULL dereference in img-mdc
- correct device identity for free_irq in at_xdmac
- missing of_node_put() in fsl probe
- fix debug log and hotchain corner case for pxa-dma
- fix checking hardware bits in isr in usb dmac"
* tag 'dmaengine-fix-4.8-rc5' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: img-mdc: fix a possible NULL dereference
dmaengine: at_xdmac: fix to pass correct device identity to free_irq()
dmaengine: fsl_raid: add missing of_node_put() in fsl_re_probe()
dmaengine: pxa_dma: fix debug message
dmaengine: pxa_dma: fix hotchain corner case
dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
Merge tag 'drm-fixes-for-4.8-rc5' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Contains fixes for imx, amdgpu, vc4, msm and one nouveau ACPI fix"
* tag 'drm-fixes-for-4.8-rc5' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: record error code when ring test failed
drm/amd/amdgpu: compute ring test fail during S4 on CI
drm/amd/amdgpu: sdma resume fail during S4 on CI
drm/nouveau/acpi: use DSM if bridge does not support D3cold
drm/imx: fix crtc vblank state regression
drm/imx: Add active plane reconfiguration support
drm/msm: protect against faults from copy_from_user() in submit ioctl
drm/msm: fix use of copy_from_user() while holding spinlock
drm/vc4: Fix oops when userspace hands in a bad BO.
drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
drm/vc4: Free hang state before destroying BO cache.
drm/vc4: Fix handling of a pm_runtime_get_sync() success case.
drm/vc4: Use drm_malloc_ab to fix large rendering jobs.
drm/vc4: Use drm_free_large() on handles to match its allocation.
Wanpeng Li [Fri, 2 Sep 2016 06:38:23 +0000 (14:38 +0800)]
tick/nohz: Fix softlockup on scheduler stalls in kvm guest
tick_nohz_start_idle() is prevented to be called if the idle tick can't
be stopped since commit 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle
enter"). As a result, after suspend/resume the host machine, full dynticks
kvm guest will softlockup:
The idle and iowait states are weird 0 for cpu0(housekeeping).
The bug is present in both guest and host kernels, and they both have
cpu0's idle and iowait states issue, however, host kernel's suspend/resume
path etc will touch watchdog to avoid the softlockup.
- The watchdog will not be touched in tick_nohz_stop_idle path (need be
touched since the scheduler stall is expected) if idle_active flags are
not detected.
- The idle and iowait states will not be accounted when exit idle loop
(resched or interrupt) if idle start time and idle_active flags are
not set.
This patch fixes it by reverting commit 1f3b0f8243cb934 since can't stop
idle tick doesn't mean can't be idle.
Fixes: 1f3b0f8243cb934 ("tick/nohz: Optimize nohz idle enter") Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Cc: Sanjeev Yadav<sanjeev.yadav@spreadtrum.com> Cc: Gaurav Jindal<gaurav.jindal@spreadtrum.com> Cc: stable@vger.kernel.org Cc: kvm@vger.kernel.org Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: http://lkml.kernel.org/r/1472798303-4154-1-git-send-email-wanpeng.li@hotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Dave Airlie [Fri, 2 Sep 2016 05:55:15 +0000 (15:55 +1000)]
Merge tag 'drm-vc4-fixes-2016-08-29' of https://github.com/anholt/linux into drm-fixes
This pull request brings in fixes for VC4 3D in 4.8, most of which are
covered by testcases.
* tag 'drm-vc4-fixes-2016-08-29' of https://github.com/anholt/linux:
drm/vc4: Fix oops when userspace hands in a bad BO.
drm/vc4: Fix overflow mem unreferencing when the binner runs dry.
drm/vc4: Free hang state before destroying BO cache.
drm/vc4: Fix handling of a pm_runtime_get_sync() success case.
drm/vc4: Use drm_malloc_ab to fix large rendering jobs.
drm/vc4: Use drm_free_large() on handles to match its allocation.
Dave Airlie [Fri, 2 Sep 2016 05:48:38 +0000 (15:48 +1000)]
Merge tag 'imx-drm-fixes-2016-08-30' of git://git.pengutronix.de/git/pza/linux into drm-fixes
imx-drm atomic modeset regression fixes
- add active plane reconfiguration support
- add back crtc vblank state reporting
* tag 'imx-drm-fixes-2016-08-30' of git://git.pengutronix.de/git/pza/linux:
drm/imx: fix crtc vblank state regression
drm/imx: Add active plane reconfiguration support
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A collection of small fixes for various SoC vendor clk drivers"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rockchip: mark aclk_emmc_noc as a critical clock on rk3399
clk: tegra: remove TEGRA_PLL_USE_LOCK for PLLD/PLLD2
clk: rockchip: fix incorrect GATE bits for {c, g}pll_aclk_perihp_src on rk3399
clk: rockchip: fix incorrect aclk_emmc source gate bits on rk3399
clk: renesas: r8a7795: Fix SD clocks
clk: rockchip: fix rk3399 aclk_vio gate bit
clk: sunxi-ng: Fix inverted test condition in ccu_helper_wait_for_lock
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
rapidio/tsi721: fix incorrect detection of address translation condition
rapidio/documentation/mport_cdev: add missing parameter description
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
MAINTAINERS: Vladimir has moved
mm, mempolicy: task->mempolicy must be NULL before dropping final reference
printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
treewide: remove references to the now unnecessary DEFINE_PCI_DEVICE_TABLE
drivers/scsi/wd719x.c: remove last declaration using DEFINE_PCI_DEVICE_TABLE
mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
lib/test_hash.c: fix warning in preprocessor symbol evaluation
lib/test_hash.c: fix warning in two-dimensional array init
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
kexec: fix double-free when failing to relocate the purgatory
mm, oom: prevent premature OOM killer invocation for high order request
Michal Hocko [Thu, 1 Sep 2016 23:15:13 +0000 (16:15 -0700)]
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted. Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).
The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):
: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls. This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping. This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.
The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for. It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.
[Changelog partly based on Andreas' description] Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: William Preston <wpreston@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andreas Schwab <schwab@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Thu, 1 Sep 2016 23:15:07 +0000 (16:15 -0700)]
mm, mempolicy: task->mempolicy must be NULL before dropping final reference
KASAN allocates memory from the page allocator as part of
kmem_cache_free(), and that can reference current->mempolicy through any
number of allocation functions. It needs to be NULL'd out before the
final reference is dropped to prevent a use-after-free bug:
BUG: KASAN: use-after-free in alloc_pages_current+0x363/0x370 at addr ffff88010b48102c
CPU: 0 PID: 15425 Comm: trinity-c2 Not tainted 4.8.0-rc2+ #140
...
Call Trace:
dump_stack
kasan_object_err
kasan_report_error
__asan_report_load2_noabort
alloc_pages_current <-- use after free
depot_save_stack
save_stack
kasan_slab_free
kmem_cache_free
__mpol_put <-- free
do_exit
This patch sets current->mempolicy to NULL before dropping the final
reference.
printk/nmi: avoid direct printk()-s from __printk_nmi_flush()
__printk_nmi_flush() can be called from nmi_panic(), therefore it has to
test whether it's executed in NMI context and thus must route the
messages through deferred printk() or via direct printk().
This is to avoid potential deadlocks, as described in commit cf9b1106c81c ("printk/nmi: flush NMI messages on the system panic").
However there remain two places where __printk_nmi_flush() does
unconditional direct printk() calls:
Factor out print_nmi_seq_line() parts into a new printk_nmi_flush_line()
function, which takes care of in_nmi(), and use it in
__printk_nmi_flush() for printing and error-reporting.
Link: http://lkml.kernel.org/r/20160830161354.581-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm, vmscan: only allocate and reclaim from zones with pages managed by the buddy allocator
Firmware Assisted Dump (FA_DUMP) on ppc64 reserves substantial amounts
of memory when booting a secondary kernel. Srikar Dronamraju reported
that multiple nodes may have no memory managed by the buddy allocator
but still return true for populated_zone().
Commit 1d82de618ddd ("mm, vmscan: make kswapd reclaim in terms of
nodes") was reported to cause kswapd to spin at 100% CPU usage when
fadump was enabled. The old code happened to deal with the situation of
a populated node with zero free pages by co-incidence but the current
code tries to reclaim populated zones without realising that is
impossible.
We cannot just convert populated_zone() as many existing users really
need to check for present_pages. This patch introduces a managed_zone()
helper and uses it in the few cases where it is critical that the check
is made for managed pages -- zonelist construction and page reclaim.
lib/test_hash.c: fix warning in preprocessor symbol evaluation
Some versions of gcc don't like tests for the value of an undefined
preprocessor symbol, even in the #else branch of an #ifndef:
lib/test_hash.c:224:7: warning: "HAVE_ARCH__HASH_32" is not defined [-Wundef]
#elif HAVE_ARCH__HASH_32 != 1
^
lib/test_hash.c:229:7: warning: "HAVE_ARCH_HASH_32" is not defined [-Wundef]
#elif HAVE_ARCH_HASH_32 != 1
^
lib/test_hash.c:234:7: warning: "HAVE_ARCH_HASH_64" is not defined [-Wundef]
#elif HAVE_ARCH_HASH_64 != 1
^
Seen with gcc 4.9, not seen with 4.1.2.
Change the logic to only check the value inside an #ifdef to fix this.
Fixes: 468a9428521e7d00 ("<linux/hash.h>: Add support for architecture-specific functions") Link: http://lkml.kernel.org/r/20160829214952.1334674-4-arnd@arndb.de Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: George Spelvin <linux@sciencehorizons.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
Using "make tinyconfig" produces a couple of annoying warnings that show
up for build test machines all the time:
.config:966:warning: override: NOHIGHMEM changes choice state
.config:965:warning: override: SLOB changes choice state
.config:963:warning: override: KERNEL_XZ changes choice state
.config:962:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:933:warning: override: SLOB changes choice state
.config:930:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
.config:870:warning: override: SLOB changes choice state
.config:868:warning: override: KERNEL_XZ changes choice state
.config:867:warning: override: CC_OPTIMIZE_FOR_SIZE changes choice state
I've made a previous attempt at fixing them and we discussed a number of
alternatives.
I tried changing the Makefile to use "merge_config.sh -n
$(fragment-list)" but couldn't get that to work properly.
This is yet another approach, based on the observation that we do want
to see a warning for conflicting 'choice' options, and that we can
simply make them non-conflicting by listing all other options as
disabled. This is a trivial patch that we can apply independent of
plans for other changes.
kexec: fix double-free when failing to relocate the purgatory
If kexec_apply_relocations fails, kexec_load_purgatory frees pi->sechdrs
and pi->purgatory_buf. This is redundant, because in case of error
kimage_file_prepare_segments calls kimage_file_post_load_cleanup, which
will also free those buffers.
This causes two warnings like the following, one for pi->sechdrs and the
other for pi->purgatory_buf:
kexec-bzImage64: Loading purgatory failed
------------[ cut here ]------------
WARNING: CPU: 1 PID: 2119 at mm/vmalloc.c:1490 __vunmap+0xc1/0xd0
Trying to vfree() nonexistent vm area (ffffc90000e91000)
Modules linked in:
CPU: 1 PID: 2119 Comm: kexec Not tainted 4.8.0-rc3+ #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
dump_stack+0x4d/0x65
__warn+0xcb/0xf0
warn_slowpath_fmt+0x4f/0x60
? find_vmap_area+0x19/0x70
? kimage_file_post_load_cleanup+0x47/0xb0
__vunmap+0xc1/0xd0
vfree+0x2e/0x70
kimage_file_post_load_cleanup+0x5e/0xb0
SyS_kexec_file_load+0x448/0x680
? putname+0x54/0x60
? do_sys_open+0x190/0x1f0
entry_SYSCALL_64_fastpath+0x13/0x8f
---[ end trace 158bb74f5950ca2b ]---
Fix by setting pi->sechdrs an pi->purgatory_buf to NULL, since vfree
won't try to free a NULL pointer.
Link: http://lkml.kernel.org/r/1472083546-23683-1-git-send-email-bauerman@linux.vnet.ibm.com Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Thu, 1 Sep 2016 23:14:41 +0000 (16:14 -0700)]
mm, oom: prevent premature OOM killer invocation for high order request
There have been several reports about pre-mature OOM killer invocation
in 4.7 kernel when order-2 allocation request (for the kernel stack)
invoked OOM killer even during basic workloads (light IO or even kernel
compile on some filesystems). In all reported cases the memory is
fragmented and there are no order-2+ pages available. There is usually
a large amount of slab memory (usually dentries/inodes) and further
debugging has shown that there are way too many unmovable blocks which
are skipped during the compaction. Multiple reporters have confirmed
that the current linux-next which includes [1] and [2] helped and OOMs
are not reproducible anymore.
A simpler fix for the late rc and stable is to simply ignore the
compaction feedback and retry as long as there is a reclaim progress and
we are not getting OOM for order-0 pages. We already do that for
CONFING_COMPACTION=n so let's reuse the same code when compaction is
enabled as well.
Merge branch 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two small patches to fix some bugs with the audit-by-executable
functionality we introduced back in v4.3 (both patches are marked
for the stable folks)"
* 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit:
audit: fix exe_file access in audit_exe_compare
mm: introduce get_task_exe_file
Merge tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Most of these changes are small regression fixes that address problems
introduced in the 4.8-rc1 window. The two fixes that aren't (IO
completion fix and superblock inprogress check) are fixes for problems
introduced some time ago and need to be pushed back to stable kernels.
* tag 'xfs-iomap-for-linus-4.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: track log done items directly in the deferred pending work item
iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems
xfs: prevent dropping ioend completions during buftarg wait
xfs: fix superblock inprogress check
xfs: simple btree query range should look right if LE lookup fails
xfs: fix some key handling problems in _btree_simple_query_range
xfs: don't log the entire end of the AGF
xfs: disallow mounting of realtime + rmap filesystems
xfs: don't perform lookups on zero-height btrees
Ravi Bangoria [Tue, 30 Aug 2016 08:39:37 +0000 (14:09 +0530)]
perf probe: Move dwarf specific functions to dwarf-aux.c
Move generic dwarf related functions from util/probe-finder.c to
util/dwarf-aux.c. Functions name and their prototype are also changed
accordingly. No functionality changes.
Suggested-and-Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1472546377-25612-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
$ perf record -e probe_test:foo ./test
$ perf script
test 5778 [001] 4918.562027: probe_test:foo: (400536) i=0
Here variable 'i' is passed via stack which is pushed on stack at
0x40053e. But we are probing at 0x400536.
To resolve this issues, we need to probe on next instruction after
prologue. gdb and systemtap also does same thing. I've implemented this
patch based on approach systemtap has used.
# perf probe -d *:*
Removed event: probe_test:foo
# perf probe -x ./test foo i
Target program is compiled without optimization. Skipping prologue.
Probe on address 0x400526 to force probing at the function entry.
Added new event:
probe_test:foo (on foo in /home/acme/c/test with i)
Reported-by: Michael Petlan <mpetlan@redhat.com> Link: https://www.mail-archive.com/linux-perf-users@vger.kernel.org/msg02348.html Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Cc: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1299021 Link: http://lkml.kernel.org/r/1470214725-5023-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com
[ Rename 'die' to 'cu_die' to avoid shadowing a die() definition on at least centos 5, Debian 7 and ubuntu:12.04.5]
[ Use PRIx64 instead of lx to format a Dwarf_Addr, aka long long unsigned int, fixing the build on 32-bit systems ]
[ dwarf_getsrclines() expects a size_t * argument ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf symbols: Fixup symbol sizes before picking best ones
When we call symbol__fixup_duplicate() we use algorithms to pick the
"best" symbols for cases where there are various functions/aliases to an
address, and those check zero size symbols, which, before calling
symbol__fixup_end() are _all_ symbols in a just parsed kallsyms file.
So first fixup the end, then fixup the duplicates.
Found while trying to figure out why 'perf test vmlinux' failed, see the
output of 'perf test -v vmlinux' to see cases where the symbols picked
as best for vmlinux don't match the ones picked for kallsyms.
Cc: Anton Blanchard <anton@samba.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 694bf407b061 ("perf symbols: Add some heuristics for choosing the best duplicate symbol") Link: http://lkml.kernel.org/n/tip-rxqvdgr0mqjdxee0kf8i2ufn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf symbols: Check symbol_conf.allow_aliases for kallsyms loading too
We can allow aliases to be kept, but we were checking this just when
loading vmlinux files, be consistent, do it for any symbol table loading
code that calls symbol__fixup_duplicate() by making this function check
.allow_aliases instead.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 680d926a8cb0 ("perf symbols: Allow symbol alias when loading map for symbol name") Link: http://lkml.kernel.org/n/tip-z0avp0s6cfjckc4xj3pdfjdz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The algorithms used to prune aliases in symbols__fixup_duplicate() uses
information available on ELF symtabs that are not present on
/proc/kallsyms, so it picks different aliases as "best" for vmlinux and
kallsyms.
We could probably improve a bit this by having a list of aliases for the
"best" symbols picked, instead of throwing this info, but that is left
for when we find a real need.
With this, 'perf test vmlinux' passes:
# perf test -F 1
1: vmlinux symtab matches kallsyms: Ok
#
When we ask for verbose mode, we can see those warning:
# perf test -F -v 1
1: vmlinux symtab matches kallsyms:
--- start ---
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.8.0-rc4+/build/vmlinux for symbols
WARN: 0xffffffffb7001000: diff name v: xen_hypercall_set_trap_table k: hypercall_page
WARN: 0xffffffffb7077970: diff end addr for aesni_gcm_dec v: 0xffffffffb707a2f2 k: 0xffffffffb7077a02
WARN: 0xffffffffb707a300: diff end addr for aesni_gcm_enc v: 0xffffffffb707cc03 k: 0xffffffffb707a392
WARN: 0xffffffffb707f950: diff end addr for aesni_gcm_enc_avx_gen2 v: 0xffffffffb7084ef6 k: 0xffffffffb707f9c3
WARN: 0xffffffffb7084f00: diff end addr for aesni_gcm_dec_avx_gen2 v: 0xffffffffb708a691 k: 0xffffffffb7084f73
WARN: 0xffffffffb708aa10: diff end addr for aesni_gcm_enc_avx_gen4 v: 0xffffffffb708f844 k: 0xffffffffb708aa83
WARN: 0xffffffffb708f850: diff end addr for aesni_gcm_dec_avx_gen4 v: 0xffffffffb709486f k: 0xffffffffb708f8c3
WARN: 0xffffffffb71a6e50: diff name v: perf_pmu_commit_txn.part.98 k: perf_pmu_cancel_txn.part.97
WARN: 0xffffffffb752e480: diff name v: wakeup_expire_count_show.part.5 k: wakeup_active_count_show.part.7
WARN: 0xffffffffb76e8d00: diff name v: phys_switch_id_show.part.11 k: phys_port_name_show.part.12
WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad00012eaac8 [kernel].exit.text
---- end ----
vmlinux symtab matches kallsyms: Ok
#
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-6v5w1k8rpx4ggczlkw730vt0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test vmlinux: Avoid printing headers for empty lists
Before:
# perf test -F -v 1
1: vmlinux symtab matches kallsyms:
--- start ---
<SNIP>
WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad00012eaac8 [kernel].exit.text
WARN: Maps in vmlinux with a different name in kallsyms:
WARN: Maps only in kallsyms:
---- end ----
vmlinux symtab matches kallsyms: Ok
#
The two last WARN lines are now suppressed, since there are no such
cases detected.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-9ww8uvzl682ykaw8ht1tozlr@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf test vmlinux: Clarify which -v lines are errors or warning
When the 'perf test -v vmlinux' test fails, it is not clear which of the
lines are errors or warnings, clarify that adding ERR/WARN prefixes:
# perf test -F -v 1
1: vmlinux symtab matches kallsyms :
--- start ---
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/4.8.0-rc4+/build/vmlinux for symbols
ERR : 0xffffffffb7001000: diff name v: xen_hypercall_set_trap_table k: hypercall_page
WARN: 0xffffffffb7077970: diff end addr for aesni_gcm_dec v: 0xffffffffb707a2f2 k: 0xffffffffb7077a02
WARN: 0xffffffffb707a300: diff end addr for aesni_gcm_enc v: 0xffffffffb707cc03 k: 0xffffffffb707a392
WARN: 0xffffffffb707f950: diff end addr for aesni_gcm_enc_avx_gen2 v: 0xffffffffb7084ef6 k: 0xffffffffb707f9c3
WARN: 0xffffffffb7084f00: diff end addr for aesni_gcm_dec_avx_gen2 v: 0xffffffffb708a691 k: 0xffffffffb7084f73
WARN: 0xffffffffb708aa10: diff end addr for aesni_gcm_enc_avx_gen4 v: 0xffffffffb708f844 k: 0xffffffffb708aa83
WARN: 0xffffffffb708f850: diff end addr for aesni_gcm_dec_avx_gen4 v: 0xffffffffb709486f k: 0xffffffffb708f8c3
ERR : 0xffffffffb71a6e50: diff name v: perf_pmu_commit_txn.part.98 k: perf_pmu_cancel_txn.part.97
ERR : 0xffffffffb752e480: diff name v: wakeup_expire_count_show.part.5 k: wakeup_active_count_show.part.7
ERR : 0xffffffffb76e8d00: diff name v: phys_switch_id_show.part.11 k: phys_port_name_show.part.12
WARN: Maps only in vmlinux: ffffffffb7d7d000-ffffffffb7eeaac8117d000 [kernel].init.text ffffffffb7eeaac8-ffffffffc03ad00012eaac8 [kernel].exit.text
WARN: Maps in vmlinux with a different name in kallsyms:
WARN: Maps only in kallsyms:
---- end ----
vmlinux symtab matches kallsyms: FAILED!
#
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-n5ml8m7y9x8kzvxt09ipku88@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Fri, 26 Aug 2016 14:57:58 +0000 (23:57 +0900)]
perf probe: Ignore vmlinux Build-id when offline vmlinux given
Ignore vmlinux build-id when user gives offline vmlinux if the command
does not affect running kernel.
perf-probe has several actions some of them does not change the running
kernel, like --lines, --vars, and --funcs.
e.g.
-----
$ ./perf probe -k ./vmlinux-arm -V do_sys_open:14
Available variables at do_sys_open:14
@<do_sys_open+202>
char* filename
int dfd
int fd
int flags
struct filename* tmp
struct open_flags op
umode_t mode
-----
Masami Hiramatsu [Thu, 25 Aug 2016 16:24:57 +0000 (01:24 +0900)]
perf probe: Support probing on offline cross-arch binary
Support probing on offline cross-architecture binary by adding getting
the target machine arch from ELF and choose correct register string for
the machine.
Here is an example:
-----
$ perf probe --vmlinux=./vmlinux-arm --definition 'do_sys_open $params'
p:probe/do_sys_open do_sys_open+0 dfd=%r5:s32 filename=%r1:u32 flags=%r6:s32 mode=%r3:u16
-----
Here, we can get probe/do_sys_open from above and append it to to the target
machine's tracing/kprobe_events file in the tracefs mountput, usually
/sys/kernel/debug/tracing/kprobe_events (or /sys/kernel/tracing/kprobe_events).
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/147214229717.23638.6440579792548044658.stgit@devbox
[ Add definition for EM_AARCH64 to fix the build on at least centos 6, debian 7 & ubuntu 12.04.5 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Xiaoguang [Wed, 31 Aug 2016 11:46:16 +0000 (19:46 +0800)]
btrfs: fix one bug that process may endlessly wait for ticket in wait_reserve_ticket()
If can_overcommit() in btrfs_calc_reclaim_metadata_size() returns true,
btrfs_async_reclaim_metadata_space() will not reclaim metadata space, just
return directly and also forget to wake up process which are waiting for
their tickets, so these processes will wait endlessly.
Fstests case generic/172 with mount option "-o compress=lzo" have revealed
this bug in my test machine. Here if we have tickets to handle, we must
handle them first.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
Liu Bo [Wed, 31 Aug 2016 23:43:33 +0000 (16:43 -0700)]
Btrfs: fix endless loop in balancing block groups
Qgroup function may overwrite the saved error 'err' with 0
in case quota is not enabled, and this ends up with a
endless loop in balance because we keep going back to balance
the same block group.
It really should use 'ret' instead.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Wed, 24 Aug 2016 15:57:52 +0000 (11:57 -0400)]
Btrfs: kill invalid ASSERT() in process_all_refs()
Suppose you have the following tree in snap1 on a file system mounted with -o
inode_cache so that inode numbers are recycled
└── [ 258] a
└── [ 257] b
and then you remove b, rename a to c, and then re-create b in c so you have the
following tree
└── [ 258] c
└── [ 257] b
and then you try to do an incremental send you will hit
ASSERT(pending_move == 0);
in process_all_refs(). This is because we assume that any recycling of inodes
will not have a pending change in our path, which isn't the case. This is the
case for the DELETE side, since we want to remove the old file using the old
path, but on the create side we could have a pending move and need to do the
normal pending rename dance. So remove this ASSERT() and put a comment about
why we ignore pending_move. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
Masami Hiramatsu [Thu, 25 Aug 2016 16:24:42 +0000 (01:24 +0900)]
perf probe: Ignore vmlinux buildid if offline kernel is given
Ignore the buildid of running kernel when both of --definition and
--vmlinux is given because that kernel should be off-line.
This also skips post-processing of kprobe event for relocating symbol
and checking blacklist, because it can not be done on off-line kernel.
E.g. without this fix perf shows an error as below
----
$ perf probe --vmlinux=./vmlinux-arm --definition do_sys_open
./vmlinux-arm with build id 7a1f76dd56e9c4da707cd3d6333f50748141434b not found, continuing without symbols
Failed to find symbol do_sys_open in kernel
Error: Failed to add events.
----
with this fix, we can get the definition
----
$ perf probe --vmlinux=./vmlinux-arm --definition do_sys_open
p:probe/do_sys_open do_sys_open+0
----
Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160831130427.GA13095@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>