Andi Kleen [Thu, 30 Mar 2017 00:07:53 +0000 (17:07 -0700)]
perf vendor events intel: Add missing UNC_M_DCLOCKTICKS for Broadwell DE uncore
An earlier update removed the UNC_M_CLOCKTICKS event for Broadwell DE.
But Metric events were still referring to it.
This adds it back under a different name from the event list,
and also fixes up the Metric events to use the new name.
Tommi Rantala [Wed, 22 Mar 2017 13:06:24 +0000 (15:06 +0200)]
perf utils: Readlink /proc/self/exe to find the perf binary
Simplification: it is easier to open /proc/self/exe than /proc/$pid/exe.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170322130624.21881-7-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tommi Rantala [Wed, 22 Mar 2017 13:06:23 +0000 (15:06 +0200)]
perf utils: Null terminate buf in read_ftrace_printk()
Ensure that the string that we read from the data file is null terminated.
Valgrind was complaining:
==31357== Invalid read of size 1
==31357== at 0x4EC8C1: __strtok_r_1c (string2.h:200)
==31357== by 0x4EC8C1: parse_ftrace_printk (trace-event-parse.c:161)
==31357== by 0x4F82A8: read_ftrace_printk (trace-event-read.c:204)
==31357== by 0x4F82A8: trace_report (trace-event-read.c:468)
==31357== by 0x4CD552: process_tracing_data (header.c:1576)
==31357== by 0x4D3397: perf_file_section__process (header.c:2705)
==31357== by 0x4D3397: perf_header__process_sections (header.c:2488)
==31357== by 0x4D3397: perf_session__read_header (header.c:2925)
==31357== by 0x4E71E2: perf_session__open (session.c:32)
==31357== by 0x4E71E2: perf_session__new (session.c:139)
==31357== by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
==31357== by 0x497150: run_builtin (perf.c:359)
==31357== by 0x428CE0: handle_internal_command (perf.c:421)
==31357== by 0x428CE0: run_argv (perf.c:467)
==31357== by 0x428CE0: main (perf.c:614)
==31357== Address 0x8ac0efb is 0 bytes after a block of size 1,963 alloc'd
==31357== at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==31357== by 0x4F827B: read_ftrace_printk (trace-event-read.c:195)
==31357== by 0x4F827B: trace_report (trace-event-read.c:468)
==31357== by 0x4CD552: process_tracing_data (header.c:1576)
==31357== by 0x4D3397: perf_file_section__process (header.c:2705)
==31357== by 0x4D3397: perf_header__process_sections (header.c:2488)
==31357== by 0x4D3397: perf_session__read_header (header.c:2925)
==31357== by 0x4E71E2: perf_session__open (session.c:32)
==31357== by 0x4E71E2: perf_session__new (session.c:139)
==31357== by 0x429F5D: cmd_annotate (builtin-annotate.c:472)
==31357== by 0x497150: run_builtin (perf.c:359)
==31357== by 0x428CE0: handle_internal_command (perf.c:421)
==31357== by 0x428CE0: run_argv (perf.c:467)
==31357== by 0x428CE0: main (perf.c:614)
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170322130624.21881-6-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tommi Rantala [Wed, 22 Mar 2017 13:06:22 +0000 (15:06 +0200)]
perf utils: use sizeof(buf) - 1 in readlink() call
Ensure that we have space for the null byte in buf.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170322130624.21881-5-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tommi Rantala [Wed, 22 Mar 2017 13:06:21 +0000 (15:06 +0200)]
perf tests: Do not assume that readlink() returns a null terminated string
Ensure that the string in buf is null terminated.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170322130624.21881-4-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tommi Rantala [Wed, 22 Mar 2017 13:06:20 +0000 (15:06 +0200)]
perf buildid: Do not assume that readlink() returns a null terminated string
Valgrind was complaining:
$ valgrind ./perf list >/dev/null
==11643== Memcheck, a memory error detector
==11643== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==11643== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==11643== Command: ./perf list
==11643==
==11643== Conditional jump or move depends on uninitialised value(s)
==11643== at 0x4C30620: rindex (vg_replace_strmem.c:199)
==11643== by 0x49DAA9: build_id_cache__origname (build-id.c:198)
==11643== by 0x49E1C7: build_id_cache__valid_id (build-id.c:222)
==11643== by 0x49E1C7: build_id_cache__list_all (build-id.c:507)
==11643== by 0x4B9C8F: print_sdt_events (parse-events.c:2067)
==11643== by 0x4BB0B3: print_events (parse-events.c:2313)
==11643== by 0x439501: cmd_list (builtin-list.c:53)
==11643== by 0x497150: run_builtin (perf.c:359)
==11643== by 0x428CE0: handle_internal_command (perf.c:421)
==11643== by 0x428CE0: run_argv (perf.c:467)
==11643== by 0x428CE0: main (perf.c:614)
[...]
Additionally, a zero length result from readlink() is not very interesting.
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170322130624.21881-3-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tommi Rantala [Wed, 22 Mar 2017 13:06:19 +0000 (15:06 +0200)]
perf buildid: Do not update SDT cache with null filename
Valgrind was complaining:
==2633== Syscall param open(filename) points to unaddressable byte(s)
==2633== at 0x5281CC0: __open_nocancel (syscall-template.S:84)
==2633== by 0x537D38: open (fcntl2.h:53)
==2633== by 0x537D38: get_sdt_note_list (symbol-elf.c:2017)
==2633== by 0x5396FD: probe_cache__scan_sdt (probe-file.c:700)
==2633== by 0x49EA2C: build_id_cache__add_sdt_cache (build-id.c:625)
==2633== by 0x49EA2C: build_id_cache__add_s (build-id.c:697)
==2633== by 0x49EE72: build_id_cache__add_b (build-id.c:717)
==2633== by 0x49EE72: dso__cache_build_id (build-id.c:782)
==2633== by 0x49F190: __dsos__cache_build_ids (build-id.c:793)
==2633== by 0x49F190: machine__cache_build_ids (build-id.c:801)
==2633== by 0x49F190: perf_session__cache_build_ids (build-id.c:815)
==2633== by 0x4CD4F2: write_build_id (header.c:165)
==2633== by 0x4D26F7: do_write_feat (header.c:2296)
==2633== by 0x4D26F7: perf_header__adds_write (header.c:2335)
==2633== by 0x4D26F7: perf_session__write_header (header.c:2414)
==2633== by 0x43B324: __cmd_record (builtin-record.c:1154)
==2633== by 0x43B324: cmd_record (builtin-record.c:1839)
==2633== by 0x455A07: __cmd_record (builtin-kmem.c:1868)
==2633== by 0x455A07: cmd_kmem (builtin-kmem.c:1944)
==2633== by 0x497150: run_builtin (perf.c:359)
==2633== by 0x428CE0: handle_internal_command (perf.c:421)
==2633== by 0x428CE0: run_argv (perf.c:467)
==2633== by 0x428CE0: main (perf.c:614)
==2633== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tommi Rantala <tommi.t.rantala@nokia.com> Link: http://lkml.kernel.org/r/20170322130624.21881-2-tommi.t.rantala@nokia.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Taeung Song [Mon, 27 Mar 2017 07:10:37 +0000 (16:10 +0900)]
perf annotate: Fix a bug of division by zero when calculating percent
Currently perf-annotate with --print-line can print
-nan(0x8000000000000) because of division by zero when calculating
percent. The division by zero happens when a sum of samples is zero in
symbol__get_source_line(), so fix it.
For example:
After running 'perf record' like below,
$ perf record -e "{cycles,page-faults,branch-misses}" ./a.out
Before:
$ perf annotate --stdio -l
Sorted summary for file /home/taeung/workspace/a.out
----------------------------------------------
Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1490598638-13947-3-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Taeung Song [Mon, 27 Mar 2017 07:10:36 +0000 (16:10 +0900)]
perf annotate: Fix a bug following symbolic link of a build-id file
It is wrong way to read link name from a build-id file. Because a
build-id file is not anymore a symbolic link but build-id directory of
it is symbolic link, so fix it.
For example, if build-id file name gotten from
dso__build_id_filename() is as below,
Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1490598638-13947-2-git-send-email-treeze.taeung@gmail.com Fixes: 01412261d994 ("perf buildid-cache: Use path/to/bin/buildid/elf instead of path/to/bin/buildid") Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Milian Wolff [Sat, 18 Mar 2017 21:49:28 +0000 (22:49 +0100)]
perf report: Enable sorting by srcline as key
Often it is interesting to know how costly a given source line is in
total. Previously, one had to build these sums manually based on all
addresses that pointed to the same source line. This patch introduces
srcline as a sort key, which will do the aggregation for us.
Paired with the recent addition of showing inline frames, this makes
perf report much more useful for many C++ work loads.
The following shows the new feature in action. First, let's show the
status quo output when we sort by address. The result contains many hist
entries that generate the same output:
Jin Yao [Sat, 25 Mar 2017 20:34:26 +0000 (04:34 +0800)]
perf report: Find the inline stack for a given address
It would be useful for perf to support a mode to query the inline stack
for a given callgraph address. This would simplify finding the right
code in code that does a lot of inlining.
The srcline.c has contained the code which supports to translate the
address to filename:line_nr. This patch just extends the function to let
it support getting the inline stacks.
It introduces the inline_list which will store the inline function
result (filename:line_nr and funcname).
If BFD lib is not supported, the result is only filename:line_nr.
Signed-off-by: Yao Jin <yao.jin@linux.intel.com> Tested-by: Milian Wolff <milian.wolff@kdab.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Link: http://lkml.kernel.org/r/1490474069-15823-3-git-send-email-yao.jin@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Remove unused 'prefix' from builtin functions
We got it from the git sources but never used it for anything, with the
place where this would be somehow used remaining:
static int run_builtin(struct cmd_struct *p, int argc, const char **argv)
{
prefix = NULL;
if (p->option & RUN_SETUP)
prefix = NULL; /* setup_perf_directory(); */
Ditch it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Fri, 24 Mar 2017 12:15:52 +0000 (14:15 +0200)]
perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()
Address filtering with kernel symbols incorrectly resulted in the error
"Cannot determine size of symbol" because the no_size logic was the wrong
way around.
In trace__vfs_getname() and when checking if a thread is filtered in
trace__process_sample() we were not dropping the reference obtained via
machine__findnew_thread(), fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-9gc470phavxwxv5d9w7ck8ev@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It shouldn't be zero, but if the 'perf probe' on getname_flags() (or
elsewhere in the future we need to probe to catch the pathname for
syscalls like 'open' being copied from userspace to the kernel) is
misplaced somehow, then we will end up not allocating space and trying
to copy the "" empty string to ttrace->filename.name, causing a
segfault, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-c4f1t6sx1nczuzop19r5si5s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Fri, 24 Mar 2017 18:37:40 +0000 (19:37 +0100)]
Merge tag 'perf-core-for-mingo-4.12-20170324' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Allow suppressing 'uncore_' when specifying PMU events (Andi Kleen)
- Collapse identically named PMU events in 'perf stat', allow
not merging it via --no-merge (Andi Kleen)
Fixes:
- Use more precise 'grep -v' to suppress unwanted 'objdump -dS'
disassembly output to not ditch line:number lines needed by
'perf annotate --print-lines' logic (Taeung Song)
Infrastructure changes:
- SDT (Statically Defined Tracing)/uprobes_events arguments improvements
(Alexis Berlemont, Ravi Bangoria)
- Improvements for the handling of JSON described vendor events,
including having an expression parser to calculate metrics
from multiple vendor events (Andi Kleen)
Andi Kleen [Mon, 20 Mar 2017 20:17:11 +0000 (13:17 -0700)]
perf list: Move extra details printing to new option
Move the printing of perf expressions and internal events to a new
clearer --details flag, instead of lumping it together with other debug
options in --debug. This makes it clearer to use.
Before
perf list --debug
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
after
perf list --details
...
unc_m_power_critical_throttle_cycles
[Cycles all ranks are in critical thermal throttle. Unit: uncore_imc]
uncore_imc_2/event=0x86/ MetricName: power_critical_throttle_cycles % MetricExpr: (unc_m_power_critical_throttle_cycles / unc_m_clockticks) * 100.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/20170320201711.14142-14-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Mon, 20 Mar 2017 20:17:09 +0000 (13:17 -0700)]
perf list: Support printing MetricExpr with --debug
Output the metric expr in perf list when --debug is specified, so that
the user can check the formula.
Before:
% perf list
...
unc_m_power_channel_ppd
[Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
uncore_imc]
uncore_imc_2/event=0x85/
After:
% perf list --debug
...
unc_m_power_channel_ppd
[Cycles where DRAM ranks are in power down (CKE) mode. Derived from unc_m_power_channel_ppd. Unit:
uncore_imc]
Perf: uncore_imc_2/event=0x85/ MetricExpr: (unc_m_power_channel_ppd / unc_m_clockticks) * 100.
Andi Kleen [Mon, 20 Mar 2017 20:17:08 +0000 (13:17 -0700)]
perf stat: Output JSON MetricExpr metric
Add generic infrastructure to perf stat to output ratios for
"MetricExpr" entries in the event lists. Many events are more useful as
ratios than in raw form, typically some count in relation to total
ticks.
Transfer the MetricExpr information from the alias to the evsel.
We mark the events that need to be collected for MetricExpr, and also
link the events using them with a pointer. The code is careful to always
prefer the right event in the same group to minimize multiplexing
errors. At the moment only a single relation is supported.
Then add a rblist to the stat shadow code that remembers stats based on
the cpu and context.
Then finally update and retrieve and print these values similarly to the
existing hardcoded perf metrics. We use the simple expression parser
added earlier to evaluate the expression.
Normally we just output the result without further commentary, but for
--metric-only this would lead to empty columns. So for this case use the
original event as description.
There is no attempt to automatically add the MetricExpr event, if it is
missing, however we suggest it to the user, because the user tool
doesn't have enough information to reliably construct a group that is
guaranteed to schedule. So we leave that to the user.
% perf stat -a -I 1000 -e '{unc_p_clockticks,unc_p_freq_max_os_cycles}' --metric-only
# time freq_max_os_cycles %
1.000127077 0.9
2.000301436 0.7
3.000456379 0.0
v2: Change from DivideBy to MetricExpr
v3: Use expr__ prefix. Support more than one other event.
v4: Update description
v5: Only print warning message once for multiple PMUs.
Andi Kleen [Mon, 20 Mar 2017 20:17:05 +0000 (13:17 -0700)]
perf tools: Add a simple expression parser for JSON
Add a simple expression parser good enough to parse JSON relation
expressions. The parser is implemented using bison.
This is just intended as an simple parser for internal usage in the
event lists, not the beginning of a "perf scripting language"
v2: Use expr__ prefix instead of expr_
Support multiple free variables for parser
Committer note:
The v2 patch had:
%define api.pure full
In expr.y, that is a feature introduced in bison 2.7, to have reentrant
parsers, not using global variables, which would make tools/perf stop
building with the bison version shipped in older distros, so Andi
realised that the other parsers (e.g. parse-events.y) were using:
%pure-parser
Which is present in older versions of bison and fits the bill.
To finally make it build, copying what was there for pmu-bison.o,
another parser.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/20170320201711.14142-8-andi@firstfloor.org
[ stdlib.h is needed in tests/expr.c for free() fixing build in systems such as ubuntu:16.04-x-s390 ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Mon, 20 Mar 2017 20:17:03 +0000 (13:17 -0700)]
perf pmu: Expand PMU events by prefix match
When the user specifies a pmu directly, expand it automatically with a
prefix match for all available PMUs, similar as we do for the normal
aliases now.
This allows to specify attributes for duplicated boxes quickly. For
example uncore_cbox_{0,6}/.../ can be now specified as uncore_cbox/.../
and it gets automatically expanded for all boxes.
This generally makes it more concise to write uncore specifications, and
also avoids the need to know the exact topology of the system.
Before:
% perf stat -a -e uncore_cbox_0/event=0x35,umask=0x1,filter_opc=0x19C/,\
uncore_cbox_1/event=0x35,umask=0x1,filter_opc=0x19C/,\
uncore_cbox_2/event=0x35,umask=0x1,filter_opc=0x19C/,\
uncore_cbox_3/event=0x35,umask=0x1,filter_opc=0x19C/,\
uncore_cbox_4/event=0x35,umask=0x1,filter_opc=0x19C/,\
uncore_cbox_5/event=0x35,umask=0x1,filter_opc=0x19C/ sleep 1
After:
% perf stat -a -e uncore_cbox/event=0x35,umask=0x1,filter_opc=0x19C/ sleep 1
v2: Handle all bison rules. Move multi add code to separate function.
Handle uncore_ prefix correctly.
v3: Move parse_events_multi_pmu_add to separate patch. Move uncore
prefix check to separate patch.
Andi Kleen [Mon, 20 Mar 2017 20:17:00 +0000 (13:17 -0700)]
perf stat: Collapse identically named events
The uncore PMU has a lot of duplicated PMUs for different subsystems.
When expanding an uncore alias we usually end up with a large
number of identically named aliases, which makes perf stat
output difficult to read.
Automatically sum them up in perf stat, unless --no-merge is specified.
This can be default because only the uncores generally have duplicated
aliases. Other PMUs have unique names.
Before:
% perf stat --no-merge -a -e unc_c_llc_lookup.any sleep 1
v2: Split collect_aliases. Rename alias flag.
v3: Make sure unsupported/not counted is always printed.
v4: Factor out callback change into separate patch.
v5: Move check for bad results here
Move merged check into collect_data
perf annotate: Add comment clarifying how the source code line is parsed
The source code line number (lineno) needs to be kept in accross calls
to symbol__parse_objdump_line() when parsing the output of 'objdump -l
-dS', so that it can associate it with the instructions till the next
line.
See disasm_line__new() and struct disasm_line::line_nr.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7hpx8f8ybdpiujceysaj229w@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Taeung Song [Mon, 20 Mar 2017 02:56:57 +0000 (11:56 +0900)]
perf annotate: More exactly grep -v of the objdump command
The 'grep -v "filename"' applied to the objdump command output cause a
side effect eliminating filename:linenr of output of 'objdump -l' if the
object file name and source file name are the same, fix it.
E.g. the output of the following objdump command in symbol__disassemble():
But it uses grep -v "filename" e.g. "/home/taeung/hello" in the objdump
command to remove the first line containing file name and file format
("/home/taeung/hello: file format elf64-x86-64"):
But this causes a side effect, removing filename:linenr too, because the
object file and source file have the same name e.g. "/home/taueng/hello",
"/home/taeung/hello.c"
So more do a better match by using grep -v as below to correctly remove
that first line:
Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1489978617-31396-5-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Thu, 2 Feb 2017 11:11:40 +0000 (16:41 +0530)]
perf sdt x86: Add renaming logic for rNN and other registers
'perf probe' is failing for sdt markers whose arguments has rNN (with
postfix b/w/d), %rsp, %esp, %sil etc. registers. Add renaming logic for
these registers.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170202111143.14319-3-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alexis Berlemont [Wed, 14 Dec 2016 00:07:32 +0000 (01:07 +0100)]
perf probe: Add sdt probes arguments into the uprobe cmd string
An sdt probe can be associated with arguments but they were not passed
to the user probe tracing interface (uprobe_events); this patch adapts
the sdt argument descriptors according to the uprobe input format.
As the uprobe parser does not support scaled address mode, perf will
skip arguments which cannot be adapted to the uprobe format.
Alexis Berlemont [Wed, 14 Dec 2016 00:07:31 +0000 (01:07 +0100)]
perf sdt: Add scanning of sdt probes arguments
During a "perf buildid-cache --add" command, the section ".note.stapsdt"
of the "added" binary is scanned in order to list the available SDT
markers available in a binary. The parts containing the probes arguments
were left unscanned.
The whole section is now parsed; the probe arguments are extracted for
later use.
Signed-off-by: Alexis Berlemont <alexis.berlemont@gmail.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20161214000732.1710-2-alexis.berlemont@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ravi Bangoria [Tue, 7 Feb 2017 05:45:47 +0000 (11:15 +0530)]
perf probe: Change MAX_CMDLEN
There are many SDT markers in powerpc whose uprobe definition goes
beyond current MAX_CMDLEN, especially when target filename is long and
sdt marker has long list of arguments. For example, definition of sdt
marker
'perf probe' fails with segfault for such markers. As the uprobe_events
file accepts definitions up to 4094 characters(4096 - 2 (\n\0)),
increase value of MAX_CMDLEN match that.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexis Berlemont <alexis.berlemont@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20170207054547.3690-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Tue, 21 Mar 2017 06:41:29 +0000 (07:41 +0100)]
Merge tag 'perf-core-for-mingo-4.12-20170320' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
Fixes:
- Fix concat_probe_trace_events() in 'perf probe', it should dereference a
pointer, not test its value (Ravi Bangoria)
User visible changes:
- Handle partial AUX records, checking if 'kvm_intel.ko' is loaded and
if its 'vmm_exclusive' parameter is set to 0, suggesting tweaking
it to reduce gaps (Alexander Shishkin)
Infrastructure changes:
- Sync the kvm.h, cpufeatures.h and perf_event.h tools/ headers copies
with the kernel (Arnaldo Carvalho de Melo, Alexander Shishkin)
- 'perf lock' subcommands should include common options, using
OPT_PARENT() (Changbin Du)
- Ditto for 'perf timechart' (Arnaldo Carvalho de Melo)
Documentation changes:
Correct 'perf stat --no-aggr' description (Ravi Bangoria)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
The changes in the following csets are not relevant for what is used in
tools/perf/arch/powerpc/util/kvm-stat.c, but lets sync it to silence the
diff detector in the tools build system:
Ravi Bangoria [Wed, 8 Mar 2017 06:59:07 +0000 (12:29 +0530)]
perf probe: Fix concat_probe_trace_events
'*ntevs' contains number of elements present in 'tevs' array. If there
are no elements in array, 'tevs2' can be directly assigned to 'tevs'
without allocating more space. So the condition should be '*ntevs == 0'
not 'ntevs == 0'.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: 42bba263eb58 ("perf probe: Allow wildcard for cached events") Link: http://lkml.kernel.org/r/20170308065908.4128-1-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Handle partial AUX records and print a warning
This patch decodes the 'partial' flag in AUX records and prints
a warning to the user, so that they don't have to guess why their
PT traces contain gaps (or missing altogether):
Warning:
AUX data had gaps in it 8 times out of 8!
Are you running a KVM guest in the background?
Trying to be even more helpful, we will detect if the user's kvm driver sets up
exclusive VMX root mode for the entire lifespan of the kvm process:
Reloading kvm_intel module with vmm_exclusive=0
will reduce the gaps to only guest's timeslices.
Note however, that you'll still have gaps in cpu-wide traces even with
vmm_exclusive=0, but the number of gaps will be below 100% (as opposed to the
above example).
Currently this is the only reason for partial records.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vince@deater.net> Link: http://lkml.kernel.org/r/8760j941ig.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move -T/--tasks-only and -P/--power-only options to a separate options
array that then gets referenced via OPT_PARENT from the 'perf timechart'
and 'perf timechart record' option arrays.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-j80lol9wj1i6556ibh48iebe@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf lock: Make 'f' part of the common 'lock_options'
All options need the -f/--force option, so move it to the array
referenced via OPT_PARENT.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Changbin Du <changbin.du@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-unbeionpi58rioh4e9w8kp4n@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Fri, 17 Mar 2017 14:13:28 +0000 (15:13 +0100)]
Merge tag 'perf-urgent-for-mingo-4.11-20170317' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Fix symbols__fixup_end heuristic for corner cases, such as JITted eBPF
programs, that are loaded at page aligned addresses, just after the
kernel proper (Daniel Borkmann)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Daniel Borkmann [Wed, 15 Mar 2017 21:53:37 +0000 (22:53 +0100)]
perf symbols: Fix symbols__fixup_end heuristic for corner cases
The current symbols__fixup_end() heuristic for the last entry in the rb
tree is suboptimal as it leads to not being able to recognize the symbol
in the call graph in a couple of corner cases, for example:
i) If the symbol has a start address (f.e. exposed via kallsyms)
that is at a page boundary, then the roundup(curr->start, 4096)
for the last entry will result in curr->start == curr->end with
a symbol length of zero.
ii) If the symbol has a start address that is shortly before a page
boundary, then also here, curr->end - curr->start will just be
very few bytes, where it's unrealistic that we could perform a
match against.
Instead, change the heuristic to roundup(curr->start, 4096) + 4096, so
that we can catch such corner cases and have a better chance to find
that specific symbol. It's still just best effort as the real end of the
symbol is unknown to us (and could even be at a larger offset than the
current range), but better than the current situation.
Alexei reported that he recently run into case i) with a JITed eBPF
program (these are all page aligned) as the last symbol which wasn't
properly shown in the call graph (while other eBPF program symbols in
the rb tree were displayed correctly). Since this is a generic issue,
lets try to improve the heuristic a bit.
Andy Lutomirski [Thu, 16 Mar 2017 19:59:39 +0000 (12:59 -0700)]
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
If one thread mmaps a perf event while another thread in the same mm
is in some context where active_mm != mm (which can happen in the
scheduler, for example), refresh_pce() would write the wrong value
to CR4.PCE. This broke some PAPI tests.
Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
- Make perf_event__synthesize_mmap_events() scale on older kernels where
reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps (Stephane Eranian)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 16 Mar 2017 12:47:51 +0000 (13:47 +0100)]
perf/core: Better explain the inherit magic
While going through the event inheritance code Oleg got confused.
Add some comments to better explain the silent dissapearance of
orphaned events.
So what happens is that at perf_event_release_kernel() time; when an
event looses its connection to userspace (and ceases to exist from the
user's perspective) we can still have an arbitrary amount of inherited
copies of the event. We want to synchronously find and remove all
these child events.
Since that requires a bit of lock juggling, there is the possibility
that concurrent clone()s will create new child events. Therefore we
first mark the parent event as DEAD, which marks all the extant child
events as orphaned.
We then avoid copying orphaned events; in order to avoid getting more
of them.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fweisbec@gmail.com Link: http://lkml.kernel.org/r/20170316125823.289567442@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 16 Mar 2017 12:47:49 +0000 (13:47 +0100)]
perf/core: Fix event inheritance on fork()
While hunting for clues to a use-after-free, Oleg spotted that
perf_event_init_context() can loose an error value with the result
that fork() can succeed even though we did not fully inherit the perf
event context.
Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: oleg@redhat.com Cc: stable@vger.kernel.org Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch closes the hole by making perf_event_free_task() destroy the
task <-> ctx relation such that perf_event_release_kernel() will no longer
observe the now dead task.
Spotted-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fweisbec@gmail.com Cc: oleg@redhat.com Cc: stable@vger.kernel.org Fixes: c6e5b73242d2 ("perf: Synchronously clean up child events") Link: http://lkml.kernel.org/r/20170314155949.GE32474@worktop Link: http://lkml.kernel.org/r/20170316125823.140295131@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andi Kleen [Thu, 23 Feb 2017 23:46:34 +0000 (15:46 -0800)]
perf script: Add 'brstackinsn' for branch stacks
Implement printing instruction sequences as hex dump for branch stacks.
This relies on the x86 instruction decoder used by the PT decoder to
find the lengths of instructions to dump them individually.
This is good enough for pattern matching.
This allows to study hot paths for individual samples, together with
branch misprediction and cycle count / IPC information if available (on
Skylake systems).
Only works when no special branch filters are specified.
Occasionally the path does not reach up to the sample IP, as the LBRs
may be frozen before executing a final jump. In this case we print a
special message.
The instruction dumper piggy backs on the existing infrastructure from
the IP PT decoder.
An earlier iteration of this patch relied on a disassembler, but this
version only uses the existing instruction decoder.
Committer note:
Added hint about how to get suitable perf.data files for use with
'-F brstackinsm':
$ perf record usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.018 MB perf.data (8 samples) ]
$
$ perf script -F brstackinsn
Display of branch stack assembler requested, but non all-branch filter set
Hint: run 'perf record -b ...'
$
Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Link: http://lkml.kernel.org/r/20170223234634.583-1-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
1c5ac21a0e ("perf/x86/intel/pt: Don't die on VMXON")
... PT events depend on re-scheduling to get enabled after a VMX session
has taken place. This is, in particular, a problem for CPU context events,
which don't normally get re-scheduled, unless there is a reason.
This patch changes the VMX handling so that PT event gets re-enabled
when VMX root mode exits.
Also, notify the user when there's a gap in PT data due to VMX root
mode by flagging AUX records as partial.
In combination with vmm_exclusive=0 parameter of the kvm_intel driver,
this will result in trace gaps only for the duration of the guest's
timeslices.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-5-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel PT driver needs to be able to communicate partial AUX transactions,
that is, transactions with gaps in data for reasons other than no room
left in the buffer (i.e. truncated transactions). Therefore, this condition
does not imply a wakeup for the consumer.
To this end, add a new "partial" AUX flag.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Will Deacon [Mon, 20 Feb 2017 13:33:50 +0000 (15:33 +0200)]
perf/core: Keep AUX flags in the output handle
In preparation for adding more flags to perf AUX records, introduce a
separate API for setting the flags for a session, rather than appending
more bool arguments to perf_aux_output_end. This allows to set each
flag at the time a corresponding condition is detected, instead of
tracking it in each driver's private state.
Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20170220133352.17995-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Wed, 15 Mar 2017 23:54:58 +0000 (16:54 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Four small fixes for this cycle:
- followup fix from Neil for a fix that went in before -rc2, ensuring
that we always see the full per-task bio_list.
- fix for blk-mq-sched from me that ensures that we retain similar
direct-to-issue behavior on running the queue.
- fix from Sagi fixing a potential NULL pointer dereference in blk-mq
on spurious CPU unplug.
- a memory leak fix in writeback from Tahsin, fixing a case where
device removal of a mounted device can leak a struct
wb_writeback_work"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq-sched: don't run the queue async from blk_mq_try_issue_directly()
writeback: fix memory leak in wb_queue_work()
blk-mq: Fix tagset reinit in the presence of cpu hot-unplug
blk: Ensure users for current->bio_list can see the full list.
Stephane Eranian [Wed, 15 Mar 2017 17:17:13 +0000 (10:17 -0700)]
perf tools: Make perf_event__synthesize_mmap_events() scale
This patch significantly improves the execution time of
perf_event__synthesize_mmap_events() when running perf record on systems
where processes have lots of threads.
It just happens that cat /proc/pid/maps support uses a O(N^2) algorithm to
generate each map line in the maps file. If you have 1000 threads, then you
have necessarily 1000 stacks. For each vma, you need to check if it
corresponds to a thread's stack. With a large number of threads, this can take
a very long time. I have seen latencies >> 10mn.
As of today, perf does not use the fact that a mapping is a stack, therefore we
can work around the issue by using /proc/pid/tasks/pid/maps. This entry does
not try to map a vma to stack and is thus much faster with no loss of
functonality.
The proc-map-timeout logic is kept in case users still want some upper limit.
In V2, we fix the file path from /proc/pid/tasks/pid/maps to actual
/proc/pid/task/pid/maps, tasks -> task. Thanks Arnaldo for catching this.
Committer note:
This problem seems to have been elliminated in the kernel since commit : b18cb64ead40 ("fs/proc: Stop trying to report thread stacks").
Naveen N. Rao [Wed, 8 Mar 2017 08:26:06 +0000 (13:56 +0530)]
trace/kprobes: Fix check for kretprobe offset within function entry
perf specifies an offset from _text and since this offset is fed
directly into the arch-specific helper, kprobes tracer rejects
installation of kretprobes through perf. Fix this by looking up the
actual offset from a function for the specified sym+offset.
Refactor and reuse existing routines to limit code duplication -- we
repurpose kprobe_addr() for determining final kprobe address and we
split out the function entry offset determination into a separate
generic helper.
Before patch:
naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
probe-definition(0): do_open%return
symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /boot/vmlinux for symbols
Open Debuginfo file: /boot/vmlinux
Try to find probe point from debuginfo.
Matched function: do_open [2d0c7ff]
Probe point found: do_open+0
Matched function: do_open [35d76dc]
found inline addr: 0xc0000000004ba9c4
Failed to find "do_open%return",
because do_open is an inlined function and has no return point.
An error occurred in debuginfo analysis (-22).
Trying to use symbols.
Opening /sys/kernel/debug/tracing//README write=0
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: r:probe/do_open _text+4469776
Failed to write event: Invalid argument
Error: Failed to add events. Reason: Invalid argument (Code: -22)
naveen@ubuntu:~/linux/tools/perf$ dmesg | tail
<snip>
[ 33.568656] Given offset is not valid for return probe.
After patch:
naveen@ubuntu:~/linux/tools/perf$ sudo ./perf probe -v do_open%return
probe-definition(0): do_open%return
symbol:do_open file:(null) line:0 offset:0 return:1 lazy:(null)
0 arguments
Looking at the vmlinux_path (8 entries long)
Using /boot/vmlinux for symbols
Open Debuginfo file: /boot/vmlinux
Try to find probe point from debuginfo.
Matched function: do_open [2d0c7d6]
Probe point found: do_open+0
Matched function: do_open [35d76b3]
found inline addr: 0xc0000000004ba9e4
Failed to find "do_open%return",
because do_open is an inlined function and has no return point.
An error occurred in debuginfo analysis (-22).
Trying to use symbols.
Opening /sys/kernel/debug/tracing//README write=0
Opening /sys/kernel/debug/tracing//kprobe_events write=1
Writing event: r:probe/do_open _text+4469808
Writing event: r:probe/do_open_1 _text+4956344
Added new events:
probe:do_open (on do_open%return)
probe:do_open_1 (on do_open%return)
You can now use it in all perf tools, such as:
perf record -e probe:do_open_1 -aR sleep 1
naveen@ubuntu:~/linux/tools/perf$ sudo cat /sys/kernel/debug/kprobes/list c000000000041370 k kretprobe_trampoline+0x0 [OPTIMIZED] c0000000004ba0b8 r do_open+0x8 [DISABLED] c000000000443430 r do_open+0x0 [DISABLED]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/d8cd1ef420ec22e3643ac332fdabcffc77319a42.1488961018.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Wed, 15 Mar 2017 18:27:27 +0000 (19:27 +0100)]
Merge tag 'perf-core-for-mingo-4.12-20170314' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New features:
- Add PERF_RECORD_NAMESPACES so that the kernel can record information
required to associate samples to namespaces, helping in container
problem characterization.
Now the 'perf record has a --namespace' option to ask for such info,
and when present, it can be used, initially, via a new sort order,
'cgroup_id', allowing histogram entry bucketization by a (device, inode)
based cgroup identifier (Hari Bathini)
- Add --next option to 'perf sched timehist', showing what is the next
thread to run (Brendan Gregg)
Linus Torvalds [Wed, 15 Mar 2017 17:44:19 +0000 (10:44 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a rather large set of fixes. The bulk are for lpfc correcting
a lot of issues in the new NVME driver code which just went in in the
merge window.
The others are:
- fix a hang in the vmware paravirt driver caused by incorrect
handling of the new MSI vector allocation
- long standing bug in storvsc, which recent block changes turned
from being a harmless annoyance into a hang
- yet more fallout (in mpt3sas) from the changes to device blocking
The remainder are small fixes and updates"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (34 commits)
scsi: lpfc: Add shutdown method for kexec
scsi: storvsc: Workaround for virtual DVD SCSI version
scsi: lpfc: revise version number to 11.2.0.10
scsi: lpfc: code cleanups in NVME initiator discovery
scsi: lpfc: code cleanups in NVME initiator base
scsi: lpfc: correct rdp diag portnames
scsi: lpfc: remove dead sli3 nvme code
scsi: lpfc: correct double print
scsi: lpfc: Rename LPFC_MAX_EQ_DELAY to LPFC_MAX_EQ_DELAY_EQID_CNT
scsi: lpfc: Rework lpfc Kconfig for NVME options
scsi: lpfc: add transport eh_timed_out reference
scsi: lpfc: Fix eh_deadline setting for sli3 adapters.
scsi: lpfc: add NVME exchange aborts
scsi: lpfc: Fix nvme allocation bug on failed nvme_fc_register_localport
scsi: lpfc: Fix IO submission if WQ is full
scsi: lpfc: Fix NVME CMD IU byte swapped word 1 problem
scsi: lpfc: Fix RCTL value on NVME LS request and response
scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
scsi: lpfc: fix missing spin_unlock on sql_list_lock
scsi: lpfc: don't dereference dma_buf->iocbq before null check
...
Linus Torvalds [Wed, 15 Mar 2017 16:33:15 +0000 (09:33 -0700)]
Merge tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Bob Peterson:
"This is an emergency patch for 4.11-rc3
The GFS2 developers uncovered a really nasty problem that can lead to
random corruption and kernel panic, much like the last one. Andreas
Gruenbacher wrote a simple one-line patch to fix the problem."
* tag 'gfs2-4.11-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Avoid alignment hole in struct lm_lockname
Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over
the entire struct lm_lockname instead of its individual fields. On some
architectures, struct lm_lockname contains a hole of uninitialized
memory due to alignment rules, which now leads to incorrect hash values.
Get rid of that hole.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> CC: <stable@vger.kernel.org> #v4.3+
1) Ensure that mtu is at least IPV6_MIN_MTU in ipv6 VTI tunnel driver,
from Steffen Klassert.
2) Fix crashes when user tries to get_next_key on an LPM bpf map, from
Alexei Starovoitov.
3) Fix detection of VLAN fitlering feature for bnx2x VF devices, from
Michal Schmidt.
4) We can get a divide by zero when TCP socket are morphed into
listening state, fix from Eric Dumazet.
5) Fix socket refcounting bugs in skb_complete_wifi_ack() and
skb_complete_tx_timestamp(). From Eric Dumazet.
6) Use after free in dccp_feat_activate_values(), also from Eric
Dumazet.
7) Like bonding team needs to use ETH_MAX_MTU as netdev->max_mtu, from
Jarod Wilson.
8) Fix use after free in vrf_xmit(), from David Ahern.
9) Don't do UDP Fragmentation Offload on IPComp ipsec packets, from
Alexey Kodanev.
10) Properly check napi_complete_done() return value in order to decide
whether to re-enable IRQs or not in amd-xgbe driver, from Thomas
Lendacky.
11) Fix double free of hwmon device in marvell phy driver, from Andrew
Lunn.
12) Don't crash on malformed netlink attributes in act_connmark, from
Etienne Noss.
13) Don't remove routes with a higher metric in ipv6 ECMP route replace,
from Sabrina Dubroca.
14) Don't write into a cloned SKB in ipv6 fragmentation handling, from
Florian Westphal.
15) Fix routing redirect races in dccp and tcp, basically the ICMP
handler can't modify the socket's cached route in it's locked by the
user at this moment. From Jon Maxwell.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (108 commits)
qed: Enable iSCSI Out-of-Order
qed: Correct out-of-bound access in OOO history
qed: Fix interrupt flags on Rx LL2
qed: Free previous connections when releasing iSCSI
qed: Fix mapping leak on LL2 rx flow
qed: Prevent creation of too-big u32-chains
qed: Align CIDs according to DORQ requirement
mlxsw: reg: Fix SPVMLR max record count
mlxsw: reg: Fix SPVM max record count
net: Resend IGMP memberships upon peer notification.
dccp: fix memory leak during tear-down of unsuccessful connection request
tun: fix premature POLLOUT notification on tun devices
dccp/tcp: fix routing redirect race
ucc/hdlc: fix two little issue
vxlan: fix ovs support
net: use net->count to check whether a netns is alive or not
bridge: drop netfilter fake rtable unconditionally
ipv6: avoid write to a possibly cloned skb
net: wimax/i2400m: fix NULL-deref at probe
isdn/gigaset: fix NULL-deref at probe
...
Linus Torvalds [Tue, 14 Mar 2017 22:00:43 +0000 (15:00 -0700)]
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Three libata fixes:
- fix for a circular reference bug in sysfs code which prevented
pata_legacy devices from being released after probe failure, which
in turn prevented devres from releasing the associated resources.
- drop spurious WARN in the command issue path which can be triggered
by a legitimate passthrough command.
- an ahci_qoriq specific fix"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: qoriq: correct the sata ecc setting error
libata: drop WARN from protocol error in ata_sff_qc_issue()
libata: transport: Remove circular dependency at free time
Linus Torvalds [Tue, 14 Mar 2017 21:52:08 +0000 (14:52 -0700)]
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
"If a delayed work is queued with NULL @wq, workqueue code explodes
after the timer expires at which point it's difficult to tell who the
culprit was.
This actually happened and the offender was net/smc this time.
Add an explicit sanity check for it in the queueing path"
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
Linus Torvalds [Tue, 14 Mar 2017 21:48:50 +0000 (14:48 -0700)]
Merge branch 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
Pull percpu fixes from Tejun Heo:
- the allocation path was updating pcpu_nr_empty_pop_pages without the
required locking which can lead to incorrect handling of empty chunks
(e.g. keeping too many around), which is buggy but shouldn't lead to
critical failures. Fixed by adding the locking
- a trivial patch to drop an unused param from pcpu_get_pages()
* 'for-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
percpu: remove unused chunk_alloc parameter from pcpu_get_pages()
percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
David S. Miller [Tue, 14 Mar 2017 18:37:06 +0000 (11:37 -0700)]
Merge branch 'qed-fixes'
Yuval Mintz says:
====================
qed: Fixes series
This address several different issues in qed.
The more significant portions:
Patch #1 would cause timeout when qedr utilizes the highest
CIDs availble for it [or when future qede adapters would utilize
queues in some constellations].
Patch #4 fixes a leak of mapped addresses; When iommu is enabled,
offloaded storage protocols might eventually run out of resources
and fail to map additional buffers.
Patches #6,#7 were missing in the initial iSCSI infrastructure
submissions, and would hamper qedi's stability when it reaches
out-of-order scenarios.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 14 Mar 2017 13:26:04 +0000 (15:26 +0200)]
qed: Enable iSCSI Out-of-Order
Missing in the initial submission, qed fails to propagate qedi's
request to enable OOO to firmware.
Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 14 Mar 2017 13:26:03 +0000 (15:26 +0200)]
qed: Correct out-of-bound access in OOO history
Need to set the number of entries in database, otherwise the logic
would quickly surpass the array.
Fixes: 1d6cff4fca43 ("qed: Add iSCSI out of order packet handling") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ram Amrani [Tue, 14 Mar 2017 13:26:02 +0000 (15:26 +0200)]
qed: Fix interrupt flags on Rx LL2
Before iterating over the the LL2 Rx ring, the ring's
spinlock is taken via spin_lock_irqsave().
The actual processing of the packet [including handling
by the protocol driver] is done without said lock,
so qed releases the spinlock and re-claims it afterwards.
Problem is that the final spin_lock_irqrestore() at the end
of the iteration uses the original flags saved from the
initial irqsave() instead of the flags from the most recent
irqsave(). So it's possible that the interrupt status would
be incorrect at the end of the processing.
Fixes: 0a7fb11c23c0 ("qed: Add Light L2 support"); CC: Ram Amrani <Ram.Amrani@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 14 Mar 2017 13:26:01 +0000 (15:26 +0200)]
qed: Free previous connections when releasing iSCSI
Fixes: fc831825f99e ("qed: Add support for hardware offloaded iSCSI") Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tomer Tayar [Tue, 14 Mar 2017 13:25:59 +0000 (15:25 +0200)]
qed: Prevent creation of too-big u32-chains
Current Logic would allow the creation of a chain with U32_MAX + 1
elements, when the actual maximum supported by the driver infrastructure
is U32_MAX.
Fixes: a91eb52abb50 ("qed: Revisit chain implementation") Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>