Adrian Hunter [Thu, 8 Aug 2013 11:32:20 +0000 (14:32 +0300)]
perf machine: Add symbol filter to struct machine
The symbol filter needs to be applied machine-wide, so add it to struct
machine.
Currently tools pass the symbol filter as a parameter to various
map-related functions. However a need to load a map can occur anywhere
in the code, at which point the filter is needed.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375961547-30267-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Thu, 8 Aug 2013 02:50:58 +0000 (22:50 -0400)]
perf session: Change perf_session__has_traces to actually check for tracepoints
Any event can have RAW data attribute set. The intent of the function is
to determine if the session has tracepoints, so check for the type of
each event explicitly.
Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-17-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Thu, 8 Aug 2013 02:50:47 +0000 (22:50 -0400)]
perf sched: Remove sched_process_fork tracepoint
The PERF_RECORD_FORK event is already collected as part of the use of
cmd_record and those events are analyzed as part of the libperf
machinery. Using the fork tracepoint as well just duplicates the event
load.
Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375930261-77273-6-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Mon, 12 Aug 2013 08:14:47 +0000 (10:14 +0200)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Do annotation using /proc/kcore and /proc/kallsyms, removing the
need for a vmlinux file kernel assembly annotation. This also improves
this use case because vmlinux has just the initial kernel image, not
what is actually in use after various code patchings by things like
alternatives, etc. From Adrian Hunter.
* Add various improvements and fixes to the "vmlinux matches kallsyms"
'perf test' entry, related to the /proc/kcore annotation feature.
* Add --initial-delay option to 'perf stat' to skip measuring for
the startup phase, from Andi Kleen.
* Add perf kvm stat live mode that combines aspects of 'perf kvm stat' record
and report, from David Ahern.
* Add option to analyze specific VM in perf kvm stat report, from David Ahern.
* Do not require /lib/modules/* on a guest, fix from Jason Wessel.
* Group leader sampling, that allows just one event in a group to sample while
the other events have just its values read, from Jiri Olsa.
* Add support for a new modifier "D", which requests that the event, or group
of events, be pinned to the PMU, from Michael Ellerman.
* Fix segmentation fault on the gtk browser, from Namhyung Kim.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jason Wessel [Mon, 15 Jul 2013 20:27:53 +0000 (15:27 -0500)]
perf machine: Do not require /lib/modules/* on a guest
For some types of work loads and special guest environments, you might
have a kernel that has no kernel modules. The perf kvm record tool
fails instantiate vmlinux maps when the kernel modules directory cannot
be opened, even though the kallsyms has been properly processed. This
leads to a perf kvm report that has no guest symbols resolved.
This patch changes the failure to locate kernel modules to be non-fatal.
Add a negative test to test__checkevent_pmu_events() to get lots of
coverage of the negative case, ie. when the modifier is not specified.
Add a test of a single event, and of the group case.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1375795686-4226-2-git-send-email-michael@ellerman.id.au Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This commit adds support for a new modifier "D", which requests that the
event, or group of events, be pinned to the PMU.
The "p" modifier is already taken for precise, and "P" may be used in
future to mean "fully precise".
So we use "D", which stands for pinneD - and looks like a padlock, or if
you're using the ":D" syntax perf smiles at you.
This is an oft-requested feature from our HW folks, who want to be able
to run a large number of events, but also want 100% accurate results for
instructions per cycle.
Comparison of results with and without pinning:
$ perf stat -e '{cycles,instructions}:D' -e cycles,instructions,...
79,590,480,683 cycles # 0.000 GHz
166,123,716,524 instructions # 2.09 insns per cycle
# 0.11 stalled cycles per insn
79,352,134,463 cycles # 0.000 GHz [11.11%]
165,178,301,818 instructions # 2.08 insns per cycle
# 0.11 stalled cycles per insn [11.13%]
As you can see although perf does a very good job of scaling the values
in the non-pinned case, there is some small discrepancy.
The patch is fairly straight forward, the one detail is that we need to
make sure we only request pinning for the group leader when we have a
group.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1375795686-4226-1-git-send-email-michael@ellerman.id.au
[ Use perf_evsel__is_group_leader instead of open coded equivalent, as
suggested by Jiri Olsa ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Tue, 6 Aug 2013 05:14:13 +0000 (14:14 +0900)]
perf ui/gtk: Fix segmentation fault on perf_hpp__for_each_format loop
The commit 2b8bfa6bb8a7 ("perf tools: Centralize default columns init in
perf_hpp__init") moves initialization of common overhead column to
perf_hpp__init() but forgot about the gtk code.
So the gtk code added the same column to the list twice causing infinite
loop when iterating it by perf_hpp__for_each_format loop. When I run
perf report --gtk, I can see following messages indefinitely.
David Ahern [Tue, 6 Aug 2013 01:41:37 +0000 (21:41 -0400)]
perf kvm stat report: Add option to analyze specific VM
Add an option to analyze a specific VM within a data file. This allows
the collection of kvm events for all VMs and then analyze data for each
VM (or set of VMs) individually.
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1375753297-69645-6-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Tue, 6 Aug 2013 01:41:34 +0000 (21:41 -0400)]
perf kvm: Add live mode
perf kvm stat currently requires back to back record and report commands
to see stats. e.g,.
perf kvm stat record -p $pid -- sleep 1
perf kvm stat report
This is inconvenvient for on box monitoring of a VM. This patch
introduces a 'live' mode that in effect combines the record plus report
into one command. e.g., to monitor a single VM:
perf kvm stat live -p $pid
or all VMs:
perf kvm stat live
Same stats options for the record+report path work with the live mode.
Display rate defaults to 1 second and can be changed using the -d
option.
v4:
- address comments from Xiao -- verify_vcpu check should not look at
processors on line for the host, prune configurable options.
- set attr->{mmap,comm,task} to 0 - don't need task events so trim events
we have to deal with
- better control of time for queue event flushing to reduce frequency of
"Timestamp below last timeslice flush" failures.
v3:
updated to use existing tracepoint parsing code
v2:
removed ABSTIME arg from timerfd_settime as mentioned by Namhyung
only call perf_kvm__handle_stdin when poll returns activity.
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1375753297-69645-3-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Tue, 6 Aug 2013 01:41:33 +0000 (21:41 -0400)]
perf session: Export queue_event function
Taking a lesson from perf-trace and bringing in control of event
processing to perf-kvm-stat-live: parse the sample to get access the
time leaving just the need to queue it to the ordered samples list. For
that the queue_event function needs to be exported.
Unexport perf_session__process_event.
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com> Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1375753297-69645-2-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf annotate browser: Improve description of '?' hotkey
The previous description: "Search previous string" is usually associated
with the 'N' following a '/string', the opposite of 'n', which is
'Search next string' in the direction established with '/' or '?'.
So change it to 'Search string backwards', to clarify that.
The 'N' hotkey remains to be implemented with the semantic described
above.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-5lw5y15d7vv308xbpm8pqe4g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:56 +0000 (14:38 +0300)]
perf annotate: Remove nop at end of annotation
When kcore is used for annotation, symbols do not have correct sizes
because they come from kallsyms, that has only its start address, with
the end address being the next symbol's minus one.
That sometimes results in an extra nop being seen after the end of a
function. Remove it.
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:54 +0000 (14:38 +0300)]
perf annotate: Allow disassembly using /proc/kcore
Annotation with /proc/kcore is possible so the logic is adjusted to
allow it. The main difference is that /proc/kcore had no symbols so the
parsing logic needed a tweak to read jump offsets.
The other difference is that objdump cannot always read from kcore.
That seems to be a bug with objdump.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-11-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:53 +0000 (14:38 +0300)]
perf tests: Add kcore to the object code reading test
Make the "object code reading" test attempt to read from kcore.
The test uses objdump which struggles with kcore. i.e. doesn't always
work, sometimes takes a long time. The test has been made to work
around those issues.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-10-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:52 +0000 (14:38 +0300)]
perf tests: Adjust the vmlinux symtab matches kallsyms test again
The kallsyms maps now may map to kcore and the symbol values now may be
file offsets. For comparison with vmlinux the virtual memory address is
needed which is obtained by unmapping the symbol value.
The "vmlinux symtab matches kallsyms" is adjusted accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-9-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:50 +0000 (14:38 +0300)]
perf tools: Make it possible to read object code from kernel modules
The new "object code reading" test shows that it is not possible to read
object code from kernel modules. That is because the mappings do not
map to the dsos. This patch fixes that.
This involves identifying and flagging relocatable (ELF type ET_REL)
files (e.g. kernel modules) for symbol adjustment and updating
map__rip_2objdump() accordingly. The kmodule parameter of
dso__load_sym() is taken into use and the module map altered to map to
the dso.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-7-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:48 +0000 (14:38 +0300)]
perf tests: Adjust the vmlinux symtab matches kallsyms test
The vmlinux maps now map to the dso and the symbol values are now file
offsets. For comparison with kallsyms the virtual memory address is
needed which is obtained by unmapping the symbol value.
The "vmlinux symtab matches kallsyms" is adjusted accordingly.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-5-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:47 +0000 (14:38 +0300)]
perf tools: Make it possible to read object code from vmlinux
The new "object code reading" test shows that it is not possible to read
object code from vmlinux. That is because the mappings do not map to
the dso. This patch fixes that.
A side-effect of changing the kernel map is that the "reloc" offset must
be taken into account. As a result of that separate map functions for
relocation are no longer needed.
Also fixing up the maps to match the symbols no longer makes sense and
so is not done.
The vmlinux dso data_type is now set to either DSO_BINARY_TYPE__VMLINUX
or DSO_BINARY_TYPE__GUEST_VMLINUX as approprite, which enables the
correct file name to be determined by dso__binary_type_file().
This patch breaks the "vmlinux symtab matches kallsyms" test. That is
fixed in a following patch.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-4-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:46 +0000 (14:38 +0300)]
perf symbols: Load kernel maps before using
In order to use kernel maps to read object code, those maps must be
adjusted to map to the dso file offset. Because lazy-initialization is
used, that is not done until symbols are loaded. However the maps are
first used by thread__find_addr_map() before symbols are loaded. So
this patch changes thread__find_addr() to "load" kernel maps before
using them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-3-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Wed, 7 Aug 2013 11:38:45 +0000 (14:38 +0300)]
perf tests: Add test for reading object code
Using the information in mmap events, perf tools can read object code
associated with sampled addresses. A test is added that compares bytes
read by perf with the same bytes read using objdump.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375875537-4509-2-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Sat, 3 Aug 2013 00:41:11 +0000 (17:41 -0700)]
perf stat: Add support for --initial-delay option
When measuring workloads the startup phase -- doing page faults, dynamic
linking, opening files -- is often very different from the rest of the
workload. Especially with smaller kernels and using counter
multiplexing this can give significant measurement errors.
Multiplexing assumes that the workload is mostly the same over longer
periods. But at startup there is typically some spike of activity which
is relatively short. If many groups are multiplexing the one group
seeing the spike, and which is then scaled up over the time to run all
groups, may see a significant error.
Also in general it's often not useful to measure the startup, because it
is so different from the rest.
One way around this is to use interval mode and discard the first
sample, but this can be awkward because interval mode doesn't support
intervals of less than 100ms, and also a useful interval is not
necessarily the same as a useful startup delay.
This patch adds a new --initial-delay / -D option to skip measuring for
the startup phase. The time can be specified in ms
Here's a simple example:
perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
...
3,721 page-faults
...
If we just wait 20 ms the number of page faults is 1/3 less:
perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done'
...
2,823 page-faults
...
So we filtered out most of the startup noise from bash.
Andi Kleen [Sat, 3 Aug 2013 00:41:09 +0000 (17:41 -0700)]
perf evlist: Remove obsolete dummy execve
Minor cleanup.
The dummy execve to pre-resolve the PLT is obsolete since
"enable_on_execve" was added. The counters are only
running after the execve anyways. So just remove it.
David Ahern [Fri, 26 Jul 2013 14:27:23 +0000 (08:27 -0600)]
perf tools: Fix compile of util/tsc.c
On Fedora 18, with gcc 4.6.4 compile fails with:
arch/x86/util/tsc.c: In function ‘perf_time_to_tsc’:
arch/x86/util/tsc.c:13:6: error: declaration of ‘time’ shadows a global declaration [-Werror=shadow]
cc1: all warnings being treated as errors
make: *** [/tmp/junk/arch/x86/util/tsc.o] Error 1
make: *** Waiting for unfinished jobs....
Jiri Olsa [Wed, 10 Oct 2012 15:39:03 +0000 (17:39 +0200)]
perf tools: Add 'S' event/group modifier to read sample value
Adding 'S' event/group modifier to specify that the event value/s are
read by PERF_SAMPLE_READ sample type processing, instead of the period
value offered by lower layers.
There's additional behaviour change for 'S' modifier being specified on
event group:
Currently all the events within a group makes samples. If user now
specifies 'S' within group modifier, only the leader will trigger
samples. The rest of events in the group will have sampling disabled.
And same as for single events, values of all events within the group
(including leader) are read by PERF_SAMPLE_READ sample type processing.
Following example will create event group with cycles and cache-misses
events, setting the cycles as group leader and the only event to
actually sample. Both cycles and cache-misses event period values are
read by PERF_SAMPLE_READ sample type processing with PERF_FORMAT_GROUP
read format.
Example:
$ perf record -e '{cycles,cache-misses}:S' ls
...
$ perf report --group --show-total-period --stdio
...
# Samples: 36 of event 'anon group { cycles, cache-misses }'
# Event count (approx.): 12585593
#
# Overhead Period Command Shared Object Symbol
# .............. .............. ....... ................. ..........................
#
19.92% 1.20% 2505936 31 ls [kernel.kallsyms] [k] mark_held_locks
13.74% 0.47% 1729327 12 ls [kernel.kallsyms] [k] sched_clock_local
13.64% 23.72% 1716147 612 ls ld-2.14.90.so [.] check_match.10805
13.12% 23.22% 1650778 599 ls libc-2.14.90.so [.] _nl_intern_locale_data
11.24% 29.19% 1414554 753 ls [kernel.kallsyms] [k] sched_clock_cpu
8.50% 0.35% 1070150 9 ls [kernel.kallsyms] [k] check_chain_key
...
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-iyoinu3axi11mymwnh2b7fxj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 10 Oct 2012 16:52:24 +0000 (18:52 +0200)]
perf evsel: Add PERF_SAMPLE_READ sample related processing
For sample with sample type PERF_SAMPLE_READ the period value is stored
in the 'struct sample_read'.
Moreover if the read format has PERF_FORMAT_GROUP, the 'struct
sample_read' contains period values for all events in the group (for
which the sample's event is a leader).
We deliver separated samples for all the values contained within the
'struct sample_read'.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-6mdm5xkrm6kypouh1c33cyys@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 12 Oct 2012 11:02:21 +0000 (13:02 +0200)]
perf evlist: Fix event ID retrieval for group format read case
We need to fail the event ID retrieval in case both following conditions
are true:
- we are on kernel with no PERF_EVENT_IOC_ID support
- PERF_FORMAT_GROUP read format is set
The PERF_FORMAT_GROUP read format bit is the killer for retrieving event
ID out of the read syscall, because we have no guarantee of the event
placement within leader kernel sibling list.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-e93pgyj20rqx48qzw10vj4r4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 10 Oct 2012 15:38:13 +0000 (17:38 +0200)]
perf tools: Add support for parsing PERF_SAMPLE_READ sample type
Adding support to parse out the PERF_SAMPLE_READ sample bits. The code
contains both single and group format specification.
This code parse out and prepare PERF_SAMPLE_READ data into the
perf_sample struct. It will be used for group leader sampling feature
comming in shortly.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-0tgdoln5rwk3wocshb442cl3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 4 Apr 2012 17:32:27 +0000 (19:32 +0200)]
perf evlist: Use PERF_EVENT_IOC_ID perf ioctl to read event id
Changing the way we retrieve the event ID. Instead of parsing out
the ID out of the read data, using the PERF_EVENT_IOC_ID ioctl.
Keeping the old way in place to support kernels without
PERF_EVENT_IOC_ID ioctl support.
This will be useful for retrieving the event ID for events
with PERF_FORMAT_GROUP read format set, where it's impossible
to get correct event id out of the read call data.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-psgb4n7kte8e6tfenbe7nj2h@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 15 Oct 2012 18:13:45 +0000 (20:13 +0200)]
perf: Do not get values from disabled counters in group format read
It's possible some of the counters in the group could be
disabled when sampling member of the event group is reading
the rest via PERF_SAMPLE_READ sample type processing. Disabled
counters could then produce wrong numbers.
Fixing that by reading only enabled counters for PERF_SAMPLE_READ
sample type processing.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-wwkjb0bbcuslnz0klrmqi26r@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Wed, 24 Oct 2012 11:37:58 +0000 (13:37 +0200)]
perf: Add PERF_EVENT_IOC_ID ioctl to return event ID
The only way to get the event ID is by reading the event fd,
followed by parsing the ID value out of the returned data.
While this is ok for current read format used by perf tool,
it is not ok when we use PERF_FORMAT_GROUP format.
With this format the data are returned for the whole group
and there's no way to find out what ID belongs to our fd
(if we are not group leader event).
Adding a simple ioctl that returns event primary ID for given fd.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-v1bn5cto707jn0bon34afqr1@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
A perf event can be used without forcing the tick to
stay alive if it doesn't use a frequency but a sample
period and if it doesn't throttle (raise storm of events).
Since the lockup detector neither use a perf event frequency
nor should ever throttle due to its high period, it can now
run concurrently with the full dynticks feature.
So remove the hack that disabled the watchdog.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Anish Singh <anish198519851985@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-9-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently the full dynticks subsystem keep the
tick alive as long as there are perf events running.
This prevents the tick from being stopped as long as features
such that the lockup detectors are running. As a temporary fix,
the lockup detector is disabled by default when full dynticks
is built but this is not a long term viable solution.
To fix this, only keep the tick alive when an event configured
with a frequency rather than a period is running on the CPU,
or when an event throttles on the CPU.
These are the only purposes of the perf tick, especially now that
the rotation of flexible events is handled from a seperate hrtimer.
The tick can be shutdown the rest of the time.
Original-patch-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-8-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
In case of allocation failure, get_callchain_buffer() keeps the
refcount incremented for the current event.
As a result, when get_callchain_buffers() returns an error,
we must cleanup what it did by cancelling its last refcount
with a call to put_callchain_buffers().
This is a hack in order to be able to call free_event()
after that failure.
The original purpose of that was to simplify the failure
path. But this error handling is actually counter intuitive,
ugly and not very easy to follow because one expect to
see the resources used to perform a service to be cleaned
by the callee if case of failure, not by the caller.
So lets clean this up by cancelling the refcount from
get_callchain_buffer() in case of failure. And correctly free
the event accordingly in perf_event_alloc().
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-3-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf: Fix branch stack refcount leak on callchain init failure
On callchain buffers allocation failure, free_event() is
called and all the accounting performed in perf_event_alloc()
for that event is cancelled.
But if the event has branch stack sampling, it is unaccounted
as well from the branch stack sampling events refcounts.
This is a bug because this accounting is performed after the
callchain buffer allocation. As a result, the branch stack sampling
events refcount can become negative.
To fix this, move the branch stack event accounting before the
callchain buffer allocation.
Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374539466-4799-2-git-send-email-fweisbec@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 4 Jul 2013 04:56:46 +0000 (12:56 +0800)]
sched: Micro-optimize the smart wake-affine logic
Smart wake-affine is using node-size as the factor currently, but the overhead
of the mask operation is high.
Thus, this patch introduce the 'sd_llc_size' percpu variable, which will record
the highest cache-share domain size, and make it to be the new factor, in order
to reduce the overhead and make it more reasonable.
Tested-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Michael Wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/51D5008E.6030102@linux.vnet.ibm.com
[ Tidied up the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michael Wang [Thu, 4 Jul 2013 04:55:51 +0000 (12:55 +0800)]
sched: Implement smarter wake-affine logic
The wake-affine scheduler feature is currently always trying to pull
the wakee close to the waker. In theory this should be beneficial if
the waker's CPU caches hot data for the wakee, and it's also beneficial
in the extreme ping-pong high context switch rate case.
Testing shows it can benefit hackbench up to 15%.
However, the feature is somewhat blind, from which some workloads
such as pgbench suffer. It's also time-consuming algorithmically.
Testing shows it can damage pgbench up to 50% - far more than the
benefit it brings in the best case.
So wake-affine should be smarter and it should realize when to
stop its thankless effort at trying to find a suitable CPU to wake on.
This patch introduces 'wakee_flips', which will be increased each
time the task flips (switches) its wakee target.
So a high 'wakee_flips' value means the task has more than one
wakee, and the bigger the number, the higher the wakeup frequency.
Now when making the decision on whether to pull or not, pay attention to
the wakee with a high 'wakee_flips', pulling such a task may benefit
the wakee. Also imply that the waker will face cruel competition later,
it could be very cruel or very fast depends on the story behind
'wakee_flips', waker therefore suffers.
Furthermore, if waker also has a high 'wakee_flips', that implies that
multiple tasks rely on it, then waker's higher latency will damage all
of them, so pulling wakee seems to be a bad deal.
Thus, when 'waker->wakee_flips / wakee->wakee_flips' becomes
higher and higher, the cost of pulling seems to be worse and worse.
The patch therefore helps the wake-affine feature to stop its pulling
work when:
The 'factor' here is the number of CPUs in the current CPU's NUMA node,
so a bigger node will lead to more pulling since the trial becomes more
severe.
After applying the patch, pgbench shows up to 40% improvements and no regressions.
Tested with 12 cpu x86 server and tip 3.10.0-rc7.
The percentages in the final column highlight the areas with the biggest wins,
all other areas improved as well:
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
[ Improved the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vladimir Davydov [Mon, 15 Jul 2013 13:49:19 +0000 (17:49 +0400)]
sched: Move h_load calculation to task_h_load()
The bad thing about update_h_load(), which computes hierarchical load
factor for task groups, is that it is called for each task group in the
system before every load balancer run, and since rebalance can be
triggered very often, this function can eat really a lot of cpu time if
there are many cpu cgroups in the system.
Although the situation was improved significantly by commit a35b646
('sched, cgroup: Reduce rq->lock hold times for large cgroup
hierarchies'), the problem still can arise under some kinds of loads,
e.g. when cpus are switching from idle to busy and back very frequently.
For instance, when I start 1000 of processes that wake up every
millisecond on my 8 cpus host, 'top' and 'perf top' show:
Then if I create 10000 *idle* cpu cgroups (no processes in them), cpu
usage increases significantly although the 'wakers' are still executing
in the root cpu cgroup:
This happens because this particular kind of load triggers 'new idle'
rebalance very frequently, which requires calling update_h_load(),
which, in turn, calls tg_load_down() for every *idle* cpu cgroup even
though it is absolutely useless, because idle cpu cgroups have no tasks
to pull.
This patch tries to improve the situation by making h_load calculation
proceed only when h_load is really necessary. To achieve this, it
substitutes update_h_load() with update_cfs_rq_h_load(), which computes
h_load only for a given cfs_rq and all its ascendants, and makes the
load balancer call this function whenever it considers if a task should
be pulled, i.e. it moves h_load calculations directly to task_h_load().
For h_load of the same cfs_rq not to be updated multiple times (in case
several tasks in the same cgroup are considered during the same balance
run), the patch keeps the time of the last h_load update for each cfs_rq
and breaks calculation when it finds h_load to be uptodate.
The benefit of it is that h_load is computed only for those cfs_rq's,
which really need it, in particular all idle task groups are skipped.
Although this, in fact, moves h_load calculation under rq lock, it
should not affect latency much, because the amount of work done under rq
lock while trying to pull tasks is limited by sched_nr_migrate.
After the patch applied with the setup described above (1000 wakers in
the root cgroup and 10000 idle cgroups), I get:
Adrian Hunter [Fri, 28 Jun 2013 13:22:19 +0000 (16:22 +0300)]
perf tools: Add test for converting perf time to/from TSC
The test uses the newly added cap_usr_time_zero and time_zero of
perf_event_mmap_page. TSC from rdtsc is compared with the time
from 2 perf events. The test passes if the calculated times are
all in the correct order.
Adrian Hunter [Fri, 28 Jun 2013 13:22:18 +0000 (16:22 +0300)]
perf/x86: Add ability to calculate TSC from perf sample timestamps
For modern CPUs, perf clock is directly related to TSC. TSC
can be calculated from perf clock and vice versa using a simple
calculation. Two of the three componenets of that calculation
are already exported in struct perf_event_mmap_page. This patch
exports the third.
kprobes/x86: Call out into INT3 handler directly instead of using notifier
In fd4363fff3d96 ("x86: Introduce int3 (breakpoint)-based
instruction patching"), the mechanism that was introduced for
notifying alternatives code from int3 exception handler that and
exception occured was die_notifier.
This is however problematic, as early code might be using jump
labels even before the notifier registration has been performed,
which will then lead to an oops due to unhandled exception. One
of such occurences has been encountered by Fengguang:
Fix this by directly calling poke_int3_handler() from the int3
exception handler (analogically to what ftrace has been doing
already), instead of relying on notifier, registration of which
might not have yet been finalized by the time of the first trap.
For global/local weights, I'm not entirely sure to place them into the
memory dimension. But it's the only user at this time.
Well TSX is another (in fact the original) user of the flags, and it
needs them to be common. So move local/global weight back to the common
sort keys.
Jiri Olsa [Mon, 22 Jul 2013 12:43:33 +0000 (14:43 +0200)]
perf tests: Add 'make install/install-bin' tests into tests/make
Adding 'make install' and 'make install-bin' tests into tests/make. It's
run as part of the suite, but could be run separately like:
$ make -f tests/make make_install
- make_install: cd . && make -f Makefile DESTDIR=/tmp/tmp.LpkYbk5pfs install
test: test -x /tmp/tmp.LpkYbk5pfs/bin/perf
$ make -f tests/make make_install_bin
- make_install_bin: cd . && make -f Makefile DESTDIR=/tmp/tmp.dMxePBMcFT
install-bin
test: test -x /tmp/tmp.dMxePBMcFT/bin/perf
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1374497014-2817-5-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding TMP_DEST tests/make variable to provide the DESTDIR directory for
installation tests.
Adding this to existing test targets, since DESTDIR variable 'should
not' affect other than install* targets. We can always separate this if
there's a need for DESTDIR-free build test.
Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1374497014-2817-4-git-send-email-jolsa@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perl.h from new Perl release doesn't like -Wundef and -Wswitch-default:
/usr/lib/perl5/core_perl/CORE/perl.h:548:5: error: "SILENT_NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if SILENT_NO_TAINT_SUPPORT && !defined(NO_TAINT_SUPPORT)
^
/usr/lib/perl5/core_perl/CORE/perl.h:556:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3471:0,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/sv.h:1455:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/perl.h:3472:0,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/regexp.h:436:5: error: "NO_TAINT_SUPPORT" is not defined [-Werror=undef]
#if NO_TAINT_SUPPORT
^
In file included from /usr/lib/perl5/core_perl/CORE/hv.h:592:0,
from /usr/lib/perl5/core_perl/CORE/perl.h:3480,
from util/scripting-engines/trace-event-perl.c:30:
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_siphash_2_4’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:222:3: error: switch missing default case [-Werror=switch-default]
switch( left )
^
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_superfast’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:274:5: error: switch missing default case [-Werror=switch-default]
switch (rem) { \
^
/usr/lib/perl5/core_perl/CORE/hv_func.h: In function ‘S_perl_hash_murmur3’:
/usr/lib/perl5/core_perl/CORE/hv_func.h:398:5: error: switch missing default case [-Werror=switch-default]
switch(bytes_in_carry) { /* how many bytes in carry */
^
Let's disable the warnings for code which uses perl.h.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1372063394-20126-1-git-send-email-kirill@shutemov.name Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Support callchain sorting based on addresses
With programs with very large functions it can be useful to distinguish
the callgraph nodes on more than just function names. So for example if
you have multiple calls to the same function, it ends up being separate
nodes in the chain.
This patch adds a new key field to the callgraph options, that allows
comparing nodes on functions (as today, default) and addresses.
Longer term it would be nice to also handle src lines, but that would
need more changes and address is a reasonable proxy for it today.
I right now reference the global params, as there was no simple way to
register a params pointer.
The glibc calloc() function has an optimization to not explicitely
memset() very large calloc allocations that just came from mmap(),
because they are known to be zero.
This could result in the perf memcpy benchmark reading only from
the zero page, which gives unrealistic results.
Always call memset explicitly on the source area to avoid this problem.
David Ahern [Thu, 18 Jul 2013 23:27:59 +0000 (17:27 -0600)]
perf evsel: Handle ENODEV on default cycles event
Some systems (e.g., VMs on qemu-0.13 with the default vcpu model) report
an unsupported CPU model:
Performance Events: unsupported p6 CPU model 2 no PMU driver, software events only.
Subsequent invocations of perf fail with:
The sys_perf_event_open() syscall returned with 19 (No such device) for event (cycles).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Add ENODEV to the list of errno's to fallback to cpu-clock.
Signed-off-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1374190079-28507-1-git-send-email-dsahern@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Thu, 18 Jul 2013 22:06:15 +0000 (16:06 -0600)]
perf script: Fix named threads support
Commit 73994dc broke named thread support in perf-script. The thread
struct in al is the main thread for a multithreaded process. The thread
struct used for analysis (e.g., dumping events) should be the specific
thread for the sample.
kprobes/x86: Use text_poke_bp() instead of text_poke_smp*()
Use text_poke_bp() for optimizing kprobes instead of
text_poke_smp*(). Since the number of kprobes is usually not so
large (<100) and text_poke_bp() is much lighter than
text_poke_smp() [which uses stop_machine()], this just stops
using batch processing.
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Jiri Kosina <jkosina@suse.cz> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Jason Baron <jbaron@akamai.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Borislav Petkov <bpetkov@suse.de> Link: http://lkml.kernel.org/r/20130718114750.26675.9174.stgit@mhiramat-M0-7522 Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull networking fixes from David Miller:
"A couple interesting SKB fragment handling fixes, plus the usual small
bits here and there:
1) Fix 64-bit divide build failure on 32-bit platforms in mlx5, from
Tim Gardner.
2) Get rid of a stupid reimplementation on "%*phC" in our sysfs MAC
address printing helper.
3) Fix NETIF_F_SG capability advertisement in hyperv driver, if the
device can't do checksumming offloads then it shouldn't say it can
do SG either. From Haiyang Zhang.
4) bgmac needs to depend on PHYLIB, from Hauke Mehrtens.
5) Don't leak DMA mappings on mapping failures, from Neil Horman.
6) We need to reset the transport header of SKBs in ipv4 before we
attempt to perform early socket demux, just like ipv6 does. From
Eric Dumazet.
7) Add missing locking on vxlan device removal, from Stephen
Hemminger.
8) xen-netfront has to make two passes over an SKB to prepare it for
transfer. One pass calculates the number of slots needed, the
second massages the SKB and fills the slots. Unfortunately, the
first pass doesn't calculate the number of slots properly so we
can end up trying to build a MAX_SKB_FRAGS + 1 SKB which doesn't
work out so well. Fix from Jan Beulich with help and discussion
with several others.
9) Fix a similar problem in tun and macvtap, which have to split up
scatter-gather elements at PAGE_SIZE boundaries. Don't do
zerocopy if it would result in a > MAX_SKB_FRAGS skb. Fixes from
Jason Wang.
10) On receive, once we've decoded the VLAN state completely, clear
skb->vlan_tci. Otherwise demuxed tunnels underneath can trigger
the VLAN code again, corrupting the packet. Fix from Eric
Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
vlan: fix a race in egress prio management
vlan: mask vlan prio bits
macvtap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
tuntap: do not zerocopy if iov needs more pages than MAX_SKB_FRAGS
pkt_sched: sch_qfq: remove a source of high packet delay/jitter
xen-netfront: pull on receive skb may need to happen earlier
vxlan: add necessary locking on device removal
hyperv: Fix the NETIF_F_SG flag setting in netvsc
net: Fix sysfs_format_mac() code duplication.
be2net: Fix to avoid hardware workaround when not needed
macvtap: do not assume 802.1Q when send vlan packets
macvtap: fix the missing ret value of TUNSETQUEUE
ipv4: set transport header earlier
mlx5 core: Fix __udivdi3 when compiling for 32 bit arches
bgmac: add dependency to phylib
net/irda: fixed style issues in irlan_eth
ethtool: fixed trailing statements in ethtool
ndisc: bool initializations should use true and false
atl1e: unmap partially mapped skb on dma error and free skb
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
"Trying again to get the fixes queue, including the fixed IDT alignment
patch.
The UEFI patch is by far the biggest issue at hand: it is currently
causing quite a few machines to boot. Which is sad, because the only
reason they would is because their BIOSes touch memory that has
already been freed. The other major issue is that we finally have
tracked down the root cause of a significant number of machines
failing to suspend/resume"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Make sure IDT is page aligned
x86, suspend: Handle CPUs which fail to #GP on RDMSR
x86/platform/ce4100: Add header file for reboot type
Revert "UEFI: Don't pass boot services regions to SetVirtualAddressMap()"
efivars: check for EFI_RUNTIME_SERVICES