Vinson Lee [Mon, 4 Apr 2016 22:07:39 +0000 (22:07 +0000)]
perf config: Fix build with older toolchain.
Fix build error on Ubuntu 12.04.5 with GCC 4.6.3.
CC util/config.o
util/config.c: In function ‘perf_buildid_config’:
util/config.c:384:15: error: declaration of ‘dirname’ shadows a global declaration [-Werror=shadow]
Signed-off-by: Vinson Lee <vlee@freedesktop.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Taeung Song <treeze.taeung@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Fixes: 9cb5987c8227 ("perf config: Rework buildid_dir_command_config to perf_buildid_config") Link: http://lkml.kernel.org/r/1459807659-9020-1-git-send-email-vlee@freedesktop.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge tag 'perf-core-for-mingo-20160401' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
User visible changes:
- Do not use events that don't have timestamps when setting 'perf trace's
base timestamp, fixing up the timestamp column for syscalls (Arnaldo Carvalho de Melo)
- Make the 'bpf-output' sample_type be the same as tracepoint's, fixing up
'perf trace's timestamp column for bpf events (Wang Nan)
- Fix PMU term format max value calculation (Kan Liang)
- Pretty print 'seccomp', 'getrandom' syscalls in 'perf trace' (Arnaldo Carvalho de Melo)
Infrastructure changes:
- Add support for using TSC as an ARCH timestamp when synthesizing
JIT records (Adrian Hunter)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Wang Nan <wangnan0@huawei.com> Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: pi3orama@163.com Link: http://lkml.kernel.org/r/1459517202-42320-1-git-send-email-wangnan0@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-p7m8llv81iv55ekxexdp5n57@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf trace: Introduce function to set the base timestamp
That is used in both live runs, i.e.:
# trace ls
As when processing events recorded in a perf.data file:
# trace -i perf.data
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-901l6yebnzeqg7z8mbaf49xb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Wed, 30 Mar 2016 19:16:15 +0000 (12:16 -0700)]
perf tools: Fix PMU term format max value calculation
Currently the max value of format is calculated by the bits number. It
relies on the continuity of the format.
However, uncore event format is not continuous. E.g. uncore qpi event
format can be 0-7,21.
If bit 21 is set, there is parsing issues as below.
$ perf stat -a -e uncore_qpi_0/event=0x200002,umask=0x8/
event syntax error: '..pi_0/event=0x200002,umask=0x8/'
\___ value too big for format, maximum is 511
This patch return the real max value by setting all possible bits to 1.
Signed-off-by: Kan Liang <kan.liang@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1459365375-14285-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 8 Mar 2016 08:38:50 +0000 (10:38 +0200)]
perf jit: Add support for using TSC as a timestamp
Intel PT uses TSC as a timestamp, so add support for using TSC instead
of the monotonic clock. Use of TSC is selected by an environment
variable "JITDUMP_USE_ARCH_TIMESTAMP" and flagged in the jitdump file
with flag JITDUMP_FLAGS_ARCH_TIMESTAMP.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1457426330-30226-1-git-send-email-adrian.hunter@intel.com
[ Added the fixup from He Kuang to make it build on other arches, ]
[ such as aarch64, to avoid inserting this bisectiong breakage upstream ] Link: http://lkml.kernel.org/r/1459482572-129494-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adrian Hunter [Tue, 8 Mar 2016 08:38:44 +0000 (10:38 +0200)]
perf tools: Add time conversion event
Intel PT uses the time members from the perf_event_mmap_page to convert
between TSC and perf time.
Due to a lack of foresight when Intel PT was implemented, those time
members were recorded in the (implementation dependent) AUXTRACE_INFO
event, the structure of which is generally inaccessible outside of the
Intel PT decoder. However now the conversion between TSC and perf time
is needed when processing a jitdump file when Intel PT has been used for
tracing.
So add a user event to record the time members. 'perf record' will
synthesize the event if the information is available. And session
processing will put a copy of the event on the session so that tools
like 'perf inject' can easily access it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-gca0n1p3aca3depey703ph2q@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Milian Wolff <milian.wolff@kdab.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-t369uckshlwp4evkks4bcoo7@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We catch this record to provide a visual indication that events are
getting lost, then call the default method to allow extra logging shared
with the other tools to take place.
This extra logging was done twice because we were continuing to the
"default" clause where machine__process_event() will end up calling
machine__process_lost_event() again, fix it.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wus2zlhw3qo24ye84ewu4aqw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wang Nan [Mon, 28 Mar 2016 06:41:31 +0000 (06:41 +0000)]
perf/ring_buffer: Prepare writing into the ring-buffer from the end
Convert perf_output_begin() to __perf_output_begin() and make the later
function able to write records from the end of the ring-buffer.
Following commits will utilize the 'backward' flag.
This is the core patch to support writing to the ring-buffer backwards,
which will be introduced by upcoming patches to support reading from
overwritable ring-buffers.
In theory, this patch should not introduce any extra performance
overhead since we use always_inline, but it does not hurt to double
check that assumption:
When CONFIG_OPTIMIZE_INLINING is disabled, the output object is nearly
identical to original one. See:
When CONFIG_OPTIMIZE_INLINING is enabled, the resuling object file becomes
smaller:
$ size kernel/events/ring_buffer.o*
text data bss dec hex filename
4641 4 8 4653 122d kernel/events/ring_buffer.o.old
4545 4 8 4557 11cd kernel/events/ring_buffer.o.new
Performance testing results:
Calling 3000000 times of 'close(-1)', use gettimeofday() to check
duration. Use 'perf record -o /dev/null -e raw_syscalls:*' to capture
system calls. In ns.
Testing environment:
CPU : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Kernel : v4.5.0
MEAN STDVAR
BASE 800214.950 2853.083
PRE 2253846.700 9997.014
POST 2257495.540 8516.293
Where 'BASE' is pure performance without capturing. 'PRE' is test
result of pure 'v4.5.0' kernel. 'POST' is test result after this
patch.
Considering the stdvar, this patch doesn't hurt performance, within
noise margin.
Wang Nan [Mon, 28 Mar 2016 06:41:30 +0000 (06:41 +0000)]
perf/core: Set event's default ::overflow_handler()
Set a default event->overflow_handler in perf_event_alloc() so don't
need to check event->overflow_handler in __perf_event_overflow().
Following commits can give a different default overflow_handler.
Since the default value of event->overflow_handler is not NULL, existing
'if (!overflow_handler)' checks need to be changed.
is_default_overflow_handler() is introduced for this.
No extra performance overhead is introduced into the hot path because in the
original code we still need to read this handler from memory. A conditional
branch is avoided so actually we remove some instructions.
Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <pi3orama@163.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1459147292-239310-3-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Wang Nan [Mon, 28 Mar 2016 06:41:29 +0000 (06:41 +0000)]
perf/ring_buffer: Introduce new ioctl options to pause and resume the ring-buffer
Add new ioctl() to pause/resume ring-buffer output.
In some situations we want to read from the ring-buffer only when we
ensure nothing can write to the ring-buffer during reading. Without
this patch we have to turn off all events attached to this ring-buffer
to achieve this.
This patch is a prerequisite to enable overwrite support for the
perf ring-buffer support. Following commits will introduce new methods
support reading from overwrite ring buffer. Before reading, caller
must ensure the ring buffer is frozen, or the reading is unreliable.
Signed-off-by: Wang Nan <wangnan0@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <pi3orama@163.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Brendan Gregg <brendan.d.gregg@gmail.com> Cc: He Kuang <hekuang@huawei.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1459147292-239310-2-git-send-email-wangnan0@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jiri Olsa [Wed, 16 Mar 2016 14:34:29 +0000 (15:34 +0100)]
ftrace/perf: Check sample types only for sampling events
Currently we check sample type for ftrace:function events
even if it's not created as a sampling event. That prevents
creating ftrace_function event in counting mode.
Make sure we check sample types only for sampling events.
Before:
$ sudo perf stat -e ftrace:function ls
...
perf/x86/intel/bts: Move transaction start/stop to start/stop callbacks
As per AUX buffer management requirement, AUX output has to happen between
pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
and therefore perf can free the AUX data after it has called pmu::stop.
This patch moves perf_aux_output_{begin,end} from bts_event_{add,del} to
bts_event_{start,stop}. As a bonus, we get rid of bts_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1457098969-21595-6-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel/pt: Move transaction start/stop to PMU start/stop callbacks
As per AUX buffer management requirement, AUX output has to happen between
pmu::start and pmu::stop calls so that perf_event_stop() actually stops it
and therefore perf can free the AUX data after it has called pmu::stop.
This patch moves perf_aux_output_{begin,end} from pt_event_{add,del} to
pt_event_{start,stop}. As a bonus, we get rid of pt_buffer_is_full(),
which is already taken care of by perf_aux_output_begin() anyway.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1457098969-21595-5-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that we can ensure that when ring buffer's AUX area is on the way
to getting unmapped new transactions won't start, we only need to stop
all events that can potentially be writing aux data to our ring buffer.
Having done that, we can safely free the AUX pages and corresponding
PMU data, as this time it is guaranteed to be the last aux reference
holder.
... which was made to defer deallocation that was otherwise possible
from an NMI context. Now it is no longer the case; the last call to
rb_free_aux() that drops the last AUX reference has to happen in
perf_mmap_close() on that AUX area.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/87d1qtz23d.fsf@ashishki-desk.ger.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/ring_buffer: Refuse to begin AUX transaction after rb->aux_mmap_count drops
When ring buffer's AUX area is unmapped and rb->aux_mmap_count drops to
zero, new AUX transactions into this buffer can still be started,
even though the buffer in en route to deallocation.
This patch adds a check to perf_aux_output_begin() for rb->aux_mmap_count
being zero, in which case there is no point starting new transactions,
in other words, the ring buffers that pass a certain point in
perf_mmap_close will not have their events sending new data, which
clears path for freeing those buffers' pages right there and then,
provided that no active transactions are holding the AUX reference.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: vince@deater.net Link: http://lkml.kernel.org/r/1457098969-21595-2-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 22 Mar 2016 21:09:18 +0000 (22:09 +0100)]
perf/core: Verify we have a single perf_hw_context PMU
There should (and can) only be a single PMU for perf_hw_context
events.
This is because of how we schedule events: once a hardware event fails to
schedule (the PMU is 'full') we stop trying to add more. The trivial
'fix' would break the Round-Robin scheduling we do.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 29 Mar 2016 12:30:35 +0000 (14:30 +0200)]
perf/x86: Move Kconfig.perf and other perf configuration bits to events/Kconfig
Ingo says:
"If we do a separate file we should have it in arch/x86/events/Kconfig
(not in arch/x86/Kconfig.perf), and also move some of the other bits,
such as PERF_EVENTS_AMD_POWER?"
Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
AMD Zeppelin (Family 17h, Model 00h) introduces an instructions
retired performance counter which is indicated by
CPUID.8000_0008H:EBX[1]. A dedicated Instructions Retired MSR register
(MSR 0xC000_000E9) increments once for every instruction retired.
Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1454056197-5893-3-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Huang Rui [Fri, 29 Jan 2016 08:29:56 +0000 (16:29 +0800)]
perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) support
AMD Carrizo (Family 15h, Model 60h) introduces a time-stamp counter
which is indicated by CPUID.8000_0001H:ECX[27]. It increments at a 100
MHz rate in all P-states, and C states, S0, or S1. The frequency is
about 100MHz. This counter will be used to calculate processor power
and other parts. So add an interface into the MSR PMU to get the PTSC
counter value.
Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1454056197-5893-2-git-send-email-ray.huang@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thomas Gleixner [Sun, 20 Mar 2016 18:59:03 +0000 (18:59 +0000)]
x86/perf/intel/cstate: Sanitize error handling
There is no point in WARN_ON() inside of a well known init function. We
already know the call stack and it's really not of critical importance whether
the registration of a PMU fails.
Aside of that for consistency reasons it's just pointless to try to register
another PMU if the first register attempt failed. There is also no value in
keeping one PMU if the second one can not be registered.
Make it consistent so we can finaly modularize the driver.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160320185623.579794064@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thomas Gleixner [Sun, 20 Mar 2016 18:59:03 +0000 (18:59 +0000)]
x86/perf/intel/cstate: Sanitize probing
The whole probing functionality can simply be expressed with model matching
and a bunch of structures describing the variants. This is a first step to
make that driver modular.
While at it, get rid of completely pointless comments and name the enums so
they are self explaining.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Reworked probing to clear msr[].attr for all !present msrs. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160320185623.500381872@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
The migration of pkg events does not work either. Not that I'm surprised.
I really could not be bothered to decode that loop mess and simply replaced it
by querying the proper cpumasks which give us the answer in a comprehensible
way.
This also requires to direct the event to the current active reader CPU in
cstate_pmu_event_init() otherwise the hotplug logic can't work.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[ Added event->cpu < 0 test to not explode] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160320185623.422519970@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Tue, 29 Mar 2016 07:26:44 +0000 (09:26 +0200)]
perf/core: Fix time tracking bug with multiplexing
Stephane reported that commit:
3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME")
introduced a regression wrt. time tracking, as easily observed by:
> This patch introduce a bug in the time tracking of events when
> multiplexing is used.
>
> The issue is easily reproducible with the following perf run:
>
> $ perf stat -a -C 0 -e branches,branches,branches,branches,branches,branches -I 1000
> 1.000730239 652,394 branches (66.41%)
> 1.000730239 597,809 branches (66.41%)
> 1.000730239 593,870 branches (66.63%)
> 1.000730239 651,440 branches (67.03%)
> 1.000730239 656,725 branches (66.96%)
> 1.000730239 <not counted> branches
>
> One branches event is shown as not having run. Yet, with
> multiplexing, all events should run especially with a 1s (-I 1000)
> interval. The delta for time_running comes out to 0. Yet, the event
> has run because the kernel is actually multiplexing the events. The
> problem is that the time tracking is the kernel and especially in
> ctx_sched_out() is wrong now.
>
> The problem is that in case that the kernel enters ctx_sched_out() with the
> following state:
> ctx->is_active=0x7 event_type=0x1
> Call Trace:
> [<ffffffff813ddd41>] dump_stack+0x63/0x82
> [<ffffffff81182bdc>] ctx_sched_out+0x2bc/0x2d0
> [<ffffffff81183896>] perf_mux_hrtimer_handler+0xf6/0x2c0
> [<ffffffff811837a0>] ? __perf_install_in_context+0x130/0x130
> [<ffffffff810f5818>] __hrtimer_run_queues+0xf8/0x2f0
> [<ffffffff810f6097>] hrtimer_interrupt+0xb7/0x1d0
> [<ffffffff810509a8>] local_apic_timer_interrupt+0x38/0x60
> [<ffffffff8175ca9d>] smp_apic_timer_interrupt+0x3d/0x50
> [<ffffffff8175ac7c>] apic_timer_interrupt+0x8c/0xa0
>
> In that case, the test:
> if (is_active & EVENT_TIME)
>
> will be false and the time will not be updated. Time must always be updated on
> sched out.
Fix this by always updating time if EVENT_TIME was set, as opposed to
only updating time when EVENT_TIME changed.
Reported-by: Stephane Eranian <eranian@google.com> Tested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: kan.liang@intel.com Cc: namhyung@kernel.org Fixes: 3cbaa5906967 ("perf: Fix ctx time tracking by introducing EVENT_TIME") Link: http://lkml.kernel.org/r/20160329072644.GB3408@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 31 Mar 2016 06:33:43 +0000 (08:33 +0200)]
Merge tag 'perf-core-for-mingo-20160330' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes:
User visible changes:
- Add support for skipping itrace instructions, useful to fast forward
processor trace (Intel PT, BTS) to right after initialization code at the start
of a workload (Andi Kleen)
- Add support for backtraces in perl 'perf script's (Dima Kogan)
- Add -U/-K (--all-user/--all-kernel) options to 'perf mem' (Jiri Olsa)
- Make -f/--force option documentation consistent across tools (Jiri Olsa)
Infrastructure changes:
- Add 'perf test' to check for event times (Jiri Olsa)
- 'perf config' cleanups (Taeung Song)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Thu, 31 Mar 2016 06:27:35 +0000 (08:27 +0200)]
Merge tag 'perf-urgent-for-mingo-20160330' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
- Fix determination of a callchain node's childlessness in
the top/report TUI, which was preventing navigating some
callchains, --stdio unnaffected (Andres Freund)
- Fix jitdump's genelf assumption that PowerPC is big endian
only (Anton Blanchard)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Andres Freund [Wed, 30 Mar 2016 19:02:45 +0000 (21:02 +0200)]
perf hists: Fix determination of a callchain node's childlessness
The 4b3a3212233a ("perf hists browser: Support flat callchains") commit
over-aggressively tried to optimize callchain_node__init_have_children().
That lead to --tui mode not allowing to expand call chain elements if a
call chain element had only one parent. That's why --inverted callgraphs
looked halfway sane, but plain ones didn't.
Revert that individual optimization, it wasn't really related to the
rest of the commit.
Signed-off-by: Andres Freund <andres@anarazel.de> Acked-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 4b3a3212233a ("perf hists browser: Support flat callchains") Link: http://lkml.kernel.org/r/20160330190245.GB13305@awork2.anarazel.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Mon, 28 Mar 2016 17:45:38 +0000 (10:45 -0700)]
perf tools: Add support for skipping itrace instructions
When using 'perf script' to look at PT traces it is often useful to
ignore the initialization code at the beginning.
On larger traces which may have many millions of instructions in
initialization code doing that in a pipeline can be very slow, with perf
script spending a lot of CPU time calling printf and writing data.
This patch adds an extension to the --itrace argument that skips 'n'
events (instructions, branches or transactions) at the beginning. This
is much more efficient.
v2:
Add support for BTS (Adrian Hunter)
Document in itrace.txt
Fix branch check
Check transactions and instructions too
Committer note:
To test intel_pt one needs to make sure VT-x isn't active, i.e.
stopping KVM guests on the test machine, as described by Andi Kleen
at http://lkml.kernel.org/r/20160301234953.GD23621@tassilo.jf.intel.com
Signed-off-by: Andi Kleen <ak@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1459187142-20035-1-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Dima Kogan [Tue, 29 Mar 2016 15:47:53 +0000 (12:47 -0300)]
perf script perl: Perl scripts now get a backtrace, like the python ones
We have some infrastructure to use perl or python to analyze logs
generated by perf. Prior to this patch, only the python tools had
access to backtrace information. This patch makes this information
available to perl scripts as well. Example:
Let's look at malloc() calls made by the seq utility. First we
create a probe point:
$ perf probe -x /lib/x86_64-linux-gnu/libc.so.6 malloc
Added new events:
...
Now we run seq, while monitoring malloc() calls with perf
$ perf record --call-graph=dwarf -e probe_libc:malloc seq 5
1
2
3
4
5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.064 MB perf.data (6 samples) ]
We can use perf to look at its log to see the malloc calls and the backtrace
Taeung Song [Sun, 27 Mar 2016 17:22:19 +0000 (02:22 +0900)]
perf config: Rework buildid_dir_command_config to perf_buildid_config
To avoid repeated calling perf_config() remove
buildid_dir_command_config() and add new perf_buildid_config into
perf_default_config.
Because perf_config() is already called with perf_default_config at
main().
Signed-off-by: Taeung Song <treeze.taeung@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1459099340-16911-2-git-send-email-treeze.taeung@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Thu, 24 Mar 2016 12:52:20 +0000 (13:52 +0100)]
perf tests: Add test to check for event times
This test creates software event 'cpu-clock' attaches it in several ways
and checks that enabled and running times match.
Committer notes:
Testing it:
[acme@jouet linux]$ perf test -v times
44: Test events times :
--- start ---
test child forked, pid 27170
attaching to spawned child, enable on exec
OK : ena 307328, run 307328
attaching to current thread as enabled
OK : ena 7826, run 7826
attaching to current thread as disabled
OK : ena 738, run 738
attaching to CPU 0 as enabled
SKIP : not enough rights
attaching to CPU 0 as enabled
SKIP : not enough rights
test child finished with -2
---- end ----
Test events times: Skip
[acme@jouet linux]$
[root@jouet ~]# perf test times
44: Test events times : Ok
[root@jouet ~]# perf test -v times
44: Test events times :
--- start ---
test child forked, pid 27306
attaching to spawned child, enable on exec
OK : ena 479290, run 479290
attaching to current thread as enabled
OK : ena 11356, run 11356
attaching to current thread as disabled
OK : ena 987, run 987
attaching to CPU 0 as enabled
OK : ena 3717, run 3717
attaching to CPU 0 as enabled
OK : ena 2323, run 2323
test child finished with 0
---- end ----
Test events times: Ok
[root@jouet ~]#
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1458823940-24583-7-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Wed, 30 Mar 2016 10:31:03 +0000 (12:31 +0200)]
Merge tag 'perf-urgent-for-mingo-20160329' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Add missing initialization of perf_sample.cpumode in synthesized samples,
affects jitdump, records for pre-existing threads and records synthesized
from processor trace data, noticed while testing intel_pt events with
'perf script' (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf tools: Add missing initialization of perf_sample.cpumode in synthesized samples
In 473398a21d28 ("perf tools: Add cpumode to struct perf_sample"), I
missed some places where perf_sample fields are directly initialized in
addition to what is done in perf_evsel__parse_sample(), namely when
synthesizing PERF_RECORD_{MMAP*,COMM,FORK,EXIT} for pre-existing threads
and also in intel_pt and intel_bts when synthesizing events from
processor trace, the jitdump code also was affected, fix it.
The problem was noticed with running:
# perf record -e intel_pt//u true
# perf script
Where the samples wouldn't get resolved because perf_sample.cpumode
would be left as zero, i.e. PERF_RECORD_MISC_CPUMODE_UNKNOWN, not
resolving as kernel, hypervisor or user cpu modes.
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 473398a21d28 ("perf tools: Add cpumode to struct perf_sample") Link: http://lkml.kernel.org/n/tip-n5sdauxgk24d5nun8kuuu2mh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Tue, 29 Mar 2016 08:39:12 +0000 (10:39 +0200)]
Merge tag 'perf-urgent-for-mingo-20160328' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fix from Arnaldo Carvalho de Melo:
- Fix build break on PowerPC due to missing headers with prototypes for
functions defined in tools/perf/arch/powerpc/util/header.c (Sukadev Bhattiprolu)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 531d2410635c ("perf tools: Do not include stringify.h from the
kernel sources") seems to have accidentially removed the inclusion of
"util/header.h" from "arch/powerpc/util/header.c".
"util/header.h" provides the prototype for get_cpuid() and is needed to
build perf on Powerpc:
arch/powerpc/util/header.c:17:1: error: no previous prototype for 'get_cpuid' [-Werror=missing-prototypes]
Reported-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: 531d2410635c ("perf tools: Do not include stringify.h from the kernel sources") Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
[ Included "util.h" too, to get the scnprintf() prototype ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf bench: Fix detached tarball building due to missing 'perf bench memcpy' headers
A change on kernel files included by the 'perf bench memcpy' code grew some new
include deps, breaking the detached tarball build:
$ make -C tools/perf build-test
make: Entering directory '/home/acme/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
tests/make:302: recipe for target 'tarpkg' failed
make[1]: *** [tarpkg] Error 2
Makefile:102: recipe for target 'build-test' failed
make: *** [build-test] Error 2
make: Leaving directory '/home/acme/git/linux/tools/perf'
$ cat tools/perf/tarpkg
./tests/perf-targz-src-pkg .
PERF_VERSION = 4.5.g05f5ec
PERF_VERSION = 4.5.g05f5ec
In file included from bench/mem-memcpy-x86-64-asm.S:9:0:
bench/../../../arch/x86/lib/memcpy_64.S:5:29: fatal error: asm/cpufeatures.h: No such file or directory
compilation terminated.
mv: cannot stat ‘bench/.mem-memcpy-x86-64-asm.o.tmp’: No such file or directory
make[5]: *** [bench/mem-memcpy-x86-64-asm.o] Error 1
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [bench] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [perf-in.o] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all] Error 2
$
Add arch/*/include/asm/*features.h to tools/perf/MANIFEST so that we can
continue to use detached tarballs to build perf.
Now it builds ok, doing it manually:
$ make help | grep perf
perf-tar-src-pkg - Build perf-4.5.0.tar source tarball
perf-targz-src-pkg - Build perf-4.5.0.tar.gz source tarball
perf-tarbz2-src-pkg - Build perf-4.5.0.tar.bz2 source tarball
perf-tarxz-src-pkg - Build perf-4.5.0.tar.xz source tarball
$ ls -la perf-4.5.0.tar
ls: cannot access perf-4.5.0.tar: No such file or directory
$ make perf-tar-src-pkg
TAR
PERF_VERSION = 4.5.g32c25b
$ ls -la perf-4.5.0.tar
-rw-rw-r--. 1 acme acme 6318080 Mar 24 11:52 perf-4.5.0.tar
$ mv perf-4.5.0.tar /tmp
$ cd /tmp
$ tar xf perf-4.5.0.tar
$ cd perf-4.5.0/tools/perf
$ make > /dev/null
PERF_VERSION = 4.5.g32c25b
$ ls -la perf
-rwxrwxr-x. 1 acme acme 14046416 Mar 24 11:53 perf
$ ./perf --version
perf version 4.5.g32c25b
$ perf bench
Usage:
perf bench [<common options>] <collection> <benchmark> [<options>]
# List of all available benchmark collections:
sched: Scheduler and IPC benchmarks
mem: Memory access benchmarks
numa: NUMA scheduling and MM benchmarks
futex: Futex stressing benchmarks
all: All benchmarks
$ perf bench mem
# List of available benchmarks for collection 'mem':
memcpy: Benchmark for memcpy() functions
memset: Benchmark for memset() functions
all: Run all memory access benchmarks
$ perf bench mem memcpy
# Running 'mem/memcpy' benchmark:
# function 'default' (Default memcpy() provided by glibc)
# Copying 1MB bytes ...
15.024038 GB/sec
# function 'x86-64-unrolled' (unrolled memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
17.438616 GB/sec
# function 'x86-64-movsq' (movsq-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
25.040064 GB/sec
# function 'x86-64-movsb' (movsb-based memcpy() in arch/x86/lib/memcpy_64.S)
# Copying 1MB bytes ...
25.040064 GB/sec
$
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2c2sncwffuabw58fj1pw86gu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tests: Fix tarpkg build test error output redirection
So we need to trow away just stdout, leaving stderr to be caught by
the build tests infrastructure, so that we can see what went wrong
when the tarpkg build test fails:
$ make -C tools/perf build-test
make: Entering directory '/home/acme/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
tests/make:302: recipe for target 'tarpkg' failed
make[1]: *** [tarpkg] Error 2
Makefile:102: recipe for target 'build-test' failed
make: *** [build-test] Error 2
make: Leaving directory '/home/acme/git/linux/tools/perf'
$ cat tools/perf/tarpkg
./tests/perf-targz-src-pkg .
PERF_VERSION = 4.5.g05f5ec
PERF_VERSION = 4.5.g05f5ec
In file included from bench/mem-memcpy-x86-64-asm.S:9:0:
bench/../../../arch/x86/lib/memcpy_64.S:5:29: fatal error: asm/cpufeatures.h: No such file or directory
compilation terminated.
mv: cannot stat ‘bench/.mem-memcpy-x86-64-asm.o.tmp’: No such file or directory
make[5]: *** [bench/mem-memcpy-x86-64-asm.o] Error 1
make[5]: *** Waiting for unfinished jobs....
make[4]: *** [bench] Error 2
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [perf-in.o] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all] Error 2
$
So the test flow is:
1. Run: 'make -C tools/perf build-test'
2. One of its tests failed, in this case, the 'tarpkg' one
3. Look at what went wrong, by looking at the output of that test, in
tools/perf/tarpkg
Admittedly, this should be shortcircuited to showing what went wrong directly
from the 'make build-test' step, but lets first fix this tarpkg one and the
problem it spotted, which should be fixed by adding some extra file to the
tools/perf/MANIFEST so that detached tarballs continue being self contained and
build successfully.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ynld6egoxolmftcddpnd7oh6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Remove redundant CPU output in libtraceevent (Steven Rostedt)
- Remove 'core_id' check in topology 'perf test' (Sukadev Bhattiprolu)
Infrastructure changes/fixes:
- Record text offset in dso to calculate objdump address, to use with
modules in addition to vDSO symbol address calculations (Wang Nan)
- Move utilities.mak from perf to tools/scripts/ (Arnaldo Carvalho de Melo)
- Add cpumode to the perf_sample struct, this way we don't need to pass
the union event to the machine and thread resolving routines, shortening
function signatures and allowing the future introduction of a way
to use tracepoint events instead of the unavailable HW cycles counter on
powerpc guests in perf kvm by just hooking on perf_evsel__parse_sample,
at the end (Arnaldo Carvalho de Melo)
- Remove/unexport die() related infrastructure, that at some point will
finally be removed (Arnaldo Carvalho de Melo)
- Adopt linux/stringify.h from the kernel sources, not to touch this
kernel header from tools/ (Arnaldo Carvalho de Melo)
- Stop using strbuf for things we can instead trivially use libc's asprintf()
(Arnaldo Carvalho de Melo)
- Ditch tools/lib/util/abspath.c, its only exported function was used at just
one place and can be replaced by libc's realpath() (Arnaldo Carvalho de Melo)
- Use strerror_r() in the llvm infrastructure, tread safe, its what is used
elsewhere in tools/perf/ (Arnaldo Carvalho de Melo)
Cleanups:
- Removed misplaced or needless __maybe_unused/export (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf llvm: Use strerror_r instead of the thread unsafe strerror one
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-5njrq9dltckgm624omw9ljgu@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To kill the last user of make_nonrelative_path(), that gets ditched,
one more panicking function killed.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-3hu56rvyh4q5gxogovb6ko8a@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Unexport some methods unused outside strbuf.c
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-nq1wvtky4mpu0nupjyar7sbw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: No need to use formatting strbuf method
We have addch() for chars, add() for fixed size data, and addstr() for
variable length strings, use them.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-0ap02fn2xtvpduj2j6b2o1j4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf help: Use asprintf instead of adhoc equivalents
That doesn't chekcs malloc return and that, when using strbuf, if it
can't grow, just explodes away via die().
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-vr8qsjbwub7e892hpa9msz95@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-s87zi5d03m6rz622y1z6rlsa@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Do not include stringify.h from the kernel sources
Use instead the copy just made to tools/include/linux/.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-q736w12nwy98x5ox2hamp5ow@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tools include: Copy linux/stringify.h from the kernel
There is code in tools/ that is directly including this file from the
kernel, and this is verboten for a while, copy it so that the next csets
can fix this situation.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-e0r3nks2uai020ndghvxv5qw@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Steven Rostedt [Wed, 23 Mar 2016 14:16:28 +0000 (10:16 -0400)]
tools lib traceevent: Remove redundant CPU output
Commit a6745330789f ("tools lib traceevent: Split pevent_print_event()
into specific functionality functions") broke apart the function
pevent_print_event() into three functions.
The first function prints the comm, pid and CPU, the second prints the
timestamp.
But that commit added the printing of the CPU in the timestamp function,
which now causes pevent_print_event() to duplicate the CPU output.
Remove the redundant printing of the record's CPU from the timestamp
function.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: a6745330789f ("tools lib traceevent: Split pevent_print_event() into specific functionality functions") Link: http://lkml.kernel.org/r/20160323101628.459375d2@gandalf.local.home Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tools: Remove needless 'extern' from function prototypes
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-w246stf7ponfamclsai6b9zo@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This should die altogether, but for now lets remove a bit of this stuff,
as it is not used at all.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ade3n99xscldhg5mx2vzd8p3@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-elxg25jd4dhwod4wqbko87qh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf thread: Rename perf_event__preprocess_sample_addr to thread__resolve
Since none of the perf_event fields are used anymore, just the
perf_sample ones, and since this resolves to (map, symbol) from data
structures within struct thread, rename it to thread__resolve and make
the argument ordering similar to the one in machine__resolve().
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-2b33hs9bp550tezzlhl4kejh@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf machine: Rename perf_event__preprocess_sample to machine__resolve
Since we only deal with fields in the passed struct perf_sample move
this method to struct machine, that is where the perf_sample fields
will be resolved to a struct addr_location, i.e. thread, map, symbol,
etc.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-a1ww2lbm2vbuqsv4p7ilubu9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid parsing event->header.misc in many locations.
This will also allow setting perf.sample.{ip,cpumode} in a single place,
from tracepoint fields, as needed by 'perf kvm' with PPC guests, where
the guest hardware counters is not available at the host.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qp3yradhyt6q3wl895b1aat0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf tests: Forward the perf_sample in the dwarf unwind test
It _will_ be used, no sense in receiving it and nor fowarding it along.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-ht8v5et209wuoh5o6nh9pzyq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-8nzhnokxyp8y4v7gf0j00oyb@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jakub Jelen [Sat, 19 Mar 2016 11:58:07 +0000 (12:58 +0100)]
perf bench numa: Fix assertion for nodes bitfield
Comparing bits and bytes in numa benchmark assertion
I hit the issue on two socket Power8 machine presenting its numa nodes
as 0,1,16,17 (according to numactl). Therefore I got error (and hang of
parent process):
This is obviously false positive. We can fit all the 18 nodes into
bitfield of 8 bytes (long on 64b architecture).
Signed-off-by: Jakub Jelen <jakuje@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jakub Jelen <jjelen@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1458388687-24421-1-git-send-email-jakuje@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Sun, 20 Mar 2016 18:58:21 +0000 (11:58 -0700)]
perf/x86/intel/uncore: Remove ev_sel_ext bit support for PCU
The ev_sel_ext in PCU_MSR_PMON_CTL is locked on some CPU models, so despite
it being documented in the SDM, if we write 1 to that bit then we can get a #GP
fault.
Which #GP the perf fuzzer happily triggered in Peter Zijlstra's testing.
Also, there are no public events which use that bit, so remove ev_sel_ext
bit support for PCU.
Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1458500301-3594-1-git-send-email-kan.liang@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Huang Rui [Wed, 9 Mar 2016 05:45:06 +0000 (13:45 +0800)]
perf/x86/amd/power: Add AMD accumulated power reporting mechanism
Introduce an AMD accumlated power reporting mechanism for the Family
15h, Model 60h processor that can be used to calculate the average
power consumed by a processor during a measurement interval. The
feature support is indicated by CPUID Fn8000_0007_EDX[12].
This feature will be implemented both in hwmon and perf. The current
design provides one event to report per package/processor power
consumption by counting each compute unit power value.
Here the gory details of how the computation is done:
* Tsample: compute unit power accumulator sample period
* Tref: the PTSC counter period (PTSC: performance timestamp counter)
* N: the ratio of compute unit power accumulator sample period to the
PTSC period
* Jmax: max compute unit accumulated power which is indicated by
MSR_C001007b[MaxCpuSwPwrAcc]
* Jx/Jy: compute unit accumulated power which is indicated by
MSR_C001007a[CpuSwPwrAcc]
* Tx/Ty: the value of performance timestamp counter which is indicated
by CU_PTSC MSR_C0010280[PTSC]
* PwrCPUave: CPU average power
i. Determine the ratio of Tsample to Tref by executing CPUID Fn8000_0007.
N = value of CPUID Fn8000_0007_ECX[CpuPwrSampleTimeRatio[15:0]].
ii. Read the full range of the cumulative energy value from the new
MSR MaxCpuSwPwrAcc.
Jmax = value returned.
iii. At time x, software reads CpuSwPwrAcc and samples the PTSC.
Jx = value read from CpuSwPwrAcc and Tx = value read from PTSC.
iv. At time y, software reads CpuSwPwrAcc and samples the PTSC.
Jy = value read from CpuSwPwrAcc and Ty = value read from PTSC.
v. Calculate the average power consumption for a compute unit over
time period (y-x). Unit of result is uWatt:
Huang Rui [Thu, 14 Jan 2016 02:50:06 +0000 (10:50 +0800)]
x86/cpufeature, perf/x86: Add AMD Accumulated Power Mechanism feature flag
AMD CPU family 15h model 0x60 introduces a mechanism for measuring
accumulated power. It is used to report the processor power consumption
and support for it is indicated by CPUID Fn8000_0007_EDX[12].
Signed-off-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wan Zongshun <Vincent.Wan@amd.com> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/1452739808-11871-4-git-send-email-ray.huang@amd.com
[ Resolved conflict and moved the synthetic CPUID slot to 19. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/amd: Add support for new IOMMU performance events
This patch adds new IOMMU performance event based on
the information in table 74 of the AMD I/O Virtualization Technology
(IOMMU) Specification (Document Id: 4882, Rev 2.62, Feb 2015)
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Joerg Roedel <jroedel@suse.de> Acked-by: Joerg Roedel <jroedel@suse.de> Cc: <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://support.amd.com/TechDocs/48882_IOMMU.pdf Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vikas Shivappa [Fri, 11 Mar 2016 19:26:17 +0000 (11:26 -0800)]
perf/x86/mbm: Add support for MBM counter overflow handling
This patch adds a per package timer which periodically updates the
memory bandwidth counters for the events that are currently active.
Current patch has a periodic timer every 1s since the SDM guarantees
that the counter will not overflow in 1s but this time can be definitely
improved by calibrating on the system. The overflow is really a function
of the max memory b/w that the socket can support, max counter value and
scaling factor.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/013b756c5006b1c4ca411f3ecf43ed52f19fbf87.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vikas Shivappa [Thu, 10 Mar 2016 23:32:11 +0000 (15:32 -0800)]
perf/x86/mbm: Implement RMID recycling
RMID could be allocated or deallocated as part of RMID recycling.
When an RMID is allocated for MBM event, the MBM counter needs to be
initialized because next time we read the counter we need the previous
value to account for total bytes that went to the memory controller.
Similarly, when RMID is deallocated we need to update the ->count
variable.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-6-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Includes all the core infrastructure to measure the total_bytes and
bandwidth.
We have per socket counters for both total system wide L3 external
bytes and local socket memory-controller bytes. The OS does MSR writes
to MSR_IA32_QM_EVTSEL and MSR_IA32_QM_CTR to read the counters and
uses the IA32_PQR_ASSOC_MSR to associate the RMID with the task. The
tasks have a common RMID for CQM (cache quality of service monitoring)
and MBM. Hence most of the scheduling code is reused from CQM.
Signed-off-by: Tony Luck <tony.luck@intel.com>
[ Restructured rmid_read to not have an obvious hole, removed MBM_CNTR_MAX as its unused. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/abd7aac9a18d93b95b985b931cf258df0164746d.1457723885.git.tony.luck@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vikas Shivappa [Thu, 10 Mar 2016 23:32:09 +0000 (15:32 -0800)]
perf/x86/mbm: Add Intel Memory B/W Monitoring enumeration and init
The MBM init patch enumerates the Intel MBM (Memory b/w monitoring)
and initializes the perf events and datastructures for monitoring the
memory b/w.
Its based on original patch series by Tony Luck and Kanaka Juvva.
Memory bandwidth monitoring (MBM) provides OS/VMM a way to monitor
bandwidth from one level of cache to another. The current patches
support L3 external bandwidth monitoring. It supports both 'local
bandwidth' and 'total bandwidth' monitoring for the socket. Local
bandwidth measures the amount of data sent through the memory controller
on the socket and total b/w measures the total system bandwidth.
Extending the cache quality of service monitoring (CQM) we add two
more events to the perf infrastructure:
intel_cqm_llc/local_bytes - bytes sent through local socket memory controller
intel_cqm_llc/total_bytes - total L3 external bytes sent
The tasks are associated with a Resouce Monitoring ID (RMID) just like
in CQM and OS uses a MSR write to indicate the RMID of the task during
scheduling.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-4-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Vikas Shivappa [Thu, 10 Mar 2016 23:32:07 +0000 (15:32 -0800)]
perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
Currently CQM (cache quality of service monitoring) is grouping all
events belonging to same PID to use one RMID. However its not counting
all of these different events. Hence we end up with a count of zero
for all events other than the group leader.
The patch tries to address the issue by keeping a flag in the
perf_event.hw which has other CQM related fields. The field is updated
at event creation and during grouping.
Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com>
[peterz: Changed hw_perf_event::is_group_event to an int] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Tony Luck <tony.luck@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: fenghua.yu@intel.com Cc: h.peter.anvin@intel.com Cc: ravi.v.shankar@intel.com Cc: vikas.shivappa@intel.com Link: http://lkml.kernel.org/r/1457652732-4499-2-git-send-email-vikas.shivappa@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 29 Jan 2016 14:17:51 +0000 (15:17 +0100)]
perf/core: Fix Undefined behaviour in rb_alloc()
Sasha reported:
[ 3494.030114] UBSAN: Undefined behaviour in kernel/events/ring_buffer.c:685:22
[ 3494.030647] shift exponent -1 is negative
Andrey spotted that this is because:
It happens if nr_pages = 0:
rb->page_order = ilog2(nr_pages);
Fix it by making both assignments conditional on nr_pages; since
otherwise they should both be 0 anyway, and will be because of the
kzalloc() used to allocate the structure.
Reported-by: Sasha Levin <sasha.levin@oracle.com> Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160129141751.GA407@worktop Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 17 Mar 2016 14:17:35 +0000 (15:17 +0100)]
perf/core: Fix dynamic interrupt throttle
There were two problems with the dynamic interrupt throttle mechanism,
both triggered by the same action.
When you (or perf_fuzzer) write a huge value into
/proc/sys/kernel/perf_event_max_sample_rate the computed
perf_sample_allowed_ns becomes 0. This effectively disables the whole
dynamic throttle.
This is fixed by ensuring update_perf_cpu_limits() never sets the
value to 0. However, we allow disabling of the dynamic throttle by
writing 100 to /proc/sys/kernel/perf_cpu_time_max_percent. This will
generate a warning in dmesg.
The second problem is that by setting the max_sample_rate to a huge
number, the adaptive process can take a few tries, since it halfs the
limit each time. Change that to directly compute a new value based on
the observed duration.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 17 Mar 2016 14:06:58 +0000 (15:06 +0100)]
perf/x86/ibs: Add IBS interrupt to the dynamic throttle
Interrupt throttling is normally only done against
sysctl_perf_event_sample_rate. This means that if that number is too
high (for whatever reason) you can lock up your machine.
We have, however, a dynamic throttling scheme too, but for that to
work, we need to add a callback to the interrupt handler, IBS did not
have this, so add it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 16 Mar 2016 22:55:21 +0000 (23:55 +0100)]
perf/x86/ibs: Fix race with IBS_STARTING state
While tracing the IBS bits I saw the NMI hitting between clearing
IBS_STARTING and the actual MSR writes to disable the counter.
Since IBS_STARTING was cleared, the handler assumed these were spurious
NMIs and because STOPPING wasn't set yet either, insta-triggered an
"Unknown NMI".
Cure this by clearing IBS_STARTING after disabling the hardware.
Tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Fri, 11 Mar 2016 14:23:46 +0000 (15:23 +0100)]
perf/x86/ibs: Fix IBS throttle
When the IBS IRQ handler get a !0 return from perf_event_overflow;
meaning it should throttle the event, it only disables it, it doesn't
call perf_ibs_stop().
This confuses the state machine, as we'll use pmu::start() ->
perf_ibs_start() to unthrottle.
Tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dvyukov@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Link: http://lkml.kernel.org/r/20160311142346.GE6344@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Thu, 10 Mar 2016 14:39:24 +0000 (15:39 +0100)]
perf/core: Fix the unthrottle logic
Its possible to IOC_PERIOD while the event is throttled, this would
re-start the event and the next tick would then try to unthrottle it,
and find the event wasn't actually stopped anymore.
This would tickle a WARN in the x86-pmu code which isn't expecting to
start a !stopped event.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: dvyukov@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160310143924.GR6356@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>