Peter Zijlstra [Thu, 29 May 2014 17:00:24 +0000 (19:00 +0200)]
perf: Fix use after free in perf_remove_from_context()
While that mutex should guard the elements, it doesn't guard against the
use-after-free that's from list_for_each_entry_rcu().
__perf_event_exit_task() can actually free the event.
And because list addition/deletion is guarded by both ctx->mutex and
ctx->lock, holding ctx->mutex is sufficient for reading the list, so we
don't actually need the rcu list iteration.
Fixes: 3a497f48637e ("perf: Simplify perf_event_exit_task_context()") Reported-by: Sasha Levin <sasha.levin@oracle.com> Tested-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Dave Jones <davej@redhat.com> Cc: acme@ghostprotocols.net Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140529170024.GA2315@laptop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jianyu Zhan [Mon, 2 Jun 2014 16:44:34 +0000 (00:44 +0800)]
perf tools: Fix 'make help' message error
Currently 'make help' message has such hint:
use "make prefix=<path> <install target>" to install to a particular
path like make prefix=/usr/local install install-doc
But this is misleading, when I specify "prefix=/usr/local", it has got no
respect at all.
This is because that, "DESTDIR" is considered first. In this case, "DESTDIR"
has an empty value, so "prefix" is honored. However, "prefix" is unconditionally
assigned to $HOME, regardless of what it is set to from command line. So our
"prefix" setting got no respect and the actual destination falls back to $HOME.
This patch fixes this issue and corrects the help message.
Jiri Olsa [Mon, 2 Jun 2014 17:44:23 +0000 (13:44 -0400)]
perf record: Fix poll return value propagation
If the perf record command is interrupted in record__mmap_read_all
function, the 'done' is set and err has the latest poll return
value, which is most likely positive number (= number of pollfds
ready to read).
This 'positive err' is then propagated to the exit code, resulting
in not finishing the perf.data header properly, causing following
error in report:
# perf record -F 50000 -a
---
make the system real busy, so there's more chance
to interrupt perf in event writing code
---
^C[ perf record: Woken up 16 times to write data ]
[ perf record: Captured and wrote 30.292 MB perf.data (~1323468 samples) ]
# perf report --stdio > /dev/null
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
Fixing this by checking for positive poll return value
and setting err to 0.
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1401732126-19465-1-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Dianfang Zhang <zhangdianfang@huawei.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jean Delvare <jdelvare@suse.de> Link: http://lkml.kernel.org/r/20140530154709.GC1202@kernel.org
[ changed the changelog a bit ] Signed-off-by: Jiri Olsa <jolsa@kernel.org>
When the audit-libs devel package is not found at build time we disable
the 'trace' command, as we are not able to map syscall numbers to
strings, but then the message the user is presented is cryptic:
[root@zoo linux]# trace ls
perf: 'ls' is not a perf-command. See 'perf --help'.
Fix it by presenting a more helpful message:
[root@zoo linux]# trace l
trace command not available: missing audit-libs devel package at build time.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-uxeunqetd0sgxyibusapen9a@git.kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Namhyung Kim [Fri, 23 May 2014 05:59:57 +0000 (14:59 +0900)]
perf tests: Define and use symbolic names for fake symbols
In various histogram test cases, fake symbols are used as raw numbers.
Define macros for each pid, map, symbols so that it can increase
readability somewhat.
Namhyung Kim [Fri, 23 May 2014 09:49:33 +0000 (18:49 +0900)]
perf ui/gtk: Fix callchain display
With current output field change, GTK browser cannot display callchain
information correctly since it couldn't determine where the symbol
column is. This is a problem - just for now I changed to use the last
column since it'll work for most cases.
Also it has a same problem of the percentage as stdio code.
Namhyung Kim [Fri, 23 May 2014 09:31:52 +0000 (18:31 +0900)]
perf ui/stdio: Fix invalid percentage value of cumulated hist entries
On stdio, there's a problem that it shows invalid values for
callchains in cumulated hist entries. It's because it only cares
about the self period. But with --children behavior, we always add
callchain info to the cumulated entries so it should use the value in
that case.
Namhyung Kim [Tue, 22 Jan 2013 09:09:46 +0000 (18:09 +0900)]
perf top: Add top.children config option
Add top.children config option for setting default value of
callchain accumulation. It affects the output only if one of
-g or --call-graph option is given as well.
A user can write .perfconfig file like below to enable accumulation
by default:
Namhyung Kim [Wed, 30 Oct 2013 08:05:55 +0000 (17:05 +0900)]
perf top: Add --children option
The --children option is for showing accumulated overhead (period)
value as well as self overhead. It should be used with one of -g or
--call-graph option.
Namhyung Kim [Tue, 7 Jan 2014 08:41:03 +0000 (17:41 +0900)]
perf top: Convert to hist_entry_iter
Reuse hist_entry_iter__add() function to share the similar code with
perf report. Note that it needs to be called with hists.lock so tweak
some internal functions not to deadlock or hold the lock too long.
Namhyung Kim [Tue, 7 Jan 2014 08:02:25 +0000 (17:02 +0900)]
perf tools: Add callback function to hist_entry_iter
The new ->add_entry_cb() will be called after an entry was added to
the histogram. It's used for code sharing between perf report and
perf top. Note that ops->add_*_entry() should set iter->he properly
in order to call the ->add_entry_cb.
Also pass @arg to the callback function. It'll be used by perf top
later.
Namhyung Kim [Thu, 20 Mar 2014 00:10:29 +0000 (09:10 +0900)]
perf tools: Do not auto-remove Children column if --fields given
Depending on the configuration perf inserts/removes the Children
column in the output automatically. But it might not be what user
wants if [s]he give --fields option explicitly.
Namhyung Kim [Tue, 22 Jan 2013 09:09:46 +0000 (18:09 +0900)]
perf report: Add report.children config option
Add report.children config option for setting default value of
callchain accumulation. It affects the report output only if
perf.data contains callchain info.
A user can write .perfconfig file like below to enable accumulation
by default:
Namhyung Kim [Thu, 31 Oct 2013 01:17:39 +0000 (10:17 +0900)]
perf tools: Apply percent-limit to cumulative percentage
If -g cumulative option is given, it needs to show entries which don't
have self overhead. So apply percent-limit to accumulated overhead
percentage in this case.
Namhyung Kim [Wed, 30 Oct 2013 07:06:59 +0000 (16:06 +0900)]
perf ui/hist: Add support to accumulated hist stat
Print accumulated stat of a hist entry if requested.
To do that, add new HPP_PERCENT_ACC_FNS macro and generate a
perf_hpp_fmt using it. The __hpp__sort_acc() function sorts entries
by accumulated period value. When accumulated periods of two entries
are same (i.e. single path callchain) put the caller above since
accumulation tends to put callers on higher position for obvious
reason.
Also add "overhead_children" output field to be selected by user.
Namhyung Kim [Thu, 31 Oct 2013 01:05:29 +0000 (10:05 +0900)]
perf report: Cache cumulative callchains
It is possble that a callchain has cycles or recursive calls. In that
case it'll end up having entries more than 100% overhead in the
output. In order to prevent such entries, cache each callchain node
and skip if same entry already cumulated.
Namhyung Kim [Thu, 31 Oct 2013 04:58:30 +0000 (13:58 +0900)]
perf tools: Update cpumode for each cumulative entry
The cpumode and level in struct addr_localtion was set for a sample
and but updated as cumulative callchains were added. This led to have
non-matching symbol and cpumode in the output.
Update it accordingly based on the fact whether the map is a part of
the kernel or not. This is a reverse of what thread__find_addr_map()
does.
Namhyung Kim [Tue, 11 Sep 2012 04:34:27 +0000 (13:34 +0900)]
perf hists: Check if accumulated when adding a hist entry
To support callchain accumulation, @entry should be recognized if it's
accumulated or not when add_hist_entry() called. The period of an
accumulated entry should be added to ->stat_acc but not ->stat. Add
@sample_self arg for that.
Namhyung Kim [Tue, 11 Sep 2012 04:15:07 +0000 (13:15 +0900)]
perf hists: Add support for accumulated stat of hist entry
Maintain accumulated stat information in hist_entry->stat_acc if
symbol_conf.cumulate_callchain is set. Fields in ->stat_acc have same
vaules initially, and will be updated as callchain is processed later.
Namhyung Kim [Wed, 30 Oct 2013 00:40:34 +0000 (09:40 +0900)]
perf tools: Introduce struct hist_entry_iter
There're some duplicate code when adding hist entries. They are
different in that some have branch info or mem info but generally do
same thing. So introduce new struct hist_entry_iter and add callbacks
to customize each case in general way.
The new perf_evsel__add_entry() function will look like:
iter->prepare_entry();
iter->add_single_entry();
while (iter->next_entry())
iter->add_next_entry();
iter->finish_entry();
This will help further work like the cumulative callchain patchset.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: Arun Sharma <asharma@fb.com> Tested-by: Rodrigo Campos <rodrigo@sdfg.com.ar> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1401335910-16832-3-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Ingo Molnar [Thu, 22 May 2014 09:39:08 +0000 (11:39 +0200)]
Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/uprobes
Pull uprobes fixes and changes from Oleg Nesterov:
" Denys found another nasty old bug in uprobes/x86: div, mul, shifts with
count in CL, and cmpxchg are not handled correctly.
Plus a couple of other minor fixes. Nobody acked the changes in x86/traps,
hopefully they are simple enough, and I believe that they make sense anyway
and allow to do more cleanups."
Michael Lentine [Tue, 20 May 2014 09:48:50 +0000 (11:48 +0200)]
perf tools: Add automatic remapping of Android libraries
This patch automatically adjusts the path of MMAP records
associated with Android system libraries.
The Android system is organized with system libraries found in
/system/lib and user libraries in /data/app-lib. On the host system
(not running Android), system libraries can be found in the downloaded
NDK directory under ${NDK_ROOT}/platforms/${APP_PLATFORM}/arch-${ARCH}/usr/lib
and the user libraries are installed under libs/${APP_ABI} within
the apk build directory. This patch makes running the reporting
tools possible on the host system using the libraries from the NDK.
Signed-off-by: Michael Lentine <mlentine@google.com> Reviewed-by: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1400579330-5043-3-git-send-email-eranian@google.com
[ fixed 'space required before the open parenthesis' checkpatch.pl errors ] Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Michael Lentine [Tue, 20 May 2014 09:48:49 +0000 (11:48 +0200)]
perf tools: Add cat as fallback pager
This patch adds a fallback to cat for the pager. This is useful
on environments, such as Android, where less does not exist.
It is better to default to cat than to abort.
Namhyung Kim [Mon, 19 May 2014 05:19:30 +0000 (14:19 +0900)]
perf tools: Get rid of obsolete hist_entry__sort_list
Now we moved to the perf_hpp_[_sort]_list so no need to keep the old
hist_entry__sort_list and sort__first_dimension. Also the
hist_entry__sort_snprintf() can be gone as hist_entry__snprintf()
provides the functionality.
Namhyung Kim [Wed, 16 Apr 2014 02:16:33 +0000 (11:16 +0900)]
perf report/tui: Fix a bug when --fields/sort is given
The hists__filter_entries() function is called when down arrow key is
pressed for navigating through the entries in TUI. It has a check for
filtering out entries that have very small overhead (under min_pcnt).
However it just assumed the entries are sorted by the overhead so when
it saw such a small overheaded entry, it just stopped navigating as an
optimization. But it's not true anymore due to new --fields and
--sort optoin behavior and this case users cannot go down to a next
entry if ther's an entry with small overhead in-between.
Namhyung Kim [Tue, 4 Mar 2014 02:01:41 +0000 (11:01 +0900)]
perf tools: Add ->sort() member to struct sort_entry
Currently, what the sort_entry does is just identifying hist entries
so that they can be grouped properly. However, with -F option
support, it indeed needs to sort entries appropriately to be shown to
users. So add ->sort() member to do it.
Namhyung Kim [Tue, 4 Mar 2014 01:46:34 +0000 (10:46 +0900)]
perf report: Add -F option to specify output fields
The -F/--fields option is to allow user setup output field in any
order. It can receive any sort keys and following (hpp) fields:
overhead, overhead_sys, overhead_us, sample and period
If guest profiling is enabled, overhead_guest_{sys,us} will be
available too.
The output fields also affect sort order unless you give -s/--sort
option. And any keys specified on -s option, will also be added to
the output field list automatically.
Namhyung Kim [Wed, 16 Apr 2014 02:04:51 +0000 (11:04 +0900)]
perf tools: Call perf_hpp__init() before setting up GUI browsers
So that it can be set properly prior to set up output fields. That
makes easy to handle/warn errors during the setup since it doesn't
need to be bothered with the GUI.
Namhyung Kim [Tue, 18 Mar 2014 02:31:39 +0000 (11:31 +0900)]
perf tools: Consolidate management of default sort orders
The perf uses different default sort orders for different use-cases,
and this was scattered throughout the code. Add get_default_sort_
order() function to handle this and change initial value of sort_order
to NULL to distinguish it from user-given one.
Namhyung Kim [Mon, 3 Mar 2014 08:05:19 +0000 (17:05 +0900)]
perf ui: Get rid of callback from __hpp__fmt()
The callback was used by TUI for determining color of folded sign
using percent of first field/column. But it cannot be used anymore
since it now support dynamic reordering of output field.
So move the logic to the hist_browser__show_entry().
Namhyung Kim [Mon, 3 Mar 2014 07:16:20 +0000 (16:16 +0900)]
perf tools: Consolidate output field handling to hpp format routines
Until now the hpp and sort functions do similar jobs different ways.
Since the sort functions converted/wrapped to hpp formats it can do
the job in a uniform way.
The perf_hpp__sort_list has a list of hpp formats to sort entries and
the perf_hpp__list has a list of hpp formats to print output result.
To have a backward compatibility, it automatically adds 'overhead'
field in front of sort list. And then all of fields in sort list
added to the output list (if it's not already there).
Stephane Eranian [Thu, 15 May 2014 15:56:44 +0000 (17:56 +0200)]
fix Haswell precise store data source encoding
This patch fixes a bug in precise_store_data_hsw() whereby
it would set the data source memory level to the wrong value.
As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear
Address Information in PEBS Record), when status bit 0 is set
this is a L1 hit, otherwise this is a L1 miss.
This patch encodes the memory level according to the specification.
In V2, we added the filtering on the store events.
Only the following events produce L1 information:
* MEM_UOPS_RETIRED.STLB_MISS_STORES
* MEM_UOPS_RETIRED.LOCK_STORES
* MEM_UOPS_RETIRED.SPLIT_STORES
* MEM_UOPS_RETIRED.ALL_STORES
Cc: mingo@elte.hu Cc: acme@ghostprotocols.net Cc: jolsa@redhat.com Cc: jmario@redhat.com Cc: ak@linux.intel.com Tested-and-Reviewed-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140515155644.GA3884@quad Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Zijlstra [Wed, 23 Apr 2014 10:22:54 +0000 (12:22 +0200)]
perf: Fix perf_event_open(.flags) test
Vince noticed that we test the (unsigned long) flags field against an
(unsigned int) constant. This would allow setting the high bits on 64bit
platforms and not get an error.
There is nothing that uses the high bits, so it should be entirely
harmless, but we don't want userspace to accidentally set them anyway,
so fix the constants.
Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140423102254.GL11096@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Jean Pihet [Fri, 16 May 2014 08:41:11 +0000 (10:41 +0200)]
perf tests: Add dwarf unwind test on ARM
Adding dwarf unwind test, that setups live machine data over
the perf test thread and does the remote unwind.
Need to use -fno-optimize-sibling-calls for test compilation,
otherwise 'krava_*' function calls are optimized into jumps
and omitted from the stack unwind.
So far it was enabled only for x86.
Signed-off-by: Jean Pihet <jean.pihet@linaro.org> Reviewed-by: Will Deacon <will.deacon@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1400229672-16104-3-git-send-email-jean.pihet@linaro.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Masanari Iida [Wed, 14 May 2014 17:13:38 +0000 (02:13 +0900)]
perf session: Fix possible null pointer dereference in session.c
cppcheck detected following warning:
[tools/perf/util/session.c:1628] -> [tools/perf/util/session.c:1632]:
(warning) Possible null pointer dereference: session - otherwise it
is redundant to check it against null.
In order to avoide null pointer, check the pointer before use.
Dongsheng Yang [Tue, 13 May 2014 01:38:21 +0000 (10:38 +0900)]
perf sched: Remove nr_state_machine_bugs in perf latency
As we do not use .success in sched_wakeup event any more, then
we can not guarantee that the task when wakeup event happen is
out of run queue. So the message of nr_state_machine_bugs is
not correct.
Oleg Nesterov [Mon, 12 May 2014 16:24:45 +0000 (18:24 +0200)]
uprobes/x86: Fix the wrong ->si_addr when xol triggers a trap
If the probed insn triggers a trap, ->si_addr = regs->ip is technically
correct, but this is not what the signal handler wants; we need to pass
the address of the probed insn, not the address of xol slot.
Add the new arch-agnostic helper, uprobe_get_trap_addr(), and change
fill_trap_info() and math_error() to use it. !CONFIG_UPROBES case in
uprobes.h uses a macro to avoid include hell and ensure that it can be
compiled even if an architecture doesn't define instruction_pointer().
Oleg Nesterov [Thu, 8 May 2014 18:04:11 +0000 (20:04 +0200)]
x86/traps: Shift fill_trap_info() from DO_ERROR_INFO() to do_error_trap()
Move the callsite of fill_trap_info() into do_error_trap() and remove
the "siginfo_t *info" argument.
This obviously breaks DO_ERROR() which passed info == NULL, we simply
change fill_trap_info() to return "siginfo_t *" and add the "default"
case which returns SEND_SIG_PRIV.
Extract the fill-siginfo code from DO_ERROR_INFO() into the new helper,
fill_trap_info().
It can calculate si_code and si_addr looking at trapnr, so we can remove
these arguments from DO_ERROR_INFO() and simplify the source code. The
generated code is the same, __builtin_constant_p(trapnr) == T.
Oleg Nesterov [Wed, 7 May 2014 15:21:34 +0000 (17:21 +0200)]
x86/traps: Introduce do_error_trap()
Move the common code from DO_ERROR() and DO_ERROR_INFO() into the new
helper, do_error_trap(). This simplifies define's and shaves 527 bytes
from traps.o.
Denys Vlasenko [Fri, 2 May 2014 15:04:00 +0000 (17:04 +0200)]
uprobes/x86: Fix scratch register selection for rip-relative fixups
Before this patch, instructions such as div, mul, shifts with count
in CL, cmpxchg are mishandled.
This patch adds vex prefix handling. In particular, it avoids colliding
with register operand encoded in vex.vvvv field.
Since we need to avoid two possible register operands, the selection of
scratch register needs to be from at least three registers.
After looking through a lot of CPU docs, it looks like the safest choice
is SI,DI,BX. Selecting BX needs care to not collide with implicit use of
BX by cmpxchg8b.
long two = 2;
void test1(void)
{
long ax = 0, dx = 0;
asm volatile("\n"
" xor %%edx,%%edx\n"
" lea 2(%%edx),%%eax\n"
// We divide 2 by 2. Result (in eax) should be 1:
" probe1: .globl probe1\n"
" divl two(%%rip)\n"
// If we have a bug (eax mangled on entry) the result will be 2,
// because eax gets restored by probe machinery.
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[ax == 1]
);
}
long val2 = 0;
void test2(void)
{
long old_val = val2;
long ax = 0, dx = 0;
asm volatile("\n"
" mov val2,%%eax\n" // eax := val2
" lea 1(%%eax),%%edx\n" // edx := eax+1
// eax is equal to val2. cmpxchg should store edx to val2:
" probe2: .globl probe2\n"
" cmpxchg %%edx,val2(%%rip)\n"
// If we have a bug (eax mangled on entry), val2 will stay unchanged
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[val2 == old_val + 1]
);
}
long val3[2] = {0,0};
void test3(void)
{
long old_val = val3[0];
long ax = 0, dx = 0;
asm volatile("\n"
" mov val3,%%eax\n" // edx:eax := val3
" mov val3+4,%%edx\n"
" mov %%eax,%%ebx\n" // ecx:ebx := edx:eax + 1
" mov %%edx,%%ecx\n"
" add $1,%%ebx\n"
" adc $0,%%ecx\n"
// edx:eax is equal to val3. cmpxchg8b should store ecx:ebx to val3:
" probe3: .globl probe3\n"
" cmpxchg8b val3(%%rip)\n"
// If we have a bug (edx:eax mangled on entry), val3 will stay unchanged.
// If ecx:edx in mangled, val3 will get wrong value.
: "=a" (ax), "=d" (dx) /*out*/
: "0" (ax), "1" (dx) /*in*/
: "cx", "bx", "memory" /*clobber*/
);
dprintf(2, "%s: %s\n", __func__,
pass[val3[0] == old_val + 1 && val3[1] == 0]
);
}
Denys Vlasenko [Thu, 1 May 2014 14:52:46 +0000 (16:52 +0200)]
uprobes/x86: Simplify rip-relative handling
It is possible to replace rip-relative addressing mode with addressing
mode of the same length: (reg+disp32). This eliminates the need to fix
up immediate and correct for changing instruction length.
Oleg Nesterov [Mon, 5 May 2014 14:38:18 +0000 (16:38 +0200)]
uprobes: Add mem_cgroup_charge_anon() into uprobe_write_opcode()
Hugh says:
The one I noticed was that it forgets all about memcg (because
it was copied from KSM, and there the replacement page has already
been charged to a memcg). See how mm/memory.c do_anonymous_page()
does a mem_cgroup_charge_anon().
Hopefully not a big problem, uprobes is a system-wide thing and only
root can insert the probes. But I agree, should be fixed anyway.
Add mem_cgroup_{un,}charge_anon() into uprobe_write_opcode(). To simplify
the error handling (and avoid the new "uncharge" label) the patch also
moves anon_vma_prepare() up before we alloc/charge the new page.
While at it fix the comment about ->mmap_sem, it is held for write.
Peter Zijlstra [Mon, 12 May 2014 18:19:46 +0000 (20:19 +0200)]
perf tools: Remove usage of trace_sched_wakeup(.success)
trace_sched_wakeup(.success) is a dead argument and has been for ages,
the only reason its still there is because of brain dead software, which
apparently includes perf tools
There's a few more instances in pearly snake shit, but that's not
supported as far as I care anyhow, so let that bitrot.
Namhyung Kim [Mon, 12 May 2014 00:56:42 +0000 (09:56 +0900)]
perf tools: Use tid for finding thread
I believe that passing pid (instead of tid) as the 3rd arg of the
machine__find*_thread() was to find a main thread so that it can
search proper map group for symbols. However with the map sharing
patch applied, it now can do it in any thread.
It fixes a bug when each thread has different name, it only reports a
main thread for samples in other threads.
Cc: Adrian Hunter <adrian.hunter@intel.com> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1399856202-26221-1-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Namhyung Kim [Mon, 12 May 2014 00:47:24 +0000 (09:47 +0900)]
perf record: Propagate exit status of a command line workload
Currently perf record doesn't propagate the exit status of a workload
given by the command line. But sometimes it'd useful if it's
propagated so that a monitoring script can handle errors
appropriately.
To do that, it moves most of logic out of the exit handlers and run
them directly in the __cmd_record(). The only thing needs to be done
in the handler is propagating terminating signal so that the shell can
terminate its loop properly when Ctrl-C was pressed. Also it cleaned
up the resource management code in record__exit().
With this change, perf record returns the child exit status in case of
normal termination and send signal to itself when terminated by signal.
Example run of Stephane's case:
$ perf record true && echo yes || echo no
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ]
yes
$ perf record false && echo yes || echo no
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB perf.data (~589 samples) ]
no
Jiri's case (error in parent):
$ perf record -m 10G true && echo yes || echo no
rounding mmap pages size to 17179869184 bytes (4194304 pages)
failed to mmap with 12 (Cannot allocate memory)
no
$ ulimit -n 6
$ perf record sleep 1 && echo yes || echo no
failed to create 'go' pipe: Too many open files
Couldn't run the workload!
no
And Peter's case (interrupted by signal):
$ while :; do perf record sleep 1; done
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (~593 samples) ]
Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/1399855645-25815-1-git-send-email-namhyung@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org>
But B0, which is explained as swapper:0 did not appear in the
left part of output. Instead, we use '.' as the shortname of
swapper:0. So the comment of "B0 => swapper:0" is not easy to
understand.
This patch clarify the output of perf sched map with not allocating
one letter-number shortname for swapper:0 and print ". => swapper:0"
as the explanation for swapper:0.
Dongsheng [Mon, 5 May 2014 07:05:53 +0000 (16:05 +0900)]
perf tools: Add missing event for perf sched record.
We should record and process sched:sched_wakeup_new event in
perf sched tool, but currently, there is the process function
for it, without recording it in record subcommand.
This patch add -e sched:sched_wakeup_new to perf sched record.
Peter Zijlstra [Mon, 5 May 2014 10:11:24 +0000 (12:11 +0200)]
perf: Rework free paths
Primarily make perf_event_release_kernel() into put_event(), this will
allow kernel space to create per-task inherited events, and is safer
in general.
Peter Zijlstra [Mon, 5 May 2014 09:41:02 +0000 (11:41 +0200)]
perf: Always destroy groups on exit
Commit 38b435b16c36 ("perf: Fix tear-down of inherited group events")
states that we need to destroy groups for inherited events, but it
doesn't make any sense to not also destroy groups for normal events.
And while it usually makes no difference (the normal events won't
leak, and its very likely all the group events will die in quick
succession) it does make the code more consistent and closes a
potential hole for trouble.
Peter Zijlstra [Tue, 6 May 2014 07:59:34 +0000 (09:59 +0200)]
perf: Ensure consistent inherit state in groups
Make sure all events in a group have the same inherit state. It was
possible for group leaders to have inherit set while sibling events
would not have inherit set.
In this case we'd still inherit the siblings, leading to some
non-fatal weirdness.