We were keeping the session around just because we kept pointers to
struct thread instances, but now we reference count them, so no need
for deferring the perf_session__delete call to after we traverse the
work_list entries.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-9agtck6jdr3rebdp39z1lo0e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to do that to stop accumulating entries in the dead_threads
linked list, i.e. we were keeping references to threads in struct hists
that continue to exist even after a thread exited and was removed from
the machine threads rbtree.
We still keep the dead_threads list, but just for debugging, allowing us
to iterate at any given point over the threads that still are referenced
by things like struct hist_entry.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-3ejvfyed0r7ue61dkurzjux4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Remove bias offset to find probe point by address
Remove bias offset to find probe point by address.
Without this patch, probe points on kernel and executables are shown
correctly, but do not work with libraries:
# ./perf probe -l
probe:do_fork (on do_fork@kernel/fork.c)
probe_libc:malloc (on malloc in /usr/lib64/libc-2.17.so)
probe_perf:strlist__new (on strlist__new@util/strlist.c in /home/mhiramat/ksrc/linux-3/tools/perf/perf)
Removing bias allows it to show it as real place:
# ./perf probe -l
probe:do_fork (on do_fork@kernel/fork.c)
probe_libc:malloc (on __libc_malloc@malloc/malloc.c in /usr/lib64/libc-2.17.so)
probe_perf:strlist__new (on strlist__new@util/strlist.c in /home/mhiramat/ksrc/linux-3/tools/perf/perf)
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naohiro Aota <naota@elisp.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150302124946.9191.64085.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Sat, 28 Feb 2015 02:53:29 +0000 (11:53 +0900)]
perf probe: Warn if given uprobe event accesses memory on older kernel
Warn if given uprobe event accesses memory on older kernel.
Until 3.14, uprobe event only supports accessing registers so this warns
to upgrade kernel if uprobe-event returns -EINVAL and an argument of the
event accesses memory ($stack, @+offset, and +|-offs() symtax).
With this patch (on 3.10.0-123.13.2.el7.x86_64);
-----
# ./perf probe -x ./perf warn_uprobe_event_compat stack=-0\(%sp\)
Added new event:
Failed to write event: Invalid argument
Please upgrade your kernel to at least 3.14 to have access to feature -0(%sp)
Error: Failed to add events.
-----
Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20150228025329.32106.70581.stgit@localhost.localdomain Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Sat, 28 Feb 2015 09:16:27 +0000 (10:16 +0100)]
perf tools: Improve feature test debuggability
Certain feature tests fail with link errors:
triton:~/tip/tools/perf/config/feature-checks> make test-libbabeltrace.bin
gcc -MD -o test-libbabeltrace.bin test-libbabeltrace.c # -lbabeltrace provided by
/tmp/cc6dRSqd.o: In function `main':
test-libbabeltrace.c:(.text+0xf): undefined reference to `bt_ctf_stream_class_get_packet_context_type'
although they should already fail with a build error due to lack of a
proper prototype for the function. Due to this I first tried to find
which library was missing - while it was the whole feature that was
missing from the .h file already.
To solve this, propagate -Wall -Werror to all testcases and remove them
from testcase Makefile rules that used them explicitly.
A missing feature now outputs:
triton:~/tip/tools/perf/config/feature-checks> make test-libbabeltrace.bin
gcc -MD -Wall -Werror -o test-libbabeltrace.bin test-libbabeltrace.c # -lbabeltrace provided by
test-libbabeltrace.c: In function ‘main’:
test-libbabeltrace.c:6:2: error: implicit declaration of function ‘bt_ctf_stream_class_get_packet_context_type’ [-Werror=implicit-function-declaration]
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <david.ahern@oracle.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150228091627.GF31887@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
config/Makefile:566: No python-config tool was found
config/Makefile:566: Python support will not be built
config/Makefile:565: No 'python-config' tool was found: disables Python support - please install python-devel/python-dev
It's now a standard one-line message with a package install suggestion,
and it also uses the standard language used by other feature detection
messages.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <david.ahern@oracle.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150228083345.GB31887@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ingo Molnar [Sat, 28 Feb 2015 08:17:50 +0000 (09:17 +0100)]
perf tools: Remove annoying extra message from the features build
This message:
Makefile:153: The path 'python-config' is not executable.
Appears on every perf build that does not have a sufficient python
environment installed. It's really just an internal detail of python
configuration pass and users should not see it - and it's pretty
meaningless to them in any case because the message is not very helpful.
(So it's not executable. Why does that matter? What can the user do
about it?)
Remove the warning, the missing python feature warning is sufficient:
config/Makefile:566: No python-config tool was found
config/Makefile:566: Python support will not be built
although even that one isn't very helpful to users: so no Python support
will be built, what can the user do to fix that? Most other such
warnings give package install suggestions.
Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <david.ahern@oracle.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150228081750.GA31887@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 2 Mar 2015 03:13:33 +0000 (12:13 +0900)]
perf record: Document --group option
The 'perf record --group' option lacks documentation and confuses users.
As -e/--event option already supports group spec, it should not be used
anymore.
Also add a short description of event group itself.
Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1425266013-5034-1-git-send-email-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Namhyung Kim [Mon, 2 Mar 2015 04:31:03 +0000 (13:31 +0900)]
perf tools: Fix build error on ARCH=i386/x86_64/sparc64
He Kuang reported that current perf tools failed to build when ARCH
variable was given like above.
It was because the name is different that internal directory name. I
can see that David's sparc64 build has same problem.
So fix it by applying the sed conversion script to the command line ARCH
variable also, and fixing the converted name there (i.e. i386/x86_64 ->
x86, sparc64 -> sparc).
Reported-by: He Kuang <hekuang@huawei.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Tested-by: He Kuang <hekuang@huawei.com>
Acked: Jiri Olsa <jolsa@redhat.com> Cc: David Ahern <david.ahern@oracle.com> Cc: He Kuang <hekuang@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1425270663-10215-1-git-send-email-namhyung@kernel.org
[ Resolved conflict with 4861f87cd3d1 "Make sparc64 arch point to sparc" ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We ended up emitting PERF_RECORD_FORK events after their corresponding
PERF_RECORD_COMM, so the code below will remove the "existing thread"
and then recreates it, unnecessarily:
[root@ssdandy ~]# perf probe -x ~/bin/perf fork_after_comm=machine__process_fork_event:12
Added new event:
probe_perf:fork_after_comm (on machine__process_fork_event:12 in /home/acme/bin/perf)
You can now use it in all perf tools, such as:
perf record -e probe_perf:fork_after_comm -aR sleep 1
[root@ssdandy ~]#
[root@ssdandy ~]# perf record -g -e probe_perf:* trace -o /tmp/bla
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.021 MB perf.data (30 samples) ]
Terminated
[root@ssdandy ~]#
[root@ssdandy ~]# perf report --no-children --show-total-period --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 30 of event 'probe_perf:fork_after_comm'
# Event count (approx.): 30
#
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ............. ...............................
#
100.00% 30 trace trace [.] machine__process_fork_event
|
---machine__process_fork_event
__event__synthesize_thread.part.2
perf_event__synthesize_threads
cmd_trace
main
__libc_start_main
Fix it by more closely mimicking how the kernel generates those records
when a new fork happens, i.e. first a PERF_RECORD_FORK, then a
PERF_RECORD_COMM.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-h0emvymi2t3mw8dlqd6d6z73@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch fixes the issues by checking if the counter is supported,
before reading and logging the counter value.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: http://lkml.kernel.org/r/1423852858-8455-1-git-send-email-suzuki.poulose@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Wed, 18 Feb 2015 23:59:31 +0000 (18:59 -0500)]
perf tools: Compare JOBS to 0 after grep
If JOBS is not by user perf tries to autodetect the number by grepping
the number of CPUs from /proc/cpuinfo. 'grep -c' will always return an
integer so after this command JOBS should be compared to 0, not "".
He Kuang [Sun, 15 Feb 2015 02:33:37 +0000 (10:33 +0800)]
perf report: Fix branch stack mode cannot be set
When perf.data file is obtained using 'perf record -b', perf report
should use branch stack mode to generate output. But this function is
broken by improper comparison between boolean and constant -1.
before this patch:
$ perf report -b -i perf.data
Samples: 16 of event 'cycles', Event count (approx.): 3171896
Overhead Command Shared Object Symbol
13.59% ls [kernel.kallsyms] [k] prio_tree_remove
13.16% ls [kernel.kallsyms] [k] change_pte_range
12.09% ls [kernel.kallsyms] [k] page_fault
12.02% ls [kernel.kallsyms] [k] zap_pte_range
...
after this patch:
$ perf report -b -i perf.data
Samples: 256 of event 'cycles', Event count (approx.): 256
Overhead Command Source Shared Object Source Symbol Target Shared Object Target Symbol
9.38% ls [unknown] [k] 0000000000000000 [unknown] [k] 0000000000000000
6.25% ls libc-2.19.so [.] _dl_addr libc-2.19.so [.] _dl_addr
6.25% ls [kernel.kallsyms] [k] zap_pte_range [kernel.kallsyms] [k] zap_pte_range
6.25% ls [kernel.kallsyms] [k] change_pte_range [kernel.kallsyms] [k] change_pte_range
0.39% ls [kernel.kallsyms] [k] prio_tree_remove [kernel.kallsyms] [k] prio_tree_remove
...
Signed-off-by: He Kuang <hekuang@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1423967617-28879-1-git-send-email-hekuang@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Fri, 27 Feb 2015 04:50:26 +0000 (13:50 +0900)]
perf buildid-cache: Add --purge FILE to remove all caches of FILE
Add --purge FILE to remove all caches of FILE.
Since the current --remove FILE removes a cache which has
same build-id of given FILE. Since the command takes a
FILE path, it can confuse user who tries to remove cache
about FILE path.
-----
# ./perf buildid-cache -v --add ./perf
Adding 133b7b5486d987a5ab5c3ebf4ea14941f45d4d4f ./perf: Ok
# (update the ./perf binary)
# ./perf buildid-cache -v --remove ./perf
Removing 305bbd1be68f66eca7e2d78db294653031edfa79 ./perf: FAIL
./perf wasn't in the cache
-----
Actually, the --remove's FAIL is not shown, it just silently fails.
So, this patch adds --purge FILE action for such usecase.
perf buildid-cache --purge FILE removes all caches which has same FILE
path.
In other words, it removes all caches including old binaries.
BTW, if you want to purge all the caches, remove ~/.debug/* .
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Hemant Kumar <hemant@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20150227045026.1999.64084.stgit@localhost.localdomain
[ s/dirname/dir_name/g to fix build on fedora14, where dirname is a global ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 10:21:31 +0000 (18:21 +0800)]
perf tools: Fix the bash completion problem of 'perf --*'
The perf-completion.sh uses a predefined string '--help --version
--exec-path --html-path --paginate --no-pager --perf-dir --work-tree
--debugfs-dir' for the bash completion of 'perf --*', which has two
problems:
Problem 1: If the options of perf are changed (see handle_options() in
perf.c), the perf-completion.sh has to be changed at the same time. If
not, the bash completion of 'perf --*' and the options which perf
really supports will be inconsistent.
Problem 2: When typing another single character after 'perf --', e.g.
'h', and hit TAB key to get the bash completion of 'perf --h', the
character 'h' disappears at once. This is not what we want, we wish the
bash completion can return '--help --html-path' and then we can
continue to choose one.
To solve this problem, we add '--list-opts' to perf, which now supports
'perf --list-opts' directly, and its result can be used in bash
completion now.
Example:
Before this patch:
$ perf --h <-- hit TAB key after character 'h'
$ perf -- <-- 'h' disappears and no required result
After this patch:
$ perf --h <-- hit TAB key after character 'h'
--help --html-path <-- the required result
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-8-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 10:21:28 +0000 (18:21 +0800)]
perf list: Extend raw-dump to certain kind of events
Extend 'perf list --raw-dump' to 'perf list --raw-dump [hw|sw|cache
|tracepoint|pmu|event_glob]' in order to show the raw-dump of a certain
kind of events rather than all of the events.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-5-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 10:21:27 +0000 (18:21 +0800)]
perf list: Clean up the printing functions of hardware/software events
Do not need print_events_type or __print_events_type for listing hw/sw
events, let print_symbol_events do its job instead. Moreover,
print_symbol_events can also handle event_glob and name_only.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-4-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 10:21:30 +0000 (18:21 +0800)]
perf tools: Remove the '--(null)' long_name for --list-opts
If the long_name of a 'struct option' is defined as NULL, --list-opts
will incorrectly print '--(null)' in its output. As a result, '--(null)'
will finally appear in the case of bash completion, e.g. 'perf record
--'.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-7-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 10:21:29 +0000 (18:21 +0800)]
perf list: Avoid confusion of perf output and the next command prompt
Distinguish the output of 'perf list --list-opts' or 'perf --list-cmds'
with the next command prompt, which also happens in other cases (e.g.
record, report ...).
Example:
Before this patch:
$perf list --list-opts
--raw-dump $ <-- the output and the next command prompt are at
the same line
After this patch:
$perf list --list-opts
--raw-dump
$ <-- the new line
Yunlong Song [Fri, 27 Feb 2015 10:21:26 +0000 (18:21 +0800)]
perf list: Allow listing events with 'tracepoint' prefix
If somebody happens to name an event with the beginning of 'tracepoint'
(e.g. tracepoint_foo), then it will never be showed with perf list
event_glob, thus we parse the argument 'tracepoint' more carefully for
accuracy.
As shown above, only the events of tracepoint_foo are printed.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-3-git-send-email-yunlong.song@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425032491-20224-2-git-send-email-yunlong.song@huawei.com
[ Don't forget closedir({sys,evt}_dir) when handling errors ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Yunlong Song [Fri, 27 Feb 2015 11:53:46 +0000 (19:53 +0800)]
perf data: Fix sentinel setting for data_cmds array
The recent new patch "perf tools: Add new 'perf data' command" (commit 2245bf14 in acme's git repo perf/core) has caused a building error when
compiling the source code of perf:
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1425038026-27604-1-git-send-email-yunlong.song@huawei.com
[ .name == NULL ends the loop, use it instead of seting all fields to NULL ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Kan Liang [Mon, 9 Feb 2015 05:39:44 +0000 (05:39 +0000)]
perf diff: Support for different binaries
Currently, the perf diff only works with same binaries. That's because
it compares the symbol start address. It doesn't work if the perf.data
comes from different binaries. This patch matches the symbol names.
Actually, perf diff once intended to compare the symbol names. The
commit as below can look for a pair by name.
604c5c92972d (perf diff: Change the default sort order to "dso,symbol")
However, at that time, perf diff used a global list of dsos. That means
the binaries which has same name can only be loaded once. That's a
problem for comparing different binaries.
For example, we have an old binary and an updated binary. They very
likely have same name and most of the functions, so only dsos from old
binary will be loaded. When processing the data from updated binary,
perf still use the symbol information from old binary. That's wrong.
Then the commit as below used IP to replace symbol name. 9c443dfdd31e ("perf diff: Fix support for all --sort combinations")
>From that time, perf diff starts to compare the symbol address.
The global dsos is discarded from a patch in 2010. a1645ce12adb ("perf: 'perf kvm' tool for monitoring guest performance
from host")
However, at that time, perf diff already compared by address. So perf
diff cannot work for different binaries as well.
This patch actually rolls back the perf diff to original design. The
document is also changed, so everybody knows the original design is to
compare the symbol names.
Here are some examples:
The only difference between example_v1.c and example_v2.c is the
location of f2 and f3. There is no change in behavior, but the previous
perf diff display the wrong differential profile.
example_v1.c
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
example_v2.c
noinline void f2(void)
{
volatile int a = 100, b, c;
for (b = 0; b < 10000; b++)
c = a * b;
}
noinline void f3(void)
{
volatile int i;
for (i = 0; i < 10000;) {
if(i%2)
i++;
else
i++;
}
}
noinline void f1(void)
{
f2();
f3();
}
int main()
{
int i;
for (i = 0; i < 100000; i++)
f1();
}
[lk@localhost perf_diff]$ gcc example_v1.c -o example
[lk@localhost perf_diff]$ perf record -o example_v1.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.813 MB example_v1.data (~35522 samples) ]
[lk@localhost perf_diff]$ gcc example_v2.c -o example
[lk@localhost perf_diff]$ perf record -o example_v2.data ./example
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.824 MB example_v2.data (~36015 samples) ]
Masami Hiramatsu [Thu, 19 Feb 2015 14:31:13 +0000 (23:31 +0900)]
perf probe: Check kprobes blacklist when adding new events
Recent linux kernel provides a blacklist of the functions which can not
be probed. perf probe can now check this blacklist before setting new
events and indicate better error message for users.
Without this patch,
----
# perf probe --add vmalloc_fault
Added new event:
Failed to write event: Invalid argument
Error: Failed to add events.
----
With this patch
----
# perf probe --add vmalloc_fault
Added new event:
Warning: Skipped probing on blacklisted function: vmalloc_fault
----
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150219143113.14434.5387.stgit@localhost.localdomain Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
David Ahern [Thu, 19 Feb 2015 20:00:22 +0000 (15:00 -0500)]
perf trace: Fix SIGBUS failures due to misaligned accesses
On Sparc64 perf-trace is failing in many spots due to extended load
instructions being used on misaligned accesses.
(gdb) run trace ls
Starting program: /tmp/perf/perf trace ls
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 169460.
<ls output removed>
Program received signal SIGBUS, Bus error.
0x000000000014f4dc in tp_field__u64 (field=0x4cc700, sample=0x7feffffa098) at builtin-trace.c:61
warning: Source file is more recent than executable.
61 TP_UINT_FIELD(64);
(gdb) bt
0 0x000000000014f4dc in tp_field__u64 (field=0x4cc700, sample=0x7feffffa098) at builtin-trace.c:61
1 0x0000000000156ad4 in trace__sys_exit (trace=0x7feffffc268, evsel=0x4cc580, event=0xfffffc0104912000,
sample=0x7feffffa098) at builtin-trace.c:1701
2 0x0000000000158c14 in trace__run (trace=0x7feffffc268, argc=1, argv=0x7fefffff360) at builtin-trace.c:2160
3 0x000000000015b78c in cmd_trace (argc=1, argv=0x7fefffff360, prefix=0x0) at builtin-trace.c:2609
4 0x0000000000107d94 in run_builtin (p=0x4549c8, argc=2, argv=0x7fefffff360) at perf.c:341
5 0x0000000000108140 in handle_internal_command (argc=2, argv=0x7fefffff360) at perf.c:400
6 0x0000000000108308 in run_argv (argcp=0x7feffffef2c, argv=0x7feffffef20) at perf.c:444
7 0x0000000000108728 in main (argc=2, argv=0x7fefffff360) at perf.c:559
sample->raw_data is guaranteed to not be 8-byte aligned because it is preceded
by the size as a u3. So accessing raw data with an extended load instruction causes
the SIGBUS. Resolve by using memcpy to a temporary variable of appropriate size.
Ingo Molnar [Thu, 26 Feb 2015 11:25:20 +0000 (12:25 +0100)]
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
New user selectable features:
- Support recording running/enabled time in 'perf record' (Andi Kleen)
- New tool: 'perf data' for converting perf.data to other formats,
initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior)
User visible changes:
- Only insert blank duration bracket when tracing syscalls in 'perf trace' (Arnaldo Carvalho de Melo)
- Filter out the trace pid when no threads are specified in 'perf trace' (Arnaldo Carvalho de Melo)
- Add 'perf trace' man page entry for --event (Arnaldo Carvalho de Melo)
- Dump stack on segfaults in 'perf trace' (Arnaldo Carvalho de Melo)
Infrastructure changes:
- Introduce set_filter_pid and set_filter_pids methods in the evlist class (Arnaldo Carvalho de Melo)
- Some perf_session untanglement patches, removing the need to pass a
perf_session instance for things that are related to evlists, so that
tools that don't deal with perf.data files like trace in live mode can
make use of the ordered_events class (Arnaldo Carvalho de Melo)
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adrian Hunter [Tue, 24 Feb 2015 11:20:59 +0000 (13:20 +0200)]
perf tools: Fix probing for PERF_FLAG_FD_CLOEXEC flag
Commit f6edb53c4993ffe92ce521fb449d1c146cea6ec2 converted the probe to
a CPU wide event first (pid == -1). For kernels that do not support
the PERF_FLAG_FD_CLOEXEC flag the probe fails with EINVAL. Since this
errno is not handled pid is not reset to 0 and the subsequent use of
pid = -1 as an argument brings in an additional failure path if
perf_event_paranoid > 0:
$ perf record -- sleep 1
perf_event_open(..., 0) failed unexpectedly with error 13 (Permission denied)
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.007 MB /tmp/perf.data (11 samples) ]
Also, ensure the fd of the confirmation check is closed and comment why
pid = -1 is used.
Needs to go to 3.18 stable tree as well.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Based-on-patch-by: David Ahern <david.ahern@oracle.com> Acked-by: David Ahern <david.ahern@oracle.com> Cc: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/54EC610C.8000403@intel.com Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf data: Add a 'perf' prefix to the generic fields
Some of the tracers bring their own id or pid fields and we can end up
having two of them. This patch adds a "perf_" prefix to the 'generic'
fields so we avoid a clash of the member names.
The change is visible in the babeltrace output:
Before:
$ babeltrace ./ctf-data/
[03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
[03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
...
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1424470628-5969-5-git-send-email-jolsa@kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 20 Feb 2015 22:17:00 +0000 (23:17 +0100)]
perf data: Add perf data to CTF conversion support
Adding 'perf data convert' to convert perf data file into different
format. This patch adds support for CTF format conversion.
To convert perf.data into CTF run:
$ perf data convert --to-ctf=./ctf-data/
[ perf data convert: Converted 'perf.data' into CTF data './ctf-data/' ]
[ perf data convert: Converted and wrote 11.268 MB (100230 samples) ]
The command will create CTF metadata out of perf.data file (or one
specified via -i option) and then convert all sample events into single
CTF stream.
Each sample_type bit is translated into separated CTF event field apart
from following exceptions:
PERF_SAMPLE_RAW - added in next patch
PERF_SAMPLE_READ - TODO
PERF_SAMPLE_CALLCHAIN - TODO
PERF_SAMPLE_BRANCH_STACK - TODO
PERF_SAMPLE_REGS_USER - TODO
PERF_SAMPLE_STACK_USER - TODO
$ perf --debug=data-convert=2 data convert ...
The converted CTF data could be analyzed by CTF tools, like babletrace
or tracecompass [1].
$ babeltrace ./ctf-data/
[03:19:13.962125533] (+?.?????????) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962130001] (+0.000004468) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 1 }
[03:19:13.962131936] (+0.000001935) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 8 }
[03:19:13.962133732] (+0.000001796) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 114 }
[03:19:13.962135557] (+0.000001825) cycles: { }, { ip = 0xFFFFFFFF8105443A, tid = 20714, pid = 20714, period = 2087 }
[03:19:13.962137627] (+0.000002070) cycles: { }, { ip = 0xFFFFFFFF81361938, tid = 20714, pid = 20714, period = 37582 }
[03:19:13.962161091] (+0.000023464) cycles: { }, { ip = 0xFFFFFFFF8124218F, tid = 20714, pid = 20714, period = 600246 }
[03:19:13.962517569] (+0.000356478) cycles: { }, { ip = 0xFFFFFFFF811A75DB, tid = 20714, pid = 20714, period = 1325731 }
[03:19:13.969518008] (+0.007000439) cycles: { }, { ip = 0x34080917B2, tid = 20714, pid = 20714, period = 1144298 }
The following members to the ctf-environment were decided to be added to
distinguish and specify perf CTF data:
- domain
It says "kernel" because it contains a kernel trace (not to be
confused with a user space like lttng-ust does)
- tracer_name
It says perf. This can be used to distinguish between lttng and perf
CTF based trace.
- version
The kernel version from stream. In addition to release, this is what
it looks like on a Debian kernel:
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1424470628-5969-4-git-send-email-jolsa@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 20 Feb 2015 22:16:59 +0000 (23:16 +0100)]
perf tools: Add new 'perf data' command
Adding new 'perf data' command to provide operations over data files.
The 'perf data convert' sub command is coming in following patch, but
there's possibility for other useful commands like 'perf data ls' (to
display perf data file in directory in ls style).
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1424470628-5969-3-git-send-email-jolsa@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Fri, 20 Feb 2015 22:16:58 +0000 (23:16 +0100)]
perf tools: Add feature check for libbabeltrace
Adding feature check for babeltrace library [1], which will be used for
perf data file CTF [2] conversion in following patches.
The babeltrace library is now automatically detected as standard
feature. It's possible to specify LIBBABELTRACE_DIR make variable to
specify location of installed libbabeltrace, like:
$ make LIBBABELTRACE_DIR=/opt/libbabeltrace/
BUILD: Doing 'make -j4' parallel build
Auto-detecting system features:
... dwarf: [ on ]
... glibc: [ on ]
... gtk2: [ on ]
... libaudit: [ on ]
... libbfd: [ on ]
... libelf: [ on ]
... libnuma: [ on ]
... libperl: [ on ]
... libpython: [ on ]
... libslang: [ on ]
... libunwind: [ on ]
... libbabeltrace: [ on ]
... libdw-dwarf-unwind: [ on ]
... zlib: [ on ]
... DWARF post unwind library: libunwind
NOTE The installation of the [1] to to used by above make:
$ git clone git://git.efficios.com/babeltrace.git
$ cd babeltrace
$ vim README
$ ./bootstrap
$ ./configure --prefix=/opt/libbabeltrace
$ make prefix=/opt/libbabeltrace
$ sudo make install prefix=/opt/libbabeltrace
Please make sure that the /opt/libbabeltrace/lib directory is in your
LD_LIBRARY_PATH:
$ export LD_LIBRARY_PATH=/opt/libbabeltrace/lib
[1] babeltrace - http://www.efficios.com/babeltrace
[2] Common Trace Format - http://www.efficios.com/ctf
Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jeremie Galarneau <jgalar@efficios.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/1424470628-5969-2-git-send-email-jolsa@kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[ Added missing babeltrace build instructions ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Andi Kleen [Tue, 24 Feb 2015 23:13:40 +0000 (15:13 -0800)]
perf record: Support recording running/enabled time
Add an option to perf record to record running/enabled time for read
events, similar to what stat does.
This is useful to understand multiplexing problems.
Right now the report support is not great, but at least report -D
already supports it.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1424819620-16043-1-git-send-email-andi@firstfloor.org
[ Fixed the Documentation entry to match the OPT_BOOLEAN one ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Feature detection for pthread_attr_setaffinity_np was failing, producing
this error:
In file included from bench/futex-hash.c:17:0:
bench/futex.h:73:19: error: conflicting types for ‘pthread_attr_setaffinity_np’
static inline int pthread_attr_setaffinity_np(pthread_attr_t *attr,
^
In file included from bench/futex.h:72:0,
from bench/futex-hash.c:17:
/usr/include/pthread.h:407:12: note: previous declaration of ‘pthread_attr_setaffinity_np’ was here
extern int pthread_attr_setaffinity_np (pthread_attr_t *__attr,
^
make[3]: *** [bench/futex-hash.o] Error 1
make[2]: *** [bench] Error 2
make[2]: *** Waiting for unfinished jobs....
This was because compiling test-pthread-attr-setaffinity-np.c
failed due to the function arguments:
test-pthread-attr-setaffinity-np.c: In function ‘main’:
test-pthread-attr-setaffinity-np.c:11:2: warning: null argument where non-null required (argument 3) [-Wnonnull]
ret = pthread_attr_setaffinity_np(&thread_attr, 0, NULL);
^
So fix the arguments.
Josh Boyer [Wed, 11 Feb 2015 16:24:05 +0000 (11:24 -0500)]
perf tools: Define _GNU_SOURCE on pthread_attr_setaffinity_np feature check
The man page for pthread_attr_set_affinity_np states that _GNU_SOURCE
must be defined before pthread.h is included in order to get the proper
function declaration. Define this in the Makefile.
Without this defined, the feature check fails on a Fedora system with
gcc5 and then the perf build later fails with conflicting prototypes for
the function.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: http://lkml.kernel.org/r/20150211162404.GA15522@hansolo.redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf ordered_events: Stop using tool->ordered_events
To figure out if ordered_events are being used when doing a flush
operation, it is enough to check if there were in fact some events
queued, i.e. look at oe->nr_events.
Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-1c5r404vy766kt5nflv88uag@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 23 Feb 2015 02:21:14 +0000 (18:21 -0800)]
Linux 4.0-rc1
.. after extensive statistical analysis of my G+ polling, I've come to
the inescapable conclusion that internet polls are bad.
Big surprise.
But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38%
margin, in a poll that people weren't even supposed to participate in.
Who can argue with solid numbers like that? 5,796 votes from people who
can't even follow the most basic directions?
In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%,
but with a total of 29,110 votes right now.
Now, arguably, that vote spread is only about 3,200 votes, which is less
than the almost six thousand votes that the "please ignore" poll got, so
it could be considered noise.
Bruce Merry [Thu, 15 Jan 2015 09:20:22 +0000 (11:20 +0200)]
perf bench: Fix order of arguments to memcpy_alloc_mem
This was causing the destination instead of the source to be filled. As
a result, the source was typically all mapped to one zero page, and
hence very cacheable.
Signed-off-by: Bruce Merry <bmerry@ska.ac.za> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150115092022.GA11292@kryton Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus Torvalds [Mon, 23 Feb 2015 02:05:13 +0000 (18:05 -0800)]
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 fixes from Ted Ts'o:
"Ext4 bug fixes.
We also reserved code points for encryption and read-only images (for
which the implementation is mostly just the reserved code point for a
read-only feature :-)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix indirect punch hole corruption
ext4: ignore journal checksum on remount; don't fail
ext4: remove duplicate remount check for JOURNAL_CHECKSUM change
ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize
ext4: support read-only images
ext4: change to use setup_timer() instead of init_timer()
ext4: reserve codepoints used by the ext4 encryption feature
jbd2: complain about descriptor block checksum errors
Linus Torvalds [Mon, 23 Feb 2015 01:42:14 +0000 (17:42 -0800)]
Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
"Assorted stuff from this cycle. The big ones here are multilayer
overlayfs from Miklos and beginning of sorting ->d_inode accesses out
from David"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
procfs: fix race between symlink removals and traversals
debugfs: leave freeing a symlink body until inode eviction
Documentation/filesystems/Locking: ->get_sb() is long gone
trylock_super(): replacement for grab_super_passive()
fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
SELinux: Use d_is_positive() rather than testing dentry->d_inode
Smack: Use d_is_positive() rather than testing dentry->d_inode
TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
VFS: Split DCACHE_FILE_TYPE into regular and special types
VFS: Add a fallthrough flag for marking virtual dentries
VFS: Add a whiteout dentry type
VFS: Introduce inode-getting helpers for layered/unioned fs environments
Infiniband: Fix potential NULL d_inode dereference
posix_acl: fix reference leaks in posix_acl_create
autofs4: Wrong format for printing dentry
...
Linus Torvalds [Sun, 22 Feb 2015 17:57:16 +0000 (09:57 -0800)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM fix from Russell King:
"Just one fix this time around. __iommu_alloc_buffer() can cause a
BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or
__GFP_HIGHMEM set. The patch from Alexandre addresses this"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
Al Viro [Sun, 22 Feb 2015 03:05:11 +0000 (22:05 -0500)]
debugfs: leave freeing a symlink body until inode eviction
As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.
And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
trylock_super(): replacement for grab_super_passive()
I've noticed significant locking contention in memory reclaimer around
sb_lock inside grab_super_passive(). Grab_super_passive() is called from
two places: in icache/dcache shrinkers (function super_cache_scan) and
from writeback (function __writeback_inodes_wb). Both are required for
progress in memory allocator.
Grab_super_passive() acquires sb_lock to increment sb->s_count and check
sb->s_instances. It seems sb->s_umount locked for read is enough here:
super-block deactivation always runs under sb->s_umount locked for write.
Protecting super-block itself isn't a problem: in super_cache_scan() sb
is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
are still active. Inside writeback super-block comes from inode from bdi
writeback list under wb->list_lock.
This patch removes locking sb_lock and checks s_instances under s_umount:
generic_shutdown_super() unlinks it under sb->s_umount locked for write.
New variant is called trylock_super() and since it only locks semaphore,
callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
they're done.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 29 Jan 2015 12:02:36 +0000 (12:02 +0000)]
fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 29 Jan 2015 12:02:35 +0000 (12:02 +0000)]
VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
Convert the following where appropriate:
(1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
(2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
(3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
complicated than it appears as some calls should be converted to
d_can_lookup() instead. The difference is whether the directory in
question is a real dir with a ->lookup op or whether it's a fake dir with
a ->d_automount op.
In some circumstances, we can subsume checks for dentry->d_inode not being
NULL into this, provided we the code isn't in a filesystem that expects
d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
use d_inode() rather than d_backing_inode() to get the inode pointer).
Note that the dentry type field may be set to something other than
DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
manages the fall-through from a negative dentry to a lower layer. In such a
case, the dentry type of the negative union dentry is set to the same as the
type of the lower dentry.
However, if you know d_inode is not NULL at the call site, then you can use
the d_is_xxx() functions even in a filesystem.
There is one further complication: a 0,0 chardev dentry may be labelled
DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
intended for special directory entry types that don't have attached inodes.
The following perl+coccinelle script was used:
use strict;
my @callers;
open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
die "Can't grep for S_ISDIR and co. callers";
@callers = <$fd>;
close($fd);
unless (@callers) {
print "No matches\n";
exit(0);
}
David Howells [Thu, 29 Jan 2015 12:02:31 +0000 (12:02 +0000)]
Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
Use d_is_positive(dentry) or d_is_negative(dentry) rather than testing
dentry->d_inode as the dentry may cover another layer that has an inode when
the top layer doesn't or may hold a 0,0 chardev that's actually a whiteout.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 29 Jan 2015 12:02:28 +0000 (12:02 +0000)]
VFS: Add a fallthrough flag for marking virtual dentries
Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead. This may be recorded on medium if directory integration is stored
there.
The flag can be set with d_set_fallthru() and tested with d_is_fallthru().
Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 29 Jan 2015 12:02:27 +0000 (12:02 +0000)]
VFS: Add a whiteout dentry type
Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function. A
d_is_miss() accessor is also added for ordinary cache misses and
d_is_negative() is modified to indicate either an ordinary miss or an enforced
miss (whiteout).
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
David Howells [Thu, 29 Jan 2015 12:02:27 +0000 (12:02 +0000)]
VFS: Introduce inode-getting helpers for layered/unioned fs environments
Introduce some function for getting the inode (and also the dentry) in an
environment where layered/unioned filesystems are in operation.
The problem is that we have places where we need *both* the union dentry and
the lower source or workspace inode or dentry available, but we can only have
a handle on one of them. Therefore we need to derive the handle to the other
from that.
The idea is to introduce an extra field in struct dentry that allows the union
dentry to refer to and pin the lower dentry.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Linus Torvalds [Sun, 22 Feb 2015 03:41:38 +0000 (19:41 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS:
- a number of fixes that didn't make the 3.19 release.
- a number of cleanups.
- preliminary support for Cavium's Octeon 3 SOCs which feature up to
48 MIPS64 R3 cores with FPU and hardware virtualization.
- support for MIPS R6 processors.
Revision 6 of the MIPS architecture is a major revision of the MIPS
architecture which does away with many of original sins of the
architecture such as branch delay slots. This and other changes in
R6 require major changes throughout the entire MIPS core
architecture code and make up for the lion share of this pull
request.
- finally some preparatory work for eXtendend Physical Address
support, which allows support of up to 40 bit of physical address
space on 32 bit processors"
[ Ahh, MIPS can't leave the PAE brain damage alone. It's like
every CPU architect has to make that mistake, but pee in the snow
by changing the TLA. But whether it's called PAE, LPAE or XPA,
it's horrid crud - Linus ]
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (114 commits)
MIPS: sead3: Corrected get_c0_perfcount_int
MIPS: mm: Remove dead macro definitions
MIPS: OCTEON: irq: add CIB and other fixes
MIPS: OCTEON: Don't do acknowledge operations for level triggered irqs.
MIPS: OCTEON: More OCTEONIII support
MIPS: OCTEON: Remove setting of processor specific CVMCTL icache bits.
MIPS: OCTEON: Core-15169 Workaround and general CVMSEG cleanup.
MIPS: OCTEON: Update octeon-model.h code for new SoCs.
MIPS: OCTEON: Implement DCache errata workaround for all CN6XXX
MIPS: OCTEON: Add little-endian support to asm/octeon/octeon.h
MIPS: OCTEON: Implement the core-16057 workaround
MIPS: OCTEON: Delete unused COP2 saving code
MIPS: OCTEON: Use correct instruction to read 64-bit COP0 register
MIPS: OCTEON: Save and restore CP2 SHA3 state
MIPS: OCTEON: Fix FP context save.
MIPS: OCTEON: Save/Restore wider multiply registers in OCTEON III CPUs
MIPS: boot: Provide more uImage options
MIPS: Remove unneeded #ifdef __KERNEL__ from asm/processor.h
MIPS: ip22-gio: Remove legacy suspend/resume support
mips: pci: Add ifdef around pci_proc_domain
...
Linus Torvalds [Sun, 22 Feb 2015 03:21:54 +0000 (19:21 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A few fixes that came in too late to make it into the first set of
pull requests but would still be nice to have in -rc1.
The majority of these are trivial build fixes for bugs that I found
myself using randconfig testing, and a set of two patches from Uwe to
mark DT strings as 'const' where appropriate, to resolve inconsistent
section attributes"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: make of_device_ids const
ARM: make arrays containing machine compatible strings const
ARM: mm: Remove Kconfig symbol CACHE_PL310
ARM: rockchip: force built-in regulator support for PM
ARM: mvebu: build armada375-smp code conditionally
ARM: sti: always enable RESET_CONTROLLER
ARM: rockchip: make rockchip_suspend_init conditional
ARM: ixp4xx: fix {in,out}s{bwl} data types
ARM: prima2: do not select SMP_ON_UP
ARM: at91: fix pm declarations
ARM: davinci: multi-soc kernels require AUTO_ZRELADDR
ARM: davinci: davinci_cfg_reg cannot be init
ARM: BCM: put back ARCH_MULTI_V7 dependency for mobile
ARM: vexpress: use ARM_CPU_SUSPEND if needed
ARM: dts: add I2C device nodes for Broadcom Cygnus
ARM: dts: BCM63xx: fix L2 cache properties
Linus Torvalds [Sun, 22 Feb 2015 03:16:42 +0000 (19:16 -0800)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull misc SCSI patches from James Bottomley:
"This is a short patch set representing a couple of left overs from the
merge window (debug removal and MAINTAINER changes).
Plus one merge window regression (the local workqueue for hpsa) and a
set of bug fixes for several issues (two for scsi-mq and the rest an
assortment of long standing stuff, all cc'd to stable)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
sg: fix EWOULDBLOCK errors with scsi-mq
sg: fix unkillable I/O wait deadlock with scsi-mq
sg: fix read() error reporting
wd719x: add missing .module to wd719x_template
hpsa: correct compiler warnings introduced by hpsa-add-local-workqueue patch
fixed invalid assignment of 64bit mask to host dma_boundary for scatter gather segment boundary limit.
fcoe: Transition maintainership to Vasu
am53c974: remove left-over debugging code
Linus Torvalds [Sat, 21 Feb 2015 22:09:38 +0000 (14:09 -0800)]
Merge tag 'xfs-pnfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs pnfs block layout support from Dave Chinner:
"This contains the changes to XFS needed to support the PNFS block
layout server that you pulled in through Bruce's NFS server tree
merge.
I originally thought that I'd need to merge changes into the NFS
server side, but Bruce had already picked them up and so this is
purely changes to the fs/xfs/ codebase.
Summary:
This update contains the implementation of the PNFS server export
methods that enable use of XFS filesystems as a block layout target"
* tag 'xfs-pnfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: recall pNFS layouts on conflicting access
xfs: implement pNFS export operations