Jason Baron [Tue, 21 Jul 2009 18:16:29 +0000 (14:16 -0400)]
perf_counter: Detect debugfs location
If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs
filesystem in /proc/mounts, but also allows the user to specify
'--debugfs-dir=blah' or set the environment variable: 'PERF_DEBUGFS_DIR'
Signed-off-by: Jason Baron <jbaron@redhat.com>
[ also made it probe "/debug" by default ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090721181629.GA3094@redhat.com>
Jason Baron [Tue, 21 Jul 2009 16:20:22 +0000 (12:20 -0400)]
perf_counter: Add tracepoint support to perf list, perf stat
Add support to 'perf list' and 'perf stat' for kernel tracepoints. The
implementation creates a 'for_each_subsystem' and 'for_each_event' for
easy iteration over the tracepoints.
Arjan van de Ven [Tue, 21 Jul 2009 05:54:26 +0000 (22:54 -0700)]
perf: avoid structure size confusion by using a fixed size
for some reason, this structure gets compiled as 36 bytes in some files
(the ones that alloacte it) but 40 bytes in others (the ones that use it).
The cause is an off_t type that gets a different size in different
compilation units for some yet-to-be-explained reason.
But the effect is disasterous; the size/offset members of the struct
are at different offsets, and result in mostly complete garbage.
The parser in perf is so robust that this all gets hidden, and after
skipping an certain amount of samples, it recovers.... so this bug
is not normally noticed.
.... except when you want every sample to be exact.
Fix this by just using an explicitly sized type.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4A655917.9080504@linux.intel.com>
Peter Zijlstra [Tue, 21 Jul 2009 11:19:40 +0000 (13:19 +0200)]
perf_counter: PERF_SAMPLE_ID and inherited counters
Anton noted that for inherited counters the counter-id as provided by
PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID
because each inherited counter gets its own id.
His suggestion was to always return the parent counter id, since that
is the primary counter id as exposed. However, these inherited
counters have a unique identifier so that events like
PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which
counter gets modified, which is important when trying to normalize the
sample streams.
This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD,
which is more useful anyway, since changing periods became a lot more
common than initially thought -- rendering PERF_EVENT_PERIOD the less
useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate
value, since it reports the value used to trigger the overflow,
whereas PERF_EVENT_PERIOD simply reports the requested period changed,
which might only take effect on the next cycle).
This still leaves us PERF_EVENT_THROTTLE to consider, but since that
_should_ be a rare occurrence, and linking it to a primary id is the
most useful bit to diagnose the problem, we introduce a
PERF_SAMPLE_STREAM_ID, for those few cases where the full
reconstruction is important.
[Does change the ABI a little, but I see no other way out]
Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1248095846.15751.8781.camel@twins>
Anton Blanchard [Thu, 16 Jul 2009 13:44:29 +0000 (15:44 +0200)]
perf_counter: Make call graph option consistent
perf record uses -g for logging call graph data but perf report
uses -c to print call graph data. Be consistent and use -g
everywhere for call graph data.
Also update the help text to reflect the current default -
fractal,0.5
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.803604373@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Blanchard [Thu, 16 Jul 2009 13:44:29 +0000 (15:44 +0200)]
perf_counter: Log vfork as a fork event
Right now we don't output vfork events. Even though we should
always see an exec after a vfork, we may get perfcounter
samples between the vfork and exec. These samples can lead to
some confusion when parsing perfcounter data.
To keep things consistent we should always log a fork event. It
will result in a little more log data, but is less confusing to
trace parsing tools.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.589309391@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Blanchard [Thu, 16 Jul 2009 13:44:29 +0000 (15:44 +0200)]
perf_counter: Synthesize VDSO mmap event
perf record synthesizes mmap events for the running process.
Right now it just catches file mappings, but we can check for
the vdso symbol and add that too.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.517264409@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Anton Blanchard [Thu, 16 Jul 2009 13:15:52 +0000 (15:15 +0200)]
perf_counter: Make sure we dont leak kernel memory to userspace
There are a few places we are leaking tiny amounts of kernel
memory to userspace. This happens when writing out strings
because we always align the end to 64 bits.
To avoid this we should always use an appropriately sized
temporary buffer and ensure it is zeroed.
Since d_path assembles the string from the end of the buffer
backwards, we need to add 64 bits after the buffer to allow for
alignment.
We also need to copy arch_vma_name to the temporary buffer,
because if we use it directly we may end up copying to
userspace a number of bytes after the end of the string
constant.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090716104817.273972048@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Chris Wilson [Mon, 6 Jul 2009 08:31:33 +0000 (09:31 +0100)]
perf_counter: Fix the tracepoint channel to perfcounters
Fix a missed rename in EVENT_PROFILE support so that it gets
built and allows tracepoint tracing from the 'perf' tool.
Fix a typo in the (never before built & enabled) portion in
perf_counter.c as well, and update that code to the
attr.config changes as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Gamari <bgamari.foss@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>
But that erratum (related to IA32_MISC_ENABLES.7) does not
affect perfcounters as we dont use this toggle to disable RDPMC
and WRMSR/RDMSR access to performance counters. We keep cr4's
bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not
allowed anyway.
When we filter by column content we may end up with a column
that has the same value for all the lines. So remove that
column and tell its unique value on the top, as a comment.
strlist: Introduce strlist__entry and strlist__nr_entries methods
The strlist__entry method allows accessing strlists like an
array, will be used in the 'perf report' to access the first
entry.
We now keep the nr_entries so that we can check if we have just
one entry, will be used in 'perf report' to improve the output
by showing just at the top when we have just, say, one DSO.
While at it use nr_entries to optimize strlist__is_empty by not
using the far more costly rb_first based implementation.
Also add these extra options to control the new behaviour:
-w, --field-width
Force each column width to the provided list, for large terminal
readability.
-t, --field-separator:
Use a special separator character and don't pad with spaces, replacing
all occurances of this separator in symbol names (and other output) with
a '.' character, that thus it's the only non valid separator.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090711014728.GH3452@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 10 Jul 2009 07:59:56 +0000 (09:59 +0200)]
perf_counter: Clean up global vs counter enable
Ingo noticed that both AMD and P6 call
x86_pmu_disable_counter() on *_pmu_enable_counter(). This is
because we rely on the side effect of that call to program
the event config but not touch the EN bit.
We change that for AMD by having enable_all() simply write
the full config in, and for P6 by explicitly coding it.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Wed, 8 Jul 2009 08:21:41 +0000 (10:21 +0200)]
perf_counter: Fix up P6 PMU details
The P6 doesn't seem to support cache ref/hit/miss counts, so
we extend the generic hardware event codes to have 0 and -1
mean the same thing as for the generic cache events.
Furthermore, it turns out the 0 event does not count
(that is, its reported that on PPro it actually does count
something), therefore use a event configuration that's
specified not to count to disable the counters.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to
enable/disable both its counters. We use this for the
global enable/disable, and clear all config bits (except EN)
to disable individual counters.
Actual ia32 hardware doesn't support lfence, so use a locked
op without side-effect to implement a full barrier.
perf stat and perf record seem to function correctly.
[a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code]
Anton Blanchard [Mon, 6 Jul 2009 12:01:31 +0000 (22:01 +1000)]
perf_counter tools: Rename cache events to remove $
The cache events contain '$' which will hit shell variable
expansion. To avoid confusion change this to 'cache', ie
L1-d$-loads becomes L1-dcache-loads.
Signed-off-by: Anton Blanchard <anton@samba.org> Cc: Roland Dreier <rdreier@cisco.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090706120131.GB4391@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter tools: callchains: Manage the cumul hits on the fly
The cumul hits are the number of hits of every childs of a node
plus the hits of the current nodes, required for percentage
computing of a branch.
Theses numbers are calculated during the sorting of the branches of
the callchain tree using a depth first postfix traversal, so that
cumulative hits are propagated in the right order.
But if we plan to implement percentages relative to the parent and not
absolute percentages (relative to the whole overhead), we need to know
the cumulative hits of the parent before computing the children
because the relative minimum acceptable number of entries (ie: minimum
rate against the cumulative hits from the parent) is the basis to
filter the children against a given rate.
Then we need to handle the cumul hits on the fly to prepare the
implementation of relative overhead rates.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf report: Use a modifiable string for default callchain options
If the user doesn't provide options to tune his callchain output
(ie: if he uses -c without arguments) then the default value passed
in the OPT_CALLBACK_DEFAULT() macro is used.
But it's parsed later by strtok() which will replace comma separators
to a zero. This may segfault as we are using a read-only string.
Use a modifiable one instead, and also fix the "100%" default
minimum threshold value by turning it into a 0 (output every callchains)
as it was intended in the origin.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246772361-9960-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul Mackerras [Wed, 1 Jul 2009 22:57:12 +0000 (08:57 +1000)]
x86: atomic64: Code atomic(64)_read and atomic(64)_set in C not CPP
Occasionally we get bugs where atomic_read or atomic_set are
used on atomic64_t variables or vice versa. These bugs don't
generate warnings on x86 because atomic_read and atomic_set are
coded as macros rather than C functions, so we don't get any
type-checking on their arguments; similarly for atomic64_read
and atomic64_set in 64-bit kernels.
This converts them to C functions so that the arguments are
type-checked and bugs like this will get caught more easily. It
also converts atomic_cmpxchg and atomic_xchg, and
atomic64_cmpxchg and atomic64_xchg on 64-bit, so we get
type-checking on their arguments too.
Compiling a typical 64-bit x86 config, this generates no new
warnings, and the vmlinux text is 86 bytes smaller.
Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: atomic64: Fix unclean type use in atomic64_xchg()
Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.
Fix atomic64_xchg() to use __atomic64_read() instead.
No code changed:
arch/x86/lib/atomic64_32.o:
text data bss dec hex filename
435 0 0 435 1b3 atomic64_32.o.before
435 0 0 435 1b3 atomic64_32.o.after
Linus noticed that atomic64_xchg() uses atomic_read(), which
happens to work because atomic_read() is a macro so the
.counter value gets u64-read on 32-bit too - but this is really
bogus and serious bugs are waiting to happen.
Change atomic_read() to be a type-safe inline, and this exposes
the atomic64 bogosity as well:
arch/x86/lib/atomic64_32.c: In function ‘atomic64_xchg’:
arch/x86/lib/atomic64_32.c:39: warning: passing argument 1 of ‘atomic_read’ from incompatible pointer type
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus noted (based on Eric Dumazet's numbers) that we would
probably be better off not trying an atomic_read() in
atomic64_add_return() but intead intentionally let the first
cmpxchg8b fail - to get a cache-friendly 'give me ownership
of this cacheline' transaction. That can then be followed
by the real cmpxchg8b which sets the value local to the CPU.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Eric Dumazet [Fri, 3 Jul 2009 10:14:27 +0000 (12:14 +0200)]
x86: atomic64: Improve atomic64_read()
Linus noticed that the 32-bit version of atomic64_read() was
being overly complex with re-reading the value and doing a
retry loop over that.
Instead we can just rely on cmpxchg8b returning either the new
value or returning the current value.
We can use any 'old' value, which will be faster as it can be
loaded via immediates. Using some value that is not equal to
the real value in memory the instruction gets faster.
This also has the advantage that the CPU could avoid dirtying
the cacheline.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file
Linus noted that the atomic64_t primitives are all inlines
currently which is crazy because these functions have a large
register footprint anyway.
Move them to a separate file: arch/x86/lib/atomic64_32.c
Also, while at it, rename all uses of 'unsigned long long' to
the much shorter u64.
This makes the appearance of the prototypes a lot nicer - and
it also uncovered a few bugs where (yet unused) API variants
had 'long' as their return type instead of u64.
[ More intrusive changes are not yet done in this patch. ]
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter tools: Adjust symbols in ET_EXEC files too
Ingo Molnar wrote:
> i just bisected a 'perf report' bug that would cause us to not
> resolve all user-space symbols in a 'git gc' run to:
>
> f5812a7a336fb952d819e4427b9a2dce02368e82 is first bad commit
> commit f5812a7a336fb952d819e4427b9a2dce02368e82
> Author: Arnaldo Carvalho de Melo <acme@redhat.com>
> Date: Tue Jun 30 11:43:17 2009 -0300
>
> perf_counter tools: Adjust only prelinked symbol's addresses
Rename ->prelinked to ->adjust_symbols and making what was done
only for prelinked libraries also to ET_EXEC binaries, such as
/usr/bin/git:
perf_counter tools: Provide helper to print percents color
Among perf annotate, perf report and perf top, we can find the
common colored printing of percents according to the following
rules:
High overhead = > 5%, colored in red
Mid overhead = > 0.5%, colored in green
Low overhead = < 0.5%, default color
Factorize these multiple checks in a single function named
percent_color_fprintf() and also provide a get_percent_color()
for sites which print percentages and other things at the same
time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246558475-10624-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter tools: Create new chain_for_each_child() iterator
Iterating through children of a node in the callchain tree
shows something that may be quite confusing at a first glance.
The head is the children field of the parent and the list nodes
are in the brothers field of the children.
This is because the childs are linked to the parent as a list
of "brothers" using the "children" list of the parent as a
head:
Building builtin-stat.c reports the following errors:
cc1: warnings being treated as errors
builtin-stat.c: In function ‘run_perf_stat’:
builtin-stat.c:242: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
builtin-stat.c:255: erreur: ignoring return value of ‘read’, declared with attribute warn_unused_result
make: *** [builtin-stat.o] Erreur 1
This patch handles the possible pipe read failures.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246474930-6088-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
So remove the copy and use the lib/rbtree.c directly, sharing
the source code while still generating a separate object file,
since tools/perf uses a far more agressive -O6 switch.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20090701152837.GG15682@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
builtin-report.c: In function ‘hist_entry__add’:
builtin-report.c:1015: error: case label not within a switch statement
builtin-report.c:1017: error: break statement not within loop or switch
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
This reworks the parser for event descriptors to make it more
consistent in what it accepts. It is now structured as a
recursive descent parser for the following grammar:
with the extra restriction that you can have at most one
cache_op and at most one cache_result.
We pass the current string pointer by reference (i.e. as a
const char **) to the various parsing functions so that they
can advance the pointer to indicate how much they consumed.
They return 0 if they didn't recognize the thing at the pointer
or 1 if they did (and advance the pointer past it).
This also fixes parse_aliases to take the longest matching
alias from the table, not the first one. Otherwise "l1-data"
would match the "l1-d" alias and the "ata" would not be
consumed.
This allows event modifiers indicating what processor modes to
count in to be applied to any event, not just numeric events,
and adds a ":h" modifier to indicate counting in hypervisor
mode. Specifying ":u" now sets both exclude_kernel and
exclude_hv, and so on. Multiple modes can be specified, e.g.
":uk" will count in user or hypervisor mode (i.e. only
exclude_kernel will be set).
Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <19018.53826.843815.189847@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
The symbol resolving has of course revealed some bugs in the
callchain tree handling. This patch fixes some of them,
including:
- inherit the children from the parents while splitting a node
- fix list range moving
- fix indexes setting in callchains
- create a child on the current node if the path doesn't match in
the existent children (was only done on the root)
- compare using symbols when possible so that we can match a function
using any ip inside by referring to its start address.
The practical effects are:
- remove double callchains
- fix upside down or any random order of callchains
- fix wrong paths
- fix bad hits and percentage accounts
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1246419315-9968-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merge branch 'kmemleak' of git://linux-arm.org/linux-2.6
* 'kmemleak' of git://linux-arm.org/linux-2.6:
kmemleak: Inform kmemleak about pid_hash
kmemleak: Do not warn if an unknown object is freed
kmemleak: Do not report new leaked objects if the scanning was stopped
kmemleak: Slightly change the policy on newly allocated objects
kmemleak: Do not trigger a scan when reading the debug/kmemleak file
kmemleak: Simplify the reports logged by the scanning thread
kmemleak: Enable task stacks scanning by default
kmemleak: Allow the early log buffer to be configurable.
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm table: fix blk_stack_limits arg to use bytes not sectors
dm exception store: really fix type lookup
Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (47 commits)
perf report: Add --symbols parameter
perf report: Add --comms parameter
perf report: Add --dsos parameter
perf_counter tools: Adjust only prelinked symbol's addresses
perf_counter: Provide a way to enable counters on exec
perf_counter tools: Reduce perf stat measurement overhead/skew
perf stat: Use percentages for scaling output
perf_counter, x86: Update x86_pmu after WARN()
perf stat: Micro-optimize the code: memcpy is only required if no event is selected and !null_run
perf stat: Improve output
perf stat: Fix multi-run stats
perf stat: Add -n/--null option to run without counters
perf_counter tools: Remove dead code
perf_counter: Complete counter swap
perf report: Print sorted callchains per histogram entries
perf_counter tools: Prepare a small callchain framework
perf record: Fix unhandled io return value
perf_counter tools: Add alias for 'l1d' and 'l1i'
perf-report: Add bare minimum PERF_EVENT_READ parsing
perf-report: Add modes for inherited stats and no-samples
...
Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
Add Fenghua Yu as temporary co-maintainer for ia64
[IA64] address compiler warnings perfmon.c/salinfo.c
[IA64] Remove unnecessary semicolons
[IA64] sprintf should not be used with same source & destination address
hostfs: set maximum filesize in superblock for proper LFS support
Maximum file size for hostfs mounts defaults to 2GB, so bigger files cannot be
read/written through hostfs. This patch initializes the maximum file size to
MAX_LFS_SIZE.
Signed-off-by: Wolfgang Illmeyer <wolfgang@illmeyer.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 30 Jun 2009 18:41:43 +0000 (11:41 -0700)]
bfin: delay IRQ registration until driver is ready
Make sure we do not actually request the RTC IRQ until the device driver
is fully ready to handle and process any interrupt. This way a spurious
interrupt won't crash the system (which may happen if the bootloader was
poking the RTC right before booting Linux).
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 30 Jun 2009 18:41:42 +0000 (11:41 -0700)]
atyfb: fix alignment for block writes
Block writes require 64 byte alignment. Since block writes could be used
with SGRAM or WRAM also refine the memory type detection to check for
either type before deciding to use the 64 byte alignment.
Signed-off-by: Ville Syrjala <syrjala@sci.fi> Tested-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ville Syrjala [Tue, 30 Jun 2009 18:41:40 +0000 (11:41 -0700)]
atyfb: fix HP OmniBook 500 reboot hang
Apparently HP OmniBook 500's BIOS doesn't like the way atyfb reprograms
the hardware. The BIOS will simply hang after a reboot. Fix the problem
by restoring the hardware to it's original state on reboot.
Signed-off-by: Ville Syrjala <syrjala@sci.fi> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Tue, 30 Jun 2009 18:41:37 +0000 (11:41 -0700)]
x86: only clear node_states for 64bit
Nathan reported that
| commit 73d60b7f747176dbdff826c4127d22e1fd3f9f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date: Tue Jun 16 15:33:00 2009 -0700
|
| page-allocator: clear N_HIGH_MEMORY map before we set it again
|
| SRAT tables may contains nodes of very small size. The arch code may
| decide to not activate such a node. However, currently the early boot
| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be
| active although these nodes have no present pages.
|
| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems can not be used.
Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
and need to do save/restore for that in find_zone_movable_pfn
Reported-by: Nathan Lynch <ntl@pobox.com> Tested-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpusets: document adding/removing cpus to cpuset elaborately
By writing a tasks's pid to the file, a process adds that task to that
cgroup/cpuset. But to add a cpu/mem to a cpuset, the new list of cpus
should be written to the cpuset.mems file which would replace the old list
of cpus. Make this clearer in the documentation.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Tue, 30 Jun 2009 18:41:35 +0000 (11:41 -0700)]
mm: prevent balance_dirty_pages() from doing too much work
balance_dirty_pages can overreact and move all of the dirty pages to
writeback unnecessarily.
balance_dirty_pages makes its decision to throttle based on the number of
dirty plus writeback pages that are over the calculated limit,so it will
continue to move pages even when there are plenty of pages in writeback
and less than the threshold still dirty.
This allows it to overshoot its limits and move all the dirty pages to
writeback while waiting for the drives to catch up and empty the writeback
list.
A simple fio test easily demonstrates this problem.
This is the simplest fix I could find, but I'm not entirely sure that it
alone will be enough for all cases. But it certainly is an improvement on
my desktop machine writing to 2 disks.
Do we need something more for machines with large arrays where
bdi_threshold * number_of_drives is greater than the dirty_ratio ?
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Renaud Lottiaux [Tue, 30 Jun 2009 18:41:34 +0000 (11:41 -0700)]
bsdacct: fix access to invalid filp in acct_on()
The file opened in acct_on and freshly stored in the ns->bacct struct can
be closed in acct_file_reopen by a concurrent call after we release
acct_lock and before we call mntput(file->f_path.mnt).
Record file->f_path.mnt in a local variable and use this variable only.
Signed-off-by: Renaud Lottiaux <renaud.lottiaux@kerlabs.com> Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 30 Jun 2009 18:41:30 +0000 (11:41 -0700)]
spi: bitbang bugfix in message setup
Bugfix to spi_bitbang infrastructure: make sure to always set transfer
parameters on the first pass through the message's per-transfer loop.
This can matter with drivers that replace the per-word or per-buffer
transfer primitives, on busses with multiple SPI devices.
Previously, this could have started messages using the settings left after
previous messages. The problem was observed when a high speed chip
(m25p80 type flash) was running very slowly because a low speed device
(avr8 microcontroller) had previously used the bus. Similar faults could
have driven the low speed device too fast, or used an unexpected word
size.
Acked-by: Steven A. Falco <sfalco@harris.com> Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 30 Jun 2009 18:41:29 +0000 (11:41 -0700)]
fbdev: add mutex for fb_mmap locking
Add a mutex to avoid a circular locking problem between the mm layer
semaphore and fbdev ioctl mutex through the fb_mmap() call.
Also, add mutex to all places where smem_start and smem_len fields change
so the mutex inside the fb_mmap() is actually used. Changing of these
fields before calling the framebuffer_register() are not mutexed.
This is 2.6.31 material. It removes one lockdep (fb_mmap() and
register_framebuffer()) but there is still another one (fb_release() and
register_framebuffer()). It also cleans up handling of the smem_start and
smem_len fields used by mutexed section of the fb_mmap().
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 30 Jun 2009 18:41:27 +0000 (11:41 -0700)]
spi: add spi_master flag word
Add a new spi_master.flags word listing constraints relevant to that
controller. Define the first constraint bit: a half duplex restriction.
Include that constraint in the OMAP1 MicroWire controller driver.
Have the mmc_spi host be the first customer of this flag. Its coding
relies heavily on full duplex transfers, so it must fail when the
underlying controller driver won't perform them.
(The spi_write_then_read routine could use it too: use the
temporarily-withdrawn full-duplex speedup unless this flag is set, in
which case the existing code applies. Similarly, any spi_master
implementing only SPI_3WIRE should set the flag.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Tue, 30 Jun 2009 18:41:26 +0000 (11:41 -0700)]
spi: new spi->mode bits
Add two new spi_device.mode bits to accomodate more protocol options, and
pass them through to usermode drivers:
* SPI_NO_CS ... a second 3-wire variant, where the chipselect
line is removed instead of a data line; transfers are still
full duplex.
This obviously has STRONG protocol implications since the
chipselect transitions can't be used to synchronize state
transitions with the SPI master.
* SPI_READY ... defines open drain signal that's pulled low
to pause the clock. This defines a 5-wire variant (normal
4-wire SPI plus READY) and two 4-wire variants (READY plus
each of the 3-wire flavors).
Such hardware flow control can be a big win. There are ADC
converters and flash chips that expose READY signals, but not
many host controllers support it today.
The spi_bitbang code should be changed to use SPI_NO_CS instead of its
current nonportable hack. That's a mode most hardware can easily support
(unlike SPI_READY).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: "Paulraj, Sandeep" <s-paulraj@ti.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Tue, 30 Jun 2009 18:41:25 +0000 (11:41 -0700)]
dmapools: protect page_list walk in show_pools()
show_pools() walks the page_list of a pool w/o protection against the list
modifications in alloc/free. Take pool->lock to avoid stomping into
nirvana.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Donlan [Tue, 30 Jun 2009 18:41:24 +0000 (11:41 -0700)]
ext2: return -EIO not -ESTALE on directory traversal through deleted inode
ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.
The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.
This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
from ext2_iget(), as ext2 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.