perf: Enhance perf to allow for guest statistic collection from host
Below patch introduces perf_guest_info_callbacks and related
register/unregister functions. Add more PERF_RECORD_MISC_XXX bits
meaning guest kernel and guest user space.
Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
perf: Make the trace events sample period default to 1
Trace events are mostly used for tracing and then require not to
be lost when possible. As opposite to hardware events that really
require to trigger after a given sample period, trace events mostly
need to trigger everytime.
It is a frustrating experience to trace with perf and realize we
lost a lot of events because we forgot the "-c 1" option.
Then default sample_period to 1 for trace events but let the user
override it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
perf: Always record tracepoints raw samples from perf record
Trace events are mostly used for tracing rather than simple
counting. Don't bother anymore with adding -R when using them,
just record raw samples of trace events every time.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 14 Apr 2010 21:58:03 +0000 (23:58 +0200)]
perf: Fix dynamic field detection
Checking if a tracing field is an array with a dynamic length
requires to check the field type and seek the "__data_loc"
string that prepends the actual type, as can be found in a trace
event format file:
hlist helpers need to be available for all software events, not
only trace events.
Pull them out outside the ifdef CONFIG_EVENT_TRACING section.
Fixes:
kernel/perf_event.c:4573: error: implicit declaration of function 'swevent_hlist_put'
kernel/perf_event.c:4614: error: implicit declaration of function 'swevent_hlist_get'
kernel/perf_event.c:5534: error: implicit declaration of function 'swevent_hlist_release
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1271281338-23491-1-git-send-regression-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf probe: Support DW_OP_plus_uconst in DW_AT_data_member_location
DW_OP_plus_uconst can be used for DW_AT_data_member_location.
This patch adds DW_OP_plus_uconst support when getting
structure member offset.
Commiter note:
Fixed up the size_t format specifier in one case:
cc1: warnings being treated as errors
util/probe-finder.c: In function ‘die_get_data_member_location’:
util/probe-finder.c:270: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘size_t’
make: *** [/home/acme/git/build/perf/util/probe-finder.o] Error 1
LKML-Reference: <20100414223958.14630.5230.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Line range should reject the range if the number of lines is 0
(e.g. "sched.c:1024+0"), and it should show the lines include
the end of line number (e.g. "sched.c:1024-2048" should show
2048th line).
LKML-Reference: <20100414223950.14630.42263.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Fix a bug that --line range can be overflow
Since line_finder.lno_s/e are signed int but line_range.start/end
are unsigned int, it is possible to be overflow when converting
line_range->start/end to line_finder->lno_s/e.
This changes line_range.start/end and line_list.line to signed int
and adds overflow checks when setting line_finder.lno_s/e.
LKML-Reference: <20100414223942.14630.72730.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Fix mis-estimation for shortening filename
Fix mis-estimation size for making a short filename.
Since the buffer size is 32 bytes and there are '@' prefix and
'\0' termination, maximum shorten filename length should be
30. This means, before searching '/', it should be 31 bytes.
LKML-Reference: <20100414223935.14630.11954.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Remove xstrdup()/xstrndup() from util/probe-{event, finder}.c
Remove all xstr*dup() calls from util/probe-{event,finder}.c since
it may cause 'sudden death' in utility functions and it makes
reusing it from other code difficult.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171756.3790.89607.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Remove xzalloc() from util/probe-{event, finder}.c
Remove all xzalloc() calls from util/probe-{event,finder}.c since
it may cause 'sudden death' in utility functions and it makes
reusing it from other code difficult.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171749.3790.33303.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove die() and DIE_IF() code from util/probe-event.c since
these 'sudden death' in utility functions make reusing it from
other code (especially tui/gui) difficult.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171742.3790.33650.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove die() and DIE_IF() code from util/probe-finder.c since
these 'sudden death' in utility functions make reusing it from
other code (especially tui/gui) difficult.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171735.3790.88853.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf probe: Support DW_OP_call_frame_cfa in debuginfo
When building kernel without CONFIG_FRAME_POINTER, gcc uses
CFA (canonical frame address) for frame base. With this patch,
perf probe just gets CFI (call frame information) from debuginfo
and search corresponding CFA from the CFI. IOW, this allows
perf probe works correctly on the kernel without CONFIG_FRAME_POINTER.
<Before>
./perf probe -fn sched_slice:12 lw.weight
Fatal: DW_OP 156 is not supported.
(^^^ DW_OP_call_frame_cfa)
<After>
./perf probe -fn sched_slice:12 lw.weight
Add new event:
probe:sched_slice (on sched_slice:12 with weight=lw.weight)
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171728.3790.98217.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add basic type casting for arguments to perf probe. This allows
users to specify the actual type of arguments. Of course, if
user sets invalid types, kprobe-tracer rejects that.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171722.3790.50372.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Query the basic type information (byte-size and signed-flag) from
debuginfo and pass that to kprobe-tracer. This is especially useful
for tracing the members of data structure, because each member has
different byte-size on the memory.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171715.3790.23730.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
tracing/kprobes: Support basic types on dynamic events
Support basic types of integer (u8, u16, u32, u64, s8, s16, s32, s64) in
kprobe tracer. With this patch, users can specify above basic types on
each arguments after ':'. If omitted, the argument type is set as
unsigned long (u32 or u64, arch-dependent).
e.g.
echo 'p account_system_time+0 hardirq_offset=%si:s32' > kprobe_events
adds a probe recording hardirq_offset in signed-32bits value on the
entry of account_system_time.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100412171708.3790.18599.stgit@localhost6.localdomain6> Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
struct sort_entry has a callback named snprintf that turns an
entry into a string result.
But there are glibc versions that implement snprintf through a
macro. The following expression is then going to get the snprintf
call preprocessed:
ent->snprintf(...)
to finally end up in a build error:
util/hist.c: Dans la fonction «hist_entry__snprintf» :
util/hist.c:539: erreur: «struct sort_entry» has no member named «__builtin___snprintf_chk»
To fix this, prepend struct sort_entry callbacks with an "se_"
prefix.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf: Make clock software events consistent with general exclusion rules
The cpu/task clock events implement their own version of exclusion
on top of exclude_user and exclude_kernel.
The result is that when the event triggered in the kernel but we
have exclude_kernel set, we try to rewind using task_pt_regs.
There are two side effects of this:
- we call task_pt_regs even on kernel threads, which doesn't give
us the desired result.
- if the event occured in the kernel, we shouldn't rewind to the
user context. We want to actually ignore the event.
get_irq_regs() will always give us the right interrupted context, so
use its result and submit it to perf_exclude_context() that knows
when an event must be ignored.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
Each time a software event triggers, we need to walk through
the entire list of events from the current cpu and task contexts
to retrieve a running perf event that matches.
We also need to check a matching perf event is actually counting.
This walk is wasteful and makes the event fast path scaling
down with a growing number of events running on the same
contexts.
To solve this, we store the running perf events in a hashlist to
get an immediate access to them against their type:event_id when
they trigger.
v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic
maths along the way)
- Only allocate hlist for online cpus, but keep track of the
refcount on offline possible cpus too, so that we allocate it
if needed when it becomes online.
- Drop the kref use as it's not adapted to our tricks anymore.
v3: - Fix bad refcount check (address instead of value). Thanks to
Eric Dumazet who spotted this.
- While exiting cpu, move the hlist release out of the IPI path
to lock the hlist mutex sanely.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu>
In terms of usability, it's not that bad, but it does require
the user to type and remember more than necessary.
This patch allows the user to accomplish the same thing without
specifying the separate record/report steps or the pipe. So the
same command as above can be accomplished more simply as:
$ perf trace rwtop 5
Notice that the '-i -' and '-o -' aren't required in this case -
they're added internally, and that any extra arguments are
passed along to the report script (but not to the record
script).
The overall effect is that any of the scripts listed in 'perf
trace -l' can now be used directly in live mode, with the
expected arguments, by simply specifying the script and args to
'perf trace'.
Tom Zanussi [Fri, 2 Apr 2010 04:59:24 +0000 (23:59 -0500)]
perf trace/scripting: Enable scripting shell scripts for live mode
It should be possible to run any perf trace script in 'live
mode'. This requires being able to pass in e.g. '-i -' or other
args, which the current shell scripts aren't equipped to handle.
In a few cases, there are required or optional args that also
need special handling. This patch makes changes the current set
of shell scripts as necessary.
Tom Zanussi [Fri, 2 Apr 2010 04:59:23 +0000 (23:59 -0500)]
perf trace/scripting: Add rwtop and sctop scripts
A couple of scripts, one in Python and the other in Perl, that
demonstrate 'live mode' tracing. For each, the output of the
perf event stream is fed continuously to the script, which
continuously aggregates the data and reports the current results
every 3 seconds, or at the optionally specified interval. After
the current results are displayed, the aggregations are cleared
and the cycle begins anew.
To run the scripts, simply pipe the output of the 'perf trace
record' step as input to the corresponding 'perf trace report'
step, using '-' as the filename to -o and -i:
Tom Zanussi [Fri, 2 Apr 2010 04:59:22 +0000 (23:59 -0500)]
perf: Convert perf header build_ids into build_id events
Bypasses the build_id perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Tom Zanussi [Fri, 2 Apr 2010 04:59:21 +0000 (23:59 -0500)]
perf: Convert perf tracing data into a tracing_data event
Bypasses the tracing_data perf header code and replaces it with
a synthesized event and processing function that accomplishes
the same thing, used when reading/writing perf data to/from a
pipe.
The tracing data is pretty large, and this patch doesn't attempt
to break it down into component events. The tracing_data event
itself doesn't actually contain the tracing data, rather it
arranges for the event processing code to skip over it after
it's read, using the skip return value added to the event
processing loop in a previous patch.
Tom Zanussi [Fri, 2 Apr 2010 04:59:20 +0000 (23:59 -0500)]
perf: Convert perf event types into event type events
Bypasses the event type perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Tom Zanussi [Fri, 2 Apr 2010 04:59:19 +0000 (23:59 -0500)]
perf: Convert perf header attrs into attr events
Bypasses the attr perf header code and replaces it with a
synthesized event and processing function that accomplishes the
same thing, used when reading/writing perf data to/from a pipe.
Making the attrs into events allows them to be streamed over a
pipe along with the rest of the header data (in later patches).
It also paves the way to allowing events to be added and removed
from perf sessions dynamically.
Tom Zanussi [Fri, 2 Apr 2010 04:59:18 +0000 (23:59 -0500)]
perf trace: Introduce special handling for pipe input
Adds special treatment for stdin - if the user specifies '-i -'
to perf trace, the intent is that the event stream be read from
stdin rather than from a disk file.
The actual handling of the '-' filename is done by the session;
this just adds a signal handler to stop reporting, and turns off
interference by the pager.
Tom Zanussi [Fri, 2 Apr 2010 04:59:17 +0000 (23:59 -0500)]
perf report: Introduce special handling for pipe input
Adds special treatment for stdin - if the user specifies '-i -'
to perf report, the intent is that the event stream be written
to stdin rather than from a disk file.
The actual handling of the '-' filename is done by the session;
this just adds a signal handler to stop reporting, and turns off
interference by the pager.
Tom Zanussi [Fri, 2 Apr 2010 04:59:16 +0000 (23:59 -0500)]
perf record: Introduce special handling for pipe output
Adds special treatment for stdout - if the user specifies '-o -'
to perf record, the intent is that the event stream be written
to stdout rather than to a disk file.
Also, redirect stdout of forked child to stderr - in pipe mode,
stdout of the forked child interferes with the stdout perf
stream, so redirect it to stderr where it can still be seen but
won't be mixed in with the perf output.
Tom Zanussi [Fri, 2 Apr 2010 04:59:15 +0000 (23:59 -0500)]
perf: Add pipe-specific header read/write and event processing code
This patch makes several changes to allow the perf event stream
to be sent and received over a pipe:
- adds pipe-specific versions of the header read/write code
- adds pipe-specific version of the event processing code
- adds a range of event types to be used for header or other
pseudo events, above the range used by the kernel
- checks the return value of event handlers, which they can use
to skip over large events during event processing rather than actually
reading them into event objects.
- unifies the multiple do_read() functions and updates its
users.
Note that none of these changes affect the existing perf data
file format or processing - this code only comes into play if
perf output is sent to stdout (or is read from stdin).
Ian Munsie [Tue, 13 Apr 2010 08:37:33 +0000 (18:37 +1000)]
perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR()
Parsing an option from the command line with OPT_BOOLEAN on a
bool data type would not work on a big-endian machine due to the
manner in which the boolean was being cast into an int and
incremented. For example, running 'perf probe --list' on a
PowerPC machine would fail to properly set the list_events bool
and would therefore print out the usage information and
terminate.
This patch makes OPT_BOOLEAN work as expected with a bool
datatype. For cases where the original OPT_BOOLEAN was
intentionally being used to increment an int each time it was
passed in on the command line, this patch introduces OPT_INCR
with the old behaviour of OPT_BOOLEAN (the verbose variable is
currently the only such example of this).
I have reviewed every use of OPT_BOOLEAN to verify that a true
C99 bool was passed. Where integers were used, I verified that
they were only being used for boolean logic and changed them to
bools to ensure that they would not be mistakenly used as ints.
The major exception was the verbose variable which now uses
OPT_INCR instead of OPT_BOOLEAN.
Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf trace: Ignore "overwrite" field if present in /events/header_page
That is not used in perf where we have the LOST events.
Without this patch we get:
[root@doppio ~]# perf lock report | head -3
Warning: Error: expected 'data' but read 'overwrite'
So, to make the same perf command work with kernels with and without
this field, introduce variants for the parsing routines to not warn the
user in such case.
Discussed-with: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Randy Dunlap [Wed, 31 Mar 2010 18:31:00 +0000 (11:31 -0700)]
perf: cleanup some Documentation
Correct typos in perf bench & perf sched help text.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20100331113100.cc898487.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Randy Dunlap [Wed, 31 Mar 2010 18:30:56 +0000 (11:30 -0700)]
perf bench: fix spello
Fix spello in user message.
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Paul Mackerra <paulus@samba.org>s
LKML-Reference: <20100331113056.2c7df509.randy.dunlap@oracle.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Using 'pahole --packable' I found some structs that could be reorganized
to eliminate alignment holes, in some cases getting them to be cacheline
multiples.
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
hvc_console: Fix race between hvc_close and hvc_remove
virtio: disable multiport console support.
virtio: console makes incorrect assumption about virtio API
virtio: console: Fix early_put_chars usage
MAINTAINERS: Put the virtio-console entry in correct alphabetical order
Currently early_put_chars is not used by virtio_console because it can
only be used once a port has been found, at which point it's too late
because it is no longer needed. This patch should fix it.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Kevin Hilman [Wed, 7 Apr 2010 18:52:46 +0000 (11:52 -0700)]
rwsem generic spinlock: use IRQ save/restore spinlocks
rwsems can be used with IRQs disabled, particularily in early boot
before IRQs are enabled. Currently the spin_unlock_irq() usage in the
slow-patch will unconditionally enable interrupts and cause problems
since interrupts are not yet initialized or enabled.
This patch uses save/restore versions of IRQ spinlocks in the slowpath
to ensure interrupts are not unintentionally disabled.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Al Viro [Wed, 7 Apr 2010 23:06:07 +0000 (00:06 +0100)]
Have nfs ->d_revalidate() report errors properly
If nfs atomic open implementation ends up doing open request from
->d_revalidate() codepath and gets an error from server, return that error
to caller explicitly and don't bother with lookup_instantiate_filp() at all.
->d_revalidate() can return an error itself just fine...
See
http://bugzilla.kernel.org/show_bug.cgi?id=15674
http://marc.info/?l=linux-kernel&m=126988782722711&w=2
for original report.
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards
x86: Increase CONFIG_NODES_SHIFT max to 10
ibft, x86: Change reserve_ibft_region() to find_ibft_region()
x86, hpet: Fix bug in RTC emulation
x86, hpet: Erratum workaround for read after write of HPET comparator
bootmem, x86: Fix 32bit numa system without RAM on node 0
nobootmem, x86: Fix 32bit numa system without RAM on node 0
x86: Handle overlapping mptables
x86: Make e820_remove_range to handle all covered case
x86-32, resume: do a global tlb flush in S4 resume
arch/arm/mach-davinci/include/mach/da8xx.h:104: warning: ‘struct platform_device’
declared inside parameter list
arch/arm/mach-davinci/include/mach/da8xx.h:104: warning: its scope is only this
definition or declaration, which is probably not what you want
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: mixart: range checking proc file
ALSA: hda - Fix a wrong array range check in patch_realtek.c
ALSA: ASoC: move dma_data from snd_soc_dai to snd_soc_pcm_stream
ALSA: hda - Enable amplifiers on Acer Inspire 6530G
ASoC: Only do WM8994 bias off transition from standby
ASoC: Don't use DCS_DATAPATH_BUSY for WM hubs devices
ASoC: Don't do runtime wm_hubs DC servo updates if using offset correction
ASoC: Support second DC servo readback method for wm_hubs
ASoC: Avoid wraparound in wm_hubs DC servo correction
ALSA: echoaudio - Eliminate use after free
ALSA: i2c: cleanup: change parameter to pointer
ALSA: hda - Add MSI blacklist for Aopen MZ915-M
ASoC: OMAP: Fix capture pointer handling for OMAP1510 to work correctly with recent ALSA PCM code
ALSA: hda - Update document about MSI and interrupts
ALSA: hda: Fix 0 dB offset for Lenovo Thinkpad models using AD1981
ALSA: hda - Add missing printk argument in previous patch
ASoC: Fix passing platform_data to ac97 bus users and fix a leak
ALSA: hda - Fix ADC/MUX assignment of ALC269 codec
ALSA: hda - Fix invalid bit values passed to snd_hda_codec_amp_stereo()
ASoC: wm8994: playback => capture
David Howells [Tue, 6 Apr 2010 21:35:09 +0000 (14:35 -0700)]
fs-cache: order the debugfs stats correctly
Order the debugfs statistics correctly. The values displayed through a
seq_printf() statement should be in the same order as the names in the
format string.
In the 'Lookups' line, objects created ('crt=') and lookups timed out
('tmo=') have their values transposed.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Tue, 6 Apr 2010 21:35:09 +0000 (14:35 -0700)]
frv: fix kernel/user segment handling in NOMMU mode
In NOMMU mode, the FRV segment handling is broken because KERNEL_DS ==
USER_DS. This causes tests of the following sort:
/* don't pin down non-user-based iovecs */
if (segment_eq(get_fs(), KERNEL_DS))
return NULL;
to malfunction.
To fix this, make USER_DS the top of RAM instead of the top of the non-IO
address space, and make KERNEL_DS one more than the top of the non-IO
address space.
Also get rid of FRV's __addr_ok() as nothing uses it.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On exit paths in mxc_rtc_probe() method some resources are not freed
correctly.
This patch fixes:
* unrequested memory region containing imx RTC registers
* iounmap() isn't called on exit_free_pdata branch
* clock get rate is called for freed clock source
* clock isn't disabled on exit_put_clk branch
To simplify the fix managed device resources are used.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Zapolskiy <vzapolskiy@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Daniel Mack <daniel@caiaq.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently, memcg's FILE_MAPPED accounting has following race with
move_account (happens at rmdir()).
increment page->mapcount (rmap.c)
mem_cgroup_update_file_mapped() move_account()
lock_page_cgroup()
check page_mapped() if
page_mapped(page)>1 {
FILE_MAPPED -1 from old memcg
FILE_MAPPED +1 to old memcg
}
.....
overwrite pc->mem_cgroup
unlock_page_cgroup()
lock_page_cgroup()
FILE_MAPPED + 1 to pc->mem_cgroup
unlock_page_cgroup()
Then,
old memcg (-1 file mapped)
new memcg (+2 file mapped)
This happens because move_account see page_mapped() which is not guarded
by lock_page_cgroup(). This patch adds FILE_MAPPED flag to page_cgroup
and move account information based on it. Now, all checks are synchronous
with lock_page_cgroup().
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Balbir Singh <balbir@in.ibm.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Andrea Righi <arighi@develer.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we look into pagemap using page-types with option -p, the value of
pfn for hugepages looks wrong (see below.) This is because pte was
evaluated only once for one vma although it should be updated for each
hugepage. This patch fixes it.
One hugepage contains 1 head page and 511 tail pages in x86_64 and each
two lines represent each hugepage. Voffset and offset mean virtual
address and physical address in the page unit, respectively. The
different hugepages should not have the same offset value.
ratelimit: fix the return value when __ratelimit() fails to acquire the lock
The log of commit edaac8e3167501cda336231d00611bf59c164346 ("ratelimit:
Fix/allow use in atomic contexts"), indicates that we want to suppress the
callback when the trylock fails.
Signed-off-by: Yong Zhang <yong.zhang@windriver.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 6 Apr 2010 21:35:00 +0000 (14:35 -0700)]
vfs: rename block_fsync() to blkdev_fsync()
Requested by hch, for consistency now it is exported.
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We now call through generic_file_aio_write -> generic_write_sync ->
vfs_fsync_range. vfs_fsync_range has:
if (!fop || !fop->fsync) {
ret = -EINVAL;
goto out;
}
But drivers/char/raw.c doesn't set an fsync method.
We have two options: fix it or remove the raw driver completely. I'm
happy to do either, the fact this has been broken for so long suggests it
is rarely used.
The patch below adds an fsync method to the raw driver. My knowledge of
the block layer is pretty sketchy so this could do with a once over.
If we instead decide to remove the raw driver, this patch might still be
useful as a backport to 2.6.33 and 2.6.32.
Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since Valentin's email address @siemens.com is no longer valid, it's time
to change it to the one that actually works so that I don't have to
manually forward patches against mb862xx to him every time.
Signed-off-by: Alexander Shishkin <virtuoso@slind.org> Acked-by: Valentin Sitdikov <v.sitdikov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 6 Apr 2010 21:34:57 +0000 (14:34 -0700)]
mb862xxfb: fix acceleration module license
mb862xxfb_accel built as a separate module, but it does not have a
MODULE_LICENSE, so it taints the kernel. Add a MODULE_LICENSE to it (same
as mb862xxfb license).
Shaohua Li reported his tmpfs streaming I/O test can lead to make oom.
The test uses a 6G tmpfs in a system with 3G memory. In the tmpfs, there
are 6 copies of kernel source and the test does kbuild for each copy. His
investigation shows the test has a lot of rotated anon pages and quite few
file pages, so get_scan_ratio calculates percent[0] (i.e. scanning
percent for anon) to be zero. Actually the percent[0] shoule be a big
value, but our calculation round it to zero.
Although before commit 84b18490 ("vmscan: get_scan_ratio() cleanup") , we
have the same problem too. But the old logic can rescue percent[0]==0
case only when priority==0. It had hided the real issue. I didn't think
merely streaming io can makes percent[0]==0 && priority==0 situation. but
I was wrong.
So, definitely we have to fix such tmpfs streaming io issue. but anyway I
revert the regression commit at first.
Anton Blanchard [Tue, 6 Apr 2010 21:34:55 +0000 (14:34 -0700)]
devmem: handle class_create() failure
I hit this when we had a bug in IDR for a few days. Basically sysfs would
fail to create new inodes since it uses an IDR and therefore class_create
would fail.
While we are unlikely to see this fail we may as well handle it instead of
oopsing.
Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prevents further "key xxx not in .data" bug-reports. Although some
attributes could probably be converted to static ones, this is left for
people having hardware to test.
Found by this semantic patch:
@ init @
type T;
identifier A;
@@
T {
...
struct device_attribute A;
...
};
@ main extends init @
expression E;
statement S;
identifier err;
T *name;
@@
... when != sysfs_attr_init(&name->A.attr);
(
+ sysfs_attr_init(&name->A.attr);
if (device_create_file(E, &name->A))
S
|
+ sysfs_attr_init(&name->A.attr);
err = device_create_file(E, &name->A);
)
While reviewing, I put the initialization to apropriate places.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg KH <gregkh@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Mike Isely <isely@pobox.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Sujith Thomas <sujith.thomas@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Len Brown <len.brown@intel.com> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Tue, 6 Apr 2010 21:34:50 +0000 (14:34 -0700)]
mxser: spin_lock() => spin_lock_irq()
This should be spin_lock_irq() to match the spin_unlock_irq(). Originally
it was a lock_kernel() but we switched everything to spin_lock_irq() last
November.
[akpm@linux-foundation.org: fix the MOXA_ASPP_MON case too (per Jiri)] Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The reset of data lines when the card is removed from the cage results in
a failure.The failure is seen if the card is removed from the cage when TC
is pending after a CMD with data received CC.The reset logic leaves the
controller in a state where niether a TC is received nor DTO.
The rest code can be safely removed here since it is taken care in the IRQ
handler.
Signed-off-by: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Adrian Hunter <adrian.hunter@nokia.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Tue, 6 Apr 2010 21:34:44 +0000 (14:34 -0700)]
vesafb: use platform_driver_probe() instead of platform_driver_register()
Commit c2e13037e6794bd0d9de3f9ecabf5615f15c160b ("platform-drivers: move
probe to .devinit.text in drivers/video") introduced a huge amount of
section mismatch warnings in vesafb code. Rather than converting all of
the annotations, do the obvious and revert the __init -> __devinit change,
and use the recommended (in that patch) alternative to calling
platform_driver_register(): vesafb depends on information obtained from by
kernel at boot time, cannot be a module, and no post-boot devices can ever
show up.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Greg KH <greg@kroah.com> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Härdeman [Tue, 6 Apr 2010 21:34:43 +0000 (14:34 -0700)]
include/linux/kfifo.h: fix INIT_KFIFO()
DECLARE_KFIFO creates a union with a struct kfifo and a buffer array with
size [size + sizeof(struct kfifo)].
INIT_KFIFO then sets the buffer pointer in struct kfifo to point to the
beginning of the buffer array which means that the first call to kfifo_in
will overwrite members of the struct kfifo.
Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Stefani Seibold <stefani@seibold.net> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 6 Apr 2010 21:34:41 +0000 (14:34 -0700)]
bitops: remove temporary for_each_bit()
Migration has been completed so remove this now. There's one straggler in
linux-next's drivers/mtd/sm_ftl.c. A patch has been sent.
Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>