Steven Rostedt [Tue, 9 Jun 2009 21:29:07 +0000 (17:29 -0400)]
tracing: add protection around module events unload
When reading the trace buffer, there is a race that when a module
is unloaded it removes events that is stilled referenced in the buffers.
This patch adds the protection around the unloading of the events
from modules and the reading of the trace buffers.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 9 Jun 2009 18:04:26 +0000 (14:04 -0400)]
tracing: fix the block trace points print size
The sector field is either u64 or unsigned long depending on
the arch. This patch casts the sector to unsigned long long to
prevent the printf warnings.
[ Impact: remove compile warnings ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Li Zefan [Tue, 9 Jun 2009 05:43:05 +0000 (13:43 +0800)]
tracing/events: convert block trace points to TRACE_EVENT()
TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:
- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
...
Cons:
- no dev_t info for the output of plug, unplug_timer and unplug_io events.
no dev_t info for getrq and sleeprq events if bio == NULL.
no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
This is mainly because we can't get the deivce from a request queue.
But this may change in the future.
- A packet command is converted to a string in TP_assign, not TP_print.
While blktrace do the convertion just before output.
Since pc requests should be rather rare, this is not a big issue.
- In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
has a unique format, which means we have some unused data in a trace entry.
The overhead is minimized by using __dynamic_array() instead of __array().
I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
Peter Zijlstra [Mon, 8 Jun 2009 16:18:39 +0000 (18:18 +0200)]
ring-buffer: pass in lockdep class key for reader_lock
On Sun, 7 Jun 2009, Ingo Molnar wrote:
> Testing tracer sched_switch: <6>Starting ring buffer hammer
> PASSED
> Testing tracer sysprof: PASSED
> Testing tracer function: PASSED
> Testing tracer irqsoff:
> =============================================
> PASSED
> Testing tracer preemptoff: PASSED
> Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
> PASSED
> Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
> ---------------------------------------------
> rb_consumer/431 is trying to acquire lock:
> (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
>
> but task is already holding lock:
> (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
>
> other info that might help us debug this:
> 1 lock held by rb_consumer/431:
> #0: (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
The ring buffer is a generic structure, and can be used outside of
ftrace. If ftrace traces within the use of the ring buffer, it can produce
false positives with lockdep.
This patch passes in a static lock key into the allocation of the ring
buffer, so that different ring buffers will have their own lock class.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1244477919.13761.9042.camel@twins>
[ store key in ring buffer descriptor ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
This looks a bit awkward, and if we have both stack and user stack traces
running, it would be nice to have a title to tell them apart, although
it is easy to tell by the output.
This patch adds an annotation to the start of the stack traces:
tracing: fix multiple use of __print_flags and __print_symbolic
Here is an updated patch to include the extra call to
trace_seq_init() as requested. This is vs. the latest
-tip tree and fixes the use of multiple __print_flags
and __print_symbolic in a single tracer. Also tested
to ensure its working now:
mount.gfs2-2534 [000] 235.850587: gfs2_glock_queue: 8.7 glock 1:2 dequeue PR
mount.gfs2-2534 [000] 235.850591: gfs2_demote_rq: 8.7 glock 1:0 demote EX to NL flags:DI
mount.gfs2-2534 [000] 235.850591: gfs2_glock_queue: 8.7 glock 1:0 dequeue EX
glock_workqueue-2529 [000] 235.850666: gfs2_glock_state_change: 8.7 glock 1:0 state EX => NL tgt:NL dmt:NL flags:lDpI
glock_workqueue-2529 [000] 235.850672: gfs2_glock_put: 8.7 glock 1:0 state NL => IV flags:I
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1244037123.29604.603.camel@localhost.localdomain> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
walimis [Wed, 3 Jun 2009 08:01:29 +0000 (16:01 +0800)]
tracing/events: fix output format of kernel stack
According to "events/ftrace/kernel_stack/format", output format of
kernel stack should use "=>" instead of "<=".
The second problem is that we shouldn't skip the first entry in the stack,
although it seems to be duplicated when used in the "function" tracer,
but events also use it. If we skip the first one, we will drop the topmost
entry of the stack.
The last problem is that if the last entry is ULONG_MAX(0xffffffff), we should
drop it, otherwise it will print a NULL name line.
walimis [Wed, 3 Jun 2009 08:01:28 +0000 (16:01 +0800)]
tracing/trace_stack: fix the number of entries in the header
The last entry in the stack_dump_trace is ULONG_MAX, which is not
a valid entry, but max_stack_trace.nr_entries has accounted for it.
So when printing the header, we should decrease it by one.
Before fix, print as following, for example:
Steven Rostedt [Wed, 3 Jun 2009 13:30:10 +0000 (09:30 -0400)]
ring-buffer: discard timestamps that are at the start of the buffer
Every buffer page in the ring buffer includes its own time stamp.
When an event is recorded to the ring buffer with a delta time greater
than what can be held in the event header, a time stamp event is created.
If the the create timestamp falls over to the next buffer page, it is
redundant because the buffer page holds a full time stamp. This patch
will try to discard the time stamp when it falls to the start of the
next page.
This change also fixes a issues with disarding events. If most events are
discarded, timestamps will start to creep into the ring buffer. If we
do not discard the timestamps then they can fill up the ring buffer over
time and waste space.
This change will keep time stamps from filling up over another page. If
something is recorded in the buffer page, and the rest is filtered, then
the time stamps can only fill up to the end of the page.
[ Impact: prevent time stamps from filling ring buffer ]
Reported-by: Tim Bird <tim.bird@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 3 Jun 2009 03:00:53 +0000 (23:00 -0400)]
ring-buffer: try to discard unneeded timestamps
There are times that a race may happen that we add a timestamp in a
nested write. This timestamp would just contain a zero delta and serves
no purpose.
Now that we have a way to discard events, this patch will try to discard
the timestamp instead of just wasting the space in the ring buffer.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Tim Bird [Wed, 3 Jun 2009 00:06:54 +0000 (17:06 -0700)]
ring-buffer: fix bug in ring_buffer_discard_commit
There's a bug in ring_buffer_discard_commit. The wrong
pointer is being compared in order to check if the event
can be freed from the buffer rather than discarded
(i.e. marked as PAD).
I noticed this when I was working on duration filtering.
The bug is not deadly - it just results in lots of wasted
space in the buffer. All filtered events are left in
the buffer and marked as discarded, rather than being
removed from the buffer to make space for other events.
Unfortunately, when I fixed this bug, I got errors doing a
filtered function trace. Multiple TIME_EXTEND
events pile up in the buffer, and trigger the
following loop overage warning in rb_iter_peek():
again:
...
if (RB_WARN_ON(cpu_buffer, ++nr_loops > 10))
return NULL;
I'm not sure what the best way is to fix this. I don't
know if I should extend the loop threshhold, or if I should
make the test more complex (ignore TIME_EXTEND
events), or just get rid of this loop check completely.
Note that if I implement a workaround for this, then I
see another problem from rb_advance_iter(). I haven't
tracked that one down yet.
In general, it seems like the case of removing filtered
events has not been working properly, and so some assumptions
about buffer invariant conditions need to be revisited.
Here's the patch for the simple fix:
Compare correct pointer for checking if an event can be
freed rather than left as discarded in the buffer.
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
LKML-Reference: <4A25BE9E.5090909@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 2 Jun 2009 01:51:28 +0000 (21:51 -0400)]
ftrace: do not profile functions when disabled
A race was found that if one were to enable and disable the function
profiler repeatedly, then the system can panic. This was because a profiled
function may be preempted just before disabling interrupts. While
the profiler is disabled and then reenabled, the preempted function
could start again, and access the hash as it is being initialized.
This just adds a check in the irq disabled part to check if the profiler
is enabled, and if it is not then it will just exit.
When the system is disabled, the profile_enabled variable is cleared
before calling the unregistering of the function profiler. This
unregistering calls stop machine which also acts as a synchronize schedule.
[ Impact: fix panic in enabling/disabling function profiler ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 1 Jun 2009 19:16:05 +0000 (15:16 -0400)]
tracing: make trace pipe recognize latency format flag
The trace_pipe did not recognize the latency format flag and would produce
different output than the trace file. The problem was partly due that
the trace flags in the iterator was not set as well as the trace_pipe
zeros out part of the iterator (including the flags) to be able to use
the same routines as the trace file. trace_flags of the iterator should
not cause any problems when not zeroed out by for trace_pipe.
Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 1 Jun 2009 16:20:40 +0000 (12:20 -0400)]
tracing: remove redundant SOFTIRQ from softirq event traces
After converting the softirq tracer to use te flags options, this
caused a regression with the name. Since the flag was used directly
it was printed out (i.e. HRTIMER_SOFTIRQ).
This patch only shows the softirq name without the SOFTIRQ part.
[ Impact: fix regression of output from softirq events ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
tracing: add exports to use __print_symbolic and __print_flags from a module
A patch to allow the use of __print_symbolic and __print_flags
from a module. This allows the current GFS2 tracing patch to
build.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
LKML-Reference: <1243868015.29604.542.camel@localhost.localdomain> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Li Zefan [Mon, 1 Jun 2009 07:35:46 +0000 (15:35 +0800)]
tracing/events: introduce __dynamic_array()
__string() is limited:
- it's a char array, but we may want to define array with other types
- a source string should be available, but we may just know the string size
We introduce __dynamic_array() to break those limitations, and __string()
becomes a wrapper of it. As a side effect, now __get_str() can be used
in TP_fast_assign but not only TP_print.
Take XFS for example, we have the string length in the dirent, but the
string itself is not NULL-terminated, so __dynamic_array() can be used:
Steven Rostedt [Thu, 28 May 2009 20:31:21 +0000 (16:31 -0400)]
tracing: combine the default tracers into one config
Both event tracer and sched switch plugin are selected by default
by all generic tracers. But if no generic tracer is enabled, their options
appear. But ether one of them will select the other, thus it only
makes sense to have the default tracers be selected by one option.
[ Impact: clean up kconfig menu ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 28 May 2009 19:50:13 +0000 (15:50 -0400)]
tracing: fix config options to not show when automatically selected
There are two options that are selected by all tracers, but we want
to have those options available when no tracer is selected. These are
The event tracer and sched switch tracer.
The are enabled by all tracers, but if a tracer is not selected we want
the options to appear. All tracers including them select TRACING.
Thus what we would like to do is:
config EVENT_TRACER
bool "prompt"
depends on TRACING
select TRACING
But that gives us a bug in the kbuild system since we just created a
circular dependency. We only want the prompt to show when TRACING is off.
This patch adds GENERIC_TRACER that all tracers will select instead of
TRACING. The two options (sched switch and event tracer) will select
TRACING directly and depend on !GENERIC_TRACER. This solves the cicular
dependency.
[ Impact: hide options that are selected by default ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Where function-list is a comma separated list of functions to filter.
The ftrace_notrace will make the functions listed not be included
in the function tracing, and ftrace_filter will only trace the functions
listed.
These two act the same as the debugfs/tracing/set_ftrace_notrace and
debugfs/tracing/set_ftrace_filter respectively.
The simple glob expressions that are allowed by the filter files can also
be used by the command line interface.
ftrace_notrace=rcu*,*lock,*spin*
Will not trace any function that starts with rcu, ends with lock, or has
the word spin in it.
Note, if the self tests are enabled, they may interfere with the filtering
set by the command lines.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
tracing/stat: remove unappropriate safe walk on list
register_stat_tracer() uses list_for_each_entry_safe
to check whether a tracer is already present in the list.
But we don't delete anything from the list here, so
we don't need the safe version
tracing/stat: replace linked list by an rbtree for sorting
When the stat tracing framework prepares the entries from a tracer
to output them to the user, it starts by computing a linear sort
through a linked list to give the entries ordered by relevance
to the user.
This is quite ugly and causes a small latency when we begin to
read the file.
This patch changes that by turning the linked list into a red-black
tree. Athough the whole iteration using the start and next tracer
callbacks while opening the file remain the same, it is now much
more fast and scalable.
The rbtree guarantees O(log(n)) insertions whereas a linked
list with linear sorting brought us a O(n) despair. Now the
(visible) latency has disapeared.
[ Impact: kill the latency while starting to read a stat tracer file ]
tracing/stat: replace trace_stat_session by stat_session
The "trace" prefix in struct trace_stat_session type is annoying while
reading the trace_stat.c file. It makes the lines longer, and
is not that much useful to explain the sense of this type.
trace_workqueue: remove blank line between each cpu
The blankline between each cpu's workqueue stat is not necessary, because
the cpu number is enough to part them by eye.
Old style also caused a blankline below headline, and made code complex
by using lock, disableirq and get cpu var.
cpu_workqueue_stats->first_entry is useless because we can retrieve the
header of a cpu workqueue using:
if (&cpu_workqueue_stats->list == workqueue_cpu_stat(cpu)->list.next)
[ Impact: cleanup ]
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
ftrace, workqueuetrace: make workqueue tracepoints use TRACE_EVENT macro
v3: zhaolei@cn.fujitsu.com: Change TRACE_EVENT definition to new format
introduced by Steven Rostedt: consolidate trace and trace_event headers
v2: kosaki@jp.fujitsu.com: print the function names instead of addr, and zap
the work addr
v1: zhaolei@cn.fujitsu.com: Make workqueue tracepoints use TRACE_EVENT macro
TRACE_EVENT is a more generic way to define tracepoints.
Doing so adds these new capabilities to the tracepoints:
- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
Then, this patch converts DEFINE_TRACE to TRACE_EVENT in workqueue related
tracepoints.
[ Impact: expand workqueue tracer to events tracing ]
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Zhaolei [Wed, 27 May 2009 13:36:02 +0000 (21:36 +0800)]
ftrace: don't convert function's local variable name in macro
"call" is an argument of macro, but it is also used as a local
variable name of function in macro.
We should keep this local variable name distinct from any
CPP macro parameter name if both are in the same macro scope,
although it hasn't caused any problem yet.
[ Impact: robustify macro ]
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Heiko Carstens [Tue, 26 May 2009 15:28:02 +0000 (17:28 +0200)]
trace: disable preemption before taking raw spinlocks
s390 code uses smp_processor_id() in __raw_spin_lock() code which
reveals that a (raw) spinlock is taken without preemption disabled.
This can potentially deadlock.
To fix this explicitly disable and enable preemption.
Steven Rostedt [Wed, 20 May 2009 23:56:19 +0000 (19:56 -0400)]
tracing: convert irq events to use __print_symbolic
The recording of the names at trace time is inefficient. This patch
implements the softirq event recording to only record the vector
and then use the __print_symbolic interface to print out the names.
[ Impact: faster recording of softirq events ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Steven Rostedt [Wed, 20 May 2009 23:21:47 +0000 (19:21 -0400)]
tracing: add __print_symbolic to trace events
This patch adds __print_symbolic which is similar to __print_flags but
works for an enumeration type instead. That is, there is only a one to one
mapping between the values and the symbols. When a match is made, then
it is printed, otherwise the hex value is outputed.
[ Impact: add interface for showing symbol names in events ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Steven Rostedt [Fri, 15 May 2009 20:16:30 +0000 (16:16 -0400)]
tracing: add flag output for kmem events
This patch changes the output for gfp_flags from being a simple hex value
to the actual names.
gfp_flags=GFP_ATOMIC instead of gfp_flags=00000020
And even
gfp_flags=GFP_KERNEL instead of gfp_flags=000000d0
(Thanks to Frederic Weisbecker for pointing out that the first version
had a bad order of GFP masks)
[ Impact: more human readable output from tracer ]
Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Steven Rostedt [Fri, 15 May 2009 14:51:13 +0000 (10:51 -0400)]
tracing: add previous task state info to sched switch event
It is useful to see the state of a task that is being switched out.
This patch adds the output of the state of the previous task in
the context switch event.
[ Impact: see state of switched out task in context switch ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Steven Rostedt [Tue, 26 May 2009 18:25:22 +0000 (20:25 +0200)]
tracing: add __print_flags for events
Developers have been asking for the ability in the ftrace event tracer
to display names of bits in a flags variable.
Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
Some examples where this would be useful are the state flags in a context
switch, kmalloc flags, and even permision flags in accessing files.
[
v2 changes include:
Frederic Weisbecker's idea of using a mask instead of bits,
thus we can output GFP_KERNEL instead of GPF_WAIT|GFP_IO|GFP_FS.
Li Zefan's idea of allowing the caller of __print_flags to add their
own delimiter (or no delimiter) where we can get for file permissions
rwx instead of r|w|x.
]
[
v3 changes:
Christoph Hellwig's idea of using an array instead of va_args.
]
[ Impact: better displaying of flags in trace output ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Zhaolei [Mon, 25 May 2009 10:11:59 +0000 (18:11 +0800)]
ftrace: Add task_comm support for trace_event
If we enable a trace event alone without any tracer running (such as
function tracer, sched switch tracer, etc...) it can't output enough
task command information.
We need to use the tracing_{start/stop}_cmdline_record() helpers
which are designed to keep track of cmdlines for any tasks that
were scheduled during the tracing.
Changelog:
v1->v2: Update Kconfig to select CONTEXT_SWITCH_TRACER in
ENABLE_EVENT_TRACING
v2->v3: v2 can solve problem that was caused by config EVENT_TRACING
alone, but when CONFIG_FTRACE is off and CONFIG_TRACING is
selected by other config, compile fail happened again.
This version solves it.
[ Impact: fix incomplete output of event tracing ]
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A14FDFE.2080402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Lai Jiangshan [Mon, 18 May 2009 11:35:34 +0000 (19:35 +0800)]
tracing: add trace_event_read_lock()
I found that there is nothing to protect event_hash in
ftrace_find_event(). Rcu protects the event hashlist
but not the event itself while we use it after its extraction
through ftrace_find_event().
This lack of a proper locking in this spot opens a race
window between any event dereferencing and module removal.
If the event retrieved belongs to a module and this module
is concurrently removed, we may end up dereferencing a data
from a freed module.
RCU could solve this, but it would add latency to the kernel and
forbid tracers output callbacks to call any sleepable code.
So this fix converts 'trace_event_mutex' to a read/write semaphore,
and adds trace_event_read_lock() to protect ftrace_find_event().
[ Impact: fix possible freed memory dereference in ftrace ]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A114806.7090302@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Stefan Raspl [Tue, 19 May 2009 07:59:08 +0000 (09:59 +0200)]
blktrace: remove debugfs entries on bad path
debugfs directory entries for devices are not removed on some
of the failure pathes in do_blk_trace_setup().
One way to reproduce is to start blktrace on multiple devices
with insufficient Vmalloc space: Devices will fail with
a message like this:
If so, the respective entries in debugfs
(e.g. /sys/kernel/debug/block/sdu) will remain and subsequent
attempts to start blktrace on the respective devices will not
succeed due to existing directories.
[ Impact: fix /debug/tracing file cleanup corner case ]
Signed-off-by: Stefan Raspl <stefan.raspl@linux.vnet.ibm.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: schwidefsky@de.ibm.com Cc: heiko.carstens@de.ibm.com
LKML-Reference: <4A1266CC.5040801@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Li Zefan [Tue, 19 May 2009 06:43:15 +0000 (14:43 +0800)]
tracing/events: Documentation updates
- fix some typos
- document the difference between '>' and '>>'
- document the 'enable' toggle
- remove section "Defining an event-enabled tracepoint", since it's
out-dated and sample/trace_events/ already serves this purpose.
Linus Torvalds [Fri, 15 May 2009 23:47:55 +0000 (16:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI MSI: Fix MSI-X with NIU cards
PCI: Fix pci-e port driver slot_reset bad default return value
Linus Torvalds [Fri, 15 May 2009 20:22:11 +0000 (13:22 -0700)]
Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
drm/i915: Add new GET_PIPE_FROM_CRTC_ID ioctl.
drm/i915: Set HDMI hot plug interrupt enable for only the output in question.
drm/i915: Include 965GME pci ID in IS_I965GM(dev) to match UMS.
drm/i915: Use the GM45 VGA hotplug workaround on G45 as well.
drm/i915: ignore LVDS on intel graphics systems that lie about having it
drm/i915: sanity check IER at wait_request time
drm/i915: workaround IGD i2c bus issue in kernel side (v2)
drm/i915: Don't allow binding objects into the last page of the aperture.
drm/i915: save/restore fence registers across suspend/resume
drm/i915: x86 always has writeq. Add I915_READ64 for symmetry.
Linus Torvalds [Fri, 15 May 2009 19:04:37 +0000 (12:04 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: Media rotation rate and form factor heuristics
libata: Report disk alignment and physical block size
sata_fsl: Fix the command description of FSL SATA controller
sata_fsl: Fix compile warnings
[libata] sata_sx4: fixup interrupt handling
[libata] sata_sx4: convert to new exception handling methods
* git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6:
iwlwifi: fix device id registration for 6000 series 2x2 devices
ath5k: update channel in sw state after stopping RX and TX
rtl8187: use DMA-aware buffers with usb_control_msg
mac80211: avoid NULL ptr deref when finding max_rates in PID and minstrel
airo: airo_get_encode{,ext} potential buffer overflow
Pulled directly by Linus because Davem is off playing shuffle-board at
some Alaskan cruise, and the NULL ptr deref issue hits people and should
get merged sooner rather than later.
David - make us proud on the shuffle-board tournament!
libata: Media rotation rate and form factor heuristics
This patch provides new heuristics for parsing both the form factor and
media rotation rate ATA IDENFITY words.
The reported ATA version must be 7 or greater and the device must return
values defined as valid in the standard. Only then are the
characteristics reported to SCSI via the VPD B1 page.
This seems like a reasonable compromise to me considering that we have
been shipping several kernel releases that key off the rotation rate bit
without any version checking whatsoever. With no complaints so far.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
libata: Report disk alignment and physical block size
For disks with 4KB sectors, report the correct block size and alignment
when filling out the READ CAPACITY(16) response.
This patch is based upon code from Matthew Wilcox' 4KB ATA tree. I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Dave Liu [Thu, 14 May 2009 14:47:07 +0000 (09:47 -0500)]
sata_fsl: Fix the command description of FSL SATA controller
The bit 11 of command description is reserved bit in Freescale
SATA controller and needs to be set to '1'. This is needed to
make sure the last write from the controller to the buffer
descriptor is seen before an interrupt is raised.
Signed-off-by: Dave Liu <daveliu@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Kumar Gala [Thu, 14 May 2009 03:10:50 +0000 (22:10 -0500)]
sata_fsl: Fix compile warnings
We we build with dma_addr_t as a 64-bit quantity we get:
drivers/ata/sata_fsl.c: In function 'sata_fsl_fill_sg':
drivers/ata/sata_fsl.c:340: warning: format '%x' expects type 'unsigned int', but argument 4 has type 'dma_addr_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Linus Torvalds [Fri, 15 May 2009 15:07:25 +0000 (08:07 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix race in ext4_inode_info.i_cached_extent
ext4: Clear the unwritten buffer_head flag after the extent is initialized
ext4: Use a fake block number for delayed new buffer_head
ext4: Fix sub-block zeroing for writes into preallocated extents
Linus Torvalds [Fri, 15 May 2009 15:06:45 +0000 (08:06 -0700)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: gdb documentation fix
kgdb,i386: use address that SP register points to in the exception frame
sysrq, intel_fb: fix sysrq g collision
Linus Torvalds [Fri, 15 May 2009 15:05:37 +0000 (08:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Revert "mm: add /proc controls for pdflush threads"
viocd: needs to depend on BLOCK
block: fix the bio_vec array index out-of-bounds test
Linus Torvalds [Fri, 15 May 2009 15:05:02 +0000 (08:05 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: Fix PCI ROM access
powerpc/pseries: Really fix the oprofile CPU type on pseries
serial/nwpserial: Fix wrong register read address and add interrupt acknowledge.
powerpc/cell: Make ptcal more reliable
powerpc: Allow mem=x cmdline to work with 4G+
powerpc/mpic: Fix incorrect allocation of interrupt rev-map
powerpc: Fix oprofile sampling of marked events on POWER7
powerpc/iseries: Fix pci breakage due to bad dma_data initialization
powerpc: Fix mktree build error on Mac OS X host
powerpc/virtex: Fix duplicate level irq events.
powerpc/virtex: Add uImage to the default images list
powerpc/boot: add simpleImage.* to clean-files list
powerpc/8xx: Update defconfigs
powerpc/embedded6xx: Update defconfigs
powerpc/86xx: Update defconfigs
powerpc/85xx: Update defconfigs
powerpc/83xx: Update defconfigs
powerpc/fsl_soc: Remove mpc83xx_wdt_init, again
devpts_get_sb() calls memset(0) to clear mount options and calls
parse_mount_options() if user specified any mount options.
The memset(0) is bogus since the 'mode' and 'ptmxmode' options are
non-zero by default. parse_mount_options() restores options to default
anyway and can properly deal with NULL mount options.
So in devpts_get_sb() remove memset(0) and call parse_mount_options() even
for NULL mount options.
Bug reported by Eric Paris: http://lkml.org/lkml/2009/5/7/448.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Reported-by: Eric Paris <eparis@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Theodore Ts'o [Fri, 15 May 2009 13:07:28 +0000 (09:07 -0400)]
ext4: Fix race in ext4_inode_info.i_cached_extent
If two CPU's simultaneously call ext4_ext_get_blocks() at the same
time, there is nothing protecting the i_cached_extent structure from
being used and updated at the same time. This could potentially cause
the wrong location on disk to be read or written to, including
potentially causing the corruption of the block group descriptors
and/or inode table.
This bug has been in the ext4 code since almost the very beginning of
ext4's development. Fortunately once the data is stored in the page
cache cache, ext4_get_blocks() doesn't need to be called, so trying to
replicate this problem to the point where we could identify its root
cause was *extremely* difficult. Many thanks to Kevin Shanahan for
working over several months to be able to reproduce this easily so we
could finally nail down the cause of the corruption.
Jason Wessel [Thu, 12 Feb 2009 00:46:32 +0000 (18:46 -0600)]
kgdb,i386: use address that SP register points to in the exception frame
The treatment of the SP register is different on x86_64 and i386.
This is a regression fix that lived outside the mainline kernel from
2.6.27 to now. The regression was a result of the original merge
consolidation of the i386 and x86_64 archs to x86.
The incorrectly reported SP on i386 prevented stack tracebacks from
working correctly in gdb.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Jason Wessel [Thu, 14 May 2009 02:56:59 +0000 (21:56 -0500)]
sysrq, intel_fb: fix sysrq g collision
Commit 79e539453b34e35f39299a899d263b0a1f1670bd introduced a
regression where you cannot use sysrq 'g' to enter kgdb. The solution
is to move the intel fb sysrq over to V for video instead of G for
graphics. The SMP VOYAGER code to register for the sysrq-v is not
anywhere to be found in the mainline kernel, so the comments in the
code were cleaned up as well.
This patch also cleans up the sysrq definitions for kgdb to make it
generic for the kernel debugger, such that the sysrq 'g' can be used
in the future to enter a gdbstub or another kernel debugger.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Work is progressing to switch away from pdflush as the process backing
for flushing out dirty data. So it seems pointless to add more knobs
to control pdflush threads. The original author of the patch did not
have any specific use cases for adding the knobs, so we can easily
revert this before 2.6.30 to avoid having to maintain this API
forever.
David Brownell [Thu, 14 May 2009 20:01:59 +0000 (13:01 -0700)]
ASoC: DaVinci EVM board support buildfixes
This is a build fix, resyncing the DaVinci EVM ASoC board code
with the version in the DaVinci tree. That resync includes
support for the DM355 EVM, although that board isn't yet in
mainline.
(NOTE: also includes a bugfix to the platform_add_resources
call, recently sent by Chaithrika U S <chaithrika@ti.com> but
not yet merged into the DaVinci tree.)
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
David Brownell [Thu, 14 May 2009 19:47:42 +0000 (12:47 -0700)]
ASoC: DaVinci I2S updates
This resyncs the DaVinci I2S code with the version in the DaVinci
tree. The behavioral change uses updated clock interfaces which
recently merged to mainline. Two other changes include adding a
comment on the ASP/McBSP/McASP confusion, and dropping pdev->id in
order to support more boards than just the DM644x EVM.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
David Brownell [Thu, 14 May 2009 19:41:22 +0000 (12:41 -0700)]
ASoC: davinci-pcm buildfixes
This is a buildfix for the DaVinci PCM code, resyncing it with
the version in the DaVinci tree. The notable change is using
current EDMA interfaces, which recently merged to mainline.
(The older interfaces never made it into mainline.)
NOTE: open issue, the DMA should be to/from SRAM; see chip
errata for more info. The artifacts are extremely easy to
hear on DM355 hardware (not yet supported in mainline), but
don't seem as audible on DM6446 hardwaare (which does have
mainline support).
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
A couple of issues crept in since about 2.6.27 related to accessing PCI
device ROMs on various powerpc machines.
First, historically, we don't allocate the ROM resource in the resource
tree. I'm not entirely certain of why, I susepct they often contained
garbage on x86 but it's hard to tell. This causes the current generic
code to always call pci_assign_resource() when trying to access the said
ROM from sysfs, which will try to re-assign some new address regardless
of what the ROM BAR was already set to at boot time. This can be a
problem on hypervisor platforms like pSeries where we aren't supposed
to move PCI devices around (and in fact probably can't).
Second, our code that generates the PCI tree from the OF device-tree
(instead of doing config space probing) which we mostly use on pseries
at the moment, didn't set the (new) flag IORESOURCE_SIZEALIGN on any
resource. That means that any attempt at re-assigning such a resource
with pci_assign_resource() would fail due to resource_alignment()
returning 0.
This fixes this by doing these two things:
- The code that calculates resource flags based on the OF device-node
is improved to set IORESOURCE_SIZEALIGN on any valid BAR, and while at
it also set IORESOURCE_READONLY for ROMs since we were lacking that too
- We now allocate ROM resources as part of the resource tree. However
to limit the chances of nasty conflicts due to busted firmwares, we
only do it on the second pass of our two-passes allocation scheme,
so that all valid and enabled BARs get precedence.
This brings pSeries back the ability to access PCI ROMs via sysfs (and
thus initialize various video cards from X etc...).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/pseries: Really fix the oprofile CPU type on pseries
My previous pach for fixing the oprofile CPU type got somewhat mismerged
(by my fault) when it collided with another related patch. This should
finally (fingers crossed) fix the whole thing.
We make sure we keep the -old- oprofile type and CPU type whenever
one of them was specified in the first pass through the function.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Krill [Wed, 13 May 2009 05:56:54 +0000 (05:56 +0000)]
serial/nwpserial: Fix wrong register read address and add interrupt acknowledge.
The receive interrupt routine checks the wrong register if the
receive fifo is empty. Further an explicit interrupt acknowledge
write is introduced. In some circumstances another interrupt was
issued.
Signed-off-by: Benjamin Krill <ben@codiert.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Gerhard Stenzel [Wed, 13 May 2009 05:50:46 +0000 (05:50 +0000)]
powerpc/cell: Make ptcal more reliable
There have been a series of checkstops on QS21 related to
ptcal being set up incorrectly. On systems that only
have memory on a single node, ptcal fails when it gets
a pointer to memory on the remote node.
Moreover, agressive prefetching in memcpy and other
functions may accidentally touch the first cache line
of the page that we reserve for ptcal, which causes
an ECC checkstop.
We now allocate pages only from the specified node, moves the
ptcal area into the middle of the allocated page to avoid
potential prefetch problems and prints the address of the
ptcal area to facilitate diagnostics.
Signed-off-by: Gerhard Stenzel <gerhard.stenzel@de.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Kumar Gala [Fri, 8 May 2009 12:08:20 +0000 (12:08 +0000)]
powerpc/mpic: Fix incorrect allocation of interrupt rev-map
Before when we were setting up the irq host map for mpic we passed in
just isu_size for the size of the linear map. However, for a number of
mpic implementations we have no isu (thus pass in 0) and will end up
with a no linear map (size = 0). This causes us to always call
irq_find_mapping() from mpic_get_irq().
By moving the allocation of the host map to after we've determined the
number of sources we can actually benefit from having a linear map for
the non-isu users that covers all the interrupt sources.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Maynard Johnson [Thu, 7 May 2009 05:48:32 +0000 (05:48 +0000)]
powerpc: Fix oprofile sampling of marked events on POWER7
Description
-----------
Change ppc64 oprofile kernel driver to use the SLOT bits (MMCRA[37:39]only on
older processors where those bits are defined.
Background
----------
The performance monitor unit of the 64-bit POWER processor family has the
ability to collect accurate instruction-level samples when profiling on marked
events (i.e., "PM_MRK_<event-name>"). In processors prior to POWER6, the MMCRA
register contained "slot information" that the oprofile kernel driver used to
adjust the value latched in the SIAR at the time of a PMU interrupt. But as of
POWER6, these slot bits in MMCRA are no longer necessary for oprofile to use,
since the SIAR itself holds the accurate sampled instruction address. With
POWER6, these MMCRA slot bits were zero'ed out by hardware so oprofile's use of
these slot bits was, in effect, a NOP. But with POWER7, these bits are no
longer zero'ed out; however, they serve some other purpose rather than slot
information. Thus, using these bits on POWER7 to adjust the SIAR value results
in samples being attributed to the wrong instructions. The attached patch
changes the oprofile kernel driver to ignore these slot bits on all newer
processors starting with POWER6.
Signed-off-by: Maynard Johnson <maynardj@us.ibm.com> Signed-off-by: Michael Wolf <mjw@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc/iseries: Fix pci breakage due to bad dma_data initialization
Commit 4fc665b88a79a45bae8bbf3a05563c27c7337c3d "powerpc: Merge 32 and
64-bit dma code" made changes to the PCI initialisation code that added
an assignment to archdata.dma_data but only for 32 bit code. Commit 7eef440a545c7f812ed10b49d4a10a351df9cad6 "powerpc/pci: Cosmetic cleanups
of pci-common.c" removed the conditional compilation. Unfortunately,
the iSeries code setup the archdata.dma_data before that assignment was
done - effectively overwriting the dma_data with NULL.
Fix this up by moving the iSeries setup of dma_data into a
pci_dma_dev_setup callback.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Timur Tabi [Thu, 30 Apr 2009 18:16:44 +0000 (18:16 +0000)]
powerpc: Fix mktree build error on Mac OS X host
The mktree utility defines some variables as "uint", although this is not a
standard C type, and so cross-compiling on Mac OS X fails. Change this to
"unsigned int".
Signed-off-by: Timur Tabi <timur@freescale.com> Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Steven Rostedt [Fri, 15 May 2009 03:40:06 +0000 (23:40 -0400)]
tracing: stop stack trace on first empty entry
The stack tracer stores eight entries in the ring buffer when an event
traces the stack. The output outputs all eight entries regardless of
how many entries were recorded.
This patch breaks out of the loop when a null entry is discovered.
[ Impact: only print the stack that is recorded ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 15 May 2009 03:19:09 +0000 (23:19 -0400)]
x86/stacktrace: return 0 instead of -1 for stack ops
If we return -1 in the ops->stack for the stacktrace saving, we end up
breaking out of the loop if the stack we are tracing is in the exception
stack. This causes traces like:
Linus Torvalds [Fri, 15 May 2009 02:19:43 +0000 (19:19 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (38 commits)
MIPS: Sibyte: Fix locking in set_irq_affinity
MIPS: Use force_sig when handling address errors.
MIPS: Cavium: Add struct clocksource * argument to octeon_cvmcount_read()
MIPS: Rewrite <asm/div64.h> to work with gcc 4.4.0.
MIPS: Fix highmem.
MIPS: Fix sign-extension bug in 32-bit kernel on 32-bit hardware.
MIPS: MSP71xx: Remove the RAMROOT functions
MIPS: Use -mno-check-zero-division
MIPS: Set compiler options only after the compiler prefix has ben set.
MIPS: IP27: Get rid of #ident. Gcc 4.4.0 doesn't like it.
MIPS: uaccess: Switch lock annotations to might_fault().
MIPS: MSP71xx: Resolve use of non-existent GPIO routines in msp71xx reset
MIPS: MSP71xx: Resolve multiple definition of plat_timer_setup
MIPS: Make uaccess.h slightly more sparse friendly.
MIPS: Make access_ok() sideeffect proof.
MIPS: IP27: Fix clash with NMI_OFFSET from hardirq.h
MIPS: Alchemy: Timer build fix
MIPS: Kconfig: Delete duplicate definition of RWSEM_GENERIC_SPINLOCK.
MIPS: Cavium: Add support for 8k and 32k page sizes.
MIPS: TXx9: Fix possible overflow in clock calculations
...
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Spelling fix in btrfs_lookup_first_block_group comments
Btrfs: make show_options result match actual option names
Btrfs: remove outdated comment in btrfs_ioctl_resize()
Btrfs: remove some WARN_ONs in the IO failure path
Btrfs: Don't loop forever on metadata IO failures
Btrfs: init inode ordered_data_close flag properly
Carl Worth [Wed, 29 Apr 2009 21:43:54 +0000 (14:43 -0700)]
drm/i915: Add new GET_PIPE_FROM_CRTC_ID ioctl.
This allows userlevel code to discover the pipe number corresponding
to a given CRTC ID. This is necessary for doing pipe-specific
operations such as waiting for vblank on a given CRTC. Failure to use
the right pipe mapping can result in GPU hangs, or at least failure
to actually sync to vblank.
Signed-off-by: Carl Worth <cworth@cworth.org>
[anholt: Style touchups from review] Signed-off-by: Eric Anholt <eric@anholt.net>
Ma Ling [Mon, 11 May 2009 03:33:22 +0000 (11:33 +0800)]
drm/i915: Set HDMI hot plug interrupt enable for only the output in question.
We detect HDMI output connection status by writing to HOT Plug Interrupt
Detect Enable bit in PORT_HOTPLUG_EN. The behavior will generate a specified
interrupt, which is caught by audio driver, but during one detection driver
set all Detect Enable bits of HDMIB, HDMIC HDMID, and generate wrong
interrupt signals for current output, according to the signals audio driver
misunderstand device status. The patch intends to handle corresponding
output precisely.
It fixed freedesktop.org bug #21371
Signed-off-by: Ma Ling <ling.ma@intel.com> Signed-off-by: Eric Anholt <eric@anholt.net>