perf_events: Fix BTS interrupt handling to avoid being dazed by NMI (v2)
Fix a bug introduced with commit de725de and the change in the
meaning of the return value of intel_pmu_handle_irq(). With the
current code, when you are using the BTS, you get 'dazed by NMI'
each time the BTS buffer fills up.
BTS does interrupt on the PMU vector, thus NMI. You need to take
this into account in the return value of the function.
This version fixes initial patch which was missing changes to
perf_event_intel_ds.c.
Peter Zijlstra [Fri, 10 Sep 2010 10:51:54 +0000 (12:51 +0200)]
perf: Ensure we call add_event_to_ctx() with the right locks held
Even though we call it from the inherit path, where the child is
not yet accessible, we need to hold ctx->lock, add_event_to_ctx()
assumes IRQs are disabled.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
asm-generic/hardirq.h needs asm/irq.h which might include
linux/interrupt.h as in the sparc 32 case. At this point
we need irq_cpustat generic definitions, but those are
included later in asm-generic/hardirq.h.
Then delay a bit the inclusion of irq.h from
asm-generic/hardirq.h, it doesn't need to be included early.
This fixes:
include/linux/interrupt.h: In function '__raise_softirq_irqoff':
include/linux/interrupt.h:414: error: implicit declaration of function 'local_softirq_pending'
include/linux/interrupt.h:414: error: lvalue required as left operand of assignment
Peter Zijlstra [Tue, 7 Sep 2010 16:32:22 +0000 (18:32 +0200)]
perf: Optimize context ops
Assuming we don't mix events of different pmus onto a single context
(with the exeption of software events inside a hardware group) we can
now assume that all events on a particular context belong to the same
pmu, hence we can disable the pmu for the entire context operations.
This reduces the amount of hardware writes.
The exception for swevents comes from the fact that the sw pmu disable
is a nop.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 2 Sep 2010 14:50:03 +0000 (16:50 +0200)]
perf: Multiple task contexts
Provide the infrastructure for multiple task contexts.
A more flexible approach would have resulted in more pointer chases
in the scheduling hot-paths. This approach has the limitation of a
static number of task contexts.
Since I expect most external PMUs to be system wide, or at least node
wide (as per the intel uncore unit) they won't actually need a task
context.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 6 Sep 2010 14:32:21 +0000 (16:32 +0200)]
perf: Per cpu-context rotation timer
Give each cpu-context its own timer so that it is a self contained
entity, this eases the way for per-pmu-per-cpu contexts as well as
provides the basic infrastructure to allow different rotation
times per pmu.
Things to look at:
- folding the tick and these TICK_NSEC timers
- separate task context rotation
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Wed, 18 Aug 2010 12:37:15 +0000 (14:37 +0200)]
perf: Separate find_get_context() from event initialization
Separate find_get_context() from the event allocation and
initialization so that we may make find_get_context() depend
on the event pmu in a later patch.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Wed, 16 Jun 2010 12:37:10 +0000 (14:37 +0200)]
perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.
The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.
This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).
It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).
The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:
1) We disable the counter:
a) the pmu has per-counter enable bits, we flip that
b) we program a NOP event, preserving the counter state
2) We store the counter state and ignore all read/overflow events
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Masami Hiramatsu [Fri, 27 Aug 2010 11:39:06 +0000 (20:39 +0900)]
tracing/kprobes: Fix handling of argument names
Set "argN" name for each argument automatically if it has no specified name.
Since dynamic trace event(kprobe_events) accepts special characters for its
argument, its format can show those special characters (e.g. '$', '%', '+').
However, perf can't parse those format because of the character (especially
'%') mess up the format. This sets "argX" name for those arguments if user
omitted the argument names.
Masami Hiramatsu [Fri, 27 Aug 2010 11:38:59 +0000 (20:38 +0900)]
perf probe: Fix handling of arguments names
Don't make argument names from raw parameters (means the parameters are written
in kprobe-tracer syntax), because the argument syntax may include special
characters. Just leave it, then kprobe-tracer gives a new name.
Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100827113859.22882.75598.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Fri, 27 Aug 2010 11:38:46 +0000 (20:38 +0900)]
tracing/kprobe: Fix a memory leak in error case
Fix a memory leak which happens when a field name conflicts with others. In
error case, free_trace_probe() will free all arguments until nr_args, so this
increments nr_args the begining of the loop instead of the end.
Steven Rostedt [Wed, 8 Sep 2010 15:20:37 +0000 (11:20 -0400)]
tracing: Do not allow llseek to set_ftrace_filter
Reading the file set_ftrace_filter does three things.
1) shows whether or not filters are set for the function tracer
2) shows what functions are set for the function tracer
3) shows what triggers are set on any functions
3 is independent from 1 and 2.
The way this file currently works is that it is a state machine,
and as you read it, it may change state. But this assumption breaks
when you use lseek() on the file. The state machine gets out of sync
and the t_show() may use the wrong pointer and cause a kernel oops.
Luckily, this will only kill the app that does the lseek, but the app
dies while holding a mutex. This prevents anyone else from using the
set_ftrace_filter file (or any other function tracing file for that matter).
A real fix for this is to rewrite the code, but that is too much for
a -rc release or stable. This patch simply disables llseek on the
set_ftrace_filter() file for now, and we can do the proper fix for the
next major release.
Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Koki Sanagi [Mon, 23 Aug 2010 09:47:09 +0000 (18:47 +0900)]
perf: Add a script to show packets processing
Add a perf script which shows packets processing and processed
time. It helps us to investigate networking or network devices.
If you want to use it, install perf and record perf.data like
following.
If you set script, perf gathers records until it ends.
If not, you must Ctrl-C to stop recording.
And if you want a report from record,
If you use some options, you can limit the output.
Option is below.
tx: show only tx packets processing
rx: show only rx packets processing
dev=: show processing on this device
debug: work with debug mode. It shows buffer status.
For example, if you want to show received packets processing
associated with eth4,
Koki Sanagi [Mon, 23 Aug 2010 09:46:12 +0000 (18:46 +0900)]
skb: Add tracepoints to freeing skb
This patch adds tracepoint to consume_skb and add trace_kfree_skb
before __kfree_skb in skb_free_datagram_locked and net_tx_action.
Combinating with tracepoint on dev_hard_start_xmit, we can check
how long it takes to free transmitted packets. And using it, we can
calculate how many packets driver had at that time. It is useful when
a drop of transmitted packet is a problem.
Koki Sanagi [Mon, 23 Aug 2010 09:45:02 +0000 (18:45 +0900)]
netdev: Add tracepoints to netdev layer
This patch adds tracepoint to dev_queue_xmit, dev_hard_start_xmit,
netif_rx and netif_receive_skb. These tracepoints help you to monitor
network driver's input/output.
Neil Horman [Mon, 23 Aug 2010 09:43:51 +0000 (18:43 +0900)]
napi: Convert trace_napi_poll to TRACE_EVENT
This patch converts trace_napi_poll from DECLARE_EVENT to TRACE_EVENT
to improve the usability of napi_poll tracepoint.
<idle>-0 [001] 241302.750777: napi_poll: napi poll on napi struct f6acc480 for device eth3
<idle>-0 [000] 241302.852389: napi_poll: napi poll on napi struct f5d0d70c for device eth1
The original patch is below:
http://marc.info/?l=linux-kernel&m=126021713809450&w=2
[ sanagi.koki@jp.fujitsu.com: And add a fix by Steven Rostedt:
http://marc.info/?l=linux-kernel&m=126150506519173&w=2 ]
Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Neil Horman <nhorman@tuxdriver.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Kaneshige Kenji <kaneshige.kenji@jp.fujitsu.com> Cc: Izumo Taku <izumi.taku@jp.fujitsu.com> Cc: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Scott Mcmillan <scott.a.mcmillan@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Eric Dumazet <eric.dumazet@gmail.com>
LKML-Reference: <4C7242D7.4050009@jp.fujitsu.com> Signed-off-by: Koki Sanagi <sanagi.koki@jp.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Robert Richter [Thu, 2 Sep 2010 19:07:48 +0000 (15:07 -0400)]
perf, x86: Try to handle unknown nmis with an enabled PMU
When the PMU is enabled it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty
and daze the CPU.
The solution to avoid an 'unknown nmi' massage in this case was
simply to stop the nmi handler chain when the PMU is enabled by
stating the nmi was handled. This has the drawback that a) we
can not detect unknown nmis anymore, and b) subsequent nmi
handlers are not called.
This patch addresses this. Now, we check this unknown NMI if it
could be a PMU back-to-back NMI. Otherwise we pass it and let
the kernel handle the unknown nmi.
This is a debug log:
cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
Uhhuh. NMI received for unknown reason 00 on CPU 6.
Do you have a strange power saving mode enabled?
Dazed and confused, but trying to continue
For back-to-back nmi detection there are the following rules:
The PMU nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2]
above).
There is another case if there are two subsequent back-to-back
nmis [3]. The 2nd is detected as back-to-back because the first
handled more than one counter. If the second handles one counter
and the 3rd handles nothing, we drop the 3rd nmi because it
could be a back-to-back nmi.
Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com
LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Don Zickus [Thu, 2 Sep 2010 19:07:47 +0000 (15:07 -0400)]
perf, x86: Fix accidentally ack'ing a second event on intel perf counter
During testing of a patch to stop having the perf subsytem
swallow nmis, it was uncovered that Nehalem boxes were randomly
getting unknown nmis when using the perf tool.
Moving the ack'ing of the PMI closer to when we get the status
allows the hardware to properly re-set the PMU bit signaling
another PMI was triggered during the processing of the first
PMI. This allows the new logic for dealing with the
shortcomings of multiple PMIs to handle the extra NMI by
'eat'ing it later.
Now one can wonder why are we getting a second PMI when we
disable all the PMUs in the begining of the NMI handler to
prevent such a case, for that I do not know. But I know the fix
below helps deal with this quirk.
Tested on multiple Nehalems where the problem was occuring.
With the patch, the code now loops a second time to handle the
second PMI (whereas before it was not).
Steven Rostedt [Wed, 1 Sep 2010 16:23:12 +0000 (12:23 -0400)]
ring-buffer: Place duplicate expression into a single function
While discussing the strictness of the 80 character limit on the
Kernel Summit Discussion mailing list, I showed examples that I
broke that limit slightly with some algorithms. In discussing with
John Linville, what looked better, I realized that two of the
80 char breaking culprits were an identical expression.
As a clean up, this patch moves the identical expression into its
own helper function and that is used instead. As a side effect,
the offending code is now under the 80 character limit. :-)
This clean up code also changes the expression from
(A - B) - C to A - (B + C)
This makes the code look a little nicer too.
Cc: John W. Linville <linville@tuxdriver.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Don Zickus [Wed, 1 Sep 2010 03:00:07 +0000 (23:00 -0400)]
lockup_detector: Sync touch_*_watchdog back to old semantics
During my rewrite, the semantics of touch_nmi_watchdog and
touch_softlockup_watchdog changed enough to break some drivers
(mostly over preemptable regions).
These are cases where long delays on one CPU (due to
print_delay for example) can cause long delays on other
CPUs - so we must 'touch' the nmi_watchdog flag of those
other CPUs as well.
This change brings those touch_*_watchdog() functions back in line
with to how they used to work.
Cyrill Gorcunov [Wed, 25 Aug 2010 18:23:34 +0000 (22:23 +0400)]
perf, x86, Pentium4: Add RAW events verification
Implements verification of
- Bits of ESCR EventMask field (meaningful bits in field are hardware
predefined and others bits should be set to zero)
- INSTR_COMPLETED event (it is available on predefined cpu model only)
- Thread shared events (they should be guarded by "perf_event_paranoid"
sysctl due to security reason). The side effect of this action is
that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Tested-by: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100825182334.GB14874@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Akinobu Mita [Wed, 1 Sep 2010 03:00:08 +0000 (23:00 -0400)]
lockup_detector: Convert cpu notifier to return encapsulate errno value
By the commit e6bde73b07edeb703d4c89c1daabc09c303de11f
("cpu-hotplug: return better errno on cpu hotplug failure"),
the cpu notifier can return encapsulate errno value, resulting
in more meaningful error codes for CPU hotplug failures.
This converts the cpu notifier to return encapsulate errno value
for the lockup_detector as well.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com
LKML-Reference: <1283310009-22168-3-git-send-email-dzickus@redhat.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Steven Rostedt [Tue, 31 Aug 2010 20:35:20 +0000 (16:35 -0400)]
tracing/lockdep: Fix dependency of TRACE_IRQFLAGS
When CONFIG_IRQSOFF_TRACER is set and CONFIG_PROVE_LOCKING is not, we
get the following error:
$ make oldconfig
scripts/kconfig/conf --oldconfig arch/x86/Kconfig
warning: (IRQSOFF_TRACER && TRACING_SUPPORT && FTRACE && TRACE_IRQFLAGS_SUPPORT && !ARCH_USES_GETTIMEOFFSET) selects TRACE_IRQFLAGS which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && PROVE_LOCKING)
warning: (IRQSOFF_TRACER && TRACING_SUPPORT && FTRACE && TRACE_IRQFLAGS_SUPPORT && !ARCH_USES_GETTIMEOFFSET) selects TRACE_IRQFLAGS which has unmet direct dependencies (DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && PROVE_LOCKING)
This is because IRQSOFF_TRACER selects TRACE_IRQFLAGS but TRACE_IRQFLAGS
has PROVE_LOCKING as a dependency. This code is incorrect, and
this patch changes the TRACE_IRQFLAGS to be just a simple bool that
does not depend or select anything. Instead both IRQSOFF_TRACER and
PROVE_LOCKING select it.
Reported-by: Richard Kennedy <richard@rsk.demon.co.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Robert Richter [Mon, 30 Aug 2010 08:56:18 +0000 (10:56 +0200)]
oprofile, x86: fix init_sysfs error handling
On failure init_sysfs() might not properly free resources. The error
code of the function is not checked. And, when reinitializing the exit
function might be called twice. This patch fixes all this.
Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
Stephane Eranian [Thu, 26 Aug 2010 14:40:01 +0000 (16:40 +0200)]
perf_events: Fix time tracking for events with pid != -1 and cpu != -1
Per-thread events with a cpu filter, i.e., cpu != -1, were not
reporting correct timings when the thread never ran on the
monitored cpu. The time enabled was reported as a negative
value.
This patch fixes the problem by updating tstamp_stopped,
tstamp_running in event_sched_out() for events with filters and
which are marked as INACTIVE.
The function group_sched_out() is modified to systematically
call into event_sched_out() to avoid duplicating the timing
adjustment code twice.
Each histogram entry has a callchain root that stores the
callchain samples. However we forgot to initialize the
tracking of children hits of these roots, which then got
random values on their creation.
The root children hits is multiplied by the minimum percentage
of hits provided by the user, and the result becomes the minimum
hits expected from children branches. If the random value due
to the uninitialization is big enough, then this minimum number
of hits can be huge and eventually filter every children branches.
The end result was invisible callchains. All we need to
fix this is to initialize the children hits of the root.
Reported-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: 2.6.32.x-2.6.35.y <stable@kernel.org>
Tom Zanussi [Tue, 24 Aug 2010 05:23:50 +0000 (00:23 -0500)]
perf tools: Fix linking errors with --as-needed flag
External shared libraries should never be appended to the LDFLAGS as this
messes the linking order. As EXTLIBS collects those libraries, it seems that
perl and python libraries should also be appended to EXTLIBS.
Also fix the broken linking order.
This is a refresh of a patch by Ozan Çağlayan and improved by both Tom Zanussi
and Kirill A. Shutemov.
Cc: Ozan Çağlayan <ozan@pardus.org.tr> Tested-by: Kirill A. Shutemov <kirill@shutemov.name> Tested-by: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <1282627430.28324.8.camel@tropicana> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf hists browser: Introduce "expand/collapse all callchains" action
When looking at a callchains enabled perf data file one can find it
tiresome to start with all callchains collapsed and then to have to go
one by one expanding them.
So associate 'E' with "Expand all callchains" and 'C' with "Collapse all
callchains".
This way now one can have the top level view and then switch to/from
having all callchains expanded.
More work is needed to allow expanding just from one branch down to its
leaves.
Reported-by: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <new-submission> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Anton Blanchard [Wed, 25 Aug 2010 01:32:38 +0000 (11:32 +1000)]
tracing/trace_stack: Fix stack trace on ppc64
save_stack_trace() stores the instruction pointer, not the
function descriptor. On ppc64 the trace stack code currently
dereferences the instruction pointer and shows 8 bytes of
instructions in our backtraces:
Robert Richter [Fri, 13 Aug 2010 14:29:04 +0000 (16:29 +0200)]
oprofile: fix crash when accessing freed task structs
This patch fixes a crash during shutdown reported below. The crash is
caused by accessing already freed task structs. The fix changes the
order for registering and unregistering notifier callbacks.
All notifiers must be initialized before buffers start working. To
stop buffer synchronization we cancel all workqueues, unregister the
notifier callback and then flush all buffers. After all of this we
finally can free all tasks listed.
This should avoid accessing freed tasks.
On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:
>
> [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
>
> Backtrace looks like:
>
> spin_bug+0x74/0xd4
> ._raw_spin_lock+0x48/0x184
> ._spin_lock+0x10/0x24
> .get_task_mm+0x28/0x8c
> .sync_buffer+0x1b4/0x598
> .wq_sync_buffer+0xa0/0xdc
> .worker_thread+0x1d8/0x2a8
> .kthread+0xa8/0xb4
> .kernel_thread+0x54/0x70
>
> So we are accessing a freed task struct in the work queue when
> processing the samples.
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>
Luck, Tony [Tue, 24 Aug 2010 18:44:18 +0000 (11:44 -0700)]
guard page for stacks that grow upwards
pa-risc and ia64 have stacks that grow upwards. Check that
they do not run into other mappings. By making VM_GROWSUP
0x0 on architectures that do not ever use it, we can avoid
some unpleasant #ifdefs in check_stack_guard_page().
Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesse Barnes [Tue, 24 Aug 2010 18:31:16 +0000 (11:31 -0700)]
drm/i915: fix vblank wait test condition
When converting this to the new wait_for macro I inverted the wait
condition, which causes all sorts of problems. So correct it to fix
several failures caused by the bad wait (flickering, bad output
detection, tearing, etc.).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Tue, 24 Aug 2010 08:41:33 +0000 (10:41 +0200)]
V4L/DVB: mantis: Fix IR_CORE dependency
This build bug triggers:
drivers/built-in.o: In function `mantis_exit':
(.text+0x377413): undefined reference to `ir_input_unregister'
drivers/built-in.o: In function `mantis_input_init':
(.text+0x3774ff): undefined reference to `__ir_input_register'
If MANTIS_CORE is enabled but IR_CORE is not. Add the correct
dependency.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Get rid of indirect p1275 PROM call buffer.
sparc64: Fill a missing delay slot.
sparc64: Make lock backoff really a NOP on UP builds.
sparc64: simple microoptimizations for atomic functions
sparc64: Make rwsems 64-bit.
sparc64: Really fix atomic64_t interface types.
[S390] fix tlb flushing vs. concurrent /proc accesses
The tlb flushing code uses the mm_users field of the mm_struct to
decide if each page table entry needs to be flushed individually with
IPTE or if a global flush for the mm_struct is sufficient after all page
table updates have been done. The comment for mm_users says "How many
users with user space?" but the /proc code increases mm_users after it
found the process structure by pid without creating a new user process.
Which makes mm_users useless for the decision between the two tlb
flusing methods. The current code can be confused to not flush tlb
entries by a concurrent access to /proc files if e.g. a fork is in
progres. The solution for this problem is to make the tlb flushing
logic independent from the mm_users field.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (25 commits)
powerpc: Fix config dependency problem with MPIC_U3_HT_IRQS
via-pmu: Add compat_pmu_ioctl
powerpc: Wire up fanotify_init, fanotify_mark, prlimit64 syscalls
powerpc/pci: Fix checking for child bridges in PCI code.
powerpc: Fix typo in uImage target
powerpc: Initialise paca->kstack before early_setup_secondary
powerpc: Fix bogus it_blocksize in VIO iommu code
powerpc: Inline ppc64_runlatch_off
powerpc: Correct smt_enabled=X boot option for > 2 threads per core
powerpc: Silence xics_migrate_irqs_away() during cpu offline
powerpc: Silence __cpu_up() under normal operation
powerpc: Re-enable preemption before cpu_die()
powerpc/pci: Drop unnecessary null test
powerpc/powermac: Drop unnecessary null test
powerpc/powermac: Drop unnecessary of_node_put
powerpc/kdump: Stop all other CPUs before running crash handlers
powerpc/mm: Fix vsid_scrample typo
powerpc: Use is_32bit_task() helper to test 32 bit binary
powerpc: Export memstart_addr and kernstart_addr on ppc64
powerpc: Make rwsem use "long" type
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
68328serial: check return value of copy_*_user() instead of access_ok()
synclink: add mutex_unlock() on error path
rocket: add a mutex_unlock()
ip2: return -EFAULT on copy_to_user errors
ip2: remove unneeded NULL check
serial: print early console device address in hex
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
kobject_uevent: fix typo in comments
firmware_class: fix typo in error path
kobject: Break the kobject namespace defs into their own header
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (29 commits)
ARM: imx: fix build failure concerning otg/ulpi
USB: ftdi_sio: add product ID for Lenz LI-USB
USB: adutux: fix misuse of return value of copy_to_user()
USB: iowarrior: fix misuse of return value of copy_to_user()
USB: xHCI: update ring dequeue pointer when process missed tds
USB: xhci: Remove buggy assignment in next_trb()
USB: ftdi_sio: Add ID for Ionics PlugComputer
USB: serial: io_ti.c: don't return 0 if writing the download record failed
USB: otg: twl4030: fix wrong assumption of starting state
USB: gadget: Return -ENOMEM on memory allocation failure
USB: gadget: fix composite kernel-doc warnings
USB: ssu100: set tty_flags in ssu100_process_packet
USB: ssu100: add disconnect function for ssu100
USB: serial: export symbol usb_serial_generic_disconnect
USB: ssu100: rework logic for TIOCMIWAIT
USB: ssu100: add register parameter to ssu100_setregister
USB: ssu100: remove duplicate #defines in ssu100
USB: ssu100: refine process_packet in ssu100
USB: ssu100: add locking for port private data in ssu100
USB: r8a66597-udc: return -ENOMEM if kzalloc() fails
...
David S. Miller [Tue, 24 Aug 2010 06:10:57 +0000 (23:10 -0700)]
sparc64: Get rid of indirect p1275 PROM call buffer.
This is based upon a report by Meelis Roos showing that it's possible
that we'll try to fetch a property that is 32K in size with some
devices. With the current fixed 3K buffer we use for moving data in
and out of the firmware during PROM calls, that simply won't work.
In fact, it will scramble random kernel data during bootup.
The reasoning behind the temporary buffer is entirely historical. It
used to be the case that we had problems referencing dynamic kernel
memory (including the stack) early in the boot process before we
explicitly told the firwmare to switch us over to the kernel trap
table.
So what we did was always give the firmware buffers that were locked
into the main kernel image.
But we no longer have problems like that, so get rid of all of this
indirect bounce buffering.
Besides fixing Meelis's bug, this also makes the kernel data about 3K
smaller.
It was also discovered during these conversions that the
implementation of prom_retain() was completely wrong, so that was
fixed here as well. Currently that interface is not in use.
Reported-by: Meelis Roos <mroos@linux.ee> Tested-by: Meelis Roos <mroos@linux.ee> Signed-off-by: David S. Miller <davem@davemloft.net>
Grant Likely [Wed, 18 Aug 2010 08:27:55 +0000 (08:27 +0000)]
powerpc/pci: Fix checking for child bridges in PCI code.
pci_device_to_OF_node() can return null, and list_for_each_entry will
never enter the loop when dev is NULL, so it looks like this test is
a typo.
Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Matt Evans [Thu, 12 Aug 2010 20:58:28 +0000 (20:58 +0000)]
powerpc: Initialise paca->kstack before early_setup_secondary
As early setup calls down to slb_initialize(), we must have kstack
initialised before checking "should we add a bolted SLB entry for our kstack?"
Failing to do so means stack access requires an SLB miss exception to refill
an entry dynamically, if the stack isn't accessible via SLB(0) (kernel text
& static data). It's not always allowable to take such a miss, and
intermittent crashes will result.
Primary CPUs don't have this issue; an SLB entry is not bolted for their
stack anyway (as that lives within SLB(0)). This patch therefore only
affects the init of secondaries.
Signed-off-by: Matt Evans <matt@ozlabs.org> Cc: stable <stable@kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Huge unexplained gaps in what should be an empty TCE table. It turns out
it_blocksize, the amount we want to align the next allocation to, was c0000000fe903b20. Completely bogus.
Initialise it to something reasonable in the VIO IOMMU code, and use kzalloc
everywhere to protect against this when we next add a non compulsary
field to iommu code and forget to initialise it.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Fri, 6 Aug 2010 03:28:19 +0000 (03:28 +0000)]
powerpc: Inline ppc64_runlatch_off
I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it
into the callers. To avoid a mess of circular includes I didn't add
it as an inline function.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Olof Johansson <olof@lixom.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Nathan Fontenot [Thu, 5 Aug 2010 07:42:11 +0000 (07:42 +0000)]
powerpc: Correct smt_enabled=X boot option for > 2 threads per core
The 'smt_enabled=X' boot option does not handle values of X > 2.
For Power 7 processors with smt modes of 0,1,2,3, and 4 this does
not work. This patch allows the smt_enabled option to be set to
any value limited to a max equal to the number of threads per
core.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Silence xics_migrate_irqs_away() during cpu offline
All IRQs are migrated away from a CPU that is being offlined so the
following messages suggest a problem when the system is behaving as
designed:
IRQ 262 affinity broken off cpu 1
IRQ 17 affinity broken off cpu 0
IRQ 18 affinity broken off cpu 0
IRQ 19 affinity broken off cpu 0
IRQ 256 affinity broken off cpu 0
IRQ 261 affinity broken off cpu 0
IRQ 262 affinity broken off cpu 0
Don't print these messages when the CPU is not online.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
powerpc: Silence __cpu_up() under normal operation
During CPU offline/online tests __cpu_up would flood the logs with
the following message:
Processor 0 found.
This provides no useful information to the user as there is no context
provided, and since the operation was a success (to this point) it is expected
that the CPU will come back online, providing all the feedback necessary.
Change the "Processor found" message to DBG() similar to other such messages in
the same function. Also, add an appropriate log level for the "Processor is
stuck" message.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
start_secondary() is called shortly after _start and also via
cpu_idle()->cpu_die()->pseries_mach_cpu_die()
start_secondary() expects a preempt_count() of 0. pseries_mach_cpu_die() is
called via the cpu_idle() routine with preemption disabled, resulting in the
following repeating message during rapid cpu offline/online tests
with CONFIG_PREEMPT=y:
Move the cpu_die() call inside the existing preemption enabled block of
cpu_idle(). This is safe as the idle task is affined to a single CPU so the
debug_smp_processor_id() tests (from cpu_should_die()) won't trigger as we are
in a "migration disabled" region.
Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Acked-by: Will Schmidt <will_schmidt@vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nathan Fontenot <nfont@austin.ibm.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
for_each_node_by_name(np,...) {
... when != break;
when != goto l;
}
... when != np = E
- of_node_put(np);
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 2 Aug 2010 20:39:41 +0000 (20:39 +0000)]
powerpc/kdump: Stop all other CPUs before running crash handlers
During kdump we run the crash handlers first then stop all other CPUs.
We really want to stop all CPUs as close to the fail as possible and also
have a very controlled environment for running the crash handlers, so it
makes sense to reverse the order.
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>