Eric Paris [Thu, 13 Aug 2009 13:44:51 +0000 (09:44 -0400)]
Networking: use CAP_NET_ADMIN when deciding to call request_module
The networking code checks CAP_SYS_MODULE before using request_module() to
try to load a kernel module. While this seems reasonable it's actually
weakening system security since we have to allow CAP_SYS_MODULE for things
like /sbin/ip and bluetoothd which need to be able to trigger module loads.
CAP_SYS_MODULE actually grants those binaries the ability to directly load
any code into the kernel. We should instead be protecting modprobe and the
modules on disk, rather than granting random programs the ability to load code
directly into the kernel. Instead we are going to gate those networking checks
on CAP_NET_ADMIN which still limits them to root but which does not grant
those processes the ability to load arbitrary code into the kernel.
Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Paul Moore <paul.moore@hp.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: James Morris <jmorris@namei.org>
Linus Torvalds [Mon, 10 Aug 2009 20:21:19 +0000 (13:21 -0700)]
pty: fix data loss when stopped (^S/^Q)
Commit d945cb9cc ("pty: Rework the pty layer to use the normal buffering
logic") dropped the test for 'tty->stopped' in pty_write_room(), which
then causes the n_tty line discipline thing to not throttle the data
properly when the tty is stopped.
So instead of pausing the write due to the tty being stopped, the ldisc
layer would go ahead and push it down to the pty. The pty write()
routine would then refuse to take the data (because it _did_ check
'stopped'), and the data wouldn't actually be written.
This whole stopped test should eventually be moved into the tty ldisc
layer rather than have low-level tty drivers care about these things,
but right now the fix is to just re-instate the missing pty 'stopped'
handling.
Reported-and-tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 10 Aug 2009 18:48:51 +0000 (11:48 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (27 commits)
perf_counter: Zero dead bytes from ftrace raw samples size alignment
perf_counter: Subtract the buffer size field from the event record size
perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data
perf_counter: Correct PERF_SAMPLE_RAW output
perf tools: callchain: Fix bad rounding of minimum rate
perf_counter tools: Fix libbfd detection for systems with libz dependency
perf: "Longum est iter per praecepta, breve et efficax per exempla"
perf_counter: Fix a race on perf_counter_ctx
perf_counter: Fix tracepoint sampling to be part of generic sampling
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
perf tools: callchain: Fix 'perf report' display to be callchain by default
perf tools: callchain: Fix spurious 'perf report' warnings: ignore empty callchains
perf record: Fix the -A UI for empty or non-existent perf.data
perf util: Fix do_read() to fail on EOF instead of busy-looping
perf list: Fix the output to not include tracepoints without an id
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
perf stat: Fix tool option consistency: rename -S/--scale to -c/--scale
perf report: Add debug help for the finding of symbol bugs - show the symtab origin (DSO, build-id, kernel, etc)
perf report: Fix per task mult-counter stat reporting
...
Linus Torvalds [Mon, 10 Aug 2009 18:00:37 +0000 (11:00 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI hotplug: SGI hotplug: do not use hotplug_slot_attr
PCI hotplug: SGI hotplug: fix build failure
Wei Chong Tan reported a fast-PIT-calibration corner-case:
| pit_expect_msb() is vulnerable to SMI disturbance corner case
| in some platforms which causes /proc/cpuinfo to show wrong
| CPU MHz value when quick_pit_calibrate() jumps to success
| section.
I think that the real issue isn't even an SMI - but the fact
that in the very last iteration of the loop, there's no
serializing instruction _after_ the last 'rdtsc'. So even in
the absense of SMI's, we do have a situation where the cycle
counter was read without proper serialization.
The last check should be done outside the outer loop, since
_inside_ the outer loop, we'll be testing that the PIT has
the right MSB value has the right value in the next iteration.
So only the _last_ iteration is special, because that's the one
that will not check the PIT MSB value any more, and because the
final 'get_cycles()' isn't serialized.
In other words:
- I'd like to move the PIT MSB check to after the last
iteration, rather than in every iteration
- I think we should comment on the fact that it's also a
serializing instruction and so 'fences in' the TSC read.
Linus Torvalds [Mon, 10 Aug 2009 16:00:47 +0000 (09:00 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
mm_for_maps: take ->cred_guard_mutex to fix the race with exec
mm_for_maps: shift down_read(mmap_sem) to the caller
mm_for_maps: simplify, use ptrace_may_access()
Figo.zhang [Sat, 8 Aug 2009 13:01:22 +0000 (21:01 +0800)]
mempool.c: clean up type-casting
clean up type-casting twice. "size_t" is typedef as "unsigned long" in
64-bit system, and "unsigned int" in 32-bit system, and the intermediate
cast to 'long' is pointless.
perf_counter: Zero dead bytes from ftrace raw samples size alignment
After aligning the ftrace raw samples, there are dead bytes storing
random data from the stack. We don't want to leak these to userspace,
then zero these out.
perf_counter: Subtract the buffer size field from the event record size
We compute the perf raw sample size by aligning the raw ftrace
event size plus the buffer size field itself. We do that
instead of aligning only the perf raw sample size, so that we
might economize some in some cases.
But this buffer size field is not stored in the perf raw
sample, we must then substract its size from the buffer once we
computed the alignment unless we may get a useless u32 field in
the buffer.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <20090810141129.GA5124@nowhere> Signed-off-by: Ingo Molnar <mingo@elte.hu>
mm_for_maps: take ->cred_guard_mutex to fix the race with exec
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
mm_for_maps: shift down_read(mmap_sem) to the caller
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Oleg Nesterov [Tue, 23 Jun 2009 19:25:32 +0000 (21:25 +0200)]
mm_for_maps: simplify, use ptrace_may_access()
It would be nice to kill __ptrace_may_access(). It requires task_lock(),
but this lock is only needed to read mm->flags in the middle.
Convert mm_for_maps() to use ptrace_may_access(), this also simplifies
the code a little bit.
Also, we do not need to take ->mmap_sem in advance. In fact I think
mm_for_maps() should not play with ->mmap_sem at all, the caller should
take this lock.
With or without this patch, without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Peter Zijlstra [Mon, 10 Aug 2009 09:16:52 +0000 (11:16 +0200)]
perf_counter: Correct PERF_SAMPLE_RAW output
PERF_SAMPLE_* output switches should unconditionally output the
correct format, as they are the only way to unambiguously parse
the PERF_EVENT_SAMPLE data.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249896447.17467.74.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
powerpc/dma: pci_set_dma_mask() shouldn't fail if mask fits in RAM
On an iMac G5, the b43 driver is failing to initialise because trying to
set the dma mask to 30-bit fails. Even though there's only 512MiB of RAM
in the machine anyway:
https://bugzilla.redhat.com/show_bug.cgi?id=514787
We should probably let it succeed if the available RAM in the system
doesn't exceed the requested limit.
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
mm_for_maps: take ->cred_guard_mutex to fix the race with exec
The problem is minor, but without ->cred_guard_mutex held we can race
with exec() and get the new ->mm but check old creds.
Now we do not need to re-check task->mm after ptrace_may_access(), it
can't be changed to the new mm under us.
Strictly speaking, this also fixes another very minor problem. Unless
security check fails or the task exits mm_for_maps() should never
return NULL, the caller should get either old or new ->mm.
mm_for_maps: shift down_read(mmap_sem) to the caller
mm_for_maps() takes ->mmap_sem after security checks, this looks
strange and obfuscates the locking rules. Move this lock to its
single caller, m_start().
Linus Torvalds [Sun, 9 Aug 2009 21:58:21 +0000 (14:58 -0700)]
Merge branch 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.31' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid redelivery of edge interrupt before next edge
KVM: MMU: limit rmap chain length
KVM: ia64: fix build failures due to ia64/unsigned long mismatches
KVM: Make KVM_HPAGES_PER_HPAGE unsigned long to avoid build error on powerpc
KVM: fix ack not being delivered when msi present
KVM: s390: fix wait_queue handling
KVM: VMX: Fix locking imbalance on emulation failure
KVM: VMX: Fix locking order in handle_invalid_guest_state
KVM: MMU: handle n_free_mmu_pages > n_alloc_mmu_pages in kvm_mmu_change_mmu_pages
KVM: SVM: force new asid on vcpu migration
KVM: x86: verify MTRR/PAT validity
KVM: PIT: fix kpit_elapsed division by zero
KVM: Fix KVM_GET_MSR_INDEX_LIST
Linus Torvalds [Sun, 9 Aug 2009 21:57:41 +0000 (14:57 -0700)]
Merge branch 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
Linus Torvalds [Sun, 9 Aug 2009 21:57:26 +0000 (14:57 -0700)]
Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix/complete ftrace event records sampling
perf_counter, ftrace: Fix perf_counter integration
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
tracing/filters: Don't use pred on alloc failure
ring-buffer: Fix memleak in ring_buffer_free()
tracing: Fix recordmcount.pl to handle sections with only weak functions
ring-buffer: Fix advance of reader in rb_buffer_peek()
tracing: do not use functions starting with .L in recordmcount.pl
ring-buffer: do not disable ring buffer on oops_in_progress
ring-buffer: fix check of try_to_discard result
Linus Torvalds [Sun, 9 Aug 2009 21:57:09 +0000 (14:57 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: fix buffer overflow in efi_init()
x86: Add quirk to make Apple MacBookPro5,1 use reboot=pci
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
x86: Fix VMI && stack protector
Linus Torvalds [Sun, 9 Aug 2009 21:56:51 +0000 (14:56 -0700)]
Merge branch 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: Fix typos in documentation
lockdep: Fix file mode of lock_stat
rtmutex: Avoid deadlock in rt_mutex_start_proxy_lock()
It is abnormal to get a 7.14% branch whereas we passed a 10%
filter.
The problem is that we round down the minimum threshold. This
happens mostly when we have very low number of events. If the
total amount of your branch is 4 and you have a subranch of 3
events, filtering to 90% will be computed like follows:
limit = 4 * 0.9;
The result is about 3.6, but the cast to integer will round
down to 3. It means that our filter is actually of 75%
We must then explicitly round up the minimum threshold.
Peter Zijlstra [Fri, 7 Aug 2009 17:49:01 +0000 (19:49 +0200)]
perf_counter: Fix a race on perf_counter_ctx
While extending perfcounters with BTS hw-tracing, Markus
Metzger managed to trigger this warning:
[ 995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
triggers because commit 9f498cc5be7e013d8d6e4c616980ed0ffc8680d2 (perf_counter: Full
task tracing) removed clearing of tsk->perf_counter_ctxp out
from under ctx->lock which introduced a race (against
perf_lock_task_context).
Move it back and deal with the exit notification by explicitly
passing along the former task context.
Reported-by: Markus T Metzger <markus.t.metzger@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249667341.17467.5.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter: Fix tracepoint sampling to be part of generic sampling
Based on Peter's comments, make tracepoint sampling generic
just like all the other sampling bits are. This is a rename
with no code changes:
- PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
- struct perf_tracepoint_record to perf_raw_record
We want the system in place that transport tracepoints raw
samples events into the perf ring buffer to be generalized and
usable by any type of counter.
Reported-by; Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_counter: Work around gcc warning by initializing tracepoint record unconditionally
Despite that the tracepoint record is always present when the
PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
thinking it might not be initialized:
kernel/perf_counter.c: In function ‘perf_counter_output’:
kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
Then, initialize it to NULL and always check if it's not NULL
before dereference it.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1249698400-5441-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf tools: callchain: Fix sum of percentages to be 100% by displaying amount of ignored chains in fractal mode
When we filter the callchains below a given percentage, we
ignore them and the end result only shows entries that have an
upper percentage than the filter threshold.
It seems to users then that we have an imbalance in the
percentage, as if the sum inside a profiled branch doesn't
reach 100%.
Since in the past there have been real perf report bugs that
showed the same sypmtom, it would be nice to assure the user
that the data is perfect and trustable and it all sums up to
100.00%.
So fix this by displaying the remaining hits that have been
filtered but without more detail than their amount in each
branches. Example while filtering below 50%:
perf tools: callchain: Fix 'perf report' display to be callchain by default
If we recorded with -g option to record the callchain, right now
we require a -g option to perf report as well - and people reported
this as unnecessary complication: the user already specified -g
once, no need to require it a second time.
So if the recording includes call-chains, display the callchain by
default from perf report.
( The user can override this default using "-g none" option from
perf report. )
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
When the callchain tree comes to insert an empty backtrace, it
raises a spurious warning about the fact we are inserting an
empty. This is spurious because the radix tree assumes it did
something wrong to reach this state. But it didn't, we just met
an empty callchain that has to be ignored.
This happens occasionally with certain types of call-chain
recordings. If it happens it's a big nuisance as perf report
output starts with thousands of warning lines.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <1249690585-9145-2-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pierre Habouzit [Fri, 7 Aug 2009 12:16:01 +0000 (14:16 +0200)]
perf record: Fix the -A UI for empty or non-existent perf.data
1. Ignore the -A argument if there is no perf.data file
2. Treat an empty file like a non existent file.
Else, perf will try to read the perf.data header, and fail with
an error.
Treating an empty file like a non-existent file makes sense,
since an interupted (as in SIGKILLed) perf could leave such
files around, and you don't want to annoy the user with errors
for files with no data in it.
Signed-off-by: Pierre Habouzit <pierre.habouzit@intersec.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 6 Aug 2009 14:48:54 +0000 (16:48 +0200)]
perf list: Fix the output to not include tracepoints without an id
Stop perf list from displaying tracepoints without an id file,
those are special tracepoints that are not interfaced to
perfcounters so listing them is erroneous and passing them as
events will produce no output.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jason Baron <jbaron@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul Mackerras [Fri, 7 Aug 2009 06:59:45 +0000 (16:59 +1000)]
perf_counter/powerpc: Fix oops on cpus without perf_counter hardware support
If we have the powerpc perf_counter backend compiled in, but
the cpu we are running on is one where we don't support the
PMU, we currently oops in hw_perf_group_sched_in if we try to
use any counters, because ppmu is NULL in that case, and we
unconditionally dereference ppmu.
This fixes the problem by adding a check if ppmu is NULL at the
beginning of hw_perf_group_sched_in, and also at the beginning
of the other functions that get called from the perf_counter
core, i.e. hw_perf_disable, hw_perf_enable, and
hw_perf_counter_setup.
Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: benh@kernel.crashing.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf tools: Fix call-chain cumul hit based sub-total (fractal mode)
The callchain fractal mode builds each new total hits in a new
branch of profiling by using the parent's hits of the current
branch plus the hits of the children.
This is wrong, the total hits of a branch should be made of the
sum of every children hits, we must ignore the parent hits in
this scope.
This patch also fixes another mistake with the hit counting.
Now the rates are correct.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Tue, 4 Aug 2009 08:21:23 +0000 (10:21 +0200)]
perf top: Improve interactive key handling
Pressing any key which is not currently mapped to
functionality, based on startup command line options, displays
currently mapped keys, and prompts for input.
Pressing any unmapped key at the prompt returns the user to
display mode with variables unchanged. eg, pressing ? <SPACE>
<ESC> etc displays currently available keys, the value of the
variable associated with that key, and prompts.
Pressing same again aborts input.
Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Wed, 22 Jul 2009 07:29:32 +0000 (09:29 +0200)]
perf_counter: Fix software counters for fast moving event sources
Reimplement the software counters to deal with fast moving
event sources (such as tracepoints). This means being able
to generate multiple overflows from a single 'event' as well
as support throttling.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Mike Galbraith [Wed, 22 Jul 2009 18:36:03 +0000 (20:36 +0200)]
perf_counter tools: Fix/resurrect perf top annotation in a simple interactive form
perf top used to have annotation support, but it has bitrotted and
removed.
This patch restores that: it allows the user to select any symbol
in kernel space for source level annotation on the fly, switch
between event counters and alter display variables. When symbol
details are being displayed, stopping annotation reverts to normal.
known keys:
[d] select display delay.
[e] select display entries (lines).
[E] select annotation event counter.
[f] select normal display count filter.
[F] select annotation display count filter (percentage).
[qQ] quit.
[s] select annotation symbol and start annotation.
[S] stop annotation, revert to normal display.
[z] toggle event count zeroing.
perf_counter: Fix/complete ftrace event records sampling
This patch implements the kernel side support for ftrace event
record sampling.
A new counter sampling attribute is added:
PERF_SAMPLE_TP_RECORD
which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.
Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:
perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D
- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.
- The unconditional copy from the profiling callback brings
some costs however if someone wants no such sampling to
occur, and needs to be fixed in the future. For that we need
to have an instant access to the perf counter attribute.
This is a matter of a flag to add in the struct ftrace_event.
- Take care of the events recursivity! Don't ever try to record
a lock event for example, it seems some locking is used in
the profiling fast path and lead to a tracing recursivity.
That will be fixed using raw spinlock or recursivity
protection.
- [...]
- Profit! :-)
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Which, when specified make the swcounter increment with @foo instead
of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
associated with the event) when this triggers a counter overflow.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
KVM: Avoid redelivery of edge interrupt before next edge
The check for an edge is broken in current ioapic code. ioapic->irr is
cleared on each edge interrupt by ioapic_service() and this makes
old_irr != ioapic->irr condition in kvm_ioapic_set_irq() to be always
true. The patch fixes the code to properly recognise edge.
Some HW emulation calls set_irq() without level change. If each such
call is propagated to an OS it may confuse a device driver. This is the
case with keyboard device emulation and Windows XP x64 installer on SMP VM.
Each keystroke produce two interrupts (down/up) one interrupt is
submitted to CPU0 and another to CPU1. This confuses Windows somehow
and it ignores keystrokes.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Paul Rolland [Sun, 9 Aug 2009 02:24:01 +0000 (12:24 +1000)]
drm: silence pointless vblank warning.
Some applications/hardware combinations are triggering the message "failed to
acquire vblank counter" to be issued up to 20 times a second, which makes it
both useless and dangerous, as this may hide other important messages.
This changes makes it only appear when people are debugging.
Signed-off-by: Paul Rolland <rol@as2917.net> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Lost-twice-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
Keith Packard [Mon, 20 Jul 2009 21:49:17 +0000 (14:49 -0700)]
drm: When adding probed modes, preserve duplicate mode types
The code which takes probed modes and adds them to a connector eliminates
duplicate modes by comparing them using drm_mode_equal. That function
doesn't consider the type bits, which means that any modes which differ only
in the type field will be lost.
One of the bits in the mode->type field is the DRM_MODE_TYPE_PREFERRED bit.
If the mode with that bit is lost, then higher level code will not know
which mode to select, causing a random mode to be used instead.
This patch simply merges the two mode type bits together; that seems
reasonable to me, but perhaps only a subset of the bits should be used? None
of these can be user defined as they all come from looking at just the
hardware.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
posix_cpu_timers_exit_group(): Do not use thread_group_cputimer()
When the process exits we don't have to run new cputimer nor
use running one (as it not accounts when tsk->exit_state != 0)
to get process CPU times. As there is only one thread we can
just use CPU times fields from task and signal structs.
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Roland McGrath <roland@redhat.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sat, 8 Aug 2009 15:49:53 +0000 (10:49 -0500)]
tracing/filters: Always free pred on filter_add_subsystem_pred() failure
If filter_add_subsystem_pred() fails due to ENOSPC or ENOMEM,
the pred doesn't get freed, while as a side effect it does for
other errors. Make it so the caller always frees the pred for
any error.
Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <1249746593.6453.32.camel@tropicana> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tom Zanussi [Sat, 8 Aug 2009 15:49:09 +0000 (10:49 -0500)]
tracing/filters: Don't use pred on alloc failure
Dan Carpenter sent me a fix to prevent pred from being used if
it couldn't be allocated. I noticed the same problem also
existed for the create_pred() case and added a fix for that.
Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <1249746549.6453.29.camel@tropicana> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Yinghai Lu [Tue, 4 Aug 2009 15:59:59 +0000 (08:59 -0700)]
x86: Fix MSI-X initialization by using online_mask for x2apic target_cpus
found a system where x2apic reports an MSI-X irq initialization
failure:
[ 302.859446] igbvf 0000:81:10.4: enabling device (0000 -> 0002)
[ 302.874369] igbvf 0000:81:10.4: using 64bit DMA mask
[ 302.879023] igbvf 0000:81:10.4: using 64bit consistent DMA mask
[ 302.894386] igbvf 0000:81:10.4: enabling bus mastering
[ 302.898171] igbvf 0000:81:10.4: setting latency timer to 64
[ 302.914050] reserve_memtype added 0xefb08000-0xefb0c000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 302.933839] reserve_memtype added 0xefb28000-0xefb29000, track uncached-minus, req uncached-minus, ret uncached-minus
[ 302.940367] alloc irq_desc for 265 on node 4
[ 302.956874] alloc kstat_irqs on node 4
[ 302.959452] alloc irq_2_iommu on node 0
[ 302.974328] igbvf 0000:81:10.4: irq 265 for MSI/MSI-X
[ 302.977778] alloc irq_desc for 266 on node 4
[ 302.980347] alloc kstat_irqs on node 4
[ 302.995312] free_memtype request 0xefb28000-0xefb29000
[ 302.998816] igbvf 0000:81:10.4: Failed to initialize MSI-X interrupts.
... it turns out that when trying to enable MSI-X,
__assign_irq_vector(new, cfg_new, apic->target_cpus()) can not
get vector because for x2apic target-cpus returns cpumask_of(0)
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
USB: fix oops on disconnect in cdc-acm
USB: storage: include Prolific Technology USB drive in unusual_devs list
USB: ftdi_sio: add product_id for Marvell OpenRD Base, Client
USB: ftdi_sio: add vendor and product id for Bayer glucose meter serial converter cable
USB: EHCI: fix counting of transaction error retries
USB: EHCI: fix two new bugs related to Clear-TT-Buffer
USB: usbfs: fix -ENOENT error code to be -ENODEV
USB: musb: fix the nop registration for OMAP3EVM
USB: devio: Properly do access_ok() checks
USB: pl2303: New vendor and product id
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
Staging: rspiusb: Fix buffer overflow
staging: add dependencies on PCI for drivers that require it
Staging: rtl8192su: fix build error
Staging: rt2870: Revert d44ca7 Removal of kernel_thread() API
Staging: rt2870: Add USB ID for Linksys, Planex Communications, Belkin
Linus Torvalds [Sat, 8 Aug 2009 02:03:59 +0000 (19:03 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (22 commits)
drm/i915: Fix read outside array bounds in restoring the SWF10 range.
drm/i915: Use our own workqueue to avoid wedging the system along with the GPU.
drm/i915: Add support for dual-channel LVDS on 8xx.
drm/i915: Return disconnected for SDVO DVI when there's no digital EDID.
drm/i915: Choose real sdvo output according to result from detection
drm/i915: Set preferred mode for integrated TV according to TV format
drm/i915: fix 845G FIFO size & burst length
drm/i915: fix VGA detect on IGDNG
drm/i915: Add eDP support on IGDNG mobile chip
drm/i915: enable DisplayPort support on IGDNG
drm/i915: Fix channel ending action for DP aux transaction
drm/i915: fix issue in display pipe setup on IGDNG
drm/i915: disable VGA plane reliably
drm/I915: Fix offset to DVO timings in LVDS data
drm/i915: hdmi detection according by reading edid
drm/i915: correct self-refresh calculation in "everything off" case
drm/i915: handle FIFO oversubsription correctly
drm/i915: FIFO watermark calculation fixes
drm/i915: ignore lvds on AOpen Mini PC MP-915
drm/i915: Allow frame buffers up to 4096x4096 on 915/945 class hardware
...
This fixes a build error when selecting the rtl8192su driver as a
module. This has been reported by me, and the opensuse kernel developer
team, and I finally tracked it down.
Oliver Neukum [Tue, 4 Aug 2009 21:52:09 +0000 (23:52 +0200)]
USB: fix oops on disconnect in cdc-acm
This patch fixes an oops caused when during an unplug a device's table
of endpoints is zeroed before the driver is notified. A pointer to
the endpoint must be cached.
Marko Hänninen [Fri, 31 Jul 2009 19:32:39 +0000 (22:32 +0300)]
USB: ftdi_sio: add vendor and product id for Bayer glucose meter serial converter cable
Attached patch adds USB vendor and product IDs for Bayer's USB to serial
converter cable used by Bayer blood glucose meters. It seems to be a
FT232RL based device and works without any problem with ftdi_sio driver
when this patch is applied. See: http://winglucofacts.com/cables/
Alan Stern [Fri, 31 Jul 2009 14:41:40 +0000 (10:41 -0400)]
USB: EHCI: fix counting of transaction error retries
This patch (as1274) simplifies the counting of transaction-error
retries. Now we will count up from 0 to QH_XACTERR_MAX instead of
down from QH_XACTERR_MAX to 0.
The patch also fixes a small bug: qh->xacterr was not getting
initialized for interrupt endpoints.
Alan Stern [Fri, 31 Jul 2009 14:40:22 +0000 (10:40 -0400)]
USB: EHCI: fix two new bugs related to Clear-TT-Buffer
This patch (as1273) fixes two(!) bugs introduced by the new
Clear-TT-Buffer implementation in ehci-hcd.
It is now possible for an idle QH to have some URBs on its
queue -- this will happen if a Clear-TT-Buffer is pending for
the QH's endpoint. Consequently we should not issue a warning
when someone tries to unlink an URB from an idle QH; instead
we should process the request immediately.
The refcounts for QHs could get messed up, because
submit_async() would increment the refcount when calling
qh_link_async() and qh_link_async() would then refuse to link
the QH into the schedule if a Clear-TT-Buffer was pending.
Instead we should increment the refcount only when the QH
actually is added to the schedule. The current code tries to
be clever by leaving the refcount alone if an unlink is
immediately followed by a relink; the patch changes this to an
unconditional decrement and increment (although they occur in
the opposite order).
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: David Brownell <david-b@pacbell.net> Tested-by: Manuel Lauss <manuel.lauss@gmail.com> Tested-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Thu, 30 Jul 2009 19:28:14 +0000 (15:28 -0400)]
USB: usbfs: fix -ENOENT error code to be -ENODEV
This patch (as1272) changes the error code returned when an open call
for a USB device node fails to locate the corresponding device. The
appropriate error code is -ENODEV, not -ENOENT.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Michael Buesch [Wed, 29 Jul 2009 09:39:03 +0000 (11:39 +0200)]
USB: devio: Properly do access_ok() checks
access_ok() checks must be done on every part of the userspace structure
that is accessed. If access_ok() on one part of the struct succeeded, it
does not imply it will succeed on other parts of the struct. (Does
depend on the architecture implementation of access_ok()).
This changes the __get_user() users to first check access_ok() on the
data structure.
Signed-off-by: Michael Buesch <mb@bu3sch.de> Cc: stable <stable@kernel.org> Cc: Pete Zaitcev <zaitcev@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
I am submitting a patch for the pl2303 driver. This patch adds support
for the "Sony QN-3USB" cable (vendor=0x054c, product=0x0437). This USB
cable is a so-called data cable used to connect a Sony mobile phone to a
computer. Supported models are Sony CMD-J5, J6, J7, J16, J26, J70 and
Z7.
I have used this patch with my Sony CMD-J70 for several days and I
haven't encountered any kernel/hardware issue.
Yan Zheng [Fri, 7 Aug 2009 17:51:33 +0000 (13:51 -0400)]
Btrfs: fix balancing oops when invalidate_inode_pages2 returns EBUSY
invalidate_inode_pages2_range may return -EBUSY occasionally
which results Oops. This patch fixes the issue by moving
invalidate_inode_pages2_range into a loop and keeping calling
it until the return value is not -EBUSY.
The EBUSY return is temporary, and can happen when the btrfs release page
function is unable to release a page because the EXTENT_LOCK
bit is set.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Linus Torvalds [Fri, 7 Aug 2009 17:43:07 +0000 (10:43 -0700)]
Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf_counter: Fix double list iteration in per task precise stats
perf: Auto-detect libelf
perf symbol: Fix symbol parsing in certain cases: use the build-id as a symlink
perf_counter/powerpc: Check oprofile_cpu_type for NULL before using it
ftrace: Fix perf-tracepoint OOPS
perf report: Add missing command line options to man page
perf: Auto-detect libbfd
perf report: Make --sort comm,dso,symbol the default
Linus Torvalds [Fri, 7 Aug 2009 17:42:31 +0000 (10:42 -0700)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
jffs2: Fix return value from jffs2_do_readpage_nolock()
mtd: mtdblock: introduce mtdblks_lock
mtd: remove 'SBC8240 Wind River' Device Driver Code
mtd: OneNAND: OMAP2/3: free GPMC CS on module removal
mtd: OneNAND: fix incorrect bufferram offset
mtd: blkdevs: do not forget to get MTD devices
mtd: fix the conversion from dev to mtd_info
mtd: let include/linux/mtd/partitions.h stand on its own
Linus Torvalds [Fri, 7 Aug 2009 17:41:36 +0000 (10:41 -0700)]
Merge branch 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: setup MC/VRAM the same way for suspend/resume
drm/radeon/kms: Fix caching mode selection for GTT object