Peter Zijlstra [Thu, 14 Oct 2010 06:01:34 +0000 (14:01 +0800)]
irq_work: Add generic hardirq context callbacks
Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.
Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.
The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.
Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian [Fri, 15 Oct 2010 14:54:01 +0000 (16:54 +0200)]
perf_events: Fix transaction recovery in group_sched_in()
The group_sched_in() function uses a transactional approach to schedule
a group of events. In a group, either all events can be scheduled or
none are. To schedule each event in, the function calls event_sched_in().
In case of error, event_sched_out() is called on each event in the group.
The problem is that event_sched_out() does not completely cancel the
effects of event_sched_in(). Furthermore event_sched_out() changes the
state of the event as if it had run which is not true is this particular
case.
Those inconsistencies impact time tracking fields and may lead to events
in a group not all reporting the same time_enabled and time_running values.
This is demonstrated with the example below:
Notice that in the first group, the last baclears event does not
report the same timings as its siblings.
This issue comes from the fact that tstamp_stopped is updated
by event_sched_out() as if the event had actually run.
To solve the issue, we must ensure that, in case of error, there is
no change in the event state whatsoever. That means timings must
remain as they were when entering group_sched_in().
To do this we defer updating tstamp_running until we know the
transaction succeeded. Therefore, we have split event_sched_in()
in two parts separating the update to tstamp_running.
Similarly, in case of error, we do not want to update tstamp_stopped.
Therefore, we have split event_sched_out() in two parts separating
the update to tstamp_stopped.
Note that the uneven timing between groups is a side effect of
the process spending most of its time sleeping, i.e., not enough
event rotations (but that's a separate issue).
Stephane Eranian [Fri, 15 Oct 2010 13:15:01 +0000 (15:15 +0200)]
perf_events: Fix bogus AMD64 generic TLB events
PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x7 (to count all possibilities).
PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which
counts nothing. Needed to be 0x3 (to count all possibilities).
Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: <stable@kernel.org> # as far back as it applies
LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Stephane Eranian [Fri, 15 Oct 2010 13:26:01 +0000 (15:26 +0200)]
perf_events: Fix bogus context time tracking
You can only call update_context_time() when the context
is active, i.e., the thread it is attached to is still running.
However, perf_event_read() can be called even when the context
is inactive, e.g., user read() the counters. The call to
update_context_time() must be conditioned on the status of
the context, otherwise, bogus time_enabled, time_running may
be returned. Here is an example on AMD64. The task program
is an example from libpfm4. The -p prints deltas every 1s.
Steven Rostedt [Fri, 15 Oct 2010 16:09:25 +0000 (12:09 -0400)]
ftrace: Use objtree for C version of recordmcount
The C version of recordmcount is compiled to a binary, which will
end up located in the objtree. If the kernel is built with O=path,
the srctree will not include the binary recordmcount caller.
Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 15 Oct 2010 15:49:47 +0000 (11:49 -0400)]
ftrace: Do not process kernel/trace/ftrace.o with C recordmcount program
The file kernel/trace/ftrace.c references the mcount() call to
convert the mcount() callers to nops. But because it references
mcount(), the mcount() address is placed in the relocation table.
The C version of recordmcount reads the relocation table of all
object files, and it will add all references to mcount to the
__mcount_loc table that is used to find the places that call mcount()
and change the call to a nop. When recordmcount finds the mcount reference
in kernel/trace/ftrace.o, it saves that location even though the code
is not a call, but references mcount as data.
On boot up, when all calls are converted to nops, the code has a safety
check to determine what op code it is actually replacing before it
replaces it. If that op code at the address does not match, then
a warning is printed and the function tracer is disabled.
The reference to mcount in ftrace.c, causes this warning to trigger,
since the reference is not a call to mcount(). The ftrace.c file is
not compiled with the -pg flag, so no calls to mcount() should be
expected.
This patch simply makes recordmcount.c skip the kernel/trace/ftrace.c
file. This was the same solution used by the perl version of
recordmcount.
Reported-by: Ingo Molnar <mingo@elte.hu> Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Anand Gadiyar [Thu, 14 Oct 2010 15:31:42 +0000 (11:31 -0400)]
oprofile: fix linker errors
Commit e9677b3ce (oprofile, ARM: Use oprofile_arch_exit() to
cleanup on failure) caused oprofile_perf_exit to be called
in the cleanup path of oprofile_perf_init. The __exit tag
for oprofile_perf_exit should therefore be dropped.
The same has to be done for exit_driverfs as well, as this
function is called from oprofile_perf_exit. Else, we get
the following two linker errors.
LD .tmp_vmlinux1
`oprofile_perf_exit' referenced in section `.init.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1
LD .tmp_vmlinux1
`exit_driverfs' referenced in section `.text' of arch/arm/oprofile/built-in.o: defined in discarded section `.exit.text' of arch/arm/oprofile/built-in.o
make: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Anand Gadiyar [Thu, 14 Oct 2010 15:31:43 +0000 (11:31 -0400)]
oprofile: include platform_device.h to fix build break
oprofile_perf.c needs to include platform_device.h
Otherwise we get the following build break.
CC arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.o
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:192: warning: its scope is only this definition or declaration, which is probably not what you want
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:201: warning: 'struct platform_device' declared inside parameter list
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:210: error: variable 'oprofile_driver' has initializer but incomplete type
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: unknown field 'driver' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: extra brace group at end of initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:211: error: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:213: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: error: unknown field 'resume' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:214: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: error: unknown field 'suspend' specified in initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: excess elements in struct initializer
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c:215: warning: (near initialization for 'oprofile_driver')
arch/arm/oprofile/../../../drivers/oprofile/oprofile_perf.c: In function 'init_driverfs':
Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Cc: Matt Fleming <matt@console-pimps.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Steven Rostedt [Fri, 15 Oct 2010 03:32:44 +0000 (23:32 -0400)]
ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT
The config option used by archs to let the build system know that
the C version of the recordmcount works for said arch is currently
called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To
be more consistent with the name that all archs may use, it has been
renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since
we are building a C recordmcount and not a mcount_record.
Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 13 Oct 2010 23:06:14 +0000 (19:06 -0400)]
ftrace: Remove duplicate code for 64 and 32 bit in recordmcount.c
The elf reader for recordmcount.c had duplicate functions for both
32 bit and 64 bit elf handling. This was due to the need of using
the 32 and 64 bit elf structures.
This patch consolidates the two by using macros to define the 32
and 64 bit names in a recordmcount.h file, and then by just defining
a RECORD_MCOUNT_64 macro and including recordmcount.h twice we
create the funtions for both the 32 bit version as well as the
64 bit version using one code source.
Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 13 Oct 2010 21:12:30 +0000 (17:12 -0400)]
ftrace/x86: Add support for C version of recordmcount
This patch adds the support for the C version of recordmcount and
compile times show ~ 12% improvement.
After verifying this works, other archs can add:
HAVE_C_MCOUNT_RECORD
in its Kconfig and it will use the C version of recordmcount
instead of the perl version.
Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
John Reiser [Wed, 13 Oct 2010 19:12:54 +0000 (15:12 -0400)]
ftrace: Add C version of recordmcount compile time code
Currently, the mcount callers are found with a perl script that does
an objdump on every file in the kernel. This is a C version of that
same code which should increase the performance time of compiling
the kernel with dynamic ftrace enabled.
Signed-off-by: John Reiser <jreiser@bitwagon.com>
[ Updated the code to include .text.unlikely section as well as
changing the format to follow Linux coding style. ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
x86: Barf when vmalloc and kmemcheck faults happen in NMI
In x86, faults exit by executing the iret instruction, which then
reenables NMIs if we faulted in NMI context. Then if a fault
happens in NMI, another NMI can nest after the fault exits.
But we don't yet support nested NMIs because we have only one NMI
stack. To prevent from that, check that vmalloc and kmemcheck
faults don't happen in this context. Most of the other kernel faults
in NMIs can be more easily spotted by finding explicit
copy_from,to_user() calls on review.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Masami Hiramatsu [Thu, 14 Oct 2010 03:10:36 +0000 (12:10 +0900)]
x86: Use __stop_machine() in text_poke_smp()
Use __stop_machine() in text_poke_smp() because the caller
must get online_cpus before calling text_poke_smp(), but
stop_machine() do it again. We don't need it.
Masami Hiramatsu [Thu, 14 Oct 2010 03:10:30 +0000 (12:10 +0900)]
stopmachine: Define __stop_machine when CONFIG_STOP_MACHINE=n
Define dummy __stop_machine() function even when
CONFIG_STOP_MACHINE=n. This getcpu-required version of
stop_machine() will be used from poke_text_smp().
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20101014031030.4100.34156.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Masami Hiramatsu [Thu, 14 Oct 2010 03:10:24 +0000 (12:10 +0900)]
kprobes: Fix selftest to clear flags field for reusing probes
Fix selftest to clear flags field for reusing probes
because the flags field can be modified by Kprobes.
This also set NULL to kprobe.addr instead of 0.
Masami Hiramatsu [Thu, 14 Oct 2010 03:10:18 +0000 (12:10 +0900)]
kprobes: Update document about irq disabled state in kprobe handler
Update kprobes.txt about interrupts disabled state inside
kprobes handlers, because optimized probe/boosted kretprobe
run without disabling interrrupts on x86.
Ingo Molnar [Thu, 14 Oct 2010 06:09:42 +0000 (08:09 +0200)]
perf, ARM: Fix sysfs bits removal build failure
Fix this linux-next build failure that Stephen reported:
arch/arm/kernel/perf_event.c: In function 'armpmu_event_init':
arch/arm/kernel/perf_event.c:543: error: request for member 'num_events' in something not a structure or union
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org>
LKML-Reference: <20101014164925.4fa16b75.sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
tracing: Fix function-graph build warning on 32-bit
Fix
kernel/trace/trace_functions_graph.c: In function ‘trace_print_graph_duration’:
kernel/trace/trace_functions_graph.c:652: warning: comparison of distinct pointer types lacks a cast
when building 36-rc6 on a 32-bit due to the strict type check failing
in the min() macro.
Robert Richter [Mon, 4 Oct 2010 19:09:36 +0000 (21:09 +0200)]
oprofile: disable write access to oprofilefs while profiler is running
Oprofile counters are setup when profiling is disabled. Thus, writing
to oprofilefs has no immediate effect. Changes are updated only after
oprofile is reenabled.
To keep userland and kernel states synchronized, we now allow
configuration of oprofile only if profiling is disabled. In this case
it checks if the profiler is running and then disables write access to
oprofilefs by returning -EBUSY. The change should be backward
compatible with current oprofile userland daemon.
Acked-by: Maynard Johnson <maynardj@us.ibm.com> Cc: William Cohen <wcohen@redhat.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Fri, 10 Sep 2010 19:36:23 +0000 (20:36 +0100)]
sh: oprofile: Use perf-events oprofile backend
Now that we've got a generic perf-events based oprofile backend we might
as well make use of it seeing as SH doesn't do anything special with its
oprofile backend. Also introduce a new CONFIG_HW_PERF_EVENTS symbol so
that we can fallback to using the timer interrupt for oprofile if the
CPU doesn't support perf events.
Also, to avoid a section mismatch warning we need to annotate
oprofile_arch_exit() with an __exit marker.
Signed-off-by: Matt Fleming <matt@console-pimps.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Mon, 27 Sep 2010 19:45:08 +0000 (20:45 +0100)]
oprofile: Abstract the perf-events backend
Move the perf-events backend from arch/arm/oprofile into
drivers/oprofile so that the code can be shared between architectures.
This allows each architecture to maintain only a single copy of the PMU
accessor functions instead of one for both perf and OProfile. It also
becomes possible for other architectures to delete much of their
OProfile code in favour of the common code now available in
drivers/oprofile/oprofile_perf.c.
Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Mon, 27 Sep 2010 19:35:29 +0000 (20:35 +0100)]
ARM: oprofile: Move non-ARM code into separate init/exit
In preparation for moving the majority of this oprofile code into an
architecture-neutral place separate the architecture-independent code
into oprofile_perf_init() and oprofile_perf_exit().
Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Mon, 27 Sep 2010 19:29:58 +0000 (20:29 +0100)]
ARM: oprofile: Rename op_arm to oprofile_perf
In preparation for moving the generic functions out of this file, give
the functions more general names (e.g. remove "arm" from the names).
Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Fri, 8 Oct 2010 20:42:17 +0000 (21:42 +0100)]
oprofile: Make op_name_from_perf_id() global
Make op_name_from_perf_id() global so that we have a way for each
architecture to construct an oprofile name for op->cpu_type. We need to
remove the argument from the function prototype so that we can hide all
implementation details inside the function.
Signed-off-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Sun, 3 Oct 2010 20:41:13 +0000 (21:41 +0100)]
perf: New helper function for pmu name
Introduce perf_pmu_name() helper function that returns the name of the
pmu. This gives us a generic way to get the name of a pmu regardless of
how an architecture identifies it internally.
Signed-off-by: Matt Fleming <matt@console-pimps.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
Matt Fleming [Mon, 27 Sep 2010 19:22:24 +0000 (20:22 +0100)]
perf: Add helper function to return number of counters
The number of counters for the registered pmu is needed in a few places
so provide a helper function that returns this number.
Signed-off-by: Matt Fleming <matt@console-pimps.org> Tested-by: Will Deacon <will.deacon@arm.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Robert Richter <robert.richter@amd.com>
Linus Torvalds [Wed, 6 Oct 2010 20:27:19 +0000 (13:27 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
MIPS: Octeon: Place cnmips_cu2_setup in __init memory.
MIPS: Don't place cu2 notifiers in __cpuinitdata
MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
MIPS: Alchemy: Resolve prom section mismatches
MIPS: Fix syscall 64 bit number comments.
MIPS: Hookup fanotify_init, fanotify_mark, and prlimit64 syscalls.
MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
MIPS: N32: Fix getdents64 syscall for n32
MIPS: Remove pr_<level> uses of KERN_<level>
MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
MIPS: GIC: Remove dependencies from Malta files.
MIPS: Kconfig: Fix and clarify kconfig help text for VSMP and SMTC.
MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
MIPS: Audit: Fix hang in entry.S.
MIPS: Document why RELOC_HIDE is there.
MIPS: Octeon: Determine if helper needs to be built
MIPS: Use generic atomic64 for 32-bit kernels
MIPS: RM7000: Symbol should be static
MIPS: kspd: Adjust confusing if indentation
MIPS: Fix a typo.
Linus Torvalds [Wed, 6 Oct 2010 16:51:28 +0000 (09:51 -0700)]
Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm
* 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
xen: do not initialize PV timers on HVM if !xen_have_vector_callback
xen: do not set xenstored_ready before xenbus_probe on hvm
Since powerpc uses -Werror on arch powerpc, the build was broken like
this:
cc1: warnings being treated as errors
arch/powerpc/kernel/module.c: In function 'module_finalize':
arch/powerpc/kernel/module.c:66: error: unused variable 'err'
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Oct 2010 18:57:37 +0000 (11:57 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf trace scripting: Fix extern struct definitions
perf ui hist browser: Fix segfault on 'a' for annotate
perf tools: Fix build breakage
perf, x86: Handle in flight NMIs on P4 platform
oprofile, ARM: Release resources on failure
oprofile: Add Support for Intel CPU Family 6 / Model 29
The "flags" member of "struct wait_queue_t" is used in several places in
the kernel code without beeing initialized by init_wait(). "flags" is
used in bitwise operations.
If "flags" not initialized then unexpected behaviour may take place.
Incorrect flags might used later in code.
Added initialization of "wait_queue_t.flags" with zero value into
"init_wait".
Signed-off-by: Evgeny Kuznetsov <EXT-Eugeny.Kuznetsov@nokia.com>
[ The bit we care about does end up being initialized by both
prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to
cause actual bugs, but is definitely the right thing to do -Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 5 Oct 2010 18:29:27 +0000 (11:29 -0700)]
modules: Fix module_bug_list list corruption race
With all the recent module loading cleanups, we've minimized the code
that sits under module_mutex, fixing various deadlocks and making it
possible to do most of the module loading in parallel.
However, that whole conversion totally missed the rather obscure code
that adds a new module to the list for BUG() handling. That code was
doubly obscure because (a) the code itself lives in lib/bugs.c (for
dubious reasons) and (b) it gets called from the architecture-specific
"module_finalize()" rather than from generic code.
Calling it from arch-specific code makes no sense what-so-ever to begin
with, and is now actively wrong since that code isn't protected by the
module loading lock any more.
So this commit moves the "module_bug_{finalize,cleanup}()" calls away
from the arch-specific code, and into the generic code - and in the
process protects it with the module_mutex so that the list operations
are now safe.
Future fixups:
- move the module list handling code into kernel/module.c where it
belongs.
- get rid of 'module_bug_list' and just use the regular list of modules
(called 'modules' - imagine that) that we already create and maintain
for other reasons.
Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xen: do not initialize PV timers on HVM if !xen_have_vector_callback
if !xen_have_vector_callback do not initialize PV timer unconditionally
because we still don't know how many cpus are available and if there is
more than one we won't be able to receive the timer interrupts on
cpu > 0.
This patch fixes an hang at boot when Xen does not support vector
callbacks and the guest has multiple vcpus.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
xen: do not set xenstored_ready before xenbus_probe on hvm
Register_xenstore_notifier should guarantee that the caller gets
notified even if xenstore is already up.
Therefore we revert "do not notify callers from
register_xenstore_notifier" and set xenstored_read at the right time for
PV on HVM guests too.
In fact in case of PV on HVM guests xenstored is ready only after the
platform pci driver has completed the initialization, so do not set
xenstored_ready before the call to xenbus_probe().
This patch fixes a shutdown_event watcher registration bug that causes
"xm shutdown" not to work properly.
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
Linus Torvalds [Mon, 4 Oct 2010 18:15:59 +0000 (11:15 -0700)]
Merge branch 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'fix/misc' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: i2c/other/ak4xx-adda: Fix a compile warning with CONFIG_PROCFS=n
ALSA: prevent heap corruption in snd_ctl_new()
Linus Torvalds [Mon, 4 Oct 2010 18:13:22 +0000 (11:13 -0700)]
Merge branch 'merge-spi' of git://git.secretlab.ca/git/linux-2.6
* 'merge-spi' of git://git.secretlab.ca/git/linux-2.6:
of/spi: Fix OF-style driver binding of spi devices
spi: spi-gpio.c tests SPI_MASTER_NO_RX bit twice, but not SPI_MASTER_NO_TX
spi/mpc8xxx: fix buffer overrun on large transfers
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
vlan: dont drop packets from unknown vlans in promiscuous mode
Phonet: Correct header retrieval after pskb_may_pull
um: Proper Fix for f25c80a4: remove duplicate structure field initialization
ip_gre: Fix dependencies wrt. ipv6.
net-2.6: SYN retransmits: Add new parameter to retransmits_timed_out()
iwl3945: queue the right work if the scan needs to be aborted
mac80211: fix use-after-free
Linus Torvalds [Mon, 4 Oct 2010 18:10:26 +0000 (11:10 -0700)]
Merge branch 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel
* 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel:
drm/i915: Rephrase pwrite bounds checking to avoid any potential overflow
drm/i915: Sanity check pread/pwrite
drm/i915: Use pipe state to tell when pipe is off
drm/i915: vblank status not valid while training display port
drivers/gpu/drm/i915/i915_gem.c: Add missing error handling code
drm/i915: Fix refleak during eviction.
drm/i915: fix GMCH power reporting
Hugh Dickins [Sun, 3 Oct 2010 00:49:08 +0000 (17:49 -0700)]
ksm: fix bad user data when swapping
Building under memory pressure, with KSM on 2.6.36-rc5, collapsed with
an internal compiler error: typically indicating an error in swapping.
Perhaps there's a timing issue which makes it now more likely, perhaps
it's just a long time since I tried for so long: this bug goes back to
KSM swapping in 2.6.33.
Notice how reuse_swap_page() allows an exclusive page to be reused, but
only does SetPageDirty if it can delete it from swap cache right then -
if it's currently under Writeback, it has to be left in cache and we
don't SetPageDirty, but the page can be reused. Fine, the dirty bit
will get set in the pte; but notice how zap_pte_range() does not bother
to transfer pte_dirty to page_dirty when unmapping a PageAnon.
If KSM chooses to share such a page, it will look like a clean copy of
swapcache, and not be written out to swap when its memory is needed;
then stale data read back from swap when it's needed again.
We could fix this in reuse_swap_page() (or even refuse to reuse a
page under writeback), but it's more honest to fix my oversight in
KSM's write_protect_page(). Several days of testing on three machines
confirms that this fixes the issue they showed.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sun, 3 Oct 2010 00:46:06 +0000 (17:46 -0700)]
ksm: fix page_address_in_vma anon_vma oops
2.6.36-rc1 commit 21d0d443cdc1658a8c1484fdcece4803f0f96d0e "rmap:
resurrect page_address_in_vma anon_vma check" was right to resurrect
that check; but now that it's comparing anon_vma->roots instead of
just anon_vmas, there's a danger of oopsing on a NULL anon_vma.
In most cases no NULL anon_vma ever gets here; but it turns out that
occasionally KSM, when enabled on a forked or forking process, will
itself call page_address_in_vma() on a "half-KSM" page left over from
an earlier failed attempt to merge - whose page_anon_vma() is NULL.
It's my bug that those should be getting here at all: I thought they
were already dealt with, this oops proves me wrong, I'll fix it in
the next release - such pages are effectively pinned until their
process exits, since rmap cannot find their ptes (though swapoff can).
For now just work around it by making page_address_in_vma() safe (and
add a comment on why that check is wanted anyway). A similar check
in __page_check_anon_rmap() is safe because do_page_add_anon_rmap()
already excluded KSM pages.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shmulik Ladkani [Tue, 31 Aug 2010 10:24:19 +0000 (13:24 +0300)]
MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
Fix VMLINUZ_LOAD_ADDRESS calculation to be based on the length of
vmlinux.bin, the actual uncompressed kernel binary.
Previously it was based on the length of KBUILD_IMAGE (the unstripped ELF
vmlinux), which is bigger than vmlinux.bin. As a result, vmlinuz was
loaded into a memory address higher then actually needed - a problem for
small memory platforms.
FUJITA Tomonori [Sat, 14 Aug 2010 07:02:37 +0000 (16:02 +0900)]
MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
Architectures need to set ARCH_DMA_MINALIGN to the minimum DMA
alignment (commit a6eb9fe105d5de0053b261148cee56c94b4720ca). Defining
ARCH_KMALLOC_MINALIGN doesn't work anymore.
Bernhard Walle [Fri, 3 Sep 2010 08:15:34 +0000 (10:15 +0200)]
MIPS: N32: Fix getdents64 syscall for n32
Commit 31c984a5acabea5d8c7224dc226453022be46f33 introduced a new syscall
getdents64. However, in the syscall table, the new syscall still refers to
the old getdents which doesn't work.
The problem appeared with a system that uses the eglibc 2.12-r11187 (that
utilizes that new syscall) is very confused. The fix has been tested with
that eglibc version.
Signed-off-by: Bernhard Walle <walle@corscience.de>
To: linux-mips@linux-mips.org Cc: ddaney@caviumnetworks.com Cc: akpm@linux-foundation.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1567/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
No rubbish printks - those belong to userspace. The halt function now
actually halts the system and the poweroff function was deleted because
it didn't actually power down the system.
MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
This only matters for ISA devices with a 24-bit DMA limit or for devices
with a 32-bit DMA limit on systems with ZONE_DMA32 enabled. The latter
currently only affects 32-bit PCI cards on Sibyte-based systems with more
than 1GB RAM installed.
_TIF_WORK_MASK false had _TIF_SYSCALL_AUDIT set. If a thread's
_TIF_SYSCALL_AUDIT is ever set this will lead to an endless loop on the
way out from a syscall.
Currently this is only a theoretic bug as init/Kconfig doesn't allow
AUDIT_SYSCALL to be enabled for MIPS.
Andreas Bießmann [Wed, 11 Aug 2010 16:49:53 +0000 (18:49 +0200)]
MIPS: Octeon: Determine if helper needs to be built
This patch adds an config switch to determine if we need to build some
workaround helper files.
The staging driver octeon-ethernet references some symbols which are only
built when PCI is enabled. The new config switch enables these symbols in
bothe cases.
Signed-off-by: Andreas Bießmann <biessmann@corscience.de>
To: linux-kernel@vger.kernel.org Cc: Andreas Bießmann <biessmann@corscience.de> Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/1543/ Acked-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Deng-Cheng Zhu [Wed, 9 Jun 2010 04:35:25 +0000 (12:35 +0800)]
MIPS: Use generic atomic64 for 32-bit kernels
The 64-bit kernel has already had its atomic64 functions. Except for that,
we use the generic spinlocked version. The atomic64 types and related
functions are needed for the Linux performance counter subsystem.
Julia Lawall [Thu, 5 Aug 2010 20:17:22 +0000 (22:17 +0200)]
MIPS: kspd: Adjust confusing if indentation
Indent the branch of an if.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
To: linux-mips@linux-mips.org
To: linux-kernel@vger.kernel.org
To: kernel-janitors@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/1539/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Both python_scripting_ops and perl_scripting_ops have two global definitions.
One in trace-event-scripting.c and one in their respective scripting-engine
modules.
The issue is that depending on the linker order one definition or the other
is chosen. One is uninitialized (bss), while the other is initialized. If
the uninitialized version is chosen, then perf does not function properly.
This patch fixes this by adding the extern prefix to the definitions in
trace-event-scripting.c.
Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4c97e41a.078fd80a.7a8b.3cc9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
writeback: always use sb->s_bdi for writeback purposes
We currently use struct backing_dev_info for various different purposes.
Originally it was introduced to describe a backing device which includes
an unplug and congestion function and various bits of readahead information
and VM-relevant flags. We're also using for tracking dirty inodes for
writeback.
To make writeback properly find all inodes we need to only access the
per-filesystem backing_device pointed to by the superblock in ->s_bdi
inside the writeback code, and not the instances pointeded to by
inode->i_mapping->backing_dev which can be overriden by special devices
or might not be set at all by some filesystems.
Long term we should split out the writeback-relevant bits of struct
backing_device_info (which includes more than the current bdi_writeback)
and only point to it from the superblock while leaving the traditional
backing device as a separate structure that can be overriden by devices.
The one exception for now is the block device filesystem which really
wants different writeback contexts for it's different (internal) inodes
to handle the writeout more efficiently. For now we do this with
a hack in fs-writeback.c because we're so late in the cycle, but in
the future I plan to replace this with a superblock method that allows
for multiple writeback contexts per filesystem.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
perf_events: Fix invalid pointer when pid is invalid
This patch fixes an error in perf_event_open() when the pid
provided by the user is invalid. find_lively_task_by_vpid()
does not return NULL on error but an error code. Without the
fix the error code was silently passed to find_get_context()
which would eventually cause a invalid pointer dereference.
Robert Richter [Thu, 19 Aug 2010 08:50:26 +0000 (10:50 +0200)]
oprofile: Remove duplicate code around __oprofilefs_create_file()
Removing duplicate code by assigning the inodes private data pointer
in __oprofilefs_create_file(). Extending the function interface to
pass the pointer.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Chris Wilson [Sun, 26 Sep 2010 19:50:05 +0000 (20:50 +0100)]
drm/i915: Sanity check pread/pwrite
Move the access control up from the fast paths, which are no longer
universally taken first, up into the caller. This then duplicates some
sanity checking along the slow paths, but is much simpler.
Tracked as CVE-2010-2962.
Reported-by: Kees Cook <kees@ubuntu.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org
hwmon: f71882fg: use a muxed resource lock for the Super I/O port
Sleep while acquiring a resource lock on the Super I/O port. This should
prevent collisions from causing the hardware probe to fail with -EBUSY.
Signed-off-by: Giel van Schijndel <me@mortis.eu> Acked-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>
Keith Packard [Sun, 3 Oct 2010 07:33:06 +0000 (00:33 -0700)]
drm/i915: Use pipe state to tell when pipe is off
Instead of waiting for the display line value to settle, we can simply
wait for the pipe configuration register 'state' bit to turn off.
Contrarywise, disabling the plane will not cause the display line
value to stop changing, so instead we wait for the vblank interrupt
bit to get set. And, we only do this when we're not about to wait for
the pipe to turn off.
Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>