Mel Gorman [Wed, 30 Jan 2008 12:32:54 +0000 (13:32 +0100)]
x86: make NUMA work on 32-bit again
On 32-bit NUMA, the memmap representing struct pages on each node is
allocated from node-local memory if possible. As only node-0 has memory from
ZONE_NORMAL, the memmap must be mapped into low memory. This is done by
reserving space in the Kernel Virtual Area (KVA) for the memmap belonging
to other nodes by taking pages from the end of ZONE_NORMAL and remapping
the other nodes memmap into those virtual addresses. The node boundaries
are then adjusted so that the region of pages is not used and it is marked
as reserved in the bootmem allocator.
This reserved portion of the KVA is PMD aligned althought
strictly speaking that requirement could be lifted (see thread at
http://lkml.org/lkml/2007/8/24/220). The problem is that when aligned, there
may be a portion of ZONE_NORMAL at the end that is not used for memmap and
does not have an initialised memmap nor is it marked reserved in the bootmem
allocator. Later in the boot process, these pages are freed and a storm of
Bad page state messages result.
This patch marks these pages reserved that are wasted due to alignment
in the bootmem allocator so they are not accidently freed. It is worth
noting that memory from node-0 is wasted where it could have been put into
ZONE_HIGHMEM on NUMA machines. Worse, the KVA is always reserved from the
location of real memory even when there is plenty of spare virtual address
space.
This patch also makes sure that reserve_bootmem() is not called with a
0-length size in numa_kva_reserve(). When this happens, it usually means
that a kernel built for Summit is being booted on a normal machine. The
resulting BUG_ON() is misleading so it is caught here.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Pavel Machek [Wed, 30 Jan 2008 12:32:54 +0000 (13:32 +0100)]
x86: unify arch/x86/kernel/acpi/sleep*.c
Unify arch/x86/kernel/acpi/sleep*.c
Pretty trivial unification; when two functions differed, it was
usually in error handling, and better of the two was picked up.
Signed-off-by: Pavel Machek <pavel@suse.cz> Looks-okay-to: Rafael J. Wysocki <rjw@sisk.pl> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86: kprobes: add kprobes smoke tests that run on boot
Here is a quick and naive smoke test for kprobes. This is intended to
just verify if some unrelated change broke the *probes subsystem. It is
self contained, architecture agnostic and isn't of any great use by itself.
This needs to be built in the kernel and runs a basic set of tests to
verify if kprobes, jprobes and kretprobes run fine on the kernel. In case
of an error, it'll print out a message with a "BUG" prefix.
This is a start; we intend to add more tests to this bucket over time.
Thanks to Jim Keniston and Masami Hiramatsu for comments and suggestions.
Tested on x86 (32/64) and powerpc.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
travis@sgi.com [Wed, 30 Jan 2008 12:32:53 +0000 (13:32 +0100)]
x86: unify percpu.h
Form a single percpu.h from percpu_32.h and percpu_64.h. Both are now pretty
small so this is simply adding them together.
Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
travis@sgi.com [Wed, 30 Jan 2008 12:32:52 +0000 (13:32 +0100)]
percpu: make the asm-generic/percpu.h more "generic"
- add support for PER_CPU_ATTRIBUTES
- fix generic smp percpu_modcopy to use per_cpu_offset() macro.
Add the ability to use generic/percpu even if the arch needs to override
several aspects of its operations. This will enable the use of generic
percpu.h for all arches.
An arch may define:
__per_cpu_offset Do not use the generic pointer array. Arch must
define per_cpu_offset(cpu) (used by x86_64, s390).
__my_cpu_offset Can be defined to provide an optimized way to determine
the offset for variables of the currently executing
processor. Used by ia64, x86_64, x86_32, sparc64, s/390.
SHIFT_PTR(ptr, offset) If an arch defines it then special handling
of pointer arithmentic may be implemented. Used
by s/390.
(Some of these special percpu arch implementations may be later consolidated
so that there are less cases to deal with.)
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
travis@sgi.com [Wed, 30 Jan 2008 12:32:51 +0000 (13:32 +0100)]
percpu: use a kconfig variable to signal arch specific percpu setup
The use of the __GENERIC_PERCPU is a bit problematic since arches
may want to run their own percpu setup while using the generic
percpu definitions. Replace it through a kconfig variable.
Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
H. Peter Anvin [Wed, 30 Jan 2008 12:32:51 +0000 (13:32 +0100)]
i386: handle an initrd in highmem (version 2)
The boot protocol has until now required that the initrd be located in
lowmem, which makes the lowmem/highmem boundary visible to the boot
loader. This was exported to the bootloader via a compile-time
field. Unfortunately, the vmalloc= command-line option breaks this
part of the protocol; instead of adding yet another hack that affects
the bootloader, have the kernel relocate the initrd down below the
lowmem boundary inside the kernel itself.
Note that this does not rely on HIGHMEM being enabled in the kernel.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Miguel Boton [Wed, 30 Jan 2008 12:32:51 +0000 (13:32 +0100)]
x86: reboot_{32|64}.c unification
reboot_{32|64}.c unification patch.
This patch unifies the code from the reboot_32.c and reboot_64.c files.
It has been tested in computers with X86_32 and X86_64 kernels and it
looks like all reboot modes work fine (EFI restart system hasn't been
tested yet).
Probably I made some mistakes (like I usually do) so I hope
we can identify and fix them soon.
Signed-off-by: Miguel Boton <mboton@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:32:50 +0000 (13:32 +0100)]
debug: add the end-of-trace marker and the module list to
Unlike oopses, WARN_ON() currently does't print the loaded modules list.
This makes it harder to take action on certain bug reports. For example,
recently there were a set of WARN_ON()s reported in the mac80211 stack,
which were just signalling a driver bug. It takes then anther round trip
to the bug reporter (if he responds at all) to find out which driver
is at fault.
Another issue is that, unlike oopses, WARN_ON() doesn't currently printk
the helpful "cut here" line, nor the "end of trace" marker.
Now that WARN_ON() is out of line, the size increase due to this is
minimal and it's worth adding.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Arjan van de Ven [Wed, 30 Jan 2008 12:32:50 +0000 (13:32 +0100)]
debug: move WARN_ON() out of line
A quick grep shows that there are currently 1145 instances of WARN_ON
in the kernel. Currently, WARN_ON is pretty much entirely inlined,
which makes it hard to enhance it without growing the size of the kernel
(and getting Andrew unhappy).
This patch build on top of Olof's patch that introduces __WARN,
and places the slowpath out of line. It also uses Ingo's suggestion
to not use __FUNCTION__ but to use kallsyms to do the lookup;
this saves a ton of extra space since gcc doesn't need to store the function
string twice now:
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Olof Johansson <olof@lixom.net> Acked-by: Matt Meckall <mpm@selenic.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c0113218 T acpi_restore_state_mem c0113219 T acpi_save_state_mem
<Big Hole> c0114000 t wakeup_code
This is because arch/x86/kernel/acpi/wakeup_32.S forces a .text alignment
of 4096 bytes. (I have no idea if it is really needed, since
arch/x86/kernel/acpi/wakeup_64.S uses a 16 bytes alignment *only*)
So arch/x86/kernel/built-in.o also has this alignment
arch/x86/kernel/built-in.o: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .text 00018c94000000000000000000001000 2**12
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
But as arch/x86/kernel/acpi/wakeup_32.o is not the first object linked
into arch/x86/kernel/built-in.o, linker had to build several holes to meet
alignement requirements, because of .o nestings in the kbuild process.
This can be solved by using a special section, .text.page_aligned, so that
no holes are needed.
# size vmlinux.before vmlinux.after
text data bss dec hex filename 4619942 422838 458752 5501532 53f25c vmlinux.before 4610534 422838 458752 5492124 53cd9c vmlinux.after
This saves 9408 bytes
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1. Defines arch-specific types for the contents of a pagetable entry.
That is, 32-bit entries for 32-bit non-PAE, and 64-bit entries for
32-bit PAE and 64-bit. However, even though the latter two are the
same size, they're defined with different types in order to retain
compatibility with printk format strings, etc.
2. Defines arch-specific pte_t. This is different because 32-bit PAE
defines it in two halves, whereas 32-bit PAE and 64-bit define it as a
single entry. All the other pagetable levels can be defined in a
common way. This also defines arch-specific pte_val/make_pte functions.
3. Define PAGETABLE_LEVELS for each architecture variation, for later use.
4. Define common pagetable entry accessors in a paravirt-compatible
way. (64-bit does not yet use paravirt-ops in any way).
5. Convert a few instances of using a *_val() as an lvalue where it is
no longer a macro. There are still places in the 64-bit code which
use pte_val() as an lvalue.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move, and to some extent unify, the various page copying and clearing
functions. The only unification here is that both architectures use
the same function for copying/clearing user and kernel pages.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
are supposed to fix the detection of contant TSC for AMD CPUs.
Unfortunately on x86_64 it does still not work with current x86/mm.
For a Phenom I still get:
...
TSC calibrated against PM_TIMER
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2288.366 MHz processor.
...
We have to set c->x86_power in early_identify_cpu to properly detect
the CONSTANT_TSC bit in early_init_amd.
Attached patch fixes this issue. Following the relevant boot
messages when the fix is used:
Andi Kleen [Wed, 30 Jan 2008 12:32:41 +0000 (13:32 +0100)]
x86: don't disable TSC in any C states on AMD Fam10h
The ACPI code currently disables TSC use in any C2 and C3
states. But the AMD Fam10h BKDG documents that the TSC
will never stop in any C states when the CONSTANT_TSC bit is
set. Make this disabling conditional on CONSTANT_TSC
not set on AMD.
I actually think this is true on Intel too for C2 states
on CPUs with p-state invariant TSC, but this needs
further discussions with Len to really confirm :-)
Andi Kleen [Wed, 30 Jan 2008 12:32:40 +0000 (13:32 +0100)]
x86: allow TSC clock source on AMD Fam10h and some cleanup
After a lot of discussions with AMD it turns out that TSC
on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set.
Or rather that if there are ever systems where that is not
true it would be their BIOS' task to disable the bit.
So finally use TSC gettimeofday on Fam10h by default.
Or rather it is always used now on CPUs where the AMD
specific CONSTANT_TSC bit is set.
This gives a nice speed bost for gettimeofday() on these systems
which tends to be by far the most common v/syscall.
On a Fam10h system here TSC gtod uses about 20% of the CPU time of
acpi_pm based gtod(). This was measured on 32bit, on 64bit
it is even better because TSC gtod() can use a vsyscall
and stay in ring 3, which acpi_pm doesn't.
The Intel check simply checks for CONSTANT_TSC too without hardcoding
Intel vendor. This is equivalent on 64bit because all 64bit capable Intel
CPUs will have CONSTANT_TSC set.
On Intel there is no CPU supplied CONSTANT_TSC bit currently,
but we synthesize one based on hardcoded knowledge which steppings
have p-state invariant TSC.
So the new logic is now: On CPUs which have the AMD specific
CONSTANT_TSC bit set or on Intel CPUs which are new enough
to be known to have p-state invariant TSC always use
TSC based gettimeofday()
Andi Kleen [Wed, 30 Jan 2008 12:32:38 +0000 (13:32 +0100)]
x86: introduce rdtsc_barrier()
rdtsc_barrier() is a new barrier primitive that stops RDTSC speculation
to avoid races with timer interrupts on other CPUs.
It expands either to LFENCE (for Intel CPUs) or MFENCE (for
AMD CPUs) which stops RDTSC on all currently known microarchitectures
that implement SSE. On CPUs without SSE there is generally no RDTSC
speculation.
[ mingo@elte.hu: renamed it to rdtsc_barrier() and made it x86-only ]
WANG Cong [Wed, 30 Jan 2008 12:32:38 +0000 (13:32 +0100)]
git-x86: unbreak UML
Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Quentin Barnes [Wed, 30 Jan 2008 12:32:32 +0000 (13:32 +0100)]
x86: code clarification patch to Kprobes arch code
When developing the Kprobes arch code for ARM, I ran across some code
found in x86 and s390 Kprobes arch code which I didn't consider as
good as it could be.
Once I figured out what the code was doing, I changed the code
for ARM Kprobes to work the way I felt was more appropriate.
I've tested the code this way in ARM for about a year and would
like to push the same change to the other affected architectures.
The code in question is in kprobe_exceptions_notify() which
does:
====
/* kprobe_running() needs smp_processor_id() */
preempt_disable();
if (kprobe_running() &&
kprobe_fault_handler(args->regs, args->trapnr))
ret = NOTIFY_STOP;
preempt_enable();
====
For the moment, ignore the code having the preempt_disable()/
preempt_enable() pair in it.
The problem is that kprobe_running() needs to call smp_processor_id()
which will assert if preemption is enabled. That sanity check by
smp_processor_id() makes perfect sense since calling it with preemption
enabled would return an unreliable result.
But the function kprobe_exceptions_notify() can be called from a
context where preemption could be enabled. If that happens, the
assertion in smp_processor_id() happens and we're dead. So what
the original author did (speculation on my part!) is put in the
preempt_disable()/preempt_enable() pair to simply defeat the check.
Once I figured out what was going on, I considered this an
inappropriate approach. If kprobe_exceptions_notify() is called
from a preemptible context, we can't be in a kprobe processing
context at that time anyways since kprobes requires preemption to
already be disabled, so just check for preemption enabled, and if
so, blow out before ever calling kprobe_running(). I wrote the ARM
kprobe code like this:
====
/* To be potentially processing a kprobe fault and to
* trust the result from kprobe_running(), we have
* be non-preemptible. */
if (!preemptible() && kprobe_running() &&
kprobe_fault_handler(args->regs, args->trapnr))
ret = NOTIFY_STOP;
====
The above code has been working fine for ARM Kprobes for a year.
So I changed the x86 code (2.6.24-rc6) to be the same way and ran
the Systemtap tests on that kernel. As on ARM, Systemtap on x86
comes up with the same test results either way, so it's a neutral
external functional change (as expected).
This issue has been discussed previously on linux-arm-kernel and the
Systemtap mailing lists. Pointers to the by base for the two
discussions:
http://lists.arm.linux.org.uk/lurker/message/20071219.223225.1f5c2a5e.en.html
http://sourceware.org/ml/systemtap/2007-q1/msg00251.html
Signed-off-by: Quentin Barnes <qbarnes@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com> Acked-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com>
Ingo Molnar [Wed, 30 Jan 2008 12:32:31 +0000 (13:32 +0100)]
x86: hlt on early crash
H. Peter Anvin <hpa@zytor.com> wrote:
> It probably should actually HLT, to avoid sucking power, and stressing
> the thermal system. We're dead at this point, and the early 486's
> which had problems with HLT will lock up - we don't care.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:27 +0000 (13:32 +0100)]
x86: unify arch/x86/kernel/Makefile(s)
Combine the 32 and 64 bit specific Makefiles in one file.
While doing so link order was (almost) preserved on 32 bit
but on 64 bit link order changed a lot.
Patch was checked with defconfig + allyesconfig builds.
The same .o files were linked in these configurations.
To keep readability of the Makefiles a few Kconfig
symbols was added/modified and it was checked that
they were not used anywhere else.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:27 +0000 (13:32 +0100)]
x86: teach vdso to clean
A few files remained after 'make clean' in arch/x86/vdso/.
Teach vdso to clean up those files in a bit brutal fashion.
The filenames are just hardcoded in the Makefile.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Roland McGrath <roland@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:21 +0000 (13:32 +0100)]
x86: share more options between 32 and 64 bit build
On recommendation from Andi Kleen share a few more options
between 32 and 64 bit builds.
A defconfig build for i386 did not show any difference in
size of text and data.
Sam Ravnborg [Wed, 30 Jan 2008 12:32:20 +0000 (13:32 +0100)]
x86: unification of arch/x86/Makefiles
Unify the 32 and 64 bit specific Makefiles.
The unification was simplest to do in one step although the
readability of the patch suffers a bit from this.
Noteworthy remarks on the unification:
- The 64 bit cpu stuff should be moved to Makefile_32.cpu
but I did not feel confident doing it due to subtle differences
- The use of cflags-y were abandoned since we have seen one bug where
we did wrong due to missing assignment to KBUILD_CFLAGS.
The cc-option marcro uses KBUILD_CFLAGS.
- The "No need to remake" line are deleted. It caused "make -B" to fail
- For 64 bit the sub architecture stuff is not used.
- The way head64.o is specified could be nicer - but it awaits the
introduction of head32.o (which seems like a win to introduce for readability)
- Patch is checkpatch clean
Patch is tested by doing a defconfig build for i386 and x86_64 and in both
cases monitoring that only relevant files were recompiled when applying
the patch.
[ mingo@elte.hu: build fix ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:32:19 +0000 (13:32 +0100)]
x86: kprobes change kprobe_handler flow
Make the control flow of kprobe_handler more obvious.
Collapse the separate if blocks/gotos with if/else blocks
this unifies the duplication of the check for a breakpoint
instruction race with another cpu.
Create two jump targets:
preempt_out: re-enables preemption before returning ret
out: only returns ret
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:32:19 +0000 (13:32 +0100)]
x86: cosmetic fixes fault_{32|64}.c
First step towards unifying these files.
- Checkpatch trailing whitespace fixes
- Checkpatch indentation of switch statement fixes
- Checkpatch single statement ifs need no braces fixes
- Checkpatch consistent spacing after comma fixes
- Introduce defines for pagefault error bits from X86_64 and add useful
comment from X86_32. Use these defines in X86_32 where obvious.
- Unify comments between 32|64 bit
- Small ifdef movement for CONFIG_KPROBES in notify_page_fault()
- Introduce X86_64 only case statement
No Functional Changes.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/math-emu/errors.c:163: warning: format '%ld' expects type 'long int', but argument 3 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 3 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 4 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 5 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 6 has type 'u32'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>