1. Defines arch-specific types for the contents of a pagetable entry.
That is, 32-bit entries for 32-bit non-PAE, and 64-bit entries for
32-bit PAE and 64-bit. However, even though the latter two are the
same size, they're defined with different types in order to retain
compatibility with printk format strings, etc.
2. Defines arch-specific pte_t. This is different because 32-bit PAE
defines it in two halves, whereas 32-bit PAE and 64-bit define it as a
single entry. All the other pagetable levels can be defined in a
common way. This also defines arch-specific pte_val/make_pte functions.
3. Define PAGETABLE_LEVELS for each architecture variation, for later use.
4. Define common pagetable entry accessors in a paravirt-compatible
way. (64-bit does not yet use paravirt-ops in any way).
5. Convert a few instances of using a *_val() as an lvalue where it is
no longer a macro. There are still places in the 64-bit code which
use pte_val() as an lvalue.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Move, and to some extent unify, the various page copying and clearing
functions. The only unification here is that both architectures use
the same function for copying/clearing user and kernel pages.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
are supposed to fix the detection of contant TSC for AMD CPUs.
Unfortunately on x86_64 it does still not work with current x86/mm.
For a Phenom I still get:
...
TSC calibrated against PM_TIMER
Marking TSC unstable due to TSCs unsynchronized
time.c: Detected 2288.366 MHz processor.
...
We have to set c->x86_power in early_identify_cpu to properly detect
the CONSTANT_TSC bit in early_init_amd.
Attached patch fixes this issue. Following the relevant boot
messages when the fix is used:
Andi Kleen [Wed, 30 Jan 2008 12:32:41 +0000 (13:32 +0100)]
x86: don't disable TSC in any C states on AMD Fam10h
The ACPI code currently disables TSC use in any C2 and C3
states. But the AMD Fam10h BKDG documents that the TSC
will never stop in any C states when the CONSTANT_TSC bit is
set. Make this disabling conditional on CONSTANT_TSC
not set on AMD.
I actually think this is true on Intel too for C2 states
on CPUs with p-state invariant TSC, but this needs
further discussions with Len to really confirm :-)
Andi Kleen [Wed, 30 Jan 2008 12:32:40 +0000 (13:32 +0100)]
x86: allow TSC clock source on AMD Fam10h and some cleanup
After a lot of discussions with AMD it turns out that TSC
on Fam10h CPUs is synchronized when the CONSTANT_TSC cpuid bit is set.
Or rather that if there are ever systems where that is not
true it would be their BIOS' task to disable the bit.
So finally use TSC gettimeofday on Fam10h by default.
Or rather it is always used now on CPUs where the AMD
specific CONSTANT_TSC bit is set.
This gives a nice speed bost for gettimeofday() on these systems
which tends to be by far the most common v/syscall.
On a Fam10h system here TSC gtod uses about 20% of the CPU time of
acpi_pm based gtod(). This was measured on 32bit, on 64bit
it is even better because TSC gtod() can use a vsyscall
and stay in ring 3, which acpi_pm doesn't.
The Intel check simply checks for CONSTANT_TSC too without hardcoding
Intel vendor. This is equivalent on 64bit because all 64bit capable Intel
CPUs will have CONSTANT_TSC set.
On Intel there is no CPU supplied CONSTANT_TSC bit currently,
but we synthesize one based on hardcoded knowledge which steppings
have p-state invariant TSC.
So the new logic is now: On CPUs which have the AMD specific
CONSTANT_TSC bit set or on Intel CPUs which are new enough
to be known to have p-state invariant TSC always use
TSC based gettimeofday()
Andi Kleen [Wed, 30 Jan 2008 12:32:38 +0000 (13:32 +0100)]
x86: introduce rdtsc_barrier()
rdtsc_barrier() is a new barrier primitive that stops RDTSC speculation
to avoid races with timer interrupts on other CPUs.
It expands either to LFENCE (for Intel CPUs) or MFENCE (for
AMD CPUs) which stops RDTSC on all currently known microarchitectures
that implement SSE. On CPUs without SSE there is generally no RDTSC
speculation.
[ mingo@elte.hu: renamed it to rdtsc_barrier() and made it x86-only ]
WANG Cong [Wed, 30 Jan 2008 12:32:38 +0000 (13:32 +0100)]
git-x86: unbreak UML
Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Quentin Barnes [Wed, 30 Jan 2008 12:32:32 +0000 (13:32 +0100)]
x86: code clarification patch to Kprobes arch code
When developing the Kprobes arch code for ARM, I ran across some code
found in x86 and s390 Kprobes arch code which I didn't consider as
good as it could be.
Once I figured out what the code was doing, I changed the code
for ARM Kprobes to work the way I felt was more appropriate.
I've tested the code this way in ARM for about a year and would
like to push the same change to the other affected architectures.
The code in question is in kprobe_exceptions_notify() which
does:
====
/* kprobe_running() needs smp_processor_id() */
preempt_disable();
if (kprobe_running() &&
kprobe_fault_handler(args->regs, args->trapnr))
ret = NOTIFY_STOP;
preempt_enable();
====
For the moment, ignore the code having the preempt_disable()/
preempt_enable() pair in it.
The problem is that kprobe_running() needs to call smp_processor_id()
which will assert if preemption is enabled. That sanity check by
smp_processor_id() makes perfect sense since calling it with preemption
enabled would return an unreliable result.
But the function kprobe_exceptions_notify() can be called from a
context where preemption could be enabled. If that happens, the
assertion in smp_processor_id() happens and we're dead. So what
the original author did (speculation on my part!) is put in the
preempt_disable()/preempt_enable() pair to simply defeat the check.
Once I figured out what was going on, I considered this an
inappropriate approach. If kprobe_exceptions_notify() is called
from a preemptible context, we can't be in a kprobe processing
context at that time anyways since kprobes requires preemption to
already be disabled, so just check for preemption enabled, and if
so, blow out before ever calling kprobe_running(). I wrote the ARM
kprobe code like this:
====
/* To be potentially processing a kprobe fault and to
* trust the result from kprobe_running(), we have
* be non-preemptible. */
if (!preemptible() && kprobe_running() &&
kprobe_fault_handler(args->regs, args->trapnr))
ret = NOTIFY_STOP;
====
The above code has been working fine for ARM Kprobes for a year.
So I changed the x86 code (2.6.24-rc6) to be the same way and ran
the Systemtap tests on that kernel. As on ARM, Systemtap on x86
comes up with the same test results either way, so it's a neutral
external functional change (as expected).
This issue has been discussed previously on linux-arm-kernel and the
Systemtap mailing lists. Pointers to the by base for the two
discussions:
http://lists.arm.linux.org.uk/lurker/message/20071219.223225.1f5c2a5e.en.html
http://sourceware.org/ml/systemtap/2007-q1/msg00251.html
Signed-off-by: Quentin Barnes <qbarnes@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com> Acked-by: Ananth N Mavinakayahanalli <ananth@in.ibm.com>
Ingo Molnar [Wed, 30 Jan 2008 12:32:31 +0000 (13:32 +0100)]
x86: hlt on early crash
H. Peter Anvin <hpa@zytor.com> wrote:
> It probably should actually HLT, to avoid sucking power, and stressing
> the thermal system. We're dead at this point, and the early 486's
> which had problems with HLT will lock up - we don't care.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:27 +0000 (13:32 +0100)]
x86: unify arch/x86/kernel/Makefile(s)
Combine the 32 and 64 bit specific Makefiles in one file.
While doing so link order was (almost) preserved on 32 bit
but on 64 bit link order changed a lot.
Patch was checked with defconfig + allyesconfig builds.
The same .o files were linked in these configurations.
To keep readability of the Makefiles a few Kconfig
symbols was added/modified and it was checked that
they were not used anywhere else.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:27 +0000 (13:32 +0100)]
x86: teach vdso to clean
A few files remained after 'make clean' in arch/x86/vdso/.
Teach vdso to clean up those files in a bit brutal fashion.
The filenames are just hardcoded in the Makefile.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Roland McGrath <roland@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Sam Ravnborg [Wed, 30 Jan 2008 12:32:21 +0000 (13:32 +0100)]
x86: share more options between 32 and 64 bit build
On recommendation from Andi Kleen share a few more options
between 32 and 64 bit builds.
A defconfig build for i386 did not show any difference in
size of text and data.
Sam Ravnborg [Wed, 30 Jan 2008 12:32:20 +0000 (13:32 +0100)]
x86: unification of arch/x86/Makefiles
Unify the 32 and 64 bit specific Makefiles.
The unification was simplest to do in one step although the
readability of the patch suffers a bit from this.
Noteworthy remarks on the unification:
- The 64 bit cpu stuff should be moved to Makefile_32.cpu
but I did not feel confident doing it due to subtle differences
- The use of cflags-y were abandoned since we have seen one bug where
we did wrong due to missing assignment to KBUILD_CFLAGS.
The cc-option marcro uses KBUILD_CFLAGS.
- The "No need to remake" line are deleted. It caused "make -B" to fail
- For 64 bit the sub architecture stuff is not used.
- The way head64.o is specified could be nicer - but it awaits the
introduction of head32.o (which seems like a win to introduce for readability)
- Patch is checkpatch clean
Patch is tested by doing a defconfig build for i386 and x86_64 and in both
cases monitoring that only relevant files were recompiled when applying
the patch.
[ mingo@elte.hu: build fix ]
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:32:19 +0000 (13:32 +0100)]
x86: kprobes change kprobe_handler flow
Make the control flow of kprobe_handler more obvious.
Collapse the separate if blocks/gotos with if/else blocks
this unifies the duplication of the check for a breakpoint
instruction race with another cpu.
Create two jump targets:
preempt_out: re-enables preemption before returning ret
out: only returns ret
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Harvey Harrison [Wed, 30 Jan 2008 12:32:19 +0000 (13:32 +0100)]
x86: cosmetic fixes fault_{32|64}.c
First step towards unifying these files.
- Checkpatch trailing whitespace fixes
- Checkpatch indentation of switch statement fixes
- Checkpatch single statement ifs need no braces fixes
- Checkpatch consistent spacing after comma fixes
- Introduce defines for pagefault error bits from X86_64 and add useful
comment from X86_32. Use these defines in X86_32 where obvious.
- Unify comments between 32|64 bit
- Small ifdef movement for CONFIG_KPROBES in notify_page_fault()
- Introduce X86_64 only case statement
No Functional Changes.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
arch/x86/math-emu/errors.c:163: warning: format '%ld' expects type 'long int', but argument 3 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 3 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 4 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 5 has type 'u32'
arch/x86/math-emu/errors.c:175: warning: format '%ld' expects type 'long int', but argument 6 has type 'u32'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
WARNING: line over 80 characters
#45: FILE: arch/x86/kernel/cpu/mcheck/k7.c:50:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#45: FILE: arch/x86/kernel/cpu/mcheck/k7.c:50:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#48: FILE: arch/x86/kernel/cpu/mcheck/k7.c:52:
+ printk (KERN_EMERG "CPU %d: Bank %d: %08x%08x%s%s\n",
WARNING: no space between function name and open parenthesis '('
#65: FILE: arch/x86/kernel/cpu/mcheck/p4.c:161:
+ printk (KERN_DEBUG "CPU %d: EIP: %08x EFLAGS: %08x\n"
WARNING: no space between function name and open parenthesis '('
#88: FILE: arch/x86/kernel/cpu/mcheck/p4.c:182:
+ snprintf (misc, 20, "[%08x%08x]", ahigh, alow);
WARNING: line over 80 characters
#93: FILE: arch/x86/kernel/cpu/mcheck/p4.c:186:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#93: FILE: arch/x86/kernel/cpu/mcheck/p4.c:186:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#96: FILE: arch/x86/kernel/cpu/mcheck/p4.c:188:
+ printk (KERN_EMERG "CPU %d: Bank %d: %08x%08x%s%s\n",
WARNING: no space between function name and open parenthesis '('
#120: FILE: arch/x86/kernel/cpu/mcheck/p6.c:46:
+ snprintf (misc, 20, "[%08x%08x]", ahigh, alow);
WARNING: line over 80 characters
#125: FILE: arch/x86/kernel/cpu/mcheck/p6.c:50:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#125: FILE: arch/x86/kernel/cpu/mcheck/p6.c:50:
+ snprintf (addr, 24, " at %08x%08x", ahigh, alow);
WARNING: no space between function name and open parenthesis '('
#128: FILE: arch/x86/kernel/cpu/mcheck/p6.c:52:
+ printk (KERN_EMERG "CPU %d: Bank %d: %08x%08x%s%s\n",
total: 0 errors, 13 warnings, 100 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Please run checkpatch prior to sending patches
Cc: Min Zhang <mzhang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Min Zhang [Wed, 30 Jan 2008 12:32:11 +0000 (13:32 +0100)]
arch/x86/kernel/cpu/mcheck/p4.c: cleanups
SMP, the machine check exception dispatches all logical processors within a
physical package to the machine-check exception handler, so the printk
within each handler outputs concurrently and makes the output unreadable.
Refer to Intel system programming guide Part 1 Section 7.8.5
http://developer.intel.com/design/processor/manuals/253668.pdf
Signed-off-by: Min Zhang <mzhang@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
replace x86_read/write_per_cpu with a common function.
x86_read_per_cpu() and its writeish sister are not present in x86_64. So in
this patch, we replace them with __get_cpu_var(), which is present in both
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Like i386, x86_64 also need to include its own patching function.
(Well, if you're not in a hurry, and don't care about speed, you don't
really _need_ ;-))
So here they are. Not much different in essence from i386
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The core patching code for paravirt is sufficiently different
among i386 and x86_64, and we move them to specific files.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86_64 needs a potentially larger clobber list than i386, due to its calling
convention. So we add more CLBR_ defines for it.
Note that CLBR_ANY is different for each of the architectures, since it comprises
the notion of "All call clobbers in this architecture"
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since the advent of ticket locking, CLI_STRING, STI_STRING, and friends
are not used anymore. They can now be safely deleted.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86: replace privileged instructions with paravirt macros
The assembly code in entry_64.S issues a bunch of privileged instructions,
like cli, sti, swapgs, and others. Paravirt guests are forbidden to do so,
and we then replace them with macros that will do the right thing.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds paravirt hook for swapgs operation, which is a privileged
operation in x86_64.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
i386 has a macro GET_CR0_INTO_EAX, used in early trap handling code.
x86_64 has similar needs, only it needs to put cr2 into rcx. We provide
a macro for such task, in the same way
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch changes the irq handling function definitions
in paravirt.h (like raw_local_irq_disable) to accomodate for x86_64.
The differences are in the calling convention.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adjust the paravirt macros used in assembly code
to accomodate for x86_64 as well.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86: change assembly definition of paravirt_patch_site
To account for differences in x86_64, we change the macros that
create raw instances of the paravirt_patch_site struct.
We need to align 64-pointers to 64-bit boundaries, so we add an alignment
directive. Also, we need to make room for a word-sized pointer,
instead of a fixed 32-bit one
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds a field in pv_cpu_ops for a paravirtualized hook
for rdtscp, needed for x86_64.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
write_tsc() does not need to be enclosed in any paravirt closure,
as it uses wrmsr(). So we rip off the duplicate in msr.h
and the definition from paravirt.h
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adjust the PVOP_VCALL and PVOP_CALL macros to
work with x86_64. It has a different calling convention, and
we use auxiliary macros to account for both calling conventions
as cleanly as possible
Comments are adjusted accordingly.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch changes paravirt_32.c to paravirt.c. The goal
is to have paravirt support in x86_64, so we do it in a common file
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Markus Metzger [Wed, 30 Jan 2008 12:32:04 +0000 (13:32 +0100)]
x86, ptrace: overflow signal API
Establish the user API for sending a user-defined signal to the traced task on a BTS buffer overflow.
This should complete the user API for the BTS ptrace extension.
The patches so far implement wrap-around overflow handling as is needed for debugging.
The remaining open is another overflow handling mechanism that sends a signal to the traced task on a buffer overflow.
This will take some more time from my side.
Since, from a user perspective, this occurs behind the scenes, the patch set should already be useful. More features may/will be added on top of it (overflow signal, pageable back-up buffers, kernel tracing, core file support, profiling, ...).
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Markus Metzger [Wed, 30 Jan 2008 12:32:03 +0000 (13:32 +0100)]
x86, ptrace: add buffer size checks
Pass the buffer size for (most) ptrace commands that pass user-allocated buffers and check that size before accessing the buffer. Unfortunately, PTRACE_BTS_GET already uses all 4 parameters.
Commands that access user buffers return the number of bytes or records read or written.
Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Venki Pallipadi [Wed, 30 Jan 2008 12:32:01 +0000 (13:32 +0100)]
x86: voluntary leave_mm before entering ACPI C3
Aviod TLB flush IPIs during C3 states by voluntary leave_mm()
before entering C3.
The performance impact of TLB flush on C3 should not be significant with
respect to C3 wakeup latency. Also, CPUs tend to flush TLB in hardware while in
C3 anyways.
On a 8 logical CPU system, running make -j2, the number of tlbflush IPIs goes
down from 40 per second to ~ 0. Total number of interrupts during the run
of this workload was ~1200 per second, which makes it ~3% savings in wakeups.
There was no measurable performance or power impact however.
[ akpm@linux-foundation.org: symbol export fixes. ]
Ingo Molnar [Wed, 30 Jan 2008 12:32:00 +0000 (13:32 +0100)]
x86: add some pirq debugging
we use a few static mapping rules in our pirq routing functions,
and for example regression f3ac84324fd94 was due to the pirq
being out of range of the remapping array. Put in a few
WARN_ON_ONCE() lines so that we get notified about any such
out-of-bound incidents.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Gary Hade [Wed, 30 Jan 2008 12:31:59 +0000 (13:31 +0100)]
PCI: remove default PCI expansion ROM memory allocation
increasing number of PCI slots in large multi-node systems. The kernel
currently attempts by default to allocate memory for all PCI expansion
ROMs so there has also been an increasing number of PCI memory
allocation failures seen on these systems. This occurs because the BIOS
either (1) provides insufficient PCI memory resource for all the
expansion ROMs or (2) provides adequate PCI memory resource for
expansion ROMs but provides the space in kernel unexpected BIOS assigned
P2P non-prefetch windows.
The resulting PCI memory allocation failures may be benign when related
to memory requests for expansion ROMs themselves but in some cases they
can occur when attempting to allocate space for more critical BARs.
This can happen when a successful expansion ROM allocation request
consumes memory resource that was intended for a non-ROM BAR. We have
seen this happen during PCI hotplug of an adapter that contains a P2P
bridge where successful memory allocation for an expansion ROM BAR on
device behind the bridge consumed memory that was intended for a non-ROM
BAR on the P2P bridge. In all cases the allocation failure messages can
be very confusing for users.
This patch addresses the issue by changing the kernel default behavior
so that expansion ROM memory allocations are no longer attempted by
default when the BIOS has not assigned a specific address range to the
expansion ROM BAR. This was done by changing the 'pci=rom' boot option
behavior for BIOS unassigned expansion ROMs to actually match it's
current kernel-parameters.txt description which already implies "off" by
default. Behavior for BIOS assigned expansion ROMs implemented in
pcibios_assign_resources() [arch/x86/pci/i386.c] is unchanged.
Signed-off-by: Gary Hade <garyhade@us.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Jan Beulich <jbeulich@novell.com> Acked-by: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Parag Warudkar [Wed, 30 Jan 2008 12:31:59 +0000 (13:31 +0100)]
x86: fix dmi_alloc() to not advance alloc index in case of
dmi_alloc() for CONFIG_X86_64 is defined to allocate from a static array
and it maintains a allocation index which is advanced each time allocation
is attempted - it gets incremented even if an allocation fails thereby
depriving any future request that may be small enough to be satisfied from
the array.
Fix this by first testing if allocation is going to be possible and
incrementing alloc index only then.
Parag Warudkar [Wed, 30 Jan 2008 12:31:59 +0000 (13:31 +0100)]
x86: fix DMI out of memory problems
People with HP Desktops (including me) encounter couple of DMI errors
during boot - dmi_save_oem_strings_devices: out of memory and
dmi_string: out of memory.
On some HP desktops the DMI data include OEM strings (type 11) out of
which only few are meaningful and most other are empty. DMI code
religiously creates copies of these 27 strings (65 bytes each in my
case) and goes OOM in dmi_string().
If DMI_MAX_DATA is bumped up a little then it goes and fails in
dmi_save_oem_strings while allocating dmi_devices of sizeof(struct
dmi_device) corresponding to these strings.
On x86_64 since we cannot use alloc_bootmem this early, the code uses a
static array of 2048 bytes (DMI_MAX_DATA) for allocating the memory DMI
needs. It does not survive the creation of empty strings and devices.
Fix this by detecting and not newly allocating empty strings and instead
using a one statically defined dmi_empty_string.
Also do not create a new struct dmi_device for each empty string - use
one statically define dmi_device with .name=dmi_empty_string and add
that to the dmi_devices list.
On x64 this should stop the OOM with same current size of DMI_MAX_DATA
and on x86 this should save a good amount of (27*65 bytes +
27*sizeof(struct dmi_device) bootmem.
Compile and boot tested on both 32-bit and 64-bit x86.
What's left in processor_32.h and processor_64.h cannot be cleanly
integrated. However, it's just a couple of definitions. They are moved
to processor.h around ifdefs, and the original files are deleted. Note that
there's much less headers included in the final version.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
x86: remove __init modifier from header declaration
This patch removes the __init modifier from an extern function
declaration in acpi.h.
Besides not being strictly needed, it requires the inclusion of
linux/init.h, which is usually not even included directly, increasing
header mess by a lot.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Wed, 30 Jan 2008 12:31:56 +0000 (13:31 +0100)]
x86: x86 core dump TLS
This makes ELF core dumps of 32-bit processes include a new
note type NT_386_TLS (0x200) giving the contents of the TLS
slots in struct user_desc format. This lets post mortem
examination figure out what the segment registers mean like
the debugger does with get_thread_area on a live process.
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Roland McGrath [Wed, 30 Jan 2008 12:31:54 +0000 (13:31 +0100)]
x86: x86 ptrace user_regset
This cleans up the PTRACE_*REGS* request code so each one is just a
simple call to copy_regset_to_user or copy_regset_from_user. The
ptrace layouts already match the user_regset formats (core dump formats).
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>