x86, mce: don't log boot MCEs on Pentium M (model == 13) CPUs
On my legacy Pentium M laptop (Acer Extensa 2900) I get bogus MCE on a cold
boot with CONFIG_X86_NEW_MCE enabled, i.e. (after decoding it with mcelog):
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 1 MCG status:
MCi status:
Error overflow
Uncorrected error
Error enabled
Processor context corrupt
MCA: Data CACHE Level-1 UNKNOWN Error
STATUS f200000000000195 MCGSTATUS 0
To verify that this is not a CONFIG_X86_NEW_MCE bug I also modified
the CONFIG_X86_OLD_MCE code (which doesn't log any MCEs) to dump
content of STATUS MSR before it is cleared during initialization. ]
Since the bogus MCE results in a kernel taint (which in turn disables
lockdep support) don't log boot MCEs on Pentium M (model == 13) CPUs
by default ("mce=bootlog" boot parameter can be be used to get the old
behavior).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Reviewed-by: Andi Kleen <andi@firstfloor.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86: mce: Update X86_MCE description in x86/Kconfig
- Clarify that this config controls thermal throttling
reporting too
- Clarify the types of errors reported by machine checks
- Drop references to ancient CPUs.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86: mce: Make CONFIG_X86_ANCIENT_MCE dependent on CONFIG_X86_MCE
Add a missing depency for ANCIENT_MCE. It didn't matter
in practice because the ANCIENT code wasn't compiled without
X86_MCE, but it's better to express that clearly in
Kconfig.
Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Yinghai Lu [Wed, 17 Jun 2009 23:21:33 +0000 (16:21 -0700)]
x86: use zalloc_cpumask_var for mce_dev_initialized
We need a cleared cpu_mask to record if mce is initialized, especially
when MAXSMP is used.
used zalloc_... instead
Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Prarit Bhargava [Wed, 10 Jun 2009 19:41:02 +0000 (12:41 -0700)]
x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
(cpuid family-6, model-f, stepping-4). Resolves a situation where the NMI
would not enable on these processors.
Figo.zhang [Wed, 17 Jun 2009 14:25:20 +0000 (22:25 +0800)]
x86, io_apic.c: Work around compiler warning
This compiler warning:
arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here
Is bogus as 'eu' is always initialized. But annotate it away by
initializing the variable, to make it easier for people to notice
real warnings. A compiler that sees through this logic will
optimize away the initialization.
This happens on QEMU which reports MCA capability, but no banks.
Without this patch there is a buffer overrun and boot ops because
the code would try to initialize the 0 element of a zero length
kmalloc() buffer.
H. Peter Anvin [Wed, 17 Jun 2009 00:39:04 +0000 (17:39 -0700)]
x86, boot: use .code16gcc instead of .code16
Use .code16gcc to compile arch/x86/boot/bioscall.S rather than
.code16, since some older versions of binutils can't generate 32-bit
addressing expressions (67 prefixes) in .code16 mode, only in
.code16gcc mode.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cliff Wickman [Tue, 16 Jun 2009 21:43:40 +0000 (16:43 -0500)]
x86: correct the conversion of EFI memory types
This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
in the e820 table as type E820_RESERVED.
(This patch replaces one called 'x86: vendor reserved memory type'.
This version has been discussed a bit with Peter and Yinghai but not given
a final opinion.)
Without this patch EFI_RESERVED_TYPE memory reservations may be
marked usable in the e820 table. There may be a collision between
kernel use and some reserver's use of this memory.
(An example use of this functionality is the UV system, which
will access extremely large areas of memory with a memory engine
that allows a user to address beyond the processor's range. Such
areas are reserved in the EFI table by the BIOS.
Some loaders have a restricted number of entries possible in the e820 table,
hence the need to record the reservations in the unrestricted EFI table.)
The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
on the kernel command line.
Signed-off-by: Cliff Wickman <cpw@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
H. Peter Anvin [Wed, 10 Jun 2009 01:20:39 +0000 (18:20 -0700)]
x86: cap iomem_resource to addressable physical memory
iomem_resource is by default initialized to -1, which means 64 bits of
physical address space if 64-bit resources are enabled. However, x86
CPUs cannot address 64 bits of physical address space. Thus, we want
to cap the physical address space to what the union of all CPU can
actually address.
Without this patch, we may end up assigning inaccessible values to
uninitialized 64-bit PCI memory resources.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Martin Mares <mj@ucw.cz> Cc: stable@kernel.org
Hidetoshi Seto [Mon, 15 Jun 2009 08:22:49 +0000 (17:22 +0900)]
x86, mce: make mce_disabled boolean
The mce_disabled on 32bit is a tristate variable [1,0,-1],
while 64bit version is boolean [0,1].
This patch makes mce_disabled always boolean, and use mce_p5_enabled
to indicate the third state instead.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Hidetoshi Seto [Mon, 15 Jun 2009 08:18:45 +0000 (17:18 +0900)]
x86, mce: don't init timer if !mce_available
In mce_cpu_restart, mce_init_timer is called unconditionally.
If !mce_available (e.g. mce is disabled), there are no useful work
for timer. Stop running it.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Huang Ying [Mon, 15 Jun 2009 07:37:07 +0000 (15:37 +0800)]
x86, mce: fix a race condition about mce_callin and no_way_out
If one CPU has no_way_out == 1, all other CPUs should have no_way_out
== 1. But despite global_nwo is read after mce_callin, global_nwo is
updated after mce_callin too. So it is possible that some CPU read
global_nwo before some other CPU update global_nwo, so that no_way_out
== 1 for some CPU, while no_way_out == 0 for some other CPU.
This patch fixes this race condition via moving mce_callin updating
after global_nwo updating, with a smp_wmb in between. A smp_rmb is
added between their reading too.
Linus Torvalds [Tue, 16 Jun 2009 19:11:57 +0000 (12:11 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/net: Use wait_event() in o2net_send_message_vec()
ocfs2: Adjust rightmost path in ocfs2_add_branch.
ocfs2: fdatasync should skip unimportant metadata writeout
ocfs2: Remove redundant gotos in ocfs2_mount_volume()
ocfs2: Add statistics for the checksum and ecc operations.
ocfs2 patch to track delayed orphan scan timer statistics
ocfs2: timer to queue scan of all orphan slots
ocfs2: Correct ordering of ip_alloc_sem and localloc locks for directories
ocfs2: Fix possible deadlock in quota recovery
ocfs2: Fix possible deadlock with quotas in ocfs2_setattr()
ocfs2: Fix lock inversion in ocfs2_local_read_info()
ocfs2: Fix possible deadlock in ocfs2_global_read_dquot()
ocfs2: update comments in masklog.h
ocfs2: Don't printk the error when listing too many xattrs.
Linus Torvalds [Tue, 16 Jun 2009 19:03:43 +0000 (12:03 -0700)]
Merge branch 'serial'
* serial:
imx: Check for NULL pointer deref before calling tty_encode_baud_rate
atmel_serial: fix hang in set_termios when crtscts is enabled
MAINTAINERS: update 8250 section, give Alan Cox a name
tty: fix sanity check
pty: Narrow the race on ldisc locking
tty: fix unused warning when TCGETX is not defined
ldisc: debug aids
ldisc: Make sure the ldisc isn't active when we close it
tty: Fix leaks introduced by the shift to separate ldisc objects
Fix conflicts in drivers/char/pty.c due to earlier version of the ldisc
race narrowing.
atmel_serial: fix hang in set_termios when crtscts is enabled
After enabling hardware flow control, any subsequent termios call may hang
waiting for the transmitter to drain. This appears to be caused by a
busy-loop in set_termios() waiting for the transmitter to become empty,
which may take a very long time (or hang indefinitely) if the device at
the other end is blocking us.
A quick look through the tty and serial_core code indicates that any
necessary flushing (which is optional) has already been done at this
point, so there's no need for the driver to flush the transmitter on its
own.
Fix it by removing the busy-loop altogether.
Tested-by: Eirik Aanonsen <eaa@wprmedical.com> Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Tue, 16 Jun 2009 16:01:33 +0000 (17:01 +0100)]
tty: fix sanity check
The WARN_ON() that was added to tty_reopen can be triggered in the specific
case of a hangup occurring during a re-open of a tty which is not in the
middle of being otherwise closed.
In that case however the WARN() is bogus as we don't hold the neccessary
locks to make a correct decision.
The case we should be checking is "if the ldisc is not changing and reopen
is occuring". We could drop the WARN_ON but for the moment the debug is more
valuable even if it means taking a mutex as it will find any other cases.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Tue, 16 Jun 2009 16:01:13 +0000 (17:01 +0100)]
pty: Narrow the race on ldisc locking
The pty code has always been buggy on its ldisc handling. The recent
changes made the window for the race much bigger. Pending fixing it
properly which is not at all trivial, at least make the race small again so
we don't disrupt other dev work.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 16 Jun 2009 16:01:02 +0000 (17:01 +0100)]
tty: fix unused warning when TCGETX is not defined
If TCGETX is not defined, we end up with this warning:
drivers/char/tty_ioctl.c: In function ‘tty_mode_ioctl’:
drivers/char/tty_ioctl.c:950: warning: unused variable ‘ktermx’
Since the variable is only used in one case statement, push it down to
the local case scope.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next-2.6: (30 commits)
sparc64: Update defconfig.
sparc: Wire up sys_rt_tgsigqueueinfo().
openprom: Squelch useless GCC warning.
sparc: replace uses of CPU_MASK_ALL_PTR
sparc64: Add proper dynamic ftrace support.
sparc: Simplify code using is_power_of_2() routine.
sparc: move of_device common code to of_device_common
sparc: remove dma-mapping_{32|64}.h
sparc: use dma_map_page instead of dma_map_single
sparc: add sync_single_for_device and sync_sg_for_device to struct dma_ops
sparc: move the duplication in dma-mapping_{32|64}.h to dma-mapping.h
p9100: use standard fields for framebuffer physical address and length
leo: use standard fields for framebuffer physical address and length
cg6: use standard fields for framebuffer physical address and length
cg3: use standard fields for framebuffer physical address and length
cg14: use standard fields for framebuffer physical address and length
bw2: use standard fields for framebuffer physical address and length
sparc64: fix and optimize irq distribution
sparc64: Use new dynamic per-cpu allocator.
sparc64: Only allocate per-cpu areas for possible cpus.
...
Linus Torvalds [Tue, 16 Jun 2009 18:49:58 +0000 (11:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: (27 commits)
Blackfin: hook up new rt_tgsigqueueinfo syscall
Blackfin: improve CLKIN_HZ config default
Blackfin: initial support for ftrace grapher
Blackfin: initial support for ftrace
Blackfin: enable support for LOCKDEP
Blackfin: add preliminary support for STACKTRACE
Blackfin: move custom sections into sections.h
Blackfin: punt unused/wrong mutex-dec.h
Blackfin: add support for irqflags
Blackfin: add support for bzip2/lzma compressed kernel images
Blackfin: convert Kconfig style to def_bool
Blackfin: bf548-ezkit: update smsc911x resources
Blackfin: update aedos-ipipe code to upstream 1.10-00
Blackfin: bf537-stamp: update ADP5520 resources
Blackfin: bf518f-ezbrd: fix SPI CS for SPI flash
Blackfin: define SPI IRQ in board resources
Blackfin: do not configure the UART early if on wrong processor
Blackfin: fix deadlock in SMP IPI handler
Blackfin: fix flag storage for irq funcs
Blackfin: push down exception oops checking
...
Linus Torvalds [Tue, 16 Jun 2009 18:46:45 +0000 (11:46 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: remove some includings of blktrace_api.h
mg_disk: seperate mg_disk.h again
block: Introduce helper to reset queue limits to default values
cfq: remove extraneous '\n' in blktrace output
ubifs: register backing_dev_info
btrfs: properly register fs backing device
block: don't overwrite bdi->state after bdi_init() has been run
cfq: cleanup for last_end_request in cfq_data
Linus Torvalds [Tue, 16 Jun 2009 18:30:37 +0000 (11:30 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (38 commits)
ps3flash: Always read chunks of 256 KiB, and cache them
ps3flash: Cache the last accessed FLASH chunk
ps3: Replace direct file operations by callback
ps3: Switch ps3_os_area_[gs]et_rtc_diff to EXPORT_SYMBOL_GPL()
ps3: Correct debug message in dma_ioc0_map_pages()
drivers/ps3: Add missing annotations
ps3fb: Use ps3_system_bus_[gs]et_drvdata() instead of direct access
ps3flash: Use ps3_system_bus_[gs]et_drvdata() instead of direct access
ps3: shorten ps3_system_bus_[gs]et_driver_data to ps3_system_bus_[gs]et_drvdata
ps3: Use dev_[gs]et_drvdata() instead of direct access for system bus devices
block/ps3: remove driver_data direct access of struct device
ps3vram: Make ps3vram_priv.reports a void *
ps3vram: Remove no longer used ps3vram_priv.ddr_base
ps3vram: Replace mutex by spinlock + bio_list
block: Add bio_list_peek()
powerpc: Use generic atomic64_t implementation on 32-bit processors
lib: Provide generic atomic64_t implementation
powerpc: Add compiler memory barrier to mtmsr macro
powerpc/iseries: Mark signal_vsp_instruction() as maybe unused
powerpc/iseries: Fix unused function warning in iSeries DT code
...
Linus Torvalds [Tue, 16 Jun 2009 18:29:17 +0000 (11:29 -0700)]
Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
therm_windtunnel: Convert to a new-style i2c driver
therm_adt746x: Convert to a new-style i2c driver
windfarm: Convert to new-style i2c drivers
therm_pm72: Convert to a new-style i2c driver
i2c-viapro: Add new PCI device ID for VX855
i2c/chips: Move max6875 to drivers/misc/eeprom
i2c: Do not give adapters a default parent
i2c: Do not probe for TV chips on Voodoo3 adapters
i2c: Retry automatically on arbitration loss
i2c: Remove void casts
Linus Torvalds [Tue, 16 Jun 2009 18:24:23 +0000 (11:24 -0700)]
Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits)
ACPICA: Update version to 20090521.
ACPICA: Disable preservation of SCI enable bit (SCI_EN)
ACPICA: Region deletion: Ensure region object is removed from handler list
ACPICA: Eliminate extra call to NsGetParentNode
ACPICA: Simplify internal operation region interface
ACPICA: Update Load() to use operation region interfaces
ACPICA: New: AcpiInstallMethod - install a single control method
ACPICA: Invalidate DdbHandle after table unload
ACPICA: Fix reference count issues for DdbHandle object
ACPICA: Simplify and optimize NsGetNextNode function
ACPICA: Additional validation of _PRT packages (resource mgr)
ACPICA: Fix DebugObject output for DdbHandle objects
ACPICA: Fix allowable release order for ASL mutex objects
ACPICA: Mutex support: Fix release ordering issue and current sync level
ACPICA: Update version to 20090422.
ACPICA: Linux OSL: cleanup/update/merge
ACPICA: Fix implementation of AML BreakPoint operator (break to debugger)
ACPICA: Fix miscellaneous warnings under gcc 4+
ACPICA: Miscellaneous lint changes
ACPICA: Fix possible dereference of null pointer
...
Alan Cox [Mon, 15 Jun 2009 15:27:29 +0000 (16:27 +0100)]
pty: Narrow the race on ldisc locking
The pty code has always been buggy on its ldisc handling. The recent
changes made the window for the race much bigger. Pending fixing it
properly which is not at all trivial, at least make the race small again so
we don't disrupt other dev work.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 16 Jun 2009 18:07:14 +0000 (11:07 -0700)]
printk: add KERN_DEFAULT loglevel to print_modules()
Several WARN_ON() messages omit the '\n' at the end of the string, which
is a simple (and understandable) error. The next line printed after
that warning line is usually the current module list, and that printk
does not have a log-level marker - resulting in one long mixed-up line.
Adding this loglevel marker will now avoid this unreadable mess.
Linus Torvalds [Tue, 16 Jun 2009 18:02:28 +0000 (11:02 -0700)]
printk: Add KERN_DEFAULT printk log-level
This adds a KERN_DEFAULT loglevel marker, for when you cannot decide
which loglevel you want, and just want to keep an existing printk
with the default loglevel.
The difference between having KERN_DEFAULT and having no log-level
marker at all is two-fold:
- having the log-level marker will now force a new-line if the
previous printout had not added one (perhaps because it forgot,
but perhaps because it expected a continuation)
- having a log-level marker is required if you are printing out a
message that otherwise itself could perhaps otherwise be mistaken
for a log-level.
Linus Torvalds [Tue, 16 Jun 2009 17:57:02 +0000 (10:57 -0700)]
printk: clean up handling of log-levels and newlines
It used to be that we would only look at the log-level in a printk()
after explicit newlines, which can cause annoying problems when the
previous printk() did not end with a '\n'. In that case, the log-level
marker would be just printed out in the middle of the line, and be
seen as just noise rather than change the logging level.
This changes things to always look at the log-level in the first
bytes of the printout. If a log level marker is found, it is always
used as the log-level. Additionally, if no newline existed, one is
added (unless the log-level is the explicit KERN_CONT marker, to
explicitly show that it's a continuation of a previous line).
Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 16 Jun 2009 10:46:54 +0000 (03:46 -0700)]
openprom: Squelch useless GCC warning.
drivers/sbus/char/openprom.c: In function ‘openprom_sunos_ioctl’:
drivers/sbus/char/openprom.c:306: warning: ‘opp’ may be used uninitialized in this function
Signed-off-by: David S. Miller <davem@davemloft.net>
FUJITA Tomonori [Thu, 14 May 2009 16:23:11 +0000 (16:23 +0000)]
sparc: remove dma-mapping_{32|64}.h
This modifies SPARC32 to use struct dma_map ops. It means that we can
remove dma-mapping_{32|64}.h.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Robert Reif <reif@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
FUJITA Tomonori [Thu, 14 May 2009 16:23:10 +0000 (16:23 +0000)]
sparc: use dma_map_page instead of dma_map_single
This patch converts dma_map_single and dma_unmap_single to use
map_page and unmap_page respectively and removes unnecessary
map_single and unmap_single. map_page can be used to implement
map_single but the opposite is impossible. Having only dma_map_page in
struct dma_ops is enough.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Robert Reif <reif@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
FUJITA Tomonori [Thu, 14 May 2009 16:23:09 +0000 (16:23 +0000)]
sparc: add sync_single_for_device and sync_sg_for_device to struct dma_ops
This adds sync_single_for_device() and sync_sg_for_device() to struct
dma_ops in order to unify dma-mpping_{32|64}.h. dma-mpping_32.h needs them though dma-mpping_64.h doesn't.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Robert Reif <reif@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
FUJITA Tomonori [Thu, 14 May 2009 16:23:08 +0000 (16:23 +0000)]
sparc: move the duplication in dma-mapping_{32|64}.h to dma-mapping.h
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Tested-by: Robert Reif <reif@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Hong H. Pham [Thu, 4 Jun 2009 09:10:11 +0000 (02:10 -0700)]
sparc64: fix and optimize irq distribution
irq_choose_cpu() should compare the affinity mask against cpu_online_map
rather than CPU_MASK_ALL, since irq_select_affinity() sets the interrupt's
affinity mask to cpu_online_map "and" CPU_MASK_ALL (which ends up being
just cpu_online_map). The mask comparison in irq_choose_cpu() will always
fail since the two masks are not the same. So the CPU chosen is the first CPU
in the intersection of cpu_online_map and CPU_MASK_ALL, which is always CPU0.
That means all interrupts are reassigned to CPU0...
Distributing interrupts to CPUs in a linearly increasing round robin fashion
is not optimal for the UltraSPARC T1/T2. Also, the irq_rover in
irq_choose_cpu() causes an interrupt to be assigned to a different
processor each time the interrupt is allocated and released. This may lead
to an unbalanced distribution over time.
A static mapping of interrupts to processors is done to optimize and balance
interrupt distribution. For the T1/T2, interrupts are spread to different
cores first, and then to strands within a core.
The following is some benchmarks showing the effects of interrupt
distribution on a T2. The test was done with iperf using a pair of T5220
boxes, each with a 10GBe NIU (XAUI) connected back to back.
David S. Miller [Wed, 1 Apr 2009 23:15:20 +0000 (16:15 -0700)]
sparc64: Get rid of real_setup_per_cpu_areas().
Now that we defer the cpu_data() initializations to the end of per-cpu
setup, we can get rid of this local hack we had to setup the per-cpu
areas eary.
This is a necessary step in order to support HAVE_DYNAMIC_PER_CPU_AREA
since the per-cpu setup must run when page structs are available.
Signed-off-by: David S. Miller <davem@davemloft.net>