[PATCH] Increase number of e820 entries hard limit from 32 to 128
The specifications that talk about E820 map doesn't have an upper limit on
the number of e820 entries. But, today's kernel has a hard limit of 32.
With increase in memory size, we are seeing the number of E820 entries
reaching close to 32. Patch below bumps the number upto 128.
The patch changes the location of EDDBUF in zero-page (as it comes after E820).
As, EDDBUF is not used by boot loaders, this patch should not have any effect
on bootloader-setup code interface.
Patch covers both i386 and x86-64.
Tested on:
* grub booting bzImage
* lilo booting bzImage with EDID info enabled
* pxeboot of bzImage
Side-effect:
bss increases by ~ 2K and init.data increases by ~7.5K
on all systems, due to increase in size of static arrays.
Zwane Mwaikambo [Sun, 1 May 2005 15:58:51 +0000 (08:58 -0700)]
[PATCH] cpuid x87 bit on AMD falsely marked as PNI
http://bugme.osdl.org/show_bug.cgi?id=4426
vendor_id : AuthenticAMD
cpu family : 6
model : 10
model name : AMD Athlon(tm) XP
stepping : 0
cpu MHz : 2204.807
<snipped>
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse pni syscall mmxext 3dnowext 3dnow
bogomips : 4358.14
We're marking bit 0 of extended function 0x80000001 cpuid as PNI support on
AMD processors, when it actually denotes x87 FPU present. Patch for i386
and x86_64 below.
Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
john stultz [Sun, 1 May 2005 15:58:50 +0000 (08:58 -0700)]
[PATCH] i386: fix hpet for systems that don't support legacy replacement
Currently the i386 HPET code assumes the entire HPET implementation from
the spec is present. This breaks on boxes that do not implement the
optional legacy timer replacement functionality portion of the spec.
This patch, which is very similar to my x86-64 patch for the same issue,
fixes the problem allowing i386 systems that cannot use the HPET for the
timer interrupt and RTC to still use the HPET as a time source. I've
tested this patch on a system systems without HPET, with HPET but without
legacy timer replacement, as well as HPET with legacy timer replacement.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
H. Peter Anvin [Sun, 1 May 2005 15:58:49 +0000 (08:58 -0700)]
[PATCH] CPUID bug and inconsistency fix
The recent support for K8 multicore was misported from x86-64 to i386, due
to an unnecessary inconsistency between the CPUID code. Sure, there is are
no x86-64 VIA chips yet, but it should happen eventually.
This patch fixes the i386 bug as well as makes x86-64 match i386 in the
handing of the CPUID array.
Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jaya Kumar [Sun, 1 May 2005 15:58:49 +0000 (08:58 -0700)]
[PATCH] x86 reboot: Add reboot fixup for gx1/cs5530a
This patch by Jaya Kumar introduces a generic infrastructure to deal with
x86 chipsets with nonstandard reset sequences, and adds support for the
Geode gx1/cs5530a chipset.
Signed-off-by: Jaya Kumar <jayalk@intworks.biz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jack F Vogel [Sun, 1 May 2005 15:58:48 +0000 (08:58 -0700)]
[PATCH] check nmi watchdog is broken
A bug against an xSeries system showed up recently noting that the
check_nmi_watchdog() test was failing.
I have been investigating it and discovered in both i386 and x86_64 the
recent change to the routine to use the cpu_callin_map has uncovered a
problem. Prior to that change, on an SMP box, the test was trivally
passing because all cpu's were found to not yet be online, but now with the
callin_map they are discovered, it goes on to test the counter and they
have not yet begun to increment, so it announces a CPU is stuck and bails
out.
On all the systems I have access to test, the announcement of failure is
also bougs... by the time you can login and check /proc/interrupts, the
NMI count is happily incrementing on all CPUs. Its just that the test is
being done too early.
I have tried moving the call to the test around a bit, and it was always
too early. I finally hit on this proposed solution, it delays the routine
via a late_initcall(), seems like the right solution to me.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The new i386/x86_64 assemblers no longer accept instructions for moving
between a segment register and a 32bit memory location, i.e.,
movl (%eax),%ds
movl %ds,(%eax)
To generate instructions for moving between a segment register and a
16bit memory location without the 16bit operand size prefix, 0x66,
mov (%eax),%ds
mov %ds,(%eax)
should be used. It will work with both new and old assemblers. The
assembler starting from 2.16.90.0.1 will also support
movw (%eax),%ds
movw %ds,(%eax)
without the 0x66 prefix. I am enclosing patches for 2.4 and 2.6 kernels
here. The resulting kernel binaries should be unchanged as before, with
old and new assemblers, if gcc never generates memory access for
Jake Moilanen [Sun, 1 May 2005 15:58:47 +0000 (08:58 -0700)]
[PATCH] ppc64: reverse prediction on spinlock busy loop code
On our raw spinlocks, we currently have an attempt at the lock, and if we do
not get it we enter a spin loop. This spinloop will likely continue for
awhile, and we pridict likely.
Shouldn't we predict that we will get out of the loop so our next instructions
are already prefetched. Even when we miss because the lock is still held, it
won't matter since we are waiting anyways.
I did a couple quick benchmarks, but the results are inconclusive.
16-way 690 running specjbb with original code
# ./specjbb 3000 16 1 1 19 30 120
...
Valid run, Score is 59282
Mispredicting the spinlock busy loop also means we slow down the rate at which
we do the loads which can be good for heavily contended locks.
Note: There are some gcc issues with our default build and branch prediction,
but a CONFIG_POWER4_ONLY build should emit them correctly. I'm working with
Alan Modra on it now.
Signed-off-by: Jake Moilanen <moilanen@austin.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anton Blanchard [Sun, 1 May 2005 15:58:46 +0000 (08:58 -0700)]
[PATCH] ppc64: firmware workaround
Recent gcc 4.0 testing uncovered a firmware issue. Some properties are larger
than 31 bytes and due to gcc 4.0s better stack allocation this overflow ran
over non volatile register storage.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Anton Blanchard [Sun, 1 May 2005 15:58:45 +0000 (08:58 -0700)]
[PATCH] ppc64: noexec fixes
There were a few issues with the ppc64 noexec support:
The 64bit ABI has a non executable stack by default. At the moment 64bit apps
require a PT_GNU_STACK section in order to have a non executable stack.
Disable the read implies exec workaround on the 64bit ABI. The 64bit
toolchain has never had problems with incorrect mmap permissions (the 32bit
has, thats why we need to retain the workaround).
With these fixes as well as a gcc fix from Alan Modra (that was recently
committed) 64bit apps work as expected.
Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Olof Johansson [Sun, 1 May 2005 15:58:45 +0000 (08:58 -0700)]
[PATCH] PPC64: Remove hot busy-wait loop in __hash_page
It turns out that our current __hash_page code will do a very hot busy-wait
loop waiting on _PAGE_BUSY to be cleared. It even does ldarx/stdcx in the
loop, which will bounce reservations around like crazy if there's more than
one CPU spinning on the same PTE (or even another PTE in the same
reservation granule). The end result is that each fault takes longer when
there's contention, which in turn increases the chance of another thread
hitting the same fault and also piling up. Not pretty.
There's two options here:
1. Do an out-of-line busy loop a'la spinlocks with just loads (no
reserves)
2. Just bail and refault if needed.
(2) makes sense here: If the PTE is busy, chances are it's in flux anyway
and the other code path making a change might just be ready to hash it.
This fixes a stampede seen on a large-ish system where a multithreaded
HPC app faults in the same text pages on several cpus at the same time.
Signed-off-by: Olof Johansson <olof@lixom.net> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Sun, 1 May 2005 15:58:45 +0000 (08:58 -0700)]
[PATCH] ppc64: tell firmware about kernel capabilities
On pSeries systems, according to the platform architecture specs, we are
supposed to be supplying a structure to firmware that tells firmware about
our capabilities, such as which version of the data structures that
describe available memory we are expecting to see. The way we end up
having to supply this data structure is a bit gross, since it was designed
for AIX and doesn't suit us very well. This patch adds the code to supply
this data structure to the firmware.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Sun, 1 May 2005 15:58:44 +0000 (08:58 -0700)]
[PATCH] ppc64: Fix irq parsing on powermac
When I tried Ben's patches to the powermac sound driver on my G5, I found
that it was taking enormous numbers of sound DMA transmit interrupts. This
turned out to be because it was incorrectly configured as level-sensitive
instead of edge-sensitive, which in turn was because the code that parses
the interrupt tree that Open Firmware gives us was incorrectly assigning
another device the same irq number as the sound DMA transmit interrupt
(i.e. 1).
This patch fixes the problem, in a somewhat quick and dirty way for now,
but one which will work for all the machines we currently run on.
Ultimately Ben and I want to do something more general and robust, but this
should go in for 2.6.12.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch from Roland adds a PT_NOTE section to both 32 and 64 bits vDSOs
to expose the kernel version to glibc, thus avoiding a uname syscall on
every launch. This is equivalent to the patches Roland posted already for
x86 and x86-64.
Note: the 64 bits .note is actually using the 32 bits format. This is
normal. The ELF spec specifies a different format for 64 bits .note, but
for some reason, this was never properly implemented, the core dumps for
example are all using 32 bits format .note, and binutils cannot even read a
64 bits format .note. Talking to our toolchain folks, they think we'd
rather stick to 32 bits format .note everywhere and get the spec fixed some
day ...
Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Colin Leroy [Sun, 1 May 2005 15:58:43 +0000 (08:58 -0700)]
[PATCH] pmac: save master volume on sleep
Ben's patch that shutdowns master switch and restores it after resume
("pmac: Improve sleep code of tumbler driver") isn't enough here on an
iBook (snapper chip).
The master switch is correctly saved and restored, but somehow
tumbler_put_master_volume() gets called just after
tumbler_set_master_volume() and sets mix->master_vol[*] to 0. So, on
resuming, the master switch is reenabled, but the volume is set to 0.
Here's a patch that also saves and restores master_vol.
Signed-off-by: Colin Leroy <colin@colino.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch applies on top of my previous g5 related sound patches and adds
support for the Mac Mini to the PowerMac Alsa driver.
However, I haven't found any kind of HW support for volume control on this
machine. If it exist, it's well hidden. That means that you probably want
to make sure you use software with the ability to do soft volume control,
or use Alsa 0.9 pre-release with the softvol plugin.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch fixes a couple more issues with the management of the GPIOs
dealing with headphone and line out mute on the G5. It should fix the
remaining problems of people not getting any sound out of the headphone
jack.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some earlier models of aluminium powerbooks and ibook G4s have a clock chip
that requires some tweaking before and after sleep. It seems that without
that magic incantation to disable and re-enable clock spreading, RAM isn't
properly refreshed during sleep. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andreas Jaggi [Sun, 1 May 2005 15:58:41 +0000 (08:58 -0700)]
[PATCH] macintosh/adbhid.c: adb buttons support for aluminium PowerBook G4
This patch adds support for the special adb buttons of the aluminium
PowerBook G4.
Signed-off-by: Andreas Jaggi <andreas.jaggi@waterwave.ch> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I noticed an occasional crash on wakeup from sleep on my powerbook
(strangly never happened before, probably timing related) that appears to
be due to a dangling interrupt while the chip is put to sleep and beeing
reset on wakeup.
This patch fixes is by disabling the irq in the ide pmac driver while
asleep and only re-enable it after the chip has been fully reset. This is
safe to do so as the interrupt of these apple IDE cells is never shared.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Elston [Sun, 1 May 2005 15:58:40 +0000 (08:58 -0700)]
[PATCH] ppc32: fix for misreported SDRAM size on Radstone PPC7D platform
This patch fixes the SDRAM output from /proc/cpuinfo. The previous code
assumed that there was only one bank of SDRAM, and that the size in the memory
configuration register was the total size.
Signed-off-by: Chris Elston <chris.elston@radstone.co.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Sun, 1 May 2005 15:58:40 +0000 (08:58 -0700)]
[PATCH] ppc32: refactor FPU exception handling
Moved common FPU exception handling code out of head.S so it can be used by
several of the sub-architectures that might of a full PowerPC FPU.
Also, uses new CONFIG_PPC_FPU define to fix alignment exception handling
for floating point load/store instructions to only occur if we have a
hardware FPU.
Signed-off-by: Jason McMullan <jason.mcmullan@timesys.com> Signed-off-by: Kumar Gala <kumar.gala@freescale.com> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Some G3 CPUs can crash in funny way if a store from an FPU register
instruction is executed on a register that has never been initialized since
power on. This patch fixes it by making sure all FP registers have been
properly initialized at kernel boot and when waking from sleep. It also makes
the code that decides wether HID0_BTIC and HID0_DPM are allowed on a given CPU
smarter (it can actually _clear_ them now if they are not allowed instead of
just setting them when they are allowed in case the firmware got them wrong)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
James Morris [Sun, 1 May 2005 15:58:40 +0000 (08:58 -0700)]
[PATCH] SELinux: add finer grained permissions to Netlink audit processing
This patch provides finer grained permissions for the audit family of
Netlink sockets under SELinux.
1. We need a way to differentiate between privileged and unprivileged
reads of kernel data maintained by the audit subsystem. The AUDIT_GET
operation is unprivileged: it returns the current status of the audit
subsystem (e.g. whether it's enabled etc.). The AUDIT_LIST operation
however returns a list of the current audit ruleset, which is considered
privileged by the audit folk. To deal with this, a new SELinux
permission has been implemented and applied to the operation:
nlmsg_readpriv, which can be allocated to appropriately privileged
domains. Unprivileged domains would only be allocated nlmsg_read.
2. There is a requirement for certain domains to generate audit events
from userspace. These events need to be collected by the kernel,
collated and transmitted sequentially back to the audit daemon. An
example is user level login, an auditable event under CAPP, where
login-related domains generate AUDIT_USER messages via PAM which are
relayed back to auditd via the kernel. To prevent handing out
nlmsg_write permissions to such domains, a new permission has been
added, nlmsg_relay, which is intended for this type of purpose: data is
passed via the kernel back to userspace but no privileged information is
written to the kernel.
Also, AUDIT_LOGIN messages are now valid only for kernel->user messaging,
so this value has been removed from the SELinux nlmsgtab (which is only
used to check user->kernel messages).
Signed-off-by: James Morris <jmorris@redhat.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Stephen Smalley [Sun, 1 May 2005 15:58:39 +0000 (08:58 -0700)]
[PATCH] SELinux: cleanup ipc_has_perm
This patch removes the sclass argument from ipc_has_perm in the SELinux
module, as it can be obtained from the ipc security structure. The use of
a separate argument was a legacy of the older precondition function
handling in SELinux and is obsolete. Please apply.
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
akpm@osdl.org [Sun, 1 May 2005 15:58:39 +0000 (08:58 -0700)]
[PATCH] drop_buffers() oops fix
In rare situations, drop_buffers() can be called for a page which has buffers,
but no ->mapping (it was truncated, but the buffers were left behind because
ext3 was still fiddling with them).
But if there was an I/O error in a buffer_head, drop_buffers() will try to get
at the address_space and will oops.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Manfred Spraul [Sun, 1 May 2005 15:58:38 +0000 (08:58 -0700)]
[PATCH] add kmalloc_node, inline cleanup
The patch makes the following function calls available to allocate memory
on a specific node without changing the basic operation of the slab
allocator:
kmem_cache_alloc_node(kmem_cache_t *cachep, unsigned int flags, int node);
kmalloc_node(size_t size, unsigned int flags, int node);
in a similar way to the existing node-blind functions:
kmem_cache_alloc(kmem_cache_t *cachep, unsigned int flags);
kmalloc(size, flags);
kmem_cache_alloc_node was changed to pass flags and the node information
through the existing layers of the slab allocator (which lead to some minor
rearrangements). The functions at the lowest layer (kmem_getpages,
cache_grow) are already node aware. Also __alloc_percpu can call
kmalloc_node now.
Performance measurements (using the pageset localization patch) yields:
w/o patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.27 100 484.2736 12.02 1.97 Wed Mar 30 20:50:43 2005
100 25170.83 91 251.7083 23.12 150.10 Wed Mar 30 20:51:06 2005
200 34601.66 84 173.0083 33.64 294.14 Wed Mar 30 20:51:40 2005
300 37154.47 86 123.8482 46.99 436.56 Wed Mar 30 20:52:28 2005
400 39839.82 80 99.5995 58.43 580.46 Wed Mar 30 20:53:27 2005
500 40036.32 79 80.0726 72.68 728.60 Wed Mar 30 20:54:40 2005
600 44074.21 79 73.4570 79.23 872.10 Wed Mar 30 20:55:59 2005
700 44016.60 78 62.8809 92.56 1015.84 Wed Mar 30 20:57:32 2005
800 40411.05 80 50.5138 115.22 1161.13 Wed Mar 30 20:59:28 2005
900 42298.56 79 46.9984 123.83 1303.42 Wed Mar 30 21:01:33 2005
1000 40955.05 80 40.9551 142.11 1441.92 Wed Mar 30 21:03:55 2005
with pageset localization and slab API patches:
Tasks jobs/min jti jobs/min/task real cpu
1 484.19 100 484.1930 12.02 1.98 Wed Mar 30 21:10:18 2005
100 27428.25 92 274.2825 21.22 149.79 Wed Mar 30 21:10:40 2005
200 37228.94 86 186.1447 31.27 293.49 Wed Mar 30 21:11:12 2005
300 41725.42 85 139.0847 41.84 434.10 Wed Mar 30 21:11:54 2005
400 43032.22 82 107.5805 54.10 582.06 Wed Mar 30 21:12:48 2005
500 42211.23 83 84.4225 68.94 722.61 Wed Mar 30 21:13:58 2005
600 40084.49 82 66.8075 87.12 873.11 Wed Mar 30 21:15:25 2005
700 44169.30 79 63.0990 92.24 1008.77 Wed Mar 30 21:16:58 2005
800 43097.94 79 53.8724 108.03 1155.88 Wed Mar 30 21:18:47 2005
900 41846.75 79 46.4964 125.17 1303.38 Wed Mar 30 21:20:52 2005
1000 40247.85 79 40.2478 144.60 1442.21 Wed Mar 30 21:23:17 2005
Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The smp_mb() is becaus sync_page() doesn't have PG_locked while it accesses
page_mapping(page). The comments in the patch (the entire patch is the
addition of this comment) try to explain further how and why smp_mb() is
used.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This is a patch for counting the number of pages for bounce buffers. It's
shown in /proc/vmstat.
Currently, the number of bounce pages are not counted anywhere. So, if
there are many bounce pages, it seems that there are leaked pages. And
it's difficult for a user to imagine the usage of bounce pages. So, it's
meaningful to show # of bouce pages.
Nick Piggin [Sun, 1 May 2005 15:58:37 +0000 (08:58 -0700)]
[PATCH] mempool: simplify alloc
Mempool is pretty clever. Looks too clever for its own good :) It
shouldn't really know so much about page reclaim internals.
- don't guess about what effective page reclaim might involve.
- don't randomly flush out all dirty data if some unlikely thing
happens (alloc returns NULL). page reclaim can (sort of :P) handle
it.
I think the main motivation is trying to avoid pool->lock at all costs.
However the first allocation is attempted with __GFP_WAIT cleared, so it
will be 'can_try_harder' if it hits the page allocator. So if allocation
still fails, then we can probably afford to hit the pool->lock - and what's
the alternative? Try page reclaim and hit zone->lru_lock?
A nice upshot is that we don't need to do any fancy memory barriers or do
(intentionally) racy access to pool-> fields outside the lock.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sun, 1 May 2005 15:58:36 +0000 (08:58 -0700)]
[PATCH] mempool: NOMEMALLOC and NORETRY
Mempools have 2 problems.
The first is that mempool_alloc can possibly get stuck in __alloc_pages
when they should opt to fail, and take an element from their reserved pool.
The second is that it will happily eat emergency PF_MEMALLOC reserves
instead of going to their reserved pools.
Fix the first by passing __GFP_NORETRY in the allocation calls in
mempool_alloc. Fix the second by introducing a __GFP_MEMPOOL flag which
directs the page allocator not to allocate from the reserve pool.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Nick Piggin [Sun, 1 May 2005 15:58:36 +0000 (08:58 -0700)]
[PATCH] mm: pcp use non powers of 2 for batch size
Jack Steiner reported this to have fixed his problem (bad colouring):
"The patches fix both problems that I found - bad
coloring & excessive pages in pagesets."
In most workloads this is not likely to be such a pronounced problem,
however it should help corner cases. And avoiding powers of 2 in these
types of memory operations is always a good idea.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
akpm@osdl.org [Sun, 1 May 2005 15:58:35 +0000 (08:58 -0700)]
[PATCH] RLIMIT_AS checking fix
Address bug #4508: there's potential for wraparound in the various places
where we perform RLIMIT_AS checking.
(I'm a bit worried about acct_stack_growth(). Are we sure that vma->vm_mm is
always equal to current->mm? If not, then we're comparing some other
process's total_vm with the calling process's rlimits).
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
akpm@osdl.org [Sun, 1 May 2005 15:58:35 +0000 (08:58 -0700)]
[PATCH] generic_file_buffered_write fixes
Anton Altaparmakov <aia21@cam.ac.uk> points out:
- It calls fault_in_pages_readable() which is completely bogus if @nr_segs >
1. It needs to be replaced by a to be written
"fault_in_pages_readable_iovec()".
- It increments @buf even in the iovec case thus @buf can point to random
memory really quickly (in the iovec case) and then it calls
fault_in_pages_readable() on this random memory.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg [Sat, 30 Apr 2005 23:51:42 +0000 (16:51 -0700)]
[PATCH] kbuild: Set NOSTDINC_FLAGS late to speed up compile (a little)
Move definition of NOSTDINC_FLAGS below inclusion of arch Makefile, so
any arch specific settings to $(CC) takes effect before looking up the
compiler include directory.
The previous solution that replaced ':=' with '=' caused gcc to be
invoked one additional time for each directory visited.
This decreases kernel compile time with 0.1 second (3.6 -> 3.5 seconds) when
running make on a fully built kernel
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg [Sat, 30 Apr 2005 23:51:42 +0000 (16:51 -0700)]
[PATCH] kbuild/ppc: tell when uimage was not built
Tom Rini said:
Note that there is still a trivial'ish change to make. When mkimage
doesn't exist on the host we should say "uImage not made" or
something similar.
So I did like Tom asked.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Sam Ravnborg [Sat, 30 Apr 2005 23:51:42 +0000 (16:51 -0700)]
[PATCH] kbuild/i386: re-introduce dependency on vmlinux for install target, and add kernel_install
Removing the dependency on vmlinux for the install target raised a few
complaints, so instead a new target i added: kernel_install.
kernel_install will install the kernel just like the ordinary install target.
The only difference is that install has a dependency on vmlinux,
kernel_install does not. Therefore kernel_install is the best choice
when accessing the kernel over a NFS mount or as another user.
kernel_install is similar to modules_install in the fact that neither does
a full kernel compile before performing the install.
In this way they are good for root use. Also added back the
dependency on vmlinux for the install target so peoples scripts are no
longer broken.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Sat, 30 Apr 2005 17:01:40 +0000 (10:01 -0700)]
[PATCH] ppc64: fix 32-bit signal frame back link
When the kernel creates a signal frame on the user stack, it puts the
old stack pointer value at the beginning so that the signal frame is
linked into the chain of stack frames like any other frame.
Unfortunately, for 32-bit processes we are writing the old stack
pointer as a 64-bit value rather than a 32-bit value, and the process
sees that as a null pointer, since it only looks at the first 32 bits,
which are zero since ppc is bigendian and the stack pointer is below
4GB. This bug is in SLES9 and RHEL4 too, hence the ccs.
This patch fixes the bug by making the signal code write the old stack
pointer as a u32 instead of an unsigned long.
Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] ARM: 2654/1: i.MX UART initialization sets and honors UFCR value
Patch from Sascha Hauer
This patch adds UCFR_RFDIV setting into i.MX serial driver.
This is required, if loader does not fully agree with Linux kernel
about UART setup manner. Linux only blindly expected some values until
now. This should enable to use even serial ports not recognized by
boot-loader as for example third UART found in the bluethoot module.
Patch also enables to detect original setup baudrate in more cases.
Signed-off-by: Pavel Pisa Signed-off-by: Sascha Hauer Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] ARM: 2660/2: fix ixdp2800 boot and pci init
Patch from Lennert Buytenhek
The IXDP2800 is an evalution platform for the IXP2800 processor that
has two IXP2800s connected to the same PCI bus. This is problematic
as both CPUs will try to configure the PCI bus as they boot linux.
Contrary to on the other IXP2000 platforms, the boot loader on the
IXDP2800 doesn't configure the PCI bus properly, so we do want the
linux instance on one of the CPUs to do that.
Making one of the CPUs ignore the PCI bus (and thus act like a pure
PCI slave device) is not an option because there is a 82559 NIC on
the PCI bus for each of the CPUs.
The chosen solution is to have the master CPU configure the PCI bus
while the slave is kept in a quiescent state, and then to have the
slave CPU scan the PCI bus (without assigning resources) while the
master is kept in a quiescent state. After this ritual, the master
deletes the slave NIC from its PCI device list, the slave deletes
the master NIC from its device list, and (almost) all is well.
There's still one little problem: each of the CPUs has a 1G SDRAM
BAR, but the IXP2000 only has 512M of outbound PCI memory window.
We solve this by hand-assigning the master and slave SDRAM BARs to
a location outside each of the IXP's outbound PCI windows, and by
having the rest of the BARs autoconfigured in the outbound PCI
windows, in the range [e0000000..ffffffff], so that there is a 1:1
pci:phys mapping between them.
Even with this patch, a number of issues still remain -- just imagine
what happens if one of the CPUs is rebooted, by watchdog or by hand,
but the other one isn't. But those issues are not easily fixable
given the strange PCI layout of this board and the behavior of the
boot loader shipped with the platform.
Signed-off-by: Lennert Buytenhek Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
George G. Davis [Fri, 29 Apr 2005 21:08:35 +0000 (22:08 +0100)]
[PATCH] ARM: 2656/1: Access permission bits are wrong for kernel XIP sections on ARMv6
Patch from George G. Davis
This patch is required for kernel XIP support on ARMv6 machines. It ensures that the access permission bits for kernel XIP section descriptors are APX=1 and AP[1:0]=01, which is Kernel read-only/User no access permissions. Prior to this change, kernel XIP section descriptor access permissions were set to Kernel no access/User no access on ARMv6 machines and the kernel would therefore hang upon entry to userspace when set_fs(USER_DS) was executed.
Signed-off-by: Steve Longerbeam Signed-off-by: George G. Davis Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Olav Kongas [Fri, 29 Apr 2005 21:08:34 +0000 (22:08 +0100)]
[PATCH] ARM: 2649/1: Fix 'sparse -Wbitwise' warnings from MMIO macros
Patch from Olav Kongas
On ARM, the outX() and writeX() families of macros take the
result of cpu_to_leYY(), which is of restricted type __leYY,
and feed it to __raw_writeX(), which expect an argument of
unrestricted type. This results in 'sparse -Wbitwise'
warnings about incorrect types in assignments. Analogous
type mismatch warnings are issued for inX() and readX()
counterparts. The below patch resolves these warnings by
adding forced typecasts.
Signed-off-by: Olav Kongas Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Nicolas Pitre [Fri, 29 Apr 2005 21:08:33 +0000 (22:08 +0100)]
[PATCH] ARM: 2651/3: kernel helpers for NPTL support
Patch from Nicolas Pitre
This patch entirely reworks the kernel assistance for NPTL on ARM.
In particular this provides an efficient way to retrieve the TLS
value and perform atomic operations without any instruction emulation
nor special system call. This even allows for pre ARMv6 binaries to
be forward compatible with SMP systems without any penalty.
The problematic and performance critical operations are performed
through segment of kernel provided user code reachable from user space
at a fixed address in kernel memory. Those fixed entry points are
within the vector page so we basically get it for free as no extra
memory page is required and nothing else may be mapped at that
location anyway.
This is different from (but doesn't preclude) a full blown VDSO
implementation, however a VDSO would prevent some assembly tricks with
constants that allows for efficient branching to those code segments.
And since those code segments only use a few cycles before returning to
user code, the overhead of a VDSO far call would add a significant
overhead to such minimalistic operations.
The ARM_NR_set_tls syscall also changed number. This is done for two
reasons:
1) this patch changes the way the TLS value was previously meant to be
retrieved, therefore we ensure whatever library using the old way
gets fixed (they only exist in private tree at the moment since the
NPTL work is still progressing).
2) the previous number was allocated in a range causing an undefined
instruction trap on kernels not supporting that syscall and it was
determined that allocating it in a range returning -ENOSYS would be
much nicer for libraries trying to determine if the feature is
present or not.
Signed-off-by: Nicolas Pitre Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As noted in http://www.arm.com/linux/patch-2.6.9-arm1.gz, the "Faulty SWP instruction on 1136 doesn't set bit 11 in DFSR." So the v6_early_abort handler does not report the correct rd/wr direction for the SWP instruction which may result in SEGVS or hangs. In order to work around this problem, this patch merely updates the fix contained in the ARM Ltd. patch to use the macroised abort handler fixups.
Signed-off-by: George G. Davis Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] ARM: 2659/1: do not assign PCI I/O address zero on IXP2000
Patch from Lennert Buytenhek
Assigning the address zero to a PCI device BAR causes some part of the
PCI subsystem to believe that resource allocation for that BAR failed
due to resource conflicts, which will make attempts to enable the
device fail. Work around this by assigning I/O addresses starting
from 00010000.
While we're at it, make the PCI I/O resource end at 0001ffff, since we
only have 64k of outbound I/O window on the IXP2000, and we don't do
bank switching.
Signed-off-by: Lennert Buytenhek Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
[PATCH] ARM: 2658/1: start ixp2000 pci memory resource at 0xe0000000
Patch from Lennert Buytenhek
On the IXDP2800, the bootloader does an awful job of configuring
the PCI bus, so we make linux reconfigure everything. Having a 1:1
pci:phys address mapping generally simplifies everything, so try to
allocate PCI addresses from the [e0000000..ffffffff] range, which is
the physical address range of the outbound PCI window on the IXP2000.
This does not affect any of the other IXP2000 platforms since they
all use their bootloader's PCI resource assignment.
Signed-off-by: Lennert Buytenhek Signed-off-by: Deepak Saxena Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
In order to properly fix some issues with cpufreq vs. sleep on
PowerBooks, I had to add a suspend callback to the pmac_cpufreq driver.
I must force a switch to full speed before sleep and I switch back to
previous speed on resume.
I also added a driver flag to disable the warnings in suspend/resume
since it is expected in this case to have different speed (and I want it
to fixup the jiffies properly).
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steve French [Fri, 29 Apr 2005 05:41:08 +0000 (22:41 -0700)]
[PATCH] cifs: Fix caching problem
pointed out by Dave Stahl and Vince Negri in which cifs can update the
last modify time on a server modified file without invalidating the
local cached data due to an intervening readdir.
Signed-off-by: Steve French (sfrench@us.ibm.com) Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Steve French [Fri, 29 Apr 2005 05:41:07 +0000 (22:41 -0700)]
[PATCH] cifs: Do not use large smb buffers in response path
unless response is larger than 256 bytes. This cuts more than 1/3 of
the large memory allocations that cifs does and should be a huge help to
memory pressure under stress.
Signed-off-by: Steve French (sfrench@us.ibm.com) Signed-off-by: Linus Torvalds <torvalds@osdl.org>