git.karo-electronics.de Git - linux-beck.git/log

[PATCH] Fix kerneldoc comments in mm/vmalloc.c

The empty line between the short description and the first argument
description causes a section to appear twice in the generated manpage.
Also the short description should really be short: the script can't handle
multiple lines.

Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mm/page_alloc: use NULL instead of 0 for ptr

Use NULL instead of 0 for pointer value, eliminate sparse warnings.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] mspec driver

Implement the special memory driver (mspec) based on the do_no_pfn
approach. The driver is currently used only on SN2 hardware with special
fetchop support but could be beneficial on other architectures using the
uncached mode.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] do_no_pfn()

Implement do_no_pfn() for handling mapping of memory without a struct page
backing it. This avoids creating fake page table entries for regions which
are not backed by real memory.

This feature is used by the MSPEC driver and other users, where it is
highly undesirable to have a struct page sitting behind the page (for
instance if the page is accessed in cached mode via the struct page in
parallel to the the driver accessing it uncached, which can result in data
corruption on some architectures, such as ia64).

This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than
expect all negative pfn values would be an error. It also bugs on cow
mappings as this would not work with the VM.

[akpm@osdl.org: micro-optimise]
Signed-off-by: Jes Sorensen <jes@sgi.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] zone_statistics: Use hot node instead of cold zone_pgdat

Now that we have the node in the hot zone of struct zone we can avoid
accessing zone_pgdat in zone_statistics.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Do not allocate pagesets for unpopulated zones.

We do not need to allocate pagesets for unpopulated zones.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Add node to zone for the NUMA case

Add the node in order to optimize zone_to_nid.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] GFP_THISNODE for the slab allocator

This patch insures that the slab node lists in the NUMA case only contain
slabs that belong to that specific node.  All slab allocations use
GFP_THISNODE when calling into the page allocator.  If an allocation fails
then we fall back in the slab allocator according to the zonelists appropriate
for a certain context.

This allows a replication of the behavior of alloc_pages and alloc_pages node
in the slab layer.

Currently allocations requested from the page allocator may be redirected via
cpusets to other nodes.  This results in remote pages on nodelists and that in
turn results in interrupt latency issues during cache draining.  Plus the slab
is handing out memory as local when it is really remote.

Fallback for slab memory allocations will occur within the slab allocator and
not in the page allocator.  This is necessary in order to be able to use the
existing pools of objects on the nodes that we fall back to before adding more
pages to a slab.

The fallback function insures that the nodes we fall back to obey cpuset
restrictions of the current context.  We do not allocate objects from outside
of the current cpuset context like before.

Note that the implementation of locality constraints within the slab allocator
requires importing logic from the page allocator.  This is a mischmash that is
not that great.  Other allocators (uncached allocator, vmalloc, huge pages)
face similar problems and have similar minimal reimplementations of the basic
fallback logic of the page allocator.  There is another way of implementing a
slab by avoiding per node lists (see modular slab) but this wont work within
the existing slab.

V1->V2:
- Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA
- Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another
  #ifdef

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Disable GFP_THISNODE in the non-NUMA case

GFP_THISNODE must be set to 0 in the non numa case otherwise we disable retry
and warnings for failing allocations in the SMP and UP case.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA

The NUMA_BUILD constant is always available and will be set to 1 on
NUMA_BUILDs. That way checks valid only under CONFIG_NUMA can easily be done
without #ifdef CONFIG_NUMA

F.e.

if (NUMA_BUILD && <numa_condition>) {
...
}

[akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is
causing ifdef explosion in core kernel, so let's see if this is a comfortable
way in whcih to control that]

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Condense output of show_free_areas()

On larger systems, the amount of output dumped on the console when you do
SysRq-M is beyond insane.  This patch is trying to reduce it somewhat as
even with the smaller NUMA systems that have hit the desktop this seems to
be a fair thing to do.

The philosophy I have taken is as follows:
1) If a zone is empty, don't tell, we don't need yet another line
    telling us so. The information is available since one can look up
    the fact how many zones were initialized in the first place.
2) Put as much information on a line is possible, if it can be done
    in one line, rahter than two, then do it in one. I tried to format
    the temperature stuff for easy reading.

Change show_free_areas() to not print lines for empty zones.  If no zone
output is printed, the zone is empty.  This reduces the number of lines
dumped to the console in sysrq on a large system by several thousand lines.

Change the zone temperature printouts to use one line per CPU instead of
two lines (one hot, one cold).  On a 1024 CPU, 1024 node system, this
reduces the console output by over a million lines of output.

While this is a bigger problem on large NUMA systems, it is also applicable
to smaller desktop sized and mid range NUMA systems.

Old format:

Mem-info:
Node 0 DMA per-cpu:
cpu 0 hot: high 42, batch 7 used:24
cpu 0 cold: high 14, batch 3 used:1
cpu 1 hot: high 42, batch 7 used:34
cpu 1 cold: high 14, batch 3 used:0
cpu 2 hot: high 42, batch 7 used:0
cpu 2 cold: high 14, batch 3 used:0
cpu 3 hot: high 42, batch 7 used:0
cpu 3 cold: high 14, batch 3 used:0
cpu 4 hot: high 42, batch 7 used:0
cpu 4 cold: high 14, batch 3 used:0
cpu 5 hot: high 42, batch 7 used:0
cpu 5 cold: high 14, batch 3 used:0
cpu 6 hot: high 42, batch 7 used:0
cpu 6 cold: high 14, batch 3 used:0
cpu 7 hot: high 42, batch 7 used:0
cpu 7 cold: high 14, batch 3 used:0
Node 0 DMA32 per-cpu: empty
Node 0 Normal per-cpu: empty
Node 0 HighMem per-cpu: empty
Node 1 DMA per-cpu:
[snip]
Free pages:     5410688kB (0kB HighMem)
Active:9536 inactive:4261 dirty:6 writeback:0 unstable:0 free:338168 slab:1931 mapped:1900 pagetables:208
Node 0 DMA free:1676304kB min:3264kB low:4080kB high:4896kB active:128048kB inactive:61568kB present:1970880kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 HighMem free:0kB min:512kB low:512kB high:512kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 1 DMA free:1951728kB min:3280kB low:4096kB high:4912kB active:5632kB inactive:1504kB present:1982464kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
....

New format:

Mem-info:
Node 0 DMA per-cpu:
CPU    0: Hot: hi:   42, btch:   7 usd:  41   Cold: hi:   14, btch:   3 usd:   2
CPU    1: Hot: hi:   42, btch:   7 usd:  40   Cold: hi:   14, btch:   3 usd:   1
CPU    2: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
CPU    3: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
CPU    4: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
CPU    5: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
CPU    6: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
CPU    7: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
Node 1 DMA per-cpu:
[snip]
Free pages:     5411088kB (0kB HighMem)
Active:9558 inactive:4233 dirty:6 writeback:0 unstable:0 free:338193 slab:1942 mapped:1918 pagetables:208
Node 0 DMA free:1677648kB min:3264kB low:4080kB high:4896kB active:129296kB inactive:58864kB present:1970880kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 1 DMA free:1948448kB min:3280kB low:4096kB high:4912kB active:6864kB inactive:3536kB present:1982464kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id()

kmalloc_node() falls back to ___cache_alloc() under certain conditions and
at that point memory policies may be applied redirecting the allocation
away from the current node. Therefore kmalloc_node(...,numa_node_id()) or
kmalloc_node(...,-1) may not return memory from the local node.

Fix this by doing the policy check in __cache_alloc() instead of
____cache_alloc().

This version here is a cleanup of Kiran's patch.

- Tested on ia64.
- Extra material removed.
- Consolidate the exit path if alternate_node_alloc() returned an object.

[akpm@osdl.org: warning fix]
Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com>
Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Shai Fultheim <shai@scalex86.org>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] page invalidation cleanup

Clean up the invalidate code, and use a common function to safely remove
the page from pagecache.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] own header file for struct page

This moves the definition of struct page from mm.h to its own header file
page-struct.h.  This is a prereq to fix SetPageUptodate which is broken on
s390:

#define SetPageUptodate(_page)
       do {
               struct page *__page = (_page);
               if (!test_and_set_bit(PG_uptodate, &__page->flags))
                       page_test_and_clear_dirty(_page);
       } while (0)

_page gets used twice in this macro which can cause subtle bugs.  Using
__page for the page_test_and_clear_dirty call doesn't work since it causes
yet another problem with the page_test_and_clear_dirty macro as well.

In order to avoid all these problems caused by macros it seems to be a good
idea to get rid of them and convert them to static inline functions.
Because of header file include order it's necessary to have a seperate
header file for the struct page definition.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] vm: add per-zone writeout counter

The VM is supposed to minimise the number of pages which get written off the
LRU (for IO scheduling efficiency, and for high reclaim-success rates). But
we don't actually have a clear way of showing how true this is.

So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of
pages which have been written by the vm scanner in this zone and globally.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Allow an arch to expand node boundaries

Arch-independent zone-sizing determines the size of a node
(pgdat->node_spanned_pages) based on the physical memory that was
registered by the architecture. However, when
CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
spanned_pages will be much larger and that mem_map will be allocated that
is used lated on memory hot-add.

This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
to call push_node_boundaries() which will set the node beginning and end to
at *least* the requested boundary.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Account for holes that are outside the range of physical memory

absent_pages_in_range() made the assumption that users of the API would not
care about holes beyound the end of physical memory. This was not the
case. This patch will account for ranges outside of physical memory as
holes correctly.

Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Account for memmap and optionally the kernel image as holes

The x86_64 code accounted for memmap and some portions of the the DMA zone as
holes.  This was because those areas would never be reclaimed and accounting
for them as memory affects min watermarks.  This patch will account for the
memmap as a memory hole.  Architectures may optionally use set_dma_reserve()
if they wish to account for a portion of memory in ZONE_DMA as a hole.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Have ia64 use add_active_range() and free_area_init_nodes

Size zones and holes in an architecture independent manner for ia64.

[bob.picco@hp.com: fix ia64 FLATMEM+VIRTUAL_MEM_MAP]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Have x86_64 use add_active_range() and free_area_init_nodes

Size zones and holes in an architecture independent manner for x86_64.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Have x86 use add_active_range() and free_area_init_nodes

Size zones and holes in an architecture independent manner for x86.

[akpm@osdl.org: build fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Have Power use add_active_range() and free_area_init_nodes()

Size zones and holes in an architecture independent manner for Power.

[judith@osdl.org: build fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Introduce mechanism for registering active regions of memory

At a basic level, architectures define structures to record where active
ranges of page frames are located.  Once located, the code to calculate zone
sizes and holes in each architecture is very similar.  Some of this zone and
hole sizing code is difficult to read for no good reason.  This set of patches
eliminates the similar-looking architecture-specific code.

The patches introduce a mechanism where architectures register where the
active ranges of page frames are with add_active_range().  When all areas have
been discovered, free_area_init_nodes() is called to initialise the pgdat and
zones.  The zone sizes and holes are then calculated in an architecture
independent manner.

Patch 1 introduces the mechanism for registering and initialising PFN ranges
Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
It adjusts the watermarks slightly

Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
gensparse_defconfig and defconfig.  Bob Picco has also tested and debugged on
IA64.  Jack Steiner successfully boot tested on a mammoth SGI IA64-based
machine.  These were on patches against 2.6.17-rc1 and release 3 of these
patches but there have been no ia64-changes since release 3.

There are differences in the zone sizes for x86_64 as the arch-specific code
for x86_64 accounts the kernel image and the starting mem_maps as memory holes
but the architecture-independent code accounts the memory as present.

The big benefit of this set of patches is a sizable reduction of
architecture-specific code, some of which is very hairy.  There should be a
greater reduction when other architectures use the same mechanisms for zone
and hole sizing but I lack the hardware to test on.

Additional credit;
Dave Hansen for the initial suggestion and comments on early patches
Andy Whitcroft for reviewing early versions and catching numerous
errors
Tony Luck for testing and debugging on IA64
Bob Picco for fixing bugs related to pfn registration, reviewing a
number of patch revisions, providing a number of suggestions
on future direction and testing heavily
Jack Steiner and Robin Holt for testing on IA64 and clarifying
issues related to memory holes
Yasunori for testing on IA64
Andi Kleen for reviewing and feeding back about x86_64
Christian Kujau for providing valuable information related to ACPI
problems on x86_64 and testing potential fixes

This patch:

Define the structure to represent an active range of page frames within a node
in an architecture independent manner.  Architectures are expected to register
active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
free_area_init_nodes() passing the PFNs of the end of each zone.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Andi Kleen <ak@muc.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Keith Mannthey" <kmannth@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix x86_64-mm-spinlock-cleanup

We need processor.h for cpu_relax().

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Make kmem_cache_destroy() return void

un-, de-, -free, -destroy, -exit, etc functions should in general return
void. Also,

There is very little, say, filesystem driver code can do upon failed
kmem_cache_destroy(). If it will be decided to BUG in this case, BUG
should be put in generic code, instead.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] Really ignore kmem_cache_destroy return value

* Rougly half of callers already do it by not checking return value
* Code in drivers/acpi/osl.c does the following to be sure:

(void)kmem_cache_destroy(cache);

* Those who check it printk something, however, slab_error already printed
the name of failed cache.
* XFS BUGs on failed kmem_cache_destroy which is not the decision
low-level filesystem driver should make. Converted to ignore.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fs: Removing useless casts

* Removing useless casts
* Removing useless wrapper
* Conversion from kmalloc+memset to kzalloc

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Acked-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fs: Conversions from kmalloc+memset to k(z|c)alloc

Conversions from kmalloc+memset to kzalloc.

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Jffs2-bit-acked-by: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] more ext3 16T overflow fixes

Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
if they overflow they'll then underflow and things are fine.

5th hunk actually fixes an overflow problem.

Also check for potential overflows in inode & block counts when resizing.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: Fix sparse warnings

Fixing up some endian-ness warnings in preparation to clone ext4 from ext3.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: More whitespace cleanups

More white space cleanups in preparation of cloning ext4 from ext3.
Removing spaces that precede a tab.

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: wrong error behavior

SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
behavior was broken in linux kernels since 2.5.x versions by the following
patch:

2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
Default mount options from superblock for ext2/3 filesystems
http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

In case ext3 file system is mounted with errors=continue
(EXT3_ERRORS_CONTINUE) errors should be ignored when possible. However at
present in case of any error kernel aborts journal and remounts filesystem
to read-only. Such behavior was hit number of times and noted to differ
from that of 2.4.x kernels.

This patch fixes this:
- do nothing in case of EXT3_ERRORS_CONTINUE,
- set EXT3_MOUNT_ABORT and call journal_abort() in all other cases
- panic() should be called after ext3_commit_super() to save
sb marked as EXT3_ERROR_FS

Signed-off-by: Vasily Averin <vvs@sw.ru>
Acked-by: Kirill Korotaev <dev@sw.ru>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Stephen C. Tweedie" <sct@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: more comments about block allocation/reservation code

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: turn on reservation dump on block allocation errors

In the past there were a few kernel panics related to block reservation
tree operations failure (insert/remove etc). It would be very useful to
get the block allocation reservation map info when such error happens.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] JBD: 16T fixes

These are a few places I've found in jbd that look like they may not be
16T-safe, or consistent with the use of unsigned longs for block
containers. Problems here would be somewhat hard to hit, would require
journal blocks past the 8T boundary, which would not be terribly common.
Still, should fix.

(some of these have come from the ext4 work on jbd as well).

I think there's one more possibility that the wrap() function may not be
safe IF your last block in the journal butts right up against the 232 block
boundary, but that seems like a VERY remote possibility, and I'm not
worrying about it at this point.

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3: inode numbers are unsigned long

This is primarily format string fixes, with changes to ialloc.c where large
inode counts could overflow, and also pass around journal_inum as an
unsigned long, just to be pedantic about it....

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext2: fix mounts at 16T

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix ext3 mounts at 16T

I need to do some actual IO testing now, but this gets things mounting for
a 16T ext3 filesystem.  (patched up e2fsprogs is needed too, I'll send that
off the kernel list)

This patch fixes these issues in the kernel:

o sbi->s_groups_count overflows in ext3_fill_super()

sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
       le32_to_cpu(es->s_first_data_block) +
       EXT3_BLOCKS_PER_GROUP(sb) - 1) /
      EXT3_BLOCKS_PER_GROUP(sb);

  at 16T, s_blocks_count is already maxed out; adding
  EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
  Not really what we want, and causes a failed mount.

  Feel free to check my math (actually, please do!), but changing it this
  way should work & avoid the overflow:

  (A + B - 1)/B changed to: ((A - 1)/B) + 1

o ext3_check_descriptors() overflows range checks

  ext3_check_descriptors() iterates over all block groups making sure
  that various bits are within the right block ranges...  on the last pass
  through, it is checking the error case

   [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

  where "block" is the first block in the last block group.  The last
  block in this group (and the last one that will fit in 32 bits) is block
  + EXT3_BLOCKS_PER_GROUP(sb)- 1.  block + EXT3_BLOCKS_PER_GROUP(sb) wraps
  back around to 0.

  so, make things clearer with "first_block" and "last_block" where those
  are first and last, inclusive, and use <, > rather than <, >=.

  Finally, the last block group may be smaller than the rest, so account
  for this on the last pass through: last_block = sb->s_blocks_count - 1;

(a similar patch could be done for ext2; does anyone in their right mind
use ext2 at 16T?  I'll send an ext2 patch doing the same thing if that's
warranted)

Signed-off-by: Eric Sandeen <esandeen@redhat.com>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] jbd: use BUILD_BUG_ON in journal init

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Tweedie <sct@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] ext3 and jbd cleanup: remove whitespace

Remove whitespace from ext3 and jbd, before we clone ext4.

Signed-off-by: Mingming Cao<cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] jbd: add lock annotation to jbd_sync_bh

jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this
function so that sparse can check callers for lock pairing, and so that
sparse will not complain about this function since it intentionally uses
the lock in this manner.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

[PATCH] fix "cpu to node relationship fixup: map cpu to node"

Fix build error introduced by 3212fe1594e577463bc8601d28aa008f520c3377

Non-NUMA case should be handled.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/i2c-2.6: (30 commits)
  i2c: Drop unimplemented slave functions
  i2c: Constify i2c_algorithm declarations, part 2
  i2c: Constify i2c_algorithm declarations, part 1
  i2c: Let drivers constify i2c_algorithm data
  i2c-isa: Restore driver owner
  i2c-viapro: Add support for the VT8237A and VT8251
  i2c: Warn on i2c client creation failure
  i2c-core: Drop useless bitmaskings
  i2c-algo-pcf: Discard the mdelay data struct member
  i2c-algo-bit: Cleanups
  i2c-isa: Fail adding driver on attach_adapter error
  i2c: __must_check fixes (chip drivers)
  i2c-dev: attach/detach_adapter cleanups
  i2c-stub: Chip address as a module parameter
  i2c: Plan i2c-isa for removal
  i2c: New bus driver for TI OMAP boards
  i2c-algo-bit: Discard the mdelay data struct member
  i2c-matroxfb: Struct init conversion
  i2c: Fix copy-n-paste in subsystem Kconfig
  i2c-au1550: Add I2C support for Au1200
  ...

Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: (28 commits)
  pciehp - fix wrong return value
  IA64: PCI: dont disable irq which is not enabled
  acpiphp: add support for ioapic hot-remove
  PCI: assign ioapic resource at hotplug
  acpiphp: disable bridges
  acpiphp: stop bus device before acpi_bus_trim
  PCI: add pci_stop_bus_device
  acpiphp: do not initialize existing ioapics
  acpiphp: initialize ioapics before starting devices
  acpiphp: set hpp values before starting devices
  PCI Hotplug: cleanup pcihp skeleton code.
  PCI: Restore PCI Express capability registers after PM event
  PCI: drivers/pci/hotplug/acpiphp_glue.c: make a function static
  PCI: Multiprobe sanitizer
  PCI: fix __must_check warnings
  PCI Hotplug: fix __must_check warnings
  SHPCHP: fix __must_check warnings
  PCI-Express AER implemetation: pcie_portdrv error handler
  PCI-Express AER implemetation: AER core and aerdriver
  PCI-Express AER implemetation: export pcie_port_bus_type
  ...

[MIPS] setup.c: use early_param() for early command line parsing

There's no point to rewrite some logic to parse command line
to pass initrd parameters or to declare a user memory area.
We could use instead parse_early_param() that does the same
thing.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] setup.c: remove MAXMEM macro

It doesn't improve readability.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] setup.c: do not inline functions

There's no point to inline any functions in setup.c. Let's GCC
doing its job, it's good enough for that now.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] setup.c: remove useless includes.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] setup.c: move initrd code inside dedicated functions

NUMA specific code could rely on them too.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] setup.c: cleanup bootmem_init()

This function although doing simple thing is hard to follow. It's
mainly due to:

    - a lot of #ifdef
    - bad local names
    - redundant tests

So this patch try to address these issues. It also do not use
max_pfn global which is marked as an unused exported symbol.

As a bonus side, it's now really easy to see what part of the
code is for no-numa system.

There's also no point to make this function inline.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] get_wchan(): remove uses of mfinfo[64]

This array was used to 'cache' some frame info about scheduler
functions to speed up get_wchan(). This array was 1Ko size and
was only used when CONFIG_KALLSYMS was set but declared for all
configs.

Rather than make the array statement conditional, this patches
removes this array and its uses. Indeed the common case doesn't
seem to use this array and get_wchan() is not a critical path
anyways.

It results in a smaller bss and a smaller/cleaner code:

   text    data     bss     dec     hex filename
2543808  254148  139296 2937252  2cd1a4 vmlinux-new-get-wchan
2544080  254148  143392 2941620  2ce2b4 vmlinux~old

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] get_frame_info(): null function size means size is unknown

This patch adds 2 sanity checks.

The first one test that the start address of the function to analyze has been
set by the caller. If not return an error since nothing usefull can be done
without.

The second one checks that the function's size has been set. A null size can
happen if CONFIG_KALLSYMS is not set and it means that we don't know the size
of the function to analyze. In this case, we make it equal to 128 instructions
by default.

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] unwind_stack(): return ra if an exception occured at the first instruction

Signed-off-by: Franck Bui-Huu <vagabon.xyz@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Enable tmpfs for anything that possibly runs a full distribution.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] s/__ASSEMBLER__/__ASSEMBLY__/ for clarity sake.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Have headers_install install <asm/cachectl.h> and <asm/sysmips.h>.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] fstatat syscall names

MIPS is the only port to call its fstatat()-related syscalls
"__NR_fstatat".  Now I can see why that might be seen as every
other port being wrong, but I think for o32, it is at best confusing.
__NR_fstat provides a plain (32-bit) stat while __NR_fstatat provides a
64-bit stat.  Changing the name to __NR_fstatat64 would make things more
explicit, match x86, and make the glibc port slightly easier.

The current name is more appropriate for n32 and n64, but it would be
appropriate for other 64-bit targets too, and those targets have chosen
to call it __NR_newfstatat instead.  Using the same name for MIPS would
again be more consistent and make the glibc port slightly easier.

I'm not wedded to this idea if the current names are preferred,
but FWIW...

Signed-off-by: Richard Sandiford <richard@codesourcery.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] The o32 fstatat syscall behaves differently on 32 and 64 bit kernels

While working on a glibc patch to support the fstatat() functions[1],
I noticed that the o32 implementation behaves differently on 32-bit and
64-bit kernels; the former provides a stat64 while the latter provides
a plain (o32) stat. I think the former is what's intended, as there is
no separate fstatat64. It's also what x86 does.

I think this is just a case of a compat too far.

[1] I've seen Khem's patch, but I don't think it's right.

Signed-off-by: Richard Sandiford <richard@codesourcery.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Remove EV96100 as previously announced.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Replace BARRIER with more appropriate hazard barrier.

This is the unchanged part 2 of Chris' hazard cleanup.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Cleanup hazard handling.

Mostly based on patch by Chris Dearman and cleanups from Yoichi.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] IP27: Delete useless declaration of allocate_irqno().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix USER_PTRS_PER_PGD for 64K page size.

The code in pgtable-64.h assumes TASK_SIZE is always bigger than a first
level PGDIR_SIZE. This is not the case for 64K pages, where task size is
40 bits (1TB) and a pgd entry can map 42 bits. This leads to
USER_PTRS_PER_PGD being zero for 64K pages.

Signed-off-by: Peter Watkins <treestem@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Add configuration variables for RM9xxx processor

This patch introduces a number of configuration variables. These allow to
specify presence/absence of integrated peripherals found on the MIPS
RM9xxx processor family, based on the particular processor model used.

Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Move excite_fpga.h to include/asm-mips/mach-excite

excite_fpga.h, like all platform headers, really belongs in the
platform header directory.

Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Suppress compiler warnings

The excite platform exports hardware resources for device drivers to use.
Any driver wanting to use these resources will look up them by their names.
Since these resources are declared to have static linkage, but are not
used in the source file defining them, the compiler used to emit an
'unused' warning, which this patch suppresses.

Signed-off-by: Thomas Koeller <thomas.koeller@baslerweb.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Reformat missformated SMTC bits.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Qemu does not have D-cache aliases

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Remove F_SETSIG and F_GETSIG in favor of the asm-generic definitions.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Move definition of IRIX compat constant into IRIX compat code.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Use common definitions from asm-generic/signal.h

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] c-r4k: Convert init functions from inline to __init.

With more recent compilers inline doesn't necessarily means a function
will always be inlined. So leave that decission to the compiler and
make the function as __init.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] TLS: set_thread_area returns asmlinkage int not void.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] TLS: Delete unused sys32_set_thread_area

There is no need for a compat version.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Make PROT_WRITE imply PROT_READ.

[MIPS] Atlas: update interrupt handling

The following change updates the Atlas interrupt handling to match that
of Malta. Tested with a 5Kc and a 34Kf successfully.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Atlas: Fix building the RTC driver

Atlas maps its RTC chip in the host mmio space rather than using the
"traditional" location in the PCI/ISA port space. A change that has
happened to the generic RTC header requires to define ARCH_RTC_LOCATION
now.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Patch to arch/mips/mips-boards/generic/time.c

In hooking up the perf counter overflow interrupt to the experimental
deprecated-real-soon-now /proc/perf interface last night, I had to
revisit arch/mips/mips-boards/generic/time.c, and discovered that
when the 2.6.9-based SMTC prototype was merged with the more
recent tree, it was missed that arch/mips/kernel/time.c had changed
so that even in SMP kernels, timer_interrupt() calls
local_timer_interrupt(), so there is no longer a need to invoke it
directly from mips_timer_interrupt() in those cases where
timer_interrupt() has been called. So I got rid of that, and added the
invocation of perf_irq() if Cause.PCI is set, more-or-less following the
same logic as in the non-SMTC case, with the modifications that (a) a
runtime check for Release 2 isn't done, because it's redundant in SMTC),
and (b) we check for a clock interrupt regardless of the value returned
by the perf counter service - I don't understand why we'd want to control
that with perf_irq(), but maybe one of you knows the story. I also got
rid of the stupid warning about the unused variable when compiled for
SMTC (another artifact of the merge). The result hasn't been beaten to
death, but boots, seems stable, and supports extended precision event
counting.

Signed-off-by: Kevin D. Kissell <kevink@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Reduce race between cpu_wait() and need_resched() checking

If a thread became runnable between need_resched() and the WAIT
instruction, switching to the thread will delay until a next interrupt.
Some CPUs can execute the WAIT instruction with interrupt disabled, so
we can get rid of this race on them (at least UP case).

Original Patch by Atsushi with fixing up for MIPS Technology's cores by
Ralf based on feedback from the RTL designers.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Wire up set_robust_list(2) and get_robust_list(2)

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix errors detected by "make headers_check"

* export asm/sgidefs.h
* include asm/isadep.h only if in kernel
* do not export contents of asm/timex.h and asm/user.h

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Do not lose upper 32-bit on MIPS32 with 64-bit addresses in __pte().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Replace generic__raw_read_trylock usage

generic__raw_read_trylock() is a defect generic function actually doing
a __raw_read_lock ...

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] SEAD defconfig build fix

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix for pci config_access on alchemy au1x000

I've encountered a serious problem with PCI config space access on Au1x000
platforms with recent 2.6.x-kernel. With 2.4.31 the same hardware works fine.
So I was looking for the differences:

Symptoms:
- no PCI-device is seen on bootup though two or three cards are present
- lspci output is empty
- OR: lspci shows 20 times the same device
(- OR: in some slot-configurations it worked anyhow)

System(s):
1. platform with Au1500 and three PCI-devices (actually a mycable XXS1500
with backplane for three PCI-devices)
2. platform with Au1550 and two PCI-devices (custom board)

Debugging:
I digged down to the config_access() of the au1xxx-processors in
arch/mips/pci/ops-au1000.c and switched on DEBUG.

The code of config_access() seems to be almost the same as of the
2.4.x-kernel. But the "pci_cfg_vm->addr" returned by get_vm_area(0x2000, 0)
once on booting is different. That's of course not forbidden. But the
alignment seems to be wrong. In my case, I received:

2.4.31: pci_cfg_vm->addr = c0000000
2.6.18-rc5: pci_cfg_vm->addr = c0101000

To make it short: With 2.6.x it fails on the first config-access with:
"PCI ERR detected: status 83a00356".

Fixup:
My fix is now, to use the VM_IOREMAP-flag in the get_vm_area call. This flag
seems to be introduced in mm/vmalloc.c a long time ago (in 2.6.7-bk13, I
found in gitweb).
Now, the returned address is pci_cfg_vm->addr = c0104000 and everything works
fine.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Make prepare_frametrace() not clobber v0

Since lmo commit 323a380bf9e1a1679a774a2b053e3c1f2aa3f179 ("Simplify
dump_stack()") made prepare_frametrace() always inlined, using $2 (v0)
in __asm__ is not safe anymore. We can use $1 (at) instead. Also we
should use "dla" instead of "la" for 64-bit kernel.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Do not use drop_mmu_context to flusing other task's VIPT I-cache.

c-r4k.c and c-sb1.c use drop_mmu_context() to flush virtually tagged
I-caches, but this does not work for flushing other task's icache. This
is for example triggered by copy_to_user_page() called from ptrace(2).
Use indexed flush for such cases.

Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] MT: Fix setting of XTC.

XTC can only be set if VPA is clear, which it may not be. There is
also the possibility of a back to back c0 register access hazard to
take care of.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] SMTC Build fix.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix 32-bit kernel by replacing 64-bit-only code.

dclz() expects its 64-bit argument being passed as a single register
but on 32-bit kernels it'll actually be in a register pair.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] MT: When doing "select SMP" also select SMP's prerequesites or ...

... kconfig will do weird stuff.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] eXcite: Don't set SERIAL_RM9000.

The driver has not been merged yet so selecting it results in a warning
message.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Retire flush_icache_page from mm use.

On the 34K the redundant cache operations were causing excessive stalls
resulting in realtime code running on the second VPE missing its deadline.
For all other platforms this patch is just a significant performance
improvment as illustrated by below benchmark numbers.

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
25Kf      2.6.18-rc4     533 0.49 1.16 7.57 33.4 30.5 1.34 12.4 5497 17.K 54.K
25Kf      2.6.18-rc4-p   533 0.49 1.16 6.68 23.0 30.7 1.36 8.55 5030 16.K 48.K
4Kc       2.6.18-rc4      80 4.21 15.0 131. 289. 261. 16.5 258. 18.K 70.K 227K
4Kc       2.6.18-rc4-p    80 4.34 13.1 128. 285. 262. 18.2 258. 12.K 52.K 176K
34Kc      2.6.18-rc4      40 5.01 14.0 61.6 90.0 477. 17.9 94.7 29.K 108K 342K
34Kc      2.6.18-rc4-p    40 4.98 13.9 61.2 89.7 475. 17.6 93.7 8758 44.K 158K
BCM1480   2.6.18-rc4     700 0.28 0.60 3.68 5.92 16.0 0.78 5.08 931. 3163 15.K
BCM1480   2.6.18-rc4-p   700 0.28 0.61 3.65 5.85 16.0 0.79 5.20 395. 1464 8385
TX49-16K  2.6.18-rc3     197 0.73 2.41 19.0 37.8 82.9 2.94 17.5 4438 14.K 56.K
TX49-16K  2.6.18-rc3-p   197 0.73 2.40 19.9 36.3 82.9 2.94 23.4 2577 9103 38.K
TX49-32K  2.6.18-rc3     396 0.36 1.19 6.80 11.8 41.0 1.46 8.17 2738 8465 32.K
TX49-32K  2.6.18-rc3-p   396 0.36 1.19 6.82 10.2 41.0 1.46 8.18 1330 4638 18.K

Original patch by me with enhancements by Atsushi Nemoto.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>

[MIPS] Avoid double signal restarting.

In entry.S resume_userspace ... jal do_notify_resume form a loop through
which the kernel will iterate as long as work is pending. If we
iterate through this loop more than once with no signal pending for at
least one but the last iteration we will take do the syscall restarting
multiple times resulting in a syscall return prior to the the syscall
instruction in userspace. This may happen when debugging a multithreaded
program.

Debugging and original fix by Maciej; extended to other ABIs by me.

Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] MT: Initialise all writable bits in Cause register to zero.

Recent 34Ks come out of reset with WP enabled on VPE 1 so we take an
immediate exception when starting the second VPE.

Signed-off-by: Chris Dearman <chris@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Fix EV64120 PCI fixup in Makefile

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Add missing returns in signal code.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] IRIX: Crapectopy.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Don't call try_to_freeze in do_signal & co.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

[MIPS] Cleanup leftovers of ARCH_HAS_IRQ_PER_CPU

CONFIG_IRQ_PER_CPU now controls the IRQ_PER_CPU stuff.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>