Add cmpxchg_local to asm-generic for per cpu atomic operations
Emulate cmpxchg_local by disabling interrupts around the variable modification.
This is not reentrant wrt NMIs and MCEs. It is only protected against normal
interrupts, but this is enough for architectures without such interrupt sources
or if used in a context where the data is not shared with such handlers.
It can be used as a fallback for architectures lacking a real cmpxchg
instruction.
For architectures that have a real cmpxchg but do not have NMIs or MCEs,
the generic and the architecture-specific cmpxchg should both be tested to see
which one is faster.
asm-generic/cmpxchg.h defines a cmpxchg that uses cmpxchg_local. It is meant to
be used as a cmpxchg fallback for architectures that do not support SMP.
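The emulation described above can be sketched in userspace C roughly as follows. This is an illustration only: irq_save() and irq_restore() are hypothetical stand-ins for the kernel's local interrupt disabling, and cmpxchg_local_sketch() is an assumed name, not the code from the header.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the interrupt-enable state of the local CPU. */
static bool irqs_enabled = true;

/* Hypothetical: disable "interrupts" and return the previous state. */
static bool irq_save(void)
{
	bool prev = irqs_enabled;

	irqs_enabled = false;
	return prev;
}

/* Hypothetical: restore the state returned by irq_save(). */
static void irq_restore(bool prev)
{
	irqs_enabled = prev;
}

/*
 * Compare *ptr with old and store new_val only if they match.
 * The value found in *ptr before the operation is always returned,
 * so the caller detects success by comparing the result with old.
 */
static unsigned long cmpxchg_local_sketch(unsigned long *ptr,
					  unsigned long old,
					  unsigned long new_val)
{
	bool flags = irq_save();
	unsigned long prev = *ptr;

	if (prev == old)
		*ptr = new_val;
	irq_restore(flags);
	return prev;
}
```

A caller would typically retry in a loop until the returned value equals the old value it passed in. Nothing in this sketch protects against NMIs or MCEs, which matches the limitation stated above.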
* Patch series comments
Using cmpxchg_local improves the performance of the fast path, with speedups
ranging from 66% on a Pentium 4 to 14% on AMD64.
In detail:
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Measurements on a Pentium4, 3GHz, Hyperthread.
SLUB Performance testing
========================
1. Kmalloc: Repeatedly allocate then free test
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Measurements on an AMD64 2.0 GHz dual-core
In this test, we seem to remove 10 cycles from the kmalloc fast path.
On small allocations, it gives a 14% performance increase. The kfree fast
path also seems to improve by 10 cycles.
H. Peter Anvin [Thu, 7 Feb 2008 08:15:57 +0000 (00:15 -0800)]
Sanitize the type of struct user.u_ar0
struct user.u_ar0 is defined to contain a pointer offset on all
architectures in which it is defined (all architectures which define an
a.out format except SPARC.) However, it has a pointer type in the headers,
which is pointless -- <asm/user.h> is not exported to userspace, and it
just makes the code messy.
Redefine the field as "unsigned long" (which is the same size as a pointer
on all Linux architectures) and change the setting code to use offsetof()
instead of hand-coded arithmetic.
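A minimal sketch of the idea, assuming a heavily reduced layout: struct user_sketch, struct regs_sketch and set_u_ar0() are hypothetical names for illustration, and the real struct user differs per architecture.

```c
#include <stddef.h>

/* Hypothetical, heavily reduced register block. */
struct regs_sketch {
	unsigned long r[4];
};

/* Hypothetical, reduced stand-in for struct user. */
struct user_sketch {
	unsigned long magic;
	struct regs_sketch regs;
	unsigned long u_ar0;	/* offset of regs within the dump, not a pointer */
};

/* Record where the registers live using offsetof(), not pointer arithmetic. */
static void set_u_ar0(struct user_sketch *dump)
{
	dump->u_ar0 = offsetof(struct user_sketch, regs);
}
```

With the field typed as an integer, no cast is needed on either side of the assignment.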
Cc: Linux Arch Mailing List <linux-arch@vger.kernel.org> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Lennert Buytenhek <kernel@wantstofly.org> Cc: Håvard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not export asm/user.h and linux/user.h during make headers_install.
Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com> Reviewed-by: David Woodhouse <dwmw2@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:52 +0000 (00:15 -0800)]
iget: remove iget() and the read_inode() super op as being obsolete
Remove the old iget() call and the read_inode() superblock operation it uses
as these are really obsolete, and the use of read_inode() does not produce
proper error handling (no distinction between ENOMEM and EIO when marking an
inode bad).
Furthermore, this removes the temptation to use iget() to find an inode by
number in a filesystem from code outside that filesystem.
iget_locked() should be used instead. A new function is added in an earlier
patch (iget_failed) that is to be called to mark an inode as bad, unlock it
and release it should the get routine fail. Mark iget() and read_inode() as
being obsolete and remove references to them from the documentation.
Typically a filesystem will be modified such that the read_inode function
becomes an internal iget function, for example the following:
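As a rough userspace illustration of that conversion (not the example from the original message), the pattern might look like this. Every type and helper here is a hypothetical mock: myfs_iget(), the *_mock functions and the lower-case err_ptr()/ptr_err()/is_err() helpers only imitate the shape of the kernel interfaces named above.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace mocks: every type and helper below is a hypothetical stand-in. */
#define I_NEW 1

struct inode_mock {
	unsigned long i_ino;
	int i_state;
};

static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p) { return (uintptr_t)p >= (uintptr_t)-4095; }

/* Mock of iget_locked(): always hands back a fresh, locked (I_NEW) inode. */
static struct inode_mock *iget_locked_mock(unsigned long ino)
{
	struct inode_mock *inode = calloc(1, sizeof(*inode));

	if (inode) {
		inode->i_ino = ino;
		inode->i_state = I_NEW;
	}
	return inode;
}

/* Mock of iget_failed(): the real one marks bad, unlocks and releases. */
static void iget_failed_mock(struct inode_mock *inode) { free(inode); }

/* Pretend on-disk read: inode number 0 is treated as unreadable. */
static int read_from_disk_mock(struct inode_mock *inode)
{
	return inode->i_ino == 0 ? -EIO : 0;
}

/* The former read_inode() body becomes a private iget that reports errors. */
static struct inode_mock *myfs_iget(unsigned long ino)
{
	struct inode_mock *inode = iget_locked_mock(ino);
	int ret;

	if (!inode)
		return err_ptr(-ENOMEM);
	if (!(inode->i_state & I_NEW))
		return inode;	/* already set up (never taken with this mock) */

	ret = read_from_disk_mock(inode);
	if (ret < 0) {
		iget_failed_mock(inode);
		return err_ptr(ret);
	}
	inode->i_state &= ~I_NEW;	/* stands in for unlock_new_inode() */
	return inode;
}
```

The point of the shape is that the caller can now tell ENOMEM from EIO, which the void read_inode() path could not express.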
David Howells [Thu, 7 Feb 2008 08:15:51 +0000 (00:15 -0800)]
iget: stop HPPFS from using iget() and read_inode()
Stop the HPPFS filesystem from using iget() and read_inode(). Provide an
hppfs_iget(), and call that instead of iget(). hppfs_iget() then uses
iget_locked() directly and returns a proper error code instead of an inode in
the event of an error.
hppfs_fill_sb_common() returns any error incurred when getting the root inode
instead of EINVAL.
Note that the contents of hppfs_kern.c need to be examined:
(*) The HPPFS inode retains a pointer to the proc dentry it is shadowing, but
whilst it does appear to retain a reference to it, it doesn't appear to
destroy the reference if the inode goes away.
(*) hppfs_iget() should perhaps subsume init_inode() and hppfs_read_inode().
(*) It would appear that all hppfs inodes are the same inode because iget()
was being called with inode number 0, which forms the lookup key.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:50 +0000 (00:15 -0800)]
iget: stop HOSTFS from using iget() and read_inode()
Stop the HOSTFS filesystem from using iget() and read_inode(). Provide
hostfs_iget(), and call that instead of iget(). hostfs_iget() then uses
iget_locked() directly and returns a proper error code instead of an inode in
the event of an error.
hostfs_fill_sb_common() returns any error incurred when getting the root inode
instead of EINVAL.
Note that the contents of hostfs_kern.c need to be examined:
(*) hostfs_iget() should perhaps subsume init_inode() and hostfs_read_inode().
(*) It would appear that all hostfs inodes are the same inode because iget()
was being called with inode number 0 - which forms the lookup key.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:49 +0000 (00:15 -0800)]
iget: stop OPENPROMFS from using iget() and read_inode()
Stop the OPENPROMFS filesystem from using iget() and read_inode(). Replace
openpromfs_read_inode() with openpromfs_iget(), and call that instead of
iget(). openpromfs_iget() then uses iget_locked() directly and returns a
proper error code instead of an inode in the event of an error.
openpromfs_fill_super() returns any error incurred when getting the root inode
instead of ENOMEM (not that it currently incurs any other error).
Signed-off-by: David Howells <dhowells@redhat.com> Cc: "David S. Miller" <davem@davemloft.net> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:48 +0000 (00:15 -0800)]
iget: stop UFS from using iget() and read_inode()
Stop the UFS filesystem from using iget() and read_inode(). Replace
ufs_read_inode() with ufs_iget(), and call that instead of iget(). ufs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
ufs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:47 +0000 (00:15 -0800)]
iget: stop the SYSV filesystem from using iget() and read_inode()
Stop the SYSV filesystem from using iget() and read_inode(). Replace
sysv_read_inode() with sysv_iget(), and call that instead of iget().
sysv_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:46 +0000 (00:15 -0800)]
iget: stop ROMFS from using iget() and read_inode()
Stop the ROMFS filesystem from using iget() and read_inode(). Replace
romfs_read_inode() with romfs_iget(), and call that instead of iget().
romfs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
romfs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:45 +0000 (00:15 -0800)]
iget: stop QNX4 from using iget() and read_inode()
Stop the QNX4 filesystem from using iget() and read_inode(). Replace
qnx4_read_inode() with qnx4_iget(), and call that instead of iget().
qnx4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
qnx4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Anders Larsen <al@alarsen.net> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:45 +0000 (00:15 -0800)]
iget: stop PROCFS from using iget() and read_inode()
Stop the PROCFS filesystem from using iget() and read_inode(). Merge
procfs_read_inode() into procfs_get_inode(), and have that call iget_locked()
instead of iget().
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:44 +0000 (00:15 -0800)]
iget: stop the MINIX filesystem from using iget() and read_inode()
Stop the MINIX filesystem from using iget() and read_inode(). Replace
minix_read_inode() with minix_iget(), and call that instead of iget().
minix_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
minix_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:43 +0000 (00:15 -0800)]
iget: stop JFS from using iget() and read_inode()
Stop the JFS filesystem from using iget() and read_inode(). Replace
jfs_read_inode() with jfs_iget(), and call that instead of iget(). jfs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
jfs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:42 +0000 (00:15 -0800)]
iget: stop JFFS2 from using iget() and read_inode()
Stop the JFFS2 filesystem from using iget() and read_inode(). Replace
jffs2_read_inode() with jffs2_iget(), and call that instead of iget().
jffs2_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
jffs2_do_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:41 +0000 (00:15 -0800)]
iget: stop ISOFS from using read_inode()
Stop the ISOFS filesystem from using read_inode(). Make isofs_read_inode()
return an error code, and make isofs_iget() pass it on.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Jan Kara <jack@ucw.cz> Acked-by: Christoph Hellwig <hch@lst.de> Cc: "Dave Young" <hidave.darkstar@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:40 +0000 (00:15 -0800)]
iget: stop HFSPLUS from using iget() and read_inode()
Stop the HFSPLUS filesystem from using iget() and read_inode(). Replace
hfsplus_read_inode() with hfsplus_iget(), and call that instead of iget().
hfsplus_iget() then uses iget_locked() directly and returns a proper error
code instead of an inode in the event of an error.
hfsplus_fill_super() returns any error incurred when getting the root inode.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:39 +0000 (00:15 -0800)]
iget: stop FreeVXFS from using iget() and read_inode()
Stop the FreeVXFS filesystem from using iget() and read_inode(). Replace
vxfs_read_inode() with vxfs_iget(), and call that instead of iget().
vxfs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
vxfs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:37 +0000 (00:15 -0800)]
iget: stop EXT4 from using iget() and read_inode()
Stop the EXT4 filesystem from using iget() and read_inode(). Replace
ext4_read_inode() with ext4_iget(), and call that instead of iget().
ext4_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext4_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:36 +0000 (00:15 -0800)]
iget: stop EXT3 from using iget() and read_inode()
Stop the EXT3 filesystem from using iget() and read_inode(). Replace
ext3_read_inode() with ext3_iget(), and call that instead of iget().
ext3_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext3_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:35 +0000 (00:15 -0800)]
iget: stop EXT2 from using iget() and read_inode()
Stop the EXT2 filesystem from using iget() and read_inode(). Replace
ext2_read_inode() with ext2_iget(), and call that instead of iget().
ext2_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
ext2_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:34 +0000 (00:15 -0800)]
iget: stop EFS from using iget() and read_inode()
Stop the EFS filesystem from using iget() and read_inode(). Replace
efs_read_inode() with efs_iget(), and call that instead of iget(). efs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
efs_fill_super() returns any error incurred when getting the root inode
instead of EACCES.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:33 +0000 (00:15 -0800)]
iget: stop CIFS from using iget() and read_inode()
Stop the CIFS filesystem from using iget() and read_inode(). Replace
cifs_read_inode() with cifs_iget(), and call that instead of iget().
cifs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
cifs_read_super() now returns any error incurred when getting the root inode
instead of ENOMEM.
cifs_iget() needs examining. The comment "can not call macro FreeXid here
since in a void func" is no longer true.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Steven French <sfrench@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:32 +0000 (00:15 -0800)]
iget: stop BFS from using iget() and read_inode()
Stop the BFS filesystem from using iget() and read_inode(). Replace
bfs_read_inode() with bfs_iget(), and call that instead of iget(). bfs_iget()
then uses iget_locked() directly and returns a proper error code instead of an
inode in the event of an error.
bfs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[kamalesh@linux.vnet.ibm.com: build fix] Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:31 +0000 (00:15 -0800)]
iget: stop BEFS from using iget() and read_inode()
Stop the BEFS filesystem from using iget() and read_inode(). Replace
befs_read_inode() with befs_iget(), and call that instead of iget().
befs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
befs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Will Dyson <will_dyson@pobox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:30 +0000 (00:15 -0800)]
iget: stop autofs from using iget() and read_inode()
Stop the autofs filesystem from using iget() and read_inode(). Replace
autofs_read_inode() with autofs_iget(), and call that instead of iget().
autofs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Ian Kent <raven@themaw.net> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:29 +0000 (00:15 -0800)]
iget: stop AFFS from using iget() and read_inode()
Stop the AFFS filesystem from using iget() and read_inode(). Replace
affs_read_inode() with affs_iget(), and call that instead of iget().
affs_iget() then uses iget_locked() directly and returns a proper error code
instead of an inode in the event of an error.
affs_fill_super() returns any error incurred when getting the root inode
instead of EINVAL.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:29 +0000 (00:15 -0800)]
iget: use iget_failed() in GFS2
Use iget_failed() in GFS2 to kill a failed inode.
Signed-off-by: David Howells <dhowells@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:27 +0000 (00:15 -0800)]
iget: introduce a function to register iget failure
Introduce a function to register failure in an inode construction path. This
includes marking the inode under construction as bad, unlocking it and
releasing it.
Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 7 Feb 2008 08:15:26 +0000 (00:15 -0800)]
Add an ERR_CAST() function to complement ERR_PTR and co.
Add an ERR_CAST() function to complement ERR_PTR and co. for the purposes
of casting an error typed as one pointer type to an error of another
pointer type whilst making it explicit as to what is going on.
This provides a replacement for the ERR_PTR(PTR_ERR(p)) construct.
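The construct can be illustrated in userspace with mock helpers. The *_sketch names and the two struct types are hypothetical; they only imitate the ERR_PTR family and are not the kernel definitions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095

struct inode_sketch { int unused; };
struct dentry_sketch { int unused; };

/* Encode a negative errno in a pointer value. */
static inline void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the errno from an error pointer. */
static inline long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO_SKETCH addresses. */
static inline int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Re-type an error pointer without decoding and re-encoding the errno. */
static inline void *err_cast_sketch(const void *ptr)
{
	return (void *)ptr;
}

/* A lookup that fails with an inode-typed error... */
static struct inode_sketch *find_inode(void)
{
	return err_ptr_sketch(-ENOENT);
}

/* ...passed on by a caller that must return a dentry-typed pointer. */
static struct dentry_sketch *find_dentry(void)
{
	struct inode_sketch *inode = find_inode();

	if (is_err_sketch(inode))
		return err_cast_sketch(inode);
	return NULL;
}
```

The cast helper says explicitly that an error is being forwarded, where a bare pointer cast would hide the intent.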
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes the configuration dependencies in the vmcoreinfo data.
i386's "node_data" is defined in arch/x86/mm/discontig_32.c,
and x86_64's is defined in arch/x86/mm/numa_64.c.
They depend on CONFIG_NUMA:
arch/x86/mm/Makefile_32:7
obj-$(CONFIG_NUMA) += discontig_32.o
arch/x86/mm/Makefile_64:7
obj-$(CONFIG_NUMA) += numa_64.o
ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c,
and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM:
arch/ia64/mm/Makefile:9-10
obj-$(CONFIG_DISCONTIGMEM) += discontig.o
obj-$(CONFIG_SPARSEMEM) += discontig.o
ia64's "node_memblk" is defined in arch/ia64/mm/numa.c,
and it depends on CONFIG_NUMA:
arch/ia64/mm/Makefile:8
obj-$(CONFIG_NUMA) += numa.o
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
vmcoreinfo: rename vmcoreinfo's macros returning the size
This patchset is for the vmcoreinfo data.
The vmcoreinfo data carries only the minimum debugging information needed for
dump filtering. makedumpfile (the dump filtering command) reads it to identify
unnecessary pages, and so creates a small dumpfile.
This patch:
VMCOREINFO_SIZE() should be renamed VMCOREINFO_STRUCT_SIZE() since it's always
returning the size of the struct with a given name. This change would allow
VMCOREINFO_TYPEDEF_SIZE() to simply become VMCOREINFO_SIZE() since it need not
be used exclusively for typedefs.
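The naming after the rename can be sketched as below. This is a userspace approximation: append_size(), the buffer, the "SIZE(name)=value" text format and the two example types are assumptions made for illustration, and only the two macro names come from the description above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical text buffer standing in for the vmcoreinfo note. */
static char vmcoreinfo_buf[256];

/* Hypothetical helper: append one "SIZE(name)=bytes" line (format assumed). */
static void append_size(const char *name, unsigned long size)
{
	size_t used = strlen(vmcoreinfo_buf);

	snprintf(vmcoreinfo_buf + used, sizeof(vmcoreinfo_buf) - used,
		 "SIZE(%s)=%lu\n", name, size);
}

/* Size of "struct name": the macro supplies the struct keyword. */
#define VMCOREINFO_STRUCT_SIZE(name) \
	append_size(#name, (unsigned long)sizeof(struct name))

/* Size of any complete type name, typedef or not. */
#define VMCOREINFO_SIZE(name) \
	append_size(#name, (unsigned long)sizeof(name))

/* Hypothetical example types. */
struct page_sketch { unsigned long flags; };
typedef unsigned long nodemask_sketch_t;
```

VMCOREINFO_STRUCT_SIZE(page_sketch) and VMCOREINFO_SIZE(nodemask_sketch_t) each append one line to the buffer; the only difference is whether the macro prepends "struct".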
The discussion can be found at:
http://www.ussg.iu.edu/hypermail/linux/kernel/0709.3/0582.html
Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernhard Walle [Thu, 7 Feb 2008 08:15:19 +0000 (00:15 -0800)]
Use BOOTMEM_EXCLUSIVE for kdump
Use the BOOTMEM_EXCLUSIVE flag, introduced in the previous patch, to avoid
conflicts while reserving the memory for the kdump capture kernel
(crashkernel=).
Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bernhard Walle [Thu, 7 Feb 2008 08:15:17 +0000 (00:15 -0800)]
Introduce flags for reserve_bootmem()
This patchset adds a flags variable to reserve_bootmem() and uses the
BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
between crashkernel area and already used memory.
This patch:
Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
If that flag is set, the function returns -EBUSY if the memory has already
been reserved. This is to avoid conflicts.
Because that code runs before SMP initialisation, there's no race condition
inside reserve_bootmem_core().
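The flag's behaviour can be sketched over a simple page map. This is not the bootmem allocator: reserve_pages_sketch(), the fixed-size array and the flag values are assumptions for illustration, and only the BOOTMEM_EXCLUSIVE name and the -EBUSY result come from the description above.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values are assumed for this sketch. */
#define BOOTMEM_DEFAULT		0
#define BOOTMEM_EXCLUSIVE	(1 << 0)

#define NR_PAGES_SKETCH		64

/* Hypothetical reservation map, one entry per page. */
static bool page_reserved[NR_PAGES_SKETCH];

/*
 * Reserve count pages starting at start.  With BOOTMEM_EXCLUSIVE the call
 * fails with -EBUSY, changing nothing, if any page is already reserved.
 * Without the flag, overlapping an earlier reservation is silently allowed.
 */
static int reserve_pages_sketch(unsigned int start, unsigned int count,
				int flags)
{
	unsigned int i;

	if (start > NR_PAGES_SKETCH || count > NR_PAGES_SKETCH - start)
		return -EINVAL;

	if (flags & BOOTMEM_EXCLUSIVE)
		for (i = start; i < start + count; i++)
			if (page_reserved[i])
				return -EBUSY;

	for (i = start; i < start + count; i++)
		page_reserved[i] = true;
	return 0;
}
```

The check-then-set sequence is safe here only because nothing else runs concurrently, which mirrors the remark about running before SMP initialisation.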
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix powerpc build] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: <linux-arch@vger.kernel.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Thu, 7 Feb 2008 08:15:16 +0000 (00:15 -0800)]
fs menu: small reorg
- move minixfs and ROMfs to the Miscellaneous filesystems menu
- move DNOTIFY config symbol so that it is adjacent to INOTIFY
instead of being split by the QUOTA config options
- add some 'endif' annotations
- remove some whitespace (extra blank lines)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a set of changes to implement proper resource management in the
driver, including iomem space reservation and operating on physical
addresses ioremap()ped appropriately using accessor functions rather than
unportable direct assignments.
Some adjustments to code are made to reflect the architecture of the
interface, which is a centrally controlled multiport (or, as referred to
from DEC documentation, a serial line multiplexer, going up to 8 lines
originally) rather than a bundle of separate ports.
Types are changed, where applicable, to specify the width of hardware
registers explicitly. The interrupt handler is now managed in the
->startup() and ->shutdown() calls for consistency with other drivers and
also in preparation to handle the handover from the initial firmware-based
console gracefully.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dz.c: Use a helper to cast from "struct uart_port *"
Replace all casts from "struct uart_port *" to "struct dz_port *" with a
construct based on container_of(). This makes the conversion work
irrespective of where the former struct is located within the latter.
By popular request I have implemented it as an inline function rather than
a macro this time.
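A minimal userspace sketch of such a helper, assuming reduced stand-in structures: container_of_sketch(), the *_sketch types and the to_dport() name are illustrative and are not copied from the driver.

```c
#include <stddef.h>

/* Simplified container_of: step back from a member to its enclosing struct. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reduced stand-in for struct uart_port. */
struct uart_port_sketch {
	unsigned int line;
};

/* Hypothetical reduced stand-in for struct dz_port. */
struct dz_port_sketch {
	int cflag;			/* deliberately placed first */
	struct uart_port_sketch port;	/* embedded, not at offset 0 */
};

/* Inline helper rather than a macro, so the argument type is checked. */
static inline struct dz_port_sketch *to_dport(struct uart_port_sketch *uport)
{
	return container_of_sketch(uport, struct dz_port_sketch, port);
}
```

A plain cast would only be correct if the embedded structure were the first member; here it is not, and the helper still recovers the enclosing structure.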
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dz: clean up and improve the setup of termios settings
Change the way termios settings are propagated to the serial
port hardware. The DZ11 only supports a selection of fixed baud settings,
so some requests may not be fulfilled. Keep the old setting in such a case
and, failing that, resort to 9600bps. Also add a missing update of the
transmit timeout, and remove the explicit encoding of the line selected
from writes to the Line Parameters Register, as it has already been
pre-encoded by the ->set_termios() call. Finally, remove a duplicate macro
for the Receiver Enable bit.
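The fallback order for the baud rate can be sketched as follows. The rate table is a hypothetical subset chosen for illustration, and baud_supported() and pick_baud() are assumed names, not the driver's functions.

```c
#include <stddef.h>

/* Hypothetical subset of fixed rates; the real hardware table is longer. */
static const unsigned int supported_bauds[] = { 300, 1200, 2400, 4800, 9600 };

/* Return nonzero if the rate appears in the fixed table. */
static int baud_supported(unsigned int baud)
{
	size_t i;

	for (i = 0; i < sizeof(supported_bauds) / sizeof(supported_bauds[0]); i++)
		if (supported_bauds[i] == baud)
			return 1;
	return 0;
}

/*
 * Pick the rate to program: the requested one if the hardware has it,
 * else the previous setting if that is valid, else 9600 as a last resort.
 */
static unsigned int pick_baud(unsigned int requested, unsigned int old)
{
	if (baud_supported(requested))
		return requested;
	if (baud_supported(old))
		return old;
	return 9600;
}
```

For instance, an unsupported request with a previous setting of 4800 keeps 4800, and the same request with no valid previous setting yields 9600.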
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that I have got the necessary piece of hardware (thanks, Thiemo!), I may
well offer myself as the maintainer for the dz serial driver. I hope nobody
objects.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dz: handle special conditions on reception correctly
Handle the read and ignore status masks correctly. Handle the BREAK condition
as expected: a framing error with a null character is a BREAK, any other
framing error is a framing error indeed.
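The BREAK rule can be written down as a small classifier. The enum and classify_rx() are hypothetical names for illustration; the driver itself works on status register bits and the read/ignore masks, which this sketch leaves out.

```c
/* Hypothetical classification of one received character. */
enum rx_event {
	RX_NORMAL,	/* no framing error */
	RX_BREAK,	/* framing error with a null character */
	RX_FRAME	/* framing error with any other character */
};

/* Apply the rule: a framing error is a BREAK only if the character is 0. */
static enum rx_event classify_rx(unsigned char ch, int framing_error)
{
	if (!framing_error)
		return RX_NORMAL;
	return ch == 0 ? RX_BREAK : RX_FRAME;
}
```

A null character received without a framing error is ordinary data, so it is classified as RX_NORMAL.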
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ->start_tx(), ->stop_tx() and ->stop_rx() backends are called with the
port's lock already taken. Remove locking from within them and wrap around
calls as necessary.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reformat the Kconfig entries and update descriptions for accuracy. Select the
driver by default for configurations of interest. For the curious: 32BIT
means only 32-bit DECstations support the device, not that the driver is not
64-bit clean; I have not checked that either though.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Well, panic() is a little bit undue if request_irq() fails; there is probably
no need to justify it any further. Handle the case gracefully, by
unregistering the driver.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dz: always check if it is safe to console_putchar()
Polled transmission is tricky enough with the DZ11 design. While "loop" is
set to a high value, conceptually you are not allowed to transmit without
checking whether the device offers the right transmission line (yes, it is the
device that selects the line -- the driver has no control over it other than
disabling the transmitter offered if it is the wrong one), so the loop has to
be run at least once.
Well, the '1977 or PDP11 view of how serial lines should be handled... Except
that the serial interface used to be quite an impressive board back then
rather than chip.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
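The "run the loop at least once" requirement maps naturally onto a do/while loop. The sketch below models a device that chooses the transmit line itself; struct fake_dz, fake_offered_line() and fake_putchar() are assumptions made for illustration, and re-enabling the masked lines afterwards is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLINES 4

/* Hypothetical model of a device that selects the transmit line itself. */
struct fake_dz {
	bool tx_enabled[NLINES];
	int last_sent_line;		/* -1 until something is sent */
	unsigned char last_sent_char;
};

/* The model offers the lowest-numbered enabled line, or -1 if none. */
static int fake_offered_line(const struct fake_dz *dz)
{
	for (int i = 0; i < NLINES; i++)
		if (dz->tx_enabled[i])
			return i;
	return -1;
}

bool fake_putchar(struct fake_dz *dz, int line, unsigned char ch, int loop)
{
	assert(dz != NULL && line >= 0 && line < NLINES);

	/* do/while: the offered line is checked at least once. */
	do {
		int offered = fake_offered_line(dz);

		if (offered == line) {
			dz->last_sent_line = line;
			dz->last_sent_char = ch;
			return true;
		}
		/* Wrong line offered: disable its transmitter and retry. */
		if (offered >= 0)
			dz->tx_enabled[offered] = false;
	} while (--loop > 0);

	return false;
}
```

With a plain while loop a caller passing a zero count would transmit without ever checking the offered line, which is the case the commit guards against.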
Bryan Boatright [Thu, 7 Feb 2008 08:14:58 +0000 (00:14 -0800)]
drivers/edac: pci: broken parity regression
Using the EDAC code in kernel.org kernel version 2.6.23.8 I am seeing the
following problem:
In the kernel there is a pci device attribute located in sysfs that is
checked by the EDAC PCI scanning code. If that attribute is set,
PCI parity/error scanning is skipped for that device. The attribute
is:
broken_parity_status
and is located in the /sys/devices/pci<XXX>/0000:XX:YY.Z directories for
PCI devices.
I don't think this check was actually implemented. I have a misbehaved card
that reports a parity error every 1000 ms:
Nov 25 07:28:43 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:44 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Nov 25 07:28:45 beta kernel: EDAC PCI: Master Data Parity Error on 0000:05:01.0
Setting that card's broken_parity_status bit did not mask the error:
I looked through the EDAC code and did not readily see any reference to
broken_parity_status at all (which makes sense based on the behavior I am
seeing). I applied the following patch as a proof-of-concept and now EDAC's
PCI parity error reporting behaves as documented:
bryan
Good regression find, bryan. It used to work. sigh.
I added more logic to your patch, for more coverage of the error.
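The documented behavior amounts to skipping any device whose broken_parity_status flag is set before looking at its parity status. A minimal sketch of that check, assuming a hypothetical struct fake_pci_dev rather than the kernel's struct pci_dev:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a PCI device for this sketch. */
struct fake_pci_dev {
	const char *name;
	bool broken_parity_status;	/* set by the user through sysfs */
	bool parity_error;		/* what the status register reports */
};

/* Count the parity errors that would be reported by a scan. */
size_t count_reported_parity_errors(const struct fake_pci_dev *devs, size_t n)
{
	size_t reported = 0;

	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Devices flagged as broken are skipped entirely. */
		if (devs[i].broken_parity_status)
			continue;
		if (devs[i].parity_error)
			reported++;
	}
	return reported;
}
```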
Dave Jiang [Thu, 7 Feb 2008 08:14:56 +0000 (00:14 -0800)]
drivers-edac: add marvell mv64x60 driver
Marvell mv64x60 SoC support for EDAC. Used on PPC and MIPS platforms.
Development and testing done on PPC Motorola prpmc2800 ATCA board.
[akpm@linux-foundation.org: make mv64x60_ctl_name static] Signed-off-by: Dave Jiang <djiang@mvista.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Douglas Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adds driver for the Cell memory controller when used without a Hypervisor such
as on the IBM Cell blades. There might still be some improvements to do to
this such as finding if it's possible to properly obtain more details about
the address of the error but it's good enough already to report CE counts
which is our main priority at the moment.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Thu, 7 Feb 2008 08:14:51 +0000 (00:14 -0800)]
drivers-edac: use round_jiffies_relative
When rounding a relative timeout we need to use round_jiffies_relative().
Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
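Why a relative timeout needs its own rounding helper can be seen in a simplified model. The functions below are not the kernel's implementation: MODEL_HZ and the always-round-up rule are assumptions, and the real helpers are more elaborate (for example, they do not always round up). The model only shows that the current time has to enter the calculation.

```c
#include <assert.h>
#include <limits.h>

#define MODEL_HZ 250UL	/* hypothetical tick rate */

/* Round an absolute tick count up to a whole second (simplified). */
unsigned long model_round_jiffies(unsigned long j)
{
	unsigned long rem = j % MODEL_HZ;

	assert(j <= ULONG_MAX - MODEL_HZ);	/* the model ignores wraparound */
	return rem ? j + (MODEL_HZ - rem) : j;
}

/* Round a relative timeout so that now + result lands on a whole second. */
unsigned long model_round_jiffies_relative(unsigned long delta,
					   unsigned long now)
{
	assert(delta <= ULONG_MAX - now);
	return model_round_jiffies(now + delta) - now;
}
```

With now = 1030 and a timeout of 500 ticks, rounding the bare delta leaves it at 500, so the timer expires at tick 1530, which is not on a second boundary. The relative variant yields 720, so the timer expires at tick 1750.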
Samuel Ortiz [Thu, 7 Feb 2008 08:14:49 +0000 (00:14 -0800)]
ASIC3 driver
This is a patch for the Compaq ASIC3 multi function chip, found in many
PDAs (iPAQs, HTCs...).
It is a simplified version of Paul Sokolovsky's first proposal [1]. With
this code, it is basically a GPIO and IRQ expander. My plan is to add more
features once this patch gets reviewed and accepted.
[1] http://lkml.org/lkml/2007/5/1/46
Signed-off-by: Samuel Ortiz <sameo@openedhand.com> Cc: Paul Sokolovsky <pmiscml@gmail.com> Cc: Ben Dooks <ben@trinity.fluff.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Jackson [Thu, 7 Feb 2008 08:14:48 +0000 (00:14 -0800)]
cpusets: update_cpumask documentation fix
Update cpuset documentation to match the October 2007 "Fix cpusets
update_cpumask" changes that now apply changes to a cpusets 'cpus' allowed
mask immediately to the cpus_allowed of the tasks in that cpuset.
Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Cliff Wickman <cpw@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pavel Emelyanov [Thu, 7 Feb 2008 08:14:47 +0000 (00:14 -0800)]
Handle pid namespaces in cgroups code
There's one place that works with task pids - it's the "tasks" file in cgroups.
The read/write handlers assume that the pid values go to/come from user
space and thus are virtual pids, i.e. pids as seen from inside a
namespace.
Tune the code accordingly.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Jackson [Thu, 7 Feb 2008 08:14:47 +0000 (00:14 -0800)]
hotplug cpu move tasks in empty cpusets - refinements
- Narrow the scope of callback_mutex in scan_for_empty_cpusets().
- Avoid rewriting the cpus, mems of cpusets except when it is likely that
we'll be changing them.
- Have remove_tasks_in_empty_cpuset() also check for empty mems.
Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Cliff Wickman <cpw@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Jackson [Thu, 7 Feb 2008 08:14:46 +0000 (00:14 -0800)]
hotplug cpu: move tasks in empty cpusets to parent various other fixes
Various minor formatting and comment tweaks to Cliff Wickman's
[PATCH_3_of_3]_cpusets__update_cpumask_revision.patch
I had had "iff", meaning "if and only if" in a comment. However, except for
ancient mathematicians, the abbreviation "iff" was a tad too cryptic. Cliff
changed it to "if", presumably figuring that the "iff" was a typo. However,
it was the "only if" half of the conjunction that was most interesting.
Reword to emphasize the "only if" aspect.
The locking comment for remove_tasks_in_empty_cpuset() was wrong; it said
callback_mutex had to be held on entry. The opposite is true.
Several mentions of attach_task() in comments needed to be
changed to cgroup_attach_task().
A comment about notify_on_release was no longer relevant,
as the line of code it had commented, namely:
set_bit(CS_RELEASED_RESOURCE, &parent->flags);
is no longer present in that place in the cpuset.c code.
Similarly a comment about notify_on_release before the
scan_for_empty_cpusets() routine was no longer relevant.
Removed extra parentheses and unnecessary return statement.
Renamed attach_task() to cpuset_attach() in various comments.
Removed comment about not needing memory migration, as it seems the migration
is done anyway, via the cpuset_attach() callback from cgroup_attach_task().
Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Cliff Wickman <cpw@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Thu, 7 Feb 2008 08:14:45 +0000 (00:14 -0800)]
cgroups: update comments in cpuset.c
Some of the comments in kernel/cpuset.c were stale following the
transition to control groups; this patch updates them to more closely
match reality.
Signed-off-by: Paul Menage <menage@google.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cliff Wickman [Thu, 7 Feb 2008 08:14:43 +0000 (00:14 -0800)]
hotplug cpu: move tasks in empty cpusets to parent
This patch corrects a situation that occurs when one disables all the cpus in
a cpuset.
Currently, the disabled (cpu-less) cpuset inherits the cpus of its parent,
which is incorrect because it may then overlap its cpu-exclusive sibling.
Tasks of an empty cpuset should be moved to the cpuset which is the parent of
their current cpuset. Or if the parent cpuset has no cpus, to its parent,
etc.
And the empty cpuset should be released (if it is flagged notify_on_release).
Depends on the cgroup_scan_tasks() function (proposed by David Rientjes) to
iterate through all tasks in the cpu-less cpuset. We are deliberately
avoiding a walk of the tasklist.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
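The "parent, or its parent, etc." rule above is a walk up the hierarchy until a cpuset with at least one cpu is found. A minimal sketch, assuming a hypothetical struct fake_cpuset with a plain bitmask in place of the kernel's cpumask type:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced cpuset: a parent link and a cpu bitmask. */
struct fake_cpuset {
	struct fake_cpuset *parent;	/* NULL for the top cpuset */
	unsigned long cpus_allowed;	/* 0 means the cpuset is empty */
};

/*
 * Find where the tasks of an empty cpuset should go: the nearest
 * ancestor that still has cpus. Returns NULL only if no ancestor
 * has any, which the model allows but a real system should not.
 */
struct fake_cpuset *nearest_ancestor_with_cpus(struct fake_cpuset *cs)
{
	struct fake_cpuset *p;

	assert(cs != NULL);

	p = cs->parent;
	while (p != NULL && p->cpus_allowed == 0)
		p = p->parent;
	return p;
}
```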
Cliff Wickman [Thu, 7 Feb 2008 08:14:42 +0000 (00:14 -0800)]
cgroups: mechanism to process each task in a cgroup
Provide cgroup_scan_tasks(), which iterates through every task in a cgroup,
calling a test function and a process function for each. The process
function is called without holding the css_set_lock lock.
The idea is David Rientjes', predicting that such a function will make it much
easier in the future to extend things that require access to each task in a
cgroup without holding the lock.
[akpm@linux-foundation.org: cleanup]
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
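The test/process split described above can be sketched as a two-phase scan: select tasks while the lock is held, then process them after it is dropped. Everything here is a simplified assumption: the lock is modeled as a boolean flag, the names are hypothetical, and a real implementation would also have to keep the selected tasks alive between the two phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 16

struct fake_task { int pid; int nice; };

struct fake_cgroup {
	struct fake_task *tasks[MAX_TASKS];
	size_t ntasks;
	bool lock_held;		/* stand-in for the task list lock */
};

typedef bool (*task_test_fn)(const struct fake_task *t);
typedef void (*task_process_fn)(struct fake_cgroup *cg, struct fake_task *t);

/* Returns the number of tasks that were processed. */
size_t fake_scan_tasks(struct fake_cgroup *cg, task_test_fn test,
		       task_process_fn process)
{
	struct fake_task *picked[MAX_TASKS];
	size_t n = 0;

	assert(cg != NULL && test != NULL && process != NULL);
	assert(cg->ntasks <= MAX_TASKS);

	/* Phase 1: select tasks under the lock. */
	cg->lock_held = true;
	for (size_t i = 0; i < cg->ntasks; i++)
		if (test(cg->tasks[i]))
			picked[n++] = cg->tasks[i];
	cg->lock_held = false;

	/* Phase 2: process them with the lock dropped. */
	for (size_t i = 0; i < n; i++)
		process(cg, picked[i]);
	return n;
}

/* Example callbacks: reset the nice value of every niced task. */
bool is_niced(const struct fake_task *t)
{
	return t->nice != 0;
}

void reset_nice(struct fake_cgroup *cg, struct fake_task *t)
{
	assert(!cg->lock_held);	/* the point of the two-phase design */
	t->nice = 0;
}
```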
Balbir Singh [Thu, 7 Feb 2008 08:14:41 +0000 (00:14 -0800)]
Memory controller remove control_type feature
Based on the discussion at http://lkml.org/lkml/2007/12/20/383, it was felt
that control_type might not be a good thing to implement right away. We
can add this flexibility at a later point when required.
per-zone and reclaim enhancements for memory controller: modifies vmscan.c for isolate global/cgroup lru activity
When using memory controller, there are 2 levels of memory reclaim.
1. zone memory reclaim because of system/zone memory shortage.
2. memory cgroup memory reclaim because of hitting limit.
These two can be distinguished by sc->mem_cgroup parameter.
(scan_global_lru() macro)
This patch tries to keep the memory cgroup reclaim routine from affecting
system/zone memory reclaim. It inserts if (scan_global_lru()) checks and
hooks into the memory cgroup reclaim support functions.
This patch helps isolate system lru activity from cgroup lru activity and
shows which additional functions are necessary.
* mem_cgroup_calc_mapped_ratio() ... calculate mapped ratio for cgroup.
* mem_cgroup_reclaim_imbalance() ... calculate active/inactive balance in
cgroup.
* mem_cgroup_calc_reclaim_active() ... calculate the number of active pages to
be scanned in this priority in mem_cgroup.
* mem_cgroup_calc_reclaim_inactive() ... calculate the number of inactive pages
to be scanned in this priority in mem_cgroup.
* mem_cgroup_all_unreclaimable() ... check whether all of the cgroup's pages
are unreclaimable.
* mem_cgroup_get_reclaim_priority() ...
* mem_cgroup_note_reclaim_priority() ... record reclaim priority (temporal)
* mem_cgroup_remember_reclaim_priority() ... record reclaim priority as
zone->prev_priority. This value is used to calculate reclaim_mapped.
[akpm@linux-foundation.org: fix unused var warning] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per-zone and reclaim enhancements for memory controller: per-zone active inactive counter
This patch adds per-zone status in memory cgroup. These values are often read
(as per-zone value) by page reclaiming.
In the current design, a per-zone stat is just an unsigned long value and not
an atomic value because it is modified only under lru_lock. (So, atomic_ops
are not necessary.)
This patch adds ACTIVE and INACTIVE per-zone status values.
For handling per-zone status, this patch adds
struct mem_cgroup_per_zone {
...
}
and some helper functions. This will be useful to add per-zone objects
in mem_cgroup.
This patch turns memory controller's early_init to be 0 for calling
kmalloc() in initialization.
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Rientjes <rientjes@google.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
per-zone and reclaim enhancements for memory controller: add scan_global_lru macro
This is used to detect whether a scan_control scans the global lru or a
mem_cgroup lru. It compiles to a static value (1) when the memory controller
is not configured. This may make the meaning obvious.
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: David Rientjes <rientjes@google.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
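The macro described above can be pictured as follows. This is a user-space sketch under assumptions: the fake_ prefixed names and the FAKE_CONFIG_MEM_CONT switch are invented for the example, and only the idea (test sc->mem_cgroup, or compile to 1 when the controller is not configured) comes from the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Remove this define to model a kernel built without the controller. */
#define FAKE_CONFIG_MEM_CONT 1

struct fake_mem_cgroup { int id; };

struct fake_scan_control {
	struct fake_mem_cgroup *mem_cgroup;	/* NULL for global reclaim */
};

#ifdef FAKE_CONFIG_MEM_CONT
#define fake_scan_global_lru(sc)	((sc)->mem_cgroup == NULL)
#else
#define fake_scan_global_lru(sc)	(1)
#endif

enum reclaim_kind { RECLAIM_GLOBAL, RECLAIM_CGROUP };

enum reclaim_kind fake_reclaim_kind(const struct fake_scan_control *sc)
{
	assert(sc != NULL);

	if (fake_scan_global_lru(sc))
		return RECLAIM_GLOBAL;	/* system/zone memory shortage */
	return RECLAIM_CGROUP;		/* a cgroup hit its limit */
}
```

When the switch is not defined the condition is a constant, so the compiler can discard the cgroup branch altogether.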
Add a handler "pre_destroy" to cgroup_subsys. It is called before
cgroup_rmdir() checks all subsys's refcnt.
I think this is useful for subsys which have some extra refs even if there
are no tasks in cgroup. By adding pre_destroy(), the kernel keeps the rule
"destroy() against subsystem is called only when refcnt=0." and allows css
ref to be used by other objects than tasks.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
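The ordering described above (pre_destroy() first, then the refcnt check, then destroy()) can be sketched like this. struct fake_css and both functions are hypothetical; the sketch only illustrates how a subsystem that holds extra references gets a chance to drop them before the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced subsystem state for one cgroup. */
struct fake_css {
	int refcnt;		/* all references, from tasks or other objects */
	int extra_refs;		/* references the subsystem itself can drop */
	bool destroyed;
};

/* Called before the refcnt check: release what the subsystem holds. */
static void fake_pre_destroy(struct fake_css *css)
{
	css->refcnt -= css->extra_refs;
	css->extra_refs = 0;
}

int fake_rmdir(struct fake_css *css)
{
	assert(css != NULL && css->refcnt >= css->extra_refs);

	fake_pre_destroy(css);

	/* destroy() is still reached only when refcnt is zero. */
	if (css->refcnt != 0)
		return -EBUSY;

	css->destroyed = true;
	return 0;
}
```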
memory cgroup enhancements: add status accounting function for memory cgroup
Add statistics account infrastructure for memory controller. All account
information is stored per-cpu and caller will not have to take lock or use
atomic ops. This will be used by memory.stat file later.
CACHE includes swapcache now. I'd like to divide it into
PAGECACHE and SWAPCACHE later.
This patch adds 3 functions for accounting.
* __mem_cgroup_stat_add() ... for usual routine.
* __mem_cgroup_stat_add_safe() ... for calling under irq_disabled section.
* mem_cgroup_read_stat() ... for reading stat value.
* renamed PAGECACHE to CACHE (because it may include swapcache *now*)
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix smp_processor_id-in-preemptible]
[akpm@linux-foundation.org: uninline things]
[akpm@linux-foundation.org: remove dead code] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Paul Menage <menage@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Kirill Korotaev <dev@sw.ru> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: David Rientjes <rientjes@google.com> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: Kirill Korotaev <dev@sw.ru> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Paul Menage <menage@google.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
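The per-cpu accounting scheme above can be modeled with one counter array per CPU: updates touch only the local CPU's slot, and a read sums all slots. The sketch is an assumption-laden model, not the patch's code: the fake_ names, the explicit cpu argument and the FAKE_STAT_OTHER index are invented, and the preemption and interrupt handling that the real add functions deal with is not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_NR_CPUS 4

/* CACHE is named in the commit; OTHER is a placeholder index. */
enum fake_stat_index { FAKE_STAT_CACHE, FAKE_STAT_OTHER, FAKE_STAT_NSTATS };

struct fake_cgroup_stat {
	long count[FAKE_NR_CPUS][FAKE_STAT_NSTATS];
};

/* Update only the given CPU's slot; no lock or atomic op is needed. */
void fake_stat_add(struct fake_cgroup_stat *s, int cpu,
		   enum fake_stat_index idx, long val)
{
	assert(s != NULL && cpu >= 0 && cpu < FAKE_NR_CPUS);
	assert(idx < FAKE_STAT_NSTATS);
	s->count[cpu][idx] += val;
}

/* Reading sums the per-CPU slots; a single slot may be negative. */
long fake_read_stat(const struct fake_cgroup_stat *s, enum fake_stat_index idx)
{
	long sum = 0;

	assert(s != NULL && idx < FAKE_STAT_NSTATS);
	for (int cpu = 0; cpu < FAKE_NR_CPUS; cpu++)
		sum += s->count[cpu][idx];
	return sum;
}
```

A page charged on one CPU and uncharged on another leaves a positive slot and a negative slot, which is why only the sum is meaningful.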