NeilBrown [Sun, 25 Jun 2006 12:48:02 +0000 (05:48 -0700)]
[PATCH] Make copy_from_user_inatomic NOT zero the tail on i386
As described in a previous patch and documented in mm/filemap.h,
copy_from_user_inatomic* shouldn't zero out the tail of the buffer after an
incomplete copy.
This patch implements that change for i386.
For the _nocache version, a new __copy_user_intel_nocache is defined similar
to copy_user_zeroio_intel_nocache, and this is ultimately used for the copy.
For the regular version, __copy_from_user_ll_nozero is defined which uses
__copy_user and __copy_user_intel - the latter needs casts to reposition the
__user annotations.
If copy_from_user_atomic is given a constant length of 1, 2, or 4, then we do
still zero the destination on failure. This didn't seem worth the effort of
fixing as the places where it is used really don't care.
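The difference between the two behaviours can be sketched in plain userspace C. This is only an illustration, not the i386 implementation: the function names and the `readable` parameter (which simulates the point at which the source faults) are assumptions made for the example. Both functions return the number of bytes not copied.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for a user copy that can fault: only the first
 * `readable` bytes of src can be read. Returns the number of bytes NOT
 * copied. The uncopied tail of dst is left untouched.
 */
size_t sim_copy_nozero(void *dst, const void *src, size_t n, size_t readable)
{
	size_t done = n < readable ? n : readable;

	memcpy(dst, src, done);
	return n - done;
}

/* Same copy, but the uncopied tail of dst is filled with zeros. */
size_t sim_copy_zeroing(void *dst, const void *src, size_t n, size_t readable)
{
	size_t left = sim_copy_nozero(dst, src, n, readable);

	if (left)
		memset((char *)dst + (n - left), 0, left);
	return left;
}
```

With the non-zeroing variant, a destination that previously held valid data keeps that data beyond the point of the fault, which is what a concurrent reader of the page cache would want to see.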
Signed-off-by: Neil Brown <neilb@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
NeilBrown [Sun, 25 Jun 2006 12:47:58 +0000 (05:47 -0700)]
[PATCH] Prepare for __copy_from_user_inatomic to not zero missed bytes
The problem is that when we write to a file, the copy from userspace to
pagecache is first done with preemption disabled, so if the source address is
not immediately available the copy fails *and* *zeros* *the* *destination*.
This is a problem because a concurrent read (which admittedly is an odd thing
to do) might see zeros rather than what was there before the write, or what was
there after, or some mixture of the two (any of these being a reasonable thing
to see).
If the copy did fail, it will immediately be retried with preemption
re-enabled so any transient problem with accessing the source won't cause an
error.
The first copying does not need to zero any uncopied bytes, and doing so
causes the problem. It uses copy_from_user_atomic rather than copy_from_user
so the simple expedient is to change copy_from_user_atomic to *not* zero out
bytes on failure.
The first of these two patches prepares for the change by fixing two places
which assume copy_from_user_atomic does zero the tail. The two usages are
very similar pieces of code which copy from a userspace iovec into one or more
page-cache pages. These are changed to remove the assumption.
The second patch changes __copy_from_user_inatomic* to not zero the tail.
Once these are accepted, I will look at similar patches for other architectures
where this is important (ppc, mips and sparc being the ones I can find).
This patch:
There is a problem with __copy_from_user_inatomic zeroing the tail of the
buffer in the case of an error. As it is called in atomic context, the error
may be transient, so it results in zeros being written where maybe they
shouldn't be.
In the usage in filemap, this opens a window for a well timed read to see data
(zeros) which is not consistent with any ordering of reads and writes.
In most cases where __copy_from_user_inatomic is called, a failure results in
__copy_from_user being called immediately. As long as the latter zeros the
tail, the former doesn't need to. However in *copy_from_user_iovec
implementations (in both filemap and ntfs/file), it is assumed that
copy_from_user_inatomic will zero the tail.
This patch removes that assumption, so that after this patch it will
be safe for copy_from_user_inatomic to not zero the tail.
This patch also adds some commentary to filemap.h and asm-i386/uaccess.h.
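A rough userspace sketch of what "removing the assumption" can look like for an iovec-style copy follows. It is not the filemap or ntfs code: `struct sim_seg`, its `readable` field (which simulates a fault) and both function names are hypothetical. The copy loop reports a short count and never touches the destination past it; a caller that needs a fully defined buffer zeroes the tail itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one iovec segment. */
struct sim_seg {
	const char *base;
	size_t len;
	size_t readable;	/* bytes readable before the simulated fault */
};

/*
 * Copy up to `bytes` from the segments into dst, stopping at the first
 * short copy. Returns the number of bytes actually copied and leaves dst
 * untouched beyond that point.
 */
size_t sim_copy_segs(char *dst, const struct sim_seg *seg, size_t nseg,
		     size_t bytes)
{
	size_t copied = 0;

	for (size_t i = 0; i < nseg && bytes; i++) {
		size_t want = seg[i].len < bytes ? seg[i].len : bytes;
		size_t got = want < seg[i].readable ? want : seg[i].readable;

		memcpy(dst + copied, seg[i].base, got);
		copied += got;
		bytes -= got;
		if (got < want)
			break;	/* fault: report the short count */
	}
	return copied;
}

/* A caller that relies on a zeroed tail now does the zeroing explicitly. */
size_t sim_copy_segs_zero_tail(char *dst, const struct sim_seg *seg,
			       size_t nseg, size_t bytes)
{
	size_t copied = sim_copy_segs(dst, seg, nseg, bytes);

	if (copied < bytes)
		memset(dst + copied, 0, bytes - copied);
	return copied;
}
```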
After this patch, all architectures that might disable preempt when
kmap_atomic is called need to have their __copy_from_user_inatomic* "fixed".
This includes
- powerpc
- i386
- mips
- sparc
Signed-off-by: Neil Brown <neilb@suse.de> Cc: David Howells <dhowells@redhat.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Chris Wright [Sun, 25 Jun 2006 12:47:55 +0000 (05:47 -0700)]
[PATCH] cpuset: remove extra cpuset_zone_allowed check in __alloc_pages
This is redundant with the check in wakeup_kswapd.
Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Paul Jackson <pj@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mingming Cao [Sun, 25 Jun 2006 12:47:50 +0000 (05:47 -0700)]
[PATCH] Avoid disk sector_t overflow for >2TB ext3 filesystem
If ext3 filesystem is larger than 2TB, and sector_t is a u32 (i.e.
CONFIG_LBD not defined in the kernel), the calculation of the disk sector
will overflow. Add check at ext3_fill_super() and ext3_group_extend() to
prevent mount/remount/resize >2TB ext3 filesystem if sector_t size is 4
bytes.
Verified this patch on a 32 bit platform without CONFIG_LBD defined
(sector_t is 32 bits long): mount refuses to mount a 10TB ext3 filesystem.
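The kind of check described above can be sketched as a small standalone function. This is an assumption-laden illustration rather than the code added to ext3_fill_super(): the function name and the `sector_bytes` parameter are made up, and `blocksize_bits` is assumed to be at least 9.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the total number of 512-byte sectors of a filesystem
 * with `blocks_count` blocks of (1 << blocksize_bits) bytes can be
 * represented in a sector number that is `sector_bytes` wide
 * (4 when sector_t is a u32, 8 otherwise).
 */
bool fs_fits_sector_t(uint32_t blocks_count, unsigned blocksize_bits,
		      unsigned sector_bytes)
{
	uint64_t max_sector = sector_bytes >= 8
		? UINT64_MAX
		: (UINT64_C(1) << (8 * sector_bytes)) - 1;

	/* Each block spans 2^(blocksize_bits - 9) sectors. */
	return (uint64_t)blocks_count <= (max_sector >> (blocksize_bits - 9));
}
```

For example, a 10TB filesystem with 4K blocks has 10 * 2^28 blocks, which is 10 * 2^31 sectors and does not fit in 32 bits, so the check fails for a 4-byte sector_t and passes for an 8-byte one.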
Signed-off-by: Mingming Cao<cmm@us.ibm.com> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
schedule_on_each_cpu() presently does a large kmalloc - 96 kbytes on 1024 CPU
64-bit.
Rework it so that we do one 8192-byte allocation and then a pile of tiny ones,
via alloc_percpu(). This has a much higher chance of success (100% in the
current VM).
This also has the effect of reducing the memory requirements from NR_CPUS*n to
num_possible_cpus()*n.
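The figures quoted above can be reproduced with simple arithmetic. The sketch below assumes a 96-byte per-CPU element (inferred from 96 kbytes / 1024 CPUs) and 8-byte pointers on a 64-bit machine; the function names are hypothetical.

```c
#include <stddef.h>

/* One flat array: an elem_bytes-sized slot for every configured CPU. */
size_t flat_alloc_bytes(size_t nr_cpus, size_t elem_bytes)
{
	return nr_cpus * elem_bytes;
}

/* Largest single allocation in the per-CPU scheme: the pointer array. */
size_t percpu_pointer_array_bytes(size_t nr_cpus, size_t ptr_bytes)
{
	return nr_cpus * ptr_bytes;
}

/* Total of the small per-CPU allocations, one per possible CPU. */
size_t percpu_elements_bytes(size_t possible_cpus, size_t elem_bytes)
{
	return possible_cpus * elem_bytes;
}
```

With 1024 configured CPUs the flat array is 98304 bytes, while the pointer array is 8192 bytes; on a machine with, say, 4 possible CPUs the per-CPU elements add only 384 bytes in total.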
Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Randy Dunlap [Sun, 25 Jun 2006 12:47:47 +0000 (05:47 -0700)]
[PATCH] kernel-doc: drop leading space in sections
Drop leading space of kernel-doc section contents.
"Section" data (contents) are split from the section header
(e.g., Note: below is a section header:
* Note: list_empty on entry does not return true after this, the entry is
* in an undefined state.
).
Currently the data/contents begins with a space and is left that way, which
causes it to look bad when printed (in text mode; see example below), so
just remove the leading space.
Note:
list_empty on entry does not return true after this, the entry is in an
undefined state.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 25 Jun 2006 12:47:46 +0000 (05:47 -0700)]
[PATCH] pdflush: handle resume wakeups
pdflush is carefully designed to ensure that all wakeups have some
corresponding work to do - if a woken-up pdflush thread discovers that it
hasn't been given any work to do then this is considered an error.
That all broke when swsusp came along - because a timer-delivered wakeup to a
frozen pdflush thread will just get lost. This causes the pdflush thread to
get lost as well: the writeback timer is supposed to be re-armed by pdflush in
process context, but pdflush doesn't execute the callout which does this.
Fix that up by ignoring the return value from try_to_freeze(): just proceed,
see if we have any work pending and only go back to sleep if that is not the
case.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 25 Jun 2006 12:47:45 +0000 (05:47 -0700)]
[PATCH] cpqarray section fix
WARNING: drivers/block/cpqarray.o - Section mismatch: reference to .init.text: from .text between 'cpqarray_register_ctlr' (at offset 0xe98) and 'alloc_cpqarray_hba'
WARNING: drivers/block/cpqarray.o - Section mismatch: reference to .init.text: from .text between 'cpqarray_register_ctlr' (at offset 0xe9c) and 'alloc_cpqarray_hba'
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Alan Cox [Sun, 25 Jun 2006 12:47:44 +0000 (05:47 -0700)]
[PATCH] IDE CD end-of media error fix
This is a patch from Alan that fixes a real ide-cd.c regression causing
bogus "Media Check" failures for perfectly valid Fedora install ISOs, on
certain CD-ROM drives.
This is a forward port to 2.6.16 (from RHEL) of the minimal changes for the
end of media problem. It may not be sufficient for some controllers
(promise notably) and it does not touch the locking so the error path
locking is as horked as in mainstream.
From: Ingo Molnar <mingo@elte.hu>
I have ported the patch to 2.6.17-rc4 and tested it by provoking
end-of-media IO errors with an unaligned ISO image. Unlike the vanilla
kernel, the patched kernel interpreted the error condition correctly with
512 byte granularity:
Adrian Bunk [Sun, 25 Jun 2006 12:47:41 +0000 (05:47 -0700)]
[PATCH] kernel/sys.c: cleanups
- proper prototypes for the following functions:
- ctrl_alt_del() (in include/linux/reboot.h)
- getrusage() (in include/linux/resource.h)
- make the following needlessly global functions static:
- kernel_restart_prepare()
- kernel_kexec()
[akpm@osdl.org: compile fix] Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Michael Ellerman [Sun, 25 Jun 2006 12:47:40 +0000 (05:47 -0700)]
[PATCH] Make printk work for really early debugging
Currently printk is no use for early debugging because it refuses to
actually print anything to the console unless
cpu_online(smp_processor_id()) is true.
The stated explanation is that console drivers may require per-cpu
resources, or otherwise barf, because the system is not yet setup
correctly. Fair enough.
However some console drivers might be quite happy running early during
boot, in fact we have one, and so it'd be nice if printk understood that.
So I added a flag (which I would have called CON_BOOT, but that's taken)
called CON_ANYTIME, which indicates that a console is happy to be called
anytime, even if the cpu is not yet online.
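The decision described above can be sketched as a small predicate. The struct, the function name and the flag values below are illustrative assumptions only and are not taken from the kernel headers.

```c
#include <stdbool.h>

/* Illustrative flag values, not the kernel's. */
#define SIM_CON_ENABLED	(1u << 0)
#define SIM_CON_ANYTIME	(1u << 1)

struct sim_console {
	unsigned int flags;
};

/*
 * A console may be called if it is enabled and either the current CPU
 * is online or the driver has declared itself safe to call at any time.
 */
bool sim_console_may_print(const struct sim_console *con, bool cpu_is_online)
{
	if (!(con->flags & SIM_CON_ENABLED))
		return false;
	return cpu_is_online || (con->flags & SIM_CON_ANYTIME) != 0;
}
```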
Tested on a Power 5 machine, with both a CON_ANYTIME driver and a bogus
console driver that BUG()s if called while offline. No problems AFAICT.
Built for i386 UP & SMP.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Daniel Walker [Sun, 25 Jun 2006 12:47:37 +0000 (05:47 -0700)]
[PATCH] idetape gcc 4.1 warning fix
In both the read and write cases it will return an error if
copy_{from/to}_user faults. However, I let the driver try to read/write as
much as it can just as it normally would, then finally it returns an error
if there was one. This was the most straightforward way to handle the
error, since there isn't a clear way to clean up the buffers on error.
I moved retval in idetape_chrdev_write() down into the actual code blocks
since it's really only used there, and it conflicted with my ret variable.
Fixes the following warning,
drivers/ide/ide-tape.c: In function ‘idetape_copy_stage_from_user’:
drivers/ide/ide-tape.c:2662: warning: ignoring return value of ‘copy_from_user’, declared with attribute warn_unused_result
drivers/ide/ide-tape.c: In function ‘idetape_copy_stage_to_user’:
drivers/ide/ide-tape.c:2689: warning: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result
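The "keep going, report the fault at the end" approach described in this message can be sketched in userspace as follows. Everything here is a stand-in: `sim_copy_from_user()` fakes a faulting copy, and `bad_chunk` is a hypothetical parameter that selects which chunk fails. `chunk` is assumed to be non-zero.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for copy_from_user(): returns the number of bytes NOT copied. */
size_t sim_copy_from_user(char *dst, const char *src, size_t n, int faults)
{
	if (faults)
		return n;
	memcpy(dst, src, n);
	return 0;
}

/*
 * Copy `total` bytes in pieces of at most `chunk` bytes. A faulting piece
 * is remembered but does not stop the loop; the error is reported once
 * everything that could be copied has been copied.
 */
int sim_copy_stage(char *dst, const char *src, size_t total, size_t chunk,
		   size_t bad_chunk)
{
	int ret = 0;
	size_t idx = 0;

	while (total) {
		size_t n = total < chunk ? total : chunk;

		if (sim_copy_from_user(dst, src, n, idx == bad_chunk))
			ret = 1;	/* remember the fault, keep going */
		dst += n;
		src += n;
		total -= n;
		idx++;
	}
	return ret ? -EFAULT : 0;
}
```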
Signed-off-by: Daniel Walker <dwalker@mvista.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Engelhardt [Sun, 25 Jun 2006 12:47:36 +0000 (05:47 -0700)]
[PATCH] openpromfs: factorize out
"Move" "common code" out to PTR_NOD, which does the conversion from private
pointer to node number. This is to reduce potential casting/conversion errors
due to redundancy. (The naming PTR_NOD follows PTR_ERR, turning a pointer
into xyz.)
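A minimal sketch of such a conversion helper is shown below, assuming a 16-bit node number stored directly in a private pointer slot. `NOD_PTR` is a hypothetical counterpart added only so the round trip can be shown; neither function is claimed to match the openpromfs source.

```c
#include <stdint.h>

/* Hypothetical helper: pack a node number into a private pointer slot. */
static inline void *NOD_PTR(uint16_t node)
{
	return (void *)(uintptr_t)node;
}

/* Convert the private pointer back into a node number, in one place. */
static inline uint16_t PTR_NOD(const void *p)
{
	return (uint16_t)((uintptr_t)p & 0xFFFF);
}
```

Keeping the cast in a single helper means a change to the node number's width only has to be made once.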
[akpm@osdl.org: cleanups] Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jan Engelhardt [Sun, 25 Jun 2006 12:47:35 +0000 (05:47 -0700)]
[PATCH] openpromfs: fix missing NUL
tchars is not '\0'-terminated so the strtoul may run into problems. Fix that.
Also make tchars as big as a long in hexadecimal form would take rather than
just 16.
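The two points made above (terminate the buffer before calling strtoul, and size it for a full long in hexadecimal) can be illustrated with a small userspace function. The function name and `TCHARS_LEN` are assumptions for the example, not the driver's identifiers.

```c
#include <stdlib.h>
#include <string.h>

/* Room for every hex digit of an unsigned long, plus the terminating NUL. */
#define TCHARS_LEN (2 * sizeof(unsigned long) + 1)

/*
 * Parse up to `len` hex digits from src, which need not be
 * NUL-terminated. Longer input is truncated to what fits.
 */
unsigned long parse_hex_field(const char *src, size_t len)
{
	char tchars[TCHARS_LEN];

	if (len > TCHARS_LEN - 1)
		len = TCHARS_LEN - 1;
	memcpy(tchars, src, len);
	tchars[len] = '\0';	/* strtoul needs a terminated string */
	return strtoul(tchars, NULL, 16);
}
```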
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] oprofile: convert from semaphores to mutexes
Signed-off-by: Markus Armbruster <armbru@redhat.com> Cc: Philippe Elie <phil.el@wanadoo.fr> Cc: John Levon <levon@movementarian.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andrew Morton [Sun, 25 Jun 2006 12:47:32 +0000 (05:47 -0700)]
[PATCH] msnd section fix
WARNING: sound/oss/msnd.o - Section mismatch: reference to .init.text:msnd_register from __ksymtab between '__ksymtab_msnd_register' (at offset 0x0) and '__ksymtab_msnd_unregister'
This symbol is exported. It'll oops if the driver is nonmodular and the
caller is modular.
Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In the ufs code there is a function, ubh_ll_rw_block, which takes a parameter
saying how many ufs_buffer_head it should handle, but it is always called with
"1" in place of this parameter. This patch removes the unused parameter of
"ubh_ll_rw_block".
The ufs super block contains some statistics about the file system, such as
the number of directories, free blocks, inodes and so on.
UFS1 holds this information in one location and uses 32-bit integers for it;
UFS2 holds the statistics in another location and uses 64-bit integers.
There is a transitional variant: if UFS1 has type 44BSD and the flags field in
the super block has a special value, the statistics are handled the way UFS2
does, which also means that nobody cares about the old (UFS1) statistics.
So if fsck is run against such a file system after the linux ufs driver has
been used on it, it finds errors, because at present only the UFS1-style
statistics are updated.
This patch should fix this. It also contains some minor cleanups: CodingStyle
fixes and removal of unused variables.
Andrew Morton [Sun, 25 Jun 2006 12:47:28 +0000 (05:47 -0700)]
[PATCH] ufs: printk warning fixes
fs/ufs/super.c: In function `ufs_print_super_stuff':
fs/ufs/super.c:103: warning: unsigned int format, different type arg (arg 2)
fs/ufs/super.c: In function `ufs2_print_super_stuff':
fs/ufs/super.c:147: warning: unsigned int format, different type arg (arg 2)
fs/ufs/super.c: In function `ufs_print_cylinder_stuff':
fs/ufs/super.c:175: warning: unsigned int format, different type arg (arg 2)
Presently, if we allocate several "metadata" blocks (pointers to indirect
blocks for example), we fill only the first block with zeroes. This causes
some problems in the "truncate" function. This patch also removes some unused
arguments from several functions and adds comments.
Currently, to turn on debug mode the "user" has to edit ~10 files, and to turn
it off he has to do it again.
This patch introduces the following changes:
1) turn debug messages on (off) via ".config"
2) remove unnecessary duplication of code
3) make the "UFSD" macros more similar to functions
4) fix some compiler warnings
[PATCH] ufs: not usual amounts of fragments per block
Writing to a UFS file system with block/fragment!=8 may cause bogus
behaviour. The problem is in the "ufs_bitmap_search" function, which doesn't
work correctly in the "block/fragment!=8" case. The idea is stolen from BSD code.
There are two ugly macros in ufs code:
#define UCPI_UBH ((struct ufs_buffer_head *)ucpi)
#define USPI_UBH ((struct ufs_buffer_head *)uspi)
when uspi looks like
struct {
struct ufs_buffer_head ;
}
and USPI_UBH has some sense,
ucpi looks like
struct {
struct not_ufs_buffer_head;
}
To prevent bugs in future, this patch converts the macros to inline functions
and fixes the "ucpi" structure.
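Why an inline function is safer than the cast can be shown with a simplified sketch. The struct layouts and member names below are invented for the example (they are not the real ufs structures); the point is only that an accessor which names the member stays correct wherever the member sits, and that passing the wrong type no longer compiles silently.

```c
/* Simplified stand-ins; member names and layouts are assumptions. */
struct sim_ubh {
	unsigned int count;
};

struct sim_uspi {
	struct sim_ubh s_ubh;	/* buffer head happens to be first */
	unsigned int s_fsize;
};

struct sim_ucpi {
	unsigned int c_cgx;	/* buffer head is NOT first here */
	struct sim_ubh c_ubh;
};

/* Typed accessors: correct regardless of where the member is placed. */
static inline struct sim_ubh *USPI_UBH(struct sim_uspi *spi)
{
	return &spi->s_ubh;
}

static inline struct sim_ubh *UCPI_UBH(struct sim_ucpi *cpi)
{
	return &cpi->c_ubh;
}
```

With the cast-based macro, `(struct sim_ubh *)cpi` would point at `c_cgx` in this layout; the inline accessor returns the address of the actual buffer head.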
First of all, some necessary notes about UFS itself: to avoid wasting disk
space, the tail of a file consists not of blocks (which are ordinarily quite
big, usually 16K) but of fragments (which are ordinarily 2K). As a file grows,
its tail occupies 1 fragment, 2 fragments... At some stage the decision to
allocate a whole block is made and all fragments are moved to one block.
bh = sb_bread
bh->b_blocknr = result + i;
mark_buffer_dirty (bh);
This is the wrong solution, because:
- it doesn't take into consideration that there is another cache: the "inode
page cache"
- because sb_getblk uses not b_blocknr but page->index to find a certain
block, this breaks sb_getblk.
How this situation is handled now: we go through the whole "inode page cache";
if a page is not in the cache we load it into the cache, and change b_blocknr.
Currently, ufs write support has two sets of problems: working with files and
working with directories.
This series of patches should solve the first problem.
This patch is similar to http://lkml.org/lkml/2006/1/17/61 and
complements it.
The situation is the same: in ufs_trunc_(not direct), we read a block and
check whether the count of links to it is equal to one; if so we finish the
cycle, if not we continue. Because the "count of links" is always >=2, this
operation causes an infinite cycle and hangs the kernel.
Alan Stern [Sun, 25 Jun 2006 12:47:15 +0000 (05:47 -0700)]
[PATCH] Allow raw_notifier callouts to unregister themselves
Since raw_notifier chains don't benefit from any centralized locking
protections, they shouldn't suffer from the associated limitations. Under
some circumstances it might make sense for a raw_notifier callout routine
to unregister itself from the notifier chain. This patch (as678) changes
the notifier core to allow for such things.
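One way to make a chain walk tolerate a callout that unregisters itself is to read the next pointer before invoking the callout. The sketch below is a self-contained userspace model with hypothetical names (`sim_notifier`, `sim_chain_*`); it is not the notifier core itself, and it does no locking, in keeping with the "raw" flavour described above.

```c
#include <stddef.h>

struct sim_notifier {
	int (*call)(struct sim_notifier *self, void *data);
	struct sim_notifier *next;
};

void sim_chain_register(struct sim_notifier **head, struct sim_notifier *n)
{
	n->next = *head;
	*head = n;
}

void sim_chain_unregister(struct sim_notifier **head, struct sim_notifier *n)
{
	for (; *head; head = &(*head)->next) {
		if (*head == n) {
			*head = n->next;
			n->next = NULL;	/* a walker reading this later would stop */
			return;
		}
	}
}

/* Walk the chain and return how many callouts ran. */
int sim_chain_call(struct sim_notifier **head, void *data)
{
	struct sim_notifier *n = *head, *next;
	int calls = 0;

	while (n) {
		next = n->next;	/* read before the callout may unlink n */
		n->call(n, data);
		calls++;
		n = next;
	}
	return calls;
}

/* Example callout that removes itself; `data` is the chain head. */
int sim_one_shot(struct sim_notifier *self, void *data)
{
	sim_chain_unregister(data, self);
	return 0;
}

/* Example callout that does nothing. */
int sim_noop(struct sim_notifier *self, void *data)
{
	(void)self;
	(void)data;
	return 0;
}
```

If the walker read `n->next` after the callout instead, the entries following a self-removing callout would be skipped in this model.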
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Paul Mackerras [Sun, 25 Jun 2006 12:47:14 +0000 (05:47 -0700)]
[PATCH] Define __raw_get_cpu_var and use it
There are several instances of per_cpu(foo, raw_smp_processor_id()), which
is semantically equivalent to __get_cpu_var(foo) but without the warning
that smp_processor_id() can give if CONFIG_DEBUG_PREEMPT is enabled. For
those architectures with optimized per-cpu implementations, namely ia64,
powerpc, s390, sparc64 and x86_64, per_cpu() turns into more and slower
code than __get_cpu_var(), so it would be preferable to use __get_cpu_var
on those platforms.
This defines a __raw_get_cpu_var(x) macro which turns into per_cpu(x,
raw_smp_processor_id()) on architectures that use the generic per-cpu
implementation, and turns into __get_cpu_var(x) on the architectures that
have an optimized per-cpu implementation.
Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Jesper Juhl [Sun, 25 Jun 2006 12:47:09 +0000 (05:47 -0700)]
[PATCH] ensure NULL deref can't possibly happen in is_exported()
If CONFIG_KALLSYMS is defined and if it should happen that is_exported() is
given a NULL 'mod' and lookup_symbol(name, __start___ksymtab,
__stop___ksymtab) returns 0, then we'll end up dereferencing a NULL
pointer.
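The failure mode and a safe structure for the check can be sketched as follows. All names are stand-ins (`sim_module`, `sim_lookup`, a two-entry fake kernel symbol table); the real function's exact shape is not reproduced here. The key property is that the module's table is only consulted when `mod` is non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct sim_module {
	const char *const *syms;
	size_t num_syms;
};

/* Fake "kernel" export table for the example. */
static const char *const sim_kernel_syms[] = { "printk", "kmalloc" };

bool sim_lookup(const char *name, const char *const *tab, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i], name) == 0)
			return true;
	return false;
}

/*
 * A NULL mod means "look in the kernel's own table". Whatever that
 * lookup returns, mod is never dereferenced on this path.
 */
bool sim_is_exported(const char *name, const struct sim_module *mod)
{
	if (!mod)
		return sim_lookup(name, sim_kernel_syms, 2);
	return sim_lookup(name, mod->syms, mod->num_syms);
}
```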
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Rewritten backlight infrastructure for portable Apple computers
This patch contains a total rewrite of the backlight infrastructure for
portable Apple computers. Backward compatibility is retained. A sysfs
interface allows userland to control the brightness with more steps than
before. Userland is allowed to upload a brightness curve for different
monitors, similar to Mac OS X.
[akpm@osdl.org: add needed exports] Signed-off-by: Michael Hanselmann <linux-kernel@hansmi.ch> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Adrian Bunk [Sun, 25 Jun 2006 12:47:06 +0000 (05:47 -0700)]
[PATCH] uml: remove dead declaration
Became irrelevant when x86_64 unexported ia32_sys_call_table.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: Andi Kleen <ak@muc.de> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 25 Jun 2006 12:47:01 +0000 (05:47 -0700)]
[PATCH] m68k: convert generic irq code to irq controller
Convert the generic irq code to use irq controller, this gets rid of the
machine specific callbacks and gives better control over irq handling without
duplicating lots of code.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Roman Zippel [Sun, 25 Jun 2006 12:46:58 +0000 (05:46 -0700)]
[PATCH] m68k: fix show_registers()
Move some of the prints in die_if_kernel() to show_registers() and call that
instead of show_stack(), so show_registers() now prints similar info to other
archs. Clean up the function a little.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andreas Mohr [Sun, 25 Jun 2006 12:46:52 +0000 (05:46 -0700)]
[PATCH] cpu_relax(): smpboot.c
Add cpu_relax() to various smpboot.c init loops. cpu_relax() always implies a
barrier (according to Arjan), so remove those as well.
Signed-off-by: Andreas Mohr <andi@lisas.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] Clean up and refactor i386 sub-architecture setup
Clean up and refactor i386 sub-architecture setup.
This change moves all the code from the
asm-i386/mach-*/setup_arch_pre/post.h headers, into
arch/i386/mach-*/setup.c. mach-*/setup_arch_pre.h is renamed to
setup_arch.h, and contains only things which should be in header files. It
is purely code-motion; there should be no functional changes at all.
Several functions in arch/i386/kernel/setup.c needed to be made non-static
so that they're visible to the code in mach-*/setup.c. asm-i386/setup.h is
used to hold the prototypes for these functions.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Cc: Zachary Amsden <zach@vmware.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Christian Limpach <Christian.Limpach@cl.cam.ac.uk> Cc: Martin Bligh <mbligh@google.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Andrey Panin <pazke@donpac.ru> Cc: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] page migration: Support a vma migration function
Hooks for calling vma specific migration functions
With this patch a vma may define a vma->vm_ops->migrate function. That
function may perform page migration on its own (some vmas may not contain page
structs and therefore cannot be handled by regular page migration. Pages in a
vma may require special preparatory treatment before migration is possible,
etc.). Only mmap_sem is held when the migration function is called. The
migrate() function gets passed two sets of nodemasks describing the source and
the target of the migration. The flags parameter either contains
MPOL_MF_MOVE which means that only pages used exclusively by
the specified mm should be moved
or
MPOL_MF_MOVE_ALL which means that pages shared with other processes
should also be moved.
The migration function returns 0 on success or an error condition. An error
condition will prevent regular page migration from occurring.
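The calling convention described above can be modelled in userspace like this. It is a sketch under several assumptions: the `sim_` types and functions are invented, node sets are reduced to plain bitmasks instead of nodemasks, the flag values are illustrative, and no locking is modelled.

```c
#include <errno.h>
#include <stddef.h>

/* Illustrative flag values; treat them as assumptions. */
#define SIM_MF_MOVE	(1UL << 1)
#define SIM_MF_MOVE_ALL	(1UL << 2)

struct sim_vma;

struct sim_vm_ops {
	/* Optional hook; a vma without one is simply skipped. */
	int (*migrate)(struct sim_vma *vma, unsigned long from_nodes,
		       unsigned long to_nodes, unsigned long flags);
};

struct sim_vma {
	const struct sim_vm_ops *ops;
	unsigned long last_flags;	/* what the hook was last asked to do */
};

int sim_hook_ok(struct sim_vma *vma, unsigned long from, unsigned long to,
		unsigned long flags)
{
	(void)from;
	(void)to;
	vma->last_flags = flags;
	return 0;
}

int sim_hook_busy(struct sim_vma *vma, unsigned long from, unsigned long to,
		  unsigned long flags)
{
	(void)vma;
	(void)from;
	(void)to;
	(void)flags;
	return -EBUSY;
}

const struct sim_vm_ops sim_ops_ok = { sim_hook_ok };
const struct sim_vm_ops sim_ops_busy = { sim_hook_busy };

/*
 * Call each vma's hook, if any. Returns 0 if regular page migration may
 * proceed, or the first hook error, which prevents it.
 */
int sim_migrate_vmas(struct sim_vma *vmas, size_t n, unsigned long from,
		     unsigned long to, unsigned long flags)
{
	for (size_t i = 0; i < n; i++) {
		const struct sim_vm_ops *ops = vmas[i].ops;

		if (ops && ops->migrate) {
			int err = ops->migrate(&vmas[i], from, to, flags);

			if (err)
				return err;
		}
	}
	return 0;
}
```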
On its own this patch cannot be included since there are no users for this
functionality. But it seems that the uncached allocator will need this
functionality at some point.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
[PATCH] mm: remove VM_LOCKED before remap_pfn_range and drop VM_SHM
Remove VM_LOCKED before remap_pfn_range from device drivers and get rid of
VM_SHM.
remap_pfn_range() already sets VM_IO. There is no need to set VM_SHM since
it does nothing. VM_LOCKED is of no use since the remap_pfn_range does not
place pages on the LRU. The pages are therefore never subject to swap
anyways. Remove all the vm_flags settings before calling remap_pfn_range.
After removing all the vm_flag settings no use of VM_SHM is left. Drop it.
Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Zach Brown [Sun, 25 Jun 2006 12:46:46 +0000 (05:46 -0700)]
[PATCH] AOP_TRUNCATED_PAGE victims in read_pages() belong in the LRU
AOP_TRUNCATED_PAGE victims in read_pages() belong in the LRU
Nick Piggin rightly pointed out that the introduction of AOP_TRUNCATED_PAGE
to read_pages() was wrong to leave A_T_P victim pages in the page cache but
not put them in the LRU. Failing to do so hid them from the VM.
A_T_P just means that the aop method unlocked the page rather than
performing IO. It would be very rare that the page was truncated between
the unlock and testing A_T_P. So we leave the pages in the LRU for likely
reuse soon rather than backing them back out of the page cache. We do this
by matching the behaviour before the A_T_P introduction which added pages
to the LRU regardless of what ->readpage() did.
This doesn't include the unrelated cleanup in Nick's initial fix which
changed read_pages() to return void to match its only caller's behaviour of
ignoring errors.
Signed-off-by: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Sun, 25 Jun 2006 00:48:14 +0000 (17:48 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm: (25 commits)
[ARM] 3648/1: Update struct ucontext layout for coprocessor registers
[ARM] Add identifying number for non-rt sigframe
[ARM] Gather common sigframe saving code into setup_sigframe()
[ARM] Gather common sigframe restoration code into restore_sigframe()
[ARM] Re-use sigframe within rt_sigframe
[ARM] Merge sigcontext and sigmask members of sigframe
[ARM] Replace extramask with a full copy of the sigmask
[ARM] Remove rt_sigframe puc and pinfo pointers
[ARM] 3647/1: S3C24XX: add Osiris to the list of simtec pm machines
[ARM] 3645/1: S3C2412: irq support for external interrupts
[ARM] 3643/1: S3C2410: Add new usb clocks
[ARM] 3642/1: S3C24XX: Add machine SMDK2413
[ARM] 3641/1: S3C2412: Fixup gpio register naming
[ARM] 3640/1: S3C2412: Use S3C24XX_DCLKCON instead of S3C2410_DCLKCON
[ARM] 3639/1: S3C2412: serial port support
[ARM] 3638/1: S3C2412: core clocks
[ARM] 3637/1: S3C24XX: Add mpll clock, and set as fclk parent
[ARM] 3636/1: S3C2412: Add selection of CPU_ARM926
[ARM] 3635/1: S3C24XX: Add S3C2412 core cpu support
[ARM] 3633/1: S3C24XX: s3c2410 gpio bugfix - wrong pin nos
...
As Al so eloquently points out, the patch is crap. The old code was fine,
the new code was bogus.
It never dereferenced a user pointer, the "->" operator was to an array
member, which gives the _address_ of the member (in user space), not an
actual dereference at all.
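The point about "->" applied to an array member can be checked with a few lines of ordinary C. The struct and function names are made up for the illustration, and a normal object is used instead of a user-space pointer: the expression yields the base pointer plus a constant offset, and the function never reads the object's contents.

```c
#include <stddef.h>

struct sim_req {
	int id;
	char name[16];
};

/*
 * r->name designates an array, so the expression decays to the address
 * of its first element. Nothing stored in *r is read to compute it.
 */
char *sim_name_addr(struct sim_req *r)
{
	return r->name;
}
```

The returned pointer equals `(char *)r + offsetof(struct sim_req, name)`, which is pure address arithmetic.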