NeilBrown [Wed, 17 Oct 2007 06:30:55 +0000 (23:30 -0700)]
md: make sure read errors are auto-corrected during a 'check' resync in raid1
Whenever a read error is found, we should attempt to overwrite with correct
data to 'fix' it.
However when do a 'check' pass (which compares data blocks that are
successfully read, but doesn't normally overwrite) we don't do that. We
should.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Iustin Pop [Wed, 17 Oct 2007 06:30:54 +0000 (23:30 -0700)]
md: expose the degraded status of an assembled array through sysfs
The 'degraded' attribute is useful to quickly determine if the array is
degraded, instead of parsing 'mdadm -D' output or relying on the other
techniques (number of working devices against number of defined devices,
etc.). The md code already keeps track of this attribute, so it's useful to
export it.
Signed-off-by: Iustin Pop <iusty@k1024.org> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Wed, 17 Oct 2007 06:30:53 +0000 (23:30 -0700)]
md: 'sync_action' in sysfs returns wrong value for readonly arrays
When an array is started read-only, MD_RECOVERY_NEEDED can be set but no
recovery will be running. This causes 'sync_action' to report the wrong
value.
We could remove the test for MD_RECOVERY_NEEDED, but doing so would leave a
small gap after requesting a sync action, where 'sync_action' would still
report the old value.
So make sure that for a read-only array, 'sync_action' always returns 'idle'.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
NeilBrown [Wed, 17 Oct 2007 06:30:53 +0000 (23:30 -0700)]
md: fix a bug in some never-used code.
http://bugzilla.kernel.org/show_bug.cgi?id=3277
There is a seq_printf here that isn't being passed a 'seq'. Howeve as the
code is inside #ifdef MD_DEBUG, nobody noticed.
Also remove some extra spaces.
Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Wed, 17 Oct 2007 06:30:52 +0000 (23:30 -0700)]
bitmap.h: remove dead artifacts
bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used.
Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael J. Evans [Wed, 17 Oct 2007 06:30:52 +0000 (23:30 -0700)]
md: software Raid autodetect dev list not array
In current release kernels the md module (Software RAID) uses a static
array (dev_t[128]) to store partition/device info temporarily for
autostart.
I discovered this (and that the devices are added as disks/partitions are
discovered at boot) while I was debugging why only one of my MD arrays would
come up whole, while all the others were short a disk.
I eventually discovered that it was enumerating through all of 9 of my 11 hds
(2 had only 4 partitions apiece) while the other 9 have 15 partitions (I
wanted 64 per drive...). The last partition of the 8th drive in my 9 drive
raid 5 sets wasn't added, thus making the final md array short both a parity
and data disk, and it was started later, elsewhere.
This patch replaces that static array with a list.
[akpm@linux-foundation.org: removed unused var] Signed-off-by: Michael J. Evans <mjevans1983@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joern Engel [Wed, 17 Oct 2007 06:30:44 +0000 (23:30 -0700)]
introduce I_SYNC
I_LOCK was used for several unrelated purposes, which caused deadlock
situations in certain filesystems as a side effect. One of the purposes
now uses the new I_SYNC bit.
Also document the various bits and change their order from historical to
logical.
[bunk@stusta.de: make fs/inode.c:wake_up_inode() static] Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: David Chinner <dgc@sgi.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fengguang Wu [Wed, 17 Oct 2007 06:30:43 +0000 (23:30 -0700)]
writeback: introduce writeback_control.more_io to indicate more io
After making dirty a 100M file, the normal behavior is to start the writeback
for all data after 30s delays. But sometimes the following happens instead:
- after 30s: ~4M
- after 5s: ~4M
- after 5s: all remaining 92M
Some analyze shows that the internal io dispatch queues goes like this:
1) initial state with a 100M file and a 1K file
2) 4M written, nr_to_write <= 0, so write more
3) 1K written, nr_to_write > 0, no more writes(BUG)
nr_to_write > 0 in (3) fools the upper layer to think that data have all been
written out. The big dirty file is actually still sitting in s_more_io. We
cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and
let the loop in generic_sync_sb_inodes() continue: this may starve newly
expired inodes in s_dirty. It is also not an option to draw inodes from both
s_more_io and s_dirty, an let the loop go on: this might lead to live locks,
and might also starve other superblocks in sync time(well kupdate may still
starve some superblocks, that's another bug).
We have to return when a full scan of s_io completes. So nr_to_write > 0 does
not necessarily mean that "all data are written". This patch introduces a
flag writeback_control.more_io to indicate this situation. With it the big
dirty file no longer has to wait for the next kupdate invocation 5s later.
Cc: David Chinner <dgc@sgi.com> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fengguang Wu [Wed, 17 Oct 2007 06:30:42 +0000 (23:30 -0700)]
writeback: remove pages_skipped accounting in __block_write_full_page()
Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug:
> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.
% dmesg|grep 847824 # generated by a debug printk
[ 529.263184] redirtied inode 847824 line 548
[ 564.250872] redirtied inode 847824 line 548
[ 594.272797] redirtied inode 847824 line 548
[ 629.231330] redirtied inode 847824 line 548
[ 659.224674] redirtied inode 847824 line 548
[ 689.219890] redirtied inode 847824 line 548
[ 724.226655] redirtied inode 847824 line 548
[ 759.198568] redirtied inode 847824 line 548
# line 548 in fs/fs-writeback.c:
543 if (wbc->pages_skipped != pages_skipped) {
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
548 redirty_tail(inode);
549 }
More debug efforts show that __block_write_full_page()
never has the chance to call submit_bh() for that big dirty file:
the buffer head is *clean*. So basicly no page io is issued by
__block_write_full_page(), hence pages_skipped goes up.
Also the comment in generic_sync_sb_inodes():
544 /*
545 * writeback is not making progress due to locked
546 * buffers. Skip this inode for now.
547 */
and the comment in __block_write_full_page():
1713 /*
1714 * The page was marked dirty, but the buffers were
1715 * clean. Someone wrote them back by hand with
1716 * ll_rw_block/submit_bh. A rare case.
1717 */
do not quite agree with each other. The page writeback should be skipped for
'locked buffer', but here it is 'clean buffer'!
This patch fixes this bug. Though I'm not sure why __block_write_full_page()
is called only to do nothing and who actually issued the writeback for us.
This is the two possible new behaviors after the patch:
1) pretty nice: wait 30s and write ALL:)
2) not so good:
- during the dd: ~16M
- after 30s: ~4M
- after 5s: ~4M
- after 5s: ~176M
The next patch will fix case (2).
Cc: David Chinner <dgc@sgi.com> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fengguang Wu [Wed, 17 Oct 2007 06:30:39 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock inode lists 8
Streamline the management of dirty inode lists and fix time ordering bugs.
The writeback logic used to move not-yet-expired dirty inodes from s_dirty to
s_io, *only to* move them back. The move-inodes-back-and-forth thing is a
mess, which is eliminated by this patch.
The new scheme is:
- s_dirty acts as a time ordered io delaying queue;
- s_io/s_more_io together acts as an io dispatching queue.
On kupdate writeback, we pull some inodes from s_dirty to s_io at the start of
every full scan of s_io. Otherwise (i.e. for sync/throttle/background
writeback), we always pull from s_dirty on each run (a partial scan).
Note that the line
list_splice_init(&sb->s_more_io, &sb->s_io);
is moved to queue_io() to leave s_io empty. Otherwise a big dirtied file will
sit in s_io for a long time, preventing new expired inodes to get in.
Cc: Ken Chen <kenchen@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Current -mm tree has bucketful of bug fixes in periodic writeback path.
However, we still hit a glitch where dirty pages on a given inode aren't
completely flushed to the disk, and system will accumulate large amount of
dirty pages beyond what dirty_expire_interval is designed for.
The problem is __sync_single_inode() will move an inode to sb->s_dirty list
even when there are more pending dirty pages on that inode. If there is
another inode with a small number of dirty pages, we hit a case where the loop
iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0.
Thus leaving the inode that has large amount of dirty pages behind and it has
to wait for another dirty_writeback_interval before we flush it again. We
effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval.
If the rate of dirtying is sufficiently high, the system will start
accumulate a large number of dirty pages.
So fix it by having another sb->s_more_io list on which to park the inode
while we iterate through sb->s_io and to allow each dirty inode which resides
on that sb to have an equal chance of flushing some amount of dirty pages.
Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:37 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 7
This one fixes four bugs.
There are a few situation in there where writeback decides it is going to skip
over a blockdev inode on the kernel-internal blockdev superblock. It
presently does this by moving the blockdev inode onto the tail of the blockdev
superblock's s_dirty. But
a) this screws up s_dirty's reverse-time-orderedness and
b) refiling the blockdev for writeback in another 30 second is rude. We
should try again sooner than that.
Fix all this up by using redirty_head(): move the blockdev inode onto the head
of the blockdev superblock's s_dirty list for prompt writeback.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:37 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 6
Recycling the previous changelog:
When the writeback function is operating in writeback-for-flushing mode
(as opposed to writeback-for-integrity) and it encounters an I_LOCKed inode,
it will skip writing that inode. This is done for throughput and latency:
move on to another inode rather than blocking for this one.
Writeback skips this inode by moving it off s_io and onto s_dirty, so that
writeback can proceed with the other inodes on s_io.
However that inode movement can corrupt s_dirty's
reverse-time-orderedness. Fix that by using the new redirty_tail(), which
will update the refiled inode's dirtied_when field.
Note: the behaviour in here is a bit rude: if kupdate happens to come
across a locked inode then it will defer writeback of that inode for another
30 seconds. We'll address that in the next patch.
Address that here. What we do is to move the skipped inode to the _head_ of
s_dirty, immediately eligible for writeout again. Instead of deferring that
writeout for another 30 seconds.
One would think that this might cause a livelock: we keep on trying to write
the same locked inode. But it won't because:
a) if that was the case, it would _already_ be happening on the
balance_dirty_pages codepath. Because balance_dirty_pages() doesn't care
about inode timestamps.
b) if we skipped this inode then we won't have done any writeback. The
higher-level writeback paths will see that wbc.nr_to_write didn't change
and they'll then back off and take a nap.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:36 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 5
When the writeback function is operating in writeback-for-flushing mode (as
opposed to writeback-for-integrity) and it encounters an I_LOCKed inode, it
will skip writing that inode. This is done for throughput and latency: move
on to another inode rather than blocking for this one.
Writeback skips this inode by moving it off s_io and onto s_dirty, so that
writeback can proceed with the other inodes on s_io.
However that inode movement can corrupt s_dirty's reverse-time-orderedness.
Fix that by using the new redirty_tail(), which will update the refiled
inode's dirtied_when field.
Note: the behaviour in here is a bit rude: if kupdate happens to come across a
locked inode then it will defer writeback of that inode for another 30
seconds. We'll address that in the next patch.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:34 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 4
When the kupdate function has tried to write back an expired inode it will
then check to see whether some of the inode's pages are still dirty.
This can happen when the filesystem decided to not write a page for some
reason. But it does _not_ occur due to redirtyings: a redirtying will set
I_DIRTY_PAGES.
What we need to do here is to set I_DIRTY_PAGES to reflect reality and to then
put the inode onto the _head_ of s_dirty for consideration on the next kupdate
pass, in five seconds time.
Problem is, the code failed to modify the inode's timestamp when pushing the
inode onto thehead of s_dirty.
The patch:
If there are no other inodes on s_dirty then we leave the inode's timestamp
alone: it is already expired.
If there _are_ other inodes on s_dirty then we arrange for this inode to get
the same timestamp as the inode which is at the head of s_dirty, thus
preserving the s_dirty ordering. But we only need to do this if this inode
purports to have been dirtied before the one at head-of-list.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:34 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists 3
While writeback is working against a dirty inode it does a check after trying
to write some of the inode's pages:
"did the lower layers skip some of the inode's dirty pages because they were
locked (or under writeback, or whatever)"
If this turns out to be true, we must move the inode back onto s_dirty and
redirty it. The reason for doing this is that fsync() and friends only check
the s_dirty list, and those functions want to know about those pages which
were locked, so they can be waited upon and, if necessary, rewritten.
Problem is, that redirtying was putting the inode onto the tail of s_dirty
without updating its timestamp. This causes a violation of s_dirty ordering.
Fix this by updating inode->dirtied_when when moving the inode onto s_dirty.
But the code is still a bit buggy? If the inode was _already_ dirty then we
don't need to move it at all. Oh well, hopefully it doesn't matter too much,
as that was a redirtying, which was very recent anwyay.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:33 +0000 (23:30 -0700)]
writeback: fix time ordering of the per superblock dirty inode lists: memory-backed inodes
For reasons which escape me, inodes which are dirty against a ram-backed
filesystem are managed in the same way as inodes which are backed by real
devices.
Probably we could optimise things here. But given that we skip the entire
supeblock as son as we hit the first dirty inode, there's not a lot to be
gained.
And the code does need to handle one particular non-backed superblock: the
kernel's fake internal superblock which holds all the blockdevs.
Still. At present when the code encounters an inode which is dirty against a
memory-backed filesystem it will skip that inode by refiling it back onto
s_dirty. But it fails to update the inode's timestamp when doing so which at
least makes the debugging code upset.
Fix.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 17 Oct 2007 06:30:32 +0000 (23:30 -0700)]
writeback: fix time-ordering of the per-superblock dirty-inode lists
When writeback has finished writing back an inode it looks to see if that
inode is still dirty. If it is, that means that a process redirtied the inode
while its writeback was in progress.
What we need to do here is to refile the redirtied inode onto the s_dirty
list.
But we're doing that wrongly: it could be that this inode was redirtied
_before_ the last inode on s_dirty. We're blindly appending this inode to the
list, after an inode which might be less-recently-dirtied, thus violating the
list's ordering.
So we must either insertion-sort this inode into the correct place, or we must
update this inode's dirtied_when field when appending it to the reverse-sorted
s_dirty list, to preserve the reverse-time-ordering.
This patch does the latter: if this inode was dirtied less recently than the
tail inode then copy the tail inode's timestamp into this inode.
This means that in rare circumstances, some inodes will be writen back later
than they should have been. But the time slip will be small.
Cc: Mike Waychison <mikew@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:30:30 +0000 (23:30 -0700)]
dontdiff: update based on gitignore updates
Update dontdiff, based on .gitignore patches from Pete Zaitcev and Adrian
Bunk.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Pete Zaitcev <zaitcev@redhat.com> Cc: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Edward Shishkin [Wed, 17 Oct 2007 06:30:30 +0000 (23:30 -0700)]
reiserfs: do not repair wrong journal params
When mounting a file system with wrong journal params do not try to repair
them, suggest fsck instead.
Signed-off-by: Edward Shishkin <edward@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ingo Molnar [Wed, 17 Oct 2007 06:30:29 +0000 (23:30 -0700)]
printk: add KERN_CONT annotation
printk: add the KERN_CONT annotation (which is empty string but via
which checkpatch.pl can notice that the lacking KERN_ level is fine).
This useful for multiple calls of hand-crafted printk output done by
early debug code or similar.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Wed, 17 Oct 2007 06:30:28 +0000 (23:30 -0700)]
ext3: lighten up resize transaction requirements
When resizing online, setup_new_group_blocks attempts to reserve a
potentially very large transaction, depending on the current filesystem
geometry. For some journal sizes, there may not be enough room for this
transaction, and the online resize will fail.
The patch below resizes & restarts the transaction as necessary while
setting up the new group, and should work with even the smallest journal.
Tested with something like:
[root@newbox ~]# dd if=/dev/zero of=fsfile bs=1024 count=32768
[root@newbox ~]# mkfs.ext3 -b 1024 fsfile 16384
[root@newbox ~]# mount -o loop fsfile mnt/
[root@newbox ~]# resize2fs /dev/loop0
resize2fs 1.40.2 (12-Jul-2007)
Filesystem at /dev/loop0 is mounted on /root/mnt; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/loop0 to 32768 (1k) blocks.
resize2fs: No space left on device While trying to add group #2
[root@newbox ~]# dmesg | tail -n 1
JBD: resize2fs wants too many credits (258 > 256)
[root@newbox ~]#
With the below change, it works.
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Andreas Dilger <adilger@clusterfs.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ulrich Drepper [Wed, 17 Oct 2007 06:30:26 +0000 (23:30 -0700)]
F_DUPFD_CLOEXEC implementation
One more small change to extend the availability of creation of file
descriptors with FD_CLOEXEC set. Adding a new command to fcntl() requires
no new system call and the overall impact on code size if minimal.
If this patch gets accepted we will also add this change to the next
revision of the POSIX spec.
To test the patch, use the following little program. Adjust the value of
F_DUPFD_CLOEXEC appropriately.
Alexey Dobriyan [Wed, 17 Oct 2007 06:30:26 +0000 (23:30 -0700)]
task_struct: move ->fpu_counter and ->oomkilladj
There is nice 2 byte hole after struct task_struct::ioprio field
into which we can put two 1-byte fields: ->fpu_counter and ->oomkilladj.
Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Franck Bui-Huu [Wed, 17 Oct 2007 06:30:24 +0000 (23:30 -0700)]
Break ELF_PLATFORM and stack pointer randomization dependency
Currently arch_align_stack() is used by fs/binfmt_elf.c to randomize
stack pointer inside a page. But this happens only if ELF_PLATFORM
symbol is defined.
ELF_PLATFORM is normally set if the architecture wants ld.so to load
implementation specific libraries for optimization. And currently a
lot of architectures just yield this symbol to NULL.
This is the case for MIPS architecture where ELF_PLATFORM is NULL but
arch_align_stack() has been redefined to do stack inside page
randomization. So in this case no randomization is actually done.
This patch breaks this dependency which seems to be useless and allows
platforms such MIPS to do the randomization.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Wed, 17 Oct 2007 06:30:23 +0000 (23:30 -0700)]
ext3: remove #ifdef CONFIG_EXT3_INDEX
CONFIG_EXT3_INDEX is not an exposed config option in the kernel, and it is
unconditionally defined in ext3_fs.h. tune2fs is already able to turn off
dir indexing, so at this point it's just cluttering up the code. Remove
it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Wed, 17 Oct 2007 06:30:22 +0000 (23:30 -0700)]
fs: correct SuS compliance for open of large file without options
The early LFS work that Linux uses favours EFBIG in various places. SuSv3
specifically uses EOVERFLOW for this as noted by Michael (Bug 7253)
[EOVERFLOW]
The named file is a regular file and the size of the file cannot be
represented correctly in an object of type off_t. We should therefore
transition to the proper error return code
Signed-off-by: Alan Cox <alan@redhat.com> Cc: Theodore Tso <tytso@mit.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ahmed S. Darwish [Wed, 17 Oct 2007 06:30:21 +0000 (23:30 -0700)]
Completely remove deprecated IRQ flags (SA_*)
Only very little files use the deprecated SA_* IRQ flags in latest pull. This
patch series removes such macros from the tree and transfrom old code to the
new IRQF_* flags.
I've grepped the whole tree to make sure that no more files than the patched
ones use such deprecated macros. I hope this series won't introduce build
errors.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ahmed S. Darwish [Wed, 17 Oct 2007 06:30:20 +0000 (23:30 -0700)]
NCR53C8XX: Remove deprecated IRQ flags (SA_*)
Stop using deprecated IRQ flags in ncr53c8xx documentaion. The new IRQF_*
macros are used instead.
Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davide Libenzi [Wed, 17 Oct 2007 06:30:19 +0000 (23:30 -0700)]
anon-inodes use open coded atomic_inc for the shared inode
Since we know the shared inode count is always >0, we can avoid igrab()
and use an open coded atomic_inc().
This also fixes a bug noticed by Yan Zheng <yanzheng@21cn.com>: were checking
for an IS_ERR() return from igrab(), but it actually returns NULL on error.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Yan Zheng <yanzheng@21cn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Wed, 17 Oct 2007 06:30:16 +0000 (23:30 -0700)]
menuconfig: transform Network Filesystems menu
Turn Network File Systems into a menuconfig so that it can be disabled at
once.
(Note: I added a "default y". If you do not like that, speak up.)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Steven French <sfrench@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Cc: Eric Van Hensbergen <ericvh@hera.kernel.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Engelhardt [Wed, 17 Oct 2007 06:30:15 +0000 (23:30 -0700)]
menuconfig: transform NLS and DLM menus
Changes NLS and DLM menus into a 'menuconfig' object so that it can be
disabled at once without having to enter the menu first to disable the config
option.
Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rusty Russell [Wed, 17 Oct 2007 06:30:13 +0000 (23:30 -0700)]
Delay creation of khcvd thread
This changes hvc_init() to be called only when someone actually uses the
hvc_console driver. Dave Jones complained when profiling bootup.
hvc_console used to only be for Power aka pSeries: now lguest and Xen both
want it built-in in case the kernel is a guest under one of those, even
though usually it will be a native boot.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Mirkin [Wed, 17 Oct 2007 06:30:13 +0000 (23:30 -0700)]
change inotifyfs magic as the same magic is used for futexfs
Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a
little bit confusing. Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as
magic for inotifyfs.
Signed-off-by: Andrey Mirkin <major@openvz.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Olaf Hering [Wed, 17 Oct 2007 06:30:12 +0000 (23:30 -0700)]
increase AT_VECTOR_SIZE to terminate saved_auxv properly
include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. fs/binfmt_elf.c
has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT
entries. So in the worst case, saved_auxv does not get an AT_NULL entry at
the end.
The saved_auxv array must be terminated with an AT_NULL entry. Make the
size of mm_struct->saved_auxv arch dependend, based on the number of
ARCH_DLINFO entries.
Signed-off-by: Olaf Hering <olh@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrill Gorcunov [Wed, 17 Oct 2007 06:30:08 +0000 (23:30 -0700)]
UDF: coding style fixups
This patch does additional coding style fixup. Initially the code is being
distorted by Lindent (in my patches sent not very long ago) and fixed in
the followup patches but this stuff was accidently missed.
New and old compiled files were compared with cmp to check for being
identically. So the patch will not break the kernel.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Wed, 17 Oct 2007 06:30:07 +0000 (23:30 -0700)]
tty: expose new methods needed for drivers to get termios right
This adds three new functions (or in one case to be more exact makes it
always available)
tty_termios_copy_hw
Copies all the hardware settings from one termios structure to the other.
This is intended for drivers that support little or no hardware setting
tty_termios_encode_baud_rate
Allows you to set the input and output baud rate in a termios structure. A
driver is supposed to set the resulting baud rate from a request so most
will want to use this function to set the resulting input and output rates
to match the hardware values. Internally it knows about keeping Bxxx
encoding when possible to maximise compatibility.
tty_encode_baud_rate
As above but for the tty's own current termios structure
I suspect this will initially need some tweaking as it gets enabled by
driver patches over the next few mm cycles so consider this lot -mm only
for the moment so it can stabilize and end up neat before it goes to base.
I've tried not to break any obscure architectures - if you get a speed you
can't represent the code will print warnings on non updated termios systems
but not break.
Once this is merged and seems sane I've got a growing pile of driver
updates to use it - notably for USB serial drivers.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Wed, 17 Oct 2007 06:30:06 +0000 (23:30 -0700)]
ide-cd is unmaintained
I simply don't have any old IDE systems any more or time to really look
after this. Nobody responded to the previous linux-ide mail about
maintainers so...
Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Borislav Petkov [Wed, 17 Oct 2007 06:30:05 +0000 (23:30 -0700)]
fs/isofs/namei.c: Remove uninitialized local vars warning
shut up those:
fs/isofs/namei.c: In function 'isofs_lookup':
fs/isofs/namei.c:161: warning: 'offset' may be used uninitialized in this function
fs/isofs/namei.c:161: warning: 'block' may be used uninitialized in this function
By the way, they get overwritten at the end of isofs_find_entry().
Robert P. J. Day [Wed, 17 Oct 2007 06:30:05 +0000 (23:30 -0700)]
Delete gcc-2.95 compatible structure definition.
Since nothing earlier than gcc-3.2 is supported for kernel
compilation, that 2.95 hack can be removed.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lepton Wu [Wed, 17 Oct 2007 06:30:04 +0000 (23:30 -0700)]
reiserfs: workaround for dead loop in finish_unfinished
There is possible dead loop in finish_unfinished function.
In most situation, the call chain iput -> ... -> reiserfs_delete_inode ->
remove_save_link will success. But for some reason such as data
corruption, reiserfs_delete_inode fails on reiserfs_do_truncate ->
search_for_position_by_key.
Then remove_save_link won't be called. We always get the same
"save_link_key" in the while loop in finish_unfinished function. The
following patch adds a check for the possible dead loop and just remove
save link when deap loop.
[akpm@linux-foundation.org: cleanups] Signed-off-by: Lepton Wu <ytht.net@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Denis V. Lunev [Wed, 17 Oct 2007 06:29:53 +0000 (23:29 -0700)]
shrink_dcache_sb speedup
This patch makes shrink_dcache_sb consistent with dentry pruning policy.
On the first pass we iterate over dentry unused list and prepare some
dentries for removal.
However, since the existing code moves evicted dentries to the beginning of
the LRU it can happen that fresh dentries from other superblocks will be
inserted *before* our dentries.
This can result in significant slowdown of shrink_dcache_sb(). Moreover,
for virtual filesystems like unionfs which can call dput() during dentries
kill existing code results in O(n^2) complexity.
We observed 2 minutes shrink_dcache_sb() with only 35000 dentries.
To avoid this effects we propose to isolate sb dentries at the end
of LRU list.
Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the old-fashioned lk201 driver under drivers/tc/ that used to be
used by the old dz.c and zs.c drivers, which is now orphan code referred to
from nowhere and does not build anymore. A modern replacement is available
as drivers/input/keyboard/lkkbd.c.
There are no plans to do anything about this piece of code and it does not
fit anywhere anymore, so it is not just a matter of maintenance or the lack
of. There are still some bits that might be added to the new lkkbd.c
driver based on the old code, and the embedded hardware documentation which
is otherwise quite hard to get hold of might be useful to keep too. Both
of these can be done separately though. RIP.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lepton Wu [Wed, 17 Oct 2007 06:29:50 +0000 (23:29 -0700)]
reiserfs: fix kernel panic on corrupted directory
When reading corrupted reiserfs directory data, d_reclen could be a
negative number or a big positive number, this can lead to kernel panic or
oop. The following patch adds a sanity check.
Signed-off-by: Lepton Wu <ytht.net@gmail.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:29:49 +0000 (23:29 -0700)]
doc: about email clients for Linux patches
Requested by Jeff Garzik.
v3, updated from lkml comments.
Add info about various email clients and their applicability
in being used to send Linux kernel patches.
Some notes takes from http://mbligh.org/linuxdocs/Email/Clients
Portions used with permission.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Wed, 17 Oct 2007 06:29:46 +0000 (23:29 -0700)]
KEYS: Make request_key() and co fundamentally asynchronous
Make request_key() and co fundamentally asynchronous to make it easier for
NFS to make use of them. There are now accessor functions that do
asynchronous constructions, a wait function to wait for construction to
complete, and a completion function for the key type to indicate completion
of construction.
Note that the construction queue is now gone. Instead, keys under
construction are linked in to the appropriate keyring in advance, and that
anyone encountering one must wait for it to be complete before they can use
it. This is done automatically for userspace.
The following auxiliary changes are also made:
(1) Key type implementation stuff is split from linux/key.h into
linux/key-type.h.
(2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does
not need to call key_instantiate_and_link() directly.
(3) Adjust the debugging macros so that they're -Wformat checked even if
they are disabled, and make it so they can be enabled simply by defining
__KDEBUG to be consistent with other code of mine.
(3) Documentation.
[alan@lxorguk.ukuu.org.uk: keys: missing word in documentation] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Mason [Wed, 17 Oct 2007 06:29:44 +0000 (23:29 -0700)]
try to reap reiserfs pages left around by invalidatepage
reiserfs_invalidatepage will refuse to free pages if they have been logged
in data=journal mode, or were pinned down by a data=ordered operation. For
data=journal, this is fairly easy to trigger just with fsx-linux, and it
results in a large number of pages hanging around on the LRUs with
page->mapping == NULL.
Calling try_to_free_buffers when reiserfs decides it is done with the page
allows it to be freed earlier, and with much less VM thrashing. Lock
ordering rules mean that reiserfs can't call lock_page when it is releasing
the buffers, so TestSetPageLocked is used instead. Contention on these
pages should be rare, so it should be sufficient most of the time.
Signed-off-by: Chris Mason <chris.mason@oracle.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Wed, 17 Oct 2007 06:29:44 +0000 (23:29 -0700)]
maintainers: linux-omap list is subscribers only
You are not allowed to post to this mailing list, and your message has been
automatically rejected. If you think that your messages are being rejected
in error, contact the mailing list owner at
linux-omap-open-source-owner@linux.omap.com.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ralf Baechle [Wed, 17 Oct 2007 06:29:42 +0000 (23:29 -0700)]
Remove dma_cache_(wback|inv|wback_inv) functions
dma_cache_(wback|inv|wback_inv) were the earliest attempt on a generalized
cache managment API for I/O purposes. Originally it was basically the raw
MIPS low level cache API exported to the entire world. The API has
suffered from a lack of documentation, was not very widely used unlike it's
more modern brothers and can easily be replaced by dma_cache_sync. So
remove it rsp. turn the surviving bits back into an arch private API, as
discussed on linux-arch.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Paul Mackerras <paulus@samba.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@parisc-linux.org> Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
J. Bruce Fields [Wed, 17 Oct 2007 06:29:42 +0000 (23:29 -0700)]
dcache: trivial comment fix
As it stands this comment is confusing, and not quite grammatical.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mutex documentation is unclear about software interrupts, tasklets and timers
Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cedric Le Goater [Wed, 17 Oct 2007 06:29:40 +0000 (23:29 -0700)]
ipc namespace: remove config ipc ns fix
Finish the work : kill all #ifdef CONFIG_IPC_NS.
Thanks Robert !
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Eric Biederman <ebiederm@xmision.com> Cc: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Satyam Sharma [Wed, 17 Oct 2007 06:29:39 +0000 (23:29 -0700)]
I2O: Fix "defined but not used" build warnings
drivers/message/i2o/exec-osm.c:539: warning: `i2o_exec_lct_notify' defined but not used
comes when CONFIG_I2O_LCT_NOTIFY_ON_CHANGES=n, because its only callsite
is #ifdef'ed as such. So let's #ifdef the function definition also. Also
move the definition to before the callsite, to get rid of forward prototype.
Andy Whitcroft [Wed, 17 Oct 2007 06:29:38 +0000 (23:29 -0700)]
update checkpatch.pl to version 0.10
This version brings a number of new checks, and a number of bug
fixes. Of note:
- better categorisation and space checks for dual use unary/binary
operators
- warn on deprecated use of {SPIN,RW}_LOCK_UNLOCKED
- check if/for/while with trailing ';' for hanging statements
- detect DOS line endings
- detect redundant casts for kalloc()
Andy Whitcroft (18):
Version: 0.10
asmlinkage is also a storage type
pull out inline specifiers
allow only some operators before a unary operator
parenthesised values may span line ends
add additional attribute matching
handle sparse annotations within pointer type space checks
support alternative function definition syntax for typedefs
check if/for/while with trailing ';' for hanging statements
fix output format for case checks
deprecate SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED
allow complex macros with bracketing braces
detect and report DOS line endings
fastcall is a valid function attribute
bracket spacing is ok for 'for'
categorise operators into unary/binary/definitions
add heuristic to pick up on unannotated types
remove spurious warnings from cat_vet
Dave Jones (1):
Make checkpatch warn about pointless casting of kalloc returns.
Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bill Nottingham [Wed, 17 Oct 2007 06:29:38 +0000 (23:29 -0700)]
add CONFIG_VT_UNICODE
As of now, the kernel defaults to non-unicode and XLATE for the keyboard.
We've been changing this in Fedora, but that requires patching the defaults
in the kernel.
The attached introduces CONFIG_VT_UNICODE, which sets the console in
unicode mode by default on boot, including both the virtual terminal and
the keyboard driver.
Signed-off-by: Bill Nottingham <notting@redhat.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Lund [Wed, 17 Oct 2007 06:29:35 +0000 (23:29 -0700)]
avoid negative (and full-width) shifts in radix-tree.c
Negative shifts are not allowed in C (the result is undefined). Same thing
with full-width shifts.
It works on most platforms but not on the VAX with gcc 4.0.1 (it results in an
"operand reserved" fault).
Shifting by more than the width of the value on the left is also not
allowed. I think the extra '>> 1' tacked on at the end in the original
code was an attempt to work around that. Getting rid of that is an extra
feature of this patch.
Here's the chapter and verse, taken from the final draft of the C99
standard ("6.5.7 Bitwise shift operators", paragraph 3):
"The integer promotions are performed on each of the operands. The
type of the result is that of the promoted left operand. If the
value of the right operand is negative or is greater than or equal
to the width of the promoted left operand, the behavior is
undefined."
Thank you to Jan-Benedict Glaw, Christoph Hellwig, Maciej Rozycki, Pekka
Enberg, Andreas Schwab, and Christoph Lameter for review. Special thanks
to Andreas for spotting that my fix only removed half the undefined
behaviour.
Signed-off-by: Peter Lund <firefly@vax64.dk>
Christoph Lameter <clameter@sgi.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andreas Schwab <schwab@suse.de> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Wed, 17 Oct 2007 06:29:34 +0000 (23:29 -0700)]
store __setup_str_* in a more compact way
__setup_str_* are referenced only during boot, hence there's no need to
waste image space for aligning these strings (with the aim of improving
performance).
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Beulich [Wed, 17 Oct 2007 06:29:33 +0000 (23:29 -0700)]
handle recursive calls to bust_spinlocks()
Various architectures may call bust_spinlocks() recursively; the function
itself, however, doesn't appear to be meant to be called in this manner.
Nevertheless, this doesn't appear to be a problem as long as
bust_spinlocks(0) doesn't get called twice in a row (otherwise,
unblank_screen() may enter the scheduler). However, at least on i386 die()
has been capable of returning (and on other architectures this should
really be that way, too) when notify_die() returns NOTIFY_STOP.
Short of getting a reply to a respective query, this patch makes
bust_spinlocks() increment/decrement oops_in_progress, and wake klogd only
when the count drops back to zero.
Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 17 Oct 2007 06:29:32 +0000 (23:29 -0700)]
Add a "rounddown_pow_of_two" routine to log2.h
To go along with the existing "roundup_pow_of_two" routine, add one for
rounding down since that operation appears to crop up on a regular basis in
the source tree.
[m.kozlowski@tuxland.pl: fix unbalanced parentheses] Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arjan van de Ven [Wed, 17 Oct 2007 06:29:32 +0000 (23:29 -0700)]
make dmapool code use __set_current_state()
Signed-off-by: Arjan van de Ven <arjan@Linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Wed, 17 Oct 2007 06:29:31 +0000 (23:29 -0700)]
quota: send messages via netlink
Implement sending of quota messages via netlink interface. The advantage
is that in userspace we can better decide what to do with the message - for
example display a dialogue in your X session or just write the message to
the console. As a bonus, we can get rid of problems with console locking
deep inside filesystem code once we remove the old printing mechanism.
Signed-off-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 17 Oct 2007 06:29:30 +0000 (23:29 -0700)]
Remove final traces of long-deprecated "ramdisk" kernel parm
Since the "ramdisk" kernel parameter has been officially deprecated
since at least 2.6.18, might as well finally get rid of it.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grant Grundler was asking for more detail about correct usage of local
atomic operations and suggested adding the resulting summary to
local_ops.txt.
"Please add a bit more detail. If DaveM is correct (he normally is), then
there must be limits on how the local_t can be used in the kernel process
and interrupt contexts. I'd like those rules spelled out very clearly
since it's easy to get wrong and tracking down such a bug is quite
painful."
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Grant Grundler <grundler@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert P. J. Day [Wed, 17 Oct 2007 06:29:27 +0000 (23:29 -0700)]
Remove valueless definition of hard-selected RAMFS option
Since CONFIG_RAMFS is currently hard-selected to "y", and since
Documentation/filesystems/ramfs-rootfs-initramfs.txt reads as follows:
"The amount of code required to implement ramfs is tiny, because all the
work is done by the existing Linux caching infrastructure. Basically,
you're mounting the disk cache as a filesystem. Because of this, ramfs is
not an optional component removable via menuconfig, since there would be
negligible space savings."
It seems pointless to leave this as a Kconfig entry.
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adrian Bunk [Wed, 17 Oct 2007 06:29:26 +0000 (23:29 -0700)]
drivers/block/cciss.c: fix check-after-use
The Coverity checker spotted that we have already oops'ed if "disk"
was NULL.
Since "disk" being NULL seems impossible at this point this patch
removes the NULL check.
Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Steve Cameron [Wed, 17 Oct 2007 06:27:37 +0000 (23:27 -0700)]
cciss: fix error reporting for SG_IO
This fixes a problem with the way cciss was filling out the "errors" field
of the request structure upon completion of requests. Previously, it just
put a 1 or a 0 in there and used the negation of this as the uptodate
parameter to one of the functions in the block layer, being a block device.
For the SG_IO ioctl, this was not sufficient, and we noticed that, for
example, sg_turs from sg3_utils did not correctly detect problems due to
cciss having set rq->errors incorrectly.
Signed-off-by: Stephen M. Cameron <steve.cameron@hp.com> Acked-by: Mike Miller <mike.miller@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>