Theodore Ts'o [Thu, 10 Dec 2009 02:09:58 +0000 (21:09 -0500)]
ext4: Do not override ext2 or ext3 if built they are built as modules
The CONFIG_EXT4_USE_FOR_EXT23 option must not try to take over the
ext2 or ext3 file systems if the those file system drivers are
configured to be built as mdoules.
Akira Fujita [Mon, 7 Dec 2009 04:38:31 +0000 (23:38 -0500)]
ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT
This patch fixes three problems in the handling of the
EXT4_IOC_MOVE_EXT ioctl:
1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
original and donor files, but they allow the illegal write access to
donor file, since donor file is overwritten by original file data. To
fix this problem, change access mode checks of original (r->r/w) and
donor (r->w) files.
2. Disallow the use of donor files that have a setuid or setgid bits.
3. Call mnt_want_write() and mnt_drop_write() before and after
ext4_move_extents() calling to get write access to a mount.
Jan Kara [Wed, 9 Dec 2009 04:51:10 +0000 (23:51 -0500)]
ext4: Wait for proper transaction commit on fsync
We cannot rely on buffer dirty bits during fsync because pdflush can come
before fsync is called and clear dirty bits without forcing a transaction
commit. What we do is that we track which transaction has last changed
the inode and which transaction last changed allocation and force it to
disk on fsync.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Dmitry Monakhov [Wed, 9 Dec 2009 03:42:28 +0000 (22:42 -0500)]
ext4: fix incorrect block reservation on quota transfer.
Inside ->setattr() call both ATTR_UID and ATTR_GID may be valid
This means that we may end-up with transferring all quotas. Add
we have to reserve QUOTA_DEL_BLOCKS for all quotas, as we do in
case of QUOTA_INIT_BLOCKS.
Dmitry Monakhov [Wed, 9 Dec 2009 03:42:15 +0000 (22:42 -0500)]
ext4: quota macros cleanup
Currently all quota block reservation macros contains hard-coded "2"
aka MAXQUOTAS value. This is no good because in some places it is not
obvious to understand what does this digit represent. Let's introduce
new macro with self descriptive name.
Josef Bacik [Wed, 9 Dec 2009 02:48:58 +0000 (21:48 -0500)]
ext4: wait for log to commit when umounting
There is a potential race when a transaction is committing right when
the file system is being umounting. This could reduce in a race
because EXT4_SB(sb)->s_group_info could be freed in ext4_put_super
before the commit code calls a callback so the mballoc code can
release freed blocks in the transaction, resulting in a panic trying
to access the freed s_group_info.
The fix is to wait for the transaction to finish committing before we
shutdown the multiblock allocator.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Wed, 9 Dec 2009 02:24:33 +0000 (21:24 -0500)]
ext4: Avoid data / filesystem corruption when write fails to copy data
When ext4_write_begin fails after allocating some blocks or
generic_perform_write fails to copy data to write, we truncate blocks
already instantiated beyond i_size. Although these blocks were never
inside i_size, we have to truncate the pagecache of these blocks so
that corresponding buffers get unmapped. Otherwise subsequent
__block_prepare_write (called because we are retrying the write) will
find the buffers mapped, not call ->get_block, and thus the page will
be backed by already freed blocks leading to filesystem and data
corruption.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Theodore Ts'o [Mon, 7 Dec 2009 19:08:51 +0000 (14:08 -0500)]
ext4: Use ext4 file system driver for ext2/ext3 file system mounts
Add a new config option, CONFIG_EXT4_USE_FOR_EXT23 which if enabled,
will cause ext4 to be used for either ext2 or ext3 file system mounts
when ext2 or ext3 is not enabled in the configuration.
This allows minimalist kernel fanatics to drop to file system drivers
from their compiled kernel with out losing functionality.
Akira Fujita [Tue, 24 Nov 2009 15:31:56 +0000 (10:31 -0500)]
ext4: move_extent_per_page() cleanup
Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.
Kazuya Mio [Tue, 24 Nov 2009 15:28:48 +0000 (10:28 -0500)]
ext4: initialize moved_len before calling ext4_move_extents()
The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space. Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.
Akira Fujita [Tue, 24 Nov 2009 15:19:57 +0000 (10:19 -0500)]
ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes. But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().
1. Discard inode PAs of orig and donor inodes with
ext4_discard_preallocations() in ext4_move_extents().
orig : [ DATA1 ]
donor: [ DATA2 ]
2. While data blocks are exchanging between orig and donor inodes, new
inode PAs is created to orig by other process's block allocation.
(Since there are semaphore gaps in ext4_move_extents().) And new
inode PAs is used partially (2-1).
2-1 Create new inode PAs to orig inode
orig : [ DATA1 | used PA1 | free PA1 ]
donor: [ DATA2 ]
3. Donor inode which has old orig inode's blocks is deleted after
EXT4_IOC_MOVE_EXT finished (3-1, 3-2). So the block bitmap
corresponds to old orig inode's blocks are freed.
3-1 After EXT4_IOC_MOVE_EXT finished
orig : [ DATA2 | free PA1 ]
donor: [ DATA1 | used PA1 ]
4. The double-free of blocks is occurred, when close() is called to
orig inode. Because ext4_discard_preallocations() for orig inode
frees used PA1 and free PA1, though used PA1 is already freed in 3.
Theodore Ts'o [Mon, 23 Nov 2009 01:48:42 +0000 (20:48 -0500)]
ext4: use ext4_data_block_valid() in ext4_free_blocks()
The block validity framework does a more comprehensive set of checks,
and it saves object code space to use the ext4_data_block_valid() than
the limited open-coded version that had been in ext4_free_blocks().
Theodore Ts'o [Mon, 23 Nov 2009 12:17:05 +0000 (07:17 -0500)]
ext4: call ext4_forget() from ext4_free_blocks()
Add the facility for ext4_forget() to be called from
ext4_free_blocks(). This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.
Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion. As a result, if the system cashed during or shortly after
the extents migration, and the released indirect blocks get reused as
data blocks, the journal replay would corrupt the data blocks. With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().
Theodore Ts'o [Sun, 22 Nov 2009 12:44:56 +0000 (07:44 -0500)]
ext4: fold ext4_free_blocks() and ext4_mb_free_blocks()
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much. So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c. This saves about 200 bytes of compiled text space.
Theodore Ts'o [Mon, 23 Nov 2009 02:00:13 +0000 (21:00 -0500)]
ext4: fold ext4_journal_forget() into ext4_forget()
Convert the last two callers of ext4_journal_forget() to use
ext4_forget() instead, and then fold ext4_journal_forget() into
ext4_forget(). This reduces are code complexity and shortens our call
stack.
Theodore Ts'o [Tue, 24 Nov 2009 16:05:59 +0000 (11:05 -0500)]
ext4: fold ext4_journal_revoke() into ext4_forget()
The only caller of ext4_journal_revoke() is ext4_forget(), so we can
fold ext4_journal_revoke() into ext4_forget() to simplify the code and
shorten the call stack.
Theodore Ts'o [Mon, 23 Nov 2009 01:52:12 +0000 (20:52 -0500)]
ext4: move ext4_forget() to ext4_jbd2.c
The ext4_forget() function better belongs in ext4_jbd2.c. This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.
Eric Sandeen [Thu, 19 Nov 2009 19:28:50 +0000 (14:28 -0500)]
ext4: make "norecovery" an alias for "noload"
Users on the linux-ext4 list recently complained about differences
across filesystems w.r.t. how to mount without a journal replay.
In the discussion it was noted that xfs's "norecovery" option is
perhaps more descriptively accurate than "noload," so let's make
that an alias for ext4.
Also show this status in /proc/mounts
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Eric Sandeen [Thu, 19 Nov 2009 19:25:42 +0000 (14:25 -0500)]
ext4: make trim/discard optional (and off by default)
It is anticipated that when sb_issue_discard starts doing
real work on trim-capable devices, we may see issues. Make
this mount-time optional, and default it to off until we know
that things are working out OK.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Jan Kara [Mon, 23 Nov 2009 12:24:48 +0000 (07:24 -0500)]
ext4: fix error handling in ext4_ind_get_blocks()
When an error happened in ext4_splice_branch we failed to notice that
in ext4_ind_get_blocks and mapped the buffer anyway. Fix the problem
by checking for error properly.
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
Theodore Ts'o [Mon, 23 Nov 2009 12:24:52 +0000 (07:24 -0500)]
ext4: don't update the superblock in ext4_statfs()
commit a71ce8c6c9bf269b192f352ea555217815cf027e updated ext4_statfs()
to update the on-disk superblock counters, but modified this buffer
directly without any journaling of the change. This is one of the
accesses that was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Eric Sandeen [Sun, 15 Nov 2009 20:30:52 +0000 (15:30 -0500)]
ext4: journal all modifications in ext4_xattr_set_handle
ext4_xattr_set_handle() was zeroing out an inode outside
of journaling constraints; this is one of the accesses that
was causing the crc errors in journal replay as seen in
kernel.org bugzilla #14354.
Reviewed-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
Julia Lawall [Sun, 15 Nov 2009 20:30:58 +0000 (15:30 -0500)]
ext4: fix i_flags access in ext4_da_writepages_trans_blocks()
We need to be testing the i_flags field in the ext4 specific portion
of the inode, instead of the (confusingly aliased) i_flags field in
the generic struct inode.
Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
Theodore Ts'o [Mon, 23 Nov 2009 12:17:34 +0000 (07:17 -0500)]
ext4: make sure directory and symlink blocks are revoked
When an inode gets unlinked, the functions ext4_clear_blocks() and
ext4_remove_blocks() call ext4_forget() for all the buffer heads
corresponding to the deleted inode's data blocks. If the inode is a
directory or a symlink, the is_metadata parameter must be non-zero so
ext4_forget() will revoke them via jbd2_journal_revoke(). Otherwise,
if these blocks are reused for a data file, and the system crashes
before a journal checkpoint, the journal replay could end up
corrupting these data blocks.
Thanks to Curt Wohlgemuth for pointing out potential problems in this
area.
Theodore Ts'o [Mon, 23 Nov 2009 02:00:01 +0000 (21:00 -0500)]
ext4: remove failed journal checksum check
Now that we are checking for failed journal checksums in the jbd2
layer, we don't need to check in the ext4 mount path --- since a
checksum fail will result in ext4_load_journal() returning an error,
causing the file system to refuse to be mounted until e2fsck can deal
with the problem.
Theodore Ts'o [Sun, 15 Nov 2009 20:31:37 +0000 (15:31 -0500)]
jbd2: don't wipe the journal on a failed journal checksum
If there is a failed journal checksum, don't reset the journal. This
allows for userspace programs to decide how to recover from this
situation. It may be that ignoring the journal checksum failure might
be a better way of recovering the file system. Once we add per-block
checksums, we can definitely do better. Until then, a system
administrator can try backing up the file system image (or taking a
snapshot) and and trying to determine experimentally whether ignoring
the checksum failure or aborting the journal replay results in less
data loss.
Theodore Ts'o [Sat, 14 Nov 2009 13:19:05 +0000 (08:19 -0500)]
ext4: plug a buffer_head leak in an error path of ext4_iget()
One of the invalid error paths in ext4_iget() forgot to brelse() the
inode buffer head. Fix it by adding a brelse() in the common error
return path, which also simplifies function.
Thanks to Andi Kleen <ak@linux.intel.com> reporting the problem.
Akira Fujita [Mon, 23 Nov 2009 12:24:41 +0000 (07:24 -0500)]
ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT
If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
will trigger a false-positive warning of a recursive lock. Since we
take i_data_sem for the two inodes ordered by their inode numbers,
this isn't a problem. Use of down_write_nested() will notify the lock
dependency checker machinery that there is no problem here.
Akira Fujita [Mon, 23 Nov 2009 12:24:43 +0000 (07:24 -0500)]
ext4: fix lock order problem in ext4_move_extents()
ext4_move_extents() checks the logical block contiguousness
of original file with ext4_find_extent() and mext_next_extent().
Therefore the extent which ext4_ext_path structure indicates
must not be changed between above functions.
But in current implementation, there is no i_data_sem protection
between ext4_ext_find_extent() and mext_next_extent(). So the extent
which ext4_ext_path structure indicates may be overwritten by
delalloc. As a result, ext4_move_extents() will exchange wrong blocks
between original and donor files. I change the place where
acquire/release i_data_sem to solve this problem.
Moreover, I changed move_extent_per_page() to start transaction first,
and then acquire i_data_sem. Without this change, there is a
possibility of the deadlock between mmap() and ext4_move_extents():
* NOTE: "A", "B" and "C" mean different processes
A-1: ext4_ext_move_extents() acquires i_data_sem of two inodes.
B: do_page_fault() starts the transaction (T),
and then tries to acquire i_data_sem.
But process "A" is already holding it, so it is kept waiting.
C: While "A" and "B" running, kjournald2 tries to commit transaction (T)
but it is under updating, so kjournald2 waits for it.
A-2: Call ext4_journal_start with holding i_data_sem,
but transaction (T) is locked.
Akira Fujita [Mon, 23 Nov 2009 12:25:48 +0000 (07:25 -0500)]
ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails
If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
exchanged before the failure should be returned to the userspace
caller. Unfortunately, currently if the block size is not the same as
the page size, the returned block count that is returned is the
page-aligned block count instead of the actual block count. This
commit addresses this bug.
Theodore Ts'o [Mon, 23 Nov 2009 12:24:46 +0000 (07:24 -0500)]
ext4: avoid divide by zero when trying to mount a corrupted file system
If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error. This can cause kernel
BUG if such a file system is mounted.
Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.
Theodore Ts'o [Mon, 23 Nov 2009 12:25:49 +0000 (07:25 -0500)]
ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC
Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly. Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller. This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.
Mike Hommey [Wed, 11 Nov 2009 22:26:55 +0000 (14:26 -0800)]
__generic_block_fiemap(): fix for files bigger than 4GB
Because of an integer overflow on start_blk, various kind of wrong results
would be returned by the generic_block_fiemap() handler, such as no
extents when there is a 4GB+ hole at the beginning of the file, or wrong
fe_logical when an extent starts after the first 4GB.
Signed-off-by: Mike Hommey <mh@glandium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Eric Sandeen <sandeen@sgi.com> Cc: Josef Bacik <jbacik@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rodolfo Giometti [Wed, 11 Nov 2009 22:26:54 +0000 (14:26 -0800)]
pps: events reporting fix up
PPS events must be recorded according to PPS's mode settings.
If a process asks for (i.e.) capture-assert events only, when the PPS
client calls the pps_event() function to save the current PPS event, we
should verify the event type and then discard unwanted ones.
Also, without this patch userland processes waiting for a specific PPS
event (assert or clear but not both) may be awakened at wrong time.
Signed-off-by: Rodolfo Giometti <giometti@linux.it> Tested-by: William S. Brasher <billb958@door.net> Tested-by: Reg Clemens <clemens@dwf.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergei Shtylyov [Wed, 11 Nov 2009 22:26:50 +0000 (14:26 -0800)]
gpiolib: fix device_create() result check
In case of failure, device_create() returns not NULL but the error code.
The current code checks for non-NULL though which causes kernel oops in
sysfs_create_group() when device_create() fails. Check for error using
IS_ERR() and propagate the error value using PTR_ERR() instead of fixed
-ENODEV code returned now...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Scott Valentine [Wed, 11 Nov 2009 22:26:49 +0000 (14:26 -0800)]
rtc: v3020: fix v3020_mmio_read_bit()
v3020_mmio_read_bit() always returns 0 when left_shift > 7.
v3020_mmio_read_bit()'s return type is (unsigned char). The code returns
a value masked by (1 << left_shift) that is casted to the return type. If
left_shift is larger than 7, the cast will always result in a 0 return
value. The problem was discovered with left_shift = 16, and the included
patch corrects the problem.
The bug was introduced in the last (Apr 3 2009) commit of the file, kernel
versions 2.6.30 and later.
Yoichi Yuasa [Wed, 11 Nov 2009 22:26:48 +0000 (14:26 -0800)]
rtc-vr41xx: fix do_div() warning
drivers/rtc/rtc-vr41xx.c: In function 'vr41xx_rtc_irq_set_freq':
drivers/rtc/rtc-vr41xx.c:217: warning: comparison of distinct pointer types lacks a cast
drivers/rtc/rtc-vr41xx.c:217: warning: right shift count >= width of type
drivers/rtc/rtc-vr41xx.c:217: warning: passing argument 1 of '__div64_32' from incompatible pointer type
include/asm-generic/div64.h:35: note: expected 'uint64_t *' but argument is of type 'long unsigned int *'
Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
rtc: pcf50633: consider alrm->enable in pcf50633_rtc_set_alarm
According to Documentation/rtc.txt, RTC_WKALM_SET sets the alarm time and
enables/disables the alarm. We implement RTC_WKALM_SET through
pcf50633_rtc_set_alarm. The enabling/disabling part was missing.
Signed-off-by: Werner Almesberger <werner@openmoko.org> Reported-by: Michael 'Mickey' Lauer <mickey@openmoko.org> Signed-off-by: Paul Fertser <fercerpav@gmail.com> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Cc: Balaji Rao <balajirrao@openmoko.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nicolas Ferre [Wed, 11 Nov 2009 22:26:35 +0000 (14:26 -0800)]
atmel_lcdfb: new alternate pixel clock formula
at91sam9g45 non ES lots have an alternate pixel clock calculation formula.
Introduce this one with condition on the cpu_is_xxxxx() macros.
Newer 9g45 SOC will not have good pixel clock calculation without this
fix.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Krzysztof Helt <krzysztof.h1@wp.pl> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Heiko Carstens [Wed, 11 Nov 2009 22:26:34 +0000 (14:26 -0800)]
fs: add missing compat_ptr handling for FS_IOC_RESVSP ioctl
For FS_IOC_RESVSP and FS_IOC_RESVSP64 compat_sys_ioctl() uses its
arg argument as a pointer to userspace. However it is missing a
a call to compat_ptr() which will do a proper pointer conversion.
This was introduced with 3e63cbb1 "fs: Add new pre-allocation ioctls
to vfs for compatibility with legacy xfs ioctls".
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ankit Jain <me@ankitjain.org> Acked-by: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Arnd Bergmann <arndbergmann@googlemail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> [2.6.31.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pidns: fix a leak in /proc dentries and inodes with pid namespaces.
Daniel Lezcano reported a leak in 'struct pid' and 'struct pid_namespace'
that is discussed in:
http://lkml.org/lkml/2009/10/2/159.
To summarize the thread, when container-init is terminated, it sets the
PF_EXITING flag, zaps other processes in the container and waits to reap
them. As a part of reaping, the container-init should flush any /proc
dentries associated with the processes. But because the container-init is
itself exiting and the following PF_EXITING check, the dentries are not
flushed, resulting in leak in /proc inodes and dentries.
This fix reverts the commit 7766755a2f249e7e0 ("Fix /proc dcache deadlock
in do_exit") which introduced the check for PF_EXITING. At the time of
the commit, shrink_dcache_parent() flushed dentries from other filesystems
also and could have caused a deadlock which the commit fixed. But as
pointed out by Eric Biederman, after commit 0feae5c47aabdde59,
shrink_dcache_parent() no longer affects other filesystems. So reverting
the commit is now safe.
As pointed out by Jan Kara, the leak is not as critical since the
unclaimed space will be reclaimed under memory pressure or by:
echo 3 > /proc/sys/vm/drop_caches
But since this check is no longer required, its best to remove it.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Jan Kara <jack@ucw.cz> Cc: Andrea Arcangeli <andrea@cpushare.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alpha: Clean up linker script using new linker script macros.
Note that .data.page_aligned and .data.cacheline_aligned are now after
_data; it was probably a bug that they were before it.
Also, some explicit ALIGN(8)'s between various initcall sections were
removed; this should be harmless as the implicit alignment of
initcall_t was already 8.
Cc: Geoffrey Thomas <geofft@ksplice.com> Cc: Tim Abbott <tabbott@ksplice.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Wed, 11 Nov 2009 22:26:25 +0000 (14:26 -0800)]
savagefb: fix blanking mode on CRT display
Fix wrong bit mask for blanking register. Due to the error a CRT monitor
blanks off due to wrong frequency (out of range) instead of PM signal
(vertical and horizontal frequencies cut off).
Just compare the mask with bits set in the switch(blank) clause below the
changed line.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Wed, 11 Nov 2009 22:26:22 +0000 (14:26 -0800)]
fb: remove fb_save_state() and fb_restore_state operations
Remove fb_save_state() and fb_restore_state operations from frame buffer layer.
They are used only in two drivers:
1. savagefb - and cause bug #11248
2. uvesafb
Usage of these operations is misunderstood in both drivers so kill these
operations, fix the bug #11248 and avoid confusion in the future.
Tested on Savage 3D/MV card and the patch fixes the bug #11248.
The frame buffer layer uses these funtions during switch between graphics
and text mode of the console, but these drivers saves state before
switching of the frame buffer (in the fb_open) and after releasing it (in
the fb_release). This defeats the purpose of these operations.
Mel Gorman [Wed, 11 Nov 2009 22:26:17 +0000 (14:26 -0800)]
page allocator: Do not allow interrupts to use ALLOC_HARDER
Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 ("page allocator:
calculate the alloc_flags for allocation only once") altered watermark
logic slightly by allowing rt_tasks that are handling an interrupt to set
ALLOC_HARDER. This patch brings the watermark logic more in line with
2.6.30.
This change results in a reduction of the number high-order GFP_ATOMIC
allocation failures reported. See
http://www.gossamer-threads.com/lists/linux/kernel/1144153
[rientjes@google.com: Spotted the problem] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Wed, 11 Nov 2009 22:26:14 +0000 (14:26 -0800)]
page allocator: always wake kswapd when restarting an allocation attempt after direct reclaim failed
If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as
well but currently, kswapd is not woken up after direct reclaim fails.
For order-0 allocations, this makes little difference but if there is a
heavy mix of higher-order allocations that direct reclaim is failing for,
it might mean that kswapd is not rewoken for higher orders as much as it
did previously.
This patch wakes up kswapd when an allocation is being retried after a
direct reclaim failure. It would be expected that kswapd is already
awake, but this has the effect of telling kswapd to reclaim at the higher
order as well.
Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Turquette [Wed, 11 Nov 2009 19:00:38 +0000 (11:00 -0800)]
omap3: Decrease cpufreq transition latency
Adjust OMAP3 frequency transition latency from 10,000,000uS to a more
reasonable 300,000uS. This causes ondemand and conservative governors to
sample CPU load more often resulting in more responsive behavior.
Tested on Android 2.6.29; using this value and conservative governor, CORE
power consumption on Zoom2 was comparable to the old and unresponsive
10,000,000uS value while UI responsiveness was greatly improved.
Signed-off-by: Mike Turquette <mturquette@ti.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: fix panic when trying to destroy a newly allocated
Btrfs: allow more metadata chunk preallocation
Btrfs: fallback on uncompressed io if compressed io fails
Btrfs: find ideal block group for caching
Btrfs: avoid null deref in unpin_extent_cache()
Btrfs: skip btrfs_release_path in btrfs_update_root and btrfs_del_root
Btrfs: fix some metadata enospc issues
Btrfs: fix how we set max_size for free space clusters
Btrfs: cleanup transaction starting and fix journal_info usage
Btrfs: fix data allocation hint start
Linus Torvalds [Wed, 11 Nov 2009 21:32:29 +0000 (13:32 -0800)]
btusb bluetooth driver: wait for 'waker' work too before closing
Rafael debugged a resume-time hang (with oopses in workqueue handling)
on his laptop that was due to the 'waker' workqueue entry being
disconnected and then released without the workqueue entry having been
synchronized.
Several people were involved, with Oleg Nesterov doing a debugging patch
showing what workqueue entry was corrupt etc.
This was a regression introduced by commit 7bee549e19 ("Bluetooth: Add
USB autosuspend support to btusb driver") as Rafael points out (not
actually bisected, but it became clear once the bug was found).
Tested-and-reported-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Oliver Neukum <oliver@neukum.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josef Bacik [Wed, 11 Nov 2009 20:53:34 +0000 (15:53 -0500)]
Btrfs: fix panic when trying to destroy a newly allocated
There is a problem where iget5_locked will look for an inode, not find it, and
then subsequently try to allocate it. Another CPU will have raced in and
allocated the inode instead, so when iget5_locked gets the inode spin lock again
and does a search, it finds the new inode. So it goes ahead and calls
destroy_inode on the inode it just allocated. The problem is we don't set
BTRFS_I(inode)->root until the new inode is completely initialized. This patch
makes us set root to NULL when alloc'ing a new inode, so when we get to
btrfs_destroy_inode and we see that root is NULL we can just free up the memory
and continue on. This fixes the panic
Linus Torvalds [Wed, 11 Nov 2009 19:52:22 +0000 (11:52 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
JBD/JBD2: free j_wbuf if journal init fails.
ext3: Wait for proper transaction commit on fsync
ext3: retry failed direct IO allocations
Linus Torvalds [Wed, 11 Nov 2009 19:34:14 +0000 (11:34 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/PCI: Adjust GFP mask handling for coherent allocations
PCI ASPM: fix oops on root port removal
Linus Torvalds [Wed, 11 Nov 2009 19:33:08 +0000 (11:33 -0800)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: pasemi_defconfig update
powerpc: 2.6.32 update of defconfigs for embedded 6xx/7xxx, 8xx, 8{3,5,6}xxx
powerpc/8xxx: enable IPsec ESP by default on mpc83xx/mpc85xx
powerpc/83xx: Fix u-boot partion size for MPC8377E-WLAN boards
powerpc/85xx: Fix USB GPIOs for MPC8569E-MDS boards
powerpc/82xx: kmalloc failure ignored in ep8248e_mdio_probe()
powerpc/85xx: sbc8548 - fixup of PCI-e related DTS fields
Linus Torvalds [Wed, 11 Nov 2009 19:32:04 +0000 (11:32 -0800)]
Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (52 commits)
drm/kms: Init the CRTC info fields for modes forced from the command line.
drm/radeon/r600: CS parser updates
drm/radeon/kms: add debugfs for power management for AtomBIOS devices
drm/radeon/kms: initial mode validation support
drm/radeon/kms/atom/dce3: call transmitter init on mode set
drm/radeon/kms: store detailed connector info
drm/radeon/kms/atom/dce3: fix up usPixelClock calculation for Transmitter tables
drm/radeon/kms/r600: fix rs880 support v2
drm/radeon/kms/r700: fix some typos in chip init
drm/radeon/kms: remove some misleading debugging output
drm/radeon/kms: stop putting VRAM at 0 in MC space on r600s.
drm/radeon/kms: disable D1VGA and D2VGA if enabled
drm/radeon/kms: Don't RMW CP_RB_CNTL
drm/radeon/kms: fix coherency issues on AGP cards.
drm/radeon/kms: fix rc410 suspend/resume.
drm/radeon/kms: add quirk for hp dc5750
drm/radeon/kms/atom: fix potential oops in spread spectrum code
drm/kms: typo fix
drm/radeon/kms/atom: Make card_info per device
drm/radeon/kms/atom: Fix DVO support
...
Linus Torvalds [Wed, 11 Nov 2009 19:30:15 +0000 (11:30 -0800)]
Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
highmem: Fix debug_kmap_atomic() to also handle KM_IRQ_PTE, KM_NMI, and KM_NMI_PTE
highmem: Fix race in debug_kmap_atomic() which could cause warn_count to underflow
rcu: Fix long-grace-period race between forcing and initialization
uids: Prevent tear down race
Linus Torvalds [Wed, 11 Nov 2009 19:29:34 +0000 (11:29 -0800)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf tools: Fix permission checks
perf_events: Fix some typo in the perf events config description
Linus Torvalds [Wed, 11 Nov 2009 19:29:24 +0000 (11:29 -0800)]
Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Use root_task_group_empty only with FAIR_GROUP_SCHED
sched: Fix kernel-doc function parameter name
Linus Torvalds [Wed, 11 Nov 2009 19:29:10 +0000 (11:29 -0800)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, amd-ucode: Check UCODE_MAGIC before loading the container file
x86: Fix error return sequence in __ioremap_caller()
x86: Add Phoenix/MSC BIOSes to lowmem corruption list
Linus Torvalds [Wed, 11 Nov 2009 19:28:11 +0000 (11:28 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: partial revert to fix double brelse WARNING()
ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O
ext4: code clean up for dio fallocate handling
ext4: skip conversion of uninit extents after direct IO if there isn't any
ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents
ext4: discard preallocation when restarting a transaction during truncate