]> git.karo-electronics.de Git - karo-tx-linux.git/log
karo-tx-linux.git
14 years agoext4: avoid divide by zero when trying to mount a corrupted file system
Theodore Ts'o [Mon, 23 Nov 2009 12:24:46 +0000 (07:24 -0500)]
ext4: avoid divide by zero when trying to mount a corrupted file system

(cherry picked from commit 503358ae01b70ce6909d19dd01287093f6b6271c)

If s_log_groups_per_flex is greater than 31, then groups_per_flex will
will overflow and cause a divide by zero error.  This can cause kernel
BUG if such a file system is mounted.

Thanks to Nageswara R Sastry for analyzing the failure and providing
an initial patch.

http://bugzilla.kernel.org/show_bug.cgi?id=14287

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC
Theodore Ts'o [Mon, 23 Nov 2009 12:25:49 +0000 (07:25 -0500)]
ext4: fix potential buffer head leak when add_dirent_to_buf() returns ENOSPC

(cherry picked from commit 2de770a406b06dfc619faabbf5d85c835ed3f2e1)

Previously add_dirent_to_buf() did not free its passed-in buffer head
in the case of ENOSPC, since in some cases the caller still needed it.
However, this led to potential buffer head leaks since not all callers
dealt with this correctly.  Fix this by making simplifying the freeing
convention; now add_dirent_to_buf() *never* frees the passed-in buffer
head, and leaves that to the responsibility of its caller.  This makes
things cleaner and easier to prove that the code is neither leaking
buffer heads or calling brelse() one time too many.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O
Mingming [Fri, 6 Nov 2009 09:01:23 +0000 (04:01 -0500)]
ext4: Fix return value of ext4_split_unwritten_extents() to fix direct I/O

(cherry picked from commit ba230c3f6dc88ec008806adb27b12088486d508e)

To prepare for a direct I/O write, we need to split the unwritten
extents before submitting the I/O.  When no extents needed to be
split, ext4_split_unwritten_extents() was incorrectly returning 0
instead of the size of uninitialized extents. This bug caused the
wrong return value sent back to VFS code when it gets called from
async IO path, leading to an unnecessary fall back to buffered IO.

This bug also hid the fact that the check to see whether or not a
split would be necessary was incorrect; we can only skip splitting the
extent if the write completely covers the uninitialized extent.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: code clean up for dio fallocate handling
Mingming [Tue, 3 Nov 2009 19:44:54 +0000 (14:44 -0500)]
ext4: code clean up for dio fallocate handling

(cherry picked from commit 4b70df181611012a3556f017b57dfcef7e1d279f)

The ext4_debug() call in ext4_end_io_dio() should be moved after the
check to make sure that io_end is non-NULL.

The comment above ext4_get_block_dio_write() ("Maximum number of
blocks...") is a duplicate; the original and correct comment is above
the #define DIO_MAX_BLOCKS up above.

Based on review comments from Curt Wohlgemuth.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: skip conversion of uninit extents after direct IO if there isn't any
Mingming [Tue, 10 Nov 2009 15:48:04 +0000 (10:48 -0500)]
ext4: skip conversion of uninit extents after direct IO if there isn't any

(cherry picked from commit 5f5249507e4b5c4fc0f9c93f33d133d8c95f47e1)

At the end of direct I/O operation, ext4_ext_direct_IO() always called
ext4_convert_unwritten_extents(), regardless of whether there were any
unwritten extents involved in the I/O or not.

This commit adds a state flag so that ext4_ext_direct_IO() only calls
ext4_convert_unwritten_extents() when necessary.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents
Mingming [Tue, 10 Nov 2009 15:48:08 +0000 (10:48 -0500)]
ext4: fix ext4_ext_direct_IO()'s return value after converting uninit extents

(cherry picked from commit 109f55651954def97fa41ee71c464d268c512ab0)

After a direct I/O request covering an uninitalized extent (i.e.,
created using the fallocate system call) or a hole in a file, ext4
will convert the uninitialized extent so it is marked as initialized
by calling ext4_convert_unwritten_extents().  This function returns
zero on success.

This return value was getting returned by ext4_direct_IO(); however
the file system's direct_IO function is supposed to return the number
of bytes read or written on a success.  By returning zero, it confused
the direct I/O code into falling back to buffered I/O unnecessarily.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: discard preallocation when restarting a transaction during truncate
Aneesh Kumar K.V [Mon, 2 Nov 2009 23:50:49 +0000 (18:50 -0500)]
ext4: discard preallocation when restarting a transaction during truncate

(cherry picked from commit fa5d11133b07053270e18fa9c18560e66e79217e)

When restart a transaction during a truncate operation, we drop and
reacquire i_data_sem.  After reacquiring i_data_sem, we need to
discard any inode-based preallocation that might have been grabbed
while we released i_data_sem (for example, if pdflush is allocating
blocks and racing against the truncate).

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: retry failed direct IO allocations
Eric Sandeen [Sat, 3 Oct 2009 01:20:55 +0000 (21:20 -0400)]
ext4: retry failed direct IO allocations

(cherry picked from commit fbbf69456619de5d251cb9f1df609069178c62d5)

On a 256M filesystem, doing this in a loop:

        xfs_io -F -f -d -c 'pwrite 0 64m' test
        rm -f test

eventually leads to ENOSPC.  (the xfs_io command does a
64m direct IO write to the file "test")

As with other block allocation callers, it looks like we need to
potentially retry the allocations on the initial ENOSPC.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: fix a BUG_ON crash by checking that page has buffers attached to it
Theodore Ts'o [Thu, 1 Oct 2009 02:57:41 +0000 (22:57 -0400)]
ext4: fix a BUG_ON crash by checking that page has buffers attached to it

(cherry picked from commit 1f94533d9cd75f6d2826018d54a971b9cc085992)

In ext4_num_dirty_pages() we were calling page_buffers() before
checking to see if the page actually had pages attached to it; this
would cause a BUG check crash in the inline function page_buffers().

Thanks to Markus Trippelsdorf for reporting this bug.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix time encoding with extra epoch bits
Theodore Ts'o [Wed, 30 Sep 2009 05:13:55 +0000 (01:13 -0400)]
ext4: Fix time encoding with extra epoch bits

(cherry picked from commit c1fccc0696bcaff6008c11865091f5ec4b0937ab)

"Looking at ext4.h, I think the setting of extra time fields forgets to
mask the epoch bits so the epoch part overwrites nsec part. The second
change is only for coherency (2 -> EXT4_EPOCH_BITS)."

Thanks to Damien Guibouret for pointing out this problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Handle nested ext4_journal_start/stop calls without a journal
Curt Wohlgemuth [Tue, 29 Sep 2009 15:01:03 +0000 (11:01 -0400)]
ext4: Handle nested ext4_journal_start/stop calls without a journal

(cherry picked from commit d3d1faf6a74496ea4435fd057c6a2cad49f3e523)

This patch fixes a problem with handling nested calls to
ext4_journal_start/ext4_journal_stop, when there is no journal present.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
Curt Wohlgemuth [Tue, 29 Sep 2009 20:06:01 +0000 (16:06 -0400)]
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode

(cherry picked from commit f3dc272fd5e2ae08244796bb39e7e1ce4b25d3b3)

This patch a problem that ext4_dirty_inode() was not calling
ext4_mark_inode_dirty() if the current_handle is not valid, which it
is the case in no journal mode.

It also removes a test for non-matching transaction which can never
happen.

Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Avoid updating the inode table bh twice in no journal mode
Frank Mayhar [Tue, 29 Sep 2009 14:07:47 +0000 (10:07 -0400)]
ext4: Avoid updating the inode table bh twice in no journal mode

(cherry picked from commit 830156c79b0a99ddf0f62496bcf4de640f9f52cd)

This is a cleanup of commit 91ac6f4.  Since ext4_mark_inode_dirty()
has already called ext4_mark_iloc_dirty(), which in turn calls
ext4_do_update_inode(), it's not necessary to have ext4_write_inode()
call ext4_do_update_inode() in no journal mode.  Indeed, it would be
duplicated work.

Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
Theodore Ts'o [Mon, 28 Sep 2009 19:58:29 +0000 (15:58 -0400)]
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first

(cherry picked from commit f3ce8064b388ccf420012c5a4907aae4f13fe9d0)

Move the check to make sure the original and donor inodes are
different earlier, to avoid a potential deadlock by trying to lock the
same inode twice.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: async direct IO for holes and fallocate support
Mingming Cao [Mon, 28 Sep 2009 19:48:29 +0000 (15:48 -0400)]
ext4: async direct IO for holes and fallocate support

(cherry picked from commit 8d5d02e6b176565c77ff03604908b1453a22044d)

For async direct IO that covers holes or fallocate, the end_io
callback function now queued the convertion work on workqueue but
don't flush the work rightaway as it might take too long to afford.

But when fsync is called after all the data is completed, user expects
the metadata also being updated before fsync returns.

Thus we need to flush the conversion work when fsync() is called.
This patch keep track of a listed of completed async direct io that
has a work queued on workqueue.  When fsync() is called, it will go
through the list and do the conversion.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
Mingming Cao [Mon, 28 Sep 2009 19:48:41 +0000 (15:48 -0400)]
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O

(cherry picked from commit 4c0425ff68b1b87b802ffeda7b6a46ff7da7241c)

Currently the DIO VFS code passes create = 0 when writing to the
middle of file.  It does this to avoid block allocation for holes, so
as not to expose stale data out when there is a parallel buffered read
(which does not hold the i_mutex lock).  Direct I/O writes into holes
falls back to buffered IO for this reason.

Since preallocated extents are treated as holes when doing a
get_block() look up (buffer is not mapped), direct IO over fallocate
also falls back to buffered IO.  Thus ext4 actually silently falls
back to buffered IO in above two cases, which is undesirable.

To fix this, this patch creates unitialized extents when a direct I/O
write into holes in sparse files, and registering an end_io callback which
converts the uninitialized extent to an initialized extent after the
I/O is completed.

Singed-Off-By: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Split uninitialized extents for direct I/O
Mingming Cao [Mon, 28 Sep 2009 19:49:08 +0000 (15:49 -0400)]
ext4: Split uninitialized extents for direct I/O

(cherry picked from commit 0031462b5b392f90d17f1d75abb795883c44e969)

When writing into an unitialized extent via direct I/O, and the direct
I/O doesn't exactly cover the unitialized extent, split the extent
into uninitialized and initialized extents before submitting the I/O.
This avoids needing to deal with an ENOSPC error in the end_io
callback that gets used for direct I/O.

When the IO is complete, the written extent will be marked as initialized.

Singed-Off-By: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: release reserved quota when block reservation for delalloc retry
Mingming Cao [Mon, 28 Sep 2009 19:49:52 +0000 (15:49 -0400)]
ext4: release reserved quota when block reservation for delalloc retry

(cherry picked from commit 9f0ccfd8e07d61b413e6536ffa02fbf60d2e20d8)

ext4_da_reserve_space() can reserve quota blocks multiple times if
ext4_claim_free_blocks() fail and we retry the allocation. We should
release the quota reservation before restarting.

Bug found by Jan Kara.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
Theodore Ts'o [Tue, 29 Sep 2009 17:31:31 +0000 (13:31 -0400)]
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks

(cherry picked from commit 55138e0bc29c0751e2152df9ad35deea542f29b3)

Work around problems in the writeback code to force out writebacks in
larger chunks than just 4mb, which is just too small.  This also works
around limitations in the ext4 block allocator, which can't allocate
more than 2048 blocks at a time.  So we need to defeat the round-robin
characteristics of the writeback code and try to write out as many
blocks in one inode before allowing the writeback code to move on to
another inode.  We add a a new per-filesystem tunable,
max_writeback_mb_bump, which caps this to a default of 128mb per
inode.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix hueristic which avoids group preallocation for closed files
Theodore Ts'o [Mon, 28 Sep 2009 04:06:20 +0000 (00:06 -0400)]
ext4: Fix hueristic which avoids group preallocation for closed files

(cherry picked from commit 71780577306fd1e76c7a92e3b308db624d03adb9)

The hueristic was designed to avoid using locality group preallocation
when writing the last segment of a closed file.  Fix it by move
setting size to the maximum of size and isize until after we check
whether size == isize.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix the alloc on close after a truncate hueristic
Theodore Ts'o [Thu, 17 Sep 2009 13:34:16 +0000 (09:34 -0400)]
ext4: Fix the alloc on close after a truncate hueristic

(cherry picked from commit 5534fb5bb35a62a94e0bd1fa2421f7fb6e894f10)

In an attempt to avoid doing an unneeded flush after opening a
(previously non-existent) file with O_CREAT|O_TRUNC, the code only
triggered the hueristic if ei->disksize was non-zero.  Turns out that
the VFS doesn't call ->truncate() if the file doesn't exist, and
ei->disksize is always zero even if the file previously existed.  So
remove the test, since it isn't necessary and in fact disabled the
hueristic.

Thanks to Clemens Eisserer that he was seeing problems with files
written using kwrite and eclipse after sudden crashes caused by a
buggy Intel video driver.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags
Theodore Ts'o [Thu, 17 Sep 2009 12:32:22 +0000 (08:32 -0400)]
ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags

(cherry picked from commit 1b9c12f44c1eb614fd3b8822bfe8f1f5d8e53737)

EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
and the hex value assigned to it collides with FS_DIRECTIO_FL (which
is also stored in i_flags).  There's no reason for the
EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
i_state instead.

Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: limit block allocations for indirect-block files to < 2^32
Eric Sandeen [Wed, 16 Sep 2009 18:45:10 +0000 (14:45 -0400)]
ext4: limit block allocations for indirect-block files to < 2^32

(cherry picked from commit fb0a387dcdcd21aab1b09ee7fd80b7c979bdbbfd)

Today, the ext4 allocator will happily allocate blocks past
2^32 for indirect-block files, which results in the block
numbers getting truncated, and corruption ensues.

This patch limits such allocations to < 2^32, and adds
BUG_ONs if we do get blocks larger than that.

This should address RH Bug 519471, ext4 bitmap allocator
must limit blocks to < 2^32

* ext4_find_goal() is modified to choose a goal < UINT_MAX,
  so that our starting point is in an acceptable range.

* ext4_xattr_block_set() is modified such that the goal block
  is < UINT_MAX, as above.

* ext4_mb_regular_allocator() is modified so that the group
  search does not continue into groups which are too high

* ext4_mb_use_preallocated() has a check that we don't use
  preallocated space which is too far out

* ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs

No attempt has been made to limit inode locations to < 2^32,
so we may wind up with blocks far from their inodes.  Doing
this much already will lead to some odd ENOSPC issues when the
"lower 32" gets full, and further restricting inodes could
make that even weirder.

For high inodes, choosing a goal of the original, % UINT_MAX,
may be a bit odd, but then we're in an odd situation anyway,
and I don't know of a better heuristic.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT
Akira Fujita [Wed, 16 Sep 2009 18:25:39 +0000 (14:25 -0400)]
ext4: Fix different block exchange issue in EXT4_IOC_MOVE_EXT

(cherry picked from commit c40ce3c9ea97425a12d7e44031a98fe50add6fc1)

If logical block offset of original file which is passed to
EXT4_IOC_MOVE_EXT is different from donor file's,
a calculation error occurs in ext4_calc_swap_extents(),
therefore wrong block is exchanged between original file and donor file.
As a result, we hit ext4_error() in check_block_validity().
To detect the logical offset difference in EXT4_IOC_MOVE_EXT,
add checks to mext_calc_swap_extents() and handle it as error,
since data exchange must be done between the same blocks in EXT4_IOC_MOVE_EXT.

Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Add null extent check to ext_get_path
Akira Fujita [Wed, 16 Sep 2009 18:25:07 +0000 (14:25 -0400)]
ext4: Add null extent check to ext_get_path

(cherry picked from commit 347fa6f1c7cb5df2b38d3c9167cfe242ce0cd1da)

There is the possibility that path structure which is taken
by ext4_ext_find_extent() indicates null extents.
Because during data block exchanging in ext4_move_extents(),
constitution of an extent tree may be changed.
As a solution, the patch adds null extent check
to ext_get_path().

Reported-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Replace BUG_ON() with ext4_error() in move_extents.c
Akira Fujita [Wed, 16 Sep 2009 17:46:35 +0000 (13:46 -0400)]
ext4: Replace BUG_ON() with ext4_error() in move_extents.c

(cherry picked from commit 2147b1a6a48e28399120ca51d4a91840a278611f)

Replace BUG_ON calls with a call to ext4_error()
to print an error message if EXT4_IOC_MOVE_EXT failed
with some kind of reasons.  This will help to debug.
Ted pointed this out, thanks.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Replace get_ext_path macro with an inline funciton
Akira Fujita [Wed, 16 Sep 2009 17:46:38 +0000 (13:46 -0400)]
ext4: Replace get_ext_path macro with an inline funciton

(cherry picked from commit e8505970af46658ece2545e9bc1fe594998fdcdf)

Replace get_ext_path macro with an inline function,
since this macro looks like a function call but its arguments
get modified. Ted pointed this out, thanks.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix small typo for move_extent_per_page()
Akira Fujita [Sun, 6 Sep 2009 03:12:41 +0000 (23:12 -0400)]
ext4: Fix small typo for move_extent_per_page()

(cherry picked from commit 44fc48f7048ab9657b524938a832fec4e0acea98)

This function means moving extents every page, so change its name from
move_exgtent_par_page().

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix include/trace/events/ext4.h to work with Systemtap
Theodore Ts'o [Tue, 15 Sep 2009 02:59:50 +0000 (22:59 -0400)]
ext4: Fix include/trace/events/ext4.h to work with Systemtap

(cherry picked from commit 3661d28615ea580c1db02a972fd4d3898df1cb01)

Using relative pathnames in #include statements interacts badly with
SystemTap, since the fs/ext4/*.h header files are not packaged up as
part of a distribution kernel's header files.  Since systemtap doesn't
use TP_fast_assign(), we can use a blind structure definition and then
make sure the needed header files are defined before the ext4 source
files #include the trace/events/ext4.h header file.

https://bugzilla.redhat.com/show_bug.cgi?id=512478

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix initalization of s_flex_groups
Theodore Ts'o [Fri, 11 Sep 2009 20:51:28 +0000 (16:51 -0400)]
ext4: Fix initalization of s_flex_groups

(cherry picked from commit 7ad9bb651fc2036ea94bed94da76a4b08959a911)

The s_flex_groups array should have been initialized using atomic_add
to sum up the free counts from the block groups that make up a
flex_bg.  By using atomic_set, the value of the s_flex_groups array
was set to the values of the last block group in the flex_bg.

The impact of this bug is that the block and inode allocation
algorithms might not pick the best flex_bg for new allocation.

Thanks to Damien Guibouret for pointing out this problem!

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Always set dx_node's fake_dirent explicitly.
Andreas Schlick [Fri, 11 Sep 2009 03:16:07 +0000 (23:16 -0400)]
ext4: Always set dx_node's fake_dirent explicitly.

(cherry picked from commit 1f7bebb9e911d870fa8f997ddff838e82b5715ea)

When ext4_dx_add_entry() has to split an index node, it has to ensure that
name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
won't recognise it as an intermediate htree node and consider the htree to
be corrupted.

Signed-off-by: Andreas Schlick <schlick@lavabit.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Don't update superblock write time when filesystem is read-only
Theodore Ts'o [Thu, 10 Sep 2009 21:31:04 +0000 (17:31 -0400)]
ext4: Don't update superblock write time when filesystem is read-only

(cherry picked from commit 71290b368ad5e1e0b0b300c9d5638490a9fd1a2d)

This avoids updating the superblock write time when we are mounting
the root file system read/only but we need to replay the journal; at
that point, for people who are east of GMT and who make their clock
tick in localtime for Windows bug-for-bug compatibility, and this will
cause e2fsck to complain and force a full file system check.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: check for need init flag in ext4_mb_load_buddy
Aneesh Kumar K.V [Thu, 10 Sep 2009 03:34:50 +0000 (23:34 -0400)]
ext4: check for need init flag in ext4_mb_load_buddy

(cherry picked from commit f41c0750538667b87a19c93952e5d42fcc069bd7)

We should check for need init flag with the group's alloc_sem held, to
make sure while we are loading the buddy cache and holding a reference
to it, a file system resize can't add new blocks to same group.

The patch also drops the need init flag check in
ext4_mb_regular_allocator() because doing the check without holding
alloc_sem is racy.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: move ext4_mb_init_group() function earlier in the mballoc.c
Aneesh Kumar K.V [Thu, 10 Sep 2009 03:47:46 +0000 (23:47 -0400)]
ext4: move ext4_mb_init_group() function earlier in the mballoc.c

(cherry picked from commit b6a758ec3af3ec236dbfdcf6a06b84ac8f94957e)

This moves the function around so that it can be called from
ext4_mb_load_buddy().

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Make non-journal fsync work properly
Frank Mayhar [Thu, 10 Sep 2009 02:33:47 +0000 (22:33 -0400)]
ext4: Make non-journal fsync work properly

(cherry picked from commit 91ac6f43317c0bf99969665f98016548011dfa38)

Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
mode:  If we're not using a journal, ext4_write_inode() now calls
ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
with a new "do_sync" parameter.  If that parameter is nonzero _and_ we're
not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
instead of ext4_handle_dirty_metadata().

This problem was found in power-fail testing, checking the amount of
loss of files and blocks after a power failure when using fsync() and
when not using fsync().  It turned out that using fsync() was actually
worse than not doing so, possibly because it increased the likelihood
that the inodes would remain unflushed and would therefore be lost at
the power failure.

Signed-off-by: Frank Mayhar <fmayhar@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Assure that metadata blocks are written during fsync in no journal mode
Theodore Ts'o [Sat, 12 Sep 2009 17:41:55 +0000 (13:41 -0400)]
ext4: Assure that metadata blocks are written during fsync in no journal mode

(cherry picked from commit fe188c0e084bdf3038dc0ac963c21d764f53f7da)

When there is no journal present, we must attach buffer heads
associated with extent tree and indirect blocks to the inode's
mapping->private_list via mark_buffer_dirty_inode() so that
ext4_sync_file() --- which is called to service fsync() and
fdatasync() system calls --- can write out the inode's metadata blocks
by calling sync_mapping_buffers().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()
Theodore Ts'o [Thu, 10 Sep 2009 01:32:41 +0000 (21:32 -0400)]
ext4: Use bforget() in no journal mode for ext4_journal_{forget,revoke}()

(cherry picked from commit c7acb4c16646943180bd221c167a077e0a084f9c)

When ext4 is using a journal, a metadata block which is deallocated
must be passed into the journal layer so it can be dropped from the
current transaction and/or revoked.  This is done by calling the
functions ext4_journal_forget() and ext4_journal_revoke(), which call
jbd2_journal_forget(), and jbd2_journal_revoke(), respectively.

Since the jbd2_journal_forget() and jbd2_journal_revoke() call
bforget(), if ext4 is not using a journal, ext4_journal_forget() and
ext4_journal_revoke() must call bforget() to avoid a dirty metadata
block overwriting a block after it has been reallocated and reused for
another inode's data block.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: print more sysadmin-friendly message in check_block_validity()
Theodore Ts'o [Tue, 8 Sep 2009 12:21:26 +0000 (08:21 -0400)]
ext4: print more sysadmin-friendly message in check_block_validity()

(cherry picked from commit 80e42468d65475e92651e62175bb7807773321d0)

Drop the WARN_ON(1), as he stack trace is not appropriate, since it is
triggered by file system corruption, and it misleads users into
thinking there is a kernel bug.  In addition, change the message
displayed by ext4_error() to make it clear that this is a file system
corruption problem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Take page lock before looking at attached buffer_heads flags
Aneesh Kumar K.V [Thu, 10 Sep 2009 02:36:03 +0000 (22:36 -0400)]
ext4: Take page lock before looking at attached buffer_heads flags

(cherry picked from commit a827eaffff07c7d58a4cb32158cbeb4849f4e33a)

In order to check whether the buffer_heads are mapped we need to hold
page lock. Otherwise a reclaim can cleanup the attached buffer_heads.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Return exchanged blocks count to user space in failure
Akira Fujita [Sun, 6 Sep 2009 02:46:29 +0000 (22:46 -0400)]
ext4: Return exchanged blocks count to user space in failure

(cherry picked from commit 8d6669133d8cdbb7cbe0e1f0f3744e7802a84afe)

Return exchanged blocks count (moved_len) to user space,
if ext4_move_extents() failed on the way.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Remove unneeded BUG_ON() in ext4_move_extents()
Akira Fujita [Sun, 6 Sep 2009 02:11:55 +0000 (22:11 -0400)]
ext4: Remove unneeded BUG_ON() in ext4_move_extents()

(cherry picked from commit daea696dbac0e33af3cfe304efbfb8d74e0effe6)

The ext4_move_extents() functions checks with BUG_ON() whether the
exchanged blocks count accords with request blocks count.  But, if the
target range (orig_start + len) includes sparse block(s), 'moved_len'
(exchanged blocks count) does not agree with 'len' (request blocks
count), since sparse block is not counted in 'moved_len'.  This causes
us to hit the BUG_ON(), even though the function succeeded.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix wrong comparisons in mext_check_arguments()
Akira Fujita [Wed, 16 Sep 2009 18:28:22 +0000 (14:28 -0400)]
ext4: Fix wrong comparisons in mext_check_arguments()

(cherry picked from commit 70d5d3dcea47c16058d2b093c29e07fdf61b56ad)

The mext_check_arguments() function in move_extents.c has wrong
comparisons.  orig_start which is passed from user-space is block
unit, but i_size of inode is byte unit, therefore the checks do not
work fine.  This mis-check leads to the overflow of 'len' and then
hits BUG_ON() in ext4_move_extents().  The patch fixes this issue.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Reviewed-by: Greg Freemyer <greg.freemyer@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: fix cache flush in ext4_sync_file
Christoph Hellwig [Sun, 6 Sep 2009 01:42:42 +0000 (21:42 -0400)]
ext4: fix cache flush in ext4_sync_file

(cherry picked from commit 5f3481e9a80c240f169b36ea886e2325b9aeb745)

We need to flush the write cache unconditionally in ->fsync, otherwise
writes into already allocated blocks can get lost.  Writes into fully
allocated files are very common when using disk images for
virtualization, and without this fix can easily lose data after
an fdatasync, which is the typical implementation for a cache flush on
the virtual drive.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Restore wbc->range_start in ext4_da_writepages()
Theodore Ts'o [Mon, 31 Aug 2009 21:00:59 +0000 (17:00 -0400)]
ext4: Restore wbc->range_start in ext4_da_writepages()

(cherry picked from commit de89de6e0cf4b1eb13f27137cf2aa40d287aabdf)

To solve a lock inversion problem, we implement part of the
range_cyclic algorithm in ext4_da_writepages().  (See commit 2acf2c26
for more details.)

As part of that change wbc->range_start was modified by ext4's
writepages function, which causes its callers to get confused since
they aren't expecting the filesystem to modify it.  The simplest fix
is to save and restore wbc->range_start in ext4_da_writepages.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Limit number of links that can be created by ext4_link()
Theodore Ts'o [Sun, 30 Aug 2009 01:08:08 +0000 (21:08 -0400)]
ext4: Limit number of links that can be created by ext4_link()

(cherry picked from commit b05ab1dc3795e6f997fb0d34f38fce5012533c3e)

In ext4_link we need to check using EXT4_LINK_MAX, and not
EXT4_DIR_LINK_MAX(), since ext4_link() is creating hard links of
regular files, and not directories.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Allow rename to create more than EXT4_LINK_MAX subdirectories
Aneesh Kumar K.V [Sat, 29 Aug 2009 01:43:15 +0000 (21:43 -0400)]
ext4: Allow rename to create more than EXT4_LINK_MAX subdirectories

(cherry picked from commit 2c94eb86c66e1eaaa1e7d8a2120f4fad5e7e7736)

Use EXT4_DIR_LINK_MAX so that rename() can move a directory into new
parent directory without running into the EXT4_LINK_MAX limit.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Add missing unlock_new_inode() call in extent migration code
Aneesh Kumar K.V [Wed, 26 Aug 2009 02:36:05 +0000 (22:36 -0400)]
ext4: Add missing unlock_new_inode() call in extent migration code

(cherry picked from commit a8526e84ac758ac6da45cf273aa1538a6a7aa3de)

We need to unlock the new inode before iput.  This patch fixes the
following warning when calling chattr +e to migrate a file to use
extents.  It also fixes problems in when e4defrag attempts to
defragment an inode.

[  470.400044] ------------[ cut here ]------------
[  470.400065] WARNING: at fs/inode.c:1210 generic_delete_inode+0x65/0x16a()
[  470.400072] Hardware name: N/A
.....
...
[  470.400353] Pid: 4451, comm: chattr Not tainted 2.6.31-rc7-red-debug #4
[  470.400359] Call Trace:
[  470.400372]  [<ffffffff81037771>] warn_slowpath_common+0x77/0x8f
[  470.400385]  [<ffffffff81037798>] warn_slowpath_null+0xf/0x11
[  470.400395]  [<ffffffff810b7f28>] generic_delete_inode+0x65/0x16a
[  470.400405]  [<ffffffff810b8044>] generic_drop_inode+0x17/0x1bd
[  470.400413]  [<ffffffff810b7083>] iput+0x61/0x65
[  470.400455]  [<ffffffffa003b229>] ext4_ext_migrate+0x5eb/0x66a [ext4]
[  470.400492]  [<ffffffffa002b1f8>] ext4_ioctl+0x340/0x756 [ext4]
[  470.400507]  [<ffffffff810b1a91>] vfs_ioctl+0x1d/0x82
[  470.400517]  [<ffffffff810b1ff0>] do_vfs_ioctl+0x483/0x4c9
[  470.400527]  [<ffffffff81059c30>] ? trace_hardirqs_on+0xd/0xf
[  470.400537]  [<ffffffff810b2087>] sys_ioctl+0x51/0x74
[  470.400549]  [<ffffffff8100ba6b>] system_call_fastpath+0x16/0x1b
[  470.400557] ---[ end trace ab85723542352dac ]---

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: Add feature set check helper for mount & remount paths
Eric Sandeen [Tue, 18 Aug 2009 04:20:23 +0000 (00:20 -0400)]
ext4: Add feature set check helper for mount & remount paths

(cherry picked from commit a13fb1a4533f26c1e2b0204d5283b696689645af)

A user reported that although his root ext4 filesystem was mounting
fine, other filesystems would not mount, with the:

"Filesystem with huge files cannot be mounted RDWR without CONFIG_LBDAF"

error on his 32-bit box built without CONFIG_LBDAF.  This is because
the test at mount time for this situation was not being re-checked
on remount, and the normal boot process makes an ro->rw transition,
so this was being missed.

Refactor to make a common helper function to test the filesystem
features against the type of mount request (RO vs. RW) so that we
stay consistent.

Addresses Red-Hat-Bugzilla: #517650

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
14 years agoext4: reject too-large filesystems on 32-bit kernels
Eric Sandeen [Tue, 18 Aug 2009 03:48:51 +0000 (23:48 -0400)]
ext4: reject too-large filesystems on 32-bit kernels

(cherry picked from commit bf43d84b185e2ff54598f8c58a5a8e63148b6e90)

ext4 will happily mount a > 16T filesystem on a 32-bit box, but
this is not safe; writes to the block device will wrap past 16T
and the page cache can't index past 16T (232 index * 4k pages).

Adding another test to the existing "too many sectors" test
should do the trick.

Add a comment, a relevant return value, and fix the reference
to the CONFIG_LBD(AF) option as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()
Jan Kara [Tue, 18 Aug 2009 02:17:20 +0000 (22:17 -0400)]
ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks()

During truncate we are sometimes forced to start a new transaction as
the amount of blocks to be journaled is both quite large and hard to
predict. So far we restarted a transaction while holding i_data_sem
and that violates lock ordering because i_data_sem ranks below a
transaction start (and it can lead to a real deadlock with
ext4_get_blocks() mapping blocks in some page while having a
transaction open).

(cherry picked from commit 487caeef9fc08c0565e082c40a8aaf58dad92bbb)

We fix the problem by dropping the i_data_sem before restarting the
transaction and acquire it afterwards. It's slightly subtle that this
works:

1) By the time ext4_truncate() is called, all the page cache for the
truncated part of the file is dropped so get_block() should not be
called on it (we only have to invalidate extent cache after we
reacquire i_data_sem because some extent from not-truncated part could
extend also into the part we are going to truncate).

2) Writes, migrate or defrag hold i_mutex so they are stopped for all
the time of the truncate.

This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agojbd2: Annotate transaction start also for jbd2_journal_restart()
Jan Kara [Tue, 18 Aug 2009 01:23:17 +0000 (21:23 -0400)]
jbd2: Annotate transaction start also for jbd2_journal_restart()

(cherry picked from commit 9599b0e597d810be9b8f759ea6e9619c4f983c5e)

lockdep annotation for a transaction start has been at the end of
jbd2_journal_start(). But a transaction is also started from
jbd2_journal_restart(). Move the lockdep annotation to start_this_handle()
which covers both cases.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Avoid group preallocation for closed files
Theodore Ts'o [Fri, 18 Sep 2009 17:34:02 +0000 (13:34 -0400)]
ext4: Avoid group preallocation for closed files

(cherry picked from commit 50797481a7bdee548589506d7d7b48b08bc14dcd)

Currently the group preallocation code tries to find a large (512)
free block from which to do per-cpu group allocation for small files.
The problem with this scheme is that it leaves the filesystem horribly
fragmented.  In the worst case, if the filesystem is unmounted and
remounted (after a system shutdown, for example) we forget the fact
that wee were using a particular (now-partially filled) 512 block
extent.  So the next time we try to allocate space for a small file,
we will find *another* completely free 512 block chunk to allocate
small files.  Given that there are 32,768 blocks in a block group,
after 64 iterations of "mount, write one 4k file in a directory,
unmount", the block group will have 64 files, each separated by 511
blocks, and the block group will no longer have any free 512
completely free chunks of blocks for group preallocation space.

So if we try to allocate blocks for a file that has been closed, such
that we know the final size of the file, and the filesystem is not
busy, avoid using group preallocation.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix bugs in mballoc's stream allocation mode
Theodore Ts'o [Mon, 10 Aug 2009 02:01:13 +0000 (22:01 -0400)]
ext4: Fix bugs in mballoc's stream allocation mode

(cherry picked from commit 4ba74d00a20256e22f159cb288ff34b587608917)

The logic around sbi->s_mb_last_group and sbi->s_mb_last_start was all
screwed up.  These fields were getting unconditionally all the time,
set even when stream allocation had not taken place, and if they were
being used when the file was smaller than s_mb_stream_request, which
is when the allocation should _not_ be doing stream allocation.

Fix this by determining whether or not we stream allocation should
take place once, in ext4_mb_group_or_file(), and setting a flag which
gets used in ext4_mb_regular_allocator() and ext4_mb_use_best_found().
This simplifies the code and assures that we are consistently using
(or not using) the stream allocation logic.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: fix journal ref count in move_extent_par_page
Peng Tao [Tue, 11 Aug 2009 03:05:28 +0000 (23:05 -0400)]
ext4: fix journal ref count in move_extent_par_page

(cherry picked from commit 91cc219ad963731191247c5f2db4118be2bc341a)

move_extent_par_page calls a_ops->write_begin() to increase journal
handler's reference count. However, if either mext_replace_branches()
or ext4_get_block fails, the increased reference count isn't
decreased. This will cause a later attempt to umount of the fs to hang
forever. The patch addresses the issue by calling ext4_journal_stop()
if page is not NULL (which means a_ops->write_end() isn't invoked).

Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agojbd2: round commit timer up to avoid uncommitted transaction
Andreas Dilger [Tue, 11 Aug 2009 02:51:53 +0000 (22:51 -0400)]
jbd2: round commit timer up to avoid uncommitted transaction

(cherry picked from commit b1f485f20eb9b02cc7d2009556287f3939d480cc)

fix jiffie rounding in jbd commit timer setup code.  Rounding down
could cause the timer to be fired before the corresponding transaction
has expired.  That transaction can stay not committed forever if no
new transaction is created or expicit sync/umount happens.

Signed-off-by: Alex Zhuravlev (Tomas) <alex.zhuravlev@sun.com>
Signed-off-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agojbd2: Fail to load a journal if it is too short
Jan Kara [Fri, 17 Jul 2009 14:40:01 +0000 (10:40 -0400)]
jbd2: Fail to load a journal if it is too short

(cherry picked from commit f6f50e28f0cb8d7bcdfaacc83129f005dede11b1)

Due to on disk corruption, it can happen that journal is too short. Fail
to load it in such case so that we don't oops somewhere later.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Avoid null pointer dereference when decoding EROFS w/o a journal
Theodore Ts'o [Tue, 28 Jul 2009 03:09:47 +0000 (23:09 -0400)]
ext4: Avoid null pointer dereference when decoding EROFS w/o a journal

(cherry picked from commit 78f1ddbb498283c2445c11b0dfa666424c301803)

We need to check to make sure a journal is present before checking the
journal flags in ext4_decode_error().

Signed-off-by: Eric Sesterhenn <eric.sesterhenn@lsexperts.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoext4: Fix memory leak fix when mounting an ext4 filesystem
Aneesh Kumar K.V [Fri, 17 Jul 2009 13:01:04 +0000 (09:01 -0400)]
ext4: Fix memory leak fix when mounting an ext4 filesystem

(cherry picked from commit 024eab4d5bf7e3168a2b71038b3e04e6b1f376ed)

The allocation of the ext4_group_info array was moved to a new
function ext4_mb_add_group_info() in commit 5f21b0e6 so that online
resize would use a common (and correct) codepath.  Unfortunately, the
call to the new ext4_mb_add_group_info() function was added without
removing the code which originally allocated the array.  This caused a
memory leak each time an ext4 filesystem was mounted.

The fix is simple; remove the code that did the original allocation,
since it is no longer needed.

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoLinux 2.6.31.7 v2.6.31.7
Greg Kroah-Hartman [Tue, 8 Dec 2009 19:13:50 +0000 (11:13 -0800)]
Linux 2.6.31.7

14 years agoisdn: hfc_usb: Fix read buffer overflow
Roel Kluin [Wed, 4 Nov 2009 16:31:59 +0000 (08:31 -0800)]
isdn: hfc_usb: Fix read buffer overflow

commit 286e633ef0ff5bb63c07b4516665da8004966fec upstream.

Check whether index is within bounds before testing the element.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoInput: keyboard - fix braille keyboard keysym generation
Samuel Thibault [Thu, 26 Nov 2009 06:28:20 +0000 (22:28 -0800)]
Input: keyboard - fix braille keyboard keysym generation

commit 46a965462a1c568a7cd7dc338de4a0afa5ce61c5 upstream.

Keysyms stored in key_map[] are not simply K() values, but U(K()) values,
as can be seen in the KDSKBENT ioctl handler.  The kernel-generated
braille keysyms thus need a U() call too.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoacerhdf: return temperature in milidegree instead of degree
Peter Feuerer [Tue, 17 Nov 2009 22:07:21 +0000 (14:07 -0800)]
acerhdf: return temperature in milidegree instead of degree

commit 7005291706341a11c094f39a756a01c9e649e5f9 upstream.

Return temperature in milidegree instead of degree, as sysfs-api requires
the temperature in milidegree.

Signed-off-by: Peter Feuerer <peter@piie.net>
Tested-by: Borislav Petkov <petkovbb@gmail.com>
Cc: Andreas Mohr <andi@lisas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoacerhdf: additional BIOS versions
Peter Feuerer [Fri, 18 Sep 2009 19:41:07 +0000 (12:41 -0700)]
acerhdf: additional BIOS versions

commit f944915187f53810130eb1c56985e27e3cbe4a6d upstream.

Added BIOS versions:
Acer: AOA110-v0.3307, AOA150-v0.3301, AOA150-v0.3307
Packard Bell: AOA150-v0.3105

Signed-off-by: Peter Feuerer <peter@piie.net>
Cc: Andreas Mohr <andi@lisas.de>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoPCI: Prevent AER driver from being loaded on non-root port PCIE devices
Kenji Kaneshige [Wed, 7 Oct 2009 16:28:56 +0000 (09:28 -0700)]
PCI: Prevent AER driver from being loaded on non-root port PCIE devices

commit 30fc24b5cbc55f9e6c686e2710cc812419bddc0c upstream.

A bug was seen on boards using a PLX 8518 switch device which advertises
AER on each of it's transparent bridges. The AER driver was loaded for
each bridge and this driver tried to access the AER source ID register
whenever an interrupt occured on the shared PCI INTX lines. The source
ID register does not exist on non root port PCIE device's  which
advertise AER and trying to access this register causes a unsupported
request error on the bridge. Thus, when the next interrupt occurs,
another error is found and the non existent source ID register is
accessed again, and so it goes on.

The result is a spammed dmesg with unsupported request PCI express
errors on the bridge device that the AER driver is loaded against.

Reported-by: Malcolm Crossley <malcolm.crossley2@gefanuc.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Tested-by: Malcolm Crossley <malcolm.crossley2@gefanuc.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoV4L/DVB (13257): gspca - m5602-s5k4aa: Add vflip for Fujitsu Amilo Xi 2528
Erik Andrén [Sat, 3 Oct 2009 13:01:41 +0000 (10:01 -0300)]
V4L/DVB (13257): gspca - m5602-s5k4aa: Add vflip for Fujitsu Amilo Xi 2528

commit 81191f694cb507c49d3c7aa6238dcc0a83ad4001 upstream.

Adds a vflip quirk for the Fujitsu Amilo Xi 2528. Thanks to Evgeny for the report.

Signed-off-by: Erik Andrén <erik.andren@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
14 years agoV4L/DVB (13256): gspca - m5602-s5k4aa: Add another MSI GX700 vflip quirk
Erik Andrén [Sun, 27 Sep 2009 13:20:21 +0000 (10:20 -0300)]
V4L/DVB (13256): gspca - m5602-s5k4aa: Add another MSI GX700 vflip quirk

commit 2339a1887dab469bb4bae56aa7eca3a5e05ecde7 upstream.

Adds another vflip quirk for the MSI GX700.
Thanks to John Katzmaier for reporting.

Signed-off-by: Erik Andrén <erik.andren@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoV4L/DVB (13255): gspca - m5602-s5k4aa: Add vflip quirk for the Bruneinit laptop
Erik Andrén [Sun, 27 Sep 2009 13:11:43 +0000 (10:11 -0300)]
V4L/DVB (13255): gspca - m5602-s5k4aa: Add vflip quirk for the Bruneinit laptop

commit b6ef8836c1ff5199abd40cfba162052bc7e8af00 upstream.

Adds a vflip quirk for the Bruneinit laptop. Thanks to Jörg for the report

Signed-off-by: Erik Andrén <erik.andren@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agotty/of_serial: add missing ns16550a id
Michal Simek [Tue, 24 Nov 2009 10:22:41 +0000 (10:22 +0000)]
tty/of_serial: add missing ns16550a id

commit 16173c7c2d79da7eb89b41acfdebd74b130f4339 upstream.

Many boards have a bug-free ns16550 compatible serial port, which we should
register as PORT_16550A. This introduces a new value "ns16550a" for the
compatible property of of_serial to let a firmware choose that model instead
of using the crippled PORT_16550 mode.

Reported-by: Alon Ziv <alonz@nolaviz.org>
Signed-off-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agodrm/fb: fix FBIOGET/PUT_VSCREENINFO pixel clock handling
Clemens Ladisch [Wed, 2 Dec 2009 07:16:55 +0000 (08:16 +0100)]
drm/fb: fix FBIOGET/PUT_VSCREENINFO pixel clock handling

commit 5349ef3127c77075ff70b2014f17ae0fbcaaf199 upstream

When the framebuffer driver does not publish detailed timing information
for the current video mode, the correct value for the pixclock field is
zero, not -1.

Since pixclock is actually unsigned, the value -1 would be interpreted
as 4294967295 picoseconds (i.e., about 4 milliseconds) by
register_framebuffer() and userspace programs.

This patch allows X.org's fbdev driver to work.

Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
Tested-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoacerhdf: fix fan control for AOA150 model
Peter Feuerer [Thu, 6 Aug 2009 22:57:52 +0000 (15:57 -0700)]
acerhdf: fix fan control for AOA150 model

commit ded0cdfc6a7673916b0878c32fa8ba566b4f8cdb upstream.

- Apply Borislav Petkov's patch (convert the fancmd[] array to a real
  struct thus disambiguating command handling and making code more
  readable.)

- Add BIOS product to BIOS table as AOA110 and AOA150 have different
  register values

- Add force_product parameter to allow forcing different product

- fix linker warning caused by "acerhdf_drv" not being named
  "acerhdf_driver"

Signed-off-by: Peter Feuerer <peter@piie.net>
Cc: Andreas Mohr <andi@lisas.de>
Acked-by: Borislav Petkov <petkovbb@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Adrian von Bidder <avbidder@fortytwo.ch>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoi2c: Fix userspace_device list corruption
Jean Delvare [Thu, 26 Nov 2009 08:22:33 +0000 (09:22 +0100)]
i2c: Fix userspace_device list corruption

commit bbd2d9c9198c6efd449e9d395b3eaf2d03aa3bba upstream.

Fix userspace_device list corruption. The corruption was caused by
clients not being removed when adapters with such clients were
themselves removed. Something like the following would trigger it
(assuming i2c-stub gets adapter number 3):

# modprobe i2c-stub chip_addr=0x50
# echo 24c08 0x50 > /sys/bus/i2c/devices/i2c-3/new_device
# rmmod i2c-stub
# modprobe i2c-stub chip_addr=0x50
# echo 24c08 0x50 > /sys/bus/i2c/devices/i2c-3/new_device

For the records, the stack trace in the kernel logs look like this:

kernel: WARNING: at lib/list_debug.c:30 __list_add+0x8b/0x90()
kernel: Hardware name: (...)
kernel: list_add corruption. prev->next should be next (c137fc84), but was (null). (prev=f57111b8).
kernel: Modules linked in: (...)
kernel: Pid: 4669, comm: bash Not tainted 2.6.32-rc8 #259
kernel: Call Trace:
kernel:  [<c111eb8b>] ? __list_add+0x8b/0x90
kernel:  [<c111eb8b>] ? __list_add+0x8b/0x90
kernel:  [<c103265c>] warn_slowpath_common+0x6c/0xc0
kernel:  [<c111eb8b>] ? __list_add+0x8b/0x90
kernel:  [<c10326f6>] warn_slowpath_fmt+0x26/0x30
kernel:  [<c111eb8b>] __list_add+0x8b/0x90
kernel:  [<c11ba165>] i2c_sysfs_new_device+0x1c5/0x250
kernel:  [<c10861be>] ? might_fault+0x2e/0x80
kernel:  [<c11b9fa0>] ? i2c_sysfs_new_device+0x0/0x250
kernel:  [<c118c625>] dev_attr_store+0x25/0x30
kernel:  [<c10e305c>] sysfs_write_file+0x9c/0xf0
kernel:  [<c109d35c>] vfs_write+0x9c/0x160
kernel:  [<c10e2fc0>] ? sysfs_write_file+0x0/0xf0
kernel:  [<c109d4dd>] sys_write+0x3d/0x70
kernel:  [<c1002ed8>] sysenter_do_call+0x12/0x36

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoath5k: Linear PCDAC code fixes
Nick Kossifidis [Mon, 10 Aug 2009 00:27:59 +0000 (03:27 +0300)]
ath5k: Linear PCDAC code fixes

commit d1cb0bdac180a4afdd3c001acb2618d2a62d9abe upstream.

* Set correct xpd curve indices for high/low gain curves during
   rfbuffer setup on RF5112B with both calibration curves available.

 * Don't return zero min power when we have the same pcdac value
   twice because it breaks interpolation. Instead return the right
   x barrier as we do when we have equal power levels for 2 different
   pcdac values.

Signed-off-by: Nick Kossifidis <mickflemm@gmail.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Cc: Dan Williams <dcbw@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agosky2: set carrier off in probe
Brandon Philips [Thu, 29 Oct 2009 13:58:07 +0000 (13:58 +0000)]
sky2: set carrier off in probe

commit 33cb7d33a1c36e07839d08a4d1a33bf6a0f70bba upstream.

Before bringing up a sky2 interface up ethtool reports
"Link detected: yes". Do as ixgbe does and netif_carrier_off() on
probe().

Signed-off-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agocrypto: padlock-aes - Use the correct mask when checking whether copying is required
Chuck Ebbert [Tue, 3 Nov 2009 15:32:03 +0000 (10:32 -0500)]
crypto: padlock-aes - Use the correct mask when checking whether copying is required

commit e8edb3cbd7dd8acf6c748a02d06ec1d82c4124ea upstream.

Masking with PAGE_SIZE is just wrong...

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agob43: Fix DMA TX bounce buffer copying
Michael Buesch [Wed, 28 Oct 2009 21:08:13 +0000 (22:08 +0100)]
b43: Fix DMA TX bounce buffer copying

commit 9a3f45116f5e08819136cd512fd7f6450ac22aa8 upstream.

b43 allocates a bouncebuffer, if the supplied TX skb is in an invalid
memory range for DMA.
However, this is broken in that it fails to copy over some metadata to the
new skb.

This patch fixes three problems:
* Failure to adjust the ieee80211_tx_info pointer to the new buffer.
  This results in a kmemcheck warning.
* Failure to copy the skb cb, which contains ieee80211_tx_info, to the new skb.
  This results in breakage of various TX-status postprocessing (Rate control).
* Failure to transfer the queue mapping.
  This results in the wrong queue being stopped on saturation and can result in queue overflow.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Tested-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agonetfilter: xt_connlimit: fix regression caused by zero family value
Jan Engelhardt [Sat, 7 Nov 2009 02:08:32 +0000 (18:08 -0800)]
netfilter: xt_connlimit: fix regression caused by zero family value

commit 539054a8fa5141c9a4e9ac6a86d249e3f2bdef45 upstream.

Commit v2.6.28-rc1~717^2~109^2~2 was slightly incomplete; not all
instances of par->match->family were changed to par->family.

References: http://bugzilla.netfilter.org/show_bug.cgi?id=610
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agonetfilter: nf_nat: fix NAT issue in 2.6.30.4+
Jozsef Kadlecsik [Fri, 6 Nov 2009 08:43:42 +0000 (00:43 -0800)]
netfilter: nf_nat: fix NAT issue in 2.6.30.4+

commit f9dd09c7f7199685601d75882447a6598be8a3e0 upstream.

Vitezslav Samel discovered that since 2.6.30.4+ active FTP can not work
over NAT. The "cause" of the problem was a fix of unacknowledged data
detection with NAT (commit a3a9f79e361e864f0e9d75ebe2a0cb43d17c4272).
However, actually, that fix uncovered a long standing bug in TCP conntrack:
when NAT was enabled, we simply updated the max of the right edge of
the segments we have seen (td_end), by the offset NAT produced with
changing IP/port in the data. However, we did not update the other parameter
(td_maxend) which is affected by the NAT offset. Thus that could drift
away from the correct value and thus resulted breaking active FTP.

The patch below fixes the issue by *not* updating the conntrack parameters
from NAT, but instead taking into account the NAT offsets in conntrack in a
consistent way. (Updating from NAT would be more harder and expensive because
it'd need to re-calculate parameters we already calculated in conntrack.)

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoagp/intel: new host bridge support
Zhenyu Wang [Tue, 10 Nov 2009 03:10:22 +0000 (03:10 +0000)]
agp/intel: new host bridge support

commit 9cf1e35cb025eaa52dde37df38e2750b6adb1620 upstream.

Add new CPU host bridge id, needed for support Ironlake graphics
device with it. No change for graphics device itself, so no need to
update drm/i915.

Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agohwmon: (adt7475) Cache limits for 60 seconds
Jean Delvare [Mon, 16 Nov 2009 11:45:40 +0000 (12:45 +0100)]
hwmon: (adt7475) Cache limits for 60 seconds

commit 56e35eeebed2dcb4e1a17ad119e039cf095854ac upstream.

The comment says that limits are cached for 60 seconds but the code
actually caches them for only 2 seconds. Align the code on the
comment, as 60 seconds makes more sense.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agohwmon: (adt7475) Fix temperature fault flags
Jean Delvare [Mon, 16 Nov 2009 11:45:39 +0000 (12:45 +0100)]
hwmon: (adt7475) Fix temperature fault flags

commit cf312e077662ec3a07529551ab6e885828ccfb1d upstream.

The logic of temperature fault flags is wrong, it shows faults when
there are none and vice versa. Fix it.

I can't believe this has been broken since the driver was added, 8
months ago, basically breaking temp1 and temp3, and nobody ever
complained.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoblock: use after free bug in __blkdev_get
Neil Brown [Mon, 26 Oct 2009 07:59:17 +0000 (08:59 +0100)]
block: use after free bug in __blkdev_get

commit 960cc0f4fef607baabc2232fbd7cce5368a9dcfd upstream.

commit 0762b8bde9729f10f8e6249809660ff2ec3ad735
(from 14 months ago) introduced a use-after-free bug which has just
recently started manifesting in my md testing.
I tried git bisect to find out what caused the bug to start
manifesting, and it could have been the recent change to
blk_unregister_queue (48c0d4d4c04) but the results were inconclusive.

This patch certainly fixes my symptoms and looks correct as the two
calls are now in the same order as elsewhere in that function.

Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agohso: fix soft-lockup
Antti Kaijanmäki [Mon, 23 Nov 2009 18:54:47 +0000 (10:54 -0800)]
hso: fix soft-lockup

commit dcfcb256cc23c4436691b0fe677275306699d6a1 upstream.

Fix soft-lockup in hso.c which is triggered on SMP machine when
modem is removed while file descriptor(s) under /dev are still open:

  old version called kref_put() too early which resulted in destroying
  hso_serial and hso_device objects which were still used later on.

Signed-off-by: Antti Kaijanmäki <antti.kaijanmaki@nomovok.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
14 years agoperf_event: Adjust frequency and unthrottle for non-group-leader events
Paul Mackerras [Wed, 14 Oct 2009 05:58:03 +0000 (16:58 +1100)]
perf_event: Adjust frequency and unthrottle for non-group-leader events

commit 03541f8b69c058162e4cf9675ec9181e6a204d55 upstream.

The loop in perf_ctx_adjust_freq checks the frequency of sampling
event counters, and adjusts the event interval and unthrottles the
event if required, and resets the interrupt count for the event.
However, at present it only looks at group leaders.

This means that a sampling event that is not a group leader will
eventually get throttled, once its interrupt count reaches
sysctl_perf_event_sample_rate/HZ --- and that is guaranteed to
happen, if the event is active for long enough, since the interrupt
count never gets reset.  Once it is throttled it never gets
unthrottled, so it basically just stops working at that point.

This fixes it by making perf_ctx_adjust_freq use ctx->event_list
rather than ctx->group_list.  The existing spin_lock/spin_unlock
around the loop makes it unnecessary to put rcu_read_lock/
rcu_read_unlock around the list_for_each_entry_rcu().

Reported-by: Mark W. Krentel <krentel@cs.rice.edu>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <19157.26731.855609.165622@cargo.ozlabs.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agomd: revert incorrect fix for read error handling in raid1.
NeilBrown [Tue, 1 Dec 2009 06:30:59 +0000 (17:30 +1100)]
md: revert incorrect fix for read error handling in raid1.

commit d0e260782c3702a009645c3caa02e381dab8798b upstream.

commit 4706b349f was a forward port of a fix that was needed
for SLES10.  But in fact it is not needed in mainline because
the earlier commit dd00a99e7a fixes the same problem in a
better way.
Further, this commit introduces a bug in the way it interacts with
the automatic read-error-correction.  If, after a read error is
successfully corrected, the same disk is chosen to re-read - the
re-read won't be attempted but an error will be returned instead.

After reverting that commit, there is the possibility that a
read error on a read-only array (where read errors cannot
be corrected as that requires a write) will repeatedly read the same
device and continue to get an error.
So in the "Array is readonly" case, fail the drive immediately on
a read error.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agomodules: don't export section names of empty sections via sysfs
Helge Deller [Wed, 2 Dec 2009 23:29:15 +0000 (00:29 +0100)]
modules: don't export section names of empty sections via sysfs

commit 35dead4235e2b67da7275b4122fed37099c2f462 upstream.

On the parisc architecture we face for each and every loaded kernel module
this kernel "badness warning":
  sysfs: cannot create duplicate filename '/module/ac97_bus/sections/.text'
  Badness at fs/sysfs/dir.c:487

Reason for that is, that on parisc all kernel modules do have multiple
.text sections due to the usage of the -ffunction-sections compiler flag
which is needed to reach all jump targets on this platform.

An objdump on such a kernel module gives:
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .note.gnu.build-id 00000024  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         00000000  00000000  00000000  00000058  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .text.ac97_bus_match 0000001c  00000000  00000000  00000058  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  3 .text         00000000  00000000  00000000  000000d4  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
...
Since the .text sections are empty (size of 0 bytes) and won't be
loaded by the kernel module loader anyway, I don't see a reason
why such sections need to be listed under
/sys/module/<module_name>/sections/<section_name> either.

The attached patch does solve this issue by not exporting section
names which are empty.

This fixes bugzilla http://bugzilla.kernel.org/show_bug.cgi?id=14703

Signed-off-by: Helge Deller <deller@gmx.de>
CC: rusty@rustcorp.com.au
CC: akpm@linux-foundation.org
CC: James.Bottomley@HansenPartnership.com
CC: roland@redhat.com
CC: dave@hiauly1.hia.nrc.ca
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoparam: don't complain about unused module parameters.
Rusty Russell [Tue, 1 Dec 2009 04:26:44 +0000 (14:56 +1030)]
param: don't complain about unused module parameters.

commit f066a4f6df68f03b565dfe867dde54dfeb26576e upstream.

Jon confirms that recent modprobe will look in /proc/cmdline, so these
cmdline options can still be used.

See http://bugzilla.kernel.org/show_bug.cgi?id=14164

Reported-by: Adam Williamson <awilliam@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agopxamci: call mmc_remove_host() before freeing resources
Daniel Mack [Tue, 1 Dec 2009 17:17:18 +0000 (18:17 +0100)]
pxamci: call mmc_remove_host() before freeing resources

commit 5d6b1edf8ccc4b7e4e77dff3fc80882833d6186e upstream.

mmc_remove_host() will cause the mmc core to switch off the bus power by
eventually calling pxamci_set_ios(). This function uses the regulator or
the GPIO which have been freed already.

This causes the following Oops on module unload.

[   49.519649] Unable to handle kernel paging request at virtual address 30303a70
[   49.526878] pgd = c7084000
[   49.529563] [30303a70] *pgd=00000000
[   49.533136] Internal error: Oops: 5 [#1]
[   49.537025] last sysfs file: /sys/devices/platform/pxa27x-ohci/usb1/1-1/1-1:1.0/host0/target0:0:0/0:0:0:0/scsi_level
[   49.547471] Modules linked in: pxamci(-) eeti_ts
[   49.552061] CPU: 0    Not tainted  (2.6.32-rc8 #322)
[   49.557001] PC is at regulator_is_enabled+0x3c/0xbc
[   49.561846] LR is at regulator_is_enabled+0x30/0xbc
[   49.566691] pc : [<c01a2448>]    lr : [<c01a243c>]    psr: 60000013
[   49.566702] sp : c7083e70  ip : 30303a30  fp : 00000000
[   49.578093] r10: c705e200  r9 : c7082000  r8 : c705e2e0
[   49.583280] r7 : c7061340  r6 : c7061340  r5 : c7083e70  r4 : 00000000
[   49.589759] r3 : c04dc434  r2 : c04dc434  r1 : c03eecea  r0 : 00000047
[   49.596241] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   49.603329] Control: 0000397f  Table: a7084018  DAC: 00000015
[   49.609031] Process rmmod (pid: 1101, stack limit = 0xc7082278)
[   49.614908] Stack: (0xc7083e70 to 0xc7084000)
[   49.619238] 3e60:                                     c7082000 c703c4f8 c705ea00 c04f4074
[   49.627366] 3e80: 00000000 c705e3a0 ffffffff c0247ddc c70361a0 00000000 c705e3a0 ffffffff
[   49.635499] 3ea0: c705e200 bf006400 c78c4f00 c705e200 c705e3a0 ffffffff c705e200 ffffffff
[   49.643633] 3ec0: c04d8ac8 c02476d0 ffffffff c0247c60 c705e200 c0248678 c705e200 c0249064
[   49.651765] 3ee0: ffffffff bf006204 c04d8ad0 c04d8ad0 c04d8ac8 bf007490 00000880 c00440c4
[   49.659898] 3f00: 0000b748 c01c5708 bf007490 c01c44c8 c04d8ac8 c04d8afc bf007490 c01c4570
[   49.668031] 3f20: bf007490 bf00750c c04f4258 c01c37a4 00000000 bf00750c c7083f44 c007b014
[   49.676162] 3f40: 4000d000 6d617870 08006963 00000001 00000000 c7085000 00000001 00000000
[   49.684287] 3f60: 4000d000 c7083f8c 00000001 bea01a54 00005401 c7ab1400 c00440c4 00082000
[   49.692420] 3f80: bf00750c 00000880 c7083f8c 00000000 4000cfa8 00000000 00000880 bea01cc8
[   49.700552] 3fa0: 00000081 c0043f40 00000000 00000880 bea01cc8 00000880 00000006 00000000
[   49.708677] 3fc0: 00000000 00000880 bea01cc8 00000081 00000097 0000cca4 0000b748 00000000
[   49.716802] 3fe0: 4001a4f0 bea01cc0 00018bf4 4001a4fc 20000010 bea01cc8 a063e021 a063e421
[   49.724958] [<c01a2448>] (regulator_is_enabled+0x3c/0xbc) from [<c0247ddc>] (mmc_regulator_set_ocr+0x14/0xd8)
[   49.734836] [<c0247ddc>] (mmc_regulator_set_ocr+0x14/0xd8) from [<bf006400>] (pxamci_set_ios+0xd8/0x17c [pxamci])
[   49.745044] [<bf006400>] (pxamci_set_ios+0xd8/0x17c [pxamci]) from [<c02476d0>] (mmc_power_off+0x50/0x58)
[   49.754555] [<c02476d0>] (mmc_power_off+0x50/0x58) from [<c0247c60>] (mmc_detach_bus+0x68/0xc4)
[   49.763207] [<c0247c60>] (mmc_detach_bus+0x68/0xc4) from [<c0248678>] (mmc_stop_host+0xd4/0x1bc)
[   49.771944] [<c0248678>] (mmc_stop_host+0xd4/0x1bc) from [<c0249064>] (mmc_remove_host+0xc/0x20)
[   49.780681] [<c0249064>] (mmc_remove_host+0xc/0x20) from [<bf006204>] (pxamci_remove+0xc8/0x174 [pxamci])
[   49.790211] [<bf006204>] (pxamci_remove+0xc8/0x174 [pxamci]) from [<c01c5708>] (platform_drv_remove+0x1c/0x24)
[   49.800164] [<c01c5708>] (platform_drv_remove+0x1c/0x24) from [<c01c44c8>] (__device_release_driver+0x7c/0xc4)
[   49.810110] [<c01c44c8>] (__device_release_driver+0x7c/0xc4) from [<c01c4570>] (driver_detach+0x60/0x8c)
[   49.819535] [<c01c4570>] (driver_detach+0x60/0x8c) from [<c01c37a4>] (bus_remove_driver+0x90/0xcc)
[   49.828452] [<c01c37a4>] (bus_remove_driver+0x90/0xcc) from [<c007b014>] (sys_delete_module+0x1d8/0x254)
[   49.837891] [<c007b014>] (sys_delete_module+0x1d8/0x254) from [<c0043f40>] (ret_fast_syscall+0x0/0x28)
[   49.847145] Code: eb06c53a e596c030 e1a0500d e59f106c (e59c0040)
[   49.853566] ---[ end trace b5fa66a00cea142f ]---

Signed-off-by: Daniel Mack <daniel@caiaq.de>
Reported-by: Sven Neumann <s.neumann@raumfeld.com>
Cc: Pierre Ossman <pierre@ossman.eu>
Cc: linux-mmc@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agotty_port: handle the nonblocking open of a dead port corner case
Alan Cox [Wed, 18 Nov 2009 14:12:58 +0000 (14:12 +0000)]
tty_port: handle the nonblocking open of a dead port corner case

commit 8627b96dd80dca440d91fbb1ec733be25912d0dd upstream.

Some drivers allow O_NDELAY of a dead port (eg for setserial to work). In that
situation we must not try to raise the carrier.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoUSB: work around for EHCI with quirky periodic schedules
Oliver Neukum [Fri, 27 Nov 2009 14:17:59 +0000 (15:17 +0100)]
USB: work around for EHCI with quirky periodic schedules

commit ee4ecb8ac63a5792bec448037d4b82ec4144f94b upstream.

a quirky chipset needs periodic schedules to run for a minimum
time before they can be disabled again. This enforces the requirement
with a time stamp and a calculated delay

Signed-off-by: Oliver Neukum <oliver@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoUSB: ftdi_sio: Keep going when write errors are encountered.
Eric W. Biederman [Wed, 18 Nov 2009 03:10:48 +0000 (19:10 -0800)]
USB: ftdi_sio: Keep going when write errors are encountered.

commit 0de6ab8b91f2e1e8e7fc66a8b5c5e8ca82ea16b7 upstream.

The use of urb->actual_length to update tx_outstanding_bytes
implicitly assumes that the number of bytes actually written is the
same as the number of bytes we tried to write.  On error that
assumption is violated so just use transfer_buffer_length the number
of bytes we intended to write to the device.

If an error occurs we need to fall through and call
usb_serial_port_softint to wake up processes waiting in
tty_wait_until_sent.

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agousb: amd5536udc: fixed shared interrupt bug and warning oops
Thomas Dahlmann [Tue, 17 Nov 2009 22:18:27 +0000 (14:18 -0800)]
usb: amd5536udc: fixed shared interrupt bug and warning oops

commit c5deb832d7a3f9618b09e6eeaa91a1a845c90c65 upstream.

- fixed shared interrupt bug reported by Vadim Lobanov
 - fixed possible warning oops on driver unload when connected
 - prevent interrupt flood in PIO mode ("modprobe amd5536udc use_dma=0")
   when using gadget ether

Signed-off-by: Thomas Dahlmann <dahlmann.thomas@arcor.de>
Cc: Robert Richter <robert.richter@amd.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoUSB: musb_gadget: fix STALL handling
Sergei Shtylyov [Wed, 18 Nov 2009 19:51:18 +0000 (22:51 +0300)]
USB: musb_gadget: fix STALL handling

commit cea83241b3a84499c4f9b12f8288f787e7aa6383 upstream.

The driver incorrectly cancels the mass-storage device CSW request
(which leads to device reset) due to giving back URB at the head of
endpoint's queue after sending each STALL handshake; stop doing that
and start checking for the queue being non-empty before stalling an
endpoint and disallowing stall in such case in musb_gadget_set_halt()
like the other gadget drivers do.

Moreover, the driver starts Rx request despite of the endpoint being
halted -- fix this by moving the SendStall bit check from musb_g_rx()
to rxstate().  And we also sometimes get into rxstate() with DMA still
active after clearing an endpoint's halt (not clear why), so bail out
in this case, similarly to what txstate() does...

While at it, also do the following changes :

- in musb_gadget_set_halt(), remove pointless Tx FIFO flushing (the
  driver does not allow stalling with non-empty Tx FIFO anyway);

- in rxstate(), stop pointlessly zeroing the 'csr' variable;

- in musb_gadget_set_halt(), move the 'done' label to a more proper
  place;

- in musb_g_rx(), eliminate the 'done' label completely...

Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoUSB: EHCI: don't send Clear-TT-Buffer following a STALL
Alan Stern [Wed, 18 Nov 2009 16:37:15 +0000 (11:37 -0500)]
USB: EHCI: don't send Clear-TT-Buffer following a STALL

commit c2f6595fbdb408d3d6850cfae590c8fa93e27399 upstream.

This patch (as1304) fixes a regression in ehci-hcd.  Evidently some
hubs don't handle Clear-TT-Buffer requests correctly, so we should
avoid sending them when they don't appear to be absolutely necessary.
The reported symptom is that output on a downstream audio device cuts
out because the hub stops relaying isochronous packets.

The patch prevents Clear-TT-Buffer requests from being sent following
a STALL handshake.  In theory a STALL indicates either that the
downstream device sent a STALL or that no matching TT buffer could be
found.  In either case, the transfer is completed and the TT buffer
does not remain busy, so it doesn't need to be cleared.

Also, the patch fixes a minor flaw in the code that actually sends the
Clear-TT-Buffer requests.  Although the pipe direction isn't really
used for control transfers, it should be a Send rather than a Receive.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Javier Kohen <jkohen@users.sourceforge.net>
CC: David Brownell <david-b@pacbell.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agospeedstep-ich: fix error caused by 394122ab144dae4b276d74644a2f11c44a60ac5c
Rusty Russell [Tue, 3 Nov 2009 07:35:30 +0000 (23:35 -0800)]
speedstep-ich: fix error caused by 394122ab144dae4b276d74644a2f11c44a60ac5c

commit 8dca15e40889e5d5e9655b03ba79c26200f760ce upstream.

"[CPUFREQ] cpumask: avoid playing with cpus_allowed in speedstep-ich.c"
changed the code to mistakenly pass the current cpu as the "processor"
argument of speedstep_get_frequency(), whereas it should be the type of
the processor.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=14340

Based on a patch by Dave Mueller.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Dominik Brodowski <linux@brodo.de>
Reported-by: Dave Mueller <dave.mueller@gmx.ch>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr...
David Ford [Mon, 30 Nov 2009 07:02:22 +0000 (23:02 -0800)]
ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr OOPS

commit bbf31bf18d34caa87dd01f08bf713635593697f2 upstream.

ipv4 ip_frag_reasm(), fully replace 'dev_net(dev)' with 'net', defined
previously patched into 2.6.29.

Between 2.6.28.10 and 2.6.29, net/ipv4/ip_fragment.c was patched,
changing from dev_net(dev) to container_of(...).  Unfortunately the goto
section (out_fail) on oversized packets inside ip_frag_reasm() didn't
get touched up as well.  Oversized IP packets cause a NULL pointer
dereference and immediate hang.

I discovered this running openvasd and my previous email on this is
titled:  NULL pointer dereference at 2.6.32-rc8:net/ipv4/ip_fragment.c:566

Signed-off-by: David Ford <david@blue-labs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoV4L/DVB (13314): saa7134: set ts_force_val for the Hauppauge WinTV HVR-1150
Michael Krufky [Wed, 4 Nov 2009 17:23:57 +0000 (14:23 -0300)]
V4L/DVB (13314): saa7134: set ts_force_val for the Hauppauge WinTV HVR-1150

commit 22370ef5035f206283505409c9a64a595c5c7320 upstream.

The Hauppauge WinTV HVR-1150 retail boards require the FORCE_TS_VALID bit
to be set in order to function properly. This change will work on the early
revisions on the board as well, but the final revision will not function
without this change.

Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoV4L/DVB (13313): saa7134: add support for FORCE_TS_VALID mode for mpeg ts input
Michael Krufky [Wed, 4 Nov 2009 17:19:35 +0000 (14:19 -0300)]
V4L/DVB (13313): saa7134: add support for FORCE_TS_VALID mode for mpeg ts input

commit 4007a672abd88091e3cced158ec491d41c0c454c upstream.

When FORCE_TS_VALID mode is enabled, the saa713x will accept MPEG TS input
without requiring TS_VALID set high.  This is required for some new boards
to function properly, due to the hardware design implementation.

The configuration is toggled within the board setup configuration.  Boards
that do not have this bit set will function as before with no change.

Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agoV4L/DVB (13202): smsusb: add autodetection support for three additional Hauppauge...
Michael Krufky [Wed, 21 Oct 2009 21:27:29 +0000 (18:27 -0300)]
V4L/DVB (13202): smsusb: add autodetection support for three additional Hauppauge USB IDs

commit 78c948ab0cc44f9c8ae397d7d9d217bb498bfa2f upstream.

Add support for three new Hauppauge Device USB IDs:

2040:b900
2040:b910
2040:c000

Signed-off-by: Michael Krufky <mkrufky@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
14 years agosched: Fix isolcpus boot option
Rusty Russell [Wed, 2 Dec 2009 03:39:16 +0000 (14:09 +1030)]
sched: Fix isolcpus boot option

commit bdddd2963c0264c56f18043f6fa829d3c1d3d1c0 upstream.

Anton Blanchard wrote:

> We allocate and zero cpu_isolated_map after the isolcpus
> __setup option has run. This means cpu_isolated_map always
> ends up empty and if CPUMASK_OFFSTACK is enabled we write to a
> cpumask that hasn't been allocated.

I introduced this regression in 49557e620339cb13 (sched: Fix
boot crash by zalloc()ing most of the cpu masks).

Use the bootmem allocator if they set isolcpus=, otherwise
allocate and zero like normal.

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: peterz@infradead.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <200912021409.17013.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
14 years agosched: Fix boot crash by zalloc()ing most of the cpu masks
Rusty Russell [Mon, 2 Nov 2009 10:07:20 +0000 (20:37 +1030)]
sched: Fix boot crash by zalloc()ing most of the cpu masks

commit 49557e620339cb134127b5bfbcfecc06b77d0232 upstream.

I got a boot crash when forcing cpumasks offstack on 32 bit,
because find_new_ilb() returned 3 on my UP system (nohz.cpu_mask
wasn't zeroed).

AFAICT the others need to be zeroed too: only
nohz.ilb_grp_nohz_mask is initialized before use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <200911022037.21282.rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>