I encountered an issue that not to link up on cxgb3 fabric.
I bisected and found that this regression was introduced by 0f07c4ee8c800923ae7918c231532a9256233eed.
Correct to pass phy_addr to cphy_init() at t3_xaui_direct_phy_prep().
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Acked-by: Divy Le Ray <divy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
In the FPU emulator code of the MIPS, the Cause bits of the FCSR register
are not currently writeable by the ctc1 instruction. In odd corner cases,
this can cause problems. For example, a case existed where a divide-by-zero
exception was generated by the FPU, and the signal handler attempted to
restore the FPU registers to their state before the exception occurred. In
this particular setup, writing the old value to the FCSR register would
cause another divide-by-zero exception to occur immediately. The solution
is to change the ctc1 instruction emulator code to allow the Cause bits of
the FCSR register to be writeable. This is the behaviour of the hardware
that the code is emulating.
This problem was found by Shane McDonald, but the credit for the fix goes
to Kevin Kissell. In Kevin's words:
I submit that the bug is indeed in that ctc_op: case of the emulator. The
Cause bits (17:12) are supposed to be writable by that instruction, but the
CTC1 emulation won't let them be updated by the instruction. I think that
actually if you just completely removed lines 387-388 [...] things would
work a good deal better. At least, it would be a more accurate emulation of
the architecturally defined FPU. If I wanted to be really, really pedantic
(which I sometimes do), I'd also protect the reserved bits that aren't
necessarily writable.
Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
To: anemo@mba.ocn.ne.jp
To: kevink@paralogos.com
To: sshtylyov@mvista.com
Patchwork: http://patchwork.linux-mips.org/patch/1205/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Please find attached a patch which adds the device ID for the Belkin F5D8053 v6 to the rtl8192su driver. I've tested this in 2.6.34-rc3
(Ubuntu 9.10 amd64) and the network adapter is working flawlessly.
Signed-off-by: Richard Airlie <richard@backtrace.co.uk> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes the TX_LIMIT feature flag. The previous logic check
for TX_LIMIT2 also took into account a device that only had TX_LIMIT
set.
Reported-by: Stephen Mulcahu <stephen.mulcahy@deri.org> Reported-by: Ben Huchings <ben@decadent.org.uk> Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
IR support on FusionHDTV cards is broken since kernel 2.6.31. One side
effect of the switch to the standard binding model for IR I2C devices
was to let i2c-core do the probing instead of the ir-kbd-i2c driver.
There is a slight difference between the two probe methods: i2c-core
uses 0-byte writes, while the ir-kbd-i2c was using 0-byte reads. As
some IR I2C devices only support reads, the new probe method fails to
detect them.
For now, revert to letting the driver do the probe, using 0-byte
reads. In the future, i2c-core will be extended to let callers of
i2c_new_probed_device() provide a custom probing function.
Signed-off-by: Jean Delvare <khali@linux-fr.org> Tested-by: "Timothy D. Lenz" <tlenz@vorgon.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix ULE decapsulation bug when less than 4 bytes of ULE SNDU is packed
into the remaining bytes of a MPEG2-TS frame
ULE (Unidirectional Lightweight Encapsulation RFC 4326) decapsulation
code has a bug that incorrectly treats ULE SNDU packed into the
remaining 2 or 3 bytes of a MPEG2-TS frame as having invalid pointer
field on the subsequent MPEG2-TS frame.
Signed-off-by: Ang Way Chuang <wcang@nav6.org> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch fixes a division by zero error in the irq handler.
There is a small window between the hw_params() callback and when
runtime->frame_bits is set by ALSA middle layer. When another substream is
already running, if an interrupt is delivered during that window the irq
handler calls pcm_pointer() which does a division by zero. The patch below
makes the irq handler skip substreams that are initialized but not started
yet. Cc to Clemens Ladisch because he proposed an alternate fix.
For more information, please read the original thread in the linux-kernel
mailing list: http://lkml.org/lkml/2010/2/2/187
395913d0b1db37092ea3d9d69b832183b1dd84c5 ("[CPUFREQ] remove rwsem lock
from CPUFREQ_GOV_STOP call (second call site)") is not needed, because
there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative
anymore. Lock should not be released until the work done.
fix memory leak introduced by the patch 6e03a201bbe:
firmware: speed up request_firmware()
1. vfree won't release pages there were allocated explicitly and mapped
using vmap. The memory has to be vunmap-ed and the pages needs
to be freed explicitly
2. page array is moved into the 'struct
firmware' so that we can free it from release_firmware()
and not only in fw_dev_release()
The fix doesn't break the firmware load speed.
Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Singed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dm9000_set_rx_csum and dm9000_hash_table are called from atomic context (in
dm9000_init_dm9000), and from non-atomic context (via ethtool_ops and
net_device_ops respectively). This causes a spinlock recursion BUG. Fix this by
renaming these functions to *_unlocked for the atomic context, and make the
original functions locking wrappers for use in the non-atomic context.
Signed-off-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When operating in 1-bit mode, SDAT1 is used as dedicated interrupt line.
However, the 8686 will only drive this line when the ECSI bit is set in
the CCCR_IF register.
Thanks to Alagu Sankar for pointing me in the right direction.
Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Alagu Sankar <alagusankar@embwise.com> Cc: Volker Ernst <volker.ernst@txtr.com> Cc: Dan Williams <dcbw@redhat.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Holger Schurig <hs4233@mail.mn-solutions.de> Cc: Bing Zhao <bzhao@marvell.com> Cc: libertas-dev@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: linux-mmc@vger.kernel.org Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ACPI spec tells us that the firmware will reenable SCI_EN on resume.
Reality disagrees in some cases. The ACPI spec tells us that the only way
to set SCI_EN is via an SMM call.
https://bugzilla.kernel.org/show_bug.cgi?id=13745 shows us that doing so
may break machines. Tracing the ACPI calls made by Windows shows that it
unconditionally sets SCI_EN on resume with a direct register write, and
therefore the overwhelming probability is that everything is fine with
this behaviour.
Signed-off-by: Matthew Garrett <mjg@redhat.com> Tested-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com> Cc: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It turns out that there is a bit in the _CST for Intel FFH C3
that tells the OS if we should be checking BM_STS or not.
Linux has been unconditionally checking BM_STS.
If the chip-set is configured to enable BM_STS,
it can retard or completely prevent entry into
deep C-states -- as illustrated by turbostat:
ref: Intel Processor Vendor-Specific ACPI Interface Specification
table 4 "_CST FFH GAS Field Encoding"
Bit 1: Set to 1 if OSPM should use Bus Master avoidance for this C-state
https://bugzilla.kernel.org/show_bug.cgi?id=15886
Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Save/restore MISC_ENABLE register on suspend/resume.
This fixes OOPS (invalid opcode) on resume from STR on Asus P4P800-VM,
which wakes up with MWAIT disabled.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org> Tested-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This code has been shamelessly stolen from XFS at the suggestion
of Christoph Hellwig. I've not added support for cached ACLs so
far... watch for that in a later patch, although this is designed
in such a way that they should be easy to add.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
futex_find_get_task is currently used (through lookup_pi_state) from two
contexts, futex_requeue and futex_lock_pi_atomic. None of the paths
looks it needs the credentials check, though. Different (e)uids
shouldn't matter at all because the only thing that is important for
shared futex is the accessibility of the shared memory.
The credentail check results in glibc assert failure or process hang (if
glibc is compiled without assert support) for shared robust pthread
mutex with priority inheritance if a process tries to lock already held
lock owned by a process with a different euid:
The problem is that futex_lock_pi_atomic which is called when we try to
lock already held lock checks the current holder (tid is stored in the
futex value) to get the PI state. It uses lookup_pi_state which in turn
gets task struct from futex_find_get_task. ESRCH is returned either
when the task is not found or if credentials check fails.
futex_lock_pi_atomic simply returns if it gets ESRCH. glibc code,
however, doesn't expect that robust lock returns with ESRCH because it
should get either success or owner died.
Signed-off-by: Michal Hocko <mhocko@suse.cz> Acked-by: Darren Hart <dvhltc@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Nick Piggin <npiggin@suse.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Marcelo Tosatti [Fri, 28 May 2010 12:44:59 +0000 (09:44 -0300)]
KVM: MMU: invalidate and flush on spte small->large page size change
Always invalidate spte and flush TLBs when changing page size, to make
sure different sized translations for the same address are never cached
in a CPU's TLB.
Currently the only case where this occurs is when a non-leaf spte pointer is
overwritten by a leaf, large spte entry. This can happen after dirty
logging is disabled on a memslot, for example.
Noticed by Andrea.
KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
(cherry picked from commit 3be2264be3c00865116f997dc53ebcc90fe7fc4b)
Joerg Roedel [Mon, 17 May 2010 12:43:35 +0000 (14:43 +0200)]
KVM: SVM: Implement workaround for Erratum 383
This patch implements a workaround for AMD erratum 383 into
KVM. Without this erratum fix it is possible for a guest to
kill the host machine. This patch implements the suggested
workaround for hypervisors which will be published by the
next revision guide update.
Joerg Roedel [Mon, 17 May 2010 12:43:34 +0000 (14:43 +0200)]
KVM: SVM: Handle MCEs early in the vmexit process
This patch moves handling of the MC vmexits to an earlier
point in the vmexit. The handle_exit function is too late
because the vcpu might alreadry have changed its physical
cpu.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
(cherry picked from commit fe5913e4e1700cbfc337f4b1da9ddb26f6a55586)
Avi Kivity [Thu, 27 May 2010 11:35:58 +0000 (14:35 +0300)]
KVM: MMU: Remove user access when allowing kernel access to gpte.w=0 page
If cr0.wp=0, we have to allow the guest kernel access to a page with pte.w=0.
We do that by setting spte.w=1, since the host cr0.wp must remain set so the
host can write protect pages. Once we allow write access, we must remove
user access otherwise we mistakenly allow the user to write the page.
Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
(cherry picked from commit 69325a122580d3a7b26589e8efdd6663001c3297)
Add a new ext4 state to tell us when a file has been newly created; use
that state in ext4_sync_file in no-journal mode to tell us when we need
to sync the parent directory as well as the inode and data itself. This
fixes a problem in which a panic or power failure may lose the entire
file even when using fsync, since the parent directory entry is lost.
If i_data_sem was internally dropped due to transaction restart, it is
necessary to restart path look-up because extents tree was possibly
modified by ext4_get_block().
https://bugzilla.kernel.org/show_bug.cgi?id=15827
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dimitry Monakhov discovered an edge case where it was possible for the
EXT4_EOFBLOCKS_FL flag could get cleared unnecessarily. This is true;
I have a test case that can be exercised via downloading and
decompressing the file:
However, triggering it in real life is highly unlikely since it
requires an extremely fragmented sparse file with a hole in exactly
the right place in the extent tree. (It actually took quite a bit of
work to generate this test case.) Still, it's nice to get even
extreme corner cases to be correct, so this patch makes sure that we
don't clear the EXT4_EOFBLOCKS_FL incorrectly even in this corner
case.
If the EOFBLOCK_FL flag is set when it should not be and the inode is
zero length, then eh_entries is zero, and ex is NULL, so dereferencing
ex to print ex->ee_block causes a kernel OOPS in
ext4_ext_map_blocks().
On top of that, the error message which is printed isn't very helpful.
So we fix this by printing something more explanatory which doesn't
involve trying to print ex->ee_block.
At several places we modify EXT4_I(inode)->i_flags without holding
i_mutex (ext4_do_update_inode, ...). These modifications are racy and
we can lose updates to i_flags. So convert handling of i_flags to use
bitops which are atomic.
This adds a new field in ext4_group_info to cache the largest available
block range in a block group; and don't load the buddy pages until *after*
we've done a sanity check on the block group.
With large allocation requests (e.g., fallocate(), 8MiB) and relatively full
partitions, it's easy to have no block groups with a block extent large
enough to satisfy the input request length. This currently causes the loop
during cr == 0 in ext4_mb_regular_allocator() to load the buddy bitmap pages
for EVERY block group. That can be a lot of pages. The patch below allows
us to call ext4_mb_good_group() BEFORE we load the buddy pages (although we
have check again after we lock the block group).
Currently block/inode/dir counters initialized before journal was
recovered. In fact after journal recovery this info will probably
change. And freeblocks it critical for correct delalloc mode
accounting.
https://bugzilla.kernel.org/show_bug.cgi?id=15768
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
- Reorganize locking scheme to batch two atomic operation in to one.
This also allow us to state what healthy group must obey following rule
ext4_free_inodes_count(sb, gdp) == ext4_count_free(inode_bitmap, NUM);
- Fix possible undefined pointer dereference.
- Even if group descriptor stats aren't accessible we have to update
inode bitmaps.
- Move non-group members update out of group_lock.
Note: this commit has been observed to fix fs corruption problems
under heavy fs load
The extents code will sometimes zero out blocks and mark them as
initialized instead of splitting an extent into several smaller ones.
This optimization however, causes problems if the extent is beyond
i_size because fsck will complain if there are uninitialized blocks
after i_size as this can not be distinguished from an inode that has
an incorrect i_size field.
There was a bug reported on RHEL5 that a 10G dd on a 12G box
had a very, very slow sync after that.
At issue was the loop in write_cache_pages scanning all the way
to the end of the 10G file, even though the subsequent call
to mpage_da_submit_io would only actually write a smallish amt; then
we went back to the write_cache_pages loop ... wasting tons of time
in calling __mpage_da_writepage for thousands of pages we would
just revisit (many times) later.
Upstream it's not such a big issue for sys_sync because we get
to the loop with a much smaller nr_to_write, which limits the loop.
However, talking with Aneesh he realized that fsync upstream still
gets here with a very large nr_to_write and we face the same problem.
This patch makes mpage_add_bh_to_extent stop the loop after we've
accumulated 2048 pages, by setting mpd->io_done = 1; which ultimately
causes the write_cache_pages loop to break.
Repeating the test with a dirty_ratio of 80 (to leave something for
fsync to do), I don't see huge IO performance gains, but the reduction
in cpu usage is striking: 80% usage with stock, and 2% with the
below patch. Instrumenting the loop in write_cache_pages clearly
shows that we are wasting time here.
Eventually we need to change mpage_da_map_pages() also submit its I/O
to the block layer, subsuming mpage_da_submit_io(), and then change it
call ext4_get_blocks() multiple times.
Turn off issuance of discard requests if the device does
not support it - similar to the action we take for barriers.
This will save a little computation time if a non-discardable
device is mounted with -o discard, and also makes it obvious
that it's not doing what was asked at mount time ...
ext4_freeze() used jbd2_journal_lock_updates() which takes
the j_barrier mutex, and then returns to userspace. The
kernel does not like this:
================================================
[ BUG: lock held when returning to user space! ]
------------------------------------------------
lvcreate/1075 is leaving the kernel with locks still held!
1 lock held by lvcreate/1075:
#0: (&journal->j_barrier){+.+...}, at: [<ffffffff811c6214>]
jbd2_journal_lock_updates+0xe1/0xf0
Use vfs_check_frozen() added to ext4_journal_start_sb() and
ext4_force_commit() instead.
If groups_per_flex < 2, sbi->s_flex_groups[] doesn't get filled out,
and every other access to this first tests s_log_groups_per_flex;
same thing needs to happen in resize or we'll wander off into
a null pointer when doing an online resize of the file system.
Thanks to Christoph Biedl, who came up with the trivial testcase:
I have an x86_64 kernel with i386 userspace. e4defrag fails on the
EXT4_IOC_MOVE_EXT ioctl because it is not wired up for the compat
case. It seems that struct move_extent is compat save, only types
with fixed widths are used:
{
__u32 reserved; /* should be zero */
__u32 donor_fd; /* donor file descriptor */
__u64 orig_start; /* logical start offset in block for orig */
__u64 donor_start; /* logical start offset in block for donor */
__u64 len; /* block length to be moved */
__u64 moved_len; /* moved block length */
};
Lets just wire up EXT4_IOC_MOVE_EXT for the compat case.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> CC: Akira Fujita <a-fujita@rs.jp.nec.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When EIO occurs after bio is submitted, there is no memory free
operation for bio, which results in memory leakage. And there is also
no check against bio_alloc() for bio.
Calls to ext4_get_inode_loc() returns with a reference to a buffer
head in iloc->bh. The callers of this function in ext4_write_inode()
when in no journal mode and in ext4_xattr_fiemap() don't release the
buffer head after using it.
In the no-journal case, ext4_write_inode() will fetch the bh and call
sync_dirty_buffer() on it. However, if the bh has already been
written and the bh reclaimed for some other purpose, AND if the inode
is the only one in the inode table block in use, then
ext4_get_inode_loc() will not read the inode table block from disk,
but as an optimization, fill the block with zero's assuming that its
caller will copy in the on-disk version of the inode. This is not
done by ext4_write_inode(), so the contents of the inode can simply
get lost. The fix is to use __ext4_get_inode_loc() with in_mem set to
0, instead of ext4_get_inode_loc(). Long term the API needs to be
fixed so it's obvious why latter is not safe.
When used_dirs was introduced for the flex_groups struct, it looks
like the accounting was not put into place properly, in some places
manipulating free_inodes rather than used_dirs.
If EXT4_IOC_MOVE_EXT ioctl is called with NULL donor_fd, fget() in
ext4_ioctl() gets inappropriate file structure for donor; so we need
to do this check earlier, before calling double_down_write_data_sem().
If the leaf node has 2 extent space or fewer and EXT4_IOC_MOVE_EXT
ioctl is called with the file offset where after the 2nd extent
covers, mext_insert_across_blocks() always tries to insert extent into
the first extent. As a result, the file gets corrupted because of
wrong extent order. The patch fixes this problem.
The callers of ext4_check_dir_entry() usually pass in the "file
offset" (ext4_readdir, htree_dirblock_to_tree, search_dirblock,
ext4_dx_find_entry, empty_dir), but a few callers (add_dirent_to_buf,
ext4_delete_entry) only pass in the buffer offset.
To accomodate those last two (which would be hard to fix otherwise),
this patch changes ext4_check_dir_entry() to print the physical block
number and the relative offset as well as the passed-in offset.
In case of truncate errors we explicitly remove inode from in-core
orphan list via orphan_del(NULL, inode) without modifying the on-disk list.
But later on, the same inode may be inserted in the orphan list again
which will result the on-disk linked list getting corrupted. If inode
i_dtime contains valid value, then skip on-disk list modification.
Set i_nlink to zero for temporary inode from very beginning.
otherwise we may fail to start new journal handle and this
inode will be unreferenced but with i_nlink == 1
Since we hold inode reference it can not be pruned.
The ext4 multiblock allocator decides whether to use group or file
preallocation based on the file size. When the file size reaches
s_mb_stream_request (default is 16 blocks), it changes to use a
file-specific preallocation. This is cool, but it has a tiny problem.
See a simple script:
mkfs.ext4 -b 1024 /dev/sda8 1000000
mount -t ext4 -o nodelalloc /dev/sda8 /mnt/ext4
for((i=0;i<5;i++))
do
cat /mnt/4096>>/mnt/ext4/a #4096 is a file with 4096 characters.
cat /mnt/4096>>/mnt/ext4/b
done
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
And you get
BLOCKS:
(0-14):8705-8719, (15):2356, (16-19):8465-8468
So there are 3 extents, a bit strange for the lonely 15th logical
block. As we write to the 16 blocks, we choose file preallocation in
ext4_mb_group_or_file, but in ext4_mb_normalize_request, we meet with
the 16*1024 range, so no preallocation will be carried. file b then
reserves the space after '2356', so when when write 16, we start from
another part.
This patch just change the check in ext4_mb_group_or_file, so
that for the lonely 15 we will still use group preallocation.
After the patch, we will get:
debuge4fs -R 'stat a' /dev/sda8|grep BLOCKS -A 1
BLOCKS:
(0-15):8705-8720, (16-19):8465-8468
Looks more sane. Thanks.
Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
fallocate() may potentially instantiate blocks past EOF, depending
on the flags used when it is called.
e2fsck currently has a test for blocks past i_size, and it
sometimes trips up - noticeably on xfstests 013 which runs fsstress.
This patch from Jiayang does fix it up - it (along with
e2fsprogs updates and other patches recently from Aneesh) has
survived many fsstress runs in a row.
Calls to ext4_handle_dirty_metadata should only pass in an inode
pointer for inode-specific metadata, and not for shared metadata
blocks such as inode table blocks, block group descriptors, the
superblock, etc.
The BUG_ON can get tripped when updating a special device (such as a
block device) that is opened (so that i_mapping is set in
fs/block_dev.c) and the file system is mounted in no journal mode.
At several places we modify EXT4_I(inode)->i_state without holding
i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage,
ext4_do_update_inode, ...). These modifications are racy and we can
lose updates to i_state. So convert handling of i_state to use bitops
which are atomic.
Cc: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
We should update reserve space if it is delalloc buffer
and that is indicated by EXT4_GET_BLOCKS_DELALLOC_RESERVE flag.
So use EXT4_GET_BLOCKS_DELALLOC_RESERVE in place of
EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE
[ Stable note: This fixes a corruption cuased by the following
reproduction case:
When we fallocate a region of the file which we had recently written,
and which is still in the page cache marked as delayed allocated blocks
we need to make sure we don't do the quota update on writepage path.
This is because the needed quota updated would have already be done
by fallocate.
In the past, ext4_calc_metadata_amount(), and its sub-functions
ext4_ext_calc_metadata_amount() and ext4_indirect_calc_metadata_amount()
badly over-estimated the number of metadata blocks that might be
required for delayed allocation blocks. This didn't matter as much
when functions which managed the reserved metadata blocks were more
aggressive about dropping reserved metadata blocks as delayed
allocation blocks were written, but unfortunately they were too
aggressive. This was fixed in commit 0637c6f, but as a result the
over-estimation by ext4_calc_metadata_amount() would lead to reserving
2-3 times the number of pending delayed allocation blocks as
potentially required metadata blocks. So if there are 1 megabytes of
blocks which have been not yet been allocation, up to 3 megabytes of
space would get reserved out of the user's quota and from the file
system free space pool until all of the inode's data blocks have been
allocated.
This commit addresses this problem by much more accurately estimating
the number of metadata blocks that will be required. It will still
somewhat over-estimate the number of blocks needed, since it must make
a worst case estimate not knowing which physical blocks will be
needed, but it is much more accurate than before.
As reported in Kernel Bugzilla #14936, commit d21cd8f triggered a BUG
in the function ext4_da_update_reserve_space() found in
fs/ext4/inode.c. The root cause of this BUG() was caused by the fact
that ext4_calc_metadata_amount() can severely over-estimate how many
metadata blocks will be needed, especially when using direct
block-mapped files.
In addition, it can also badly *under* estimate how much space is
needed, since ext4_calc_metadata_amount() assumes that the blocks are
contiguous, and this is not always true. If the application is
writing blocks to a sparse file, the number of metadata blocks
necessary can be severly underestimated by the functions
ext4_da_reserve_space(), ext4_da_update_reserve_space() and
ext4_da_release_space(). This was the cause of the dq_claim_space
reports found on kerneloops.org.
Unfortunately, doing this right means that we need to massively
over-estimate the amount of free space needed. So in some cases we
may need to force the inode to be written to disk asynchronously in
to avoid spurious quota failures.
This fixes a bug (found by Curt Wohlgemuth) in which new blocks
returned from an extent created with ext4_ext_zeroout() can have dirty
metadata still associated with them.
When ext4_da_writepages increases the nr_to_write in writeback_control
then it must always re-base the return value. Originally there was a
(misguided) attempt prevent wbc.nr_to_write from going negative. In
fact, it's necessary to allow nr_to_write to be negative so that
wb_writeback() can correctly calculate how many pages were actually
written.
b_entry_name and buffer are initially NULL, are initialized within a loop
to the result of calling kmalloc, and are freed at the bottom of this loop.
The loop contains gotos to cleanup, which also frees b_entry_name and
buffer. Some of these gotos are before the reinitializations of
b_entry_name and buffer. To maintain the invariant that b_entry_name and
buffer are NULL at the top of the loop, and thus acceptable arguments to
kfree, these variables are now set to NULL after the kfrees.
This seems to be the simplest solution. A more complicated solution
would be to introduce more labels in the error handling code at the end of
the function.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
This is a bit complicated because we are trying to optimize when we
send barriers to the fs data disk. We could just throw in an extra
barrier to the data disk whenever we send a barrier to the journal
disk, but that's not always strictly necessary.
We only need to send a barrier during a commit when there are data
blocks which are must be written out due to an inode written in
ordered mode, or if fsync() depends on the commit to force data blocks
to disk. Finally, before we drop transactions from the beginning of
the journal during a checkpoint operation, we need to guarantee that
any blocks that were flushed out to the data disk are firmly on the
rust platter before we drop the transaction from the journal.
Thanks to Oleg Drokin for pointing out this flaw in ext3/ext4.
This patch fixes the Kernel BZ #14286. When the address of an extent
corresponding to a valid block is corrupted, a -EIO should be reported
instead of a BUG(). This situation should not normally not occur
except in the case of a corrupted filesystem. If however it does,
then the system should not panic directly but depending on the mount
time options appropriate action should be taken. If the mount options
so permit, the I/O should be gracefully aborted by returning a -EIO.
We have to delay vfs_dq_claim_space() until allocation context destruction.
Currently we have following call-trace:
ext4_mb_new_blocks()
/* task is already holding ac->alloc_semp */
->ext4_mb_mark_diskspace_used
->vfs_dq_claim_space() /* acquire dqptr_sem here. Possible deadlock */
->ext4_mb_release_context() /* drop ac->alloc_semp here */
Let's move quota claiming to ext4_da_update_reserve_space()
=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.32-rc7 #18
-------------------------------------------------------
write-truncate-/3465 is trying to acquire lock:
(&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0
but task is already holding lock:
(&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
struct ethtool_rxnfc was originally defined in 2.6.27 for the
ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
fields. It was then extended in 2.6.30 to support various additional
commands. These commands should have been defined to use a new
structure, but it is too late to change that now.
Since user-space may still be using the old structure definition
for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
additional fields, only copy the originally defined fields to and
from user-space.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When an attempt is made to read the interface strings of the Artisman
Watchdog USB dongle (idVendor:idProduct 04b4:0526) an error is written
to the dmesg log (uhci_result_common: failed with status 440000) and the
dongle resets itself, resulting in a disconnect/reconnect loop.
Adding the dongle to the list of devices in quirks.c, with the same
quirk Alan Stern's previous patch for the Saitek Cyborg Gold 3D
joystick, stops the device from resetting and allows it to be used with
no problems.
Signed-off-by: Paul Mortier <mortier@btinternet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This patch (as1403) is a partial reversion of an earlier change
(commit 5f677f1d45b2bf08085bbba7394392dfa586fa8e "USB: fix remote
wakeup settings during system sleep"). After hearing from a user, I
realized that remote wakeup should be enabled during system sleep
whenever userspace allows it, and not only if a driver requests it
too.
Indeed, there could be a device with no driver, that does nothing but
generate a wakeup request when the user presses a button. Such a
device should be allowed to do its job.
The problem fixed by the earlier patch -- device generating a wakeup
request for no reason, causing system suspend to abort -- was also
addressed by a later patch ("USB: don't enable remote wakeup by
default", accepted but not yet merged into mainline). The device
won't be able to generate the bogus wakeup requests because it will be
disabled for remote wakeup by default. Hence this reversion will not
re-introduce any old problems.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
hpet_disable is called unconditionally on machine reboot if hpet support
is compiled in the kernel.
hpet_disable only checks if the machine is hpet capable but doesn't make
sure that hpet has been initialized.
[ tglx: Made it a one liner and removed the redundant hpet_address check ]
Signed-off-by: Bin Yang <bin.yang@marvell.com> Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
RealView boards with certain revisions of the L220 cache controller (ARM11*
processors only) may have issues (hardware deadlock) with the recent changes to
the mb() barrier implementation (DSB followed by an L2 cache sync). The patch
redefines the RealView ARM11MPCore mandatory barriers without the outer_sync()
call.
Tested-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The Nokia RX51 board code (arch/arm/mach-omap2/board-rx51-peripherals.c)
defines a key map for the matrix keypad keyboard. The hardware seems to
use all of the 8 rows and 8 columns of the keypad, although not all
possible locations are used.
The TWL4030 supports keypads with at most 8 rows and 8 columns. Most keys
are defined with a row and column number between 0 and 7, except
which represent keycodes that should be emitted when entire row is
connected to the ground. since the driver handles this case as if we
had an extra column in the key matrix. Unfortunately we do not allocate
enough space and end up owerwriting some random memory.
Gigabyte "Spring Peak" notebook indicates wrong chassis-type, tripping up
i8042 and breaking the touchpad. Add this model to i8042_dmi_noloop_table[]
to resolve.
Sumeet Lahorani <sumeet.lahorani@oracle.com> reported that the IPoIB
child entries are world-writable; however we don't want ordinary users
to be able to create and destroy child interfaces, so fix them to be
writable only by root.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Found one x2apic system kexec loop test failed
when CONFIG_NMI_WATCHDOG=y (old) or CONFIG_LOCKUP_DETECTOR=y (current tip)
first kernel can kexec second kernel, but second kernel can not kexec third one.
it can be duplicated on another system with BIOS preenabled x2apic.
First kernel can not kexec second kernel.
It turns out, when kernel boot with pre-enabled x2apic, it will not execute
disable_local_APIC on shutdown path.
when init_apic_mappings() is called in setup_arch, it will skip setting of
apic_phys when x2apic_mode is set. ( x2apic_mode is much early check_x2apic())
Then later, disable_local_APIC() will bail out early because !apic_phys.
So check !x2apic_mode in x2apic_mode in disable_local_APIC with !apic_phys.
another solution could be updating init_apic_mappings() to set apic_phys even
for preenabled x2apic system. Actually even for x2apic system, that lapic
address is mapped already in early stage.
BTW: is there any x2apic preenabled system with apicid of boot cpu > 255?
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C3EB22B.3000701@kernel.org> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
System will crash sooner or later once the memory with the code of the
s3c-sdhci.ko module is reused for something else. I really have no idea
how the lack of remove function went unnoticed into the mainline code.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
On a 32-bit machine, info.rule_cnt >= 0x40000000 leads to integer
overflow and the buffer may be smaller than needed. Since
ETHTOOL_GRXCLSRLALL is unprivileged, this can presumably be used for at
least denial of service.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ds1307 driver misreads the ds1388 registers when checking for 12 or 24
hour mode. Instead of checking the hour register it reads the minute
register. Therefore the driver thinks minutes >= 40 has the 12HR bit set
and resets the minute register by zeroing the high bits. This results in
minutes are reset to 0-9, jumping back in time 40 or 50 minutes. The time
jump is also written back to the RTC.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Wan ZongShun <mcuos.com@gmail.com> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Paul Gortmaker <p_gortmaker@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
It can happen that there are no packets in queue while calling
tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns
NULL and that gets deref'ed to get sacked into a local var.
There is no work to do if no packets are outstanding so we just
exit early.
This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out
guard to make joining diff nicer).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When configuring DMVPN (GRE + openNHRP) and a GRE remote
address is configured a kernel Oops is observed. The
obserseved Oops is caused by a NULL header_ops pointer
(neigh->dev->header_ops) in neigh_update_hhs() when
is executed. The dev associated with the NULL header_ops is
the GRE interface. This patch guards against the
possibility that header_ops is NULL.
This Oops was first observed in kernel version 2.6.26.8.
Signed-off-by: Doug Kehn <rdkehn@yahoo.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>