Yan, Zheng [Wed, 26 May 2010 15:20:30 +0000 (11:20 -0400)]
Btrfs: Fix block generation verification race
After the path is released, the generation number got from block
pointer is no long valid. The race may cause disk corruption, because
verify_parent_transid() calls clear_extent_buffer_uptodate() when
generation numbers mismatch.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 26 May 2010 15:04:10 +0000 (11:04 -0400)]
Btrfs: fix preallocation and nodatacow checks in O_DIRECT
The O_DIRECT code wasn't checking for multiple references
on preallocated or nodatacow extents. This means it
wasn't honoring snapshots properly.
The fix here is to add an explicit check for multiple references
This also fixes the math for selecting the correct disk block,
making sure not to go past the end of the extent.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 26 May 2010 15:02:00 +0000 (11:02 -0400)]
Btrfs: avoid ENOSPC errors in btrfs_dirty_inode
btrfs_dirty_inode tries to sneak in without much waiting or
space reservation, mostly for performance reasons. This
usually works well but can cause problems when there are
many many writers.
When btrfs_update_inode fails with ENOSPC, we fallback
to a slower btrfs_start_transaction call that will reserve
some space.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 26 May 2010 14:59:53 +0000 (10:59 -0400)]
Btrfs: move O_DIRECT space reservation to btrfs_direct_IO
This moves the delalloc space reservation done for O_DIRECT
into btrfs_direct_IO. This way we don't leak reserved space
if the generic O_DIRECT write code errors out before it
calls into btrfs_direct_IO.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Wed, 26 May 2010 00:56:50 +0000 (20:56 -0400)]
Btrfs: rework O_DIRECT enospc handling
This changes O_DIRECT write code to mark extents as delalloc
while it is processing them. Yan Zheng has reworked the
enospc accounting based on tracking delalloc extents and
this makes it much easier to track enospc in the O_DIRECT code.
There are a few space cases with the O_DIRECT code though,
it only sets the EXTENT_DELALLOC bits, instead of doing
EXTENT_DELALLOC | EXTENT_DIRTY | EXTENT_UPTODATE, because
we don't want to mess with clearing the dirty and uptodate
bits when things go wrong. This is important because there
are no pages in the page cache, so any extent state structs
that we put in the tree won't get freed by releasepage. We have
to clear them ourselves as the DIO ends.
With this commit, we reserve space at in btrfs_file_aio_write,
and then as each btrfs_direct_IO call progresses it sets
EXTENT_DELALLOC on the range.
btrfs_get_blocks_direct is responsible for clearing the delalloc
at the same time it drops the extent lock.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 25 May 2010 13:48:28 +0000 (09:48 -0400)]
Btrfs: use async helpers for DIO write checksumming
The async helper threads offload crc work onto all the
CPUs, and make streaming writes much faster. This
changes the O_DIRECT write code to use them. The only
small complication was that we need to pass in the
logical offset in the file for each bio, because we can't
find it in the bio's pages.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Chris Mason [Tue, 25 May 2010 14:12:41 +0000 (10:12 -0400)]
Btrfs: don't walk around with task->state != TASK_RUNNING
Yan Zheng noticed two places we were doing a lot of work
without task->state set to TASK_RUNNING. This sets the state
properly after we get ready to sleep but decide not to.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 23 May 2010 15:07:21 +0000 (11:07 -0400)]
Btrfs: do aio_write instead of write
In order for AIO to work, we need to implement aio_write. This patch converts
our btrfs_file_write to btrfs_aio_write. I've tested this with xfstests and
nothing broke, and the AIO stuff magically started working. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 23 May 2010 15:00:55 +0000 (11:00 -0400)]
Btrfs: add basic DIO read/write support
This provides basic DIO support for reading and writing. It does not do the
work to recover from mismatching checksums, that will come later. A few design
changes have been made from Jim's code (sorry Jim!)
1) Use the generic direct-io code. Jim originally re-wrote all the generic DIO
code in order to account for all of BTRFS's oddities, but thanks to that work it
seems like the best bet is to just ignore compression and such and just opt to
fallback on buffered IO.
2) Fallback on buffered IO for compressed or inline extents. Jim's code did
it's own buffering to make dio with compressed extents work. Now we just
fallback onto normal buffered IO.
3) Use ordered extents for the writes so that all of the
lock_extent()
lookup_ordered()
type checks continue to work.
4) Do the lock_extent() lookup_ordered() loop in readpage so we don't race with
DIO writes.
I've tested this with fsx and everything works great. This patch depends on my
dio and filemap.c patches to work. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Normally the DIO code would put these into the same BIO's. The problem is we
need to know exactly what offset is associated with what BIO so we can do our
checksumming and unlocking properly, so putting them in the same BIO doesn't
work. So add another check where we submit the current BIO if the physical
blocks are not contigous OR the logical blocks are not contiguous.
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 23 May 2010 15:00:55 +0000 (11:00 -0400)]
direct-io: add a hook for the fs to provide its own submit_bio function
Because BTRFS can do RAID and such, we need our own submit hook so we can setup
the bio's in the correct fashion, and handle checksum errors properly. So there
are a few changes here
1) The submit_io hook. This is straightforward, just call this instead of
submit_bio.
2) Allow the fs to return -ENOTBLK for reads. Usually this has only worked for
writes, since writes can fallback onto buffered IO. But BTRFS needs the option
of falling back on buffered IO if it encounters a compressed extent, since we
need to read the entire extent in and decompress it. So if we get -ENOTBLK back
from get_block we'll return back and fallback on buffered just like the write
case.
I've tested these changes with fsx and everything seems to work. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Josef Bacik [Sun, 23 May 2010 15:00:54 +0000 (11:00 -0400)]
fs: allow short direct-io reads to be completed via buffered IO
This is similar to what already happens in the write case. If we have a short
read while doing O_DIRECT, instead of just returning, fallthrough and try to
read the rest via buffered IO. BTRFS needs this because if we encounter a
compressed or inline extent during DIO, we need to fallback on buffered. If the
extent is compressed we need to read the entire thing into memory and
de-compress it into the users pages. I have tested this with fsx and everything
works great. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan, Zheng [Sun, 16 May 2010 14:49:59 +0000 (10:49 -0400)]
Btrfs: Metadata ENOSPC handling for tree log
Previous patches make the allocater return -ENOSPC if there is no
unreserved free metadata space. This patch updates tree log code
and various other places to propagate/handle the ENOSPC error.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan, Zheng [Sun, 16 May 2010 14:48:47 +0000 (10:48 -0400)]
Btrfs: Update metadata reservation for delayed allocation
Introduce metadata reservation context for delayed allocation
and update various related functions.
This patch also introduces EXTENT_FIRST_DELALLOC control bit for
set/clear_extent_bit. It tells set/clear_bit_hook whether they
are processing the first extent_state with EXTENT_DELALLOC bit
set. This change is important if set/clear_extent_bit involves
multiple extent_state.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan, Zheng [Sun, 16 May 2010 14:46:25 +0000 (10:46 -0400)]
Btrfs: Introduce contexts for metadata reservation
Introducing metadata reseravtion contexts has two major advantages.
First, it makes metadata reseravtion more traceable. Second, it can
reclaim freed space and re-add them to the itself after transaction
committed.
Besides add btrfs_block_rsv structure and related helper functions,
This patch contains following changes:
Move code that decides if freed tree block should be pinned into
btrfs_free_tree_block().
Make space accounting more accurate, mainly for handling read only
block groups.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Yan, Zheng [Sun, 16 May 2010 14:46:24 +0000 (10:46 -0400)]
Btrfs: Link block groups of different raid types
The size of reserved space is stored in space_info. If block groups
of different raid types are linked to separate space_info, changing
allocation profile will corrupt reserved space accounting.
Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
rtnetlink: make SR-IOV VF interface symmetric
sctp: delete active ICMP proto unreachable timer when free transport
tcp: fix MD5 (RFC2385) support
Linus Torvalds [Sun, 16 May 2010 18:11:31 +0000 (11:11 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Oprofile: Fix Loongson irq handler
MIPS: N32: Use compat version for sys_ppoll.
MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
Wei Yongjun [Sun, 9 May 2010 16:56:07 +0000 (16:56 +0000)]
sctp: delete active ICMP proto unreachable timer when free transport
transport may be free before ICMP proto unreachable timer expire, so
we should delete active ICMP proto unreachable timer when transport
is going away.
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 16 May 2010 07:34:04 +0000 (00:34 -0700)]
tcp: fix MD5 (RFC2385) support
TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.
We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.
Fix is to disable preemption and BH.
Reported-by: Bhaskar Dutta <bhaskie@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wu Zhangjin [Thu, 6 May 2010 16:59:46 +0000 (00:59 +0800)]
MIPS: Oprofile: Fix Loongson irq handler
The interrupt enable bit for the performance counters is in the Control
Register $24, not in the counter register.
loongson2_perfcount_handler(), we need to use
Shane McDonald [Fri, 7 May 2010 05:26:57 +0000 (23:26 -0600)]
MIPS FPU emulator: allow Cause bits of FCSR to be writeable by ctc1
In the FPU emulator code of the MIPS, the Cause bits of the FCSR register
are not currently writeable by the ctc1 instruction. In odd corner cases,
this can cause problems. For example, a case existed where a divide-by-zero
exception was generated by the FPU, and the signal handler attempted to
restore the FPU registers to their state before the exception occurred. In
this particular setup, writing the old value to the FCSR register would
cause another divide-by-zero exception to occur immediately. The solution
is to change the ctc1 instruction emulator code to allow the Cause bits of
the FCSR register to be writeable. This is the behaviour of the hardware
that the code is emulating.
This problem was found by Shane McDonald, but the credit for the fix goes
to Kevin Kissell. In Kevin's words:
I submit that the bug is indeed in that ctc_op: case of the emulator. The
Cause bits (17:12) are supposed to be writable by that instruction, but the
CTC1 emulation won't let them be updated by the instruction. I think that
actually if you just completely removed lines 387-388 [...] things would
work a good deal better. At least, it would be a more accurate emulation of
the architecturally defined FPU. If I wanted to be really, really pedantic
(which I sometimes do), I'd also protect the reserved bits that aren't
necessarily writable.
Signed-off-by: Shane McDonald <mcdonald.shane@gmail.com>
To: anemo@mba.ocn.ne.jp
To: kevink@paralogos.com
To: sshtylyov@mvista.com
Patchwork: http://patchwork.linux-mips.org/patch/1205/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
---
Nicolas Ferre [Sat, 15 May 2010 16:32:31 +0000 (12:32 -0400)]
mmc: at91_mci: modify cache flush routines
As we were using an internal dma flushing routine, this patch changes to
the DMA API flush_kernel_dcache_page(). Driver is able to compile now.
[akpm@linux-foundation.org: flush_kernel_dcache_page() comes before kunmap_atomic()] Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 15 May 2010 16:03:15 +0000 (09:03 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
JFS: Free sbi memory in error path
fs/sysv: dereferencing ERR_PTR()
Fix double-free in logfs
Fix the regression created by "set S_DEAD on unlink()..." commit
Linus Torvalds [Sat, 15 May 2010 16:03:02 +0000 (09:03 -0700)]
Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf record: Add a fallback to the reference relocation symbol
Jan Blunck [Mon, 12 Apr 2010 23:44:08 +0000 (16:44 -0700)]
JFS: Free sbi memory in error path
I spotted the missing kfree() while removing the BKL.
[akpm@linux-foundation.org: avoid multiple returns so it doesn't happen again] Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Fri, 30 Apr 2010 21:17:09 +0000 (17:17 -0400)]
Fix the regression created by "set S_DEAD on unlink()..." commit
1) i_flags simply doesn't work for mount/unlink race prevention;
we may have many links to file and rm on one of those obviously
shouldn't prevent bind on top of another later on. To fix it
right way we need to mark _dentry_ as unsuitable for mounting
upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
i_mutex on the inode in question. Set it (with dont_mount(dentry))
in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
in namespace.c that used to check for S_DEAD. Setting S_DEAD
is still needed in places where we used to set it (for directories
getting killed), since we rely on it for readdir/rmdir race
prevention.
2) rename()/mount() protection has another bogosity - we unhash
the target before we'd checked that it's not a mountpoint. Fixed.
3) ancient bogosity in pivot_root() - we locked i_mutex on the
right directory, but checked S_DEAD on the different (and wrong)
one. Noticed and fixed.
Linus Torvalds [Sat, 15 May 2010 04:28:42 +0000 (21:28 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
ARM: 6126/1: ARM mpcore_wdt: fix build failure and other fixes
ARM: 6125/1: ARM TWD: move TWD registers to common header
ARM: 6110/1: Fix Thumb-2 kernel builds when UACCESS_WITH_MEMCPY is enabled
ARM: 6112/1: Use the Inner Shareable I-cache and BTB ops on ARMv7 SMP
ARM: 6111/1: Implement read/write for ownership in the ARMv6 DMA cache ops
ARM: 6106/1: Implement copy_to_user_page() for noMMU
ARM: 6105/1: Fix the __arm_ioremap_caller() definition in nommu.c
Hugh Dickins [Sat, 15 May 2010 02:44:10 +0000 (19:44 -0700)]
profile: fix stats and data leakage
If the kernel is large or the profiling step small, /proc/profile
leaks data and readprofile shows silly stats, until readprofile -r
has reset the buffer: clear the prof_buffer when it is vmalloc()ed.
H. Peter Anvin [Fri, 14 May 2010 20:55:57 +0000 (13:55 -0700)]
x86, mrst: Don't blindly access extended config space
Do not blindly access extended configuration space unless we actively
know we're on a Moorestown platform. The fixed-size BAR capability
lives in the extended configuration space, and thus is not applicable
if the configuration space isn't appropriately sized.
This fixes booting certain VMware configurations with CONFIG_MRST=y.
Moorestown will add a fake PCI-X 266 capability to advertise the
presence of extended configuration space.
Reported-and-tested-by: Petr Vandrovec <petr@vandrovec.name> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Jacob Pan <jacob.jun.pan@intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <AANLkTiltKUa3TrKR1M51eGw8FLNoQJSLT0k0_K5X3-OJ@mail.gmail.com>
Linus Torvalds [Fri, 14 May 2010 19:20:09 +0000 (12:20 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, k8: Fix build error when K8_NB is disabled
x86, amd: Check X86_FEATURE_OSVW bit before accessing OSVW MSRs
x86: Fix fake apicid to node mapping for numa emulation
The L3 cache index disable feature of AMD CPUs has to be disabled if the
kernel is running as guest on top of a hypervisor because northbridge
devices are not available to the guest. Currently, this fixes a boot
crash on top of Xen. In the future this will become an issue on KVM as
well.
Check if northbridge devices are present and do not enable the feature
if there are none.
[ hpa: backported to 2.6.34 ]
Signed-off-by: Frank Arnold <frank.arnold@amd.com>
LKML-Reference: <1271945222-5283-3-git-send-email-bp@amd64.org> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>
K8_NB depends on PCI and when the last is disabled (allnoconfig) we fail
at the final linking stage due to missing exported num_k8_northbridges.
Add a header stub for that.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100503183036.GJ26107@aftab> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: <stable@kernel.org>
Linus Torvalds [Fri, 14 May 2010 18:49:42 +0000 (11:49 -0700)]
Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: don't leak user struct on inotify release
inotify: race use after free/double free in inotify inode marks
inotify: clean up the inotify_add_watch out path
Inotify: undefined reference to `anon_inode_getfd'
Manual merge to remove duplicate "select ANON_INODES" from Kconfig file
Sergei Shtylyov [Thu, 13 May 2010 18:51:51 +0000 (22:51 +0400)]
DA830: fix USB 2.0 clock entry
DA8xx OHCI driver fails to load due to failing clk_get() call for the USB 2.0
clock. Arrange matching USB 2.0 clock by the clock name instead of the device.
(Adding another CLK() entry for "ohci.0" device won't do -- in the future I'll
also have to enable USB 2.0 clock to configure CPPI 4.1 module, in which case
I won't have any device at all.)
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Pavel Emelyanov [Wed, 12 May 2010 22:34:07 +0000 (15:34 -0700)]
inotify: don't leak user struct on inotify release
inotify_new_group() receives a get_uid-ed user_struct and saves the
reference on group->inotify_data.user. The problem is that free_uid() is
never called on it.
Issue seem to be introduced by 63c882a0 (inotify: reimplement inotify
using fsnotify) after 2.6.30.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Eric Paris <eparis@parisplace.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com>
Eric Paris [Tue, 11 May 2010 21:17:40 +0000 (17:17 -0400)]
inotify: race use after free/double free in inotify inode marks
There is a race in the inotify add/rm watch code. A task can find and
remove a mark which doesn't have all of it's references. This can
result in a use after free/double free situation.
Task A Task B
------------ -----------
inotify_new_watch()
allocate a mark (refcnt == 1)
add it to the idr
inotify_rm_watch()
inotify_remove_from_idr()
fsnotify_put_mark()
refcnt hits 0, free
take reference because we are on idr
[at this point it is a use after free]
[time goes on]
refcnt may hit 0 again, double free
The fix is to take the reference BEFORE the object can be found in the
idr.
Signed-off-by: Eric Paris <eparis@redhat.com> Cc: <stable@kernel.org>
Linus Torvalds [Fri, 14 May 2010 14:29:29 +0000 (07:29 -0700)]
Merge branch 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze
* 'for-linus' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Fix module loading on system with WB cache
microblaze: export assembly functions used by modules
microblaze: Remove powerpc code from Microblaze port
microblaze: Remove compilation warnings in cache macro
microblaze: export assembly functions used by modules
microblaze: fix get_user/put_user side-effects
microblaze: re-enable interrupts before calling schedule
Redirecting directly to lsm, here's the patch discussed on lkml:
http://lkml.org/lkml/2010/4/22/219
The mmap_min_addr value is useful information for an admin to see without
being root ("is my system vulnerable to kernel NULL pointer attacks?") and
its setting is trivially easy for an attacker to determine by calling
mmap() in PAGE_SIZE increments starting at 0, so trying to keep it private
has no value.
Only require CAP_SYS_RAWIO if changing the value, not reading it.
Comment from Serge :
Me, I like to write my passwords with light blue pen on dark blue
paper, pasted on my window - if you're going to get my password, you're
gonna get a headache.
Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
(cherry picked from commit 822cceec7248013821d655545ea45d1c6a9d15b3)
Linus Torvalds [Thu, 13 May 2010 21:36:19 +0000 (14:36 -0700)]
Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Keep index within boundaries in kvmppc_44x_emul_tlbwe()
KVM: VMX: blocked-by-sti must not defer NMI injections
KVM: x86: Call vcpu_load and vcpu_put in cpuid_update
KVM: SVM: Fix wrong intercept masks on 32 bit
KVM: convert ioapic lock to spinlock
serial: imx.c: fix CTS trigger level lower to avoid lost chars
The imx CTS trigger level is left at its reset value that is 32
chars. Since the RX FIFO has 32 entries, when CTS is raised, the
FIFO already is full. However, some serial port devices first empty
their TX FIFO before stopping when CTS is raised, resulting in lost
chars.
This patch sets the trigger level lower so that other chars arrive
after CTS is raised, there is still room for 16 of them.
Alan Cox [Tue, 4 May 2010 19:42:36 +0000 (20:42 +0100)]
tty: Fix unbalanced BKL handling in error path
Arnd noted:
After the "retry_open:" label, we first get the tty_mutex
and then the BKL. However a the end of tty_open, we jump
back to retry_open with the BKL still held. If we run into
this case, the tty_open function will be left with the BKL
still held.
Jan Kara [Thu, 13 May 2010 10:52:57 +0000 (12:52 +0200)]
vfs: Fix O_NOFOLLOW behavior for paths with trailing slashes
According to specification
mkdir d; ln -s d a; open("a/", O_NOFOLLOW | O_RDONLY)
should return success but currently it returns ELOOP. This is a
regression caused by path lookup cleanup patch series.
Fix the code to ignore O_NOFOLLOW in case the provided path has trailing
slashes.
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Reported-by: Marius Tolzmann <tolzmann@molgen.mpg.de> Acked-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 13 May 2010 14:35:26 +0000 (07:35 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: ice1724 - Fix ESI Maya44 capture source control
ALSA: pcm - Use pgprot_noncached() for MIPS non-coherent archs
ALSA: virtuoso: fix Xonar D1/DX front panel microphone
ALSA: hda - Add hp-dv4 model for IDT 92HD71bx
ALSA: hda - Fix mute-LED GPIO pin for HP dv series
ALSA: hda: Fix 0 dB for Lenovo models using Conexant CX20549 (Venice)
Linus Torvalds [Thu, 13 May 2010 14:28:43 +0000 (07:28 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ad7877 - keep dma rx buffers in seperate cache lines
Input: psmouse - reset all types of mice before reconnecting
Input: elantech - use all 3 bytes when checking version
Input: iforce - fix Guillemot Jet Leader 3D entry
Input: iforce - add Guillemot Jet Leader Force Feedback
Mark Brown [Fri, 2 Apr 2010 12:08:39 +0000 (13:08 +0100)]
mfd: Clean up after WM83xx AUXADC interrupt if it arrives late
In certain circumstances, especially under heavy load, the AUXADC
completion interrupt may be detected after we've timed out waiting for
it. That conversion would still succeed but the next conversion will
see the completion that was signalled by the interrupt for the previous
conversion and therefore not wait for the AUXADC conversion to run,
causing it to report failure.
Provide a simple, non-invasive cleanup by using try_wait_for_completion()
to ensure that the completion is not signalled before we wait. Since
the AUXADC is run within a mutex we know there can only have been at
most one AUXADC interrupt outstanding. A more involved change should
follow for the next merge window.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Michal Simek [Thu, 13 May 2010 08:55:47 +0000 (10:55 +0200)]
microblaze: Remove compilation warnings in cache macro
CC arch/microblaze/kernel/cpu/cache.o
arch/microblaze/kernel/cpu/cache.c: In function '__invalidate_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:398: warning: ISO C90 forbids mixed declarations and code
arch/microblaze/kernel/cpu/cache.c: In function '__flush_dcache_range_wb':
arch/microblaze/kernel/cpu/cache.c:509: warning: ISO C90 forbids mixed declara
With dma based spi transmission, data corruption is observed
occasionally. With dma buffers located right next to msg and
xfer fields, cache lines correctly flushed in preparation for
dma usage may be polluted again when writing to fields in the
same cache line.
Make sure cache fields used with dma do not share cache lines
with fields changed during dma handling. As both fields are part
of a struct that is allocated via kzalloc, thus cache aligned,
moving the fields to the 1st position and insert padding for
alignment does the job.
Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Oliver Schneidewind <osw@emlix.com> Signed-off-by: Johannes Weiner <jw@emlix.com> Acked-by: Mike Frysinger <vapier@gentoo.org>
[dtor@mail.ru - changed to use ___cacheline_aligned as suggested
by akpm] Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Dmitry Torokhov [Thu, 13 May 2010 07:42:23 +0000 (00:42 -0700)]
Input: psmouse - reset all types of mice before reconnecting
Synaptics hardware requires resetting device after suspend to ram
in order for the device to be operational. The reset lives in
synaptics-specific reconnect handler, but it is not being invoked
if synaptics support is disabled and the device is handled as a
standard PS/2 device (bare or IntelliMouse protocol).
Let's add reset into generic reconnect handler as well.
The Microblaze implementations of get_user() and (MMU) put_user() evaluate
the address argument more than once. This causes unexpected side-effects for
invocations that include increment operators, i.e. get_user(foo, bar++).
This patch also removes the distinction between MMU and noMMU put_user().
Without the patch:
$ echo 1234567890 > /proc/sys/kernel/core_pattern
$ cat /proc/sys/kernel/core_pattern
12345
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
perf record: Add a fallback to the reference relocation symbol
Usually "_text" is enough, but I received reports that its not always
available, so fallback to "_stext" for the symbol we use to check if we
need to apply any relocation to all the symbols in the kernel symtab,
for when, for instance, kexec is being used.
Reported-by: Darren Hart <dvhltc@us.ibm.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Kiszka [Tue, 11 May 2010 13:16:46 +0000 (15:16 +0200)]
KVM: VMX: blocked-by-sti must not defer NMI injections
As the processor may not consider GUEST_INTR_STATE_STI as a reason for
blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW
when we asked for it. But as we consider this state as NMI-blocking, we
can run into an endless loop.
Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is
active (originally suggested by Gleb). Intel confirmed that this is
safe, the processor will never complain about NMI injection in this
state.
Joerg Roedel [Wed, 5 May 2010 14:04:43 +0000 (16:04 +0200)]
KVM: SVM: Fix wrong intercept masks on 32 bit
This patch makes KVM on 32 bit SVM working again by
correcting the masks used for iret interception. With the
wrong masks the upper 32 bits of the intercepts are masked
out which leaves vmrun unintercepted. This is not legal on
svm and the vmrun fails.
Bug was introduced by commits 95ba827313 and 3cfc3092.
Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Linus Torvalds [Thu, 13 May 2010 01:48:26 +0000 (18:48 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/perf_event: Fix oops due to perf_event_do_pending call
powerpc/swiotlb: Fix off by one in determining boundary of which ops to use
Linus Torvalds [Thu, 13 May 2010 01:47:55 +0000 (18:47 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] correct address of _stext with CONFIG_SHARED_KERNEL=y
[S390] ptrace: fix return value of do_syscall_trace_enter()
[S390] dasd: fix race between tasklet and dasd_sleep_on
Linus Torvalds [Thu, 13 May 2010 01:47:29 +0000 (18:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: preserve seq # on requeued messages after transient transport errors
ceph: fix cap removal races
ceph: zero unused message header, footer fields
ceph: fix locking for waking session requests after reconnect
ceph: resubmit requests on pg mapping change (not just primary change)
ceph: fix open file counting on snapped inodes when mds returns no caps
ceph: unregister osd request on failure
ceph: don't use writeback_control in writepages completion
ceph: unregister bdi before kill_anon_super releases device name
Linus Torvalds [Thu, 13 May 2010 01:39:45 +0000 (18:39 -0700)]
Revert "PCI: update bridge resources to get more big ranges in PCI assign unssigned"
This reverts commit 977d17bb1749517b353874ccdc9b85abc7a58c2a, because it
can cause problems with some devices not getting any resources at all
when the resource tree is re-allocated.