Yang Zhang [Mon, 23 Mar 2009 07:31:04 +0000 (03:31 -0400)]
KVM: ia64: enable external interrupt in vmm
Currently, the interrupt enable bit is cleared when in
the vmm. This patch sets the bit and the external interrupts can
be dealt with when in the vmm. This improves the I/O performance.
Signed-off-by: Yang Zhang <yang.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Gleb Natapov [Mon, 23 Mar 2009 13:11:44 +0000 (15:11 +0200)]
KVM: Timer event should not unconditionally unhalt vcpu.
Currently timer events are processed before entering guest mode. Move it
to main vcpu event loop since timer events should be processed even while
vcpu is halted. Timer may cause interrupt/nmi to be injected and only then
vcpu will be unhalted.
Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Amit Shah [Fri, 20 Mar 2009 07:09:00 +0000 (12:39 +0530)]
KVM: x86: Ignore reads to EVNTSEL MSRs
We ignore writes to the performance counters and performance event
selector registers already. Kaspersky antivirus reads the eventsel
MSR causing it to crash with the current behaviour.
Return 0 as data when the eventsel registers are read to stop the
crash.
Signed-off-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Thu, 12 Mar 2009 13:45:39 +0000 (21:45 +0800)]
KVM: Device assignment framework rework
After discussion with Marcelo, we decided to rework device assignment framework
together. The old problems are kernel logic is unnecessary complex. So Marcelo
suggest to split it into a more elegant way:
1. Split host IRQ assign and guest IRQ assign. And userspace determine the
combination. Also discard msi2intx parameter, userspace can specific
KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to
enable MSI to INTx convertion.
2. Split assign IRQ and deassign IRQ. Import two new ioctls:
KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
This patch also fixed the reversed _IOR vs _IOW in definition(by deprecated the
old interface).
[avi: replace homemade bitcount() by hweight_long()]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Hannes Eder [Tue, 10 Mar 2009 21:51:09 +0000 (22:51 +0100)]
KVM: make 'lapic_timer_ops' and 'kpit_ops' static
Fix this sparse warnings:
arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not declared. Should it be static?
arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Avi Kivity <avi@redhat.com>
Jes Sorensen [Wed, 25 Feb 2009 16:38:55 +0000 (10:38 -0600)]
KVM: ia64: Drop in SN2 replacement of fast path ITC emulation fault handler
Copy in SN2 RTC based ITC emulation for fast exit. The two versions
have the same size, so a dropin is simpler than patching the branch
instruction to hit the SN2 version.
Jes Sorensen [Wed, 25 Feb 2009 16:38:53 +0000 (10:38 -0600)]
KVM: ia64: Create inline function kvm_get_itc() to centralize ITC reading.
Move all reading of special register 'AR_ITC' into two functions, one
in the kernel and one in the VMM module. When running on SN2, base the
result on the RTC rather the system ITC, as the ITC isn't
synchronized.
Gleb Natapov [Thu, 5 Mar 2009 14:34:59 +0000 (16:34 +0200)]
KVM: change the way how lowest priority vcpu is calculated
The new way does not require additional loop over vcpus to calculate
the one with lowest priority as one is chosen during delivery bitmap
construction.
Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead
of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in
apic_send_ipi() to figure out ipi destination instead of reimplementing
the logic.
Sheng Yang [Wed, 4 Mar 2009 05:33:02 +0000 (13:33 +0800)]
KVM: Merge kvm_ioapic_get_delivery_bitmask into kvm_get_intr_delivery_bitmask
Gleb fixed bitmap ops usage in kvm_ioapic_get_delivery_bitmask.
Sheng merged two functions, as well as fixed several issues in
kvm_get_intr_delivery_bitmask
1. deliver_bitmask is a bitmap rather than a unsigned long intereger.
2. Lowest priority target bitmap wrong calculated by mistake.
3. Prevent potential NULL reference.
4. Declaration in include/kvm_host.h caused powerpc compilation warning.
5. Add warning for guest broadcast interrupt with lowest priority delivery mode.
6. Removed duplicate bitmap clean up in caller of kvm_get_intr_delivery_bitmask.
Marcelo Tosatti [Mon, 23 Feb 2009 13:57:41 +0000 (10:57 -0300)]
KVM: unify part of generic timer handling
Hide the internals of vcpu awakening / injection from the in-kernel
emulated timers. This makes future changes in this logic easier and
decreases the distance to more generic timer handling.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
KVM: declare ioapic functions only on affected hardware
Since "KVM: Unify the delivery of IOAPIC and MSI interrupts"
I get the following warnings:
CC [M] arch/s390/kvm/kvm-s390.o
In file included from arch/s390/kvm/kvm-s390.c:22:
include/linux/kvm_host.h:357: warning: 'struct kvm_ioapic' declared inside parameter list
include/linux/kvm_host.h:357: warning: its scope is only this definition or declaration, which is probably not what you want
This patch limits IOAPIC functions for architectures that have one.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
As it turns out, the call trace is messed up due to gcc's inlining, but
I isolated the problem anyway: kvm_write_guest_time() is being used in a
non-thread-safe manner on preemptable kernels.
Basically kvm_write_guest_time()'s body needs to be surrounded by
preempt_disable() and preempt_enable(), since the kernel won't let us
query any per-CPU data (indirectly using smp_processor_id()) without
preemption disabled. The attached patch fixes this issue by disabling
preemption inside kvm_write_guest_time().
[marcelo: surround only __get_cpu_var calls since the warning
is harmless]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Wed, 25 Feb 2009 09:22:28 +0000 (17:22 +0800)]
KVM: Enable MSI-X for KVM assigned device
This patch finally enable MSI-X.
What we need for MSI-X:
1. Intercept one page in MMIO region of device. So that we can get guest desired
MSI-X table and set up the real one. Now this have been done by guest, and
transfer to kernel using ioctl KVM_SET_MSIX_NR and KVM_SET_MSIX_ENTRY.
2. Information for incoming interrupt. Now one device can have more than one
interrupt, and they are all handled by one workqueue structure. So we need to
identify them. The previous patch enable gsi_msg_pending_bitmap get this done.
3. Mapping from host IRQ to guest gsi as well as guest gsi to real MSI/MSI-X
message address/data. We used same entry number for the host and guest here, so
that it's easy to find the correlated guest gsi.
What we lack for now:
1. The PCI spec said nothing can existed with MSI-X table in the same page of
MMIO region, except pending bits. The patch ignore pending bits as the first
step (so they are always 0 - no pending).
2. The PCI spec allowed to change MSI-X table dynamically. That means, the OS
can enable MSI-X, then mask one MSI-X entry, modify it, and unmask it. The patch
didn't support this, and Linux also don't work in this way.
3. The patch didn't implement MSI-X mask all and mask single entry. I would
implement the former in driver/pci/msi.c later. And for single entry, userspace
should have reposibility to handle it.
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 24 Feb 2009 20:26:47 +0000 (22:26 +0200)]
KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE
Windows 2008 accesses this MSR often on context switch intensive workloads;
since we run in guest context with the guest MSR value loaded (so swapgs can
work correctly), we can simply disable interception of rdmsr/wrmsr for this
MSR.
A complication occurs since in legacy mode, we run with the host MSR value
loaded. In this case we enable interception. This means we need two MSR
bitmaps, one for legacy mode and one for long mode.
Peter Botha [Wed, 10 Jun 2009 00:16:32 +0000 (17:16 -0700)]
char: mxser, fix ISA board lookup
There's a bug in the mxser kernel module that still appears in the
2.6.29.4 kernel.
mxser_get_ISA_conf takes a ioaddress as its first argument, by passing the
not of the ioaddr, you're effectively passing 0 which means it won't be
able to talk to an ISA card. I have tested this, and removing the !
fixes the problem.
Jan Kara [Tue, 9 Jun 2009 23:26:26 +0000 (16:26 -0700)]
jbd: fix race in buffer processing in commit code
In commit code, we scan buffers attached to a transaction. During this
scan, we sometimes have to drop j_list_lock and then we recheck whether
the journal buffer head didn't get freed by journal_try_to_free_buffers().
But checking for buffer_jbd(bh) isn't enough because a new journal head
could get attached to our buffer head. So add a check whether the journal
head remained the same and whether it's still at the same transaction and
list.
This is a nasty bug and can cause problems like memory corruption (use after
free) or trigger various assertions in JBD code (observed).
Signed-off-by: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Tue, 9 Jun 2009 23:26:24 +0000 (16:26 -0700)]
autofs4: remove hashed check in validate_wait()
The recent ->lookup() deadlock correction required the directory inode
mutex to be dropped while waiting for expire completion. We were
concerned about side effects from this change and one has been identified.
I saw several error messages.
They cause autofs to become quite confused and don't really point to the
actual problem.
Things like:
handle_packet_missing_direct:1376: can't find map entry for (43,1827932)
which is usually totally fatal (although in this case it wouldn't be
except that I treat is as such because it normally is).
do_mount_direct: direct trigger not valid or already mounted
/test/nested/g3c/s1/ss1
which is recoverable, however if this problem is at play it can cause
autofs to become quite confused as to the dependencies in the mount tree
because mount triggers end up mounted multiple times. It's hard to
accurately check for this over mounting case and automount shouldn't need
to if the kernel module is doing its job.
There was one other message, similar in consequence of this last one but I
can't locate a log example just now.
When checking if a mount has already completed prior to adding a new mount
request to the wait queue we check if the dentry is hashed and, if so, if
it is a mount point. But, if a mount successfully completed while we
slept on the wait queue mutex the dentry must exist for the mount to have
completed so the test is not really needed.
Mounts can also be done on top of a global root dentry, so for the above
case, where a mount request completes and the wait queue entry has already
been removed, the hashed test returning false can cause an incorrect
callback to the daemon. Also, d_mountpoint() is not sufficient to check
if a mount has completed for the multi-mount case when we don't have a
real mount at the base of the tree.
Signed-off-by: Ian Kent <raven@themaw.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 9 Jun 2009 23:26:23 +0000 (16:26 -0700)]
shm: fix unused warnings on nommu
The massive nommu update (8feae131) resulted in these warnings:
ipc/shm.c: In function `sys_shmdt':
ipc/shm.c:974: warning: unused variable `size'
ipc/shm.c:972: warning: unused variable `next'
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
r8169: fix crash when large packets are received
Linus Torvalds [Tue, 9 Jun 2009 15:41:22 +0000 (08:41 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md/raid5: fix bug in reshape code when chunk_size decreases.
md/raid5 - avoid deadlocks in get_active_stripe during reshape
md/raid5: use conf->raid_disks in preference to mddev->raid_disk
it turns out that we need clear cpus_hardware_enabled in that case.
Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Minoru Usui [Tue, 9 Jun 2009 11:03:09 +0000 (04:03 -0700)]
cls_cgroup: Fix oops when user send improperly 'tc filter add' request
I found a bug in cls_cgroup_change() in cls_cgroup.c.
cls_cgroup_change() expected tca[TCA_OPTIONS] was set from user space properly,
but tc in iproute2-2.6.29-1 (which I used) didn't set it.
In the current source code of tc in git, it set tca[TCA_OPTIONS].
If we always use a newest iproute2 in git when we use cls_cgroup,
we don't face this oops probably.
But I think, kernel shouldn't panic regardless of use program's behaviour.
Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 9 Jun 2009 11:01:02 +0000 (04:01 -0700)]
r8169: fix crash when large packets are received
Michael Tokarev reported receiving a large packet could crash
a machine with RTL8169 NIC.
( original thread at http://lkml.org/lkml/2009/6/8/192 )
Problem is this driver tells that NIC frames up to 16383 bytes
can be received but provides skb to rx ring allocated with
smaller sizes (1536 bytes in case standard 1500 bytes MTU is used)
When a frame larger than what was allocated by driver is received,
dma transfert can occurs past the end of buffer and corrupt
kernel memory.
Fix is to tell to NIC what is the maximum size a frame can be.
This bug is very old, (before git introduction, linux-2.6.10), and
should be backported to stable versions.
Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Tue, 9 Jun 2009 06:32:22 +0000 (16:32 +1000)]
md/raid5: fix bug in reshape code when chunk_size decreases.
Now that we support changing the chunksize, we calculate
"reshape_sectors" to be the max of number of sectors in old
and new chunk size.
However there is one please where we still use 'chunksize'
rather than 'reshape_sectors'.
This causes a reshape that reduces the size of chunks to freeze.
NeilBrown [Tue, 9 Jun 2009 04:39:59 +0000 (14:39 +1000)]
md/raid5 - avoid deadlocks in get_active_stripe during reshape
md has functionality to 'quiesce' and array so that all pending
IO completed and no new IO starts. This is used to achieve a
stable state before making internal changes.
Currently this quiescing applies equally to normal IO, resync
IO, and reshape IO.
However there is a problem with applying it to reshape IO.
Reshape can have multiple 'stripe_heads' that must be active together.
If the quiesce come between allocating the first and the last of
such a collection, then we deadlock, as the last will not be allocated
until the quiesce is lifted, the quiesce will not be lifted until the
first (which has been allocated) gets used, and that first cannot be
used until the last is allocated.
It is not necessary to inhibit reshape IO when a quiesce is
requested. Those places in the code that require a full quiesce will
ensure the reshape thread is not running at all.
So allow reshape requests to get access to new stripe_heads without
being blocked by a 'quiesce'.
This only affects in-place reshapes (i.e. where the array does not
grow or shrink) and these are only newly supported. So this patch is
not needed in earlier kernels.
Linus Torvalds [Mon, 8 Jun 2009 19:31:53 +0000 (12:31 -0700)]
async: Fix lack of boot-time console due to insufficient synchronization
Our async work synchronization was broken by "async: make sure
independent async domains can't accidentally entangle" (commit d5a877e8dd409d8c702986d06485c374b705d340), because it would report
the wrong lowest active async ID when there was both running and
pending async work.
This caused things like no being able to read the root filesystem,
resulting in missing console devices and inability to run 'init',
causing a boot-time panic.
This fixes it by properly returning the lowest pending async ID: if
there is any running async work, that will have a lower ID than any
pending work, and we should _not_ look at the pending work list.
There were alternative patches from Jaswinder and James, but this one
also cleans up the code by removing the pointless 'ret' variable and
the unnecesary testing for an empty list around 'for_each_entry()' (if
the list is empty, the for_each_entry() thing just won't execute).
Fixes-bug: http://bugzilla.kernel.org/show_bug.cgi?id=13474 Reported-and-tested-by: Chris Clayton <chris2553@googlemail.com> Cc: Jaswinder Singh Rajput <jaswinder@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 8 Jun 2009 16:22:53 +0000 (09:22 -0700)]
Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
MIPS: Outline udelay and fix a few issues.
MIPS: ioctl.h: Fix headers_check warnings
MIPS: Cobalt: PCI bus is always required to obtain the board ID
MIPS: Kconfig: Remove "Support for" from Cavium system type
MIPS: Sibyte: Honor CONFIG_CMDLINE
SSB: BCM47xx: Export ssb_watchdog_timer_set
Ralf Baechle [Sat, 28 Feb 2009 09:44:28 +0000 (09:44 +0000)]
MIPS: Outline udelay and fix a few issues.
Outlining fixes the issue were on certain CPUs such as the R10000 family
the delay loop would need an extra cycle if it overlaps a cacheline
boundary.
The rewrite also fixes build errors with GCC 4.4 which was changed in
way incompatible with the kernel's inline assembly.
Relying on pure C for computation of the delay value removes the need for
explicit. The price we pay is a slight slowdown of the computation - to
be fixed on another day.
Linus Torvalds [Mon, 8 Jun 2009 15:29:31 +0000 (08:29 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
[ARM] 5543/1: arm: serial amba: add missing declaration in serial.h
[ARM] pxa: fix pxa27x_udc default pullup GPIO
[ARM] pxa/imote2: fix UCAM sensor board ADC model number
mx[23]: don't put clock lookups in __initdata
fix oops when using console=ttymxcN with N > 0
[ARM] ARMv7 errata: only apply fixes when running on applicable CPU
[ARM] 5534/1: kmalloc must return a cache line aligned buffer
Linus Torvalds [Mon, 8 Jun 2009 14:53:59 +0000 (07:53 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci-of: Fix the wrong accessor to HOSTVER register
mvsdio: fix config failure with some high speed SDHC cards
mvsdio: ignore high speed timing requests from the core
mmc/omap: Use disable_irq_nosync() from within irq handlers.
sdhci-of: Add fsl,esdhc as a valid compatible to bind against
mvsdio: allow automatic loading when modular
mxcmmc: Fix missing return value checking in DMA setup code.
mxcmmc : Reset the SDHC hardware if software timeout occurs.
omap_hsmmc: Trivial fix for a typo in comment
mxcmmc: decrease minimum frequency to make MMC cards work
Avi Kivity [Sat, 6 Jun 2009 09:34:39 +0000 (12:34 +0300)]
KVM: Explicity initialize cpus_hardware_enabled
Under CONFIG_MAXSMP, cpus_hardware_enabled is allocated from the heap and
not statically initialized. This causes a crash on reboot when kvm thinks
vmx is enabled on random nonexistent cpus and accesses nonexistent percpu
lists.
Fix by explicitly clearing the variable.
Cc: stable@kernel.org Reported-and-tested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Avi Kivity <avi@redhat.com>
[ARM] 5543/1: arm: serial amba: add missing declaration in serial.h
This header is sometimes included in the uncompress stage to get
register values, but no <linux/amba/bus.h> can be included there.
So declare "struct amba_device" here before using it in a prototype.
Signed-off-by: Alessandro Rubini <rubini@unipv.it> Acked-by: Andrea Gallo <andrea.gallo@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Sergei Shtylyov [Sun, 7 Jun 2009 11:52:50 +0000 (13:52 +0200)]
pdc202xx_old: fix resetproc() method
pdc202xx_reset() calls pdc202xx_reset_host() twice, for both channels, while
that function actually twiddles the single, shared software reset bit -- the
net effect is a duplicated reset and horrendous 4 second delay happening not
only on a channel reset but also when dma_lost_irq() and dma_clear() methods
are called. Fold pdc202xx_reset_host() into pdc202xx_reset(), fix printk(),
and move it before the actual reset...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Sergei Shtylyov [Sun, 7 Jun 2009 11:52:50 +0000 (13:52 +0200)]
pdc202xx_old: fix 'pdc20246_dma_ops'
Commit ac95beedf8bc97b24f9540d4da9952f07221c023 (ide: add struct ide_port_ops
(take 2)) erroneously converted the driver's dma_timeout() and dma_lost_irq()
methods to call the driver's resetproc() method regardless of whether it was
defined for this specific controller while it hadn't been defined and hence
called for PDC20246. So the dma_clear() method, the successor of dma_timeout(),
shouldn't exist and the dma_lost_irq() method should be standard for PDC20246.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Linus Torvalds [Sat, 6 Jun 2009 21:33:54 +0000 (14:33 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86/pci: fix mmconfig detection with 32bit near 4g
PCI: use fixed-up device class when configuring device
Hugh Dickins [Sat, 6 Jun 2009 20:18:09 +0000 (21:18 +0100)]
integrity: fix IMA inode leak
CONFIG_IMA=y inode activity leaks iint_cache and radix_tree_node objects
until the system runs out of memory. Nowhere is calling ima_inode_free()
a.k.a. ima_iint_delete(). Fix that by calling it from destroy_inode().
Linus Torvalds [Sat, 6 Jun 2009 19:18:14 +0000 (12:18 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
ext3/4 with synchronous writes gets wedged by Postfix
Fix nobh_truncate_page() to not pass stack garbage to get_block()
Al Viro [Wed, 13 May 2009 18:13:40 +0000 (19:13 +0100)]
ext3/4 with synchronous writes gets wedged by Postfix
OK, that's probably the easiest way to do that, as much as I don't like it...
Since iget() et.al. will not accept I_FREEING (will wait to go away
and restart), and since we'd better have serialization between new/free
on fs data structures anyway, we can afford simply skipping I_FREEING
et.al. in insert_inode_locked().
We do that from new_inode, so it won't race with free_inode in any interesting
ways and it won't race with iget (of any origin; nfsd or in case of fs
corruption a lookup) since both still will wait for I_LOCK.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Jan Kara <jack@suse.cz> Tested-by: David Watson <dbwatson@ukfsn.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Theodore Ts'o [Tue, 12 May 2009 11:37:56 +0000 (07:37 -0400)]
Fix nobh_truncate_page() to not pass stack garbage to get_block()
The nobh_truncate_page() function is used by ext2, exofs, and jfs. Of
these three, only ext2 and jfs's get_block() function pays attention
to bh->b_size --- which is normally always the filesystem blocksize
except when the get_block() function is called by either
mpage_readpage(), mpage_readpages(), or the direct I/O routines in
fs/direct_io.c.
Unfortunately, nobh_truncate_page() does not initialize map_bh before
calling the filesystem-supplied get_block() function. So ext2 and jfs
will try to calculate the number of blocks to map by taking stack
garbage and shifting it left by inode->i_blkbits. This should be
*mostly* harmless (except the filesystem will do some unnneeded work)
unless the stack garbage is less than filesystem's blocksize, in which
case maxblocks will be zero, and the attempt to find out whether or
not the filesystem has a hole at a given logical block will fail, and
the page cache entry might not get zero'ed out.
Also if the stack garbage in in map_bh->state happens to have the
BH_Mapped bit set, there could be an attempt to call readpage() on a
non-existent page, which could cause nobh_truncate_page() to return an
error when it should not.
Fix this by initializing map_bh->state and map_bh->size.
Fortunately, it's probably fairly unlikely that ext2 and jfs users
mount with nobh these days.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Alan Cox [Wed, 13 May 2009 14:02:27 +0000 (15:02 +0100)]
[libata] pata_ali: Use IGN_SIMPLEX
Some ALi devices report simplex if they have been disabled and re-enabled, and
restoring the byte does not work. Ignore it - the needed supporting logic is
already present for the SATA ULi ports.
Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: Fix oops and use after free during space balancing
Btrfs: set device->total_disk_bytes when adding new device
Linus Torvalds [Fri, 5 Jun 2009 18:53:44 +0000 (11:53 -0700)]
Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
ata_piix: Add HP Compaq nc6000 to the broken poweroff list
ahci: add warning messages for hp laptops with broken suspend
pata_efar: fix PIO2 underclocking
pata_legacy: wait for async probing
Tejun Heo [Sat, 30 May 2009 11:50:12 +0000 (20:50 +0900)]
ahci: add warning messages for hp laptops with broken suspend
Harddisks on HP dv[4-6] and HDX18 fail to come online after resume on
earlier BIOSen. Fortunately, HP recently released BIOS updates for
all machines to fix the issue. Detect old BIOSen, warn the user to
update BIOS on boot and suspend attempts and fail suspend.
Kudos to all the bug reporters.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: kernel.org@epperson.homelinux.net Cc: emisca@gmail.com Cc: Gadi Cohen <dragon@wastelands.net> Cc: Paul Swanson <paul@procursa.com> Cc: s@ourada.org Cc: Trevor Davenport <trevor.davenport@gmail.com> Cc: corruptor1972 <steven_tierney@yahoo.co.uk> Cc: Victoria Wilson <mail@vwilson.co.uk> Cc: khiraly <khiraly.list@gmail.com> Cc: Sean <wollombi@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Sergei Shtylyov [Mon, 1 Jun 2009 19:42:10 +0000 (22:42 +0300)]
pata_efar: fix PIO2 underclocking
Fix the PIO mode 2 using mode 0 timings -- this driver should enable the
fast timing bank starting with PIO2, just like the PIIX/ICH drivers do.
Also, fix/rephrase some comments while at it.
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
James Bottomley [Fri, 5 Jun 2009 14:41:39 +0000 (10:41 -0400)]
pata_legacy: wait for async probing
The basic problem here that pata_legacy attaches the host, sees if it found
any devices and detaches it if none were found. With async probing, it's not
waiting until discovery is finished before deciding it has no devices and
trying the detach leading to this warning:
One way to fix it would be to put an async_synchronize_full() before looking
for devices, which this patch does. A better way might be to separate libata
into its own domain and only wait for that.
Reported-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Dave Jones [Fri, 5 Jun 2009 16:37:07 +0000 (12:37 -0400)]
[CPUFREQ] powernow-k8: check space_id of _PCT registers to be FFH
The powernow-k8 driver checks to see that the Performance Control/Status
Registers are declared as FFH (functional fixed hardware) by the BIOS.
However, this check got broken in the commit: 0e64a0c982c06a6b8f5e2a7f29eb108fdf257b2f
[CPUFREQ] checkpatch cleanups for powernow-k8
Fix based on an original patch from Naga Chumbalkar.
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com> Cc: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Dave Jones <davej@redhat.com>
Alan Cox [Fri, 5 Jun 2009 10:56:18 +0000 (11:56 +0100)]
ivtv: Fix PCI DMA direction
The ivtv stream buffers may be for receive or for send but the attached
sg handle is always destined cpu->device. We flush it correctly but the
allocation is wrongly done with the same type as the buffers.
See bug: http://bugzilla.kernel.org/show_bug.cgi?id=13385
(Note this doesn't close the bug - it fixes the ivtv part and in turn
the logging next shows up some rather alarming DMA sg list warnings in
libata)
Signed-off-by: Alan Cox <alan@linux.intel.com> Acked-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>