git.karo-electronics.de Git - linux-beck.git/log

KVM: SVM: Check for nested intercepts on NMI injection

This patch implements the NMI intercept checking for nested
svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Reset MMU on nested_svm_vmrun for NPT too

Without resetting the MMU the gva_to_pga function will not
work reliably when the vcpu is running in nested context.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Coding style cleanup

This patch removes whitespace errors, fixes comment formats
and most of checkpatch warnings. Now vim does not show
c-space-errors anymore.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Preserve injected TF across emulation

Call directly into the vendor services for getting/setting rflags in
emulate_instruction to ensure injected TF survives the emulation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Drop RF manipulation for guest single-stepping

RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Emulate nRIP feature when reinjecting INT3

When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors do not support the required nRIP VMCB field, try to emulate
it by moving RIP past the instruction on exception injection. Fix it up
again in case the injection failed and we were able to catch this. This
does not work for unintercepted faults, but it is better than doing
nothing.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add kvm_is_linear_rip

Based on Gleb's suggestion: Add a helper kvm_is_linear_rip that matches
a given linear RIP against the current one. Use this for guest
single-stepping, more users will follow.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Move svm_queue_exception

Move svm_queue_exception past skip_emulated_instruction to allow calling
it later on.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Kick VCPU outside PIC lock again

This restores the deferred VCPU kicking before 956f97cf. We need this
over -rt as wake_up* requires non-atomic context in this configuration.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Destory timer on vcpu destruction

When we destory a vcpu, we should also make sure to kill all pending
timers that could still be up. When not doing this, hrtimers might
dereference null pointers trying to call our code.

This patch fixes spontanious kernel panics seen after closing VMs.

Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Memset vcpu to zeros

While converting the kzalloc we used to allocate our vcpu struct to
vmalloc, I forgot to memset the contents to zeros. That broke quite
a lot.

This patch memsets it to zero again.

Signed-off-by: Alexander Graf <alex@csgraf.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Add support for saving&restoring debug registers

So far user space was not able to save and restore debug registers for
migration or after reset. Plug this hole.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Save&restore interrupt shadow mask

The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.

As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86: Do not return soft events in vcpu_events

To avoid that user space migrates a pending software exception or
interrupt, mask them out on KVM_GET_VCPU_EVENTS. Without this, user
space would try to reinject them, and we would have to reconstruct the
proper instruction length for VMX event injection. Now the pending event
will be reinjected via executing the triggering instruction again.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fix wrong interrupt injection in enable_irq_windows

The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return wether the irq window could be enabled.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: drop unneeded kvm_run check in emulate_instruction()

vcpu->run is initialized on vcpu creation and can never be NULL
here.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Allocate vcpu struct using vmalloc

We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.

Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Simplify kvmppc_load_up_(FPU|VMX|VSX)

We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.

So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.

Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Enable use of secondary htab bucket

We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.

I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.

This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add capability for paired singles

We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Implement Paired Single emulation

The one big thing about the Gekko is paired singles.

Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.

Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.

Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.

In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.

That way we can execute paired single FPU operations without implementing a
soft fpu.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Enable program interrupt to do MMIO

When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.

So let's teach the program interrupt handler how to deal with EMULATE_MMIO.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add helpers to modify ppc fields

The PowerPC specification always lists bits from MSB to LSB. That is
really confusing when you're trying to write C code, because it fits
in pretty badly with the normal (1 << xx) schemes.

So I came up with some nice wrappers that allow to get and set fields
in a u64 with bit numbers exactly as given in the spec. That makes the
code in KVM and the spec easier comparable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Fix error in BAT assignment

BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.

Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add helpers to call FPU instructions

To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.

Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Make ext giveup non-static

We need to call the ext giveup handlers from code outside of book3s.c.
So let's make it non-static.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Make software load/store return eaddr

The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.

Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Implement mtsr instruction emulation

The Book3S_32 specifications allows for two instructions to modify segment
registers: mtsrin and mtsr.

Most normal operating systems use mtsrin, because it allows to define which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.

So let's also emulate mtsr. It's a valid instruction after all.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Fix typo in book3s_32 debug code

There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.

So let's fix the ifdef.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Preload FPU when possible

There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.

This patch adds preloading of the FPU when it's reasonable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Combine extension interrupt handlers

When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.

The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.

So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add Gekko SPRs

The Gekko has some SPR values that differ from other PPC core values and
also some additional ones.

Let's add support for them in our mfspr/mtspr emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add hidden flag for paired singles

The Gekko implements an extension called paired singles. When the guest wants
to use that extension, we need to make sure we're not running the host FPU,
because all FPU instructions need to get emulated to accomodate for additional
operations that occur.

This patch adds an hflag to track if we're in paired single mode or not.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add AGAIN type for emulation return

Emulation of an instruction can have different outcomes. It can succeed,
fail, require MMIO, do funky BookE stuff - or it can just realize something's
odd and will be fixed the next time around.

Exactly that is what EMULATE_AGAIN means. Using that flag we can now tell
the caller that nothing happened, but we still want to go back to the
guest and see what happens next time we come around.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Teach MMIO Signedness

The guest I was trying to get to run uses the LHA and LHAU instructions.
Those instructions basically do a load, but also sign extend the result.

Since we need to fill our registers by hand when doing MMIO, we also need
to sign extend manually.

This patch implements sign extended MMIO and the LHA(U) instructions.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Enable MMIO to do 64 bits, fprs and qprs

Right now MMIO access can only happen for GPRs and is at most 32 bit wide.
That's actually enough for almost all types of hardware out there.

Unfortunately, the guest I was using used FPU writes to MMIO regions, so
it ended up writing 64 bit MMIOs using FPRs and QPRs.

So let's add code to handle those odd cases too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Make fpscr 64-bit

Modern PowerPCs have a 64 bit wide FPSCR register. Let's accomodate for that
and make it 64 bits in our vcpu struct too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: PPC: Add QPR registers

The Gekko has GPRs, SPRs and FPRs like normal PowerPC codes, but
it also has QPRs which are basically single precision only FPU registers
that get used when in paired single mode.

The following patches depend on them being around, so let's add the
definitions early.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Remove newlines from nested trace points

The tracing infrastructure adds its own newlines. Remove
them from the trace point printk format strings.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Make lazy FPU switching work with nested svm

The new lazy fpu switching code may disable cr0 intercepts
when running nested. This is a bug because the nested
hypervisor may still want to intercept cr0 which will break
in this situation. This patch fixes this issue and makes
lazy fpu switching working with nested svm.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Activate nested state only when guest state is complete

Certain functions called during the emulated world switch
behave differently when the vcpu is running nested. This is
not the expected behavior during a world switch emulation.
This patch ensures that the nested state is activated only
if the vcpu is completly in nested state.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Don't sync nested cr8 to lapic and back

This patch makes syncing of the guest tpr to the lapic
conditional on !nested. Otherwise a nested guest using the
TPR could freeze the guest.
Another important change this patch introduces is that the
cr8 intercept bits are no longer ORed at vmrun emulation if
the guest sets VINTR_MASKING in its VMCB. The reason is that
nested cr8 accesses need alway be handled by the nested
hypervisor because they change the shadow version of the
tpr.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fix nested msr intercept handling

The nested_svm_exit_handled_msr() function maps only one
page of the guests msr permission bitmap. This patch changes
the code to use kvm_read_guest to fix the bug.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Annotate nested_svm_map with might_sleep()

The nested_svm_map() function can sleep and must not be
called from atomic context. So annotate that function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Sync all control registers on nested vmexit

Currently the vmexit emulation does not sync control
registers were the access is typically intercepted by the
nested hypervisor. But we can not count on that intercepts
to sync these registers too and make the code
architecturally more correct.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fix schedule-while-atomic on nested exception handling

Move the actual vmexit routine out of code that runs with
irqs and preemption disabled.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Don't use kmap_atomic in nested_svm_map

Use of kmap_atomic disables preemption but if we run in
shadow-shadow mode the vmrun emulation executes kvm_set_cr3
which might sleep or fault. So use kmap instead for
nested_svm_map.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: remove redundant prototype of load_pdptrs()

This patch removes redundant prototype of load_pdptrs().

I found load_pdptrs() twice in kvm_host.h. Let's remove one.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: Fix x86_emulate_insn() not to use the variable rc for non-X86EMUL values

This patch makes non-X86EMUL_* family functions not to use
the variable rc.

Be sure that this changes nothing but makes the purpose of
the variable rc clearer.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: X86EMUL macro replacements: x86_emulate_insn() and its helpers

This patch just replaces integer values used inside
x86_emulate_insn() and its helper functions to X86EMUL_*.

The purpose of this is to make it clear what will happen
when the variable rc is compared to X86EMUL_* at the end
of x86_emulate_insn().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: x86 emulator: X86EMUL macro replacements: from do_fetch_insn_byte() to x86_decode_insn()

This patch just replaces the integer values used inside x86's
decode functions to X86EMUL_*.

By this patch, it becomes clearer that we are using X86EMUL_*
value propagated from ops->read_std() in do_fetch_insn_byte().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: inject #UD in 64bit mode from instruction that are not valid there

Some instruction are obsolete in a long mode. Inject #UD.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: use desc_ptr struct instead of kvm private descriptor_table

x86 arch defines desc_ptr for idt/gdt pointers, no need to define
another structure in kvm code.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: add doc note about PIO/MMIO completion API

Document that partially emulated instructions leave the guest state
inconsistent, and that the kernel will complete operations before
checking for pending signals.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: allow 4 coldfire serial ports
  m68knommu: fix coldfire tcdrain
  m68knommu: remove a duplicate vector setting line for 68360
  Fix m68k-uclinux's rt_sigreturn trampoline
  m68knommu: correct the CC flags for Coldfire M5272 targets
  uclinux: error message when FLAT reloc symbol is invalid, v2

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
mc13783-regulator: fix a memory leak in mc13783_regulator_remove
regulator: Let drivers know when they use the stub API

Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs

* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
  [LogFS] Split large truncated into smaller chunks
  [LogFS] Set s_bdi
  [LogFS] Prevent mempool_destroy NULL pointer dereference
  [LogFS] Move assertion
  [LogFS] Plug 8 byte information leak
  [LogFS] Prevent memory corruption on large deletes
  [LogFS] Remove unused method

Fix trivial conflict with added header includes in fs/logfs/super.c

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/shaggy/jfs-2.6:
  jfs: add jfs specific ->setattr call
  jfs: fix diAllocExt error in resizing filesystem
  jfs_dmap.[ch]: trivial typo fix: s/heigth/height/g

Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm

* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Fix TSS size check for 16-bit tasks
  KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()
  KVM: Increase NR_IOBUS_DEVS limit to 200
  KVM: fix the handling of dirty bitmaps to avoid overflows
  KVM: MMU: fix kvm_mmu_zap_page() and its calling path
  KVM: VMX: Save/restore rflags.vm correctly in real mode
  KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL
  KVM: Don't spam kernel log when injecting exceptions due to bad cr writes
  KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails
  KVM: take srcu lock before call to complete_pio()

Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md:
md/raid5: allow for more than 2^31 chunks.

AFS: Don't pass error value to page_cache_release() in error handling

In the error handling in afs_mntpt_do_automount(), we pass an error
pointer to page_cache_release() if read_mapping_page() failed. Instead,
we should extend the gotos around the error handling we don't need.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: x86: Fix TSS size check for 16-bit tasks

A 16-bit TSS is only 44 bytes long. So make sure to test for the correct
size on task switch.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()

I got this dmesg due to srcu_read_lock() is missing in
kvm_mmu_notifier_release().

===================================================
[ INFO: suspicious rcu_dereference_check() usage. ]
---------------------------------------------------
arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection!

other info that might help us debug this:

rcu_scheduler_active = 1, debug_locks = 0
2 locks held by qemu-system-x86/3100:
#0: (rcu_read_lock){.+.+..}, at: [<ffffffff810d73dc>] __mmu_notifier_release+0x38/0xdf
#1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa0130a6a>] kvm_mmu_zap_all+0x21/0x5e [kvm]

stack backtrace:
Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2
Call Trace:
[<ffffffff8106afd9>] lockdep_rcu_dereference+0xaa/0xb3
[<ffffffffa0123a89>] unalias_gfn+0x56/0xab [kvm]
[<ffffffffa0119600>] gfn_to_memslot+0x16/0x25 [kvm]
[<ffffffffa012ffca>] gfn_to_rmap+0x17/0x6e [kvm]
[<ffffffffa01300c1>] rmap_remove+0xa0/0x19d [kvm]
[<ffffffffa0130649>] kvm_mmu_zap_page+0x109/0x34d [kvm]
[<ffffffffa0130a7e>] kvm_mmu_zap_all+0x35/0x5e [kvm]
[<ffffffffa0122870>] kvm_arch_flush_shadow+0x16/0x22 [kvm]
[<ffffffffa01189e0>] kvm_mmu_notifier_release+0x15/0x17 [kvm]
[<ffffffff810d742c>] __mmu_notifier_release+0x88/0xdf
[<ffffffff810d73dc>] ? __mmu_notifier_release+0x38/0xdf
[<ffffffff81040848>] ? exit_mm+0xe0/0x115
[<ffffffff810c2cb0>] exit_mmap+0x2c/0x17e
[<ffffffff8103c472>] mmput+0x2d/0xd4
[<ffffffff81040870>] exit_mm+0x108/0x115
[...]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

m68knommu: allow 4 coldfire serial ports

Fix driver/serial/mcf.c for 4-ports coldfire's (e.g. MCF5484).

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68knommu: fix coldfire tcdrain

Fix tcdrain on coldfire uarts.
Currently with coldfire uarts tcdrain returns without waiting for txempty,
because (tx)fifosize is 0. Fix that and call uart_update_timeout when
setting the baud rate, otherwise tcdrain will wait for an half our :)
Also constify mcf_uart_ops.

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68knommu: remove a duplicate vector setting line for 68360

Remove a duplicate vector setting line for the 68360 interrupt
setup. Pointed out by Roel Kluin <roel.kluin@gmail.com>

Signed-off-by: Greg Ungerer <gerg@uclinux.org>

Fix m68k-uclinux's rt_sigreturn trampoline

Signed-off-by: Maxim Kuvyrkov <maxim@codesourcery.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

m68knommu: correct the CC flags for Coldfire M5272 targets

Signed-off-by: Philip Nye <philipn@engarts.com>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

uclinux: error message when FLAT reloc symbol is invalid, v2

This patch fixes a cosmetic error in printk. Text segment and data/bss
segment are allocated from two different areas. It is not meaningful to
give the diff between them in the error reporting messages.

Signed-off-by: Jun Sun <jsun@junsun.net>
Signed-off-by: Greg Ungerer <gerg@uclinux.org>

[LogFS] Split large truncated into smaller chunks

Truncate would do an almost limitless amount of work without invoking
the garbage collector in between. Split it up into more manageable,
though still large, chunks.

Signed-off-by: Joern Engel <joern@logfs.org>

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Convert __DQUOT_PARANOIA symbol to standard config option

quota: Convert __DQUOT_PARANOIA symbol to standard config option

Make __DQUOT_PARANOIA define from the old days a standard config option
and turn it off by default.

This gets rid of a quota warning about writes before quota is turned on
for systems with ext4 root filesystem. Currently there's no way to legally
solve this because /etc/mtab has to be written before quota is turned on
on most systems.

Signed-off-by: Jan Kara <jack@suse.cz>

Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6

* 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
  pcmcia: fix error handling in cm4000_cs.c
  drivers/pcmcia: Add missing local_irq_restore
  serial_cs: MD55x support (PCMCIA GPRS/EDGE modem) (kernel 2.6.33)
  pcmcia: avoid late calls to pccard_validate_cis
  pcmcia: fix ioport size calculation in rsrc_nonstatic
  pcmcia: re-start on MFC override
  pcmcia: fix io_probe due to parent (PCI) resources
  pcmcia: use previously assigned IRQ for all card functions

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix hardirq tracing in trap return path.
  sparc64: Use correct pt_regs in decode_access_size() error paths.
  sparc64: Fix PREEMPT_ACTIVE value.
  sparc64: Run NMIs on the hardirq stack.
  sparc64: Allocate sufficient stack space in ftrace stubs.
  sparc: Fix forgotten kmemleak headers inclusion

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix unsafe frame rewinding with hot regs fetching

Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6

* 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm: delay vblank cleanup until after driver unload

x86: correctly wire up the newuname system call

Before commit e28cbf22933d0c0ccaf3c4c27a1a263b41f73859 ("improve
sys_newuname() for compat architectures") 64-bit x86 had a private
implementation of sys_uname which was just called sys_uname, which other
architectures used for the old uname.

Due to some merge issues with the uname refactoring patches we ended up
calling the old uname version for both the old and new system call
slots, which lead to the domainname filed never be set which caused
failures with libnss_nis.

Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: Increase NR_IOBUS_DEVS limit to 200

This patch increases the current hardcoded limit of NR_IOBUS_DEVS
from 6 to 200. We are hitting this limit when creating a guest with more
than 1 virtio-net device using vhost-net backend. Each virtio-net
device requires 2 such devices to service notifications from rx/tx queues.

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: fix the handling of dirty bitmaps to avoid overflows

Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
__set_bit() takes the offset as int, not long.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: MMU: fix kvm_mmu_zap_page() and its calling path

This patch fix:

- calculate zapped page number properly in mmu_zap_unsync_children()
- calculate freeed page number properly kvm_mmu_change_mmu_pages()
- if zapped children page it shoud restart hlist walking

KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: VMX: Save/restore rflags.vm correctly in real mode

Currently we set eflags.vm unconditionally when entering real mode emulation
through virtual-8086 mode, and clear it unconditionally when we enter protected
mode.  The means that the following sequence

  KVM_SET_REGS  (rflags.vm=1)
  KVM_SET_SREGS (cr0.pe=1)

Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode().

Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode:
reads and writes to those bits access a shadow register instead of the actual
register.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

KVM: allow bit 10 to be cleared in MSR_IA32_MC4_CTL

There is a quirk for AMD K8 CPUs in many Linux kernels (see
arch/x86/kernel/cpu/mcheck/mce.c:__mcheck_cpu_apply_quirks()) that
clears bit 10 in that MCE related MSR. KVM can only cope with all
zeros or all ones, so it will inject a #GP into the guest, which
will let it panic.
So lets add a quirk to the quirk and ignore this single cleared bit.
This fixes -cpu kvm64 on all machines and -cpu host on K8 machines
with some guest Linux kernels.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: Don't spam kernel log when injecting exceptions due to bad cr writes

These are guest-triggerable.

Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: SVM: Fix memory leaks that happen when svm_create_vcpu() fails

svm_create_vcpu() does not free the pages allocated during the creation
when it fails to complete the allocations. This patch fixes it.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>

KVM: take srcu lock before call to complete_pio()

complete_pio() may use slot table which is protected by srcu.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>

sparc64: Fix hardirq tracing in trap return path.

We can overflow the hardirq stack if we set the %pil here
so early, just let the normal control flow do it.

This is fine as we are allowed to do the actual IRQ enable
at any point after we call trace_hardirqs_on.

Signed-off-by: David S. Miller <davem@davemloft.net>

drm: delay vblank cleanup until after driver unload

Drivers may use vblank calls now (e.g. drm_vblank_off) in their unload
paths, so don't clean up the vblank related structures until after
driver unload.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

md/raid5: allow for more than 2^31 chunks.

With many large drives and small chunk sizes it is possible
to create a RAID5 with more than 2^31 chunks. Make sure this
works.

Reported-by: Brett King <king.br@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: stable@kernel.org

Linux 2.6.34-rc5

rmap: add exclusively owned pages to the newest anon_vma

The recent anon_vma fixes cause many anonymous pages to end up
in the parent process anon_vma, even when the page is exclusively
owned by the current process.

Adding exclusively owned anonymous pages to the top anon_vma
reduces rmap scanning overhead, especially in workloads with
forking servers.

This patch adds a parameter to __page_set_anon_rmap that can
be used to indicate whether or not the added page is exclusively
owned by the current process.

Pages added through page_add_new_anon_rmap are exclusively
owned by the current process, and can be added to the top
anon_vma.

Pages added through page_add_anon_rmap can be either shared
or exclusively owned, so we do the conservative thing and
add it to the oldest anon_vma.

A next step would be to add the exclusive parameter to
page_add_anon_rmap, to be used from functions where we do
know for sure whether a page is exclusively owned.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
Lightly-tested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
[ Edited to look nicer - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Turn lower lookup error messages into debug messages
  eCryptfs: Copy lower directory inode times and size on link
  ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode
  ecryptfs: fix error code for missing xattrs in lower fs
  eCryptfs: Decrypt symlink target for stat size
  eCryptfs: Strip metadata in xattr flag in encrypted view
  eCryptfs: Clear buffer before reading in metadata xattr
  eCryptfs: Rename ecryptfs_crypt_stat.num_header_bytes_at_front
  eCryptfs: Fix metadata in xattr feature regression

sparc64: Use correct pt_regs in decode_access_size() error paths.

Signed-off-by: David S. Miller <davem@davemloft.net>

eCryptfs: Turn lower lookup error messages into debug messages

Vaugue warnings about ENAMETOOLONG errors when looking up an encrypted
file name have caused many users to become concerned about their data.
Since this is a rather harmless condition, I'm moving this warning to
only be printed when the ecryptfs_verbosity module param is 1.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>

eCryptfs: Copy lower directory inode times and size on link

The timestamps and size of a lower inode involved in a link() call was
being copied to the upper parent inode. Instead, we should be
copying lower parent inode's timestamps and size to the upper parent
inode. I discovered this bug using the POSIX test suite at Tuxera.

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>

ecryptfs: fix use with tmpfs by removing d_drop from ecryptfs_destroy_inode

Since tmpfs has no persistent storage, it pins all its dentries in memory
so they have d_count=1 when other file systems would have d_count=0.
->lookup is only used to create new dentries. If the caller doesn't
instantiate it, it's freed immediately at dput(). ->readdir reads
directly from the dcache and depends on the dentries being hashed.

When an ecryptfs mount is mounted, it associates the lower file and dentry
with the ecryptfs files as they're accessed. When it's umounted and
destroys all the in-memory ecryptfs inodes, it fput's the lower_files and
d_drop's the lower_dentries. Commit 4981e081 added this and a d_delete in
2008 and several months later commit caeeeecf removed the d_delete. I
believe the d_drop() needs to be removed as well.

The d_drop effectively hides any file that has been accessed via ecryptfs
from the underlying tmpfs since it depends on it being hashed for it to
be accessible. I've removed the d_drop on my development node and see no
ill effects with basic testing on both tmpfs and persistent storage.

As a side effect, after ecryptfs d_drops the dentries on tmpfs, tmpfs
BUGs on umount. This is due to the dentries being unhashed.
tmpfs->kill_sb is kill_litter_super which calls d_genocide to drop
the reference pinning the dentry. It skips unhashed and negative dentries,
but shrink_dcache_for_umount_subtree doesn't. Since those dentries
still have an elevated d_count, we get a BUG().

This patch removes the d_drop call and fixes both issues.

This issue was reported at:
https://bugzilla.novell.com/show_bug.cgi?id=567887

Reported-by: Árpád Bíró <biroa@demasz.hu>
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>

ecryptfs: fix error code for missing xattrs in lower fs

If the lower file system driver has extended attributes disabled,
ecryptfs' own access functions return -ENOSYS instead of -EOPNOTSUPP.
This breaks execution of programs in the ecryptfs mount, since the
kernel expects the latter error when checking for security
capabilities in xattrs.

Signed-off-by: Christian Pulvermacher <pulvermacher@gmx.de>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>

eCryptfs: Decrypt symlink target for stat size

Create a getattr handler for eCryptfs symlinks that is capable of
reading the lower target and decrypting its path. Prior to this patch,
a stat's st_size field would represent the strlen of the encrypted path,
while readlink() would return the strlen of the decrypted path. This
could lead to confusion in some userspace applications, since the two
values should be equal.

https://bugs.launchpad.net/bugs/524919

Reported-by: Loïc Minier <loic.minier@canonical.com>
Cc: stable@kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>

Fix ISDN/Gigaset build failure

Commit b91ecb00 ("gigaset: include cleanup cleanup") removed an implicit
sched.h inclusion that came in via slab.h, and caused various compile
problems as a result.

This should fix it.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  rcu: Make RCU lockdep check the lockdep_recursion variable
  rcu: Update docs for rcu_access_pointer and rcu_dereference_protected
  rcu: Better explain the condition parameter of rcu_dereference_check()
  rcu: Add rcu_access_pointer and rcu_dereference_protected

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  gigaset: include cleanup cleanup
  packet : remove init_net restriction
  WAN: flush tx_queue in hdlc_ppp to prevent panic on rmmod hw_driver.
  ip: Fix ip_dev_loopback_xmit()
  net: dev_pick_tx() fix
  fib: suppress lockdep-RCU false positive in FIB trie.
  tun: orphan an skb on tx
  forcedeth: fix tx limit2 flag check
  iwlwifi: work around bogus active chains detection