Gleb Natapov [Thu, 18 Mar 2010 13:20:24 +0000 (15:20 +0200)]
KVM: x86 emulator: Move string pio emulation into emulator.c
Currently emulation is done outside of emulator so things like doing
ins/outs to/from mmio are broken it also makes it hard (if not impossible)
to implement single stepping in the future. The implementation in this
patch is not efficient since it exits to userspace for each IO while
previous implementation did 'ins' in batches. Further patch that
implements pio in string read ahead address this problem.
Gleb Natapov [Thu, 18 Mar 2010 13:20:23 +0000 (15:20 +0200)]
KVM: x86 emulator: fix in/out emulation.
in/out emulation is broken now. The breakage is different depending
on where IO device resides. If it is in userspace emulator reports
emulation failure since it incorrectly interprets kvm_emulate_pio()
return value. If IO device is in the kernel emulation of 'in' will do
nothing since kvm_emulate_pio() stores result directly into vcpu
registers, so emulator will overwrite result of emulation during
commit of shadowed register.
Gleb Natapov [Thu, 18 Mar 2010 13:20:21 +0000 (15:20 +0200)]
KVM: x86 emulator: add decoding of X,Y parameters from Intel SDM
Add decoding of X,Y parameters from Intel SDM which are used by string
instruction to specify source and destination. Use this new decoding
to implement movs, cmps, stos, lods in a generic way.
Gleb Natapov [Thu, 18 Mar 2010 13:20:20 +0000 (15:20 +0200)]
KVM: x86 emulator: populate OP_MEM operand during decoding.
All struct operand fields are initialized during decoding for all
operand types except OP_MEM, but there is no reason for that. Move
OP_MEM operand initialization into decoding stage for consistency.
Gleb Natapov [Thu, 18 Mar 2010 13:20:09 +0000 (15:20 +0200)]
KVM: x86 emulator: 0f (20|21|22|23) ignore mod bits.
Resent spec says that for 0f (20|21|22|23) the 2 bits in the mod field
are ignored. Interestingly enough older spec says that 11 is only valid
encoding.
Gleb Natapov [Thu, 18 Mar 2010 13:20:03 +0000 (15:20 +0200)]
KVM: Provide callback to get/set control registers in emulator ops.
Use this callback instead of directly call kvm function. Also rename
realmode_(set|get)_cr to emulator_(set|get)_cr since function has nothing
to do with real mode.
kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced
mmio ring page and dev even after it has freed them.
Also, if this function fails, though it might be rare, it seems to be
suggesting the system's serious state: so we'd better stop the works
following the kvm_creat_vm().
This patch clears these problems.
We move the coalesced mmio's initialization out of kvm_create_vm().
This seems to be natural because it includes a registration which
can be done only when vm is successfully created.
Avi Kivity [Mon, 15 Mar 2010 11:59:57 +0000 (13:59 +0200)]
KVM: MMU: Reinstate pte prefetch on invlpg
Commit fb341f57 removed the pte prefetch on guest invlpg, citing guest races.
However, the SDM is adamant that prefetch is allowed:
"The processor may create entries in paging-structure caches for
translations required for prefetches and for accesses that are a
result of speculative execution that would never actually occur
in the executed code path."
And, in fact, there was a race in the prefetch code: we picked up the pte
without the mmu lock held, so an older invlpg could install the pte over
a newer invlpg.
Reinstate the prefetch logic, but this time note whether another invlpg has
executed using a counter. If a race occured, do not install the pte.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:56 +0000 (13:59 +0200)]
KVM: MMU: Do not instantiate nontrapping spte on unsync page
The update_pte() path currently uses a nontrapping spte when a nonpresent
(or nonaccessed) gpte is written. This is fine since at present it is only
used on sync pages. However, on an unsync page this will cause an endless
fault loop as the guest is under no obligation to invlpg a gpte that
transitions from nonpresent to present.
Needed for the next patch which reinstates update_pte() on invlpg.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:55 +0000 (13:59 +0200)]
KVM: Don't follow an atomic operation by a non-atomic one
Currently emulated atomic operations are immediately followed by a non-atomic
operation, so that kvm_mmu_pte_write() can be invoked. This updates the mmu
but undoes the whole point of doing things atomically.
Fix by only performing the atomic operation and the mmu update, and avoiding
the non-atomic write.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:54 +0000 (13:59 +0200)]
KVM: Make locked operations truly atomic
Once upon a time, locked operations were emulated while holding the mmu mutex.
Since mmu pages were write protected, it was safe to emulate the writes in
a non-atomic manner, since there could be no other writer, either in the
guest or in the kernel.
These days emulation takes place without holding the mmu spinlock, so the
write could be preempted by an unshadowing event, which exposes the page
to writes by the guest. This may cause corruption of guest page tables.
Fix by using an atomic cmpxchg for these operations.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Avi Kivity [Mon, 15 Mar 2010 11:59:53 +0000 (13:59 +0200)]
KVM: MMU: Consolidate two guest pte reads in kvm_mmu_pte_write()
kvm_mmu_pte_write() reads guest ptes in two different occasions, both to
allow a 32-bit pae guest to update a pte with 4-byte writes. Consolidate
these into a single read, which also allows us to consolidate another read
from an invlpg speculating a gpte into the shadow page table.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Wei Yongjun [Fri, 12 Mar 2010 02:11:15 +0000 (10:11 +0800)]
KVM: ia64: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Wei Yongjun [Fri, 12 Mar 2010 02:09:45 +0000 (10:09 +0800)]
KVM: x86: fix the error of ioctl KVM_IRQ_LINE if no irq chip
If no irq chip in kernel, ioctl KVM_IRQ_LINE will return -EFAULT.
But I see in other place such as KVM_[GET|SET]IRQCHIP, -ENXIO is
return. So this patch used -ENXIO instead of -EFAULT.
Wei Yongjun [Fri, 12 Mar 2010 00:45:39 +0000 (08:45 +0800)]
KVM: ia64: fix the error code of ioctl KVM_IA64_VCPU_GET_STACK failure
The ioctl KVM_IA64_VCPU_GET_STACK does not set the error code if
copy_to_user() fail, and 0 will be return, we should use -EFAULT
instead of 0 in this case, so this patch fixed it.
Joerg Roedel [Mon, 1 Mar 2010 14:34:39 +0000 (15:34 +0100)]
KVM; SVM: Add correct handling of nested iopm
This patch adds the correct handling of the nested io
permission bitmap. Old behavior was to not lookup the port
in the iopm but only reinject an io intercept to the guest.
Joerg Roedel [Mon, 1 Mar 2010 14:34:38 +0000 (15:34 +0100)]
KVM: SVM: Use svm_msrpm_offset in nested_svm_exit_handled_msr
There is a generic function now to calculate msrpm offsets.
Use that function in nested_svm_exit_handled_msr() remove
the duplicate logic (which had a bug anyway).
Joerg Roedel [Mon, 1 Mar 2010 14:34:37 +0000 (15:34 +0100)]
KVM: SVM: Optimize nested svm msrpm merging
This patch optimizes the way the msrpm of the host and the
guest are merged. The old code merged the 2 msrpm pages
completly. This code needed to touch 24kb of memory for that
operation. The optimized variant this patch introduces
merges only the parts where the host msrpm may contain zero
bits. This reduces the amount of memory which is touched to
48 bytes.
Joerg Roedel [Mon, 1 Mar 2010 14:34:36 +0000 (15:34 +0100)]
KVM: SVM: Introduce direct access msr list
This patch introduces a list with all msrs a guest might
have direct access to and changes the svm_vcpu_init_msrpm
function to use this list.
It also adds a check to set_msr_interception which triggers
a warning if a developer changes a msr intercept that is not
in the list.
Joerg Roedel [Mon, 1 Mar 2010 14:34:34 +0000 (15:34 +0100)]
KVM: SVM: Return correct values in nested_svm_exit_handled_msr
The nested_svm_exit_handled_msr() returned an bool which is
a bug. I worked by accident because the exected integer
return values match with the true and false values. This
patch changes the return value to int and let the function
return the correct values.
Joerg Roedel [Wed, 24 Feb 2010 17:59:19 +0000 (18:59 +0100)]
KVM: SVM: Clear exit_info for injected INTR exits
When injecting an vmexit.intr into the nested hypervisor
there might be leftover values in the exit_info fields.
Clear them to not confuse nested hypervisors.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Then we get an cr0 write intercept which is handled on the
host. But that intercepts may actually be a selective cr0
intercept for the guest. This patch checks for this
condition and injects a selective cr0 intercept if needed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Wed, 24 Feb 2010 17:59:17 +0000 (18:59 +0100)]
KVM: x86: Don't set arch.cr0 in kvm_set_cr0
The vcpu->arch.cr0 variable is already set in the
architecture specific set_cr0 callbacks. There is no need to
set it in the common code.
This allows the architecture code to keep the old arch.cr0
value if it wants. This is required for nested svm to decide
if a selective_cr0 exit needs to be injected.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:58 +0000 (17:47 +0100)]
KVM: x86: Drop RF manipulation for guest single-stepping
RF is not required for injecting TF as the latter will trigger only
after an instruction execution anyway. So do not touch RF when arming or
disarming guest single-step mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:56 +0000 (17:47 +0100)]
KVM: SVM: Emulate nRIP feature when reinjecting INT3
When in guest debugging mode, we have to reinject those #BP software
exceptions that are caused by guest-injected INT3. As older AMD
processors do not support the required nRIP VMCB field, try to emulate
it by moving RIP past the instruction on exception injection. Fix it up
again in case the injection failed and we were able to catch this. This
does not work for unintercepted faults, but it is better than doing
nothing.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Tue, 23 Feb 2010 16:47:55 +0000 (17:47 +0100)]
KVM: x86: Add kvm_is_linear_rip
Based on Gleb's suggestion: Add a helper kvm_is_linear_rip that matches
a given linear RIP against the current one. Use this for guest
single-stepping, more users will follow.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 22 Feb 2010 15:52:14 +0000 (16:52 +0100)]
KVM: PPC: Destory timer on vcpu destruction
When we destory a vcpu, we should also make sure to kill all pending
timers that could still be up. When not doing this, hrtimers might
dereference null pointers trying to call our code.
This patch fixes spontanious kernel panics seen after closing VMs.
Signed-off-by: Alexander Graf <alex@csgraf.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Fri, 19 Feb 2010 18:38:07 +0000 (19:38 +0100)]
KVM: x86: Save&restore interrupt shadow mask
The interrupt shadow created by STI or MOV-SS-like operations is part of
the VCPU state and must be preserved across migration. Transfer it in
the spare padding field of kvm_vcpu_events.interrupt.
As a side effect we now have to make vmx_set_interrupt_shadow robust
against both shadow types being set. Give MOV SS a higher priority and
skip STI in that case to avoid that VMX throws a fault on next entry.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Feb 2010 09:45:41 +0000 (10:45 +0100)]
KVM: x86: Do not return soft events in vcpu_events
To avoid that user space migrates a pending software exception or
interrupt, mask them out on KVM_GET_VCPU_EVENTS. Without this, user
space would try to reinject them, and we would have to reconstruct the
proper instruction length for VMX event injection. Now the pending event
will be reinjected via executing the triggering instruction again.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Joerg Roedel [Fri, 19 Feb 2010 15:23:01 +0000 (16:23 +0100)]
KVM: SVM: Fix wrong interrupt injection in enable_irq_windows
The nested_svm_intr() function does not execute the vmexit
anymore. Therefore we may still be in the nested state after
that function ran. This patch changes the nested_svm_intr()
function to return wether the irq window could be enabled.
Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 11:24:33 +0000 (12:24 +0100)]
KVM: PPC: Allocate vcpu struct using vmalloc
We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.
Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:47 +0000 (11:00 +0100)]
KVM: PPC: Simplify kvmppc_load_up_(FPU|VMX|VSX)
We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.
So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.
Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:46 +0000 (11:00 +0100)]
KVM: PPC: Enable use of secondary htab bucket
We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.
I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.
This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:44 +0000 (11:00 +0100)]
KVM: PPC: Implement Paired Single emulation
The one big thing about the Gekko is paired singles.
Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.
Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.
Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.
In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.
That way we can execute paired single FPU operations without implementing a
soft fpu.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:43 +0000 (11:00 +0100)]
KVM: PPC: Enable program interrupt to do MMIO
When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.
So let's teach the program interrupt handler how to deal with EMULATE_MMIO.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:42 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to modify ppc fields
The PowerPC specification always lists bits from MSB to LSB. That is
really confusing when you're trying to write C code, because it fits
in pretty badly with the normal (1 << xx) schemes.
So I came up with some nice wrappers that allow to get and set fields
in a u64 with bit numbers exactly as given in the spec. That makes the
code in KVM and the spec easier comparable.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:41 +0000 (11:00 +0100)]
KVM: PPC: Fix error in BAT assignment
BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.
Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:40 +0000 (11:00 +0100)]
KVM: PPC: Add helpers to call FPU instructions
To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.
Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:38 +0000 (11:00 +0100)]
KVM: PPC: Make software load/store return eaddr
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.
Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:37 +0000 (11:00 +0100)]
KVM: PPC: Implement mtsr instruction emulation
The Book3S_32 specifications allows for two instructions to modify segment
registers: mtsrin and mtsr.
Most normal operating systems use mtsrin, because it allows to define which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.
So let's also emulate mtsr. It's a valid instruction after all.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:36 +0000 (11:00 +0100)]
KVM: PPC: Fix typo in book3s_32 debug code
There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.
So let's fix the ifdef.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:35 +0000 (11:00 +0100)]
KVM: PPC: Preload FPU when possible
There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.
This patch adds preloading of the FPU when it's reasonable.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Fri, 19 Feb 2010 10:00:34 +0000 (11:00 +0100)]
KVM: PPC: Combine extension interrupt handlers
When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.
The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.
So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>