Xiao Guangrong [Mon, 15 Jun 2015 08:55:31 +0000 (16:55 +0800)]
KVM: MTRR: sort variable MTRRs
Sort all valid variable MTRRs based on its base address, it will help us to
check a range to see if it's fully contained in variable MTRRs
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Fix list insertion sort, simplify var_mtrr_range_is_valid to just
test the V bit. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Mon, 15 Jun 2015 08:55:29 +0000 (16:55 +0800)]
KVM: MTRR: introduce fixed_mtrr_segment table
This table summarizes the information of fixed MTRRs and introduce some APIs
to abstract its operation which helps us to clean up the code and will be
used in later patches
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[Change range_size to range_shift, in order to avoid udivdi3 errors.
- Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Mon, 15 Jun 2015 08:55:22 +0000 (16:55 +0800)]
KVM: x86: move MTRR related code to a separate file
MTRR code locates in x86.c and mmu.c so that move them to a separate file to
make the organization more clearer and it will be the place where we fully
implement vMTRR
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Bandan Das [Thu, 11 Jun 2015 06:05:33 +0000 (02:05 -0400)]
KVM: nSVM: Check for NRIPS support before updating control field
If hardware doesn't support DecodeAssist - a feature that provides
more information about the intercept in the VMCB, KVM decodes the
instruction and then updates the next_rip vmcb control field.
However, NRIP support itself depends on cpuid Fn8000_000A_EDX[NRIPS].
Since skip_emulated_instruction() doesn't verify nrip support
before accepting control.next_rip as valid, avoid writing this
field if support isn't present.
Signed-off-by: Bandan Das <bsd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch fixes this issue by breaking up the allocation of
the table and its entries into individual kzalloc calls.
These could all be satisfied with order-0 allocations, which
are less likely to fail.
The downside of this change is the lower performance, because
of more calls to kzalloc. But given how often kvm_set_irq_routing
is called in the lifetime of a guest, it doesn't really
matter much.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
[Avoid sparse warning through rcu_access_pointer. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Back in the days, vgic.c used to have an intimate knowledge of
the actual GICv2. These days, this has been abstracted away into
hardware-specific backends.
Remove the now useless arm-gic.h #include directive, making it
clear that GICv2 specific code doesn't belong here.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit fd1d0ddf2ae9 (KVM: arm/arm64: check IRQ number on userland
injection) rightly limited the range of interrupts userspace can
inject in a guest, but failed to consider the (unlikely) case where
a guest is configured with 1024 interrupts.
In this case, interrupts ranging from 1020 to 1023 are unuseable,
as they have a special meaning for the GIC CPU interface.
Make sure that these number cannot be used as an IRQ. Also delete
a redundant (and similarily buggy) check in kvm_set_irq.
Reported-by: Peter Maydell <peter.maydell@linaro.org> Cc: Andre Przywara <andre.przywara@arm.com> Cc: <stable@vger.kernel.org> # 4.1, 4.0, 3.19, 3.18 Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Thu, 11 Jun 2015 17:50:17 +0000 (18:50 +0100)]
arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2
The GIC Hypervisor Configuration Register is used to enable
the delivery of virtual interupts to a guest, as well as to
define in which conditions maintenance interrupts are delivered
to the host.
This register doesn't contain any information that we need to
read back (the EOIcount is utterly useless for us).
So let's save ourselves some cycles, and not save it before
writing zero to it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
ARM: kvm: psci: fix handling of unimplemented functions
According to the PSCI specification and the SMC/HVC calling
convention, PSCI function_ids that are not implemented must
return NOT_SUPPORTED as return value.
Current KVM implementation takes an unhandled PSCI function_id
as an error and injects an undefined instruction into the guest
if PSCI implementation is called with a function_id that is not
handled by the resident PSCI version (ie it is not implemented),
which is not the behaviour expected by a guest when calling a
PSCI function_id that is not implemented.
This patch fixes this issue by returning NOT_SUPPORTED whenever
the kvm PSCI call is executed for a function_id that is not
implemented by the PSCI kvm layer.
Cc: <stable@vger.kernel.org> # 3.18+ Cc: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Alex Bennée [Thu, 4 Jun 2015 13:28:37 +0000 (14:28 +0100)]
KVM: arm64: fix misleading comments in save/restore
The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into EL2. In the case of guest state it could be in either
el0 or el1.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Kim Phillips [Fri, 5 Jun 2015 15:21:49 +0000 (16:21 +0100)]
KVM: arm/arm64: Enable the KVM-VFIO device
The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org> Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Christoffer Dall [Thu, 28 May 2015 18:49:10 +0000 (19:49 +0100)]
arm/arm64: KVM: Properly account for guest CPU time
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.
Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled. We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.
At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Marc Zyngier [Mon, 16 Mar 2015 10:59:43 +0000 (10:59 +0000)]
arm: KVM: force execution of HCPTR access on VM exit
On VM entry, we disable access to the VFP registers in order to
perform a lazy save/restore of these registers.
On VM exit, we restore access, test if we did enable them before,
and save/restore the guest/host registers if necessary. In this
sequence, the FPEXC register is always accessed, irrespective
of the trapping configuration.
If the guest didn't touch the VFP registers, then the HCPTR access
has now enabled such access, but we're missing a barrier to ensure
architectural execution of the new HCPTR configuration. If the HCPTR
access has been delayed/reordered, the subsequent access to FPEXC
will cause a trap, which we aren't prepared to handle at all.
The same condition exists when trapping to enable VFP for the guest.
The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only takes place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.
The set_hcptr macro is modified to deal with both vmenter/vmexit and
vmtrap operations, and now takes an optional label that is branched to
when the guest hasn't touched the VFP registers.
Andre Przywara [Thu, 23 Apr 2015 19:01:53 +0000 (20:01 +0100)]
KVM: arm64: add active register handling to GICv3 emulation as well
Commit 47a98b15ba7c ("arm/arm64: KVM: support for un-queuing active
IRQs") introduced handling of the GICD_I[SC]ACTIVER registers,
but only for the GICv2 emulation. For the sake of completeness and
as this is a pre-requisite for save/restore of the GICv3 distributor
state, we should also emulate their handling in the distributor and
redistributor frames of an emulated GICv3.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Suggested-by: Bandan Das <bsd@redhat.com> Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 1 Apr 2015 12:25:33 +0000 (14:25 +0200)]
KVM: x86: advertise KVM_CAP_X86_SMM
... and we're done. :)
Because SMBASE is usually relocated above 1M on modern chipsets, and
SMM handlers might indeed rely on 4G segment limits, we only expose it
if KVM is able to run the guest in big real mode. This includes any
of VMX+emulate_invalid_guest_state, VMX+unrestricted_guest, or SVM.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 18 May 2015 13:03:39 +0000 (15:03 +0200)]
KVM: x86: add SMM to the MMU role, support SMRAM address space
This is now very simple to do. The only interesting part is a simple
trick to find the right memslot in gfn_to_rmap, retrieving the address
space from the spte role word. The same trick is used in the auditing
code.
The comment on top of union kvm_mmu_page_role has been stale forever,
so remove it. Speaking of stale code, remove pad_for_nice_hex_output
too: it was splitting the "access" bitfield across two bytes and thus
had effectively turned into pad_for_ugly_hex_output.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 18 May 2015 11:33:16 +0000 (13:33 +0200)]
KVM: x86: work on all available address spaces
This patch has no semantic change, but it prepares for the introduction
of a second address space for system management mode.
A new function x86_set_memory_region (and the "slots_lock taken"
counterpart __x86_set_memory_region) is introduced in order to
operate on all address spaces when adding or deleting private
memory slots.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 8 Apr 2015 13:39:23 +0000 (15:39 +0200)]
KVM: x86: use vcpu-specific functions to read/write/translate GFNs
We need to hide SMRAM from guests not running in SMM. Therefore,
all uses of kvm_read_guest* and kvm_write_guest* must be changed to
check whether the VCPU is in system management mode and use a
different set of memslots. Switch from kvm_* to the newly-introduced
kvm_vcpu_*, which call into kvm_arch_vcpu_memslots_id.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Sun, 17 May 2015 11:58:53 +0000 (13:58 +0200)]
KVM: add vcpu-specific functions to read/write/translate GFNs
We need to hide SMRAM from guests not running in SMM. Therefore, all
uses of kvm_read_guest* and kvm_write_guest* must be changed to use
different address spaces, depending on whether the VCPU is in system
management mode. We need to introduce a new family of functions for
this purpose.
For now, the VCPU-based functions have the same behavior as the
existing per-VM ones, they just accept a different type for the
first argument. Later however they will be changed to use one of many
"struct kvm_memslots" stored in struct kvm, through an architecture hook.
VM-based functions will unconditionally use the first memslots pointer.
Whenever possible, this patch introduces slot-based functions with an
__ prefix, with two wrappers for generic and vcpu-based actions.
The exceptions are kvm_read_guest and kvm_write_guest, which are copied
into the new functions kvm_vcpu_read_guest and kvm_vcpu_write_guest.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 5 May 2015 09:50:23 +0000 (11:50 +0200)]
KVM: x86: save/load state on SMM switch
The big ugly one. This patch adds support for switching in and out of
system management mode, respectively upon receiving KVM_REQ_SMI and upon
executing a RSM instruction. Both 32- and 64-bit formats are supported
for the SMM state save area.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 4 Jun 2015 08:41:21 +0000 (10:41 +0200)]
KVM: x86: latch INITs while in system management mode
Do not process INITs immediately while in system management mode, keep
it instead in apic->pending_events. Tell userspace if an INIT is
pending when they issue GET_VCPU_EVENTS, and similarly handle the
new field in SET_VCPU_EVENTS.
Note that the same treatment should be done while in VMX non-root mode.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 7 May 2015 09:36:11 +0000 (11:36 +0200)]
KVM: x86: stubs for SMM support
This patch adds the interface between x86.c and the emulator: the
SMBASE register, a new emulator flag, the RSM instruction. It also
adds a new request bit that will be used by the KVM_SMI ioctl.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 1 Apr 2015 13:06:40 +0000 (15:06 +0200)]
KVM: x86: API changes for SMM support
This patch includes changes to the external API for SMM support.
Userspace can predicate the availability of the new fields and
ioctls on a new capability, KVM_CAP_X86_SMM, which is added at the end
of the patch series.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 1 Apr 2015 16:18:53 +0000 (18:18 +0200)]
KVM: x86: pass the whole hflags field to emulator and back
The hflags field will contain information about system management mode
and will be useful for the emulator. Pass the entire field rather than
just the guest-mode information.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Wed, 8 Apr 2015 13:30:38 +0000 (15:30 +0200)]
KVM: x86: pass host_initiated to functions that read MSRs
SMBASE is only readable from SMM for the VCPU, but it must be always
accessible if userspace is accessing it. Thus, all functions that
read MSRs are changed to accept a struct msr_data; the host_initiated
and index fields are pre-initialized, while the data field is filled
on return.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 5 May 2015 10:08:55 +0000 (12:08 +0200)]
KVM: x86: introduce num_emulated_msrs
We will want to filter away MSR_IA32_SMBASE from the emulated_msrs if
the host CPU does not support SMM virtualization. Introduce the
logic to do that, and also move paravirt MSRs to emulated_msrs for
simplicity and to get rid of KVM_SAVE_MSRS_BEGIN.
Reviewed-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 4 Jun 2015 07:51:50 +0000 (09:51 +0200)]
kvm: x86: default legacy PCI device assignment support to "n"
VFIO has proved itself a much better option than KVM's built-in
device assignment. It is mature, provides better isolation because
it enforces ACS, and even the userspace code is being tested on
a wider variety of hardware these days than the legacy support.
Disable legacy device assignment by default.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Let's remove "kvm-s390" from our printk messages and make use
of pr_fmt instead.
Also replace one printk() occurrence by a equivalent pr_warn
on the way.
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: call exit_sie() directly on vcpu block/request
Thinking about it, I can't find a real use case where we want
to block a VCPU and not kick it out of SIE. (except if we want
to do the same in batch for multiple VCPUs - but that's a micro
optimization)
So let's simply perform the exit_sie() calls directly when setting
the other magic block bits in the SIE.
Otherwise e.g. kvm_s390_set_tod_low() still has other VCPUs running
after that call, working with a wrong epoch.
Fixes: 27406cd50c ("KVM: s390: provide functions for blocking all CPUs") Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.
Marcelo Tosatti [Thu, 28 May 2015 23:20:39 +0000 (20:20 -0300)]
x86: kvmclock: add flag to indicate pvclock counts from zero
Setting sched clock stable for kvmclock causes the printk timestamps
to not start from zero, which is different from baremetal and
can possibly break userspace. Add a flag to indicate that
hypervisor sets clock base at zero when kvmclock is initialized.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Andrew Morton [Wed, 27 May 2015 09:53:06 +0000 (11:53 +0200)]
arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug
arch/x86/kvm/mmu.c: In function 'kvm_mmu_pte_write':
arch/x86/kvm/mmu.c:4256: error: unknown field 'cr0_wp' specified in initializer
arch/x86/kvm/mmu.c:4257: error: unknown field 'cr4_pae' specified in initializer
arch/x86/kvm/mmu.c:4257: warning: excess elements in union initializer
...
gcc-4.4.4 (at least) has issues when using anonymous unions in
initializers.
Fixes: edc90b7dc4ceef6 ("KVM: MMU: fix SMAP virtualization") Cc: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Radim Krčmář [Fri, 22 May 2015 16:45:11 +0000 (18:45 +0200)]
KVM: x86: use correct APIC ID on x2APIC transition
SDM April 2015, 10.12.5 State Changes From xAPIC Mode to x2APIC Mode
• Any APIC ID value written to the memory-mapped local APIC ID register
is not preserved.
Fix it by sourcing vcpu_id (= initial APIC ID) instead of memory-mapped
APIC ID. Proper use of apic functions would result in two calls to
recalculate_apic_map(), so this patch makes a new helper.
Signed-off-by: Radim KrÄ\8dmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 19 May 2015 14:29:22 +0000 (16:29 +0200)]
KVM: x86: pass struct kvm_mmu_page to account/unaccount_shadowed
Prepare for multiple address spaces this way, since a VCPU is not available
where unaccount_shadowed is called. We will get to the right kvm_memslots
struct through the role field in struct kvm_mmu_page.
Paolo Bonzini [Tue, 19 May 2015 14:09:04 +0000 (16:09 +0200)]
KVM: remove __gfn_to_pfn
Most of the function that wrap it can be rewritten without it, except
for gfn_to_pfn_prot. Just inline it into gfn_to_pfn_prot, and rewrite
the other function on top of gfn_to_pfn_memslot*.
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Tue, 19 May 2015 14:01:50 +0000 (16:01 +0200)]
KVM: pass kvm_memory_slot to gfn_to_page_many_atomic
The memory slot is already available from gfn_to_memslot_dirty_bitmap.
Isn't it a shame to look it up again? Plus, it makes gfn_to_page_many_atomic
agnostic of multiple VCPU address spaces.
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 18 May 2015 11:20:23 +0000 (13:20 +0200)]
KVM: add "new" argument to kvm_arch_commit_memory_region
This lets the function access the new memory slot without going through
kvm_memslots and id_to_memslot. It will simplify the code when more
than one address space will be supported.
Unfortunately, the "const"ness of the new argument must be casted
away in two places. Fixing KVM to accept const struct kvm_memory_slot
pointers would require modifications in pretty much all architectures,
and is left for later.
Reviewed-by: Radim Krcmar <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Mon, 18 May 2015 11:59:39 +0000 (13:59 +0200)]
KVM: const-ify uses of struct kvm_userspace_memory_region
Architecture-specific helpers are not supposed to muck with
struct kvm_userspace_memory_region contents. Add const to
enforce this.
In order to eliminate the only write in __kvm_set_memory_region,
the cleaning of deleted slots is pulled up from update_memslots
to __kvm_set_memory_region.
Paolo Bonzini [Sun, 17 May 2015 09:41:37 +0000 (11:41 +0200)]
KVM: introduce kvm_alloc/free_memslots
kvm_alloc_memslots is extracted out of previously scattered code
that was in kvm_init_memslots_id and kvm_create_vm.
kvm_free_memslot and kvm_free_memslots are new names of
kvm_free_physmem and kvm_free_physmem_slot, but they also take
an explicit pointer to struct kvm_memslots.
This will simplify the transition to multiple address spaces,
each represented by one pointer to struct kvm_memslots.
Liang Li [Wed, 20 May 2015 20:41:25 +0000 (04:41 +0800)]
kvm/fpu: Enable eager restore kvm FPU for MPX
The MPX feature requires eager KVM FPU restore support. We have verified
that MPX cannot work correctly with the current lazy KVM FPU restore
mechanism. Eager KVM FPU restore should be enabled if the MPX feature is
exposed to VM.
Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Liang Li <liang.z.li@intel.com>
[Also activate the FPU on AMD processors. - Paolo] Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kvm: fix crash in kvm_vcpu_reload_apic_access_page
memslot->userfault_addr is set by the kernel with a mmap executed
from the kernel but the userland can still munmap it and lead to the
below oops after memslot->userfault_addr points to a host virtual
address that has no vma or mapping.
Nicholas Krause [Wed, 20 May 2015 04:24:10 +0000 (00:24 -0400)]
kvm: x86: Make functions that have no external callers static
This makes the functions kvm_guest_cpu_init and kvm_init_debugfs
static now due to having no external callers outside their
declarations in the file, kvm.c.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 2 Apr 2015 09:20:48 +0000 (11:20 +0200)]
KVM: export __gfn_to_pfn_memslot, drop gfn_to_pfn_async
gfn_to_pfn_async is used in just one place, and because of x86-specific
treatment that place will need to look at the memory slot. Hence inline
it into try_async_pf and export __gfn_to_pfn_memslot.
The patch also switches the subsequent call to gfn_to_pfn_prot to use
__gfn_to_pfn_memslot. This is a small optimization. Finally, remove
the now-unused async argument of __gfn_to_pfn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Wed, 13 May 2015 06:42:27 +0000 (14:42 +0800)]
KVM: MMU: fix MTRR update
Currently, whenever guest MTRR registers are changed
kvm_mmu_reset_context is called to switch to the new root shadow page
table, however, it's useless since:
1) the cache type is not cached into shadow page's attribute so that
the original root shadow page will be reused
2) the cache type is set on the last spte, that means we should sync
the last sptes when MTRR is changed
This patch fixs this issue by drop all the spte in the gfn range which
is being updated by MTRR
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Wed, 13 May 2015 06:42:19 +0000 (14:42 +0800)]
KVM: MMU: fix decoding cache type from MTRR
There are some bugs in current get_mtrr_type();
1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
that means other bits are ignored if it is cleared
2: the fixed MTRR ranges are controlled by bit 0 of
mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
3: if MTRR is disabled, UC is applied to all of physical memory rather
than mtrr_state->def_type
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Mon, 11 May 2015 14:55:21 +0000 (22:55 +0800)]
KVM: MMU: fix SMAP virtualization
KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page
Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Tue, 28 Apr 2015 10:06:01 +0000 (13:06 +0300)]
KVM: x86: Fix zero iterations REP-string
When a REP-string is executed in 64-bit mode with an address-size prefix,
ECX/EDI/ESI are used as counter and pointers. When ECX is initially zero, Intel
CPUs clear the high 32-bits of RCX, and recent Intel CPUs update the high bits
of the pointers in MOVS/STOS. This behavior is specific to Intel according to
few experiments.
As one may guess, this is an undocumented behavior. Yet, it is observable in
the guest, since at least VMX traps REP-INS/OUTS even when ECX=0. Note that
VMware appears to get it right. The behavior can be observed using the
following code:
Nadav Amit [Tue, 28 Apr 2015 10:06:00 +0000 (13:06 +0300)]
KVM: x86: Fix update RCX/RDI/RSI on REP-string
When REP-string instruction is preceded with an address-size prefix,
ECX/EDI/ESI are used as the operation counter and pointers. When they are
updated, the high 32-bits of RCX/RDI/RSI are cleared, similarly to the way they
are updated on every 32-bit register operation. Fix it.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Nadav Amit [Sun, 19 Apr 2015 18:12:59 +0000 (21:12 +0300)]
KVM: x86: Fix DR7 mask on task-switch while debugging
If the host sets hardware breakpoints to debug the guest, and a task-switch
occurs in the guest, the architectural DR7 will not be updated. The effective
DR7 would be updated instead.
This fix puts the DR7 update during task-switch emulation, so it now uses the
standard DR setting mechanism instead of the one that was previously used. As a
bonus, the update of DR7 will now be effective for AMD as well.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Mon, 11 May 2015 14:55:21 +0000 (22:55 +0800)]
KVM: MMU: fix SMAP virtualization
KVM may turn a user page to a kernel page when kernel writes a readonly
user page if CR0.WP = 1. This shadow page entry will be reused after
SMAP is enabled so that kernel is allowed to access this user page
Fix it by setting SMAP && !CR0.WP into shadow page's role and reset mmu
once CR4.SMAP is updated
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 2 Apr 2015 09:04:05 +0000 (11:04 +0200)]
KVM: MMU: fix CR4.SMEP=1, CR0.WP=0 with shadow pages
smep_andnot_wp is initialized in kvm_init_shadow_mmu and shadow pages
should not be reused for different values of it. Thus, it has to be
added to the mask in kvm_mmu_pte_write.
Reviewed-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Xiao Guangrong [Thu, 7 May 2015 08:20:15 +0000 (16:20 +0800)]
KVM: MMU: fix smap permission check
Current permission check assumes that RSVD bit in PFEC is always zero,
however, it is not true since MMIO #PF will use it to quickly identify
MMIO access
Fix it by clearing the bit if walking guest page table is needed
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paul Mackerras [Wed, 29 Apr 2015 04:49:07 +0000 (14:49 +1000)]
KVM: PPC: Book3S HV: Fix list traversal in error case
This fixes a regression introduced in commit 25fedfca94cf, "KVM: PPC:
Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu", which
leads to a user-triggerable oops.
In the case where we try to run a vcore on a physical core that is
not in single-threaded mode, or the vcore has too many threads for
the physical core, we iterate the list of runnable vcpus to make
each one return an EBUSY error to userspace. Since this involves
taking each vcpu off the runnable_threads list for the vcore, we
need to use list_for_each_entry_safe rather than list_for_each_entry
to traverse the list. Otherwise the kernel will crash with an oops
message like this:
Fixes: 25fedfca94cfbf2461314c6c34ef58e74a31b025 Signed-off-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Alexander Graf <agraf@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Our implementation will never trigger interception code 12 as the
responsible setting is never enabled - and never will be.
The handler is dead code. Let's get rid of it.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: factor out and optimize floating irq VCPU kick
This patch factors out the search for a floating irq destination
VCPU as well as the kicking of the found VCPU. The search is optimized
in the following ways:
1. stopped VCPUs can't take any floating interrupts, so try to find an
operating one. We have to take care of the special case where all
VCPUs are stopped and we don't have any valid destination.
2. use online_vcpus, not KVM_MAX_VCPU. This speeds up the search
especially if KVM_MAX_VCPU is increased one day. As these VCPU
objects are initialized prior to increasing online_vcpus, we can be
sure that they exist.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
KVM: s390: optimize interrupt handling round trip time
We can avoid checking guest control registers and guest PSW as well
as all the masking and calculations on the interrupt masks when
no interrupts are pending.
Also, the check for IRQ_PEND_COUNT can be removed, because we won't
enter the while loop if no interrupts are pending and invalid interrupt
types can't be injected.