git.karo-electronics.de Git - linux-beck.git/log

KVM: VMX: Fix exit qualification width on i386

According to Intel Software Developer's Manual, Vol. 3B, Appendix H.4.2,
exit qualification should be of natural width. However, current code
uses u64 as the data type for this register, which occasionally
introduces invalid value to VMExit handling logics. This patch fixes
this bug.

I have tested Windows and Linux guest on i386 host, and they can boot
successfully with this patch.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move main vcpu loop into subarch independent code

This simplifies adding new code as well as reducing overall code size.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Move vm entry failure handling to the exit handler

This will help moving the main loop to subarch independent code.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: MMU: Don't do GFP_NOWAIT allocations

Before preempt notifiers, kvm needed to allocate memory with GFP_NOWAIT so
as not to have to enable preemption and take a heavyweight exit. On oom, we'd
fall back to a GFP_KERNEL allocation.

With preemption notifiers, we can do a GFP_KERNEL allocation, and perform
the heavyweight exit only if the kernel decides to put us to sleep.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Rename kvm_arch_ops to kvm_x86_ops

This patch just renames the current (misnamed) _arch namings to _x86 to
ensure better readability when a real arch layer takes place.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Simplify memory allocation

The mutex->splinlock convertion alllows us to make some code simplifications.
As we can keep the lock longer, we don't have to release it and then
have to check if the environment has not been modified before re-taking it. We
can remove kvm->busy and kvm->memory_config_version.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Hoist SVM's get_cs_db_l_bits into core code.

SVM gets the DB and L bits for the cs by decoding the segment. This
is in fact the completely generic code, so hoist it for kvm-lite to use.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Keep control regs in sync

We don't update the vcpu control registers in various places. We
should do so.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Clean up unloved invlpg emulation

invlpg shouldn't fetch the "src" address, since it may not be valid,
however SVM's "solution" which neuters emulation of all group 7
instruction is horrible and breaks kvm-lite. The simplest fix is to
put a special check in for invlpg.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove the unused invlpg member of struct kvm_arch_ops.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Set the ET flag in CR0 after initializing FX

This was missed when moving stuff around in fbc4f2e

Fixes Solaris guests and bug #1773613

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: enable in-kernel APIC INIT/SIPI handling

This patch enables INIT/SIPI handling using in-kernel APIC by
introducing a ->mp_state field to emulate the SMP state transition.

[avi: remove smp_processor_id() warning]

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: round robin for APIC lowest priority delivery mode

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: deliver PIC interrupt only to vcpu0

This patch changes the PIC interrupts delivery. Now it is only delivered
to vcpu0 when either condition is met (on vcpu0):
1. local APIC is hardware disabled
2. LVT0 is unmasked and configured to delivery mode ExtInt

It fixes the 2x faster wall clock on x86_64 and SMP i386 Linux guests

Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: disable tpr/cr8 sync when in-kernel APIC is used

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Migrate lapic hrtimer when vcpu moves to another cpu

This reduces overhead by accessing cachelines from the wrong node, as well
as simplifying locking.

[Qing: fix for inactive or expired one-shot timer]

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Keep track of missed timer irq injections

APIC timer IRQ is set every time when a certain period
expires at host time, but the guest may be descheduled
at that time and thus the irq be overwritten by later fire.
This patch keep track of firing irq numbers and decrease
only when the IRQ is injected to guest or buffered in
APIC.

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Use shadow TPR/cr8 for 64-bits guests

This patch enables TPR shadow of VMX on CR8 access. 64bit Windows using
CR8 access TPR frequently. The TPR shadow can improve the performance of
access TPR by not causing vmexit.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: pending irq save/restore

Add in kernel irqchip save/restore support for pending vectors.

[avi: fix compile warning on i386]
[avi: remove printk]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: in-kernel LAPIC save and restore support

This patch adds a new vcpu-based IOCTL to save and restore the local
apic registers for a single vcpu. The kernel only copies the apic page as
a whole, extraction of registers is left to userspace side. On restore, the
APIC timer is restarted from the initial count, this introduces a little
delay, but works fine.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: in-kernel IOAPIC save and restore support

This patch adds support for in-kernel ioapic save and restore (to
and from userspace). It uses the same get/set_irqchip ioctl as
in-kernel PIC.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Bypass irq_pending get/set when using in kernel irqchip

vcpu->irq_pending is saved in get/set_sreg IOCTL, but when in-kernel
local APIC is used, doing this may occasionally overwrite vcpu->apic to
an invalid value, as in the vm restore path.

Signed-off-by: Qing He <qing.he@intel.com>

KVM: Add get/set irqchip ioctls for in-kernel PIC live migration support

This patch adds two new ioctls to dump and write kernel irqchips for
save/restore and live migration. PIC s/r and l/m is implemented in this
patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Protect in-kernel pio using kvm->lock

pio operation and IRQ_LINE kvm_vm_ioctl is not kvm->lock
protected. Add lock to same with IOAPIC MMIO operations.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Emulate hlt in the kernel

By sleeping in the kernel when hlt is executed, we simplify the in-kernel
guest interrupt path considerably.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: In-kernel I/O APIC model

This allows in-kernel host-side device drivers to raise guest interrupts
without going to userspace.

[avi: fix level-triggered interrupt redelivery on eoi]
[avi: add missing #include]
[avi: avoid redelivery of edge-triggered interrupt]
[avi: implement polarity]
[avi: don't deliver edge-triggered interrupts when unmasking]
[avi: fix host oops on invalid guest access]

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Emulate local APIC in kernel

Because lightweight exits (exits which don't involve userspace) are many
times faster than heavyweight exits, it makes sense to emulate high usage
devices in the kernel. The local APIC is one such device, especially for
Windows and for SMP, so we add an APIC model to kvm.

It also allows in-kernel host-side drivers to inject interrupts without
going through userspace.

[compile fix on i386 from Jindrich Makovicka]

Signed-off-by: Yaozu (Eddie) Dong <Eddie.Dong@intel.com>
Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Define and use cr8 access functions

This patch is to wrap APIC base register and CR8 operation which can
provide a unique API for user level irqchip and kernel irqchip.
This is a preparation of merging lapic/ioapic patch.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Add support for in-kernel PIC emulation

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Split segments reload in vmx_load_host_state()

vmx_load_host_state() bundles fs, gs, ldt, and tss reloading into
one in the hope that it is infrequent. With smp guests, fs reloading is
frequent due to fs being used by threads.

Unbundle the reloads so reduce expensive gs reloads.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: X86 emulator: fix 'push reg' writeback

Pointed out by Rusty Russell.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Support more memory slots

Needed for mapping memory at 4GB.

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: allow rmode_tss_base() to work with >2G of guest memory

Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: implement 'push reg' (opcodes 0x50-0x57)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: Implement 'jmp rel short' instruction (opcode 0xeb)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: implement 'jmp rel' instruction (opcode 0xe9)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: implement 'and $imm, %{al|ax|eax}'

Implement emulation of instruction
and al imm8 (opcode 0x24)
and ax/eax imm16/imm32 (opcode 0x25)

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Communicate cr8 changes to userspace

This allows running 64-bit Windows.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Close minor race in signal handling

We need to check for signals inside the critical section, otherwise a
signal can be sent which we will not notice. Also move the check
before entry, so that if the signal happens before the first entry,
we exit immediately instead of waiting for something to happen to the
guest.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Clean up kvm_setup_pio()

Split kvm_setup_pio() into two functions, one to setup in/out pio
(kvm_emulate_pio()) and one to setup ins/outs pio (kvm_emulate_pio_string()).

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Cleanup string I/O instruction emulation

Both vmx and svm decode the I/O instructions, and both botch the job,
requiring the instruction prefixes to be fetched in order to completely
decode the instruction.

So, if we see a string I/O instruction, use the x86 emulator to decode it,
as it already has all the prefix decoding machinery.

This patch defines ins/outs opcodes in x86_emulate.c and calls
emulate_instruction() from io_interception() (svm.c) and from handle_io()
(vmx.c). It removes all vmx/svm prefix instruction decoders
(get_addr_size(), io_get_override(), io_address(), get_io_count())

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove useless assignment

Line 1809 of kvm_main.c is useless, value is overwritten in line 1815:

1809         now = min(count, PAGE_SIZE / size);
1810
1811         if (!down)
1812                 in_page = PAGE_SIZE - offset_in_page(address);
1813         else
1814                 in_page = offset_in_page(address) + size;
1815         now = min(count, (unsigned long)in_page / size);
1816         if (!now) {

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Remove a duplicated ia32e mode vm entry control

Remove a duplicated ia32e mode VM Entry control definition and use the
proper one.

Signed-off-by: Xin Li <xin.b.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use kmem_cache_free for kmem_cache_zalloc'ed objects

We use kfree in svm.c and vmx.c, and this works, but it could break at
any time. kfree() is supposed to match up with kmalloc().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Add and use pr_unimpl for standard formatting of unimplemented features

All guest-invokable printks should be ratelimited to prevent malicious
guests from flooding logs. This is a start.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove unneeded kvm_dev_open and kvm_dev_release functions.

Devices don't need open or release functions.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove stat_set from debugfs

We shouldn't define stat_set on the debug attributes, since that will
cause silent failure on writing: without a set argument, userspace
will get -EACCESS.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Fix defined but not used warning in drivers/kvm/vmx.c

move_msr_up() is used only on X86_64 and generates a warning on !X86_64

Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove redundant alloc_vmcs_cpu declaration

alloc_vmcs_cpu is already declared (static) above, no need to
redeclare.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: SVM: Make set_msr_interception more reliable

set_msr_interception() is used by svm to set up which MSRs should be
intercepted. It can only fail if someone has changed the code to try
to intercept an MSR without updating the array of ranges.

The return value is ignored anyway: it should just BUG() if it doesn't
work. (A build-time failure would be better, but that's tricky).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Cleanup mark_page_dirty

For some reason, mark_page_dirty open-codes __gfn_to_memslot().

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Don't assign vcpu->cr3 if it's invalid: check first, set last

sSigned-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Add cpu consistency check

All the physical CPUs on the board should support the same VMX feature
set. Add check_processor_compatibility to kvm_arch_ops for the consistency
check.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: kvm_vm_ioctl_get_dirty_log restore "nothing dirty" optimization

kvm_vm_ioctl_get_dirty_log scans bitmap to see it it's all zero, but
doesn't use that information.

Avi says:
Looks like it was used to guard kvm_mmu_slot_remove_write_access();
optimizing the case where the guest just leaves the screen alone (which
it usually does, especially in benchmarks).

I'd rather reinstate that optimization. See
90cb0529dd230548a7f0d6b315997be854caea1b where the damage was done.

It's pretty simple: if the bitmap is all zero, we don't need to do anything to
clean it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use alignment properties of vcpu to simplify FPU ops

Now we use a kmem cache for allocating vcpus, we can get the 16-byte
alignment required by fxsave & fxrstor instructions, and avoid
manually aligning the buffer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use kmem cache for allocating vcpus

Avi wants the allocations of vcpus centralized again. The easiest way
is to add a "size" arg to kvm_init_arch, and expose the thus-prepared
cache to the modules.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove kvm_{read,write}_guest()

... in favor of the more general emulator_{read,write}_*.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Change the emulator_{read,write,cmpxchg}_* functions to take a vcpu

... instead of a x86_emulate_ctxt, so that other callers can use it easily.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: SVM: internal function name cleanup

Changes some svm.c internal function names:
1) io_adress -> io_address (de-germanify the spelling)
2) kvm_reput_irq -> reput_irq (it's not a generic kvm function)
3) kvm_do_inject_irq -> (it's not a generic kvm function)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: SVM: de-containization

container_of is wonderful, but not casting at all is better. This
patch changes svm.c's internal functions to pass "struct vcpu_svm"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove three magic numbers

There are several places where hardcoded numbers are used in place of
the easily-available constant, which is poor form.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: pass vcpu_vmx internally

container_of is wonderful, but not casting at all is better. This
patch changes vmx.c's internal functions to pass "struct vcpu_vmx"
instead of "struct kvm_vcpu" and using container_of.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: fx_init() needs preemption disabled while it plays with the FPU state

Now that kvm generally runs with preemption enabled, we need to protect
the fpu intialization sequence.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Convert vm lock to a mutex

This allows the kvm mmu to perform sleepy operations, such as memory
allocation.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use the scheduler preemption notifiers to make kvm preemptible

Current kvm disables preemption while the new virtualization registers are
in use. This of course is not very good for latency sensitive workloads (one
use of virtualization is to offload user interface and other latency
insensitive stuff to a container, so that it is easier to analyze the
remaining workload). This patch re-enables preemption for kvm; preemption
is now only disabled when switching the registers in and out, and during
the switch to guest mode and back.

Contains fixes from Shaohua Li <shaohua.li@intel.com>.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: add hypercall nr to kvm_run

Add the hypercall number to kvm_run and initialize it. This changes the ABI,
but as this particular ABI was unusable before this no users are affected.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Improve the method of writing vmcs control

Put cpu feature detecting part in hardware_setup, and stored the vmcs
condition in global variable for further check.

[glommer: fix for some i386-only machines not supporting CR8 load/store
exiting]

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Dynamically allocate vcpus

This patch converts the vcpus array in "struct kvm" to a pointer
array, and changes the "vcpu_create" and "vcpu_setup" hooks into one
"vcpu_create" call which does the allocation and initialization of the
vcpu (calling back into the kvm_vcpu_init core helper).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove arch specific components from the general code

struct kvm_vcpu has vmx-specific members; remove them to a private structure.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: load_pdptrs() cleanups

load_pdptrs can be handed an invalid cr3, and it should not oops.
This can happen because we injected #gp in set_cr3() after we set
vcpu->cr3 to the invalid value, or from kvm_vcpu_ioctl_set_sregs(), or
memory configuration changes after the guest did set_cr3().

We should also copy the pdpte array once, before checking and
assigning, otherwise an SMP guest can potentially alter the values
between the check and the set.

Finally one nitpick: ret = 1 should be done as late as possible: this
allows GCC to check for unset "ret" should the function change in
future.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Remove dead code in the cmpxchg instruction emulation

The writeback fixes (02c03a326a5df825cc01de426f72e160db2b9538) let
some dead code in the cmpxchg instruction emulation. Remove it.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: VMX: Import some constants of vmcs from IA32 SDM

This patch mainly imports some constants and rename two exist constants
of vmcs according to IA32 SDM.

It also adds two constants to indicate Lock bit and Enable bit in
MSR_IA32_FEATURE_CONTROL, and replace the hardcode _5_ with these two
bits.

Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Move gfn_to_page out of kmap/unmap pairs

gfn_to_page might sleep with swap support. Move it out of the kmap calls.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Hoist kvm_mmu_reload() out of the critical section

vmx_cpu_run doesn't handle error correctly and kvm_mmu_reload might
sleep with mutex changes, so I move it above.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Return if the pdptrs are invalid when the guest turns on PAE.

Don't fall through and turn on PAE in this case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: fix faulty check for two-byte opcode

Right now, the bug is harmless as we never emulate one-byte 0xb6 or 0xb7.
But things may change.

Noted by the mysterious Gabriel C.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: x86 emulator: fix cmov for writeback changes

The writeback fixes (02c03a326a5df825cc01de426f72e160db2b9538) broke
cmov emulation. Fix.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use standard CR8 flags, and fix TPR definition

Intel manual (and KVM definition) say the TPR is 4 bits wide. Also fix
CR8_RESEVED_BITS typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Set exit_reason to KVM_EXIT_MMIO where run->mmio is initialized.

Signed-off-by: Jeff Dike <jdike@linux.intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Use standard BITMAP macros, open-code userspace-exposed header

Creating one's own BITMAP macro seems suboptimal: if we use manual
arithmetic in the one place exposed to userspace, we can use standard
macros elsewhere.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use standard CR4 flags, tighten checking

On this machine (Intel), writing to the CR4 bits 0x00000800 and
0x00001000 cause a GPF. The Intel manual is a little unclear, but
AFIACT they're reserved, too.

Also fix spelling of CR4_RESEVED_BITS.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Use standard CR3 flags, tighten checking

The kernel now has asm/cpu-features.h: use those macros instead of inventing
our own.

Also spell out definition of CR3_RESEVED_BITS, fix spelling and
tighten it for the non-PAE case.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h

The kernel now has asm/cpu-features.h: use those macros instead of
inventing our own.

Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Avoid hardware_disable predeclaration

Don't pre-declare hardware_disable: shuffle the reboot hook down.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Comment spelling may escape grep

Speling error in comment.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Make decode_register() static

I have shied away from touching x86_emulate.c (it could definitely use
some love, but it is forked from the Xen code, and it would be more
productive to cross-merge fixes).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: Remove unused struct cpu_user_regs declaration

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Trivial: /dev/kvm interface is no longer experimental.

KVM interface is no longer experimental.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: In-kernel string pio write support

Add string pio write support to support some version of Windows.

Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Future-proof the exit information union ABI

Note that as the size of struct kvm_run is not part of the ABI, we can add
things at the end.

Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: SMP: Add vcpu_id field in struct vcpu

This patch adds a `vcpu_id' field in `struct vcpu', so we can
differentiate BSP and APs without pointer comparison or arithmetic.

Signed-off-by: Qing He <qing.he@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Fix *nopage() in kvm_main.c

*nopage() in kvm_main.c should only store the type of mmap() fault if
the pointers are not NULL. This patch fixes the problem.

Signed-off-by: Nguyen Anh Quynh <aquynh@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

i386: Expose IOAPIC register definitions even if CONFIG_X86_IO_APIC is not set

KVM reuses the IOAPIC register definitions, and needs them even if the
host is not compiled with IOAPIC support. Move the #ifdef below so that only
the IOAPIC variables and functions are protected, and the register definitions
are available to all.

Signed-off-by: Avi Kivity <avi@qumranet.com>

x86/pci/acpi: fix DMI const-ification fallout

Fix DMI const-ification fallout that appeared when merging subsystem
trees.

Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: optimise barriers

According to latest memory ordering specification documents from Intel
and AMD, both manufacturers are committed to in-order loads from
cacheable memory for the x86 architecture.  Hence, smp_rmb() may be a
simple barrier.

Also according to those documents, and according to existing practice in
Linux (eg.  spin_unlock doesn't enforce ordering), stores to cacheable
memory are visible in program order too.  Special string stores are safe
-- their constituent stores may be out of order, but they must complete
in order WRT surrounding stores.  Nontemporal stores to WB memory can go
out of order, and so they should be fenced explicitly to make them
appear in-order WRT other stores.  Hence, smp_wmb() may be a simple
barrier.

    http://developer.intel.com/products/processor/manuals/318147.pdf
    http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

In userspace microbenchmarks on a core2 system, fence instructions range
anywhere from around 15 cycles to 50, which may not be totally
insignificant in performance critical paths (code size will go down
too).

However the primary motivation for this is to have the canonical barrier
implementation for x86 architecture.

smp_rmb on buggy pentium pros remains a locked op, which is apparently
required.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: fix IO write barrier

wmb() on x86 must always include a barrier, because stores can go out of
order in many cases when dealing with devices (eg. WC memory).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: fence oostores on 64-bit

movnt* instructions are not strongly ordered with respect to other stores,
so if we are to assume stores are strongly ordered in the rest of the 64
bit code, we must fence these off (see similar examples in 32 bit code).

[ The AMD memory ordering document seems to say that nontemporal stores can
also pass earlier regular stores, so maybe we need sfences _before_
movnt* everywhere too? ]

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Only enable BLOCK_COMPAT if COMPAT is needed

IOW, it needs to depend on both CONFIG_BLOCK and CONFIG_COMPAT.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev

* 'upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (119 commits)
  [libata] struct pci_dev related cleanups
  libata: use ata_exec_internal() for PMP register access
  libata: implement ATA_PFLAG_RESETTING
  libata: add @timeout to ata_exec_internal[_sg]()
  ahci: fix notification handling
  ahci: clean up PORT_IRQ_BAD_PMP enabling
  ahci: kill leftover from enabling NCQ over PMP
  libata: wrap schedule_timeout_uninterruptible() in loop
  libata: skip suppress reporting if ATA_EHI_QUIET
  libata: clear ehi description after initial host report
  pata_jmicron: match vendor and class code only
  libata: add ST9160821AS / 3.ALD to NCQ blacklist
  pata_acpi: ACPI driver support
  libata-core: Expose gtm methods for driver use
  libata: add HDT722516DLA380 to NCQ blacklist
  libata: blacklist NCQ on Seagate Barracuda ST380817AS
  [libata] Turn on ACPI by default
  libata_scsi: Fix ATAPI transfer lengths
  libata: correct handling of SRST reset sequences
  libata: Integrate ACPI-based PATA/SATA hotplug - version 5
  ...

Update maintainers file

Since there is no x86-64 architecture anymore it cannot be maintained.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>