Joerg Roedel [Wed, 18 Feb 2009 13:08:59 +0000 (14:08 +0100)]
KVM: MMU: remove redundant check in mmu_set_spte
The following code flow is unnecessary:
if (largepage)
was_rmapped = is_large_pte(*shadow_pte);
else
was_rmapped = 1;
The is_large_pte() function will always evaluate to one here because the
(largepage && !is_large_pte) case is already handled in the first
if-clause. So we can remove this check and set was_rmapped to one always
here.
Joerg Roedel [Wed, 18 Feb 2009 13:08:58 +0000 (14:08 +0100)]
KVM: MMU: handle compound pages in kvm_is_mmio_pfn
The function kvm_is_mmio_pfn is called before put_page is called on a
page by KVM. This is a problem when when this function is called on some
struct page which is part of a compund page. It does not test the
reserved flag of the compound page but of the struct page within the
compount page. This is a problem when KVM works with hugepages allocated
at boot time. These pages have the reserved bit set in all tail pages.
Only the flag in the compount head is cleared. KVM would not put such a
page which results in a memory leak.
Gerd Hoffmann [Wed, 4 Feb 2009 16:52:04 +0000 (17:52 +0100)]
KVM: Fix kvmclock on !constant_tsc boxes
kvmclock currently falls apart on machines without constant tsc.
This patch fixes it. Changes:
* keep tsc frequency in a per-cpu variable.
* handle kvmclock update using a new request flag, thus checking
whenever we need an update each time we enter guest context.
* use a cpufreq notifier to track frequency changes and force
kvmclock updates.
* send ipis to kick cpu out of guest context if needed to make
sure the guest doesn't see stale values.
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Sheng Yang [Tue, 10 Feb 2009 05:57:06 +0000 (13:57 +0800)]
KVM: Use irq routing API for MSI
Merge MSI userspace interface with IRQ routing table. Notice the API have been
changed, and using IRQ routing table would be the only interface kvm-userspace
supported.
Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Mon, 2 Feb 2009 15:23:51 +0000 (16:23 +0100)]
KVM: Add FFXSR support
AMD K10 CPUs implement the FFXSR feature that gets enabled using
EFER. Let's check if the virtual CPU description includes that
CPUID feature bit and allow enabling it then.
This is required for Windows Server 2008 in Hyper-V mode.
v2 adds CPUID capability exposure
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Jes Sorensen [Wed, 21 Jan 2009 14:16:43 +0000 (15:16 +0100)]
KVM: ia64: dynamic nr online cpus
Account for number of online cpus and use that in loops iterating over
the list of vpus instead of scanning the full array unconditionally.
This patch is a building block to facilitate allowing to bump up
the size of MAX_VCPUS significantly.
This patch fixes the SET PREFIX interrupt if triggered by userspace.
Until now, it was not necessary, but life migration will need it. In
addition, it helped me creating SMP support for my kvm_crashme tool
(lets kvm execute random guest memory content).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
KVM: s390: Fix problem state check for b2 intercepts
The kernel handles some priviledged instruction exits. While I was
unable to trigger such an exit from guest userspace, the code should
check for supervisor state before emulating a priviledged instruction.
I also renamed kvm_s390_handle_priv to kvm_s390_handle_b2. After all
there are non priviledged b2 instructions like stck (store clock).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
KVM on s390 does not support the ESA/390 architecture. We refuse to
change the architecture mode and print a warning. This patch removes
the printk for several reasons:
o A malicious guest can flood host dmesg
o The old message had no newline
o there is no connection between the message and the failing guest
This patch simply removes the printk. We already set the condition
code to 3 - the guest knows that something went wrong.
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Xiantao Zhang [Wed, 21 Jan 2009 03:21:27 +0000 (11:21 +0800)]
KVM: ia64: Implement some pal calls needed for windows 2008
For windows 2008, it needs more pal calls to implement for booting.
In addition, also changes the name of set_{sal, pal}_call_result to
get_{sal,pal}_call_result for readability.
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Mon, 19 Jan 2009 12:57:52 +0000 (14:57 +0200)]
KVM: Avoid using CONFIG_ in userspace visible headers
Kconfig symbols are not available in userspace, and are not stripped by
headers-install. Avoid their use by adding #defines in <asm/kvm.h> to
suit each architecture.
Avi Kivity [Wed, 19 Nov 2008 11:58:46 +0000 (13:58 +0200)]
KVM: Userspace controlled irq routing
Currently KVM has a static routing from GSI numbers to interrupts (namely,
0-15 are mapped 1:1 to both PIC and IOAPIC, and 16:23 are mapped 1:1 to
the IOAPIC). This is insufficient for several reasons:
- HPET requires non 1:1 mapping for the timer interrupt
- MSIs need a new method to assign interrupt numbers and dispatch them
- ACPI APIC mode needs to be able to reassign the PCI LINK interrupts to the
ioapics
This patch implements an interrupt routing table (as a linked list, but this
can be easily changed) and a userspace interface to replace the table. The
routing table is initialized according to the current hardwired mapping.
Marcelo Tosatti [Thu, 8 Jan 2009 19:44:19 +0000 (17:44 -0200)]
KVM: MMU: drop zeroing on mmu_memory_cache_alloc
Zeroing on mmu_memory_cache_alloc is unnecessary since:
- Smaller areas are pre-allocated with kmem_cache_zalloc.
- Page pointed by ->spt is overwritten with prefetch_page
and entries in page pointed by ->gfns are initialized
before reading.
[avi: zeroing pages is unnecessary]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 4 Jan 2009 15:10:50 +0000 (17:10 +0200)]
KVM: Interrupt mask notifiers for ioapic
Allow clients to request notifications when the guest masks or unmasks a
particular irq line. This complements irq ack notifications, as the guest
will not ack an irq line that is masked.
Alexander Graf [Mon, 5 Jan 2009 15:02:47 +0000 (16:02 +0100)]
KVM: SVM: Add microcode patch level dummy
VMware ESX checks if the microcode level is correct when using a barcelona
CPU, in order to see if it actually can use SVM. Let's tell it we're on the
safe side...
Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Sun, 4 Jan 2009 22:53:19 +0000 (00:53 +0200)]
KVM: VMX: Prevent exit handler from running if emulating due to invalid state
If we've just emulated an instruction, we won't have any valid exit
reason and associated information.
Fix by moving the clearing of the emulation_required flag to the exit handler.
This way the exit handler can notice that we've been emulating and abort
early.
KVM: ppc: split out common Book E instruction emulation
The Book E code will be shared with e500.
I've left PID in kvmppc_core_emulate_op() just so that we don't need to move
kvmppc_set_pid() right now. Once we have the e500 implementation, we can
probably share that too.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
This commit change the name of emulator_read_std into kvm_read_guest_virt,
and add new function name kvm_write_guest_virt that allow writing into a
guest virtual address.
Signed-off-by: Izik Eidus <ieidus@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Fix it by initializating the offset of each vcpu relative to vm creation
time, and moving it from vmx_vcpu_reset to vmx_vcpu_setup, out of the
APIC MP init path.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Avi Kivity [Tue, 23 Dec 2008 17:46:01 +0000 (19:46 +0200)]
KVM: Fix vmload and friends misinterpreted as lidt
The AMD SVM instruction family all overload the 0f 01 /3 opcode, further
multiplexing on the three r/m bits. But the code decided that anything that
isn't a vmmcall must be an lidt (which shares the 0f 01 /3 opcode, for the
case that mod = 3).
Fix by aborting emulation if this isn't a vmmcall.
Avi Kivity [Sun, 21 Dec 2008 17:27:36 +0000 (19:27 +0200)]
KVM: MMU: Segregate mmu pages created with different cr4.pge settings
Don't allow a vcpu with cr4.pge cleared to use a shadow page created with
cr4.pge set; this might cause a cr3 switch not to sync ptes that have the
global bit set (the global bit has no effect if !cr4.pge).
This can only occur on smp with different cr4.pge settings for different
vcpus (since a cr4 change will resync the shadow ptes), but there's no
cost to being correct here.
Jan Kiszka [Mon, 15 Dec 2008 12:52:10 +0000 (13:52 +0100)]
KVM: x86: Virtualize debug registers
So far KVM only had basic x86 debug register support, once introduced to
realize guest debugging that way. The guest itself was not able to use
those registers.
This patch now adds (almost) full support for guest self-debugging via
hardware registers. It refactors the code, moving generic parts out of
SVM (VMX was already cleaned up by the KVM_SET_GUEST_DEBUG patches), and
it ensures that the registers are properly switched between host and
guest.
This patch also prepares debug register usage by the host. The latter
will (once wired-up by the following patch) allow for hardware
breakpoints/watchpoints in guest code. If this is enabled, the guest
will only see faked debug registers without functionality, but with
content reflecting the guest's modifications.
Tested on Intel only, but SVM /should/ work as well, but who knows...
Known limitations: Trapping on tss switch won't work - most probably on
Intel.
Credits also go to Joerg Roedel - I used his once posted debugging
series as platform for this patch.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Dec 2008 12:52:10 +0000 (13:52 +0100)]
KVM: VMX: Allow single-stepping when uninterruptible
When single-stepping over STI and MOV SS, we must clear the
corresponding interruptibility bits in the guest state. Otherwise
vmentry fails as it then expects bit 14 (BS) in pending debug exceptions
being set, but that's not correct for the guest debugging case.
Note that clearing those bits is safe as we check for interruptibility
based on the original state and do not inject interrupts or NMIs if
guest interruptibility was blocked.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Dec 2008 12:52:10 +0000 (13:52 +0100)]
KVM: New guest debug interface
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.
To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.
The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.
Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Jan Kiszka [Mon, 15 Dec 2008 12:52:10 +0000 (13:52 +0100)]
KVM: VMX: Support for injecting software exceptions
VMX differentiates between processor and software generated exceptions
when injecting them into the guest. Extend vmx_queue_exception
accordingly (and refactor related constants) so that we can use this
service reliably for the new guest debugging framework.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:11 +0000 (20:17 +0100)]
KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set
Userspace has to tell the kernel module somehow that nested SVM should be used.
The easiest way that doesn't break anything I could think of is to implement
if (cpuid & svm)
allow write to efer
else
deny write to efer
Old userspaces mask the SVM capability bit, so they don't break.
In order to find out that the SVM capability is set, I had to split the
kvm_emulate_cpuid into a finding and an emulating part.
(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:08 +0000 (20:17 +0100)]
KVM: SVM: Add VMEXIT handler and intercepts
This adds the #VMEXIT intercept, so we return to the level 1 guest
when something happens in the level 2 guest that should return to
the level 1 guest.
v2 implements HIF handling and cleans up exception interception
v3 adds support for V_INTR_MASKING_MASK
v4 uses the host page hsave
v5 removes IOPM merging code
v6 moves mmu code out of the atomic section
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:07 +0000 (20:17 +0100)]
KVM: SVM: Add VMRUN handler
This patch implements VMRUN. VMRUN enters a virtual CPU and runs that
in the same context as the normal guest CPU would run.
So basically it is implemented the same way, a normal CPU would do it.
We also prepare all intercepts that get OR'ed with the original
intercepts, as we do not allow a level 2 guest to be intercepted less
than the first level guest.
v2 implements the following improvements:
- fixes the CPL check
- does not allocate iopm when not used
- remembers the host's IF in the HIF bit in the hflags
v3:
- make use of the new permission checking
- add support for V_INTR_MASKING_MASK
v4:
- use host page backed hsave
v5:
- remove IOPM merging code
v6:
- save cr4 so PAE l1 guests work
v7:
- return 0 on vmrun so we check the MSRs too
- fix MSR check to use the correct variable
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:06 +0000 (20:17 +0100)]
KVM: SVM: Add VMLOAD and VMSAVE handlers
This implements the VMLOAD and VMSAVE instructions, that usually surround
the VMRUN instructions. Both instructions load / restore the same elements,
so we only need to implement them once.
v2 fixes CPL checking and replaces memcpy by assignments
v3 makes use of the new permission checking
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:04 +0000 (20:17 +0100)]
KVM: SVM: Implement GIF, clgi and stgi
This patch implements the GIF flag and the clgi and stgi instructions that
set this flag. Only if the flag is set (default), interrupts can be received by
the CPU.
To keep the information about that somewhere, this patch adds a new hidden
flags vector. that is used to store information that does not go into the
vmcb, but is SVM specific.
I tried to write some code to make -no-kvm-irqchip work too, but the first
level guest won't even boot with that atm, so I ditched it.
v2 moves the hflags to x86 generic code
v3 makes use of the new permission helper
v6 only enables interrupt_window if GIF=1
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:03 +0000 (20:17 +0100)]
KVM: SVM: Add helper functions for nested SVM
These are helpers for the nested SVM implementation.
- nsvm_printk implements a debug printk variant
- nested_svm_do calls a handler that can accesses gpa-based memory
v3 makes use of the new permission checker
v6 changes:
- streamline nsvm_debug()
- remove printk(KERN_ERR)
- SVME check before CPL check
- give GP error code
- use new EFER constant
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:02 +0000 (20:17 +0100)]
KVM: SVM: Move EFER and MSR constants to generic x86 code
MSR_EFER_SVME_MASK, MSR_VM_CR and MSR_VM_HSAVE_PA are set in KVM
specific headers. Linux does have nice header files to collect
EFER bits and MSR IDs, so IMHO we should put them there.
While at it, I also changed the naming scheme to match that
of the other defines.
(introduced in v6)
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Alexander Graf [Tue, 25 Nov 2008 19:17:01 +0000 (20:17 +0100)]
KVM: SVM: Clean up VINTR setting
The current VINTR intercept setters don't look clean to me. To make
the code easier to read and enable the possibilty to trap on a VINTR
set, this uses a helper function to set the VINTR intercept.
v2 uses two distinct functions for setting and clearing the bit
Acked-by: Joerg Roedel <joro@8bytes.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
Kyle McMartin [Mon, 23 Mar 2009 19:25:49 +0000 (15:25 -0400)]
Build with -fno-dwarf2-cfi-asm
With a sufficiently new compiler and binutils, code which wasn't
previously generating .eh_frame sections has begun to. Certain
architectures (powerpc, in this case) may generate unexpected relocation
formats in response to this, preventing modules from loading.
While the new relocation types should probably be handled, revert to the
previous behaviour with regards to generation of .eh_frame sections.
(This was reported against Fedora, which appears to be the only distro
doing any building against gcc-4.4 at present: RH bz#486545.)
Signed-off-by: Kyle McMartin <kyle@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: Alexandre Oliva <aoliva@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits)
ucc_geth: Fix oops when using fixed-link support
dm9000: locking bugfix
net: update dnet.c for bus_id removal
dnet: DNET should depend on HAS_IOMEM
dca: add missing copyright/license headers
nl80211: Check that function pointer != NULL before using it
sungem: missing net_device_ops
be2net: fix to restore vlan ids into BE2 during a IF DOWN->UP cycle
be2net: replenish when posting to rx-queue is starved in out of mem conditions
bas_gigaset: correctly allocate USB interrupt transfer buffer
smsc911x: reset last known duplex and carrier on open
sh_eth: Fix mistake of the address of SH7763
sh_eth: Change handling of IRQ
netns: oops in ip[6]_frag_reasm incrementing stats
net: kfree(napi->skb) => kfree_skb
net: fix sctp breakage
ipv6: fix display of local and remote sit endpoints
net: Document /proc/sys/net/core/netdev_budget
tulip: fix crash on iface up with shirq debug
virtio_net: Make virtio_net support carrier detection
...
Miklos Szeredi [Mon, 23 Mar 2009 15:07:24 +0000 (16:07 +0100)]
fix ptrace slowness
This patch fixes bug #12208:
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12208
Subject : uml is very slow on 2.6.28 host
This turned out to be not a scheduler regression, but an already
existing problem in ptrace being triggered by subtle scheduler
changes.
The problem is this:
- task A is ptracing task B
- task B stops on a trace event
- task A is woken up and preempts task B
- task A calls ptrace on task B, which does ptrace_check_attach()
- this calls wait_task_inactive(), which sees that task B is still on the runq
- task A goes to sleep for a jiffy
- ...
Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.
This patch solves this by not rescheduling in read_unlock() after
ptrace_stop() has woken up the tracer.
Thanks to Oleg Nesterov and Ingo Molnar for the feedback.