On 14/12/11 11:03, David Evensky wrote:
> On an x86 32bit system (and using the 32bit CodeSourcery toolchain on
> a x86_64 system) I get:
>
> evensky@machine:~/.../linux-kvm/tools/kvm$ make
> CC util/util.o
> util/util.c: In function 'mmap_hugetlbfs':
> util/util.c:93:17: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
> util/util.c:99:7: error: format '%ld' expects argument of type 'long int', but argument 2 has type 'int' [-Werror=format]
> cc1: all warnings being treated as errors
>
> make: *** [util/util.o] Error 1
Fixes the build.
Reported-by: David Evensky <evensky@dancer.ca.sandia.gov> Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
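Both diagnostics are typical of 32-bit builds, where a size field that is 'long' on 64-bit turns out to be 'int'. The following is a minimal sketch of the usual remedies, not the actual util.c change; blk_size_ok() and format_blk_size() are hypothetical helper names.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: compare a signed block size against an unsigned
 * length without tripping -Wsign-compare. */
static int blk_size_ok(long blk_size, uint64_t size)
{
	if (blk_size <= 0)
		return 0;
	/* blk_size is known to be positive here, so the cast is safe. */
	return (uint64_t)blk_size <= size;
}

/* Hypothetical helper: print a value whose width differs between 32-bit
 * and 64-bit builds by casting it to a type that matches the format. */
static int format_blk_size(char *buf, size_t len, long blk_size)
{
	return snprintf(buf, len, "%lld", (long long)blk_size);
}
```

Casting to a fixed, wide type at the point of use keeps the same source warning-free on both word sizes.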
Different architectures will deal with MMIO exits differently. For example,
KVM_EXIT_IO is x86-specific, and I/O cycles are often synthesised by steering
into windows in PCI bridges on other architectures.
This patch calls arch-specific kvm_cpu__emulate_io() and kvm_cpu__emulate_mmio()
from the main runloop's IO and MMIO exit handlers. For x86, these directly
call kvm__emulate_io() and kvm__emulate_mmio() but other architectures will
perform some address munging before passing on the call.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
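A rough sketch of the wrapper idea follows. The generic handler here is a stub that only records the address, and the window base and size are invented for illustration; none of these names or values come from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stub standing in for the generic MMIO emulation entry point. It only
 * records the address it was handed so the wrappers can be observed. */
static uint64_t last_mmio_addr;

static bool generic_emulate_mmio(uint64_t phys_addr, uint8_t *data,
				 uint32_t len, bool is_write)
{
	(void)data;
	(void)len;
	(void)is_write;
	last_mmio_addr = phys_addr;
	return true;
}

/* x86-style wrapper: pass the access straight through. */
static bool x86_cpu_emulate_mmio(uint64_t phys_addr, uint8_t *data,
				 uint32_t len, bool is_write)
{
	return generic_emulate_mmio(phys_addr, data, len, is_write);
}

/* Assumed bridge window, purely illustrative. */
#define EXAMPLE_WINDOW_BASE 0x80000000ULL
#define EXAMPLE_WINDOW_SIZE 0x10000ULL

/* Hypothetical non-x86 wrapper: accesses inside the window are rebased
 * to a bus address before being passed on. */
static bool other_cpu_emulate_mmio(uint64_t phys_addr, uint8_t *data,
				   uint32_t len, bool is_write)
{
	if (phys_addr >= EXAMPLE_WINDOW_BASE &&
	    phys_addr < EXAMPLE_WINDOW_BASE + EXAMPLE_WINDOW_SIZE)
		phys_addr -= EXAMPLE_WINDOW_BASE;

	return generic_emulate_mmio(phys_addr, data, len, is_write);
}
```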
Matt Evans [Tue, 13 Dec 2011 06:21:46 +0000 (17:21 +1100)]
kvm tools: Add ability to map guest RAM from hugetlbfs
Add a --hugetlbfs commandline option to give a path to hugetlbfs-map guest
memory (down in kvm__arch_init()). For x86, guest memory is a normal
ANON mmap() if this option is not provided, otherwise a hugetlbfs mmap.
This maps directly from a hugetlbfs temp file rather than using something
like MADV_HUGEPAGES so that, if the user asks for hugepages, we definitely
are using hugepages. (This is particularly useful for architectures that
don't yet support KVM without hugepages, so we definitely need to use
them for the whole of guest RAM.)
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
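Mapping from a temp file can be sketched roughly as below. This is an illustration under assumptions, not the code from the patch: map_from_tmpfile() and the file name template are hypothetical, and the check that the size is a multiple of the huge page size is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Map `size` bytes from an unlinked temporary file created under `dir`.
 * Returns MAP_FAILED on error. When `dir` is a hugetlbfs mount, the
 * mapping is backed by huge pages. */
static void *map_from_tmpfile(const char *dir, size_t size)
{
	char path[4096];
	void *addr;
	int fd, n;

	n = snprintf(path, sizeof(path), "%s/guest-ram-XXXXXX", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return MAP_FAILED;

	fd = mkstemp(path);
	if (fd < 0)
		return MAP_FAILED;

	/* The name is not needed once the file is open. */
	unlink(path);

	if (ftruncate(fd, (off_t)size) < 0) {
		close(fd);
		return MAP_FAILED;
	}

	addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);

	return addr;
}
```

Because the file lives on the mount the user named, the page size is decided by that filesystem rather than by a hint the kernel is free to ignore.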
Sasha Levin [Mon, 5 Dec 2011 14:16:36 +0000 (16:16 +0200)]
kvm tools: Add 'kvm sandbox'
This patch adds 'kvm sandbox', a wrapper on top of 'kvm run' that
allows the user to easily specify a sandboxed command to run in a custom
rootfs guest.
Sasha Levin [Mon, 5 Dec 2011 14:16:34 +0000 (16:16 +0200)]
kvm tools: Allow easily sandboxing applications within a guest
This patch adds a '--sandbox' argument. When used in conjunction with a custom
rootfs, it allows running a script or an executable in the guest environment
by using executables and other files from the host.
This is useful when testing code that might cause problems on the host, or
to automate kernel testing since it's now easy to link a kvm tools test
script with 'git bisect run'.
Sasha Levin [Mon, 5 Dec 2011 14:16:32 +0000 (16:16 +0200)]
kvm tools: Split custom rootfs init into two stages
Currently custom rootfs init is built along with the main KVM tools executable
and is copied into custom rootfs directories when they are created with
'kvm setup'. The problem is that if the init code changes, it has
to be manually copied to custom rootfs directories.
Instead, this patch splits the init process into two stages. Stage 1 simply
handles mounts and passes control to stage 2 of the init.
Stage 2 lives in the code tree and does all the heavy lifting.
This allows us to make init changes in the code tree and have them automatically
picked up in custom rootfs guests without having to copy files over manually.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
[ penberg@kernel.org: fix 'make check' breakage in Makefile ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Sat, 10 Dec 2011 20:40:43 +0000 (22:40 +0200)]
kvm tools: Free up the MSI-X PBA BAR
Free up the BAR to make space for the new virtio BARs. It isn't required
to have the PBA and the table in separate BARs, and uniting them will
just give us extra BARs to play with.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Thomas Gleixner [Sat, 10 Dec 2011 20:27:26 +0000 (21:27 +0100)]
kvm tools: serial: Make it work with non rt guests as well
Sasha reported that a non-RT guest reports "too much work for irq 4"
with the previous serial overhaul.
The reason is that the new code allows unlimited TX transfers, which
triggers the sanity check in the 8250.c interrupt handler.
Limit the consecutive TX chars to 16 and let the guest kernel escape
from the 8250 interrupt handler. Set the TEMT/THRE bits in the
periodic serial console update.
Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pekka Enberg <penberg@kernel.org>
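The throttling described above could look roughly like this. It is a sketch under assumptions, not the patch itself: the struct and function names are hypothetical, and only the LSR bit values are taken from <linux/serial_reg.h>.

```c
#include <stdint.h>

#define UART_LSR_THRE 0x20	/* transmit-hold-register empty */
#define UART_LSR_TEMT 0x40	/* transmitter empty */
#define TX_BURST_MAX  16	/* limit named in the commit message */

/* Hypothetical, cut-down device state. */
struct uart_tx_sketch {
	uint8_t lsr;
	unsigned int txcnt;
};

/* Called for each character the guest writes. After TX_BURST_MAX
 * consecutive characters the transmitter is reported as busy, so the
 * guest's interrupt handler stops looping. */
static void uart_tx_char(struct uart_tx_sketch *u)
{
	if (++u->txcnt >= TX_BURST_MAX)
		u->lsr &= (uint8_t)~(UART_LSR_TEMT | UART_LSR_THRE);
}

/* Called from the periodic console update: the transmitter is reported
 * as idle again and the burst counter restarts. */
static void uart_periodic_update(struct uart_tx_sketch *u)
{
	u->lsr |= UART_LSR_TEMT | UART_LSR_THRE;
	u->txcnt = 0;
}
```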
Thomas Gleixner [Sat, 10 Dec 2011 13:27:57 +0000 (13:27 +0000)]
kvm tool: serial: Fix interrupt handling
The interrupt injection of the serial emulation is completely
broken. It's just doing random toggling of the interrupt line, which
can lead to complete console hangs.
The real hardware asserts the interrupt line when a condition
(RX/TX/Status) is met and the corresponding interrupt is enabled in
the IER. It's deasserted when the condition is cleared or the
corresponding interrupt is disabled in the IER.
So the correct emulation just needs to check after each state change
in the LSR or the IER which bits in the IIR need to be set and update
the interrupt line accordingly. To avoid setting the same state over
and over keep an internal state of the last set interrupt line state
and only update via the kvm ioctl when the new state differs.
Rename serial8250__inject_interrupts() to serial8250__update_consoles()
which reflects what the function really is about.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Pekka Enberg <penberg@kernel.org>
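A compact sketch of that rule is shown below. It models only the RX and TX conditions, the struct is hypothetical, and a counter stands in for the KVM ioctl that would change the line level. The register bit values are those of <linux/serial_reg.h>.

```c
#include <stdint.h>

#define UART_IER_RDI    0x01	/* receive-data interrupt enable */
#define UART_IER_THRI   0x02	/* transmit-holding-empty interrupt enable */
#define UART_LSR_DR     0x01	/* receive data ready */
#define UART_LSR_THRE   0x20	/* transmit-hold-register empty */
#define UART_IIR_NO_INT 0x01	/* no interrupt pending */
#define UART_IIR_THRI   0x02	/* transmitter empty interrupt */
#define UART_IIR_RDI    0x04	/* receive data interrupt */

/* Hypothetical, cut-down device state. */
struct uart_irq_sketch {
	uint8_t ier, lsr, iir;
	int irq_state;		/* last level sent to the interrupt line */
	int line_updates;	/* stands in for calls to the KVM ioctl */
};

/* Recompute the IIR after any LSR or IER change and touch the line only
 * when the level actually differs from the cached one. */
static void uart_update_irq(struct uart_irq_sketch *u)
{
	uint8_t iir = UART_IIR_NO_INT;
	int level;

	/* A condition raises the line only if its IER bit is enabled. */
	if ((u->ier & UART_IER_RDI) && (u->lsr & UART_LSR_DR))
		iir = UART_IIR_RDI;
	else if ((u->ier & UART_IER_THRI) && (u->lsr & UART_LSR_THRE))
		iir = UART_IIR_THRI;

	u->iir = iir;
	level = (iir != UART_IIR_NO_INT);

	if (level != u->irq_state) {
		u->irq_state = level;
		u->line_updates++;
	}
}
```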
Matt Evans [Fri, 9 Dec 2011 06:55:54 +0000 (17:55 +1100)]
kvm tools: Arch-specific define for PCI MMIO allocation area
pci_get_io_space_block() used to grab addresses from
KVM_32BIT_GAP_START + 0x1000000, which is x86-specific. Create a new define,
KVM_PCI_MMIO_AREA, to specify a bus address these allocations can come from.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
This allows config space access in a more natural manner than clunky x86 IO ports,
and is useful for other architectures. Internally, the x86 IO port access uses
these new config space interfaces.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:55:36 +0000 (17:55 +1100)]
kvm tools: Endian-sanitise pci.h and PCI device setup
vesa, pci-shmem and virtio-pci devices need to set up config space with
little-endian conversions (as config space is LE). The pci_config_address
bitfield also needs to be reversed when building on BE systems.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
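To illustrate why the conversion matters, here is a small sketch that stores config space fields byte by byte, which gives a little-endian layout on any host. This is a swapped-in technique for illustration and not necessarily how the patch does the conversion; put_le16() and get_le16() are hypothetical helpers.

```c
#include <stdint.h>

/* Store a 16-bit value in little-endian order regardless of host
 * endianness. */
static void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* Read a 16-bit little-endian value back into host order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

A plain 16-bit store would put the bytes in the opposite order on a BE host, and the guest would then read a byte-swapped ID.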
Matt Evans [Fri, 9 Dec 2011 06:55:16 +0000 (17:55 +1100)]
kvm tools: Perform CPU and firmware setup after devices are added
Currently some devices (in this case kbd, fb, vesa) are initialised after
CPU/firmware setup. On some platforms (e.g. PPC) kvm__arch_setup_firmware() may
be making a device tree. Any devices added after this point will be missed!
Tiny refactor of builtin-run.c, moving timer start, firmware setup, cpu init
to occur last.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:54:43 +0000 (17:54 +1100)]
kvm tools: Allow load_flat_binary() to load an initrd alongside
This patch passes the initrd fd and commandline to load_flat_binary(), which may
be used to load both the kernel & an initrd (stashing or inserting the
commandline as appropriate) in the same way that load_bzimage() does. This is
especially useful when load_bzimage() is unused for a particular
architecture. :-)
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
term_getc()'s int c has one byte written into it (at its lowest address) by
read_in_full(). This is expected to be the least significant byte, but that
isn't the case on BE! Use correct type, unsigned char. A similar issue exists
in term_getc_iov(), which needs to write a char to the iov rather than an int.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
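The difference between the two types can be seen in the sketch below, where memcpy() of one byte stands in for the one-byte read_in_full() call. Both function names are hypothetical.

```c
#include <string.h>

/* Problematic pattern: one byte lands at the lowest address of an int.
 * On LE that is the least significant byte; on BE it is the most
 * significant one, so the returned value is shifted. */
static int getc_via_int(unsigned char byte)
{
	int c = 0;

	memcpy(&c, &byte, 1);
	return c;
}

/* Corrected pattern: read into an unsigned char and let the return
 * statement widen it, which is independent of byte order. */
static int getc_via_uchar(unsigned char byte)
{
	unsigned char c;

	memcpy(&c, &byte, 1);
	return c;
}
```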
Matt Evans [Fri, 9 Dec 2011 06:54:20 +0000 (17:54 +1100)]
kvm tools: Add CONSOLE_HV term type and allow it to be selected
This patch paves the way for adding a hypervisor console, useful on systems that
support one out of the box yet don't have either serial port or virtio console
support (e.g. kernels expecting POWER SPAPR).
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:54:15 +0000 (17:54 +1100)]
kvm tools: Move arch-specific cmdline init into kvm__arch_set_cmdline()
Different systems will want different base kernel commandlines, e.g. non-x86
systems probably don't need noapic, i8042.* etc., so set the commandline up in
arch-specific code. Then, if the resulting commandline is empty, don't strcat a
space onto the front.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
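The separator handling amounts to something like the following sketch. build_cmdline() is a hypothetical helper, not the function touched by the patch, and it assumes `len` is at least 1.

```c
#include <string.h>

/* Join an arch-provided base command line and a user-supplied one.
 * A separating space is added only when the base part is non-empty. */
static void build_cmdline(char *out, size_t len,
			  const char *arch_base, const char *user)
{
	out[0] = '\0';
	strncat(out, arch_base, len - 1);

	if (user && user[0]) {
		if (out[0])
			strncat(out, " ", len - strlen(out) - 1);
		strncat(out, user, len - strlen(out) - 1);
	}
}
```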
Matt Evans [Fri, 9 Dec 2011 06:54:11 +0000 (17:54 +1100)]
kvm tools: Add kvm__arch_periodic_poll()
Currently, the SIGALRM handler calls device poll functions (for serial, virtio
console) directly. Which devices are present and which require polling is a
system-specific decision, so create a new function called from common code &
move the x86-specific poll calls into it.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:54:06 +0000 (17:54 +1100)]
kvm tools: Fix KVM_RUN exit code check
kvm_cpu__run() currently die()s if KVM_RUN returns non-zero. Some architectures
may return positive values in non-error cases, whereas real errors are always
negative return values. Check for those instead.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:54:01 +0000 (17:54 +1100)]
kvm tools: Don't die if KVM_CAP_NR_VCPUS isn't available
We die() if we can't read KVM_CAP_NR_VCPUS, but the API docs suggest assuming
the value 4 in this case. This is pertinent to PPC KVM, which currently
does not support this CAP.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
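The fallback can be sketched as follows. The callback type stands in for a KVM_CHECK_EXTENSION query, the capability id is a placeholder rather than the real constant, and the two stub callbacks exist only to exercise the function; all of these names are hypothetical.

```c
/* Stand-in for a KVM_CHECK_EXTENSION query: returns the capability
 * value, or a value <= 0 when the capability is not supported. */
typedef int (*check_ext_fn)(int cap);

#define EXAMPLE_CAP_NR_VCPUS 1	/* placeholder id, not the real constant */
#define DEFAULT_MAX_VCPUS    4	/* value the API docs say to assume */

static int recommended_cpus(check_ext_fn check)
{
	int ret = check(EXAMPLE_CAP_NR_VCPUS);

	/* Fall back to the documented default instead of dying. */
	return ret > 0 ? ret : DEFAULT_MAX_VCPUS;
}

/* Stub callbacks used to exercise the fallback. */
static int cap_missing(int cap)
{
	(void)cap;
	return 0;
}

static int cap_reports_8(int cap)
{
	(void)cap;
	return 8;
}
```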
Matt Evans [Fri, 9 Dec 2011 06:53:56 +0000 (17:53 +1100)]
kvm tools: Add arch-specific KVM_RUN exit handling via kvm_cpu__handle_exit()
This patch creates a new function in x86/kvm-cpu.c, kvm_cpu__handle_exit(), in
which arch-specific exit reasons can be handled outside of the common runloop.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Fri, 9 Dec 2011 06:52:45 +0000 (17:52 +1100)]
kvm tools: Re-arrange Makefile to heed CFLAGS before checking for optional libs
The checks for optional libraries build code to perform the tests, so should
respect certain CFLAGS -- in particular, -m64 so we check for 64bit libraries if
they're required.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Fri, 9 Dec 2011 11:16:17 +0000 (13:16 +0200)]
kvm tools: Fix serial port probing
The process of probing the 8250 serial port is as follows:
1. Start detecting IRQs
2. Enable the IER register [At this point, the port is supposed to light
the INTR].
3. Stop detecting IRQs [At this point, the driver detects which IRQ
belongs to that port].
4. Disable IER register.
Since we weren't enabling and disabling the IRQ based on IER writes, we
would often fail the probing since the driver couldn't detect which IRQ
is used by the port, and would just default that to 0.
This would cause slowness and may have caused hangs. For me there is a
significant increase in speed of the terminal after this patch.
Sasha Levin [Wed, 7 Dec 2011 09:37:53 +0000 (11:37 +0200)]
kvm tools: Allow the user to pass a FD to use as a TAP device
This allows users to pass a pre-configured fd to use for the network
interface.
For example:
kvm run -n mode=tap,fd=3 3<>/dev/net/tap3
Acked-by: Daniel P. Berrange <berrange@redhat.com> Cc: Osier Yang <jyang@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Tue, 6 Dec 2011 13:54:29 +0000 (15:54 +0200)]
kvm tools: Fix kvm/barrier.h build breakage
Ingo Molnar writes:
On Tue, 6 Dec 2011, Ingo Molnar wrote:
> > FYI, today's version fails to build:
> >
> > In file included from x86/include/kvm/barrier.h:13:0,
> > from virtio/core.c:5:
> > ../../arch/x86/include/asm/system.h:404:1: error: unknown type
> > name ‘bool’
> > make: *** [virtio/core.o] Error 1
> > make: *** Waiting for unfinished jobs....
> >
> > latest Fedora Rawhide.
>
> There's no 'bool' in system.h for 3.2-rc4. Is this something that's
> changed in -tip?
It got introduced by a post-rc4 fix:
e5fd47bfab2d: xen/pm_idle: Make pm_idle be default_idle under Xen.
Fix it by including <stdbool.h> in kvm/barrier.h.
Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Matt Evans [Tue, 6 Dec 2011 03:37:17 +0000 (14:37 +1100)]
kvm tools: Split x86 arch-specific bits into x86/
Create a new arch-specific subdirectory to contain architecture-specific code
and includes.
The Makefile now adds various arch-specific objects based on detected
architecture. That aside, this patch should only contain code moves. These
include:
- x86-specific kvm_cpu setup, kernel loading, memory setup etc. now in
x86/kvm{-cpu}.c
- BIOS now lives in x86/bios/
- ioport setup
- KVM extensions are asserted in arch-specific kvm.c now, so each architecture
can manage its own dependencies.
- Various architecture-specific #defines are moved into $(ARCH)/include/kvm{-cpu}.h
such as struct kvm_cpu, KVM_NR_CPUS, KVM_32BIT_GAP_SIZE.
Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Tue, 6 Dec 2011 08:45:21 +0000 (10:45 +0200)]
kvm tools: Ninja out support for VIRTIO_F_FEATURES_HIGH
Rusty has just removed it from the spec. Since we are probably the only ones
who implemented support for it, we should remove it from our code as well.
There is no issue with breaking anything since nothing else worked with it,
so it's fully backwards compatible.
Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Lan Tianyu [Tue, 29 Nov 2011 07:30:26 +0000 (15:30 +0800)]
kvm tools, qcow: Add the support for copy-on-write cluster
When a write request targets a cluster that does not have the copied flag set,
allocate a new cluster and write the original data, with the modification
applied, to the new cluster. This also adds support for writing to
compressed qcow2 images. After testing, the image file passes
"qemu-img check". Performance still needs to be improved.
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Mon, 28 Nov 2011 05:34:11 +0000 (13:34 +0800)]
kvm tools: Improve virtio blk request processing
There are at most bdev->reqs[VIRTIO_BLK_QUEUE_SIZE] outstanding requests
at any time. We can simply use the head of each request to fetch the
right 'struct blk_dev_req' in bdev->reqs[].
So we can eliminate the list and lock operations that were introduced by
virtio_blk_req_{pop, push}.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
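The lookup by head index can be pictured with the sketch below. The struct layouts and the queue size are simplified stand-ins chosen for illustration, and req_for_head() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_QUEUE_SIZE 128	/* assumed; stands in for VIRTIO_BLK_QUEUE_SIZE */

/* Cut-down request and device structures. */
struct blk_req_sketch {
	uint16_t head;
};

struct blk_dev_sketch {
	struct blk_req_sketch reqs[EXAMPLE_QUEUE_SIZE];
};

/* Each descriptor chain head is unique while the request is in flight,
 * so it can index the request array directly. No free list and no lock
 * are needed to pick a slot. */
static struct blk_req_sketch *req_for_head(struct blk_dev_sketch *bdev,
					   uint16_t head)
{
	struct blk_req_sketch *req;

	if (head >= EXAMPLE_QUEUE_SIZE)
		return NULL;

	req = &bdev->reqs[head];
	req->head = head;

	return req;
}
```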
Pekka Enberg [Mon, 28 Nov 2011 19:02:19 +0000 (21:02 +0200)]
kvm tools, rtc: Add RTC register names
Add missing RTC register names to hw/rtc.c and rename the current ones to
follow <linux/mc146818rtc.h> naming. It would be nice to use the header
directly but unfortunately it includes <linux/spinlock.h>.
Asias He [Fri, 25 Nov 2011 11:59:56 +0000 (19:59 +0800)]
kvm tools: Fix build in virtio/net.c
virtio/net.c: In function ‘virtio_net__vhost_init’:
virtio/net.c:476:22: error: cast from pointer to integer of different
size [-Werror=pointer-to-int-cast]
cc1: all warnings being treated as errors
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 17 Nov 2011 13:53:24 +0000 (15:53 +0200)]
kvm tools: Prepare support for VIRTIO_RING_F_EVENT_IDX
This patch is the base for enabling support for event index feature in the virtio spec.
We do so by updating and evaluating the used/avail event idx in the virtio ring functions.
Actual usage of this flag is in the following patches.
The result is fewer notifications between the guest and host and,
in turn, faster operation of the virt queues.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
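For reference, the event index check boils down to one wrap-safe comparison. The sketch below mirrors the logic of vring_need_event() from <linux/virtio_ring.h>; need_event() is a local copy for illustration, not code from this patch.

```c
#include <stdint.h>

/* Returns non-zero when the other side asked to be notified about an
 * index that was crossed while moving from old_idx to new_idx. All
 * arithmetic is modulo 2^16, so index wrap-around is handled. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```

With old_idx = 10 and new_idx = 12, a notification is needed for event_idx 10 or 11 but not for 9 or 12, which is how redundant notifications are skipped.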
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
Local /Remote
Socket Size     Request  Resp.    Elapsed  Trans.
Send    Recv    Size     Size     Time     Rate
bytes   Bytes   bytes    bytes    secs.    per sec
16384   87380   1        1        10.00    14811.55
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
Local /Remote
Socket Size     Request  Resp.    Elapsed  Trans.
Send    Recv    Size     Size     Time     Rate
bytes   Bytes   bytes    bytes    secs.    per sec
229376  229376  1        1        10.00    16000.44
229376 229376
After:
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.33.4 (192.168.33.4) port 0 AF_INET
Recv    Send    Send
Socket  Socket  Message  Elapsed
Size    Size    Size     Time     Throughput
bytes   bytes   bytes    secs.    10^6bits/sec
87380   16384   16384    10.00    6340.74
MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
192.168.33.4 (192.168.33.4) port 0 AF_INET
Socket  Message  Elapsed  Messages
Size    Size     Time     Okay  Errors  Throughput
bytes   bytes    secs     #     #       10^6bits/sec
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
Local /Remote
Socket Size     Request  Resp.    Elapsed  Trans.
Send    Recv    Size     Size     Time     Rate
bytes   Bytes   bytes    bytes    secs.    per sec
16384   87380   1        1        10.00    17126.10
MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
Local /Remote
Socket Size     Request  Resp.    Elapsed  Trans.
Send    Recv    Size     Size     Time     Rate
bytes   Bytes   bytes    bytes    secs.    per sec
229376  229376  1        1        10.00    17944.51
Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Fri, 4 Nov 2011 06:48:06 +0000 (10:48 +0400)]
kvm tools: Fix VESA BIOS mode info
Some VGA data, such as the VESA info, needs its addresses adjusted before
it is returned for kernel requests. We returned linear addresses there, while
the spec says that far pointers are needed.
This fixes a long-standing issue that caused the Linux kernel to never exit
the mode scanning loop in arch/x86/boot/video-vesa.c::vesa_probe()
because returned mode info table segment/offset pair was bogus. The
issue triggered on some machines when CONFIG_FB_VESA was disabled.
Also use the VESA structures already defined in the <boot/vesa.h> header now
that we no longer use the struct to store VESA BIOS data.
Reported-by: Pekka Enberg <penberg@kernel.org> CC: Sasha Levin <levinsasha928@gmail.com> CC: Ingo Molnar <mingo@elte.hu> CC: Asias He <asias.hejun@gmail.com> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
[ penberg@kernel.org: minor cleanups to setup_vga_rom() ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
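The linear versus far pointer distinction can be illustrated with a short sketch. The helper names are hypothetical, and the conversion shown is only one of several valid segment:offset encodings for a given address below 1 MiB; it is not taken from the patch.

```c
#include <stdint.h>

/* Pack a real-mode far pointer: segment in the high 16 bits, offset in
 * the low 16 bits. Valid for linear addresses below 1 MiB. */
static uint32_t linear_to_far(uint32_t linear)
{
	uint16_t seg = (uint16_t)(linear >> 4);
	uint16_t off = (uint16_t)(linear & 0xf);

	return ((uint32_t)seg << 16) | off;
}

/* Resolve a packed far pointer back to a linear address the way a
 * real-mode CPU does: segment * 16 + offset. */
static uint32_t far_to_linear(uint32_t far)
{
	return ((far >> 16) << 4) + (far & 0xffff);
}
```

A caller that treats the linear value 0xc0000 as a far pointer would read it as segment 0x000c and offset 0x0000, which resolves to 0xc0 rather than 0xc0000. That kind of mismatch is what makes a returned table address bogus.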