Sasha Levin [Mon, 30 May 2011 17:27:57 +0000 (20:27 +0300)]
kvm tools: Add debug mode to brlock
Add a debug mode which switches the brlock into
a big rwlock.
This can be used to verify we don't end up with a BKL kind
of lock with the current brlock implementation.
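A minimal sketch of how such a compile-time switch could look. It assumes a hypothetical KVM_BRLOCK_DEBUG macro and stub versions of kvm__pause()/kvm__continue() (the real ones stop and resume the VCPU threads). In normal mode the read side compiles to nothing, which is only safe under the brlock's assumption that every reader is a VCPU thread and the writer pauses them all. In debug mode the same call sites go through one ordinary pthread rwlock.

```c
/* Sketch only: KVM_BRLOCK_DEBUG and the stubbed pause functions are hypothetical. */
#ifdef KVM_BRLOCK_DEBUG
#include <pthread.h>
/* Debug mode: all readers and the writer share one big rwlock. */
static pthread_rwlock_t brlock_sem = PTHREAD_RWLOCK_INITIALIZER;
#define br_read_lock()		pthread_rwlock_rdlock(&brlock_sem)
#define br_read_unlock()	pthread_rwlock_unlock(&brlock_sem)
#define br_write_lock()		pthread_rwlock_wrlock(&brlock_sem)
#define br_write_unlock()	pthread_rwlock_unlock(&brlock_sem)
#else
/* Normal mode: reads cost nothing, writes stop the guest. */
static int guest_paused;
static void kvm__pause(void)    { guest_paused = 1; }	/* stub: would stop all VCPUs */
static void kvm__continue(void) { guest_paused = 0; }	/* stub: would resume them */
#define br_read_lock()		((void)0)
#define br_read_unlock()	((void)0)
#define br_write_lock()		kvm__pause()
#define br_write_unlock()	kvm__continue()
#endif

static int shared_value;	/* data protected by the brlock */

static int read_shared(void)
{
	int v;

	br_read_lock();
	v = shared_value;
	br_read_unlock();
	return v;
}

static void write_shared(int v)
{
	br_write_lock();
	shared_value = v;
	br_write_unlock();
}
```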
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Mon, 30 May 2011 17:27:55 +0000 (20:27 +0300)]
kvm tools: Add a brlock
brlock is a lock which is very cheap for reads, but very expensive
for writes.
This lock will be used when updates are very rare and reads are
common.
This lock is currently implemented by stopping the guest while
performing the updates. We assume that the only threads which
read from the locked data are VCPU threads, and the only writer
isn't a VCPU thread.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Mon, 30 May 2011 17:27:52 +0000 (20:27 +0300)]
kvm tools: Add APIs to allow pausing guests
Allow pausing and unpausing guests running on the host.
Pausing a guest means that none of the VCPU threads are running
KVM_RUN until they are unpaused.
The following API functions are added:
void kvm__pause(void);
void kvm__continue(void);
void kvm__notify_paused(void);
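A sketch of one way these three functions could cooperate, using a mutex and a condition variable. It assumes VCPU threads check a pause flag between (simulated) KVM_RUN calls; the real code may interrupt them with a signal instead. vcpu_thread(), pause_demo() and the counters are hypothetical scaffolding for the example.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t pause_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pause_cond = PTHREAD_COND_INITIALIZER;
static atomic_bool pause_requested;
static int nr_vcpus, nr_paused;		/* protected by pause_lock */

/* Called by a VCPU thread that noticed a pending pause: park until continued. */
void kvm__notify_paused(void)
{
	pthread_mutex_lock(&pause_lock);
	nr_paused++;
	pthread_cond_broadcast(&pause_cond);		/* wake kvm__pause() */
	while (atomic_load(&pause_requested))
		pthread_cond_wait(&pause_cond, &pause_lock);
	nr_paused--;
	pthread_mutex_unlock(&pause_lock);
}

/* Returns only once every VCPU thread is parked in kvm__notify_paused(). */
void kvm__pause(void)
{
	pthread_mutex_lock(&pause_lock);
	atomic_store(&pause_requested, true);
	while (nr_paused < nr_vcpus)
		pthread_cond_wait(&pause_cond, &pause_lock);
	pthread_mutex_unlock(&pause_lock);
}

void kvm__continue(void)
{
	pthread_mutex_lock(&pause_lock);
	atomic_store(&pause_requested, false);
	pthread_cond_broadcast(&pause_cond);		/* release parked VCPUs */
	pthread_mutex_unlock(&pause_lock);
}

static atomic_bool stop_vcpus;
static atomic_long guest_work;	/* stand-in for progress made inside KVM_RUN */

static void *vcpu_thread(void *arg)
{
	(void)arg;
	while (!atomic_load(&stop_vcpus)) {
		if (atomic_load(&pause_requested))
			kvm__notify_paused();
		atomic_fetch_add(&guest_work, 1);	/* "run" the guest a bit */
	}
	return NULL;
}

/* Returns true if no guest work happened while the guest was paused. */
static bool pause_demo(int nvcpus)
{
	pthread_t threads[8];
	struct timespec ts = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	long before;
	bool frozen;
	int i, started;

	if (nvcpus < 1 || nvcpus > 8)
		return false;
	for (i = 0; i < nvcpus; i++)
		if (pthread_create(&threads[i], NULL, vcpu_thread, NULL) != 0)
			break;
	started = i;
	nr_vcpus = started;		/* only count threads that actually run */

	kvm__pause();
	before = atomic_load(&guest_work);
	nanosleep(&ts, NULL);
	frozen = atomic_load(&guest_work) == before;
	kvm__continue();

	atomic_store(&stop_vcpus, true);
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);
	return frozen && started > 0;
}
```

Parking the VCPUs on a condition variable, rather than spinning, means a paused guest costs no host CPU while an update is in progress.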
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Sun, 29 May 2011 12:51:48 +0000 (14:51 +0200)]
kvm tools: Fix virtio net build breakage on 32-bit
* Sasha Levin <levinsasha928@gmail.com> wrote:
> Use ioeventfds to receive notifications of IO events in virtio-net.
> Doing so prevents an exit every time we receive/send a packet.
>
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
> tools/kvm/virtio/net.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
This needs the fix below to build on 32-bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Fri, 27 May 2011 16:18:37 +0000 (19:18 +0300)]
kvm tools: Add ioeventfd support
ioeventfd is a mechanism provided by KVM to receive notifications about
guest writes to PIO and MMIO addresses.
Such notifications are useful if all we need to know is that
a specific address has been written to, in which case we can avoid
a heavyweight exit.
The implementation uses epoll to scale to a large number of ioeventfds.
Benchmarks were run on a separate (non-boot) 1GB virtio-blk device, formatted
as ext4, using bonnie++.
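A sketch of the epoll side, assuming one eventfd per watched address. Registering each eventfd with KVM through the KVM_IOEVENTFD ioctl needs /dev/kvm and is left out; here the eventfd is signalled by hand. struct ioevent, ioevent_add(), ioevent_poll_once() and count_hit() are hypothetical names.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

struct ioevent {
	int fd;				/* eventfd that KVM would signal */
	void (*fn)(void *ptr);		/* handler to run when it fires */
	void *ptr;			/* handler context */
};

/* Create the eventfd and add it to the epoll set. */
static int ioevent_add(int epfd, struct ioevent *ev)
{
	struct epoll_event epev = { .events = EPOLLIN, .data.ptr = ev };

	ev->fd = eventfd(0, 0);
	if (ev->fd < 0)
		return -1;
	return epoll_ctl(epfd, EPOLL_CTL_ADD, ev->fd, &epev);
}

/* Wait once and dispatch every ready ioevent; returns how many fired. */
static int ioevent_poll_once(int epfd, int timeout_ms)
{
	struct epoll_event events[16];
	int i, nfds = epoll_wait(epfd, events, 16, timeout_ms);

	for (i = 0; i < nfds; i++) {
		struct ioevent *ev = events[i].data.ptr;
		uint64_t count;

		/* Reading resets the eventfd counter so it can fire again. */
		if (read(ev->fd, &count, sizeof(count)) == sizeof(count))
			ev->fn(ev->ptr);
	}
	return nfds;
}

/* Example handler: count notifications. */
static void count_hit(void *ptr)
{
	(*(int *)ptr)++;
}
```

Because epoll_wait() returns only the ready descriptors, handling a notification does not require scanning every registered ioeventfd.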
Sasha Levin [Thu, 26 May 2011 14:25:46 +0000 (17:25 +0300)]
kvm tools: Exit VCPU thread only when SIGKVMEXIT is received
Currently the VCPU loop would exit when the thread received any signal.
Change the behaviour to exit only when SIGKVMEXIT is received. This change
prevents the guest from terminating when unrelated signals are processed
by the thread (for example, when attaching a debugger).
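A sketch of the idea, assuming SIGKVMEXIT is a real-time signal (the definition below is hypothetical) whose handler only sets a flag. The VCPU loop then treats EINTR from KVM_RUN as "retry" unless that flag is set. install_kvm_exit_handler() and vcpu_should_continue() are illustrative names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical: a real-time signal reserved for "exit the VCPU loop". */
#define SIGKVMEXIT (SIGRTMIN + 0)

static volatile sig_atomic_t kvm_exit_requested;

static void handle_kvm_exit(int sig)
{
	(void)sig;
	kvm_exit_requested = 1;
}

static int install_kvm_exit_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handle_kvm_exit;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGKVMEXIT, &sa, NULL);
}

/*
 * Decide whether the VCPU loop keeps going after ioctl(KVM_RUN) returned
 * 'ret' with 'err' in errno. Only SIGKVMEXIT ends the loop; an EINTR caused
 * by any other signal (e.g. a debugger attaching) just re-enters KVM_RUN.
 */
static int vcpu_should_continue(int ret, int err)
{
	if (kvm_exit_requested)
		return 0;
	if (ret < 0)
		return err == EINTR;	/* retry on interruption, stop on real errors */
	return 1;
}
```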
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:07 +0000 (13:30 +0300)]
kvm tools: Add support for multiple virtio-rng devices
Since multiple hardware rng devices of the same type are currently
unsupported by the kernel, this serves more as an example of a basic
virtio driver under kvm tools and can be used to debug the PCI layer.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:06 +0000 (13:30 +0300)]
kvm tools: Use ioport context to control blk devices
Since ioports now have the ability to pass context to their
callbacks, we can implement multiple blk devices more efficiently.
We can get a pointer to the 'current' blk dev on each ioport call, which
means that we don't need to keep track of the blk device allocation
and ioport distribution within the module.
The advantages are easier management of multiple blk devices and
removal of any hardcoded limit on the number of blk
devices.
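A sketch of a blk callback that relies on the registered context pointer. The types are hypothetical, and the legacy virtio-pci status register (offset 18) stands in for the full register set. Each device is allocated on its own, so there is no global array to size.

```c
#include <stdint.h>
#include <stdlib.h>

#define VIRTIO_PCI_STATUS 18	/* legacy virtio-pci status register offset */

/* Hypothetical per-device state; no global table, no device limit. */
struct blk_dev {
	uint16_t base_addr;	/* first ioport of this device's window */
	uint8_t status;
};

static struct blk_dev *blk_dev__new(uint16_t base_addr)
{
	struct blk_dev *bdev = calloc(1, sizeof(*bdev));

	if (bdev)
		bdev->base_addr = base_addr;
	return bdev;
}

/* ioport callback: 'ptr' is the device registered for this port range. */
static int blk_dev__io_out(uint16_t port, uint32_t data, void *ptr)
{
	struct blk_dev *bdev = ptr;

	switch (port - bdev->base_addr) {
	case VIRTIO_PCI_STATUS:
		bdev->status = (uint8_t)data;
		return 1;
	default:
		return 0;	/* other registers omitted from the sketch */
	}
}
```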
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:04 +0000 (13:30 +0300)]
kvm tools: Add optional parameter used in ioport callbacks
Allow specifying an optional parameter when registering an
ioport range. The callback functions provided by the registering
module will be called with the same parameter.
This may be used to keep context during callbacks on IO operations.
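A sketch of the registration side, with a hypothetical ioport__register() signature and a small fixed table standing in for the real ioport lookup structure. The pointer given at registration is stored with the range and handed back on every access.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: the last argument is the registration parameter. */
typedef int (*ioport_out_fn)(uint16_t port, uint32_t data, void *param);

struct ioport_entry {
	uint16_t start, len;
	ioport_out_fn io_out;
	void *param;
};

#define MAX_IOPORTS 16		/* fixed-size table, for the sketch only */
static struct ioport_entry ioports[MAX_IOPORTS];
static int nr_ioports;

static int ioport__register(uint16_t start, uint16_t len, ioport_out_fn fn, void *param)
{
	if (nr_ioports == MAX_IOPORTS)
		return -1;
	ioports[nr_ioports++] = (struct ioport_entry){ start, len, fn, param };
	return 0;
}

/* Dispatch a guest OUT to the owning range, passing its registered param. */
static int ioport__emulate_out(uint16_t port, uint32_t data)
{
	for (int i = 0; i < nr_ioports; i++) {
		struct ioport_entry *e = &ioports[i];

		if (port >= e->start && port - e->start < e->len)
			return e->io_out(port, data, e->param);
	}
	return 0;	/* unhandled port */
}

/* Example callback: store the written value in the per-range context. */
static int store_out(uint16_t port, uint32_t data, void *param)
{
	(void)port;
	*(uint32_t *)param = data;
	return 1;
}
```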
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Mon, 23 May 2011 14:39:17 +0000 (18:39 +0400)]
kvm tools: Drop unused vars from int10.c code
A couple of functions define an 'ah' variable but never
actually use it, so the gcc 4.6.x series complains:
CC bios/bios-rom.bin
bios/int10.c: In function ‘int10_putchar’:
bios/int10.c:86:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
bios/int10.c: In function ‘int10_vesa’:
bios/int10.c:96:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors
Start the VNC server by starting kvm tools with "--vnc".
Connect to the VNC server by running: "vncviewer :0".
Since there is no support for input devices at this time,
it may be useful to start kvm tools with an additional
' -p "console=ttyS0" ' parameter so that a serial console
can be used alongside the graphical one.
Signed-off-by: John Floren <john@jfloren.net>
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
John Floren [Mon, 23 May 2011 12:15:17 +0000 (15:15 +0300)]
kvm tools: Update makefile and feature tests
Update feature tests to test for libvncserver.
VESA support doesn't get compiled in unless libvncserver
is installed.
Signed-off-by: John Floren <john@jfloren.net>
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
John Floren [Mon, 23 May 2011 12:15:16 +0000 (15:15 +0300)]
kvm tools: Add VESA device
Add a simple VESA device which moves the framebuffer
from the guest kernel to a VNC server.
VESA device PCI code is very similar to virtio-* PCI code.
Signed-off-by: John Floren <john@jfloren.net>
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
John Floren [Mon, 23 May 2011 12:15:15 +0000 (15:15 +0300)]
kvm tools: Add video mode to kernel initialization
Allow setting the video mode in the guest kernel.
For possible values, see Documentation/fb/vesafb.txt.
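The Linux x86 boot protocol carries the requested mode in the 16-bit vid_mode field of the setup header (offset 0x1fa). A sketch of validating a user-supplied mode number before storing it, using a hypothetical one-field struct in place of the full setup header and a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the boot protocol setup header. */
struct setup_header_subset {
	uint16_t vid_mode;	/* lives at offset 0x1fa in the real header */
};

/* Accepts decimal or 0x-prefixed mode numbers; rejects junk and overflow. */
static int set_vid_mode(struct setup_header_subset *hdr, const char *mode)
{
	char *end;
	unsigned long v = strtoul(mode, &end, 0);

	if (*mode == '\0' || *end != '\0' || v > 0xffff)
		return -1;
	hdr->vid_mode = (uint16_t)v;
	return 0;
}
```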
Signed-off-by: John Floren <john@jfloren.net>
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
John Floren [Mon, 23 May 2011 12:15:14 +0000 (15:15 +0300)]
kvm tools: Add BIOS INT10 handler
The INT10 handler is a basic implementation of BIOS video services.
The handler implements a VESA interface which is initialized at
the very beginning of loading the kernel.
Signed-off-by: John Floren <john@jfloren.net>
[ turning code into patches and cleanup ]
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Prasad Joshi [Sun, 22 May 2011 16:24:04 +0000 (17:24 +0100)]
kvm tools: Add a wrapper function to open disk images
Ingo suggested moving the disk image subsystem code out of
kvm-run.c. The code to open all of the specified disk
images now lives in a wrapper function in disk/core.c.
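A sketch of such a wrapper with a hypothetical minimal struct disk_image that holds only a file descriptor; the real code also detects the image format. It opens all images or none, closing the already-opened ones if a later open fails.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical minimal image handle. */
struct disk_image {
	int fd;
};

static struct disk_image **disk_image__open_all(const char **filenames, int count)
{
	struct disk_image **disks = calloc(count, sizeof(*disks));
	int i;

	if (!disks)
		return NULL;
	for (i = 0; i < count; i++) {
		int fd = open(filenames[i], O_RDWR);

		if (fd < 0 || !(disks[i] = malloc(sizeof(**disks)))) {
			if (fd >= 0)
				close(fd);
			goto fail;
		}
		disks[i]->fd = fd;
	}
	return disks;

fail:
	/* Undo only the images that were fully opened before the failure. */
	while (i--) {
		close(disks[i]->fd);
		free(disks[i]);
	}
	free(disks);
	return NULL;
}
```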
Cyrill Gorcunov [Sat, 21 May 2011 12:10:34 +0000 (16:10 +0400)]
kvm tools, 9p: Test for truncation result
Without using 'res' I get
| cyrill@sun kvm $ make
| CC virtio/9p.o
| virtio/9p.c: In function ‘virtio_p9_wstat’:
| virtio/9p.c:448:6: error: variable ‘res’ set but not used [-Werror=unused-but-set-variable]
| cc1: all warnings being treated as errors
| make: *** [virtio/9p.o] Error 1
so add a basic check of the ftruncate() result. This eliminates the warning,
and we might need the 'res' status later in the caller.
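A sketch of the pattern, with a hypothetical helper name: return the ftruncate() status as a negative errno instead of dropping it, so the caller can turn it into an error reply.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 on success or -errno on failure. */
static int p9_truncate(int fd, off_t length)
{
	if (ftruncate(fd, length) < 0)
		return -errno;
	return 0;
}
```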
Pekka Enberg [Sat, 21 May 2011 12:04:10 +0000 (15:04 +0300)]
kvm tools, serial: Register 0x2e8 ioport
We already register ioports for 0x2f8 and 0x3e8 and mark them as inactive, so
mark the 0x2e8 ioport as inactive as well. This is a preparatory step toward
dropping the dummy serial port registrations from ioport__setup_legacy().
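A sketch of the four legacy 8250 UART windows (8 ports each) and a lookup by port. Treating only COM1 as active, and the struct and function names, are assumptions for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct serial_port {
	uint16_t base;
	bool active;
};

static const struct serial_port serial_ports[] = {
	{ 0x3f8, true },	/* COM1 */
	{ 0x2f8, false },	/* COM2 */
	{ 0x3e8, false },	/* COM3 */
	{ 0x2e8, false },	/* COM4: the range this commit registers */
};

/* Find the UART window containing 'port', or NULL if none does. */
static const struct serial_port *serial_lookup(uint16_t port)
{
	for (size_t i = 0; i < sizeof(serial_ports) / sizeof(serial_ports[0]); i++)
		if (port >= serial_ports[i].base && port < serial_ports[i].base + 8)
			return &serial_ports[i];
	return NULL;
}
```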
Sasha Levin [Fri, 20 May 2011 14:23:05 +0000 (17:23 +0300)]
kvm tools: Cleanup e820 code
Several cleanups in the patch:
- Use kernel headers for e820 types and definitions.
- A byte-sized entry count was used for the e820 entries;
it should be dword-sized. Update the in-memory layout and
the bios code to fix it.
- Use struct e820map to calculate offsets used by bios code.
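A sketch of the layout, redeclaring the kernel's struct e820entry and struct e820map with <stdint.h> types so it stands alone. Deriving the offsets with offsetof() keeps the BIOS code in sync with the dword-sized nr_map; the enum names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define E820MAX 128	/* number of entries in the map */

struct e820entry {
	uint64_t addr;	/* start of memory segment */
	uint64_t size;	/* size of memory segment */
	uint32_t type;	/* E820_RAM, E820_RESERVED, ... */
} __attribute__((packed));

struct e820map {
	uint32_t nr_map;	/* dword-sized entry count */
	struct e820entry map[E820MAX];
};

/* Offsets the BIOS code can use instead of hardcoded constants. */
enum {
	E820_NR_MAP_OFFSET = offsetof(struct e820map, nr_map),
	E820_MAP_OFFSET    = offsetof(struct e820map, map),
	E820_ENTRY_SIZE    = sizeof(struct e820entry),
};
```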
Sasha Levin [Fri, 20 May 2011 08:37:09 +0000 (11:37 +0300)]
kvm tools: Add virtio-9p
Overview:
9p allows for simple RPC based resource sharing over
different transports (in our case, virtio).
This is the implementation of (most of) the original
9p2000 protocol, without the .u or the .l extensions.
How to use:
1. Make sure the kernel is compiled with:
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_NET_9P_DEBUG=y (At least until code is stable)
CONFIG_9P_FS=y
2. Start KVM with '--virtio-9p <dirname>'. This creates a virtio
transport named 'kvm_9p'. The server side of the transport maps
dirname to the root of the file system.
3. Within the guest, mount the fs:
mount -t 9p -otrans=virtio kvm_9p <local_dir> -oversion=9p2000
This will mount the 9p server to local_dir.
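For reference, every 9P2000 message starts with a 7-byte little-endian header, size[4] type[1] tag[2], where size covers the whole message. A sketch of parsing it; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct p9_hdr {
	uint32_t size;	/* whole message, header included */
	uint8_t type;	/* e.g. Tversion */
	uint16_t tag;	/* matches replies to requests */
};

enum { P9_TVERSION = 100, P9_NOTAG = 0xffff, P9_HDR_LEN = 7 };

static int p9_parse_hdr(const uint8_t *buf, size_t len, struct p9_hdr *hdr)
{
	if (len < P9_HDR_LEN)
		return -1;
	hdr->size = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
		    (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
	hdr->type = buf[4];
	hdr->tag = (uint16_t)(buf[5] | buf[6] << 8);
	if (hdr->size < P9_HDR_LEN || hdr->size > len)
		return -1;	/* truncated or malformed message */
	return 0;
}
```

A Tversion request such as the one sent by the mount command (msize 8192, version "9P2000") is 19 bytes long and uses NOTAG.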
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Fri, 20 May 2011 08:37:08 +0000 (11:37 +0300)]
kvm tools: Copy net/9p/9p.h
The header could not be included directly because, among other minor
issues, the original header declared the same function twice:
int p9_errstr2errno(char *errstr, int len);
int p9_errstr2errno(char *, int);
A patch has been sent to the 9P maintainers; this header should
be removed once the patch is merged.
Until then, use a modified copy of the header.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Wed, 18 May 2011 19:40:51 +0000 (23:40 +0400)]
kvm tools: Fix alignment for mpf_intel table
Thomas and Asias reported that the kernel doesn't find the MP
tables on 32-bit hosts. This is because the alignment was
previously done on the host address obtained from calloc(),
missing the fact that the MP tables are put into guest
memory *with* an offset, so the MP signature address should
be calculated with this offset in mind and only
then aligned.
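A sketch of the corrected calculation, assuming the table is built in a host buffer that is later copied to a known guest-physical address. The MP floating pointer structure must sit on a 16-byte boundary in guest memory, so the padding is computed from the guest address rather than from the calloc() pointer. The helper names are hypothetical.

```c
#include <stdint.h>

#define MP_ALIGN 16	/* MP floating pointer structure alignment */

/* Bytes to skip so that 'guest_addr' + pad is 16-byte aligned. */
static uint64_t mp_guest_align_pad(uint64_t guest_addr)
{
	return (MP_ALIGN - (guest_addr % MP_ALIGN)) % MP_ALIGN;
}

/*
 * 'host' will be copied into guest memory at 'guest_addr'. Return the host
 * pointer whose guest-physical address is aligned, whatever the alignment
 * of 'host' itself happens to be.
 */
static void *mp_align_for_guest(void *host, uint64_t guest_addr)
{
	return (char *)host + mp_guest_align_pad(guest_addr);
}
```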
Reported-by: Thomas Heil <heil@terminal-consulting.de>
Reported-by: Asias He <asias.hejun@gmail.com>
Tested-by: Thomas Heil <heil@terminal-consulting.de>
Tested-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Wed, 18 May 2011 19:19:40 +0000 (22:19 +0300)]
kvm tools: Fail if passed initrd is not really an initrd
We recently changed the meaning of "-i" from disk image to initrd. This has
confused many users because kvm just reports:
Fatal: mmap() failed.
if a disk image is passed as an initrd. This patch fixes that by checking for the
first two ID bytes in the initrd:
$ ./kvm run -i ~/images/linux-0.2.qcow
# kvm run -k ../../arch/x86/boot/bzImage -m 256 -c 1
Fatal: /home/penberg/images/linux-0.2.qcow is not an initrd
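A sketch of such a check, assuming the initrd is gzip-compressed (gzip streams begin with the ID bytes 0x1f 0x8b, per RFC 1952); an uncompressed cpio archive would fail it. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

static bool initrd_looks_valid(const unsigned char *buf, size_t len)
{
	return len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b;
}

static bool initrd_check(int fd)
{
	unsigned char id[2];

	/* pread() leaves the file offset untouched for the later mmap/read. */
	if (pread(fd, id, sizeof(id), 0) != (ssize_t)sizeof(id))
		return false;
	return initrd_looks_valid(id, sizeof(id));
}
```

A qcow image starts with "QFI\xfb", so it is rejected before any mmap() is attempted.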
Reported-by: Thomas Heil <heil@terminal-consulting.de>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Wed, 18 May 2011 19:08:57 +0000 (22:08 +0300)]
kvm tools: Add conditional compilation of symbol resolving
Thomas reported that on some systems the bfd library might
not be installed. So we take the perf approach and check for
the library's presence at compile time.
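A sketch of the conditional part, assuming the Makefile defines a hypothetical CONFIG_HAS_BFD macro (and links -lbfd) only when a tiny test program against libbfd builds. Otherwise a fallback keeps the same interface and prints the raw address.

```c
#include <stddef.h>
#include <stdio.h>

#ifdef CONFIG_HAS_BFD
/* Built only when the feature test found libbfd (implementation not shown). */
char *symbol__lookup(unsigned long addr, char *sym, size_t size);
#else
/* Fallback without libbfd: report the raw address instead of a symbol. */
static char *symbol__lookup(unsigned long addr, char *sym, size_t size)
{
	snprintf(sym, size, "0x%lx", addr);
	return sym;
}
#endif
```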
Reported-by: Thomas Heil <heil@terminal-consulting.de>
Tested-by: Thomas Heil <heil@terminal-consulting.de>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Wed, 18 May 2011 08:19:10 +0000 (16:19 +0800)]
kvm tools: Rename struct disk_image_operations ops name for raw image
This patch renames:
raw_image__read_sector_ro_mmap to raw_image__read_sector
raw_image__write_sector_ro_mmap to raw_image__write_sector
raw_image__close_ro_mmap to raw_image__close
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Tue, 17 May 2011 15:17:12 +0000 (18:17 +0300)]
kvm tools: Fix includes for preadv/pwritev
"bornto befrag <born2befrag@gmail.com>" writes:
> When i compile i kvm native tool tools/kvm && make i get this
>
> CC read-write.o
> cc1: warnings being treated as errors
> read-write.c: In function ‘xpreadv’:
> read-write.c:255: error: implicit declaration of function ‘preadv’
> read-write.c:255: error: nested extern declaration of ‘preadv’
> read-write.c: In function ‘xpwritev’:
> read-write.c:268: error: implicit declaration of function ‘pwritev’
> read-write.c:268: error: nested extern declaration of ‘pwritev’
> make: *** [read-write.o] Error 1
Fix that up by including <sys/uio.h> for preadv()/pwritev().
Reported-by: <born2befrag@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
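A sketch of what the x-prefixed wrappers could look like once <sys/uio.h> is included; retrying on EINTR is an assumption about their behaviour, not taken from the patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>	/* declares preadv() and pwritev() */

/* Restart the call if a signal interrupted it before any data moved. */
static ssize_t xpreadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	ssize_t nr;

	do {
		nr = preadv(fd, iov, iovcnt, offset);
	} while (nr < 0 && errno == EINTR);
	return nr;
}

static ssize_t xpwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
	ssize_t nr;

	do {
		nr = pwritev(fd, iov, iovcnt, offset);
	} while (nr < 0 && errno == EINTR);
	return nr;
}
```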
Sasha Levin [Tue, 17 May 2011 12:07:59 +0000 (15:07 +0300)]
kvm tools: Add interval red-black tree helper
An interval rb-tree stores interval ranges directly and allows
quickly looking up overlaps with a single point or a range.
The helper is based on the kernel rb-tree implementation
(located in <linux/rbtree.h>), which allows augmenting
the classical rb-tree so it can be used as an interval tree.
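A sketch of the augmentation only: an unbalanced binary search tree keyed on the interval start, where each node caches the largest end point in its subtree. The rb-tree balancing that the real helper inherits from <linux/rbtree.h> is omitted, and all names are hypothetical. Intervals are closed, [low, high].

```c
#include <stddef.h>
#include <stdint.h>

struct int_node {
	uint64_t low, high;
	uint64_t max_high;	/* largest 'high' anywhere in this subtree */
	struct int_node *left, *right;
};

static struct int_node *it_insert(struct int_node *root, struct int_node *n)
{
	if (!root) {
		n->left = n->right = NULL;
		n->max_high = n->high;
		return n;
	}
	if (n->low < root->low)
		root->left = it_insert(root->left, n);
	else
		root->right = it_insert(root->right, n);
	if (n->high > root->max_high)
		root->max_high = n->high;	/* keep the augmented value current */
	return root;
}

/* Return some node whose interval contains 'point', or NULL. */
static struct int_node *it_search_point(struct int_node *node, uint64_t point)
{
	while (node) {
		if (node->low <= point && point <= node->high)
			return node;
		/* If nothing on the left reaches 'point', only the right can match. */
		if (node->left && node->left->max_high >= point)
			node = node->left;
		else
			node = node->right;
	}
	return NULL;
}
```

The max_high field is what lets the search discard a whole subtree at each step instead of visiting every interval.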
Prasad Joshi [Fri, 13 May 2011 14:02:46 +0000 (15:02 +0100)]
kvm tools: Add VIRTIO_BLK_T_FLUSH feature to handle flush operation from VM
The virtual machine calls 'sync' when the machine
is halted. Adding the virtio flush feature
ensures that the data is synced to disk before
the virtual machine is halted. This is needed to
ensure the integrity of the data.
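A sketch of the device side, using the VIRTIO_BLK_F_FLUSH feature bit and the request and status constants from <linux/virtio_blk.h>. The dispatcher name is hypothetical, and the read/write paths are omitted.

```c
#include <stdint.h>
#include <unistd.h>

#define VIRTIO_BLK_F_FLUSH	9	/* feature bit: device supports flush */
#define VIRTIO_BLK_T_FLUSH	4	/* request type: flush */
#define VIRTIO_BLK_S_OK		0
#define VIRTIO_BLK_S_IOERR	1
#define VIRTIO_BLK_S_UNSUPP	2

/* Advertised to the guest so its driver may send flush requests. */
static const uint32_t blk_host_features = 1u << VIRTIO_BLK_F_FLUSH;

/* 'fd' is the host file backing the disk image. */
static uint8_t virtio_blk_do_request(int fd, uint32_t type)
{
	switch (type) {
	case VIRTIO_BLK_T_FLUSH:
		/* Push data cached by the host down to stable storage. */
		return fsync(fd) == 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
	default:
		return VIRTIO_BLK_S_UNSUPP;	/* only because IN/OUT are omitted here */
	}
}
```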
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Fri, 13 May 2011 02:40:09 +0000 (10:40 +0800)]
kvm tools: Tune the command-line option
With this patch we can have
-c --cpus
-m --mem
-d --disk
-k --kernel
-i --initrd
which is more consistent and easier to remember.
The patch also frees up the -s and -g options.
Ingo suggested:
'''
The debug options should probably be concentrated under a --debug option
anyway, to allow things like:
--debug single-step,ioport
Even if the debug options are kept they should be streamlined along
the same pattern:
>> --debug-single-step Enable single stepping
>> --debug-ioport Enable ioport debugging
But having a --debug option that recognizes all the debug flags would
be nicer.
It would also allow future enhancements to group debug features, like:
--debug all               # turn on everything and the kitchen sink for early hangs
--debug all,-single-step  # turn on everything except single-step debugging
--debug nonverbose        # turn on all non-noisy debug options we have
Maybe even:
--debug memcheck
... could run kvm under valgrind automatically - that way we can hide
any secondary tool complexities from the user and turn those tools into
simple debug options :-)
'''
Let's do this --debug option consolidation later.
Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Fri, 13 May 2011 08:19:09 +0000 (10:19 +0200)]
kvm tools: Fix type mismatches on GCC 4.4 on 32-bit systems
The tools/kvm build still fails on 32-bit:
cc1: warnings being treated as errors
qcow.c: In function ‘qcow1_write_sector’:
qcow.c:307: error: comparison between signed and unsigned integer expressions
make: *** [qcow.o] Error 1
make: *** Waiting for unfinished jobs....
using:
gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)
The patch below addresses them.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Thu, 12 May 2011 08:09:29 +0000 (10:09 +0200)]
kvm tools: Use standardized style for the virtio/net.c driver
I had a quick look at virtio/net.c and it still had quite many style
inefficiencies - all of which are patterns which i pointed out before:
- use short names for devices within the driver, so not 'net_device' but
'ndev' - everyone hacking net.c knows that this is the network driver so
'ndev' is a self-explanatory (and very short) term of art ...
- use 'pci_header' instead of the ambiguous and misleading
'virtio_net_pci_device' naming.
- do not repeat 'net' in struct net_device fields! So rename ndev->net_config
to ndev->config.
- In the kernel we generally use _lock names for mutexes. This is conceptually
more generic. So rename the net device mutexes accordingly.
- group #include lines in a topical way instead of a random mess
- fix vertical alignment mismatches
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Wed, 11 May 2011 15:17:23 +0000 (18:17 +0300)]
kvm tools: Add memory gap for larger RAM sizes
e820 is expected to leave a memory gap within the low 32
bits of RAM space. From the documentation of e820_setup_gap():
/*
* Search for the biggest gap in the low 32 bits of the e820
* memory space. We pass this space to PCI to assign MMIO resources
* for hotplug or unconfigured devices in.
* Hopefully the BIOS let enough space left.
*/
Not leaving such gap causes errors and hangs during the boot process.
This patch adds a memory gap between 0xe0000000 and 0x100000000 when using more
than 0xe0000000 bytes for guest RAM.
This patch updates the e820 table and the slot allocations used for
KVM_SET_USER_MEMORY_REGION accordingly.
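A sketch of the resulting guest-physical layout, assuming RAM above 0xe0000000 is moved to start at 4GB. Each returned range would become one E820_RAM entry and one KVM_SET_USER_MEMORY_REGION slot; the names are hypothetical.

```c
#include <stdint.h>

#define GAP_START 0xe0000000ULL
#define GAP_END   0x100000000ULL

struct mem_range {
	uint64_t guest_phys;
	uint64_t size;
};

/* Fill 'out' with up to two RAM ranges; return how many were used. */
static int split_guest_ram(uint64_t ram_size, struct mem_range out[2])
{
	if (ram_size <= GAP_START) {
		out[0] = (struct mem_range){ 0, ram_size };
		return 1;
	}
	/* Low RAM stops at the gap; the remainder is placed above 4GB. */
	out[0] = (struct mem_range){ 0, GAP_START };
	out[1] = (struct mem_range){ GAP_END, ram_size - GAP_START };
	return 2;
}
```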
This is undesirable as the order of printout is highly random, so successive
dumps are difficult to compare.
The patch below serializes the signalling itself. (this is on top of the
previous patch)
The patch also tweaks the vCPU printout line a bit so that it does not start
with '#', which is discarded if such messages are pasted into Git commit
messages.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Mon, 9 May 2011 07:27:11 +0000 (09:27 +0200)]
kvm tools: Fix and improve the CPU register dump debug output code
* Pekka Enberg <penberg@kernel.org> wrote:
> Ingo Molnar reported that 'kill -3' didn't work on his machine:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > This is really cumbersome to debug - is there some good way to get to the RIP
> > that the guest is hanging in? If kvm would print that out to the host console
> > (even if it's just the raw RIP initially) on a kill -3 that would help
> > enormously.
>
> Looks like the code should be doing that already - but the ioctl(KVM_GET_SREGS)
> hangs:
>
> [pid 748] ioctl(6, KVM_GET_SREGS
>
> Avi Kivity pointed out that it's not safe to call KVM_GET_SREGS (or other vcpu
> related ioctls) from other threads:
>
> > is it not OK to call KVM_GET_SREGS from other threads than the one
> > that's doing KVM_RUN?
>
> From Documentation/kvm/api.txt:
>
> - vcpu ioctls: These query and set attributes that control the operation
> of a single virtual cpu.
>
> Only run vcpu ioctls from the same thread that was used to create the
> vcpu.
>
> Fix that up by using pthread_kill() to force the threads that are doing KVM_RUN
> to do the register dumps.
>
> Reported: Ingo Molnar <mingo@elte.hu>
> Cc: Asias He <asias.hejun@gmail.com>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Prasad Joshi <prasadjoshi124@gmail.com>
> Cc: Sasha Levin <levinsasha928@gmail.com>
> Signed-off-by: Pekka Enberg <penberg@kernel.org>
> ---
> tools/kvm/kvm-run.c | 20 +++++++++++++++++---
> 1 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
> index eb50b6a..58e2977 100644
> --- a/tools/kvm/kvm-run.c
> +++ b/tools/kvm/kvm-run.c
> @@ -127,6 +127,18 @@ static const struct option options[] = {
> OPT_END()
> };
>
> +static void handle_sigusr1(int sig)
> +{
> + struct kvm_cpu *cpu = current_kvm_cpu;
> +
> + if (!cpu)
> + return;
> +
> + kvm_cpu__show_registers(cpu);
> + kvm_cpu__show_code(cpu);
> + kvm_cpu__show_page_tables(cpu);
> +}
> +
> static void handle_sigquit(int sig)
> {
> int i;
> @@ -134,9 +146,10 @@ static void handle_sigquit(int sig)
> for (i = 0; i < nrcpus; i++) {
> struct kvm_cpu *cpu = kvm_cpus[i];
>
> - kvm_cpu__show_registers(cpu);
> - kvm_cpu__show_code(cpu);
> - kvm_cpu__show_page_tables(cpu);
> + if (!cpu)
> + continue;
> +
> + pthread_kill(cpu->thread, SIGUSR1);
> }
>
> serial8250__inject_sysrq(kvm);
i can see a couple of problems with the debug printout code, which currently
produces a stream of such dumps for each vcpu:
- This does not work very well on SMP with lots of vcpus, because the printing
is unserialized, resulting in a jumbled mess of an output, all vcpus trying
to print to the console at once, often mixing lines and characters randomly.
- stdout from a signal handler must be flushed, otherwise lines can remain
buffered if someone saves the output via 'tee' for example.
- the dumps from the various CPUs are not distinguishable - they are just
dumped after each other with no identification
- the various printouts are rather hard to parse visually - it's not easy to see
various properties "at a glance" because the dump is visually confusing.
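A sketch of addressing the first three points, with hypothetical names: a mutex keeps each vCPU's dump in one piece, each dump is labelled with its vCPU number, and the stream is flushed before the lock is released. Neither pthread mutexes nor stdio are async-signal-safe, so this assumes the dump runs in normal thread context rather than directly inside a signal handler.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical subset of the register state being dumped. */
struct vcpu_regs {
	unsigned long rip, rsp;
};

static pthread_mutex_t printout_lock = PTHREAD_MUTEX_INITIALIZER;

/* Print one vCPU's dump as a single uninterrupted, labelled block. */
static void dump_vcpu(FILE *out, int id, const struct vcpu_regs *regs)
{
	pthread_mutex_lock(&printout_lock);
	fprintf(out, "\n vCPU %d dump:\n", id);
	fprintf(out, "  RIP: %016lx  RSP: %016lx\n", regs->rip, regs->rsp);
	fflush(out);		/* don't leave lines buffered, e.g. when piped to tee */
	pthread_mutex_unlock(&printout_lock);
}
```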
The patch below addresses these concerns, serializes the output, tidies up the
printout, resulting in this new output:
Sasha Levin [Sun, 8 May 2011 18:58:04 +0000 (21:58 +0300)]
kvm tools: Add missing space after kernel params
Add missing space so that user-provided kernel params
will be properly concatenated to default params.
Instead of just adding a space at the end, add it with
a separate strcat(), since it's not the first (and wouldn't
have been the last) time a space wasn't added.
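A sketch of the approach with a hypothetical helper name: the separator gets its own strcat(), and a length check guards the fixed-size buffer. The parameter strings in the usage below are only examples.

```c
#include <stddef.h>
#include <string.h>

/* Append " <params>" to 'cmdline'; returns -1 if it would not fit. */
static int cmdline_append(char *cmdline, size_t size, const char *params)
{
	if (!params || !*params)
		return 0;			/* nothing to add */
	if (strlen(cmdline) + 1 + strlen(params) + 1 > size)
		return -1;			/* would overflow the buffer */
	strcat(cmdline, " ");			/* separator added on its own */
	strcat(cmdline, params);
	return 0;
}
```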
Asias He [Sun, 8 May 2011 13:09:25 +0000 (21:09 +0800)]
kvm tools: Fix virtio console hangs by removing IRQ injection for tx path
As the virtio spec says:
"""
Because this is high importance and low bandwidth, the current Linux
implementation polls for the buffer to be used, rather than waiting
for an interrupt, simplifying the implementation signicantly.
"""
drivers/char/virtio_console.c
send_buf() {
...
/* Tell Host to go! */
virtqueue_kick(out_vq);
...
while (!virtqueue_get_buf(out_vq, &len))
cpu_relax();
...
}
The console hang can be reproduced simply with the yes command, which
generates a tremendous amount of console I/O and IRQs.