Pekka Enberg [Wed, 18 May 2011 19:19:40 +0000 (22:19 +0300)]
kvm tools: Fail if passed initrd is not really an initrd
We recently changed the meaning of "-i" from disk image to initrd. This has
confused many users because kvm just reports:
Fatal: mmap() failed.
if a disk image is passed as initrd. This patch fixes that by checking for the
first two ID bytes in initrd:
$ ./kvm run -i ~/images/linux-0.2.qcow
# kvm run -k ../../arch/x86/boot/bzImage -m 256 -c 1
Fatal: /home/penberg/images/linux-0.2.qcow is not an initrd
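A minimal sketch of such a two-byte check. It assumes the ID bytes are the gzip magic number (0x1f 0x8b), which the message above does not spell out; looks_like_initrd() is a hypothetical name, not the function from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumption: the "two ID bytes" are the gzip magic, 0x1f 0x8b.
 * 'buf' holds the first bytes read from the file passed with -i.
 */
static bool looks_like_initrd(const unsigned char *buf, size_t len)
{
	if (len < 2)
		return false;

	return buf[0] == 0x1f && buf[1] == 0x8b;
}
```

A caller would read the first two bytes of the file, run this check, and bail out with the "is not an initrd" message before attempting to map the image.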
Reported-by: Thomas Heil <heil@terminal-consulting.de> Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Wed, 18 May 2011 19:08:57 +0000 (22:08 +0300)]
kvm tools: Add conditional compilation of symbol resolving
Thomas reported that on some systems there might be no bfd
library installed. So we take the perf approach and check for
the library's presence at compilation time.
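A sketch of what the C side of such a compile-time switch could look like. CONFIG_HAS_BFD and symbol_lookup() are hypothetical names used for illustration; the build system is assumed to define the macro only when its probe finds the library.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical macro: passed as -DCONFIG_HAS_BFD by the build system
 * only when the library was detected at compilation time.
 */
#ifdef CONFIG_HAS_BFD
/* Real resolver, implemented elsewhere on top of the library. */
const char *symbol_lookup(unsigned long addr, char *buf, size_t len);
#else
/* Fallback stub: no symbol resolving, print the raw address instead. */
static inline const char *symbol_lookup(unsigned long addr, char *buf, size_t len)
{
	snprintf(buf, len, "<0x%lx>", addr);
	return buf;
}
#endif
```

Callers use symbol_lookup() unconditionally, so the rest of the code does not need its own #ifdef blocks.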
Reported-by: Thomas Heil <heil@terminal-consulting.de> Tested-by: Thomas Heil <heil@terminal-consulting.de> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Wed, 18 May 2011 08:19:10 +0000 (16:19 +0800)]
kvm tools: Rename struct disk_image_operations ops name for raw image
This patch renames:
raw_image__read_sector_ro_mmap to raw_image__read_sector
raw_image__write_sector_ro_mmap to raw_image__write_sector
raw_image__close_ro_mmap to raw_image__close
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Tue, 17 May 2011 15:17:12 +0000 (18:17 +0300)]
kvm tools: Fix includes for preadv/pwritev
"bornto befrag <born2befrag@gmail.com>" writes:
> When i compile i kvm native tool tools/kvm && make i get this
>
> CC read-write.o
> cc1: warnings being treated as errors
> read-write.c: In function ‘xpreadv’:
> read-write.c:255: error: implicit declaration of function ‘preadv’
> read-write.c:255: error: nested extern declaration of ‘preadv’
> read-write.c: In function ‘xpwritev’:
> read-write.c:268: error: implicit declaration of function ‘pwritev’
> read-write.c:268: error: nested extern declaration of ‘pwritev’
> make: *** [read-write.o] Error 1
Fix that up by including <sys/uio.h> for preadv()/pwritev().
Reported-by: <born2befrag@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Tue, 17 May 2011 12:07:59 +0000 (15:07 +0300)]
kvm tools: Add interval red-black tree helper
An interval rb-tree stores interval ranges directly and allows a quick
lookup of an overlap with a single point or a range.
The helper is based on the kernel rb-tree implementation
(located in <linux/rbtree.h>), which allows the classical rb-tree
to be augmented for use as an interval tree.
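To illustrate the augmentation idea only, here is a sketch that uses a plain (unbalanced) binary search tree instead of the kernel rb-tree. Each node caches the largest upper bound found in its subtree, which lets a lookup skip subtrees that cannot overlap. All names are hypothetical and are not the helper's API.

```c
#include <stddef.h>
#include <stdint.h>

struct interval_node {
	uint64_t low, high;		/* closed interval [low, high] */
	uint64_t max_high;		/* largest 'high' in this subtree */
	struct interval_node *left, *right;
};

/* Insert keyed by 'low', refreshing max_high along the insertion path. */
static struct interval_node *interval_insert(struct interval_node *root,
					     struct interval_node *node)
{
	node->left = node->right = NULL;
	node->max_high = node->high;

	if (!root)
		return node;

	if (root->max_high < node->high)
		root->max_high = node->high;

	if (node->low < root->low)
		root->left = interval_insert(root->left, node);
	else
		root->right = interval_insert(root->right, node);

	return root;
}

/* Return some node overlapping [low, high], or NULL if there is none. */
static struct interval_node *interval_search(struct interval_node *root,
					     uint64_t low, uint64_t high)
{
	while (root) {
		if (root->low <= high && low <= root->high)
			return root;

		/* The left subtree can only help if it reaches up to 'low'. */
		if (root->left && root->left->max_high >= low)
			root = root->left;
		else
			root = root->right;
	}

	return NULL;
}
```

A point lookup is the special case low == high. A real rb-tree additionally has to keep max_high up to date across rotations, which this sketch does not need to do.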
Prasad Joshi [Fri, 13 May 2011 14:02:46 +0000 (15:02 +0100)]
kvm tools: Add VIRTIO_BLK_T_FLUSH feature to handle flush operation from VM
The virtual machine calls 'sync' when the machine
is halted. Adding the virtio flush feature will
ensure that the data is synced on to disk before
the virtual machine is halted. This is needed to
ensure the integrity of the data.
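A sketch of how a flush request could be serviced. The request type and status values match <linux/virtio_blk.h>; mapping the flush to fsync() on the backing file and the name blk_handle_request() are assumptions made for illustration.

```c
#include <stdint.h>
#include <unistd.h>

/* Request types, same values as in <linux/virtio_blk.h>. */
#define VIRTIO_BLK_T_IN		0
#define VIRTIO_BLK_T_OUT	1
#define VIRTIO_BLK_T_FLUSH	4

/* Status codes, same values as in <linux/virtio_blk.h>. */
#define VIRTIO_BLK_S_OK		0
#define VIRTIO_BLK_S_IOERR	1
#define VIRTIO_BLK_S_UNSUPP	2

/*
 * Hypothetical request handler; only the flush path is shown.
 * Assumption: a flush maps to fsync() on the disk image descriptor.
 */
static uint8_t blk_handle_request(int disk_fd, uint32_t type)
{
	switch (type) {
	case VIRTIO_BLK_T_FLUSH:
		return fsync(disk_fd) == 0 ? VIRTIO_BLK_S_OK : VIRTIO_BLK_S_IOERR;
	default:
		return VIRTIO_BLK_S_UNSUPP;
	}
}
```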
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Fri, 13 May 2011 02:40:09 +0000 (10:40 +0800)]
kvm tools: Tune the command-line option
With this patch we can have
-c --cpus
-m --mem
-d --disk
-k --kernel
-i --initrd
which is more consistent and easier to remember.
The patch also frees up the -s and -g options.
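The short/long option pairs above can be sketched with getopt_long(). This is only an illustration of the naming scheme, not the tool's own option parser; struct run_config and parse_run_options() are hypothetical.

```c
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the parsed values. */
struct run_config {
	int cpus;
	int mem_mb;
	const char *disk, *kernel, *initrd;
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
static int parse_run_options(int argc, char **argv, struct run_config *cfg)
{
	static const struct option long_opts[] = {
		{ "cpus",   required_argument, NULL, 'c' },
		{ "mem",    required_argument, NULL, 'm' },
		{ "disk",   required_argument, NULL, 'd' },
		{ "kernel", required_argument, NULL, 'k' },
		{ "initrd", required_argument, NULL, 'i' },
		{ NULL, 0, NULL, 0 },
	};
	int opt;

	optind = 1;	/* restart scanning so the function can be called again */

	while ((opt = getopt_long(argc, argv, "c:m:d:k:i:", long_opts, NULL)) != -1) {
		switch (opt) {
		case 'c':
			cfg->cpus = atoi(optarg);
			break;
		case 'm':
			cfg->mem_mb = atoi(optarg);
			break;
		case 'd':
			cfg->disk = optarg;
			break;
		case 'k':
			cfg->kernel = optarg;
			break;
		case 'i':
			cfg->initrd = optarg;
			break;
		default:
			return -1;
		}
	}

	return 0;
}
```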
Ingo suggested
'''
The debug options should probably be concentrated under a --debug option
anyway, to allow things like:
--debug single-step,ioport
Even if the debug options are kept they should be streamlined along
the same
pattern:
>> --debug-single-step Enable single stepping
>> --debug-ioport Enable ioport debugging
But having a --debug option that recognizes all the debug flags would
be nicer.
It would also allow future enhancements to group debug features, like:
--debug all # turn on everything and the kitchen sink
for early hangs
--debug all,-single-step # turn on everything except single-step
debugging
--debug nonverbose # turn on all non-noisy debug options we
have
Maybe even:
--debug memcheck
... could run kvm under valgrind automatically - that way we can hide
any secondary tool complexities from the user and turn those tools into
simple debug options :-)
'''
Let's do this --debug option consolidation later.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Fri, 13 May 2011 08:19:09 +0000 (10:19 +0200)]
kvm tools: Fix type mismatches on GCC 4.4 on 32-bit systems
The tools/kvm build still fails on 32-bit:
cc1: warnings being treated as errors
qcow.c: In function ‘qcow1_write_sector’:
qcow.c:307: error: comparison between signed and unsigned integer expressions
make: *** [qcow.o] Error 1
make: *** Waiting for unfinished jobs....
using:
gcc version 4.4.4 20100630 (Red Hat 4.4.4-10) (GCC)
The patch below addresses them.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Thu, 12 May 2011 08:09:29 +0000 (10:09 +0200)]
kvm tools: Use standardized style for the virtio/net.c driver
I had a quick look at virtio/net.c and it still had quite many style
inefficiencies - all of which are patterns which i pointed out before:
- use short names for devices within the driver, so not 'net_device' but
'ndev' - everyone hacking net.c knows that this is the network driver so
'ndev' is a self-explanatory (and very short) term of art ...
- use 'pci_header' instead of the ambiguous and misleading
'virtio_net_pci_device' naming.
- do not repeat 'net' in struct net_device fields! So rename ndev->net_config
to ndev->config.
- In the kernel we generally use _lock names for mutexes. This is conceptually
more generic. So rename the net device mutexes accordingly.
- group #include lines in a topical way instead of a random mess
- fix vertical alignment mismatches
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Wed, 11 May 2011 15:17:23 +0000 (18:17 +0300)]
kvm tools: Add memory gap for larger RAM sizes
e820 is expected to leave a memory gap within the low 32
bits of RAM space. From the documentation of e820_setup_gap():
/*
* Search for the biggest gap in the low 32 bits of the e820
* memory space. We pass this space to PCI to assign MMIO resources
* for hotplug or unconfigured devices in.
* Hopefully the BIOS let enough space left.
*/
Not leaving such a gap causes errors and hangs during the boot process.
This patch adds a memory gap between 0xe0000000 and 0x100000000 when using more
than 0xe0000000 bytes for guest RAM.
This patch updates the e820 table and the slot allocations used for
KVM_SET_USER_MEMORY_REGION.
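The address arithmetic behind the gap can be sketched as follows. The gap bounds come from the description above; struct ram_region and layout_guest_ram() are hypothetical names, and placing the remainder directly at 0x100000000 is an assumption about the layout.

```c
#include <stdint.h>

#define GAP_START	0xe0000000ULL
#define GAP_END		0x100000000ULL	/* 4 GiB */

/* Hypothetical description of one guest-physical RAM region. */
struct ram_region {
	uint64_t guest_phys;
	uint64_t size;
};

/*
 * Split 'ram_size' bytes of guest RAM around the gap.
 * Fills up to two regions and returns how many were filled.
 */
static int layout_guest_ram(uint64_t ram_size, struct ram_region out[2])
{
	if (ram_size <= GAP_START) {
		out[0] = (struct ram_region){ 0, ram_size };
		return 1;
	}

	/* Everything that does not fit below the gap continues above 4 GiB. */
	out[0] = (struct ram_region){ 0, GAP_START };
	out[1] = (struct ram_region){ GAP_END, ram_size - GAP_START };
	return 2;
}
```

Each returned region would correspond to one e820 RAM entry and one memory slot.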
This is undesirable as the order of printout is highly random, so successive
dumps are difficult to compare.
The patch below serializes the signalling itself. (this is on top of the
previous patch)
The patch also tweaks the vCPU printout line a bit so that it does not start
with '#', which is discarded if such messages are pasted into Git commit
messages.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Mon, 9 May 2011 07:27:11 +0000 (09:27 +0200)]
kvm tools: Fix and improve the CPU register dump debug output code
* Pekka Enberg <penberg@kernel.org> wrote:
> Ingo Molnar reported that 'kill -3' didn't work on his machine:
>
> * Ingo Molnar <mingo@elte.hu> wrote:
>
> > This is really cumbersome to debug - is there some good way to get to the RIP
> > that the guest is hanging in? If kvm would print that out to the host console
> > (even if it's just the raw RIP initially) on a kill -3 that would help
> > enormously.
>
> Looks like the code should be doing that already - but the ioctl(KVM_GET_SREGS)
> hangs:
>
> [pid 748] ioctl(6, KVM_GET_SREGS
>
> Avi Kivity pointed out that it's not safe to call KVM_GET_SREGS (or other vcpu
> related ioctls) from other threads:
>
> > is it not OK to call KVM_GET_SREGS from other threads than the one
> > that's doing KVM_RUN?
>
> From Documentation/kvm/api.txt:
>
> - vcpu ioctls: These query and set attributes that control the operation
> of a single virtual cpu.
>
> Only run vcpu ioctls from the same thread that was used to create the
> vcpu.
>
> Fix that up by using pthread_kill() to force the threads that are doing KVM_RUN
> to do the register dumps.
>
> Reported: Ingo Molnar <mingo@elte.hu>
> Cc: Asias He <asias.hejun@gmail.com>
> Cc: Avi Kivity <avi@redhat.com>
> Cc: Cyrill Gorcunov <gorcunov@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Prasad Joshi <prasadjoshi124@gmail.com>
> Cc: Sasha Levin <levinsasha928@gmail.com>
> Signed-off-by: Pekka Enberg <penberg@kernel.org>
> ---
> tools/kvm/kvm-run.c | 20 +++++++++++++++++---
> 1 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c
> index eb50b6a..58e2977 100644
> --- a/tools/kvm/kvm-run.c
> +++ b/tools/kvm/kvm-run.c
> @@ -127,6 +127,18 @@ static const struct option options[] = {
> OPT_END()
> };
>
> +static void handle_sigusr1(int sig)
> +{
> + struct kvm_cpu *cpu = current_kvm_cpu;
> +
> + if (!cpu)
> + return;
> +
> + kvm_cpu__show_registers(cpu);
> + kvm_cpu__show_code(cpu);
> + kvm_cpu__show_page_tables(cpu);
> +}
> +
> static void handle_sigquit(int sig)
> {
> int i;
> @@ -134,9 +146,10 @@ static void handle_sigquit(int sig)
> for (i = 0; i < nrcpus; i++) {
> struct kvm_cpu *cpu = kvm_cpus[i];
>
> - kvm_cpu__show_registers(cpu);
> - kvm_cpu__show_code(cpu);
> - kvm_cpu__show_page_tables(cpu);
> + if (!cpu)
> + continue;
> +
> + pthread_kill(cpu->thread, SIGUSR1);
> }
>
> serial8250__inject_sysrq(kvm);
i can see a couple of problems with the debug printout code, which currently
produces a stream of such dumps for each vcpu:
- This does not work very well on SMP with lots of vcpus, because the printing
is unserialized, resulting in a jumbled mess of an output, all vcpus trying
to print to the console at once, often mixing lines and characters randomly.
- stdout from a signal handler must be flushed, otherwise lines can remain
buffered if someone saves the output via 'tee' for example.
- the dumps from the various CPUs are not distinguishable - they are just
dumped after each other with no identification
- the various printouts are rather hard to parse visually - it's not easy to see
various properties "at a glance" because the dump is visually confusing.
The patch below addresses these concerns, serializes the output, tidies up the
printout, resulting in this new output:
Sasha Levin [Sun, 8 May 2011 18:58:04 +0000 (21:58 +0300)]
kvm tools: Add missing space after kernel params
Add missing space so that user-provided kernel params
will be properly concatenated to default params.
Instead of just adding a space at the end, add it with
a separate strcat(), since it's not the first (and wouldn't
have been the last) time a space wasn't added.
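A sketch of the idea: the separator is appended in its own step, so it no longer matters whether the default string happens to end in a space. build_cmdline() is a hypothetical helper, and it uses the bounded strncat() rather than plain strcat().

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: concatenate default and user-provided kernel
 * parameters into 'dst' ('size' must be at least 1).
 */
static void build_cmdline(char *dst, size_t size,
			  const char *defaults, const char *user)
{
	dst[0] = '\0';
	strncat(dst, defaults, size - 1);

	if (user && user[0]) {
		/* The separator gets its own call so it cannot be forgotten. */
		strncat(dst, " ", size - strlen(dst) - 1);
		strncat(dst, user, size - strlen(dst) - 1);
	}
}
```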
Asias He [Sun, 8 May 2011 13:09:25 +0000 (21:09 +0800)]
kvm tools: Fix virtio console hangs by removing IRQ injection for tx path
As virtio spec says:
"""
Because this is high importance and low bandwidth, the current Linux
implementation polls for the buffer to be used, rather than waiting
for an interrupt, simplifying the implementation significantly.
"""
drivers/char/virtio_console.c
send_buf() {
...
/* Tell Host to go! */
virtqueue_kick(out_vq);
...
while (!virtqueue_get_buf(out_vq, &len))
cpu_relax();
...
}
The console hangs can easily be reproduced with the 'yes' command, which
generates a tremendous amount of console I/O and IRQs.
Pekka Enberg [Sun, 8 May 2011 09:56:04 +0000 (12:56 +0300)]
kvm tools: Fix 'kill -3' hangs
Ingo Molnar reported that 'kill -3' didn't work on his machine:
* Ingo Molnar <mingo@elte.hu> wrote:
> This is really cumbersome to debug - is there some good way to get to the RIP
> that the guest is hanging in? If kvm would print that out to the host console
> (even if it's just the raw RIP initially) on a kill -3 that would help
> enormously.
Looks like the code should be doing that already - but the ioctl(KVM_GET_SREGS)
hangs:
[pid 748] ioctl(6, KVM_GET_SREGS
Avi Kivity pointed out that it's not safe to call KVM_GET_SREGS (or other vcpu
related ioctls) from other threads:
> is it not OK to call KVM_GET_SREGS from other threads than the one
> that's doing KVM_RUN?
From Documentation/kvm/api.txt:
- vcpu ioctls: These query and set attributes that control the operation
of a single virtual cpu.
Only run vcpu ioctls from the same thread that was used to create the
vcpu.
Fix that up by using pthread_kill() to force the threads that are doing KVM_RUN
to do the register dumps.
Ingo Molnar [Sun, 8 May 2011 07:39:34 +0000 (09:39 +0200)]
kvm tools: Enable earlyprintk=serial by default
Enable the earlyprintk console to the serial port, to allow the debugging of
very early hangs/crashes.
Since we already enable the serial console by default, this is a natural
extension of it.
I have tested that it indeed works, by provoking an early hang that triggers
after the early console is enabled but before the real console is registered. In
that case before the patch we get:
$ ./kvm run --cpus 2
[ silent hang ]
With this patch applied i got the early output:
$ ./kvm run --cpus 60
[ 0.000000] console [earlyser0] enabled
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.39-rc6-tip-02944-g87b0bcf-dirty (mingo@aldebaran) (gcc version 4.6.0 20110419 (Red Hat 4.6.0-5) (GCC) ) #84 SMP Mon May 9 02:34:26 CEST 2011
[ 0.000000] Command line: notsc noapic noacpi pci=conf1 console=ttyS0 earlyprintk=serialroot=/dev/vda1 rw
[ 0.000000] locking up the box!
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Sat, 7 May 2011 15:02:58 +0000 (19:02 +0400)]
kvm tools: Fix up mtable srcbusirq assignment for PCI devices
The kernel expects srcbusirq to follow the MP specification and to consist of
a tuple of the PCI device number with the pin encoded. Make it so, otherwise
the kernel reports a kind of "buggy MP table" error.
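A sketch of the encoding for PCI sources as laid out in the MP specification: bits 0-1 select the interrupt signal (0 = INTA#) and bits 2-6 carry the PCI device number. mp_pci_srcbusirq() is a hypothetical helper; 'pin' is assumed to use the PCI config-space convention where 1 means INTA#.

```c
#include <stdint.h>

/*
 * Encode the source bus IRQ field of an MP table interrupt entry
 * for a PCI device.
 *   dev: PCI device number (0..31)
 *   pin: PCI interrupt pin, 1 = INTA# ... 4 = INTD# (assumed convention)
 */
static uint8_t mp_pci_srcbusirq(uint8_t dev, uint8_t pin)
{
	return (uint8_t)(((dev & 0x1f) << 2) | ((pin - 1) & 0x3));
}
```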
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Sat, 7 May 2011 15:02:57 +0000 (19:02 +0400)]
kvm tools: Fix up PCI pin assignment to conform specification
Only 4 pins are allowed for every PCI compliant device as per PCI 2.2 spec
Section 2.2.6 ("Interrupt Pins"). Multifunction devices can use up to all four
pins (INTA#, INTB#, INTC#, INTD#); for our single-function devices pin INTA# is enough.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Sat, 7 May 2011 14:37:28 +0000 (17:37 +0300)]
kvm tools: Limit CPU count by KVM_CAP_NR_VCPUS
This patch limits the number of CPUs to KVM_CAP_NR_VCPUS when the user specifies
more CPUs with the "--cpus=N" command line option than the in-kernel KVM
is able to handle.
Asias He [Sat, 7 May 2011 02:34:19 +0000 (10:34 +0800)]
kvm tools: Respect ISR status in virtio header
Inject an IRQ to the guest only when the ISR status is low, which means
the guest has read the ISR status and the device has cleared this bit as
a side effect of that read.
This avoids a lot of unnecessary IRQ injections from the device to the
guest.
A Netperf test shows this patch changes:
the host to guest bandwidth
from 2866.27 Mbps (cpu 33.96%) to 5548.87 Mbps (cpu 53.87%),
the guest to host bandwidth
from 1408.86 Mbps (cpu 99.9%) to 1301.29 Mbps (cpu 99.9%).
The bottleneck of the guest to host bandwidth is guest cpu power.
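The rule described above can be modelled in a few lines. This is a sketch with hypothetical names (struct vdev, vdev_signal(), vdev_read_isr()); the counter stands in for the real IRQ injection.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device state: ISR bit plus a counter for illustration. */
struct vdev {
	uint8_t isr;
	unsigned int irqs_injected;
};

/* Device side: inject only if the previous interrupt was acknowledged. */
static bool vdev_signal(struct vdev *d)
{
	if (d->isr)
		return false;	/* guest has not read the ISR yet, skip */

	d->isr = 1;
	d->irqs_injected++;	/* stands in for the real IRQ injection */
	return true;
}

/* Guest side: reading the ISR register returns it and clears it. */
static uint8_t vdev_read_isr(struct vdev *d)
{
	uint8_t isr = d->isr;

	d->isr = 0;
	return isr;
}
```

Several signals that arrive before the guest reads the ISR collapse into a single injection, which is where the saving comes from.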
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Cyrill Gorcunov [Thu, 5 May 2011 19:06:40 +0000 (23:06 +0400)]
kvm tools: Gather Virtio-PCI constants into one place
It's better than having them sprinkled in .c files. Note
that the pin for the ring device is changed so it is no longer shared
with the block device (this is done for the sake of simplicity).
Also the comment style is tuned up a bit in virtio-pci.h
just to be consistent.
Ingo Molnar [Thu, 5 May 2011 08:00:45 +0000 (10:00 +0200)]
kvm tools: Fix 32-bit build of the asm/system.h include
Provide wrappers and other environmental dependencies that the
asm/system.h header file from hell needs to build fine in user-space.
Sidenote: right now alternative() defaults to the compatible, slightly
slower barrier instructions that work on all x86 systems.
If this ever shows up in profiles then kvm could provide an alternatives
patching machinery as well. Right now those instructions are emitted
into special sections and then discarded by the linker harmlessly.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Enable virtio-rng, a virtio random number generator.
Guest kernel should be compiled with CONFIG_HW_RANDOM_VIRTIO.
Once enabled, a RNG device will be located at /dev/hwrng.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
virtio-blk has been converted to use the threadpool. All the threading code has
been removed, which left only simple callback handling code.
New threadpool job types are created within VIRTIO_PCI_QUEUE_PFN for every
queue (just one in the case of virtio-blk). The module signals for work after
receiving VIRTIO_PCI_QUEUE_NOTIFY and expects the threadpool to call
virtio_blk_do_io to handle the I/O. It is possible that the module will signal
work several times while virtio_blk_do_io is already working, but there is no
need to handle multithreading there since the threadpool will call each job
serially and not in parallel.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds a generic pool to create a common interface for working with
threads within the kvm tool. The main idea here is using this threadpool for all
I/O threads instead of having every I/O module write its own thread code. The
process of working with the thread pool is supposed to be very simple.
During initialization, each module which is interested in working with the
threadpool will call threadpool__add_jobtype with the callback function and a
void* parameter. For example, virtio modules will register every virt_queue as
a new job type. During operation, when there's work to do for a specific job,
the module will signal it to the queue and would expect the callback to be
called with proper parameters. It is assured that the callback will be called
once for every signal action and each callback will be called only once at a
time (i.e. callback functions themselves don't need to handle threading).
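The bookkeeping behind that guarantee can be modelled without any threads. The sketch below is a single-threaded model with hypothetical names (it is not the threadpool__* API and has no locking): every signal bumps a pending counter, and a worker drains it by invoking the callback once per signal, one call at a time.

```c
#include <stddef.h>

typedef void (*job_callback_t)(void *param);

/* Hypothetical job type: callback, its parameter and unserviced signals. */
struct job_type {
	job_callback_t callback;
	void *param;
	unsigned int pending;
};

static void job_type_init(struct job_type *job, job_callback_t cb, void *param)
{
	job->callback = cb;
	job->param = param;
	job->pending = 0;
}

/* Called by a module whenever there is work for this job. */
static void job_signal(struct job_type *job)
{
	job->pending++;
}

/* What a worker would do: one callback per signal, never concurrently. */
static unsigned int job_drain(struct job_type *job)
{
	unsigned int calls = 0;

	while (job->pending) {
		job->pending--;
		job->callback(job->param);
		calls++;
	}

	return calls;
}

/* Example callback: counts how often it ran via its void* parameter. */
static void count_io(void *param)
{
	(*(unsigned int *)param)++;
}
```

A real pool would protect the pending counter with a lock and let worker threads pick up jobs, but the contract seen by the callback stays the same.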
[ penberg@kernel.org: Use Lindent ] Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm tools: display appropriate error message when default kernel image could not be found
This change was recommended by Ingo Molnar in his reply to the mail 'Use the root
partition of the host to boot the guest machine'. The patch tells the user to
run the 'kvm run --help' command in case the kvm tool could not find
a default kernel image to boot.
prasad@prasad-kvm:~/KVM/linux-kvm/tools/kvm$ ./kvm run
Fatal: could not find default kernel image in:
./bzImage
../../arch/x86/boot/bzImage
/boot/vmlinuz-2.6.35-25-generic
/boot/bzImage-2.6.35-25-generic
Please see 'kvm run --help' for more options.
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm tools: check read permission before using the root partition of the host to boot VM
The commit fbe8d0f (kvm tools: Use the root partition of the host to boot the
guest machine) changed the default image for the virtual machine to the root
partition of the host machine. This patch adds a check to ensure the appropriate
permission (a read permission) is available for the kvm tool to use this partition.
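One way to express such a check is access() with R_OK; whether the patch uses access() or another call is an assumption here, and can_read_image() is a hypothetical name.

```c
#include <stdbool.h>
#include <unistd.h>

/* Returns true if the calling user may read 'path' (e.g. a block device). */
static bool can_read_image(const char *path)
{
	return access(path, R_OK) == 0;
}
```

The tool would only fall back to the host root partition as the default image when this check succeeds.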
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm tools: Add NR_CPUS definition in case of non-configured kernel sources
Pekka reported
|
| I see this if I ignore the reject:
|
| penberg@tiger:~/linux/tools/kvm$ make
| In file included from mptable.c:10:
| ../../arch/x86/include/asm/mpspec_def.h:20:6: error: "NR_CPUS" is not defined
This is because the Linux kernel source tree might not be configured (bare sources),
so we add our own definition in case NR_CPUS is not defined.
[ penberg@kernel.org: fix up compilation error ] Reported-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>