Asias He [Wed, 29 Jun 2011 08:47:28 +0000 (16:47 +0800)]
kvm tools: Introduce uip_rx() for uip
This patch implements the rx interface for uip. uip_rx() can be called in
virtio_net_rx_thread().
It is a consumer of the ethernet used buffers. It sleeps until a used
buffer is available and then copies the ethernet data into the virtio iov
buffers provided by virtio_net_rx_thread().
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
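The copy step described above could be sketched roughly as follows. This is only an illustration: copy_frame_to_iov() is a hypothetical helper name, not a function from the patch, and the part that sleeps until a used buffer is available is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not from the patch): scatter 'len' bytes of a flat
 * ethernet frame into the iovec array handed over by the rx thread.
 * Returns the number of bytes copied, which is smaller than 'len' when
 * the iovecs do not have enough room.
 */
static size_t copy_frame_to_iov(const struct iovec *iov, int iovcnt,
				const void *frame, size_t len)
{
	const unsigned char *src = frame;
	size_t copied = 0;
	int i;

	assert(iov != NULL && frame != NULL);

	for (i = 0; i < iovcnt && copied < len; i++) {
		size_t chunk = len - copied;

		/* Never write past the end of the current iovec. */
		if (chunk > iov[i].iov_len)
			chunk = iov[i].iov_len;

		memcpy(iov[i].iov_base, src + copied, chunk);
		copied += chunk;
	}

	return copied;
}
```

A caller would compare the return value with the frame length to detect a frame that did not fit into the buffers it was given.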
Asias He [Wed, 29 Jun 2011 08:47:23 +0000 (16:47 +0800)]
kvm tools: Add helper to allocate and get TCP initial sequence number
The guest's initial sequence number can be found in the SYN packet that
the guest sends to us to initialize a TCP session.
The remote server's initial sequence number is faked. RFC 793 specifies
that the ISN should be viewed as a 32-bit counter that increments
by one every 4 microseconds. For simplicity's sake, the current
implementation in uip just returns a constant.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds three helpers, uip_tcp_hdrlen(), uip_tcp_len(), and
uip_tcp_payloadlen(), to return the TCP header length, TCP total
length, and TCP payload length.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
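The arithmetic behind such helpers is roughly the following. The struct and the *_sketch() names are made up for illustration, since the log does not show the real signatures or header layouts; the sketch assumes IPv4 and a total length already converted to host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of the header fields the helpers need. */
struct pkt_lens {
	uint8_t  ip_ihl;	/* IPv4 header length, in 32-bit words */
	uint16_t ip_tot_len;	/* IPv4 total length in bytes, host order */
	uint8_t  tcp_doff;	/* TCP data offset, in 32-bit words */
};

/* TCP header length in bytes, including options. */
static unsigned int tcp_hdrlen_sketch(const struct pkt_lens *p)
{
	assert(p != NULL);
	return p->tcp_doff * 4u;
}

/* TCP total length: everything after the IP header. */
static unsigned int tcp_len_sketch(const struct pkt_lens *p)
{
	assert(p != NULL && p->ip_tot_len >= p->ip_ihl * 4u);
	return p->ip_tot_len - p->ip_ihl * 4u;
}

/* TCP payload length: total TCP length minus the TCP header. */
static unsigned int tcp_payloadlen_sketch(const struct pkt_lens *p)
{
	return tcp_len_sketch(p) - tcp_hdrlen_sketch(p);
}
```

For a 60-byte packet with 20-byte IP and TCP headers, this yields a 40-byte TCP segment carrying 20 bytes of payload.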
Anton Vorontsov [Fri, 17 Jun 2011 16:10:46 +0000 (20:10 +0400)]
kvm tools: Fix broken terminal when kvm exits because of a signal
Issuing 'killall kvm' leaves the terminal on which kvm was running in
a broken state. This is because atexit(3) handlers are not called if
a process terminates because of a signal.
Installing a proper handler for the TERM signal fixes the issue.
p.s. The rest of the kvm tools use signal(2), and not sigaction(2), so
I continue the tradition.
Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
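A minimal sketch of the idea, using signal(2) as the message says. The function names are hypothetical, and restoring the terminal directly in the handler and then calling _exit() is an assumption of this sketch, not necessarily what the patch does.

```c
#include <assert.h>
#include <signal.h>
#include <termios.h>
#include <unistd.h>

static struct termios saved_term;
static volatile sig_atomic_t have_saved_term;
static int term_fd = -1;

/* SIGTERM handler: put the terminal back, then leave immediately. */
static void handle_sigterm(int sig)
{
	(void)sig;

	/* tcsetattr() and _exit() are both async-signal-safe. */
	if (have_saved_term)
		tcsetattr(term_fd, TCSANOW, &saved_term);
	_exit(1);
}

/*
 * Remember the current terminal settings of 'fd' and install the handler.
 * Returns 0 on success, -1 if the handler could not be installed.
 */
static int install_sigterm_handler(int fd)
{
	assert(fd >= 0);

	term_fd = fd;
	/* tcgetattr() fails if 'fd' is not a terminal; then nothing is restored. */
	if (tcgetattr(fd, &saved_term) == 0)
		have_saved_term = 1;

	return signal(SIGTERM, handle_sigterm) == SIG_ERR ? -1 : 0;
}
```

Calling install_sigterm_handler(STDIN_FILENO) before switching the terminal to raw mode would be the typical use.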
Pekka Enberg [Thu, 16 Jun 2011 14:18:58 +0000 (17:18 +0300)]
kvm tools, qcow: Use fdatasync() instead of sync_file_range()
As explained by Christoph Hellwig, sync_file_range() is not sufficient to
guarantee that Qcow image metadata is never corrupted:
On Thu, Jun 16, 2011 at 12:34:04PM +0300, Pekka Enberg wrote:
> Hi Christoph,
>
> On Thu, Jun 16, 2011 at 09:21:03AM +0300, Pekka Enberg wrote:
> >> And btw, we use sync_file_range()
>
> On Thu, Jun 16, 2011 at 12:24 PM, Christoph Hellwig <hch@infradead.org> wrote:
> > Which doesn't help you at all. sync_file_range is just a hint for VM
> > writeback, but never commits filesystem metadata nor the physical
> > disk's write cache. In short it's a completely dangerous interface, and
> > that is pretty well documented in the man page.
>
> Doh - I didn't read it carefully enough and got hung up with:
>
> Therefore, unless the application is strictly performing overwrites of
> already-instantiated disk blocks, there are no guarantees that the data will
> be available after a crash.
>
> without noticing that it obviously doesn't work with filesystems like
> btrfs that do copy-on-write.
You also missed:
" This system call does not flush disk write caches and thus does not
provide any data integrity on systems with volatile disk write
caches."
so it's not safe if you either have a cache, or are using btrfs, or
are using a sparse image, or are using an image preallocated using
fallocate/posix_fallocate.
> What's the right thing to do here? Is fdatasync() sufficient?
Yes.
Cc: Ingo Molnar <mingo@elte.hu> Cc: Prasad Joshi <prasadjoshi124@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
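The pattern the discussion arrives at, writing metadata and then calling fdatasync() before treating it as durable, might look like the sketch below. write_metadata_sync() and demo_sync_roundtrip() are hypothetical names, and the scratch file under /tmp exists only for the demonstration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a metadata block at 'offset' and report
 * success only after fdatasync() has returned. Returns 0 or -1.
 */
static int write_metadata_sync(int fd, const void *buf, size_t len, off_t offset)
{
	const unsigned char *p = buf;

	assert(buf != NULL);

	while (len > 0) {
		ssize_t n = pwrite(fd, p, len, offset);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, try again */
		if (n <= 0)
			return -1;

		p += n;
		len -= (size_t)n;
		offset += n;
	}

	return fdatasync(fd);
}

/* Usage sketch: write 8 bytes to a scratch file and read them back. */
static int demo_sync_roundtrip(void)
{
	char path[] = "/tmp/qcow-sync-XXXXXX";
	char in[8] = "L2TABLE", out[8];
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	unlink(path);

	if (write_metadata_sync(fd, in, sizeof(in), 0) == 0 &&
	    pread(fd, out, sizeof(out), 0) == (ssize_t)sizeof(out) &&
	    memcmp(in, out, sizeof(in)) == 0)
		ret = 0;

	close(fd);
	return ret;
}
```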
Prasad Joshi [Fri, 10 Jun 2011 10:35:20 +0000 (11:35 +0100)]
kvm tools: Add IO delay option
Add a command line debug option to add a fixed amount of delay to read and
write operations.
From Ingo: "the delays are *constant* [make sure you use a high-res timers
kernel], so they do not result in nearly as much measurement noise as real
block IO does.
[...]
This way you are basically 'emulating' a real disk drive but you will
emulate uniform latencies, which makes measurements a lot more
reliable - while still relevant to the end result."
Cyrill Gorcunov [Tue, 7 Jun 2011 19:41:16 +0000 (23:41 +0400)]
kvm tools: Reform bios make rules
Put bios code into bios.s and adjust makefile
rules accordingly. It's more natural than bios-rom.S
(which is now simply a container over real bios code).
Also improve bios deps in Makefile.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Tue, 7 Jun 2011 19:48:32 +0000 (22:48 +0300)]
kvm, ui: Kill fb_write() function
This patch kills fb_write() and related functions because they're no longer
called as of commit 6768f73 ("kvm tools, vesa: Use guest-mapped memory for
framebuffer").
Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Floren <john@jfloren.net> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Prasad Joshi [Mon, 6 Jun 2011 19:58:24 +0000 (20:58 +0100)]
kvm tools: Add QCOW level2 caching support
QCOW uses two kinds of tables: a level 1 (L1) table and level 2 (L2) tables.
The L1 table points to the offsets of the L2 tables. When a QCOW image is
probed, the L1 table is cached in memory to avoid reading it from disk on
every access. This caching improves performance.
A similar performance improvement can be observed when L2 tables are cached.
It is impossible to cache all of the L2 tables because of memory
constraints. The patch adds L2 table caching for up to 32 L2 tables. It
uses a combination of an RB tree and a list to manage the cached L2 tables.
The linked list helps in building a simple LRU structure, and the RB tree
makes searching for a cached table efficient.
To calculate the performance numbers, the VM was started with the following
command line arguments
Run status group 1 (all jobs):
READ: io=409600KB, aggrb=231151KB/s, minb=59174KB/s, maxb=96111KB/s,
mint=1091msec, maxt=1772msec
WRITE: io=141936KB, aggrb=137268KB/s, minb=32340KB/s, maxb=44496KB/s,
mint=813msec, maxt=1034msec
Run status group 2 (all jobs):
READ: io=409600KB, aggrb=9211KB/s, minb=2358KB/s, maxb=2363KB/s,
mint=44367msec, maxt=44468msec
WRITE: io=129808KB, aggrb=2931KB/s, minb=707KB/s, maxb=797KB/s,
mint=43331msec, maxt=44285msec
Run status group 3 (all jobs):
READ: io=409600KB, aggrb=170453KB/s, minb=43636KB/s, maxb=78545KB/s,
mint=1335msec, maxt=2403msec
WRITE: io=138256KB, aggrb=108012KB/s, minb=27648KB/s, maxb=37931KB/s,
mint=879msec, maxt=1280msec
Disk stats (read/write):
vda: ios=120698/16690, merge=0/114742, ticks=113170/304480, in_queue=417560,
util=93.26%
[...]
Summary
=======
Read bandwidth increased by 1.2 to 1.8 times
Write bandwidth increased by 1.1 to 2.9 times
Read latency decreased by small margin of 0.2
Write latency decreased by 0.4
Signed-off-by: Prasad Joshi <prasadjoshi124@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
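The LRU behaviour of the L2 cache described above can be illustrated with a much simpler structure. Note the swap: this sketch uses a fixed array with a linear scan and logical timestamps instead of the RB tree plus list combination the patch uses, and all names are hypothetical. The cached table data itself is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_CACHE_MAX 32	/* limit named in the commit message */

struct l2_entry {
	uint64_t offset;	/* file offset of the L2 table, the lookup key */
	uint64_t last_used;	/* logical timestamp used for LRU ordering */
	int valid;
};

struct l2_cache {
	struct l2_entry slot[L2_CACHE_MAX];
	uint64_t clock;
};

/* Return the cached entry for 'offset' and mark it most recently used. */
static struct l2_entry *l2_cache_lookup(struct l2_cache *c, uint64_t offset)
{
	size_t i;

	assert(c != NULL);

	for (i = 0; i < L2_CACHE_MAX; i++) {
		if (c->slot[i].valid && c->slot[i].offset == offset) {
			c->slot[i].last_used = ++c->clock;
			return &c->slot[i];
		}
	}

	return NULL;
}

/* Insert 'offset', reusing a free slot or evicting the least recently used. */
static struct l2_entry *l2_cache_insert(struct l2_cache *c, uint64_t offset)
{
	struct l2_entry *victim;
	size_t i;

	assert(c != NULL);

	victim = &c->slot[0];
	for (i = 0; i < L2_CACHE_MAX; i++) {
		if (!c->slot[i].valid) {
			victim = &c->slot[i];
			break;
		}
		if (c->slot[i].last_used < victim->last_used)
			victim = &c->slot[i];
	}

	victim->offset = offset;
	victim->valid = 1;
	victim->last_used = ++c->clock;

	return victim;
}
```

With only 32 entries a linear scan is cheap; a tree becomes more attractive as the number of cached tables grows.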
Pekka Enberg [Mon, 6 Jun 2011 13:48:50 +0000 (16:48 +0300)]
kvm tools, vesa: Use guest-mapped memory for framebuffer
This patch converts hw/vesa.c to use guest-mapped memory for framebuffer and
drops the slow MMIO emulation. This speeds up framebuffer accesses
considerably. Please note that this can be optimized even more with the
KVM_GET_DIRTY_LOG ioctl() as explained by Alexander Graf.
Cc: Alexander Graf <agraf@suse.de> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Floren <john@jfloren.net> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Fri, 3 Jun 2011 19:51:08 +0000 (22:51 +0300)]
kvm tools: Add MMIO coalescing support
Coalescing MMIO allows us to avoid an exit every time we have an
MMIO write. Instead, MMIO writes are coalesced in a ring which
can be flushed once an exit for a different reason is needed.
An MMIO exit is also triggered once the ring is full.
Coalesce all MMIO regions registered in the MMIO mapper.
Add a coalescing handler under kvm_cpu.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
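The ring behaviour described above can be sketched in isolation. The types and functions below are hypothetical and are not KVM's actual structures; the sketch only shows writes being queued until the ring is full and then drained in order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8	/* arbitrary size chosen for the sketch */

struct mmio_write {
	uint64_t addr;
	uint32_t len;
	uint8_t data[8];
};

struct mmio_ring {
	uint32_t first;	/* next entry to consume */
	uint32_t last;	/* next free slot */
	struct mmio_write slot[RING_SLOTS];
};

/*
 * Queue one write. Returns -1 when the ring is full, which is the point
 * where an exit would be needed so the consumer can drain it.
 */
static int ring_push(struct mmio_ring *r, const struct mmio_write *w)
{
	uint32_t next;

	assert(r != NULL && w != NULL);

	next = (r->last + 1) % RING_SLOTS;
	if (next == r->first)
		return -1;

	r->slot[r->last] = *w;
	r->last = next;

	return 0;
}

/* Take the oldest queued write. Returns -1 when the ring is empty. */
static int ring_pop(struct mmio_ring *r, struct mmio_write *out)
{
	assert(r != NULL && out != NULL);

	if (r->first == r->last)
		return -1;

	*out = r->slot[r->first];
	r->first = (r->first + 1) % RING_SLOTS;

	return 0;
}
```

A flush is then a loop that calls ring_pop() until it returns -1, handing each entry to the MMIO handler. One slot always stays empty so that a full ring can be told apart from an empty one.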
The kbd_out() function was taking 32 bits instead of 8 bits for 'outb'. This
caused kbd_write_command() to receive bogus 'val' which meant that
I8042_CMD_CTL_RCTR case in the switch statement was never executed.
Pekka Enberg [Thu, 2 Jun 2011 10:47:44 +0000 (13:47 +0300)]
kvm tools, i8042: Use kernel command names
This patch renames the command constants in hw/i8042.c to use similar names as
in <linux/i8042.h>. Note: we cannot use <linux/i8042.h> constants directly
because they include the command and data.
John Floren [Wed, 1 Jun 2011 14:53:56 +0000 (17:53 +0300)]
kvm tools: Add support for PS/2 keyboard system
Add support for PS/2 keyboard system with AUX device (aka mouse).
The device works with vnc; the guest must be started with the
'--vnc' parameter for the device to be initialized.
Signed-off-by: John Floren <john@jfloren.net>
[ turn into patch and clean up code ] Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Mon, 30 May 2011 17:27:57 +0000 (20:27 +0300)]
kvm tools: Add debug mode to brlock
Adds a debug mode which allows switching the brlock to
a big rwlock.
This can be used to verify we don't end up with a BKL kind
of lock with the current brlock implementation.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Mon, 30 May 2011 17:27:55 +0000 (20:27 +0300)]
kvm tools: Add a brlock
brlock is a lock which is very cheap for reads, but very expensive
for writes.
This lock will be used when updates are very rare and reads are
common.
This lock is currently implemented by stopping the guest while
performing the updates. We assume that the only threads which
read from the locked data are VCPU threads, and the only writer
isn't a VCPU thread.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Mon, 30 May 2011 17:27:52 +0000 (20:27 +0300)]
kvm tools: Add APIs to allow pausing guests
Allow pausing and unpausing guests running on the host.
Pausing a guest means that none of the VCPU threads are running
KVM_RUN until they are unpaused.
The following API functions are added:
void kvm__pause(void);
void kvm__continue(void);
void kvm__notify_paused(void);
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Ingo Molnar [Sun, 29 May 2011 12:51:48 +0000 (14:51 +0200)]
kvm tools: Fix virtio net build breakage on 32-bit
* Sasha Levin <levinsasha928@gmail.com> wrote:
> Use ioeventfds to receive notifications of IO events in virtio-net.
> Doing so prevents an exit every time we receive/send a packet.
>
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
> tools/kvm/virtio/net.c | 22 ++++++++++++++++++++++
> 1 files changed, 22 insertions(+), 0 deletions(-)
This needs the fix below to build on 32-bit.
Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Fri, 27 May 2011 16:18:37 +0000 (19:18 +0300)]
kvm tools: Add ioeventfd support
ioeventfd is a way provided by KVM to receive notifications about
reads and writes to PIO and MMIO areas within the guest.
Such notifications are useful if all we need to know is that
a specific area of the memory has been changed, and we don't need
a heavyweight exit to happen.
The implementation uses epoll to scale to a large number of ioeventfds.
Benchmarks were run on a separate (non-boot) 1GB virtio-blk device, formatted
as ext4, using bonnie++.
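The eventfd plus epoll pattern mentioned above can be sketched without KVM. In the real setup the eventfd would be handed to KVM with the KVM_IOEVENTFD ioctl; in this sketch a plain write() to the eventfd stands in for the guest access, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Create an eventfd and add it to the epoll instance 'epfd'.
 * Returns the new eventfd, or -1 on error.
 */
static int watch_new_eventfd(int epfd)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;

	ev.data.fd = efd;
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev) < 0) {
		close(efd);
		return -1;
	}

	return efd;
}

/*
 * Wait up to 'timeout_ms' for one eventfd to fire. Returns the fd that
 * fired and stores its counter in '*count', or -1 on timeout or error.
 */
static int wait_one_event(int epfd, int timeout_ms, uint64_t *count)
{
	struct epoll_event ev;

	assert(count != NULL);

	if (epoll_wait(epfd, &ev, 1, timeout_ms) != 1)
		return -1;

	/* Reading the eventfd returns the counter and resets it to zero. */
	if (read(ev.data.fd, count, sizeof(*count)) != (ssize_t)sizeof(*count))
		return -1;

	return ev.data.fd;
}
```

Because the eventfd counter accumulates, several notifications that arrive before the reader wakes up are delivered as a single event with a larger count.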
Sasha Levin [Thu, 26 May 2011 14:25:46 +0000 (17:25 +0300)]
kvm tools: Exit VCPU thread only when SIGKVMEXIT is received
Currently the VCPU loop exits when the thread receives any signal.
Change the behaviour to exit only when SIGKVMEXIT is received. This change
prevents the guest from terminating when unrelated signals are processed
by the thread (for example, when attaching a debugger).
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:07 +0000 (13:30 +0300)]
kvm tools: Add support for multiple virtio-rng devices
Since multiple hardware rng devices of the same type are currently
unsupported by the kernel, this serves more as an example of a basic
virtio driver under kvm tools and can be used to debug the PCI layer.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:06 +0000 (13:30 +0300)]
kvm tools: Use ioport context to control blk devices
Since ioports now have the ability to pass context to their
callbacks, we can implement multiple blk devices more efficiently.
We can get a ptr to the 'current' blk dev on each ioport call, which
means that we don't need to keep track of the blk device allocation
and ioport distribution within the module.
The advantages are easier management of multiple blk devices and
removal of any hardcoded limits on the number of possible blk
devices.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Sasha Levin [Thu, 26 May 2011 10:30:04 +0000 (13:30 +0300)]
kvm tools: Add optional parameter used in ioport callbacks
Allow specifying an optional parameter when registering an
ioport range. The callback functions provided by the registering
module will be called with the same parameter.
This may be used to keep context during callbacks on IO operations.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
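The idea of passing a registration-time parameter back to the callback can be sketched as follows. All names and signatures here are hypothetical and do not reproduce the kvm tools ioport API; the fixed-size table is a simplification of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 16	/* simplification for the sketch */

typedef int (*ioport_out_fn)(uint16_t port, uint8_t val, void *param);

struct ioport_range {
	uint16_t start;
	uint16_t count;
	ioport_out_fn out;
	void *param;	/* handed back to 'out' on every call */
};

static struct ioport_range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Register a port range together with an opaque context pointer. */
static int ioport_register(uint16_t start, uint16_t count,
			   ioport_out_fn out, void *param)
{
	assert(out != NULL);

	if (nr_ranges == MAX_RANGES)
		return -1;

	ranges[nr_ranges++] = (struct ioport_range){ start, count, out, param };
	return 0;
}

/* Dispatch an OUT to whichever range covers 'port'. */
static int ioport_emulate_out(uint16_t port, uint8_t val)
{
	size_t i;

	for (i = 0; i < nr_ranges; i++) {
		const struct ioport_range *r = &ranges[i];

		if (port >= r->start && port - r->start < r->count)
			return r->out(port, val, r->param);
	}

	return -1;
}

/* Example device: the callback finds its device through 'param'. */
struct blk_dev {
	uint16_t base;
	uint8_t last_val;
};

static int blk_out(uint16_t port, uint8_t val, void *param)
{
	struct blk_dev *bdev = param;

	(void)port;
	bdev->last_val = val;
	return 0;
}
```

Two block devices can share blk_out() by registering their ranges with different param pointers, so the callback never needs a global table to work out which device was addressed.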
Cyrill Gorcunov [Mon, 23 May 2011 14:39:17 +0000 (18:39 +0400)]
kvm tools: Drop unused vars from int10.c code
There are a couple of functions which define an 'ah' variable but
never actually use it, so the gcc 4.6.x series complains:
CC bios/bios-rom.bin
bios/int10.c: In function ‘int10_putchar’:
bios/int10.c:86:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
bios/int10.c: In function ‘int10_vesa’:
bios/int10.c:96:9: error: variable ‘ah’ set but not used [-Werror=unused-but-set-variable]
cc1: all warnings being treated as errors