Return device id when requested by virtio-blk.
Device id is currently based on the device information and the inode
number of the underlying disk image.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm tools, bios: Make sure IRQ handlers segment is properly set
The mini bios is copied at predefined place and should have
constant segment address.
Without this patch cs value varies depending on which code
is compiled and how big bios is, it's kinda flowing error
which is not triggered yet by a pure luck.
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Sat, 9 Jul 2011 07:58:37 +0000 (10:58 +0300)]
kvm tools, qcow: Delayed L2 table writeout
This patch delays writeout for new L2 tables like we do for L1 tables. If a L2
table has non-allocated clusters, we mark that in the in-memory L2 table but
don't actually write it to disk until the L2 table is thrown out of LRU cache
or when qcow_disk_flush() is called. That makes writes to new clusters volatile
before VIRTIO_BLK_T_FLUSH is issued without corrupting the QCOW image on I/O
error.
There's now now point in making sure new L2 tables actually hit the disk before
we write out data to clusters because they are not visible on-disk until
qcow_disk_flush() is called which flushes the L1 table.
Pekka Enberg [Sat, 9 Jul 2011 07:23:07 +0000 (10:23 +0300)]
kvm tools, qcow: Delayed L1 table writeout
This patch moves L1 table writeout to qcow_disk_flush(). The rationale here is
that while writes to clusters that don't have L2 table allocated on-disk are
volatile until VIRTIO_BLK_T_FLUSH is issued, we never corrupt the disk image.
Pekka Enberg [Sat, 9 Jul 2011 11:04:12 +0000 (14:04 +0300)]
kvm tools, qcow: Fix locking issues
The virtio_blk_do_io() function can enter the QCOW code through
disk_image__{read,write,flush}() from multiple threads because it uses a thread
pool for I/O requests. Thus, use locking to make the QCOW2 code thread-safe.
Calling readdir() with NULL dirp leads to segfault.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
[ penberg@kernel.org: use 'while' instead of 'for' ] Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Sat, 9 Jul 2011 23:58:18 +0000 (07:58 +0800)]
kvm tools: Make virtio net work on older kernels
Some old kernels do not support TUNSETVNETHDRSZ ioctl which modifies the virtio
net header size. The default header size should work, so let's go on if the
TUNSETVNETHDRSZ ioctl is not supported and just give a warnning.
Reported-by: John Floren <john@jfloren.net> Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Pekka Enberg [Fri, 8 Jul 2011 22:25:28 +0000 (01:25 +0300)]
kvm tools: Don't sort command-list.txt for help text
This patch removes the alphabetical sorting from util/generate-cmdlist.h so
that 'kvm run' command, for example, is displayed first:
$ ./kvm
usage: kvm [--version] [--help] COMMAND [ARGS]
The most commonly used kvm commands are:
run Start the virtual machine
pause Pause/resume the virtual machine
version Print the version of the kernel tree kvm tools
list Print a list of running instances on the host.
debug Print debug information from a running instance
balloon Inflate or deflate the virtio balloon
See 'kvm help COMMAND' for more information on a specific command.
Pekka Enberg [Thu, 7 Jul 2011 18:12:45 +0000 (21:12 +0300)]
kvm tools: Fix guest single-stepping setup
"K. Watts" writes:
When the singlestep is enabled the ioctl to sent out when the kvm_cpu
is initialized (kvm-cpu.c in the for loop that gets each vcpu built).
When the ioct goes out the CPU is sitting at the initialization vector
of 0xf000:0xfff0 on CPU #0 and 0x000000 on the other SMP CPUs. The
new host kernel code that handles setting the TF was changed in 2.6.32
and again at 2.6.38. 2.6.32 seems just flat broken, but 2.6.38 checks
that the linear address of the RIP matches what it was when the
KVM_GUESTDBG_SINGLESTEP flag was set. Because the kvm-tool doesn't
start the CPU at the initialization vector (0xfff0) (0x7c00 for the
MBR and where ever you guys map the linux kernel to) they don't match
and the host kernel won't set the trap flag.
Basically the debug and singlestep ioclts need to happen after the CPU
has been initialized. I moved kvm_cpu__enable_singlestep to happen in
kvm_cpu__reset_vcpu after the registers are set (EIP points to boot
address) and the TRAP flags get set and all is good with the world.
Singlestepping is disabled when the guest issues a CLI because the
Linux host doesn't support features in new Intel and AMD CPUs. We can
sort of "shadow" the interrupt mask and still get the CPU to trap out
at ever instruction even when the guest has disabled interrupts on the
CPU. I have to get the bios disk handler working completely first,
but that may be my next task so that we can trace all the CPU
instructions. My current hack is to just re-enable the trap flag
every time a VMEXIT occurs. I get enough instructions to get me by.
This patch fixes the problem by moving the kvm_cpu__enable_singlestep() into
kvm_cpu__start().
Suggested-by: K. Watts <traetox@gmail.com> Cc: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
The virtio memory balloon device is a primitive device for managing guest
memory: the device asks for a certain amount of memory, and the guest supplies
it (or withdraws it, if the device has more than it asks for). This allows the
guest to adapt to changes in allowance of underlying physical memory.
To activate the virtio-balloon device run kvm tools with the '--balloon'
command line parameter.
Current implementation listens for two signals:
- SIGKVMADDMEM: Adds 1M to the balloon driver (inflate). This will decrease
available memory within the guest.
- SIGKVMDELMEM: Remove 1M from the balloon driver (deflate). This will
increase available memory within the guest.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
kvm tools: Process virtio-blk requests in parallel
Process multiple requests within a virtio-blk device's vring
in parallel.
Doing so may improve performance in cases when a request which can
be completed using data which is present in a cache is queued after
a request with un-cached data.
bonnie++ benchmarks have shown a 6% improvement with reads, and 2%
improvement in writes.
Suggested-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Wed, 29 Jun 2011 08:47:28 +0000 (16:47 +0800)]
kvm tools: Introduce uip_rx() for uip
This patch implement rx interface for uip. uip_rx() can be called in
virtio_net_rx_thread().
It is a consumer of the ethernet used buffer. It sleeps until there is
used buffer avaiable and copy ethernet data into virtio iov buffers
which provided by virtio_net_rx_thread().
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Asias He [Wed, 29 Jun 2011 08:47:23 +0000 (16:47 +0800)]
kvm tools: Add helper to allocate and get TCP initial sequence number
Guest's initial sequence number can be found in the SYN package that
guest send to us to intialize a TCP session.
Remote server's initial sequence number is faked. RFC 793 specifies
that the ISN should be viewed as a 32-bit counter that increments
by one every 4 microseconds. For simplicity's sake, current
implementation in uip just returns a constant.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
This patch adds three helpers uip_tcp_hdrlen(), uip_tcp_len(),
uip_tcp_payloadlen() to return TCP header length, TCP totoal
length, and tcp payload length.
Signed-off-by: Asias He <asias.hejun@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:20 +0000 (16:18 -0700)]
drm/i915: more struct_mutex locking
When auditing the locking in i915_gem.c (for a prospective change which
I then abandoned), I noticed two places where struct_mutex is not held
across GEM object manipulations that would usually require it.
Since one is in initial setup and the other in driver unload, I'm
guessing the mutex is not required for either; but post a patch in case
it is.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:19 +0000 (16:18 -0700)]
drm/i915: use shmem_truncate_range
The interface to ->truncate_range is changing very slightly: once "tmpfs:
take control of its truncate_range" has been applied, this can be applied.
For now there is only a slight inefficiency while this remains unapplied,
but it will soon become essential for managing shmem's use of swap.
Change i915_gem_object_truncate() to use shmem_truncate_range() directly:
which should also spare i915 later change if we switch from
inode_operations->truncate_range to file_operations->fallocate.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:18 +0000 (16:18 -0700)]
drm/i915: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_cache_page_gfp(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.
Make i915_gem_object_get_pages_gtt() use shmem_read_mapping_page_gfp() in
the one place it's needed; elsewhere use shmem_read_mapping_page(), with
the mapping's gfp_mask properly initialized.
Forget about __GFP_COLD: since tmpfs initializes its pages with memset,
asking for a cold page is counter-productive.
Include linux/shmem_fs.h also in drm_gem.c: with shmem_file_setup() now
declared there too, we shall remove the prototype from linux/mm.h later.
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Keith Packard <keithp@keithp.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Mon, 27 Jun 2011 23:18:17 +0000 (16:18 -0700)]
drm/ttm: use shmem_read_mapping_page
Soon tmpfs will stop supporting ->readpage and read_mapping_page(): once
"tmpfs: add shmem_read_mapping_page_gfp" has been applied, this patch can
be applied to ease the transition.
ttm_tt_swapin() and ttm_tt_swapout() use shmem_read_mapping_page() in
place of read_mapping_page(), since their swap_space has been created with
shmem_file_setup().
Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Dave Airlie <airlied@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: drivers/misc/ioc4.o(.data+0x144): Section mismatch in reference from the variable ioc4_load_modules_work to the function .devinit.text:ioc4_load_modules()
The variable ioc4_load_modules_work references
the function __devinit ioc4_load_modules()
If the reference is valid then annotate the
variable with __init* or __refdata (see linux/init.h) or name the variable:
*driver, *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
This one is potentially fatal; by the time ioc4_load_modules is invoked
it may already have been freed. For that reason ioc4_load_modules_work
can't be turned to __devinitdata but also because it's referenced in
ioc4_exit.
WARNING: drivers/leds/leds-lp5523.o(.text+0x12f4): Section mismatch in reference from the function lp5523_probe() to the function .init.text:lp5523_init_led()
The function lp5523_probe() references
the function __init lp5523_init_led().
This is often because lp5523_probe lacks a __init
annotation or the annotation of lp5523_init_led is wrong.
Fixing this one triggers one more mismatch, fix that one as well.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WARNING: drivers/leds/leds-lp5521.o(.text+0xf2c): Section mismatch in reference from the function lp5521_probe() to the function .init.text:lp5521_init_led()
The function lp5521_probe() references
the function __init lp5521_init_led().
This is often because lp5521_probe lacks a __init
annotation or the annotation of lp5521_init_led is wrong.
Fixing this mismatch triggers one more mismatch, fix that one as well.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memcg: fix direct softlimit reclaim to be called in limit path
Commit d149e3b25d7c ("memcg: add the soft_limit reclaim in global direct
reclaim") adds a softlimit hook to shrink_zones(). By this, soft limit
is called as
This will cause a global reclaim when a memcg hits limit.
This is bug. soft_limit_reclaim() should be called when
scanning_global_lru(sc) == true.
And the commit adds a variable "total_scanned" for counting softlimit
scanned pages....it's not "total". This patch removes the variable and
update sc->nr_scanned instead of it. This will affect shrink_slab()'s
scan condition but, global LRU is scanned by softlimit and I think this
change makes sense.
TODO: avoid too much scanning of a zone when softlimit did enough work.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Ying Han <yinghan@google.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Mon, 27 Jun 2011 23:18:11 +0000 (16:18 -0700)]
taskstats: don't allow duplicate entries in listener mode
Currently a single process may register exit handlers unlimited times.
It may lead to a bloated listeners chain and very slow process
terminations.
Eg after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
kernel memory is stolen for the handlers chain and "time id" shows 2-7
seconds instead of normal 0.003. It makes it possible to exhaust all
kernel memory and to eat much of CPU time by triggerring numerous exits
on a single CPU.
The patch limits the number of times a single process may register
itself on a single CPU to one.
One little issue is kept unfixed - as taskstats_exit() is called before
exit_files() in do_exit(), the orphaned listener entry (if it was not
explicitly deregistered) is kept until the next someone's exit() and
implicit deregistration in send_cpu_listeners(). So, if a process
registered itself as a listener exits and the next spawned process gets
the same pid, it would inherit taskstats attributes.
Jan Kara [Mon, 27 Jun 2011 23:18:10 +0000 (16:18 -0700)]
mm: fix assertion mapping->nrpages == 0 in end_writeback()
Under heavy memory and filesystem load, users observe the assertion
mapping->nrpages == 0 in end_writeback() trigger. This can be caused by
page reclaim reclaiming the last page from a mapping in the following
race:
Josh Hunt [Mon, 27 Jun 2011 23:18:08 +0000 (16:18 -0700)]
drivers/misc/lkdtm.c: fix race when crashpoint is hit multiple times before checking count
We observed the crash point count going negative in cases where the
crash point is hit multiple times before the check of "count == 0" is
done. Because of this we never call lkdtm_do_action(). This patch just
adds a spinlock to protect count.