git.karo-electronics.de Git - karo-tx-linux.git/log

kvm tools: fix help output for run command

This dummy patch remove tabs in help output.
Introduced in commit:
ae9ec23 kvm tools: generate command line options dynamically

Signed-off-by: William Dauchy <wdauchy@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Drop lchown() calls from 9p

Creating new files in a guest:

echo "hello, world" > hello.txt

results in EPERM on my machine.

The problem is that virtio_p9_create() calls lchown() on UID zero (we
are logged in as root in the guest) which obviously doesn't work. In
fact, ownership changes can't work unless we have some sort of uid/gid
mapping. Therefore, drop the calls to lchown().

Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Do setup_fdt() later, get powerpc to boot again

In commit e3d3ced "kernel load/firmware cleanup", the call to
kvm__arch_setup_firmware() was moved. Previously more or less at the end
of the init sequence, but that commit moved it into kvm__init() which
is a core_init() call and so runs quite early.

This broke booting powerpc guests, as setup_fdt() needs to be called
later in the setup sequence. In particular it looks at kvm->nrcpus,
which is uninitialised at that point.

In general setup_fdt() needs to run late in the sequence, as it encodes
the setup of the machine into the device tree.

So move setup_fdt() out of kvm__arch_setup_firmware() and make it a
firmware_init() call of its own.

With this patch I am able to boot guests again on HV KVM.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Fix segfault on powerpc in xics_register()

In commit 06e6648 "move kvm_cpus into struct kvm", kvm_cpu__init() became
kvm_cpu__arch_init() called from a new kvm_cpu__init(), and the call was moved
from the end of the init sequence to much earlier, and in particular prior to
irq__init().

This leads to a segfault on powerpc, because kvm_cpu__arch_init() calls into
xics_cpu_register(), which dereferences vcpu->kvm.icp which is uninitialised
until irq__init().

Later in commit a48488d "use init/exit where possible", irq__init() was pulled
out of the init sequence and made a dev_base_init() routine, on x86. On powerpc
the call to irq__init() was dropped entirely.

Finally, we now have a circular dependency between kvm_cpu__init() (which needs
kvm->arch.icp), and irq__init() (which needs kvm->nrcpus). This is caused by
the combination of commit 89f40a7 "move nrcpus into struct kvm_config",
which moved the global nrcpus into kvm->cfg, and commit 06e6648 "move kvm_cpus
into struct kvm", which moved the setup of kvm->nrcpus from kvm->cfg into
kvm_cpu__init().

To fix it we drop irq__init() entirely, if we ever have a non xics irq option
we can bring it back. We turn xics_system_init() into xics_init(), and have it
do the allocation and setup of the icp/ics, including the per-vcpu setup,
removing the dependency from kvm_cpu__init() (via kvm_cpu__arch_init()).

xics_init() is a base_init() routine, it can't be core, which should be early
enough, fingers crossed.

Finally drop irq__exit(), it does nothing and is never called.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Fix powerpc build errors caused by recent changes

Several caused by commit 8074303 "remove global kvm object",
ioport__setup_arch(), term_getc_iov() & term_getc() in the
spapr_hvcons.c code, and kvm_cpu__reboot() in rtas_power_off().

Commit 221b584 "move active_console into struct kvm_config" added
checks in h_put_term_char() & h_get_term_char() of
kvm->cfg.active_console but needs to be vcpu->kvm->cfg.active_console.

That commit also missed updates to term_putc() & term_getc() in
spapr_rtas.c, and I'm guessing that we need similar checks of
active_console in rtas_put_term_char() & rtas_get_term_char().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: don't exit on debug ioport write

While it shouldn't happen on regular guests, we sometimes hit it when fuzzing
within the guest, which would cause the lkvm process to exit - which is
undesired.

Our PIT tests were using the debug port to trigger a reboot. Instead of using
that port we now use the reboot line of our i8042 controller.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: support build-time checks

Support using build-time check tools when building lkvm. This allows
using tools such as smatch with the same syntax used with kernel
code.

For example, to build with smatch checks, first make sure you have
smatch installed, then run:

make CHECK="smatch -p=kernel" C=1

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: don't die if sdl wasn't compiled in and we don't try using it

If SDL isn't compiled in we shouldn't die unless we actually try using it.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: fix SDL build error when libsdl isn't installed

We used wrong prototypes for sdl init/exit when libsdl wasn't installed when
building. This would cause build errors.

Reported-by: Kashyap Chamarthy <kashyap.cv@gmail.com>
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: fix SMP

We accidently broke SMP when we moved mptable init to before we initialize the vcpu
count, that means that we always built smptable which was not properly initialized
for the given configuration.

Instead of initializing mptable as part of the kvm arch initialization, let it
be initialized on it's own in the firmware initialization level.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: remove global kvm object

This was ugly, and now we get rid of it.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: pass kvm ptr directly to timer injection

This will help us get rid of the global kvm object.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: use init/exit where possible

Switch to using init/exit calls instead of the repeating call blocks in builtin-run.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: add init/exit automatic calls

This adds a method to call init/exit functions similar to the kernel's init functions.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: fix build optimization

I've accidently changed optimization level to -O0 when testing one of the
patches and commited that. Revert it back to -O2.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: initialize the threadpool job iterator before using

This would fix a bug where the exit function of the threadpool would hang
if no jobs were processed yet and a request to exit was received.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: virtio-9p cleanup

Sort out init/exit calls, move parser into the 9p code and make sure
rootfs config is initialized before virtio-9p (or any other init func)
is called.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move the rest of the config initializations

These should appear before running any init calls.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: ram init

RAM should be initialized as part of kvm__init, and not somewhere random in the global
init code.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: ioport arch init

Move ioport arch init into ioport init, which is the logical place for that instead of a
random place in the global init code.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: kernel load/firmware cleanup

Sort out the config initialization order so that configuration is fully initialized
before init functions start running, and move the firmware initialization code into
kvm.c.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: ui improvements

Move the vesa initialization logic into sdl__init() and vnc__init(), builtin-run
shouldn't have to know about the conditions for initializing vesa on it's own.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: kbd initialization check

Check if i8042 is supported only within the initialization call itself, so that
builtin-run won't need to know which archs are supported by it.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: kvm-ipc cleanup

Move all the kvm-ipc specific code into the relevant file, and modify
the ipc callback to pass a ptr to struct kvm as well.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: virtio-net init/exit

Make the init/exit of virtio-net self-contained, so the global init code
won't need to check if it was selected or not.

This also moves the bulk of the net-specific initialization code, including
the parser, into virtio-net itself.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: pci-shmem init-exit

Make the init/exit of pci-shmem self-contained, so the global init code
won't need to check if it was selected or not.

Also move the parser out of builtin-run into the pci-shmem code.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: virtio-bln init/exit

Make the init/exit of virtio-balloon self-contained, so the global init code
won't need to check if it was selected or not.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: virtio-rng init/exit

Make the init/exit of virtio-rng self-contained, so the global init code
won't need to check if it was selected or not.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: virtio-console init/exit

Make the init/exit of virtio-console self-contained, so the global init code
won't need to check if it was selected or not.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: timer cleanup

Make the timer init/exit follow the rest of the code, and move it out of
builtin-run.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: threadpool exit routine

Add an exit function for the threadpool which will stop all running threads in the
pool. Also clean up the init code a bit.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: improve term init/exit functions

Make the init and exit functions of the term code similar to the rest
of the code.

Also move in the pty parser into the term code out of builtin-run.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: improve framebuffer manager initialization

Make the init and exit functions of the framebuffer similar to the rest
of the code.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move kvm_cpus into struct kvm

There's no reason the array of guest specific vcpus is global. Move it into
struct kvm.

Also split up arch specific vcpu init from the generic code and call it from
the kvm_cpu initializer.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move nrcpus into struct kvm_config

This no longer has to be a global since we now have kvm_config.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: disk image related cleanup

Move io debug delay into kvm_config, the parser out of builtin-run into the disk code
and make the init/exit functions match the rest of the code in style.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: add private ptr to option parser

Support passing a private ptr to CALLBACK options. This will make it possible
assigning options into specific struct kvms by passing them directly to parsers.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move active_console into struct kvm_config

This config option was 'extern'ed between different objects. Clean it up
and move it into struct kvm_config.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move mmio_debug into struct kvm_config

This config option was 'extern'ed between different objects. Clean it up
and move it into struct kvm_config.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move ioport_debug into struct kvm_config

This config option was 'extern'ed between different objects. Clean it up
and move it into struct kvm_config.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: remove redundancy between kvm_config and kvm

Remove some redundant members between struct kvm_config and struct kvm
since options are now contained within struct kvm.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: move kvm_config into struct kvm

Contain the options within struct kvm itself. This way options are specific
to a given struct kvm and not just global.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: split struct kvm into arch specific part

Move all the non-arch specific members into a generic struct, and the arch specific
members into a arch specific kvm_arch. This prevents code duplication across different
archs.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: generate command line options dynamically

Since we now store options in a struct, we should generate the command line options
dynamically. This is a pre-requisite to the following patch moving the options
into struct kvm.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: introduce kvm_config to store instance-specific config options

Move all the configurable options from global static variables to a self-contained
structure.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: add HOME env var to hostfs

Add a HOME env var when booting a hostfs guest. This will point out to a home
dir within the given guest name.

This will make several apps happier when being run under hostfs.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: simplify virtio config handling

Instead of a get/set for config values, just request the address of the
config region, and handle that by simply reading directly from that region.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: 9p don't nuke fids on attach

We're not supposed to kill all fids when a new attach request
arrives. This used to cause issues when the guest would send
multiple attach requests.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: enable LTO

Build with -flto set, which should enable link-time-optimizations.

I'm not sure if it provides a significant performance increase, but
it's probably just worth it for catching issues which it may cause.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: fix warnings in virtio-blk

Fix up warnings related to not checking return value of read/write by actually
handling errors there.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Use the new KVM_SIGNAL_MSI ioctl to inject interrupts directly.

We still create GSIs and keep them for two reasons:

- They're required by virtio-* devices.
- There's not much overhead since we just create them when starting the
guest, they don't use anything when the guest is running.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: clean garbage from ioeventfd code

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Mount devpts to /dev/pts

I'm seeing this in guest due to lacking of /dev/pts:

   sh-4.2# xterm
   xterm: Error 32, errno 28: No space left on device
   Reason: get_pty: not enough ptys

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Export DISPLAY ENV as our default host ip address

It is useful to run a X program in guest and display it on host.

1) Make host's x server listen to localhost:6000
   host_shell$ socat -d -d TCP-LISTEN:6000,fork,bind=localhost \
               UNIX-CONNECT:/tmp/.X11-unix/X0

2) Start the guest and run X program
   host_shell$ lkvm run -k /boot/bzImage
  guest_shell$ xlogo

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Respect guest tcp window size

Respect guest tcp window size and stop sending tcp segments to guest
if guest's receive window is closed.

This fixes the TCP hang I'm seeing where guest and host are transferring
big chuck of data.

This problem was not triggered when guest and external host
communicates, probably because guest to external host communication
walks through real network and is much slower than guest and host
communication. Thus, guest's receive window has little chance to be
closed.

v2: use pthread_cond_wait to wait

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Make tcp between guest/host virtual ip work in UIP mode

This pach makes tcp between 'guest ip' and 'host virtual ip' work in UIP mode.
(The defulat guest ip is 192.168.33.15, host virtual ip is 192.168.33.1)

guest$ wget http://192.168.33.1/file
guest$ ssh 192.168.33.1

Without this patch, user has to figure out the ip address of host's
interface (e.g. eth0) and use that ip address to access host.

With this patch, user can simply use the virtual host ip.

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Make 'lkvm setup' work outside kernel/tools/kvm dir

Generate

~/.lkvm/$guest/virt/etc/passwd
~/.lkvm/$guest/virt/init

on the fly.

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Merge guest init and init_stage2.c

Two stages init process is introduced in

   commit eaf720b285947a6f4e29174d0eba1899de31d8ab
   kvm tools: Split custom rootfs init into two stages

to address the outdated init binary issue.

Currenly, init binary is embedded in lkvm binary and generated on the
fly. So there is no need to split the init process.

This makes
   1) the size of lkvm binary smaller becasue only one
      statically linked init binary.
   2) the init process simpler and cleaner.

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Use LD instead using 'ld' directly

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Embed init and init_stage2 into lkvm binary

We generate guest init and init_stage2 on the fly and put them in

~/.lkvm/$guest/virt/init
~/.lkvm/$guest/virt/init_stage2
.

This makes 'lkvm run' no longer depending on guest/{int, init_stage2}.

Further, we can merge init and init_stage2 since there is no need to
split the init function into two stages any more.

Signed-off-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Add initial virtio-scsi support

This patch brings virito-scsi support to kvm tool.

With the introduce of tcm_vhost (vhost-scsi)

   tcm_vhost: Initial merge for vhost level target fabric driver

we can implement virito-scsi by simply having vhost-scsi to handle the
SCSI command.

Howto use:
1) Setup the tcm_vhost target through /sys/kernel/config

   [Stefan Hajnoczi, Thanks for the script to setup tcm_vhost]

   ** Setup wwpn and tpgt
   $ wwpn="naa.0"
   $ tpgt=/sys/kernel/config/target/vhost/$wwpn/tpgt_0
   $ nexus=$tpgt/nexus
   $ mkdir -p $tpgt
   $ echo -n $wwpn > $nexus

   ** Setup lun using /dev/ram
   $ n=0
   $ lun=$tpgt/lun/lun_${n}
   $ data=/sys/kernel/config/target/core/iblock_0/data_${n}
   $ ram=/dev/ram${n}
   $ mkdir -p $lun
   $ mkdir -p $data
   $ echo -n udev_path=${ram} > $data/control
   $ echo -n 1 > $data/enable
   $ ln -s $data $lun

2) Run kvm tool with the new disk option '-d scsi:$wwpn:$tpgt', e.g
   $ lkvm run -k /boot/bzImage -d ~/img/sid.img -d scsi:naa.0:0

Signed-off-by: Asias He <asias.hejun@gmail.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Add 'install' target to Makefile

Add a new 'install' target to Makefile that installs 'lkvm' binary to
$HOME/bin by default. The installed binary still needs to be launched
from linux/tools/kvm directory because of our silly external dependency
to stage 2 guest init file:

[penberg@tux ~]$ lkvm run
Fatal: Failed linking stage 2 of init.

The most convinent way to fix that is to embed the stage 2 image in
'lkvm' executable like we do with our mini-BIOS.

Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Fix segfault on "lkvm run"

The segfault is triggered by just running "lkvm run". On my system, it
does not find any kernel, so kvm_cmd_run_init() returns EINVAL which
fails the (r < 0) check in kvm_cmd_run(). Since kvm_cmd_run_init() does
not get to initialize the cpus, kvm_cpus gets mistakenly dereferenced in
kvm_cmd_run_work().

The errors from kvm_cmd_run_init() are not handled properly as they are
returned as positive values.

Acked-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Paul Neumann <paul104x@yahoo.de>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Check for unknown parameters in network option handling

Currently the parsing of network command line parameters doesn't check
for unknown parameters.

For example "-n mod=none" will cause kvmtool to setup TAP networking.

Save users from their typos by checking for unknown parameters and bailing.

Acked-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Fix formatting of error message in TAP handling

This error message is missing a space, and has a redundant ":" at the end,
currently it produces:

You have requested a TAP device, but creation of one hasfailed because:: No such file or directory

Add a space to "hasfailed" and remove the extra ":".

Don't split the line to improve grepability.

Acked-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Fix crash when /etc/resolv.conf doesn't exist

In uip_dhcp_get_dns() we try to open /etc/resolv.conf. If we fail to
open it we then SEGV trying to fclose() it.

Fix the code to just return directly if we can't open it.

Acked-by: Asias He <asias.hejun@gmail.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: don't bother tracking is_dir

This is something we can calculate on the fly, and doesn't justify the overhead
of tracking it all over fid transitions.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: prevent guest softlockup errors when pausing

Use the new KVM_KVMCLOCK_CTRL ioctl to prevent guests from wrongfully
detecting lockups when in fact they were paused.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: Enable O_DIRECT support

With Direct I/O, file reads and writes go directly from the applications
to the storage device, bypassing the operating system read and write
caches. This is useful for applications that manage their own caches.

Open a disk image with O_DIRECT:
$ lkvm run -d ~/img/test.img,direct

The original readonly flag is still supported.
Open a disk image with O_DIRECT and readonly:
$ lkvm run -d ~/img/test.img,direct,ro

Signed-off-by: Asias He <asias.hejun@gmail.com>
Acked-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: use correct error value for virtio-9p RLERROR

RLERROR expects positive error values, passing negative values causes guest kernel
panic.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: remove unused field from virtio-blk

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: support injecting arbitrary sysrqs

Add support to 'lkvm debug' to inject arbitrary sysrqs using a new
'-s <sysrq>' argument.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

kvm tools: dynamically allocate p9 fids

This removes the limit for p9 fids and the huge fid array that came along with
it. Instead, it dynamically allocates fids and stores them in a rb-tree.

This is useful when the guest needs a lot of fids, such as when stress
testing guests.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>

Merge commit 'v3.5' into kvmtool/next

Michael Ellerman writes:

  It occurred to me overnight that I forgot to mention that in order to
  build the new code you need the headers from a 3.5-rc1 era kernel (for
  the ioctl & KVM_CAP definitions).

  The easiest way to do that is to merge linus' tree into kvmtool.

Signed-off-by: Pekka Enberg <penberg@kernel.org>

Linux 3.5

Remove SYSTEM_SUSPEND_DISK system state

The SYSTEM_SUSPEND_DISK system state is never used, so drop it.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'anton-kgdb' (kgdb dmesg fixups)

Merge emailed kgdb dmesg fixups patches from Anton Vorontsov:
"The dmesg command appears to be broken after the printk rework.  The
  old logic in the kdb code makes no sense in terms of current
  printk/logging storage format, and KDB simply hangs forever upon
  entering 'dmesg' command.

  The first patch revives the command by switching to kmsg_dumper
  iterator.  As a side-effect, the code is now much more simpler.

  A few changes were needed in the printk.c: we needed unlocked variant
  of the kmsg_dumper iterator, but these can surely wait for 3.6.

  It's probably too late even for the first patch to go to 3.5, but I'll
  try to convince otherwise.  :-) Here we go:

   - The current code is broken for sure, and has no hope to work at
     all.  It is a regression
   - The new code works for me, and probably works for everyone else;
   - If it compiles (and I urge everyone to compile-test it on your
     setup), it hardly can make things worse."

* Merge emailed patches from Anton Vorontsov: (4 commits)
  kdb: Switch to nolock variants of kmsg_dump functions
  printk: Implement some unlocked kmsg_dump functions
  printk: Remove kdb_syslog_data
  kdb: Revive dmesg command

kdb: Switch to nolock variants of kmsg_dump functions

The locked variants are prone to deadlocks (suppose we got to the
debugger w/ the logbuf lock held), so let's switch to nolock variants.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

printk: Implement some unlocked kmsg_dump functions

If used from KDB, the locked variants are prone to deadlocks (suppose we
got to the debugger w/ the logbuf lock held).

So, we have to implement a few routines that grab no logbuf lock.

Yet we don't need these functions in modules, so we don't export them.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

printk: Remove kdb_syslog_data

The function is no longer needed, so remove it.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kdb: Revive dmesg command

The kgdb dmesg command is broken after the printk rework. The old logic
in kdb code makes no sense in terms of current printk/logging storage
format, and KDB simply hangs forever.

This patch revives the command by switching to kmsg_dumper iterator.

The code is now much more simpler and shorter.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull late MIPS fixes from Ralf Baechle:
"This fixes a number of lose ends in the MIPS code and various bug
  fixes.

  Aside of dropping some patch that should not be in this pull request
  everything has sat in -next for quite a while and there are no known
  issues.

  The biggest patch in this patch set moves the allocation of an array
  that is aliased to a function (for runtime generated code) to
  assembler code.  This avoids an issue with certain toolchains when
  building for microMIPS."

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (35 commits)
  MIPS: PCI: Move fixups from __init to __devinit.
  MIPS: Fix bug.h MIPS build regression
  MIPS: sync-r4k: remove redundant irq operation
  MIPS: smp: Warn on too early irq enable
  MIPS: call set_cpu_online() on cpu being brought up with irq disabled
  MIPS: call ->smp_finish() a little late
  MIPS: Yosemite: delay irq enable to ->smp_finish()
  MIPS: SMTC: delay irq enable to ->smp_finish()
  MIPS: BMIPS: delay irq enable to ->smp_finish()
  MIPS: Octeon: delay enable irq to ->smp_finish()
  MIPS: Oprofile: Fix build as a module.
  MIPS: BCM63XX: Fix BCM6368 IPSec clock bit
  MIPS: perf: Fix build error caused by unused counters_per_cpu_to_total()
  MIPS: Fix Magic SysRq L kernel crash.
  MIPS: BMIPS: Fix duplicate header inclusion.
  mips: mark const init data with __initconst instead of __initdata
  MIPS: cmpxchg.h: Add missing include
  MIPS: Malta may also be equipped with MIPS64 R2 processors.
  MIPS: Fix typo multipy -> multiply
  MIPS: Cavium: Fix duplicate ARCH_SPARSEMEM_ENABLE in kconfig.
  ...

Merge tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device-mapper discard fixes from Alasdair G Kergon:
  - avoid a crash in dm-raid1 when discards coincide with mirror
    recovery;
  - avoid discarding shared data that's still needed in dm-thin;
  - don't guarantee that discarded blocks will be wiped in dm-raid1.

* tag 'dm-3.5-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm raid1: set discard_zeroes_data_unsupported
  dm thin: do not send discards to shared blocks
  dm raid1: fix crash with mirror recovery and discard

Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

Pull pnfs/ore fixes from Boaz Harrosh:
"These are catastrophic fixes to the pnfs objects-layout that were just
  discovered.  They are also destined for @stable.

  I have found these and worked on them at around RC1 time but
  unfortunately went to the hospital for kidney stones and had a very
  slow recovery.  I refrained from sending them as is, before proper
  testing, and surly I have found a bug just yesterday.

  So now they are all well tested, and have my sign-off.  Other then
  fixing the problem at hand, and assuming there are no bugs at the new
  code, there is low risk to any surrounding code.  And in anyway they
  affect only these paths that are now broken.  That is RAID5 in pnfs
  objects-layout code.  It does also affect exofs (which was not broken)
  but I have tested exofs and it is lower priority then objects-layout
  because no one is using exofs, but objects-layout has lots of users."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
  pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
  pnfs-obj: don't leak objio_state if ore_write/read fails
  ore: Unlock r4w pages in exact reverse order of locking
  ore: Remove support of partial IO request (NFS crash)
  ore: Fix NFS crash by supporting any unaligned RAID IO

Merge tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs

Pull UBIFS free space fix-up bugfix from Artem Bityutskiy:
"It's been reported already twice recently:

    http://lists.infradead.org/pipermail/linux-mtd/2012-May/041408.html
    http://lists.infradead.org/pipermail/linux-mtd/2012-June/042422.html

  and we finally have the fix.  I am quite confident the fix is correct
  because I could reproduce the problem with nandsim and verify the fix.
  It was also verified by Iwo (the reporter).

  I am also confident that this is OK to merge the fix so late because
  this patch affects only the fixup functionality, which is not used by
  most users."

* tag 'upstream-3.5-rc8' of git://git.infradead.org/linux-ubifs:
  UBIFS: fix a bug in empty space fix-up

dm raid1: set discard_zeroes_data_unsupported

We can't guarantee that REQ_DISCARD on dm-mirror zeroes the data even if
the underlying disks support zero on discard. So this patch sets
ti->discard_zeroes_data_unsupported.

For example, if the mirror is in the process of resynchronizing, it may
happen that kcopyd reads a piece of data, then discard is sent on the
same area and then kcopyd writes the piece of data to another leg.
Consequently, the data is not zeroed.

The flag was made available by commit 983c7db347db8ce2d8453fd1d89b7a4bb6920d56
(dm crypt: always disable discard_zeroes_data).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm thin: do not send discards to shared blocks

When process_discard receives a partial discard that doesn't cover a
full block, it sends this discard down to that block. Unfortunately, the
block can be shared and the discard would corrupt the other snapshots
sharing this block.

This patch detects block sharing and ends the discard with success when
sending it to the shared block.

The above change means that if the device supports discard it can't be
guaranteed that a discard request zeroes data. Therefore, we set
ti->discard_zeroes_data_unsupported.

Thin target discard support with this bug arrived in commit
104655fd4dcebd50068ef30253a001da72e3a081 (dm thin: support discards).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

dm raid1: fix crash with mirror recovery and discard

This patch fixes a crash when a discard request is sent during mirror
recovery.

Firstly, some background.  Generally, the following sequence happens during
mirror synchronization:
- function do_recovery is called
- do_recovery calls dm_rh_recovery_prepare
- dm_rh_recovery_prepare uses a semaphore to limit the number
  simultaneously recovered regions (by default the semaphore value is 1,
  so only one region at a time is recovered)
- dm_rh_recovery_prepare calls __rh_recovery_prepare,
  __rh_recovery_prepare asks the log driver for the next region to
  recover. Then, it sets the region state to DM_RH_RECOVERING. If there
  are no pending I/Os on this region, the region is added to
  quiesced_regions list. If there are pending I/Os, the region is not
  added to any list. It is added to the quiesced_regions list later (by
  dm_rh_dec function) when all I/Os finish.
- when the region is on quiesced_regions list, there are no I/Os in
  flight on this region. The region is popped from the list in
  dm_rh_recovery_start function. Then, a kcopyd job is started in the
  recover function.
- when the kcopyd job finishes, recovery_complete is called. It calls
  dm_rh_recovery_end. dm_rh_recovery_end adds the region to
  recovered_regions or failed_recovered_regions list (depending on
  whether the copy operation was successful or not).

The above mechanism assumes that if the region is in DM_RH_RECOVERING
state, no new I/Os are started on this region. When I/O is started,
dm_rh_inc_pending is called, which increases reg->pending count. When
I/O is finished, dm_rh_dec is called. It decreases reg->pending count.
If the count is zero and the region was in DM_RH_RECOVERING state,
dm_rh_dec adds it to the quiesced_regions list.

Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is
in DM_RH_RECOVERING state, it could be added to quiesced_regions list
multiple times or it could be added to this list when kcopyd is copying
data (it is assumed that the region is not on any list while kcopyd does
its jobs). This results in memory corruption and crash.

There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests
do not belong to any region, so they are always added to the sync list
in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH
requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH
requests. These bypasses avoid the crash possibility described above.

These bypasses were improperly implemented for REQ_DISCARD when
the mirror target gained discard support in commit
5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard).

In do_writes, REQ_DISCARD requests is always added to the sync queue and
immediately dispatched (even if the region is in DM_RH_RECOVERING).  However,
dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts.  So it violates the
rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list
corruption described above.

This patch changes it so that REQ_DISCARD requests follow the same path
as REQ_FLUSH. This avoids the crash.

Reference: https://bugzilla.redhat.com/837607

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

pnfs-obj: Fix __r4w_get_page when offset is beyond i_size

It is very common for the end of the file to be unaligned on
stripe size. But since we know it's beyond file's end then
the XOR should be preformed with all zeros.

Old code used to just read zeros out of the OSD devices, which is a great
waist. But what scares me more about this situation is that, we now have
pages attached to the file's mapping that are beyond i_size. I don't
like the kind of bugs this calls for.

Fix both birds, by returning a global zero_page, if offset is beyond
i_size.

TODO:
Change the API to ->__r4w_get_page() so a NULL can be
returned without being considered as error, since XOR API
treats NULL entries as zero_pages.

[Bug since 3.2. Should apply the same way to all Kernels since]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

pnfs-obj: don't leak objio_state if ore_write/read fails

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

ore: Unlock r4w pages in exact reverse order of locking

The read-4-write pages are locked in address ascending order.
But where unlocked in a way easiest for coding. Fix that,
locks should be released in opposite order of locking, .i.e
descending address order.

I have not hit this dead-lock. It was found by inspecting the
dbug print-outs. I suspect there is an higher lock at caller that
protects us, but fix it regardless.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

ore: Remove support of partial IO request (NFS crash)

Do to OOM situations the ore might fail to allocate all resources
needed for IO of the full request. If some progress was possible
it would proceed with a partial/short request, for the sake of
forward progress.

Since this crashes NFS-core and exofs is just fine without it just
remove this contraption, and fail.

TODO:
Support real forward progress with some reserved allocations
of resources, such as mem pools and/or bio_sets

[Bug since 3.2 Kernel]
CC: Stable Tree <stable@kernel.org>
CC: Benny Halevy <bhalevy@tonian.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

ore: Fix NFS crash by supporting any unaligned RAID IO

In RAID_5/6 We used to not permit an IO that it's end
byte is not stripe_size aligned and spans more than one stripe.
.i.e the caller must check if after submission the actual
transferred bytes is shorter, and would need to resubmit
a new IO with the remainder.

Exofs supports this, and NFS was supposed to support this
as well with it's short write mechanism. But late testing has
exposed a CRASH when this is used with none-RPC layout-drivers.

The change at NFS is deep and risky, in it's place the fix
at ORE to lift the limitation is actually clean and simple.
So here it is below.

The principal here is that in the case of unaligned IO on
both ends, beginning and end, we will send two read requests
one like old code, before the calculation of the first stripe,
and also a new site, before the calculation of the last stripe.
If any "boundary" is aligned or the complete IO is within a single
stripe. we do a single read like before.

The code is clean and simple by splitting the old _read_4_write
into 3 even parts:
1._read_4_write_first_stripe
2. _read_4_write_last_stripe
3. _read_4_write_execute

And calling 1+3 at the same place as before. 2+3 before last
stripe, and in the case of all in a single stripe then 1+2+3
is preformed additively.

Why did I not think of it before. Well I had a strike of
genius because I have stared at this code for 2 years, and did
not find this simple solution, til today. Not that I did not try.

This solution is much better for NFS than the previous supposedly
solution because the short write was dealt with out-of-band after
IO_done, which would cause for a seeky IO pattern where as in here
we execute in order. At both solutions we do 2 separate reads, only
here we do it within a single IO request. (And actually combine two
writes into a single submission)

NFS/exofs code need not change since the ORE API communicates the new
shorter length on return, what will happen is that this case would not
occur anymore.

hurray!!

[Stable this is an NFS bug since 3.2 Kernel should apply cleanly]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>

UBIFS: fix a bug in empty space fix-up

UBIFS has a feature called "empty space fix-up" which is a quirk to work-around
limitations of dumb flasher programs. Namely, of those flashers that are unable
to skip NAND pages full of 0xFFs while flashing, resulting in empty space at
the end of half-filled eraseblocks to be unusable for UBIFS. This feature is
relatively new (introduced in v3.0).

The fix-up routine (fixup_free_space()) is executed only once at the very first
mount if the superblock has the 'space_fixup' flag set (can be done with -F
option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and
writes it back to the same LEB. The routine assumes the image is pristine and
does not have anything in the journal.

There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly.
All but one LEB of the log of a pristine file-system are empty. And one
contains just a commit start node. And 'fixup_free_space()' just unmapped this
LEB, which resulted in wiping the commit start node. As a result, some users
were unable to mount the file-system next time with the following symptom:

UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node
UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0

The root-cause of this bug was that 'fixup_free_space()' wrongly assumed
that the beginning of empty space in the log head (c->lhead_offs) was known
on mount. However, it is not the case - it was always 0. UBIFS does not store
in it the master node and finds out by scanning the log on every mount.

The fix is simple - just pass commit start node size instead of 0 to
'fixup_leb()'.

Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com>
Cc: stable@vger.kernel.org [v3.0+]
Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com>
Reported-by: James Nute <newten82@gmail.com>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

Pull last minute Ceph fixes from Sage Weil:
"The important one fixes a bug in the socket failure handling behavior
  that was turned up in some recent failure injection testing.  The
  other two are minor bug fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: endian bug in rbd_req_cb()
  rbd: Fix ceph_snap_context size calculation
  libceph: fix messenger retry

Merge tag 'md-3.5-fixes' of git://neil.brown.name/md

Pull three md bugfixes from NeilBrown:
"One of the bugs was introduced in 3.5-rc1.  Others have been there for
  longer."

* tag 'md-3.5-fixes' of git://neil.brown.name/md:
  md/raid1: close some possible races on write errors during resync
  md: avoid crash when stopping md array races with closing other open fds.
  md: fix bug in handling of new_data_offset

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking changes from David Miller:
"Ok, we should be good to go now"

1) We have to statically initialize the init_net device list head rather
   than do so in an initcall, otherwise netprio_cgroup crashes if it's
   built statically rather than modular (Mark D.  Rustad)

2) Fix SKB null oopser in CIPSO ipv4 option processing (Paul Moore)

3) Qlogic maintainers update (Anirban Chakraborty)

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: Statically initialize init_net.dev_base_head
  MAINTAINERS: Changes in qlcnic and qlge maintainers list
  cipso: don't follow a NULL pointer when setsockopt() is called

Merge branch 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID update from Jiri Kosina:
"A final round of changes for HID for 3.5: just device ID additions."

* 'upstream-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: hid-multitouch: add support for Zytronic panels
  HID: add Sennheiser BTD500USB device support
  HID: add battery quirk for Apple Wireless ANSI

cx25821: Remove bad strcpy to read-only char*

The strcpy was being used to set the name of the board. Since the
destination char* was read-only and the name is set statically at
compile time; this was both wrong and redundant.

The type of char* is changed to const char* to prevent future errors.

Reported-by: Radek Masin <radek@masin.eu>
Signed-off-by: Ezequiel Garcia <elezegarcia@gmail.com>
[ Taking directly due to vacations - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

HID: hid-multitouch: add support for Zytronic panels

Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>