James Smart [Wed, 10 Jun 2009 21:23:16 +0000 (17:23 -0400)]
[SCSI] lpfc 8.3.3 : Add support for Target Reset handler entrypoint
Patch was originally submitted upstream on 4/21/2008:
http://marc.info/?l=linux-scsi&m=120880973719266&w=2
Somewhere, it never get merged. The patch restructures the task mgmt
routines, commonizing like behavior. Then the patch changes device
reset to LUN resets, and adds a target reset handler.
Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Smart [Wed, 10 Jun 2009 21:23:06 +0000 (17:23 -0400)]
[SCSI] lpfc 8.3.3 : Fix a couple of spin_lock and memory issues and a crash
Contains the following changes:
- Fixed error paths retaking a spin lock which they already hold
- Added code to free memory in a couple of error paths
- Added code to free RPI bit map while unloading driver
- Added code to write zero to memory object allocated through dma_alloc_coherent
- Fixed crash/hang with target or LUN resets
Signed-off-by: James Smart <James.Smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Smart [Wed, 10 Jun 2009 21:22:56 +0000 (17:22 -0400)]
[SCSI] lpfc 8.3.3 : FC/FCOE discovery fixes
Contains the following changes:
- Force vport to send LOGO to fabric controller when deleting vport
- Fixed driver failing to register login when a PLOGI is received
- Fixes for FIP discovery
- Added stricter checks for FCF addressing mode
- Added code to send only FLOGI, FDISC and LOGO to Fabric controller as FIP
- Fixed handling of LOGO from Fabric port
- Fixed consecutive link up events skipped link_down processing
Signed-off-by: James Smart <James.Smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Smart [Wed, 10 Jun 2009 21:22:44 +0000 (17:22 -0400)]
[SCSI] lpfc 8.3.3 : Fix various SLI-3 vs SLI-4 differences
Contains the following changes
- Set the CT field of FDISC to 3
- Fixed over allocation of SCSI buffers on SLI4
- Removed unused jump table entries
- Increase LPFC_WQE_DEF_COUNT to 256
- Updated FDISC context to VPI
- Fixed immediate SCSI command for LUN reset translation to WQE
- Extended mailbox handling to allow MBX_POLL commands in between async
MBQ commands
- Fixed SID used for FDISC
- Fix crash when accessing ctlregs from sysfs for SLI4 HBAs
- Fix SLI4 firmware version not being saved or displayed correctly
- Expand CQID field in WQE structure to 16 bits
- Fix post header template mailbox command timing out
- Removed FCoE PCI device ID 0x0705
Signed-off-by: James Smart <James.Smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[SCSI] qla2xxx: Resolve a performance issue in interrupt
Reverted back a change in qla*_intr_handler code that caused an increase in
cpu cycles by allowing interrupts to occur while the instance hardware lock
was being held. Fix by taking the lock in irqsave mode.
Reported-and-tested-by: Douglas W. Styner <douglas.w.styner@intel.com> Signed-off-by: Anirban Chakraborty <anirban.chakraborty@qlogic.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Randy Dunlap [Wed, 10 Jun 2009 19:56:58 +0000 (12:56 -0700)]
[SCSI] qla2xxx: fix printk format warnings
Fix qla2xxx printk format warnings:
drivers/scsi/qla2xxx/qla_sup.c:915: warning: long long unsigned int format, u64 arg (arg 5)
drivers/scsi/qla2xxx/qla_sup.c:915: warning: long long unsigned int format, u64 arg (arg 6)
drivers/scsi/qla2xxx/qla_sup.c:923: warning: long long unsigned int format, u64 arg (arg 5)
drivers/scsi/qla2xxx/qla_sup.c:923: warning: long long unsigned int format, u64 arg (arg 6)
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Seokmann Ju <seokmann.ju@qlogic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Alexey Zaytsev [Wed, 10 Jun 2009 19:56:56 +0000 (12:56 -0700)]
[SCSI] compat: don't perform unneeded copy in sg_io code
The members from 'status' in struct sg_io_hdr to the last are used to
transfer information from kernel to user space. The values that user
space sets are just ignored.
Signed-off-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Christof Schmitt [Fri, 15 May 2009 11:18:22 +0000 (13:18 +0200)]
[SCSI] zfcp: Update FC pass-through support
Don't access the block layer request, get the payload length instead
from the FC job. Simplify access to the zfcp_port, only the d_id is
required, if the port is no longer accessed later. This is possible
when the els_handler does not access the port pointer from the ELS
request.
Sven Schuetz [Mon, 6 Apr 2009 16:31:47 +0000 (18:31 +0200)]
[SCSI] zfcp: Add FC pass-through support
Provide the ability to do fibre channel requests from the userspace to
our zfcp driver. Patch builds upon extension to the fibre channel
tranport class by James Smart and Seokmann Ju. See here
http://marc.info/?l=linux-scsi&m=123808882309133&w=2
Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com> Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
James Smart [Thu, 26 Mar 2009 17:33:19 +0000 (13:33 -0400)]
[SCSI] FC Pass Thru support
Attached is the ELS/CT pass-thru patch for the FC Transport. The patch
creates a generic framework that lays on top of bsg and the SGIO v4 ioctl
in order to pass transaction requests to LLDD's.
The interface supports the following operations:
On an fc_host basis:
Request login to the specified N_Port_ID, creating an fc_rport.
Request logout of the specified N_Port_ID, deleting an fc_rport
Send ELS request to specified N_Port_ID w/o requiring a login, and
wait for ELS response.
Send CT request to specified N_Port_ID and wait for CT response.
Login is required, but LLDD is allowed to manage login and decide
whether it stays in place after the request is satisfied.
Vendor-Unique request. Allows a LLDD-specific request to be passed
to the LLDD, and the passing of a response back to the application.
On an fc_rport basis:
Send ELS request to nport and wait for ELS response.
Send CT request to nport and wait for CT response.
The patch also exports several headers from include/scsi such that
they can be available to user-space applications:
include/scsi/scsi.h
include/scsi/scsi_netlink.h
include/scsi/scsi_netlink_fc.h
include/scsi/scsi_bsg_fc.h
For further information, refer to the last RFC:
http://marc.info/?l=linux-scsi&m=123436574018579&w=2
Note: Documentation is still spotty and will be added later.
[bharrosh@panasas.com: update for new block API] Signed-off-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Linus Torvalds [Fri, 12 Jun 2009 18:16:27 +0000 (11:16 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (290 commits)
ALSA: pcm - Update document about xrun_debug proc file
ALSA: lx6464es - support standard alsa module parameters
ALSA: snd_usb_caiaq: set mixername
ALSA: hda - add quirk for STAC92xx (SigmaTel STAC9205)
ALSA: use card device as parent for jack input-devices
ALSA: sound/ps3: Correct existing and add missing annotations
ALSA: sound/ps3: Restructure driver source
ALSA: sound/ps3: Fix checkpatch issues
ASoC: Fix lm4857 control
ALSA: ctxfi - Clear PCM resources at hw_params and hw_free
ALSA: ctxfi - Check the presence of SRC instance in PCM pointer callbacks
ALSA: ctxfi - Add missing start check in atc_pcm_playback_start()
ALSA: ctxfi - Add use_system_timer module option
ALSA: usb - Add boot quirk for C-Media 6206 USB Audio
ALSA: ctxfi - Fix wrong model id for UAA
ALSA: ctxfi - Clean up probe routines
ALSA: hda - Fix the previous tagra-8ch patch
ALSA: hda - Add 7.1 support for MSI GX620
ALSA: pcm - A helper function to compose PCM stream name for debug prints
ALSA: emu10k1 - Fix minimum periods for efx playback
...
Linus Torvalds [Fri, 12 Jun 2009 16:52:30 +0000 (09:52 -0700)]
Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
* 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slab: setup cpu caches later on when interrupts are enabled
slab,slub: don't enable interrupts during early boot
slab: fix gfp flag in setup_cpu_cache()
x86: make zap_low_mapping could be used early
irq: slab alloc for default irq_affinity
memcg: fix page_cgroup fatal error in FLATMEM
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (154 commits)
[SCSI] osd: Remove out-of-tree left overs
[SCSI] libosd: Use REQ_QUIET requests.
[SCSI] osduld: use filp_open() when looking up an osd-device
[SCSI] libosd: Define an osd_dev wrapper to retrieve the request_queue
[SCSI] libosd: osd_req_{read,write} takes a length parameter
[SCSI] libosd: Let _osd_req_finalize_data_integrity receive number of out_bytes
[SCSI] libosd: osd_req_{read,write}_kern new API
[SCSI] libosd: Better printout of OSD target system information
[SCSI] libosd: OSD2r05: Attribute definitions
[SCSI] libosd: OSD2r05: Additional command enums
[SCSI] mpt fusion: fix up doc book comments
[SCSI] mpt fusion: Added support for Broadcast primitives Event handling
[SCSI] mpt fusion: Queue full event handling
[SCSI] mpt fusion: RAID device handling and Dual port Raid support is added
[SCSI] mpt fusion: Put IOC into ready state if it not already in ready state
[SCSI] mpt fusion: Code Cleanup patch
[SCSI] mpt fusion: Rescan SAS topology added
[SCSI] mpt fusion: SAS topology scan changes, expander events
[SCSI] mpt fusion: Firmware event implementation using seperate WorkQueue
[SCSI] mpt fusion: rewrite of ioctl_cmds internal generated function
...
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (31 commits)
lguest: add support for indirect ring entries
lguest: suppress notifications in example Launcher
lguest: try to batch interrupts on network receive
lguest: avoid sending interrupts to Guest when no activity occurs.
lguest: implement deferred interrupts in example Launcher
lguest: remove obsolete LHREQ_BREAK call
lguest: have example Launcher service all devices in separate threads
lguest: use eventfds for device notification
eventfd: export eventfd_signal and eventfd_fget for lguest
lguest: allow any process to send interrupts
lguest: PAE fixes
lguest: PAE support
lguest: Add support for kvm_hypercall4()
lguest: replace hypercall name LHCALL_SET_PMD with LHCALL_SET_PGD
lguest: use native_set_* macros, which properly handle 64-bit entries when PAE is activated
lguest: map switcher with executable page table entries
lguest: fix writev returning short on console output
lguest: clean up length-used value in example launcher
lguest: Segment selectors are 16-bit long. Fix lg_cpu.ss1 definition.
lguest: beyond ARRAY_SIZE of cpu->arch.gdt
...
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-virtio:
virtio: enhance id_matching for virtio drivers
virtio: fix id_matching for virtio drivers
virtio: handle short buffers in virtio_rng.
virtio_blk: add missing __dev{init,exit} markings
virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
virtio: teach virtio_has_feature() about transport features
virtio: expose features in sysfs
virtio_pci: optional MSI-X support
virtio_pci: split up vp_interrupt
virtio: find_vqs/del_vqs virtio operations
virtio: add names to virtqueue struct, mapping from devices to queues.
virtio: meet virtio spec by finalizing features before using device
virtio: fix obsolete documentation on probe function
Linus Torvalds [Fri, 12 Jun 2009 16:31:20 +0000 (09:31 -0700)]
Merge branch 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
CUSE: implement CUSE - Character device in Userspace
fuse: export symbols to be used by CUSE
fuse: update fuse_conn_init() and separate out fuse_conn_kill()
fuse: don't use inode in fuse_file_poll
fuse: don't use inode in fuse_do_ioctl() helper
fuse: don't use inode in fuse_sync_release()
fuse: create fuse_do_open() helper for CUSE
fuse: clean up args in fuse_finish_open() and fuse_release_fill()
fuse: don't use inode in helpers called by fuse_direct_io()
fuse: add members to struct fuse_file
fuse: prepare fuse_direct_io() for CUSE
fuse: clean up fuse_write_fill()
fuse: use struct path in release structure
fuse: misc cleanups
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-module-and-param:
module: cleanup FIXME comments about trimming exception table entries.
module: trim exception table on init free.
module: merge module_alloc() finally
uml module: fix uml build process due to this merge
x86 module: merge the rest functions with macros
x86 module: merge the same functions in module_32.c and module_64.c
uvesafb: improve parameter handling.
module_param: allow 'bool' module_params to be bool, not just int.
module_param: add __same_type convenience wrapper for __builtin_types_compatible_p
module_param: split perm field into flags and perm
module_param: invbool should take a 'bool', not an 'int'
cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
Linus Torvalds [Fri, 12 Jun 2009 16:29:42 +0000 (09:29 -0700)]
Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6
* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits)
ide: re-implement ide_pci_init_one() on top of ide_pci_init_two()
ide: unexport ide_find_dma_mode()
ide: fix PowerMac bootup oops
ide: skip probe if there are no devices on the port (v2)
sl82c105: add printk() logging facility
ide-tape: fix proc warning
ide: add IDE_DFLAG_NIEN_QUIRK device flag
ide: respect quirk_drives[] list on all controllers
hpt366: enable all quirks for devices on quirk_drives[] list
hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c
ide: remove superfluous SELECT_MASK() call from do_rw_taskfile()
ide: remove superfluous SELECT_MASK() call from ide_driveid_update()
icside: remove superfluous ->maskproc method
ide-tape: fix IDE_AFLAG_* atomic accesses
ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically
pdc202xx_old: kill resetproc() method
pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout
pdc202xx_old: use ide_dma_test_irq()
ide: preserve Host Protected Area by default (v2)
ide-gd: implement block device ->set_capacity method (v2)
...
Linus Torvalds [Fri, 12 Jun 2009 16:26:32 +0000 (09:26 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Provide _sdata in the vmlinux.lds.S file
x86: handle initrd that extends into unusable memory
Pekka Enberg [Fri, 12 Jun 2009 11:03:06 +0000 (14:03 +0300)]
slab,slub: don't enable interrupts during early boot
As explained by Benjamin Herrenschmidt:
Oh and btw, your patch alone doesn't fix powerpc, because it's missing
a whole bunch of GFP_KERNEL's in the arch code... You would have to
grep the entire kernel for things that check slab_is_available() and
even then you'll be missing some.
For example, slab_is_available() didn't always exist, and so in the
early days on powerpc, we used a mem_init_done global that is set form
mem_init() (not perfect but works in practice). And we still have code
using that to do the test.
Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
in early boot code to avoid enabling interrupts.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Mark McLoughlin [Mon, 11 May 2009 17:11:46 +0000 (18:11 +0100)]
lguest: add support for indirect ring entries
Support the VIRTIO_RING_F_INDIRECT_DESC feature.
This is a simple matter of changing the descriptor walking
code to operate on a struct vring_desc* and supplying it
with an indirect table if detected.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:12 +0000 (22:27 -0600)]
lguest: suppress notifications in example Launcher
The Guest only really needs to tell us about activity when we're going
to listen to the eventfd: normally, we don't want to know.
So if there are no available buffers, turn on notifications, re-check,
then wait for the Guest to notify us via the eventfd, then turn
notifications off again.
There's enough else going on that the differences are in the noise.
Rusty Russell [Sat, 13 Jun 2009 04:27:12 +0000 (22:27 -0600)]
lguest: try to batch interrupts on network receive
Rather than triggering an interrupt every time, we only trigger an
interrupt when there are no more incoming packets (or the recv queue
is full).
However, the overhead of doing the select to figure this out is
measurable: 1M pings goes from 98 to 104 seconds, and 1G Guest->Host
TCP goes from 3.69 to 3.94 seconds. It's close to the noise though.
I tested various timeouts, including reducing it as the number of
pending packets increased, timing a 1 gigabyte TCP send from Guest ->
Host and Host -> Guest (GSO disabled, to increase packet rate).
Rusty Russell [Sat, 13 Jun 2009 04:27:11 +0000 (22:27 -0600)]
lguest: avoid sending interrupts to Guest when no activity occurs.
If we track how many buffers we've used, we can tell whether we really
need to interrupt the Guest. This happens as a side effect of
spurious notifications.
Spurious notifications happen because it can take a while before the
Host thread wakes up and sets the VRING_USED_F_NO_NOTIFY flag, and
meanwhile the Guest can more notifications.
A real fix would be to use wake counts, rather than a suppression
flag, but the practical difference is generally in the noise: the
interrupt is usually coalesced into a pending one anyway so we just
save a system call which isn't clearly measurable.
Rusty Russell [Sat, 13 Jun 2009 04:27:11 +0000 (22:27 -0600)]
lguest: implement deferred interrupts in example Launcher
Rather than sending an interrupt on every buffer, we only send an interrupt
when we're about to wait for the Guest to send us a new one. The console
input and network input still send interrupts manually, but the block device,
network and console output queues can simply rely on this logic to send
interrupts to the Guest at the right time.
The patch is cluttered by moving trigger_irq() higher in the code.
In practice, two factors make this optimization less interesting:
(1) we often only get one input at a time, even for networking,
(2) triggering an interrupt rapidly tends to get coalesced anyway.
Rusty Russell [Sat, 13 Jun 2009 04:27:10 +0000 (22:27 -0600)]
lguest: remove obsolete LHREQ_BREAK call
We no longer need an efficient mechanism to force the Guest back into
host userspace, as each device is serviced without bothering the main
Guest process (aka. the Launcher).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:10 +0000 (22:27 -0600)]
lguest: have example Launcher service all devices in separate threads
Currently lguest has three threads: the main Launcher thread, a Waker
thread, and a thread for the block device (because synchronous block
was simply too painful to bear).
The Waker selects() on all the input file descriptors (eg. stdin, net
devices, pipe to the block thread) and when one becomes readable it calls
into the kernel to kick the Launcher thread out into userspace, which
repeats the poll, services the device(s), and then tells the kernel to
release the Waker before re-entering the kernel to run the Guest.
Also, to make a slightly-decent network transmit routine, the Launcher
would suppress further network interrupts while it set a timer: that
signal handler would write to a pipe, which would rouse the Waker
which would prod the Launcher out of the kernel to check the network
device again.
Now we can convert all our virtqueues to separate threads: each one has
a separate eventfd for when the Guest pokes the device, and can trigger
interrupts in the Guest directly.
The linecount shows how much this simplifies, but to really bring it
home, here's an strace analysis of single Guest->Host ping before:
* Guest sends packet, notifies xmit vq, return control to Launcher
* Launcher clears notification flag on xmit ring
* Launcher writes packet to TUN device
writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\366\r\224`\2058\272m\224vf\274\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Launcher sets up interrupt for Guest (xmit ring is empty)
write(10, "\2\0\0\0\3\0\0\0", 8) = 0
* Launcher sets up timer for interrupt mitigation
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 505}}, NULL) = 0
* Launcher re-runs guest
pread64(10, 0xbfa5f4d4, 4, 0) ...
* Waker notices reply packet in tun device (it was in select)
select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [4])
* Waker kicks Launcher out of guest:
pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher returns from running guest:
... = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks at input fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [4], left {0, 0})
* Launcher reads pong from tun device:
readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"\272m\224vf\274\366\r\224`\2058\10\0E\0\0T\364\26\0\0@"..., 1518}], 2) = 108
* Launcher injects guest notification:
write(10, "\2\0\0\0\2\0\0\0", 8) = 0
* Launcher rechecks fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher reruns Guest:
pread64(10, 0xbfa5f4d4, 4, 0) = ? ERESTARTSYS (To be restarted)
* Signal comes in, uses pipe to wake up Launcher:
--- SIGALRM (Alarm clock) @ 0 (0) ---
write(8, "\0", 1) = 1
sigreturn() = ? (mask now [])
* Waker sees write on pipe:
select(12, [0 3 4 6 11], NULL, NULL, NULL) = 1 (in [6])
* Waker kicks Launcher out of Guest:
pwrite64(10, "\3\0\0\0\1\0\0\0", 8, 0) = 0
* Launcher exits from kernel:
pread64(10, 0xbfa5f4d4, 4, 0) = -1 EAGAIN (Resource temporarily unavailable)
* Launcher looks to see what fd woke it:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 1 (in [6], left {0, 0})
* Launcher reads timeout fd, sets notification flag on xmit ring
read(6, "\0", 32) = 1
* Launcher rechecks fds:
select(7, [0 3 4 6], NULL, NULL, {0, 0}) = 0 (Timeout)
* Launcher clears Waker:
pwrite64(10, "\3\0\0\0\0\0\0\0", 8, 0) = 0
* Launcher resumes Guest:
pread64(10, "\0p\0\4", 4, 0) ....
strace analysis of single Guest->Host ping after:
* Guest sends packet, notifies xmit vq, creates event on eventfd.
* Network xmit thread wakes from read on eventfd:
read(7, "\1\0\0\0\0\0\0\0", 8) = 8
* Network xmit thread writes packet to TUN device
writev(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"J\217\232FI\37j\27\375\276\0\304\10\0E\0\0T\0\0@\0@\1\265"..., 98}], 2) = 108
* Network recv thread wakes up from read on tunfd:
readv(4, [{"\0\0\0\0\0\0\0\0\0\0", 10}, {"j\27\375\276\0\304J\217\232FI\37\10\0E\0\0TiO\0\0@\1\214"..., 1518}], 2) = 108
* Network recv thread sets up interrupt for the Guest
write(6, "\2\0\0\0\2\0\0\0", 8) = 0
* Network recv thread goes back to reading tunfd
13:39:42.460285 readv(4, <unfinished ...>
* Network xmit thread sets up interrupt for Guest (xmit ring is empty)
write(6, "\2\0\0\0\3\0\0\0", 8) = 0
* Network xmit thread goes back to reading from eventfd
read(7, <unfinished ...>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:09 +0000 (22:27 -0600)]
lguest: use eventfds for device notification
Currently, when a Guest wants to perform I/O it calls LHCALL_NOTIFY with
an address: the main Launcher process returns with this address, and figures
out what device to run.
A far nicer model is to let processes bind an eventfd to an address: if we
find one, we simply signal the eventfd.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Davide Libenzi <davidel@xmailserver.org>
Rusty Russell [Sat, 13 Jun 2009 04:27:08 +0000 (22:27 -0600)]
lguest: allow any process to send interrupts
We currently only allow the Launcher process to send interrupts, but it
as we already send interrupts from the hrtimer, it's a simple matter of
extracting that code into a common set_interrupt routine.
As we switch to a thread per virtqueue, this avoids a bottleneck through the
main Launcher process.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:08 +0000 (22:27 -0600)]
lguest: PAE fixes
1) j wasn't initialized in setup_pagetables, so they weren't set up for me
causing immediate guest crashes.
2) gpte_addr should not re-read the pmd from the Guest. Especially
not BUG_ON() based on the value. If we ever supported SMP guests,
they could trigger that. And the Launcher could also trigger it
(tho currently root-only).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:04 +0000 (22:27 -0600)]
lguest: clean up length-used value in example launcher
The "len" field in the used ring for virtio indicates the number of
bytes *written* to the buffer. This means the guest doesn't have to
zero the buffers in advance as it always knows the used length.
Erroneously, the console and network example code puts the length
*read* into that field. The guest ignores it, but it's wrong.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:03 +0000 (22:27 -0600)]
lguest: optimize by coding restore_flags and irq_enable in assembler.
The downside of the last patch which made restore_flags and irq_enable
check interrupts is that they are now too big to be patched directly
into the callsites, so the C versions are always used.
But the C versions go via PV_CALLEE_SAVE_REGS_THUNK which saves all
the registers. In fact, we don't need any registers in the fast path,
so we can do better than this if we actually code them in assembler.
The results are in the noise, but since it's about the same amount of
code, it's worth applying.
Rusty Russell [Sat, 13 Jun 2009 04:27:02 +0000 (22:27 -0600)]
lguest: improve interrupt handling, speed up stream networking
lguest never checked for pending interrupts when enabling interrupts, and
things still worked. However, it makes a significant difference to TCP
performance, so it's time we fixed it by introducing a pending_irq flag
and checking it on irq_restore and irq_enable.
These two routines are now too big to patch into the 8/10 bytes
patch space, so we drop that code.
Note: The high latency on interrupt delivery had a very curious
effect: once everything else was optimized, networking without GSO was
faster than networking with GSO, since more interrupts were sent and
hence a greater chance of one getting through to the Guest!
Note2: (Almost) Closing the same loophole for iret doesn't have any
measurable effect, so I'm leaving that patch for the moment.
Rusty Russell [Sat, 13 Jun 2009 04:27:02 +0000 (22:27 -0600)]
lguest: fix race in halt code
When the Guest does the LHCALL_HALT hypercall, we go to sleep, expecting
that a timer or the Waker will wake_up_process() us.
But we do it in a stupid way, leaving a classic missing wakeup race.
So split maybe_do_interrupt() into interrupt_pending() and
try_deliver_interrupt(), and check maybe_do_interrupt() and the
"break_out" flag before calling schedule.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:01 +0000 (22:27 -0600)]
lguest: remove invalid interrupt forcing logic.
20887611523e749d99cc7d64ff6c97d27529fbae (lguest: notify on empty) introduced
lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
interrupts all the time.
Because we always process one buffer at a time, the inflight count is always 0
when call trigger_irq and so we always ignore VRING_AVAIL_F_NO_INTERRUPT from
the Guest.
It should be looking to see if there are more buffers in the Guest's queue:
if it's empty, then we force an interrupt.
This makes little difference, since we usually have an empty queue; but
that's the subject of another patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:27:01 +0000 (22:27 -0600)]
lguest: fix lguest wake on guest clock tick, or fd activity
The Launcher could be inside the Guest on another CPU; wake_up_process
will do nothing because it is "running". kick_process will knock it
back into our kernel in this case, otherwise we'll miss it until the
next guest exit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:26:58 +0000 (22:26 -0600)]
lguest: be paranoid about guest playing with device descriptors.
We can't trust the values in the device descriptor table once the
guest has booted, so keep local copies. They could set them to
strange values then cause us to segv (they're 8 bit values, so they
can't make our pointers go too wild).
This becomes more important with the following patches which read them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch allows a virtio driver to use VIRTIO_DEV_ANY_ID for the
device id. This will be used by a test module that can be bound to
any virtio device.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This bug never appeared, since all current virtio drivers use
VIRTIO_DEV_ANY_ID for the vendor field. If a real vendor would be used,
the check in virtio_id_match is wrong - it returns 0 if
id->vendor == dev->id.vendor.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 04:16:39 +0000 (22:16 -0600)]
virtio: handle short buffers in virtio_rng.
If the device fills less than 4 bytes of our random buffer, we'll
BUG_ON. It's nicer to handle the case where it partially fills the
buffer (the protocol doesn't explicitly bad that).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mike Frysinger [Mon, 18 May 2009 07:39:09 +0000 (03:39 -0400)]
virtio_blk: add missing __dev{init,exit} markings
The remove member of the virtio_driver structure uses __devexit_p(), so
the remove function itself should be marked with __devexit. And where
there be __devexit on the remove, so is there __devinit on the probe.
Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mark McLoughlin [Mon, 11 May 2009 17:11:45 +0000 (18:11 +0100)]
virtio: indirect ring entries (VIRTIO_RING_F_INDIRECT_DESC)
Add a new feature flag for indirect ring entries. These are ring
entries which point to a table of buffer descriptors.
The idea here is to increase the ring capacity by allowing a larger
effective ring size whereby the ring size dictates the number of
requests that may be outstanding, rather than the size of those
requests.
This should be most effective in the case of block I/O where we can
potentially benefit by concurrently dispatching a large number of
large requests. Even in the simple case of single segment block
requests, this results in a threefold increase in ring capacity.
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This implements optional MSI-X support in virtio_pci.
MSI-X is used whenever the host supports at least 2 MSI-X
vectors: 1 for configuration changes and 1 for virtqueues.
Per-virtqueue vectors are allocated if enough vectors
available.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ whitespace, style)
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
Rusty Russell [Sat, 13 Jun 2009 04:16:35 +0000 (22:16 -0600)]
virtio: meet virtio spec by finalizing features before using device
Virtio devices are supposed to negotiate features before they start using
the device, but the current code doesn't do this. This is because the
driver's probe() function invariably has to add buffers to a virtqueue,
or probe the disk (virtio_blk).
This currently doesn't matter since no existing backend is strict about
the feature negotiation. But it's possible to imagine a future feature
which completely changes how a device operates: in this case, we'd need
to acknowledge it before using the device.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 03:47:03 +0000 (21:47 -0600)]
module: trim exception table on init free.
It's theoretically possible that there are exception table entries
which point into the (freed) init text of modules. These could cause
future problems if other modules get loaded into that memory and cause
an exception as we'd see the wrong fixup. The only case I know of is
kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).
Amerigo fixed this long-standing FIXME in the x86 version, but this
patch is more general.
This implements trim_init_extable(); most archs are simple since they
use the standard lib/extable.c sort code. Alpha and IA64 use relative
addresses in their fixups, so thier trimming is a slight variation.
Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
yet it defines its own sort_extable() which overrides the one in lib.
It doesn't sort, so we have to mark deleted entries instead of
actually trimming them.
Inspired-by: Amerigo Wang <amwang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: linux-alpha@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-ia64@vger.kernel.org
Rusty Russell [Sat, 13 Jun 2009 03:46:58 +0000 (21:46 -0600)]
uvesafb: improve parameter handling.
1) Now module_param(..., invbool, ...) requires a bool, and similarly
module_param(..., bool, ...) allows it, change pmi_setpal to a bool.
2) #define param_get_scroll to NULL, since it can never be called (perm
argument to module_param_named is 0).
3) Return -EINVAL from param_set_scroll if the value is bad, so it's
reported.
Note that I don't think the old fb_get_options() is required for new
drivers: the parameters automatically work as uvesafb.XXX=... anyway.
Acked-by: MichaĆ Januszewski <spock@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Sat, 13 Jun 2009 03:46:50 +0000 (21:46 -0600)]
cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
Russell explains the __module_get():
> cyber2000fb.c does it in its module initialization function
> to prevent the module (when built for Shark) from being unloaded. It
> does this because it's from the days of 2.2 kernels and no one bothered
> writing the module unload support for Shark.
Since 2.4, the correct answer has been to not define an unload fn.
Cc: Russell King <rmk+lkml@arm.linux.org.uk> Cc: alex@shark-linux.de Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now, SLAB is configured in very early stage and it can be used in
init routine now.
But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
initialization breaks the allocation, now.
(Works well in SPARSEMEM case...it supports MEMORY_HOTPLUG and
size of page_cgroup is in reasonable size (< 1 << MAX_ORDER.)
This patch revive FLATMEM+memory cgroup by using alloc_bootmem.
In future,
We stop to support FLATMEM (if no users) or rewrite codes for flatmem
completely.But this will adds more messy codes and overheads.
Reported-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
This patch adds the ability to trace various aspects of the GFS2
filesystem. The trace points are divided into three groups,
glocks, logging and bmap. These points have been chosen because
they allow inspection of the major internal functions of GFS2
and they are also generic enough that they are unlikely to need
any major changes as the filesystem evolves.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Catalin Marinas [Mon, 11 May 2009 12:22:00 +0000 (13:22 +0100)]
x86: Provide _sdata in the vmlinux.lds.S file
_sdata is a common symbol defined by many architectures and made
available to the kernel via asm-generic/sections.h. Kmemleak uses this
symbol when scanning the data sections.