]> git.karo-electronics.de Git - linux-beck.git/log
linux-beck.git
13 years agodrbd: we should write meta data updates with FLUSH FUA
Lars Ellenberg [Tue, 28 Jun 2011 11:22:48 +0000 (13:22 +0200)]
drbd: we should write meta data updates with FLUSH FUA

We used to write these with BIO_RW_BARRIER aka REQ_HARDBARRIER (unless
disabled in the configuration). The correct semantic now would be to
write with FLUSH/FUA.
For example, with activity log transactions, FUA alone is not enough, we
need the corresponding bitmap update (and all related application
updates) on stable storage as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix limit define, we support 1 PiByte now
Lars Ellenberg [Tue, 28 Jun 2011 07:48:48 +0000 (09:48 +0200)]
drbd: fix limit define, we support 1 PiByte now

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: when receive times out on meta socket, also check last receive time on data...
Lars Ellenberg [Mon, 20 Jun 2011 12:44:45 +0000 (14:44 +0200)]
drbd: when receive times out on meta socket, also check last receive time on data socket

If we have an asymetrically congested network, we may send P_PING,
but due to congestion, the corresponding P_PING_ACK would time out,
and we would drop a (congested, but otherwise) healthy connection
("PingAck did not arrive in time.")

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: account bitmap IO during resync as resync-(related-)-io
Lars Ellenberg [Tue, 14 Jun 2011 12:18:23 +0000 (14:18 +0200)]
drbd: account bitmap IO during resync as resync-(related-)-io

If we have a good resync rate, we will frequently update the on-disk
bitmap, which, if not accounted for as resync io, may let an otherwise
idle device appear to be "busy", and cause us to throttle resync.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: don't cond_resched_lock with IRQs disabled
Lars Ellenberg [Mon, 6 Jun 2011 09:31:42 +0000 (11:31 +0200)]
drbd: don't cond_resched_lock with IRQs disabled

The last commit, drbd: add missing spinlock to bitmap receive,
introduced a cond_resched_lock(), where the lock in question is taken
with irqs disabled.

As we must not schedule with IRQs disabled,
and cond_resched_lock_irq() does not exist, yet,
we re-aquire the spin_lock_irq() for each bitmap page processed in turn.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: add missing spinlock to bitmap receive
Lars Ellenberg [Fri, 3 Jun 2011 19:18:13 +0000 (21:18 +0200)]
drbd: add missing spinlock to bitmap receive

During bitmap exchange, when using the RLE bitmap compression scheme,
we have a code path that can set the whole bitmap at once.

To avoid holding spin_lock_irq() for too long, we used to lock out other
bitmap modifications during bitmap exchange by other means, and then,
knowing we have exclusive access to the bitmap, modify it without
the spinlock, and with IRQs enabled.

Since we now allow local IO to continue, potentially setting additional
bits during the bitmap receive phase, this is no longer true, and we get
uncoordinated updates of bitmap members, causing bm_set to no longer
accurately reflect the total number of set bits.

To actually see this, you'd need to have a large bitmap, use RLE bitmap
compression, and have busy IO during sync handshake and bitmap exchange.

Fix this by taking the spin_lock_irq() in this code path as well, but
calling cond_resched_lock() after each page worth of bits processed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Use the correct max_bio_size when creating resync requests
Philipp Reisner [Wed, 25 May 2011 09:14:35 +0000 (11:14 +0200)]
drbd: Use the correct max_bio_size when creating resync requests

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agoloop: handle on-demand devices correctly
Namhyung Kim [Tue, 24 May 2011 14:48:55 +0000 (16:48 +0200)]
loop: handle on-demand devices correctly

When finding or allocating a loop device, loop_probe() did not take
partition numbers into account so that it can result to a different
device. Consider following example:

$ sudo modprobe loop max_part=15
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,   0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,  16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7,  32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,  48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,  64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,  80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,  96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7, 112 2011-05-24 22:16 /dev/loop7
$ sudo mknod /dev/loop8 b 7 128
$ sudo losetup /dev/loop8 ~/temp/disk-with-3-parts.img
$ sudo losetup -a
/dev/loop128: [0805]:278201 (/home/namhyung/temp/disk-with-3-parts.img)
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7,    0 2011-05-24 22:16 /dev/loop0
brw-rw---- 1 root disk 7,   16 2011-05-24 22:16 /dev/loop1
brw-rw---- 1 root disk 7, 2048 2011-05-24 22:18 /dev/loop128
brw-rw---- 1 root disk 7, 2049 2011-05-24 22:18 /dev/loop128p1
brw-rw---- 1 root disk 7, 2050 2011-05-24 22:18 /dev/loop128p2
brw-rw---- 1 root disk 7, 2051 2011-05-24 22:18 /dev/loop128p3
brw-rw---- 1 root disk 7,   32 2011-05-24 22:16 /dev/loop2
brw-rw---- 1 root disk 7,   48 2011-05-24 22:16 /dev/loop3
brw-rw---- 1 root disk 7,   64 2011-05-24 22:16 /dev/loop4
brw-rw---- 1 root disk 7,   80 2011-05-24 22:16 /dev/loop5
brw-rw---- 1 root disk 7,   96 2011-05-24 22:16 /dev/loop6
brw-rw---- 1 root disk 7,  112 2011-05-24 22:16 /dev/loop7
brw-r--r-- 1 root root 7,  128 2011-05-24 22:17 /dev/loop8

After this patch, /dev/loop8 - instead of /dev/loop128 - was
accessed correctly.

In addition, 'range' passed to blk_register_region() should
include all range of dev_t that LOOP_MAJOR can address. It does
not need to be limited by partition numbers unless 'max_loop'
param was specified.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoloop: limit 'max_part' module param to DISK_MAX_PARTS
Namhyung Kim [Tue, 24 May 2011 14:48:54 +0000 (16:48 +0200)]
loop: limit 'max_part' module param to DISK_MAX_PARTS

The 'max_part' parameter controls the number of maximum partition
a loop block device can have. However if a user specifies very
large value it would exceed the limitation of device minor number
and can cause a kernel panic (or, at least, produce invalid
device nodes in some cases).

On my desktop system, following command kills the kernel. On qemu,
it triggers similar oops but the kernel was alive:

$ sudo modprobe loop max_part0000
 ------------[ cut here ]------------
 kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
 invalid opcode: 0000 [#1] SMP
 last sysfs file:
 CPU 0
 Modules linked in: loop(+)

 Pid: 43, comm: insmod Tainted: G        W   2.6.39-qemu+ #155 Bochs Bochs
 RIP: 0010:[<ffffffff8113ce61>]  [<ffffffff8113ce61>] internal_create_group=
+0x2a/0x170
 RSP: 0018:ffff880007b3fde8  EFLAGS: 00000246
 RAX: 00000000ffffffef RBX: ffff880007b3d878 RCX: 00000000000007b4
 RDX: ffffffff8152da50 RSI: 0000000000000000 RDI: ffff880007b3d878
 RBP: ffff880007b3fe38 R08: ffff880007b3fde8 R09: 0000000000000000
 R10: ffff88000783b4a8 R11: ffff880007b3d878 R12: ffffffff8152da50
 R13: ffff880007b3d868 R14: 0000000000000000 R15: ffff880007b3d800
 FS:  0000000002137880(0063) GS:ffff880007c00000(0000) knlGS:00000000000000=
00
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000422680 CR3: 0000000007b50000 CR4: 00000000000006b0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
 Process insmod (pid: 43, threadinfo ffff880007b3e000, task ffff880007afb9c=
0)
 Stack:
  ffff880007b3fe58 ffffffff811e66dd ffff880007b3fe58 ffffffff811e570b
  0000000000000010 ffff880007b3d800 ffff880007a7b390 ffff880007b3d868
  0000000000400920 ffff880007b3d800 ffff880007b3fe48 ffffffff8113cfc8
 Call Trace:
  [<ffffffff811e66dd>] ? device_add+0x4bc/0x5af
  [<ffffffff811e570b>] ? dev_set_name+0x3c/0x3e
  [<ffffffff8113cfc8>] sysfs_create_group+0xe/0x12
  [<ffffffff810b420e>] blk_trace_init_sysfs+0x14/0x16
  [<ffffffff8116a090>] blk_register_queue+0x47/0xf7
  [<ffffffff8116f527>] add_disk+0xdf/0x290
  [<ffffffffa00060eb>] loop_init+0xeb/0x1b8 [loop]
  [<ffffffffa0006000>] ? 0xffffffffa0005fff
  [<ffffffff8100020a>] do_one_initcall+0x7a/0x12e
  [<ffffffff81096804>] sys_init_module+0x9c/0x1e0
  [<ffffffff813329bb>] system_call_fastpath+0x16/0x1b
 Code: c3 55 48 89 e5 41 57 41 56 41 89 f6 41 55 41 54 49 89 d4 53 48 89 fb=
 48 83 ec 28 48 85 ff 74 0b 85 f6 75 0b 48 83 7f 30 00 75 14 <0f> 0b eb fe =
48 83 7f 30 00 b9 ea ff ff ff 0f 84 18 01 00 00 49
 RIP  [<ffffffff8113ce61>] internal_create_group+0x2a/0x170
  RSP <ffff880007b3fde8>
 ---[ end trace a123eb592043acad ]---

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Laurent Vivier <Laurent.Vivier@bull.net>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agodrbd: fix warning
Andrew Morton [Mon, 23 May 2011 22:29:32 +0000 (15:29 -0700)]
drbd: fix warning

In file included from drivers/block/drbd/drbd_main.c:54:                        drivers/block/drbd/drbd_int.h:1190: warning: parameter has incomplete type

Forward declarations of enums do not work.

Fix it unpleasantly by moving the prototype.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Lars Ellenberg <drbd-dev@lists.linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
13 years agodrbd: fix warning
Philipp Reisner [Tue, 24 May 2011 08:27:38 +0000 (10:27 +0200)]
drbd: fix warning

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
13 years agodrbd: Fix spelling
Bart Van Assche [Sat, 21 May 2011 16:32:29 +0000 (18:32 +0200)]
drbd: Fix spelling

Found these with the help of ispell -l.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
13 years agodrbd: fix schedule in atomic
Lars Ellenberg [Mon, 2 May 2011 09:51:31 +0000 (11:51 +0200)]
drbd: fix schedule in atomic

An administrative detach used to request a state change directly to D_DISKLESS,
first suspending IO to avoid the last put_ldev() occuring from an endio handler,
potentially in irq context.

This is not enough on the receiving side (typically secondary), we may miss
some peer_req on the way to local disk, which then may do the last put_ldev()
from their drbd_peer_request_endio().

This patch makes the detach always go through the intermediate D_FAILED state.
We may consider to rename it D_DETACHING.

Alternative approach would be to create yet an other work item to be scheduled
on the worker, do the destructor work from there, and get the timing right.

manually picked commit 564040f from the drbd 8.4 branch.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Take a more conservative approach when deciding max_bio_size
Philipp Reisner [Fri, 20 May 2011 14:39:13 +0000 (16:39 +0200)]
drbd: Take a more conservative approach when deciding max_bio_size

The old (optimistic) implementation could shrink the bio size
on an primary device.

Shrinking the bio size on a primary device is bad. Since there
we might get BIOs with the old (bigger) size shortly after
we published the new size.

The new implementation is more conservative, and eventually
increases the max_bio_size on a primary device (which is valid).
It does so, when it knows the local limit AND the remote limit.

 We cache the last seen max_bio_size of the peer in the meta
 data, and rely on that, to make the operation of single
 nodes more efficient.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fixed state transitions after async outdate-peer-handler returned
Philipp Reisner [Tue, 17 May 2011 12:19:41 +0000 (14:19 +0200)]
drbd: Fixed state transitions after async outdate-peer-handler returned

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Disallow the peer_disk_state to be D_OUTDATED while connected
Philipp Reisner [Tue, 17 May 2011 12:48:55 +0000 (14:48 +0200)]
drbd: Disallow the peer_disk_state to be D_OUTDATED while connected

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fix for the connection problems on high latency links
Philipp Reisner [Fri, 13 May 2011 10:03:55 +0000 (12:03 +0200)]
drbd: Fix for the connection problems on high latency links

It seems that the real cause of all the issues where that
we did not noticed in drbd_try_connect() when the other
guy closes one socket if the round trip time gets higher
than 100ms. There were that 100ms hard coded!

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix potential activity log refcount imbalance in error path
Lars Ellenberg [Mon, 16 May 2011 13:31:45 +0000 (15:31 +0200)]
drbd: fix potential activity log refcount imbalance in error path

It is no longer sufficient to trigger on local WRITE,
we need to check on (rq_state & RQ_IN_ACT_LOG)
before calling drbd_al_complete_io also in the error path.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Only downgrade the disk state in case of disk failures
Philipp Reisner [Mon, 14 Mar 2011 10:54:47 +0000 (11:54 +0100)]
drbd: Only downgrade the disk state in case of disk failures

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix disconnect/reconnect loop, if ping-timeout == ping-int
Lars Ellenberg [Wed, 9 Mar 2011 21:44:55 +0000 (22:44 +0100)]
drbd: fix disconnect/reconnect loop, if ping-timeout == ping-int

If there is no replication traffic within the idle timeout
(ping-int seconds), DRBD will send a P_PING,
and adjust the timeout to ping-timeout.

If there is no P_PING_ACK received within this ping-timeout,
DRBD finally drops the connection, and tries to re-establish it.

To decide which timeout was active, we compared the current timeout
with the ping-timeout, and dropped the connection, if that was the case.

By default, ping-int is 10 seconds, ping-timeout is 500 ms.

Unfortunately, if you configure ping-timeout to be the same as ping-int,
expiry of the idle-timeout had been mistaken for a missing ping ack,
and caused an immediate reconnection attempt.

Fix:
Allow both timeouts to be equal, use a local variable
to store which timeout is active.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: fix potential distributed deadlock
Lars Ellenberg [Tue, 8 Mar 2011 16:11:40 +0000 (17:11 +0100)]
drbd: fix potential distributed deadlock

We limit ourselves to a configurable maximum number of pages used as
temporary bio pages.

If the configured "max_buffers" is not big enough to match the bandwidth
of the respective deployment, a distributed deadlock could be triggered
by e.g. fast online verify and heavy application IO.

TCP connections would block on congestion, because both receivers
would wait on pages to become available.

Fortunately the respective senders in this case would be able to give
back some pages already. So do that.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agolru_cache.h: fix comments referring to ts_ instead of lc_
Lars Ellenberg [Thu, 27 Jan 2011 14:24:58 +0000 (15:24 +0100)]
lru_cache.h: fix comments referring to ts_ instead of lc_

For some time we contemplated calling the "struct lru_cache"
a "struct tracked_set", and some comments kept the ts_ prefix.

Fix those to match the member field names.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agodrbd: Fix for application IO with the on-io-error=pass-on policy
Philipp Reisner [Wed, 2 Mar 2011 23:21:30 +0000 (00:21 +0100)]
drbd: Fix for application IO with the on-io-error=pass-on policy

In case a write failes on the local disk, go into D_INCONSISTENT
disk state. That causes future reads of that block to be shipped
to the peer.

Read retry remote was already in place.

Actually the documentation needs to get fixed now. Since the
application is still shielded from the error. (as long as we have
only a single disk failing) The difference to detach is that
we keep the disk. And therefore might keep all the other, still
working sectors up to date.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
13 years agoMerge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git...
Jens Axboe [Thu, 19 May 2011 07:46:00 +0000 (09:46 +0200)]
Merge branches 'for-jens/xen-backend-fixes' and 'for-jens/xen-blkback-v3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-2.6.40/drivers

13 years agoxen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:54:10 +0000 (11:54 -0400)]
xen/p2m: Add EXPORT_SYMBOL_GPL to the M2P override functions.

If the backends, which use these two functions, are compiled as
a module we need these two functions to be exported.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.
Konrad Rzeszutek Wilk [Mon, 28 Feb 2011 22:58:48 +0000 (17:58 -0500)]
xen/p2m/m2p/gnttab: Support GNTMAP_host_map in the M2P override.

We only supported the M2P (and P2M) override only for the
GNTMAP_contains_pte type mappings. Meaning that we grants
operations would "contain the machine address of the PTE to update"
If the flag is unset, then the grant operation is
"contains a host virtual address". The latter case means that
the Hypervisor takes care of updating our page table
(specifically the PTE entry) with the guest's MFN. As such we should
not try to do anything with the PTE. Previous to this patch
we would try to clear the PTE which resulted in Xen hypervisor
being upset with us:

(XEN) mm.c:1066:d0 Attempt to implicitly unmap a granted PTE c0100000ccc59067
(XEN) domain_crash called from mm.c:1067
(XEN) Domain 0 (vcpu#0) crashed on cpu#3:
(XEN) ----[ Xen-4.0-110228  x86_64  debug=y  Not tainted ]----

and crashing us.

This patch allows us to inhibit the PTE clearing in the PV guest
if the GNTMAP_contains_pte is not set.

On the m2p_remove_override path we provide the same parameter.

Sadly in the grant-table driver we do not have a mechanism to
tell m2p_remove_override whether to clear the PTE or not. Since
the grant-table driver is used by user-space, we can safely assume
that it operates only on PTE's. Hence the implementation for
it to work on !GNTMAP_contains_pte returns -EOPNOTSUPP. In the future
we can implement the support for this. It will require some extra
accounting structure to keep track of the page[i], and the flag.

[v1: Added documentation details, made it return -EOPNOTSUPP instead
 of trying to do a half-way implementation]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: don't fail empty barrier requests
Jan Beulich [Tue, 17 May 2011 10:07:05 +0000 (11:07 +0100)]
xen/blkback: don't fail empty barrier requests

The sector number on empty barrier requests may (will?) be -1, which,
given that it's being treated as unsigned 64-bit quantity, will almost
always exceed the actual (virtual) disk's size.

Inspired by Konrad's "When writting barriers set the sector number to
zero...".

While at it also add overflow checking to the math in vbd_translate().

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_...
Laszlo Ersek [Fri, 13 May 2011 13:45:40 +0000 (09:45 -0400)]
xen/blkback: fix xenbus_transaction_start() hang caused by double xenbus_transaction_end()

vbd_resize() up_read()'s xs_state.suspend_mutex twice in a row via double
xenbus_transaction_end() calls. The next down_read() in
xenbus_transaction_start() (at eg. the next resize attempt) hangs.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=618317

Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Align the tabs on the structure.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 22:02:28 +0000 (18:02 -0400)]
xen/blkback: Align the tabs on the structure.

The recent changes caused this field of the structure to be offset a bit.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: if log_stats is enabled print out the data.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 21:23:30 +0000 (17:23 -0400)]
xen/blkback: if log_stats is enabled print out the data.

And not depend on the driver being built with -DDEBUG flag.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Add the prefix XEN in the common.h.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:58:21 +0000 (16:58 -0400)]
xen/blkback: Add the prefix XEN in the common.h.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Prefix 'vbd' with 'xen' in structs and functions.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:53:56 +0000 (16:53 -0400)]
xen/blkback: Prefix 'vbd' with 'xen' in structs and functions.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Change structure name blkif_st to xen_blkif.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:47:48 +0000 (16:47 -0400)]
xen/blkback: Change structure name blkif_st to xen_blkif.

No need for that '_st' and xen_blkif is more apt.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Remove the unused typedefs.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:37:04 +0000 (16:37 -0400)]
xen/blkback: Remove the unused typedefs.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:31:51 +0000 (16:31 -0400)]
xen/blkback: Move include/xen/blkif.h into drivers/block/xen-blkback/common.h

Not point of the blkif.h file. It is not used by the frontend.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Fixing some more of the cleanpatch.pl warnings.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:23:06 +0000 (16:23 -0400)]
xen/blkback: Fixing some more of the cleanpatch.pl warnings.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Checkpatch.pl recommend against multiple assigments.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:19:23 +0000 (16:19 -0400)]
xen/blkback: Checkpatch.pl recommend against multiple assigments.

CHECK: multiple assignments should be avoided

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Fix checkpatch.pl warnings about more than 80 lines.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:14:15 +0000 (16:14 -0400)]
xen/blkback: Fix checkpatch.pl warnings about more than 80 lines.

Break up the macro usage.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Flesh out the description in the Kconfig.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:10:55 +0000 (16:10 -0400)]
xen/blkback: Flesh out the description in the Kconfig.

with more details.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Fix spelling mistakes.
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:26:59 +0000 (16:26 -0400)]
xen/blkback: Fix spelling mistakes.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir.
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:23:39 +0000 (16:23 -0400)]
xen/blkback: Move blkif_get_x86_[32|64]_req to common.h in block/xen-blkback dir.

From the blkif.h header, which was exposed to the frontend.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Removing the debug_lvl option.
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:21:08 +0000 (16:21 -0400)]
xen/blkback: Removing the debug_lvl option.

It is not really used for anything.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Use the DRV_PFX in the pr_.. macros.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:43:12 +0000 (16:43 -0400)]
xen/blkback: Use the DRV_PFX in the pr_.. macros.

To make it easier to read.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Make the DPRINTK uniform.
Konrad Rzeszutek Wilk [Wed, 11 May 2011 20:15:24 +0000 (16:15 -0400)]
xen/blkback: Make the DPRINTK uniform.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Change printk/DPRINTK to pr_.. type variant.
Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:42:31 +0000 (16:42 -0400)]
xen/blkback: Change printk/DPRINTK to pr_.. type variant.

And also make them uniform and prefix the message with 'xen-blkback'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support.
Konrad Rzeszutek Wilk [Tue, 3 May 2011 16:01:11 +0000 (12:01 -0400)]
xen-blkfront: Introduce BLKIF_OP_FLUSH_DISKCACHE support.

If the backend supports the 'feature-flush-cache' mode, use that
instead of the 'feature-barrier' support.

Currently there are three backends that support the 'feature-flush-cache'
mode: NetBSD, Solaris and Linux kernel. The 'flush' option is much
light-weight version than the 'barrier' support so lets try to use as
there are no filesystems in the kernel that use full barriers anymore.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.
Konrad Rzeszutek Wilk [Thu, 5 May 2011 16:41:03 +0000 (12:41 -0400)]
xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.

The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because the frontend (nor the backend) supported this interface.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkfront: fix data size for xenbus_gather in blkfront_connect
Marek Marczykowski [Tue, 3 May 2011 16:04:52 +0000 (12:04 -0400)]
xen-blkfront: fix data size for xenbus_gather in blkfront_connect

barrier variable is int, not long. This overflow caused another variable
override: "err" (in PV code) and "binfo" (in xenlinux code -
drivers/xen/blkfront/blkfront.c). The later caused incorrect device
flags (RO/removable etc).

Signed-off-by: Marek Marczykowski <marmarek@mimuw.edu.pl>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
[v1: Changed title]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Fixed up comments and converted spaces to tabs.
Konrad Rzeszutek Wilk [Wed, 11 May 2011 19:57:09 +0000 (15:57 -0400)]
xen/blkback: Fixed up comments and converted spaces to tabs.

Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agocciss: fix compile issue
Jens Axboe [Fri, 6 May 2011 14:27:00 +0000 (08:27 -0600)]
cciss: fix compile issue

drivers/block/cciss.c: In function ‘cciss_send_reset’:
drivers/block/cciss.c:2515:2: error: implicit declaration of function ‘fill_cmd’
drivers/block/cciss.c: At top level:
drivers/block/cciss.c:2531:12: error: conflicting types for ‘fill_cmd’
drivers/block/cciss.c:2534:1: note: an argument type that has a default promotion can’t match an empty parameter name list declaration
drivers/block/cciss.c:2515:18: note: previous implicit declaration of ‘fill_cmd’ was here
make[1]: *** [drivers/block/cciss.o] Error 1
make: *** [drivers/block/cciss.o] Error 2

Move fill_cmd() to above where it is first used.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: add cciss_tape_cmds module paramter
Stephen M. Cameron [Tue, 3 May 2011 19:54:12 +0000 (14:54 -0500)]
cciss: add cciss_tape_cmds module paramter

This is to allow number of commands reserved for use by SCSI tape drives
and medium changers to be adjusted at driver load time via the kernel
parameter cciss_tape_cmds, with a default value of 6, and a range
of 2 - 16 inclusive.  Previously, the driver limited the number of
commands which could be queued to the SCSI half of the the driver
to only 2.  This is to fix the problem that if you had more than
two tape drives, you couldn't, for example, erase or rewind them all
at the same time.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: do not use bit 2 doorbell reset
Stephen M. Cameron [Tue, 3 May 2011 19:54:07 +0000 (14:54 -0500)]
cciss: do not use bit 2 doorbell reset

It causes NMIs which are undesirable at best, unsurvivable at worst.
Prefer the soft reset instead.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: do not attempt PCI power management reset method if we know it won't work.
Stephen M. Cameron [Tue, 3 May 2011 19:54:02 +0000 (14:54 -0500)]
cciss: do not attempt PCI power management reset method if we know it won't work.

Just go straight to the soft-reset method instead.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: remove superfluous sleeps around reset code
Stephen M. Cameron [Tue, 3 May 2011 19:53:57 +0000 (14:53 -0500)]
cciss: remove superfluous sleeps around reset code

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: do soft reset if hard reset is broken
Stephen M. Cameron [Tue, 3 May 2011 19:53:52 +0000 (14:53 -0500)]
cciss: do soft reset if hard reset is broken

on driver load, if reset_devices is set, and the hard reset
attempts fail, try to bring up the controller to the point that
a command can be sent, and send it a soft reset command, then
after the reset undo whatever driver initialization was done to get
it to the point to take a command, and re-do it after the reset.

This is to get kdump to work on all the "non-resettable" controllers
(except 64xx controllers which can't be reset due to the potentially
shared cache module.)

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: use new doorbell-bit-5 reset method
Stephen M. Cameron [Tue, 3 May 2011 19:53:46 +0000 (14:53 -0500)]
cciss: use new doorbell-bit-5 reset method

The bit-2-doorbell reset method seemed to cause (survivable) NMIs
on some systems and (unsurvivable) IOCK NMIs on some G7 servers.
Firmware guys implemented a new doorbell method to alleviate these
problems triggered by bit 5 of the doorbell register.  We want to
use it if it's available.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: increase timeouts for post-reset no-ops
Stephen M. Cameron [Tue, 3 May 2011 19:53:41 +0000 (14:53 -0500)]
cciss: increase timeouts for post-reset no-ops

Just to reduce the messages about timeouts that appear.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: clarify messages around reset behavior
Stephen M. Cameron [Tue, 3 May 2011 19:53:36 +0000 (14:53 -0500)]
cciss: clarify messages around reset behavior

When waiting for the board to become "not ready"
don't print a message saying "waiting for board to
become ready" (possibly followed by a message saying
"failed waiting for board to become not ready".  Instead,
it should be "waiting for board to reset" and "failed
waiting for board to reset."

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
"
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: increase time to wait for board reset to start
Stephen M. Cameron [Tue, 3 May 2011 19:53:31 +0000 (14:53 -0500)]
cciss: increase time to wait for board reset to start

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: get rid of message related magic numbers
Stephen M. Cameron [Tue, 3 May 2011 19:53:26 +0000 (14:53 -0500)]
cciss: get rid of message related magic numbers

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: fix reply pool and block fetch table memory leaks
Stephen M. Cameron [Tue, 3 May 2011 19:53:21 +0000 (14:53 -0500)]
cciss: fix reply pool and block fetch table memory leaks

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out irq request code
Stephen M. Cameron [Tue, 3 May 2011 19:53:16 +0000 (14:53 -0500)]
cciss: factor out irq request code

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out scatterlist allocation functions
Stephen M. Cameron [Tue, 3 May 2011 19:53:10 +0000 (14:53 -0500)]
cciss: factor out scatterlist allocation functions

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: factor out command pool allocation functions
Stephen M. Cameron [Tue, 3 May 2011 19:53:05 +0000 (14:53 -0500)]
cciss: factor out command pool allocation functions

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: do a better job of detecting controller reset failure
Stephen M. Cameron [Tue, 3 May 2011 19:53:00 +0000 (14:53 -0500)]
cciss: do a better job of detecting controller reset failure

Detect failure of controller reset by noticing if the 32 bytes of
"driver version" we store on the hardware in the config table
fail to get zeroed out.  Previously we noticed if the controller
did not transition to "simple mode", but this did not detect reset
failure if the controller was already in simple mode prior to
the reset attempt (e.g. due to module parameter hpsa_simple_mode=1).

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocciss: add readl after writel in interrupt mask setting code
Stephen M. Cameron [Tue, 3 May 2011 19:52:54 +0000 (14:52 -0500)]
cciss: add readl after writel in interrupt mask setting code

This is to ensure the board interrupts are really off when
these functions return.

Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoiosched: remove redundant sprintf
Kees Cook [Fri, 6 May 2011 00:02:12 +0000 (18:02 -0600)]
iosched: remove redundant sprintf

After the anticipatory scheduler was dropped, there was no need to
special-case the request_module string. As such, drop the redundant
sprintf and stack variable.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoblock: Remove 'plug/unplug' comment in blk_execute_rq_nowait
Tao Ma [Thu, 5 May 2011 21:10:05 +0000 (15:10 -0600)]
block: Remove 'plug/unplug' comment in blk_execute_rq_nowait

unplug is replaced with blk_run_queue now in blk_execute_rq_nowait,
so change the comment accordingly.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoxen/blkback: Fix up some of the comments.
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:42:10 +0000 (13:42 -0400)]
xen/blkback: Fix up some of the comments.

They had the wrong data or were in the wrong spot.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Squash the checking for operation into dispatch_rw_block_io
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:37:23 +0000 (13:37 -0400)]
xen/blkback: Squash the checking for operation into dispatch_rw_block_io

We do a check for the operations right before calling dispatch_rw_block_io.
And then we do the same check in dispatch_rw_block_io. This patch
squashes those checks into the 'dispatch_rw_block_io' function.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER.
Konrad Rzeszutek Wilk [Wed, 4 May 2011 21:07:27 +0000 (17:07 -0400)]
xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER.

We drop the support for 'feature-barrier' and add in the support
for the 'feature-flush-cache' if the real backend storage supports
flushing.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.
Konrad Rzeszutek Wilk [Thu, 5 May 2011 16:41:03 +0000 (12:41 -0400)]
xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.

The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because the frontend (nor the backend) supported this interface.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoRevert "xen/blkback: Move the plugging/unplugging to a higher level."
Konrad Rzeszutek Wilk [Wed, 27 Apr 2011 16:40:11 +0000 (12:40 -0400)]
Revert "xen/blkback: Move the plugging/unplugging to a higher level."

This reverts commit 97961ef46b9b5a6a7c918a38b898a7b3e49869f4 b/c
we lose about 15% performance if we do the unplugging and the
end of the reading the ring buffer.

13 years agoxen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 20:24:18 +0000 (16:24 -0400)]
xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.

If one runs a simple fio request with random read/write with a
20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.

IOmeter       |       |      |          |
64K, randrw   |  NOOP | CFQ  | deadline |
randrwmix=80  |       |      |          |
--------------+-------+------+----------+
blkback       |103/27 |32/10 | 102/27   |
--------------+-------+------+----------+
QEMU qdisk    |103/27 |102/27| 102/27   |

The problem as explained by Vivek Goyal was:

".. that difference is that sync vs async requests. In the case of
a kernel thread submitting IO, [..] all the WRITES might be being
considered as async and will go in a different queue. If you mix those
with some READS, they are always sync and will go in differnet queue.
In presence of sync queue, CFQ will idle and choke up WRITES in
an attempt to improve latencies of READs.

In case of AIO [note: this is what QEMU qdisk is doing] , [..]
it is direct IO and both READS and WRITES will be considered SYNC
and will go in a single queue and no choking of WRITES will take place."

The solution is quite simple, tack on REQ_SYNC (which is
what the WRITE_ODIRECT macro points to) and the numbers go
back up.

Suggested-by: Vivek Goyal <vgoyal@redhat.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Move the plugging/unplugging to a higher level.
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 16:57:59 +0000 (12:57 -0400)]
xen/blkback: Move the plugging/unplugging to a higher level.

We used to the plug/unplug on the submit_bio. But that means
if within a stream of WRITE, WRITE, WRITE,...,WRITE we have
one READ, it could stall the pipeline (as the 'submio_bio'
could trigger the unplug_fnc to be called and stall/sync
when doing the READ). Instead we want to move the unplugging
when the whole (or as a much as possible) ring buffer has been
processed. This also eliminates us doing plug/unplug for
each request.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoblock: don't block events on excl write for non-optical devices
Tejun Heo [Thu, 21 Apr 2011 18:54:46 +0000 (20:54 +0200)]
block: don't block events on excl write for non-optical devices

Disk event code automatically blocks events on excl write.  This is
primarily to avoid issuing polling commands while burning is in
progress.  This behavior doesn't fit other types of devices with
removeable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.

This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoblock: rescan partitions on invalidated devices on -ENOMEDIA too
Tejun Heo [Thu, 21 Apr 2011 18:54:45 +0000 (20:54 +0200)]
block: rescan partitions on invalidated devices on -ENOMEDIA too

__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medimum removal
is known to both the kernel and userland.  The behavior also creates a
subtle inconsistency where O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghots partitions, which is exploited to
work around the problem from userland.

Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIA too.

This was reported in the following bz.

 https://bugzilla.kernel.org/show_bug.cgi?id=13029

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Zeuthen <zeuthen@gmail.com>
Reported-by: Martin Pitt <martin.pitt@ubuntu.com>
Reported-by: Kay Sievers <kay.sievers@vrfy.org>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agocdrom: always check_disk_change() on open
Tejun Heo [Thu, 21 Apr 2011 18:54:44 +0000 (20:54 +0200)]
cdrom: always check_disk_change() on open

cdrom_open() called check_disk_change() after the rest of open path
succeeded which leads to the following bizarre behavior.

* After media change, if the device opened without O_NONBLOCK,
  open_for_data() naturally fails with -ENOMEDIA and
  check_disk_change() is never called.  The media is known to be gone
  and the open failure makes it obvious to the userland but device
  invalidation never happens.

* But if the device is opened with O_NONBLOCK, all the checks are
  bypassed and cdrom_open() doesn't notice that the media is not there
  and check_disk_change() is called and invalidation happens.

There's nothing to be gained by avoiding calling check_disk_change()
on open failure.  Common cases end up calling check_disk_change()
anyway.  All we get is inconsistent behavior.

Fix it by moving check_disk_change() invocation to the top of
cdrom_open() so that it always gets called regardless of how the rest
of open proceeds.

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Amit Shah <amit.shah@redhat.com>
Tested-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoxen/blkback: Prefix exposed functions with xen_
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:50:43 +0000 (11:50 -0400)]
xen/blkback: Prefix exposed functions with xen_

And also shorten the name if it has blkback to blkbk.

This results in the symbol table (if compiled in the kernel)
to be much shorter, prettier,  and also easier to search for.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkback: Inline some of the functions that were moved from vbd/interface.c
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:21:43 +0000 (11:21 -0400)]
xen-blkback: Inline some of the functions that were moved from vbd/interface.c

Shuffling code around.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen-blkback: Remove from the copyright notice the address.
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:01:47 +0000 (11:01 -0400)]
xen-blkback: Remove from the copyright notice the address.

There is no need for it, as the address is updated constatly
in the root of the Linux kernel.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly.
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 14:57:29 +0000 (10:57 -0400)]
xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectivly.

Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the
critical bits where they belong, respectively.

Leaving only blkback.c for the data- and xenbus.c for the control path.

Suggested-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoLinux 2.6.39-rc4
Linus Torvalds [Tue, 19 Apr 2011 04:26:00 +0000 (21:26 -0700)]
Linux 2.6.39-rc4

13 years agoMerge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm
Linus Torvalds [Mon, 18 Apr 2011 22:44:29 +0000 (15:44 -0700)]
Merge branch 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm

* 'for-39-rc4' of git://codeaurora.org/quic/kernel/davidb/linux-msm:
  msm: timer: fix missing return value
  msm: Remove extraneous ffa device check

13 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Linus Torvalds [Mon, 18 Apr 2011 20:29:03 +0000 (13:29 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: xen-kbdfront - fix mouse getting stuck after save/restore
  Input: estimate number of events per packet
  Input: evdev - indicate buffer overrun with SYN_DROPPED
  Input: document event types and codes and their intended use
  Input: add KEY_IMAGES specifically for AL Image Browser
  Input: twl4030_keypad - fix potential NULL dereference in twl4030_kp_probe()
  Input: h3600_ts - fix error handling at connect
  Input: twl4030_keypad - avoid potential NULL-pointer dereference

13 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Mon, 18 Apr 2011 20:21:18 +0000 (13:21 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: add blk_run_queue_async
  block: blk_delay_queue() should use kblockd workqueue
  md: fix up raid1/raid10 unplugging.
  md: incorporate new plugging into raid5.
  md: provide generic support for handling unplug callbacks.
  md - remove old plugging code.
  md/dm - remove remains of plug_fn callback.
  md: use new plugging interface for RAID IO.
  block: drop queue lock before calling __blk_run_queue() for kblockd punt
  Revert "block: add callback function for unplug notification"
  block: Enhance new plugging support to support general callbacks

13 years agoMerge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Linus Torvalds [Mon, 18 Apr 2011 19:24:24 +0000 (12:24 -0700)]
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/powermac: Build fix with SMP and CPU hotplug
  powerpc/perf_event: Skip updating kernel counters if register value shrinks
  powerpc: Don't write protect kernel text with CONFIG_DYNAMIC_FTRACE enabled
  powerpc: Fix oops if scan_dispatch_log is called too early
  powerpc/pseries: Use a kmem cache for DTL buffers
  powerpc/kexec: Fix regression causing compile failure on UP
  powerpc/85xx: disable Suspend support if SMP enabled
  powerpc/e500mc: Remove CPU_FTR_MAYBE_CAN_NAP/CPU_FTR_MAYBE_CAN_DOZE
  powerpc/book3e: Fix CPU feature handling on 64-bit e5500
  powerpc: Check device status before adding serial device
  powerpc/85xx: Don't add disabled PCIe devices

13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable
Linus Torvalds [Mon, 18 Apr 2011 19:24:05 +0000 (12:24 -0700)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable

* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (24 commits)
  Btrfs: fix free space cache leak
  Btrfs: avoid taking the chunk_mutex in do_chunk_alloc
  Btrfs end_bio_extent_readpage should look for locked bits
  Btrfs: don't force chunk allocation in find_free_extent
  Btrfs: Check validity before setting an acl
  Btrfs: Fix incorrect inode nlink in btrfs_link()
  Btrfs: Check if btrfs_next_leaf() returns error in btrfs_real_readdir()
  Btrfs: Check if btrfs_next_leaf() returns error in btrfs_listxattr()
  Btrfs: make uncache_state unconditional
  btrfs: using cached extent_state in set/unlock combinations
  Btrfs: avoid taking the trans_mutex in btrfs_end_transaction
  Btrfs: fix subvolume mount by name problem when default mount subvolume is set
  fix user annotation in ioctl.c
  Btrfs: check for duplicate iov_base's when doing dio reads
  btrfs: properly handle overlapping areas in memmove_extent_buffer
  Btrfs: fix memory leaks in btrfs_new_inode()
  Btrfs: check for duplicate iov_base's when doing dio reads
  Btrfs: reuse the extent_map we found when calling btrfs_get_extent
  Btrfs: do not use async submit for small DIO io's
  Btrfs: don't split dio bios if we don't have to
  ...

13 years agoxen/blkback: Move it from drivers/xen to drivers/block
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:24:23 +0000 (14:24 -0400)]
xen/blkback: Move it from drivers/xen to drivers/block

.. and modify the Makefile and Kconfig files appropriately.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoblock, xen/blkback: remove blk_[get|put]_queue calls.
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:17:49 +0000 (14:17 -0400)]
block, xen/blkback: remove blk_[get|put]_queue calls.

They were used to check if the queue does not have QUEUE_FLAG_DEAD
set. That is not necessary anymore as the 'submit_io' call
ends up doing that for us.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoproc: do proper range check on readdir offset
Linus Torvalds [Mon, 18 Apr 2011 17:36:54 +0000 (10:36 -0700)]
proc: do proper range check on readdir offset

Rather than pass in some random truncated offset to the pid-related
functions, check that the offset is in range up-front.

This is just cleanup, the previous commit fixed the real problem.

Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agonext_pidmap: fix overflow condition
Linus Torvalds [Mon, 18 Apr 2011 17:35:30 +0000 (10:35 -0700)]
next_pidmap: fix overflow condition

next_pidmap() just quietly accepted whatever 'last' pid that was passed
in, which is not all that safe when one of the users is /proc.

Admittedly the proc code should do some sanity checking on the range
(and that will be the next commit), but that doesn't mean that the
helper functions should just do that pidmap pointer arithmetic without
checking the range of its arguments.

So clamp 'last' to PID_MAX_LIMIT.  The fact that we then do "last+1"
doesn't really matter, the for-loop does check against the end of the
pidmap array properly (it's only the actual pointer arithmetic overflow
case we need to worry about, and going one bit beyond isn't going to
overflow).

[ Use PID_MAX_LIMIT rather than pid_max as per Eric Biederman ]

Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com>
Analyzed-by: Robert Święcki <robert@swiecki.net>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoInput: xen-kbdfront - fix mouse getting stuck after save/restore
Igor Mammedov [Mon, 18 Apr 2011 17:17:17 +0000 (10:17 -0700)]
Input: xen-kbdfront - fix mouse getting stuck after save/restore

Mouse gets "stuck" after restore of PV guest but buttons are in working
condition.

If driver has been configured for ABS coordinates at start it will get
XENKBD_TYPE_POS events and then suddenly after restore it'll start getting
XENKBD_TYPE_MOTION events, that will be dropped later and they won't get
into user-space.

Regression was introduced by hunk 5 and 6 of
5ea5254aa0ad269cfbd2875c973ef25ab5b5e9db
("Input: xen-kbdfront - advertise either absolute or relative
coordinates").

Driver on restore should ask xen for request-abs-pointer again if it is
available. So restore parts that did it before 5ea5254.

Acked-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
[v1: Expanded the commit description]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
13 years agoInput: estimate number of events per packet
Jeff Brown [Mon, 18 Apr 2011 17:08:02 +0000 (10:08 -0700)]
Input: estimate number of events per packet

Calculate a default based on the number of ABS axes, REL axes,
and MT slots for the device during input device registration.

Signed-off-by: Jeff Brown <jeffbrown@android.com>
Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
13 years agoxen/blkback: Get the 'requeust_queue' properly.
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 16:04:17 +0000 (12:04 -0400)]
xen/blkback: Get the 'requeust_queue' properly.

After the commit 0faa8cca883bbc6a0919e3c89128672659b75820
("    xen/blkback: remove per-queue plugging") we forgot
to retrieve the 'struct request_queue' from the block device.

This puts the functionality back in and fixes a NULL pointer
bug.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoxen/blkback: Move the check for misaligned I/O once more.
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 15:34:55 +0000 (11:34 -0400)]
xen/blkback: Move the check for misaligned I/O once more.

The commit 976222e05ea5a9959ccf880d7a24efbf79b3c6cf

    xen/blkback: Move the check for misaligned I/O higher.

moved it a bit to high. The preq->vbdev was not set, so the
check for misaligned I/O would cause a NULL pointer derefence.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
13 years agoBtrfs: fix free space cache leak
Chris Mason [Mon, 18 Apr 2011 12:55:34 +0000 (08:55 -0400)]
Btrfs: fix free space cache leak

The free space caching code was recently reworked to
cache all the pages it needed instead of using find_get_page everywhere.

One loop was missed though, so it ended up leaking pages.  This fixes
it to use our page array instead of find_get_page.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
13 years agoblock: add blk_run_queue_async
Christoph Hellwig [Mon, 18 Apr 2011 09:41:33 +0000 (11:41 +0200)]
block: add blk_run_queue_async

Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agoblock: blk_delay_queue() should use kblockd workqueue
Jens Axboe [Mon, 18 Apr 2011 09:36:39 +0000 (11:36 +0200)]
block: blk_delay_queue() should use kblockd workqueue

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
13 years agomd: fix up raid1/raid10 unplugging.
NeilBrown [Mon, 18 Apr 2011 08:25:43 +0000 (18:25 +1000)]
md: fix up raid1/raid10 unplugging.

We just need to make sure that an unplug event wakes up the md
thread, which is exactly what mddev_check_plugged does.

Also remove some plug-related code that is no longer needed.

Signed-off-by: NeilBrown <neilb@suse.de>