Joe Thornber [Tue, 9 Aug 2011 23:13:35 +0000 (09:13 +1000)]
Initial EXPERIMENTAL implementation of device-mapper thin provisioning
with snapshot support. The 'thin' target is used to create instances of
the virtual devices that are hosted in the 'thin-pool' target. The
thin-pool target provides data sharing among devices. This sharing is
made possible using the persistent-data library in the previous patch.
The main highlight of this implementation, compared to the previous
implementation of snapshots, is that it allows many virtual devices to
be stored on the same data volume, simplifying administration and
allowing sharing of data between volumes (thus reducing disk usage).
Another big feature is support for arbitrary depth of recursive
snapshots (snapshots of snapshots of snapshots ...). The previous
implementation of snapshots did this by chaining together lookup tables,
and so performance was O(depth). This new implementation uses a single
data structure so we don't get this degradation with depth.
For further information and examples of how to use this, please read
Documentation/device-mapper/thin-provisioning.txt
Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Tue, 9 Aug 2011 23:13:33 +0000 (09:13 +1000)]
This patch introduces dm_kcopyd_zero() to make it easy to use
kcopyd to write zeros into the requested areas instead
instead of copying. It is implemented by passing a NULL
copying source to dm_kcopyd_copy().
The forthcoming thin provisioning target uses this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Tue, 9 Aug 2011 23:13:33 +0000 (09:13 +1000)]
If optional discard support in dm-crypt is enabled, discards requests
bypass the crypt queue and blocks of the underlying device are discarded.
For the read path, discarded blocks are handled the same as normal
ciphertext blocks, thus decrypted.
So if the underlying device announces discarded regions return zeroes,
dm-crypt must disable this flag because after decryption there is just
random noise instead of zeroes.
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Jonthan Brassow [Tue, 9 Aug 2011 23:13:33 +0000 (09:13 +1000)]
Fix off-by-one error in validation of write_mostly.
The user-supplied value given for the 'write_mostly' argument must be an
index starting at 0. The validation of the supplied argument failed to
check for 'N' ('>' vs '>='), which would have caused an access beyond the
end of the array.
Reported-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Sage Weil [Tue, 9 Aug 2011 22:05:07 +0000 (15:05 -0700)]
libceph: warn on msg allocation failures
Any non-masked msg allocation failure should generate a warning and stack
trace to the console. All of these need to eventually be replaced by
safe preallocation or msgpools.
Sage Weil [Tue, 9 Aug 2011 22:03:46 +0000 (15:03 -0700)]
libceph: don't complain on msgpool alloc failures
The pool allocation failures are masked by the pool; there is no need to
spam the console about them. (That's the whole point of having the pool
in the first place.)
Mark msg allocations whose failure is safely handled as such.
Sage Weil [Thu, 4 Aug 2011 15:21:30 +0000 (08:21 -0700)]
ceph: implement (optional) max read size
The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD. This is distinct from
'rasize', which controls the size of the readahead window.
Sage Weil [Wed, 3 Aug 2011 16:58:09 +0000 (09:58 -0700)]
ceph: make readpages fully async
When we get a ->readpages() aop, submit async reads for all page ranges
in the provided page list. Lock the pages immediately, so that VFS/MM
will block until the reads complete.
Sage Weil [Tue, 9 Aug 2011 21:48:11 +0000 (14:48 -0700)]
libceph: fix msgpool
There were several problems here:
1- we weren't tagging allocations with the pool, so they were never
returned to the pool.
2- msgpool_put didn't add back to the mempool, even it were called.
3- msgpool_release didn't clear the pool pointer, so it would have looped
had #1 not been broken.
These may or may not have been responsible for #1136 or #1381 (BUG due to
non-empty mempool on umount). I can't seem to trigger the crash now using
the method I was using before.