Joe Thornber [Wed, 3 Aug 2011 00:43:45 +0000 (10:43 +1000)]
Initial EXPERIMENTAL implementation of device-mapper thin provisioning
with snapshot support. The 'thin' target is used to create instances of
the virtual devices that are hosted in the 'thin-pool' target. The
thin-pool target provides data sharing among devices. This sharing is
made possible using the persistent-data library in the previous patch.
The main highlight of this implementation, compared to the previous
implementation of snapshots, is that it allows many virtual devices to
be stored on the same data volume, simplifying administration and
allowing sharing of data between volumes (thus reducing disk usage).
Another big feature is support for arbitrary depth of recursive
snapshots (snapshots of snapshots of snapshots ...). The previous
implementation of snapshots did this by chaining together lookup tables,
and so performance was O(depth). This new implementation uses a single
data structure so we don't get this degradation with depth.
For further information and examples of how to use this, please read
Documentation/device-mapper/thin-provisioning.txt
Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:43 +0000 (10:43 +1000)]
This patch introduces dm_kcopyd_zero() to make it easy to use
kcopyd to write zeros into the requested areas instead
instead of copying. It is implemented by passing a NULL
copying source to dm_kcopyd_copy().
The forthcoming thin provisioning target uses this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 3 Aug 2011 00:43:43 +0000 (10:43 +1000)]
DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
regardless of whether or not a given DM device's underlying devices
also advertised a need for them.
Block's flush-merge changes from 2.6.39 have proven to be more costly
for DM devices. Performance regressions have been reported even when
DM's underlying devices do not advertise that they have a write cache.
Fix the performance regressions by configuring a DM device's flushing
capabilities based on those of the underlying devices' capabilities.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Wed, 3 Aug 2011 00:43:43 +0000 (10:43 +1000)]
Add optional parameter field to dmcrypt table and support
"allow_discards" option.
Discard requests bypass crypt queue processing. Bio is simple remapped
to underlying device.
Note that discard will be never enabled by default because of security
consequences. It is up to the administrator to enable it for encrypted
devices.
(Note that userspace cryptsetup does not understand new optional
parameters yet. Support for this will come later. Until then, you
should use 'dmsetup' to enable and disable this.)
Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Add the ability to parse and use metadata devices to dm-raid. Although
not strictly required, without the metadata devices, many features of
RAID are unavailable. They are used to store a superblock and bitmap.
The role, or position in the array, of each device must be recorded in
its superblock. This is to help with fault handling, array reshaping,
and sanity checks. RAID 4/5/6 devices must be loaded in a specific order:
in this way, the 'array_position' field helps validate the correctness
of the mapping when it is loaded. It can be used during reshaping to
identify which devices are added/removed. Fault handling is impossible
without this field. For example, when a device fails it is recorded in
the superblock. If this is a RAID1 device and the offending device is
removed from the array, there must be a way during subsequent array
assembly to determine that the failed device was the one removed. This
is done by correlating the 'array_position' field and the bit-field
variable 'failed_devices'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:41 +0000 (10:43 +1000)]
Exactly one of name, uuid or device must be specified when referencing
an existing device. This removes the ambiguity (risking the wrong
device being updated) if two conflicting parameters were specified.
Previously one parameter got used and any others were ignored silently.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:40 +0000 (10:43 +1000)]
Move logic to find device based on major/minor number to a separate
function __get_dev_cell (similar to __get_uuid_cell and __get_name_cell).
This makes the function __find_device_hash_cell more straightforward.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:40 +0000 (10:43 +1000)]
Move parameter filling from find_device to __find_device_hash_cell.
This patch causes ioctls using __find_device_hash_cell
(DM_DEV_REMOVE_CMD, DM_DEV_SUSPEND_CMD - resume, DM_TABLE_CLEAR_CMD)
to return device parameters, bringing them into line with the other
ioctls.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 3 Aug 2011 00:43:40 +0000 (10:43 +1000)]
Add corrupt_bio_byte feature to simulate corruption by overwriting a byte at a
specified position with a specified value during intervals when the device is
"down".
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 3 Aug 2011 00:43:39 +0000 (10:43 +1000)]
Add the ability to specify arbitrary feature flags when creating a
flakey target. This code uses the same target argument helpers that
the multipath target does.
Also remove the superfluous 'dm-flakey' prefixes from the error messages,
as they already contain the prefix 'flakey'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:38 +0000 (10:43 +1000)]
If we write a full chunk in the snapshot, skip reading the origin device
because the whole chunk will be overwritten anyway.
This patch changes the snapshot write logic when a full chunk is written.
In this case:
1. allocate the exception
2. dispatch the bio (but don't report the bio completion to device mapper)
3. write the exception record
4. report bio completed
Callbacks must be done through the kcopyd thread, because callbacks must not
race with each other. So we create two new functions:
dm_kcopyd_prepare_callback: allocate a job structure and prepare the callback.
(This function must not be called from interrupt context.)
dm_kcopyd_do_callback: submit callback.
(This function may be called from interrupt context.)
Performance test (on snapshots with 4k chunk size):
without the patch:
non-direct-io sequential write (dd): 17.7MB/s
direct-io sequential write (dd): 20.9MB/s
non-direct-io random write (mkfs.ext2): 0.44s
with the patch:
non-direct-io sequential write (dd): 26.5MB/s
direct-io sequential write (dd): 33.2MB/s
non-direct-io random write (mkfs.ext2): 0.27s
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:38 +0000 (10:43 +1000)]
Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
whether the device can accept bios larger than the size its merge
function returns. When set, use this to send large bios to snapshots
which can split them if necessary. Snapshot I/O may be significantly
fragmented and this approach seems to improve peformance.
Before the patch, dm_set_device_limits restricted bio size to page size
if the underlying device had a merge function and the target didn't
provide a merge function. After the patch, dm_set_device_limits
restricts bio size to page size if the underlying device has a merge
function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
provide a merge function.
The snapshot target can't provide a merge function because when the merge
function is called, it is impossible to determine where the bio will be
remapped. Previously this led us to impose a 4k limit, which we can
now remove if the snapshot store is located on a device without a merge
function. Together with another patch for optimizing full chunk writes,
it improves performance from 29MB/s to 40MB/s when writing to the
filesystem on snapshot store.
If the snapshot store is placed on a non-dm device with a merge function
(such as md-raid), device mapper still limits all bios to page size.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:34 +0000 (10:43 +1000)]
The nr_pages field in struct kcopyd_job is only used temporarily in
run_pages_job() to count the number of required pages.
We can use a local variable instead.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Wed, 3 Aug 2011 00:43:31 +0000 (10:43 +1000)]
Remove 'discards_supported' from the dm_table structure. The same
information can be easily discovered from the table's target(s) in
dm_table_supports_discards().
Before this fix dm_table_supports_discards() would skip checking the
individual targets' 'discards_supported' flag if any one target in the
table didn't set num_discard_requests > 0. Now the per-target
'discards_supported' flag is effective at insuring the final DM device
advertises discard support. But, to be clear, targets that don't
support discards (!num_discard_requests) will not receive discard
requests.
Also DMWARN if a target sets 'discards_supported' override but forgets
to set 'num_discard_requests'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Wed, 3 Aug 2011 00:43:30 +0000 (10:43 +1000)]
For normal kernel pages, CPU cache is synchronized by the dma layer.
However, this is not done for pages allocated with vmalloc. If we do I/O
to/from vmallocated pages, we must synchronize CPU cache explicitly.
Prior to doing I/O on vmallocated page we must call
flush_kernel_vmap_range to flush dirty cache on the virtual address.
After finished read we must call invalidate_kernel_vmap_range to
invalidate cache on the virtual address, so that accesses to the virtual
address return newly read data and not stale data from CPU cache.
This patch fixes metadata corruption on dm-snapshots on PA-RISC and
possibly other architectures with caches indexed by virtual address.
Wolfram Sang [Tue, 2 Aug 2011 17:42:19 +0000 (19:42 +0200)]
ASoC: sgtl5000: fix cache handling
Cache handling in this driver is broken. The chip has 16-bit registers, yet the
register numbers also increase by 2 per register, i.e. there are only
even-numbered registers. The cache in this driver, though, simply increments
register numbers, so it does need some mapping as seen in
sgtl5000_restore_regs(), note the '>> 1':
That, of course, won't work with snd_soc_update_bits(). (Thus, we won't even
notice the missing register 0x1c in the default regs which shifted all follwing
registers to wrong values.) Noticed on the MX28EVK where enabling the regulators
simply locked up the chip.
Refactor the routines and use a properly sized default_regs array which matches
the register layout of the underlying chip, i.e. create a truly flat cache.
This also saves some code which should make up for the bigger array a little.
When soc-core will somewhen have another cache type which handles a step size,
this conversion will also ease the transition.
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de> Tested-by: Dong Aisheng <b29396@freescale.com> Tested-by: Shawn Guo <shawn.guo@linaro.org> Acked-by: Liam Girdwood <lrg@ti.com> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
In this code, blkvsc_req is allocated in the cache blkdev->request_pool,
but freed in the first case to the cache blkvsc_req->dev->request_pool.
blkvsc_req->dev is subsequently initialized to blkdev, making these the
same at the second call to kmem_cache_free. But at the point of the first
call, blkvsc_req->dev is NULL. The second call is changed too, for
uniformity.
The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x,e,e1,e2,e3;
@@
x = \(kmem_cache_alloc\|kmem_cache_zalloc\)(e1,e2)
... when != x = e
(
kmem_cache_free(e1,x);
|
?-kmem_cache_free(e3,x);
+kmem_cache_free(e1,x);
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk> Cc: KY Srinivasan <kys@microsoft.com> Cc: Hank Janssen <hjanssen@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>