Joe Thornber [Tue, 24 Jul 2012 23:25:33 +0000 (09:25 +1000)]
Add read-only and fail-io modes to thin provisioning.
If a transaction commit fails the pool's metadata device will transition
to "read-only" mode. If a commit fails once already in read-only mode
the transition to "fail-io" mode occurs.
Once in fail-io mode the pool and all associated thin devices will
report a status of "Fail".
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:33 +0000 (09:25 +1000)]
Introduce dm_pool_abort_metadata to abort the current metadata
transaction. Generally this will only be called when bad things are
happening and dm-thin is trying to roll back to a good state for
read-only mode.
It's complicated by the fact that the metadata device may have failed
completely causing the abort to be unable to read the old transaction.
In this case the metadata object is placed in a 'fail' mode and
everything fails apart from destroying it.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:32 +0000 (09:25 +1000)]
Introduce dm_bm_set_read_only to switch the block manager into a
read-only mode. To be used when dm-thin degrades due to io errors on
the metadata device.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:31 +0000 (09:25 +1000)]
Introduce dm_thin_changed_this_transaction to dm-thin-metadata to publish a
useful bit of information we're already tracking. This will help dm thin
decide when to commit.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:23 +0000 (09:25 +1000)]
Stop using dm_bm_unlock_move when shadowing blocks in the transaction
manager as an optimisation and remove the function as it is then no
longer used.
Some code, such as the space maps, keeps using on-disk data structures
from the previous transaction. It can do this because blocks won't
be reallocated until the subsequent transaction. Using
dm_bm_unlock_move to copy blocks sounds like a win, but it forces a
synchronous read should the old block be accessed.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:22 +0000 (09:25 +1000)]
This patch introduces a separate struct for the block_manager.
It also uses IS_ERR to check the return value of dm_bufio_client_create
instead of testing incorrectly for NULL.
Prior to this patch a struct dm_block_manager was really an alias for
a struct dm_bufio_client. We want to add some functionality to the
block manager that will require extra fields, so this one to one
mapping is no longer valid.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:21 +0000 (09:25 +1000)]
The thin provisioning target commits internal metadata on flush. So it
should receive flushes regardless of whether the underlying devices
support them.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:19 +0000 (09:25 +1000)]
Unlock the superblock even if initial dm_bufio_write_dirty_buffers fails.
Also, remove redundant flush calls. dm_bm_flush_and_unlock's calls to
dm_bufio_write_dirty_buffers already result in dm_bufio_issue_flush
being called.
This avoids warnings about unflushed dirty buffers from bufio.
Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Joe Thornber [Tue, 24 Jul 2012 23:25:18 +0000 (09:25 +1000)]
Fix memory leak in process_prepared_mapping by always freeing
the dm_thin_new_mapping structs from the mapping_pool mempool on
the error paths.
Signed-off-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In preparation for RAID10 inclusion in dm-raid, we move the sectors_per_dev
calculation later in the device creation process. This is because we won't
know up-front how many stripes vs how many mirrors there are which will
change the calculation.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
In preparation for RAID10 addition to dm-raid, we change an 'if' conditional
to a 'switch' conditional to make it easier to see what is being checked for
each RAID type.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 24 Jul 2012 23:25:17 +0000 (09:25 +1000)]
A SCSI device handler might get attached to a device during the
initial device scan. We do not necessarily want to override
this when loading a multipath table, so this patch adds a new
multipath feature argument "retain_attached_hw_handler".
During SCSI device scan all loaded SCSI device handlers will be
consulted for a match (via scsi_dh's provided .match). If a match is
found that device handler will be attached. We need a way to have
userspace multipathd's provided 'hw_handler' not override the already
attached hardware handler.
When specifying the new feature 'retain_attached_hw_handler' multipath
will use the currently attached hardware handler instead of trying to
attach the one specified during table load. If no hardware handler is
attached the specified hardware handler will still be used.
Leverages scsi_dh_attach's ability to increment the scsi_dh's reference
count if the same scsi_dh name is provided when attaching - currently
attached scsi_dh name is determined with scsi_dh_attached_handler_name.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Babu Moger <babu.moger@netapp.com> Reviewed-by: Chandra Seetharaman <sekharan@us.ibm.com> Acked-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm-thin will be most likely used with a block size that is a power of
two. So it should be optimized for this case.
This patch changes division and modulo operations to shifts and bit
masks if block size is a power of two.
A test that bi_sector is divisible by a block size is removed from
io_overlaps_block. Device mapper never sends bios that span a block
boundary. Consequently, if we tested that bi_size is equivalent to block
size, bi_sector must already be on a block boundary.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
This patch sets the variable "ti->split_discard_requests", so that
device mapper core splits discard requests on a block boundary.
Consequently, a discard request that spans multiple blocks is never sent
to dm-thin. The patch also removes some code in process_discard that
deals with discards that span multiple blocks.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Tue, 24 Jul 2012 23:25:16 +0000 (09:25 +1000)]
Non power of 2 blocksize support is needed to properly align thinp IO
on storage that has non power of 2 optimal IO sizes (e.g. RAID6 10+2).
Use sector_div to support non power of 2 blocksize for the pool's
data device. This provides comparable performance to the power of 2
math that was performed until now (as tested on modern x86_64 hardware).
The kernel currently assumes that limits->discard_granularity is a power
of two so the thin target only enables discard support if the block
size is a power of two.
Eliminate pool structure's 'block_shift', 'offset_mask' and
remaining 4 byte holes.
Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
dm-stripe is usually used with a chunk size that is a power of two.
Previous code used slow divide and modulo operations with a power-of-two
chunk size. This patch changes it to use faster shifts and bit masks.
stripe_width is already optimized in a similar way.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>