]> git.karo-electronics.de Git - karo-tx-linux.git/commitdiff
md: avoid endless recovery loop when waiting for fail device to complete.
authorNeilBrown <neilb@suse.de>
Tue, 28 Jun 2011 06:59:42 +0000 (16:59 +1000)
committerPaul Gortmaker <paul.gortmaker@windriver.com>
Thu, 17 May 2012 15:21:01 +0000 (11:21 -0400)
commit 4274215d24633df7302069e51426659d4759c5ed upstream.

If a device fails in a way that causes pending request to take a while
to complete, md will not be able to immediately remove it from the
array in remove_and_add_spares.
It will then incorrectly look like a spare device and md will try to
recover it even though it is failed.
This leads to a recovery process starting and instantly aborting over
and over again.

We should check if the device is faulty before considering it to be a
spare.  This will avoid trying to start a recovery that cannot
proceed.

This bug was introduced in 2.6.26 so that patch is suitable for any
kernel since then.

Reported-by: Jim Paradis <james.paradis@stratus.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
drivers/md/md.c

index 1287b03bb3f2f27380272c7a0362116867809f9c..d26df7f1b5abff65626429922452778f6ffe9062 100644 (file)
@@ -6863,6 +6863,7 @@ static int remove_and_add_spares(mddev_t *mddev)
                list_for_each_entry(rdev, &mddev->disks, same_set) {
                        if (rdev->raid_disk >= 0 &&
                            !test_bit(In_sync, &rdev->flags) &&
+                           !test_bit(Faulty, &rdev->flags) &&
                            !test_bit(Blocked, &rdev->flags))
                                spares++;
                        if (rdev->raid_disk < 0