2009-06-29

Log Analysis

  • Controller(?) timed out and sdc3 ejected

    Jun 29 20:47:07 hastur kernel: ata11.00: failed to read SCR 1 (Emask=0x40)
    Jun 29 20:48:49 hastur kernel: INFO: task md3_raid5:3352 blocked for more than 120 seconds
    Jun 29 20:48:58 hastur kernel: ata11.02: hard resetting link
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: COMRESET failed (errno=-5)
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 0 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: reset failed, giving up
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to recover link after 3 tries, disabling
    Jun 29 20:48:58 hastur kernel: ata11.02: disabled
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: ata11: EH complete
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: raid5: Disk failure on sdc3, disabling device. Operation continuing on 5 devices
    Jun 29 20:49:11 hastur kernel: ata11.02: detaching (SCSI 10:2:0:0)

  • Hot-removed sdd3 from array after enclosure alarm

    Jun 29 20:57:47 hastur kernel: ata11.03: disabled
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    Jun 29 20:57:47 hastur kernel: end_request: I/O error, dev sdd, sector 404675832
    Jun 29 20:57:47 hastur kernel: raid5:md3: read error not correctable (sector 298261272 on sdd3).
    Jun 29 20:57:47 hastur kernel: raid5: Disk failure on sdd3, disabling device. Operation continuing on 4 devices
    Jun 29 20:57:48 hastur kernel: ata11.03: detaching (SCSI 10:3:0:0)

  • Marked array as readonly

  • Shut down and removed the system for maintenance.
  • On reboot the disks were renumbered.
  • Ran a non-destructive read/write badblocks test on all disks (ALL CLEAN; see the sketch after this list).
  • Attempted to re-add failed disks to array.
  • Somehow managed to rewrite superblocks on disks I was attempting to re-add.
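  • Roughly, the surface test and re-add attempt looked like the following (the exact invocations weren't recorded, so treat this as a sketch; device letters are illustrative after the renumbering):

    # -n: non-destructive read/write test, -s: show progress, -v: verbose
    for d in b c d e f g
    do
            badblocks -nsv /dev/sd${d}
    done
    # The re-add attempt, approximately:
    mdadm /dev/md3 --re-add /dev/sdc3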

data_offset

  • The old version of mdadm created the original array with a data offset of 136 sectors into each component device.
  • Versions since mdadm-2.6 support a new bitmap feature which moves the default data offset to 272 sectors.
  • According to Neil Brown, hex-editing the data offset and fixing the superblock checksum would be safe.
  • In the end, though, I compiled from source the version of mdadm (mdadm-2.5.6, as used in the scripts below) that was used to create the original array; see the build sketch below.
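
  • A sketch of that build, assuming the tarball is still available from the kernel.org mdadm archive (the URL and install path are assumptions):

    wget http://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-2.5.6.tar.gz
    tar xzf mdadm-2.5.6.tar.gz
    cd mdadm-2.5.6 && make
    cp mdadm /usr/local/sbin/mdadm-2.5.6    # installed under a versioned name, as used below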

diff -u md3.sdb3.orig md3.sde3.new

--- md3.sdb3.orig       2009-07-07 10:14:25.000000000 +0100
+++ md3.sde3.new        2009-07-07 10:14:27.000000000 +0100
@@ -2,26 +2,26 @@
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
-     Array UUID : 2b7ca9c9:c9fa9a28:086e0f83:90cbef62
+     Array UUID : 679bc68c:aeb0464c:8f11e607:c8e58161
           Name : hastur:3  (local to host hastur)
-  Creation Time : Thu Oct 18 14:46:47 2007
+  Creation Time : Sun Jul  5 03:32:38 2009
     Raid Level : raid5
   Raid Devices : 6

- Avail Dev Size : 870353369 (415.02 GiB 445.62 GB)
+ Avail Dev Size : 870353233 (415.02 GiB 445.62 GB)
     Array Size : 4351765760 (2075.08 GiB 2228.10 GB)
  Used Dev Size : 870353152 (415.02 GiB 445.62 GB)
-    Data Offset : 136 sectors
+    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
-    Device UUID : c4983266:9ee820fd:106bbf9d:20a69333
+    Device UUID : 1b87acce:883de3fc:881f279e:e2b84a9b

-    Update Time : Mon Jun 29 21:05:55 2009
-       Checksum : 5a501eb5 - correct
-         Events : 320840
+    Update Time : Sun Jul  5 03:32:38 2009
+       Checksum : 94ba3ae4 - correct
+         Events : 0

         Layout : left-symmetric
     Chunk Size : 128K

-    Array Slot : 0 (failed, 1, 2, failed, failed, 4, 5)
-   Array State : _uu_uu 3 failed
+    Array Slot : 0 (0, 1, 2, 3, 4, 5)
+   Array State : Uuuuuu

Loopback devices

  • Created sparse loopback devices (first 50MB of each partition) to play with superblocks

    #!/bin/sh

    for i in b c d e f g
    do
            BLOCKS=$(sfdisk -s /dev/sd${i}3)                        # partition size in 1K blocks
            dd if=/dev/sd${i}3 of=isd${i}3 bs=512 count=102400      # copy the first 50MB (superblock area)
            dd if=/dev/zero of=isd${i}3 bs=1k seek=$BLOCKS count=0  # extend the sparse file to the full partition size
            losetup -f isd${i}3                                     # attach to the next free loop device
    done
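
  • The copies can then be inspected and experimented on without touching the real disks; for example (the md9 device name and loop numbering are assumptions):

    losetup -a                              # list the attached loop devices
    mdadm -E /dev/loop0                     # examine a copied superblock (cf. the diff above)
    # A trial array can be created over the copies instead of the real partitions, e.g.:
    mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md9 /dev/loop[0-5]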

Permute

  • Quick c++ to permute order of devices
  • Output space-separated, one permutation per line

permute-loop.cpp

  • Permute [012345]

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>

    using namespace std;

    int main(void)
    {
            vector<int> v;
            v.push_back(0);
            v.push_back(1);
            v.push_back(2);
            v.push_back(3);
            v.push_back(4);
            v.push_back(5);

            cout << "0 1 2 3 4 5" << endl; // initial permutation
            while (next_permutation(v.begin(), v.end())) {
                    // Loop until all permutations are generated.
                    copy(v.begin(), v.end(), ostream_iterator<int>(cout, " "));
                    cout << endl;
            }
            return 0;
    }

permute-real.cpp

  • Permute [bcdefg]

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>

    using namespace std;

    int main(void)
    {
            vector<char> v;
            v.push_back('b');
            v.push_back('c');
            v.push_back('d');
            v.push_back('e');
            v.push_back('f');
            v.push_back('g');

            cout << "b c d e f g" << endl; // initial permutation
            while (next_permutation(v.begin(), v.end())) {
                    // Loop until all permutations are generated.
                    copy(v.begin(), v.end(), ostream_iterator<char>(cout, " "));
                    cout << endl;
            }
            return 0;
    }

Compile

g++ -o permute-loop permute-loop.cpp
g++ -o permute-real permute-real.cpp
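
Sanity check: six devices give 6! = 720 orderings, so each program should print 720 lines.

./permute-loop | wc -l    # expect 720
./permute-real | wc -l    # expect 720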

Recovery script

#!/bin/sh
ECHO=   # set to echo to test
MDADM=mdadm-2.5.6 # old version for old superblock data_offset size
MD_DEV=md3
CRYPT_DEV=crypt-md3

./permute-real | while read b c d e f g
do
        echo /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3
        echo 'y' |
        $ECHO $MDADM -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/$MD_DEV /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3 &> /dev/null
        if (($? == 0))
        then
                sleep 0.3s
                $ECHO mdadm -o /dev/$MD_DEV
                if ($ECHO cryptsetup isLuks /dev/$MD_DEV )
                then
                        echo -n " LUKS "
                        echo "$PASSWORD" | 
                        if ($ECHO cryptsetup -T1 luksOpen /dev/$MD_DEV $CRYPT_DEV )
                        then
                                echo -n "  UNLOCKED "
                                if ( $ECHO mount -o ro /dev/mapper/$CRYPT_DEV mnt )
                                then
                                        echo -n " MOUNTED "
                                        $ECHO umount /dev/mapper/$CRYPT_DEV
                                fi
                                $ECHO cryptsetup  luksClose $CRYPT_DEV
                        fi
                fi
                sleep 0.3s
                $ECHO mdadm --stop /dev/$MD_DEV &> /dev/null
        fi
        echo ""
done
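
The script expects the LUKS passphrase in $PASSWORD, the old mdadm binary on the PATH, and a local mnt directory; setting ECHO=echo at the top gives a dry run. A possible invocation (the script and log file names are illustrative):

read -s PASSWORD && export PASSWORD     # LUKS passphrase, read without echoing
mkdir -p mnt                            # mount point used by the script
sh ./permute-assemble.sh 2>&1 | tee permute.log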

XFS Repair

  • XFS wouldn't mount read-only when there were errors, so the mount test in the script above was inconclusive.
  • Ran xfs_repair -n to determine which of the two probable configurations would need the fewest filesystem changes (see the sketch after this list).
  • Recreated correct configuration

    mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md3 /dev/sde3 /dev/sdd3 /dev/sdg3 /dev/sdf3 /dev/sdc3 /dev/sdb3
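
  • The xfs_repair -n comparison, roughly (the CANDIDATE_ORDER variable and the use of the reported problem count as the tie-breaker are assumptions):

    # For each remaining candidate order: recreate, open, dry-run repair, tear down.
    echo 'y' | mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md3 $CANDIDATE_ORDER
    echo "$PASSWORD" | cryptsetup -T1 luksOpen /dev/md3 crypt-md3
    xfs_repair -n /dev/mapper/crypt-md3 2>&1 | wc -l    # fewer reported problems wins
    cryptsetup luksClose crypt-md3
    mdadm --stop /dev/md3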

  • Ran an md check with the resync speed limited

    echo -n check > /sys/block/md3/md/sync_action
    echo -n 10000 > /proc/sys/dev/raid/speed_limit_max
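
  • Progress and any parity mismatches can be watched from procfs/sysfs while the check runs:

    cat /proc/mdstat                        # shows check progress and estimated finish time
    cat /sys/block/md3/md/mismatch_cnt      # non-zero means parity mismatches were found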

  • Opened the LUKS device and ran xfs_repair

xfs_repair /dev/mapper/crypt-md3

  • xfs_repair reported that the log needed to be replayed by mount/umounting, then rerunning xfs_repair

    mount /dev/mapper/crypt-md3 /mnt/md3
    umount /mnt/md3
    xfs_repair /dev/mapper/crypt-md3

  • Final mount

mount /mnt/md3
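
This relies on an /etc/fstab entry along these lines (the exact entry is an assumption):

/dev/mapper/crypt-md3   /mnt/md3   xfs   defaults   0 0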

Force Assemble?

  • Recover the array faster by forcing assembly: clear the faulty flag from enough disks to start it

    mdadm --assemble --force --scan /dev/md3

    mdadm: forcing event count in /dev/sdd3(2) from 5 upto 10
    mdadm: clearing FAULTY flag for device 3 in /dev/md3 for /dev/sdd3
    mdadm: /dev/md3 has been started with 5 drives (out of 6).

  • Mark as readonly

    mdadm -o /dev/md3

  • How do we forcibly re-add a failed drive?
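  • One likely answer, not tested here: --add puts the drive back as a spare and triggers a full rebuild, while --re-add avoids the full resync only when the array has a write-intent bitmap.

    mdadm /dev/md3 --add /dev/sdc3      # rebuilds onto the returned drive as a spare (full resync)
    cat /proc/mdstat                    # watch the recovery
    # mdadm /dev/md3 --re-add /dev/sdc3 would skip the resync only with a write-intent bitmap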