2009-06-29

Log Analysis

  • Controller(?) timed out and sdc3 ejected

    Jun 29 20:47:07 hastur kernel: ata11.00: failed to read SCR 1 (Emask=0x40)
    Jun 29 20:48:49 hastur kernel: INFO: task md3_raid5:3352 blocked for more than 120 seconds
    Jun 29 20:48:58 hastur kernel: ata11.02: hard resetting link
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: COMRESET failed (errno=-5)
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 0 (Emask=0x40)
    Jun 29 20:48:58 hastur kernel: ata11.02: reset failed, giving up
    Jun 29 20:48:58 hastur kernel: ata11.02: failed to recover link after 3 tries, disabling
    Jun 29 20:48:58 hastur kernel: ata11.02: disabled
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: ata11: EH complete
    Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
    Jun 29 20:49:08 hastur kernel: raid5: Disk failure on sdc3, disabling device. Operation continuing on 5 devices
    Jun 29 20:49:11 hastur kernel: ata11.02: detaching (SCSI 10:2:0:0)

  • Hot-removed sdd3 from array after enclosure alarm

    Jun 29 20:57:47 hastur kernel: ata11.03: disabled
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
    Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
    Jun 29 20:57:47 hastur kernel: end_request: I/O error, dev sdd, sector 404675832
    Jun 29 20:57:47 hastur kernel: raid5:md3: read error not correctable (sector 298261272 on sdd3).
    Jun 29 20:57:47 hastur kernel: raid5: Disk failure on sdd3, disabling device. Operation continuing on 4 devices
    Jun 29 20:57:48 hastur kernel: ata11.03: detaching (SCSI 10:3:0:0)

  • Marked array as readonly

  • Shut down and removed the system for maintenance.
  • On reboot the disks were renumbered.
  • Ran a non-destructive read/write badblocks test on all disks (ALL CLEAN; see the sketch after this list).
  • Attempted to re-add failed disks to array.
  • Somehow managed to rewrite superblocks on disks I was attempting to re-add.
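  • Roughly, the surface test and re-add attempt looked like the following (the exact invocations weren't recorded, so treat this as a sketch; device letters are illustrative after the renumbering):

    # -n: non-destructive read/write test, -s: show progress, -v: verbose
    for d in b c d e f g
    do
            badblocks -nsv /dev/sd${d}
    done
    # The re-add attempt, approximately:
    mdadm /dev/md3 --re-add /dev/sdc3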

data_offset

  • The old version of mdadm created the original array with a data offset of 136 sectors into each component device.
  • Versions since mdadm-2.6 support a new bitmap feature which moves the default data offset to 272 sectors.
  • According to Neil Brown, hex-editing the data offset and fixing the superblock checksum would be safe.
  • In the end, though, I compiled from source the version of mdadm (mdadm-2.5.6, as used in the scripts below) that was used to create the original array; see the build sketch below.
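
  • A sketch of that build, assuming the tarball is still available from the kernel.org mdadm archive (the URL and install path are assumptions):

    wget http://www.kernel.org/pub/linux/utils/raid/mdadm/mdadm-2.5.6.tar.gz
    tar xzf mdadm-2.5.6.tar.gz
    cd mdadm-2.5.6 && make
    cp mdadm /usr/local/sbin/mdadm-2.5.6    # installed under a versioned name, as used below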

diff -u md3.sdb3.orig md3.sde3.new

--- md3.sdb3.orig       2009-07-07 10:14:25.000000000 +0100
+++ md3.sde3.new        2009-07-07 10:14:27.000000000 +0100
@@ -2,26 +2,26 @@
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
-     Array UUID : 2b7ca9c9:c9fa9a28:086e0f83:90cbef62
+     Array UUID : 679bc68c:aeb0464c:8f11e607:c8e58161
           Name : hastur:3  (local to host hastur)
-  Creation Time : Thu Oct 18 14:46:47 2007
+  Creation Time : Sun Jul  5 03:32:38 2009
     Raid Level : raid5
   Raid Devices : 6

- Avail Dev Size : 870353369 (415.02 GiB 445.62 GB)
+ Avail Dev Size : 870353233 (415.02 GiB 445.62 GB)
     Array Size : 4351765760 (2075.08 GiB 2228.10 GB)
  Used Dev Size : 870353152 (415.02 GiB 445.62 GB)
-    Data Offset : 136 sectors
+    Data Offset : 272 sectors
   Super Offset : 8 sectors
          State : clean
-    Device UUID : c4983266:9ee820fd:106bbf9d:20a69333
+    Device UUID : 1b87acce:883de3fc:881f279e:e2b84a9b

-    Update Time : Mon Jun 29 21:05:55 2009
-       Checksum : 5a501eb5 - correct
-         Events : 320840
+    Update Time : Sun Jul  5 03:32:38 2009
+       Checksum : 94ba3ae4 - correct
+         Events : 0

         Layout : left-symmetric
     Chunk Size : 128K

-    Array Slot : 0 (failed, 1, 2, failed, failed, 4, 5)
-   Array State : _uu_uu 3 failed
+    Array Slot : 0 (0, 1, 2, 3, 4, 5)
+   Array State : Uuuuuu

Loopback devices

  • Created sparse loopback devices (first 50MB of each partition) to play with superblocks

    #!/bin/sh

    for i in b c d e f g
    do
            BLOCKS=$(sfdisk -s /dev/sd${i}3)                        # partition size in 1K blocks
            dd if=/dev/sd${i}3 of=isd${i}3 bs=512 count=102400      # copy the first 50MB (superblock area)
            dd if=/dev/zero of=isd${i}3 bs=1k seek=$BLOCKS count=0  # extend the sparse file to the full partition size
            losetup -f isd${i}3                                     # attach to the next free loop device
    done
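
  • The copies can then be inspected and experimented on without touching the real disks; for example (the md9 device name and loop numbering are assumptions):

    losetup -a                              # list the attached loop devices
    mdadm -E /dev/loop0                     # examine a copied superblock (cf. the diff above)
    # A trial array can be created over the copies instead of the real partitions, e.g.:
    mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md9 /dev/loop[0-5]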

Permute

  • Quick c++ to permute order of devices
  • Output space-separated, one permutation per line

permute-loop.cpp

  • Permute [012345]

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>

    using namespace std;

    int main(void)
    {
            vector<int> v;
            v.push_back(0);
            v.push_back(1);
            v.push_back(2);
            v.push_back(3);
            v.push_back(4);
            v.push_back(5);

            cout << "0 1 2 3 4 5" << endl; // initial permutation
            while (next_permutation(v.begin(), v.end())) {
                    // Loop until all permutations are generated.
                    copy(v.begin(), v.end(), ostream_iterator<int>(cout, " "));
                    cout << endl;
            }
            return 0;
    }

permute-real.cpp

  • Permute [bcdefg]

    #include <iostream>
    #include <vector>
    #include <algorithm>
    #include <iterator>

    using namespace std;

    int main(void)
    {
            vector<char> v;
            v.push_back('b');
            v.push_back('c');
            v.push_back('d');
            v.push_back('e');
            v.push_back('f');
            v.push_back('g');

            cout << "b c d e f g" << endl; // initial permutation
            while (next_permutation(v.begin(), v.end())) {
                    // Loop until all permutations are generated.
                    copy(v.begin(), v.end(), ostream_iterator<char>(cout, " "));
                    cout << endl;
            }
            return 0;
    }

Compile

g++ -o permute-loop permute-loop.cpp
g++ -o permute-real permute-real.cpp
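
Sanity check: six devices give 6! = 720 orderings, so each program should print 720 lines.

./permute-loop | wc -l    # expect 720
./permute-real | wc -l    # expect 720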

Recovery script

#!/bin/sh
ECHO=   # set to echo to test
MDADM=mdadm-2.5.6 # old version for old superblock data_offset size
MD_DEV=md3
CRYPT_DEV=crypt-md3

./permute-real | while read b c d e f g
do
        echo /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3
        echo 'y' |
        $ECHO $MDADM -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/$MD_DEV /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3 &> /dev/null
        if (($? == 0))
        then
                sleep 0.3s
                $ECHO mdadm -o /dev/$MD_DEV
                if ($ECHO cryptsetup isLuks /dev/$MD_DEV )
                then
                        echo -n " LUKS "
                        echo "$PASSWORD" | 
                        if ($ECHO cryptsetup -T1 luksOpen /dev/$MD_DEV $CRYPT_DEV )
                        then
                                echo -n "  UNLOCKED "
                                if ( $ECHO mount -o ro /dev/mapper/$CRYPT_DEV mnt )
                                then
                                        echo -n " MOUNTED "
                                        $ECHO umount /dev/mapper/$CRYPT_DEV
                                fi
                                $ECHO cryptsetup  luksClose $CRYPT_DEV
                        fi
                fi
                sleep 0.3s
                $ECHO mdadm --stop /dev/$MD_DEV &> /dev/null
        fi
        echo ""
done
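
The script expects the LUKS passphrase in $PASSWORD, the old mdadm binary on the PATH, and a local mnt directory; setting ECHO=echo at the top gives a dry run. A possible invocation (the script and log file names are illustrative):

read -s PASSWORD && export PASSWORD     # LUKS passphrase, read without echoing
mkdir -p mnt                            # mount point used by the script
sh ./permute-assemble.sh 2>&1 | tee permute.log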

XFS Repair

  • XFS wouldn't mount read-only when there were errors, so the mount test in the script above was inconclusive.
  • Ran xfs_repair -n to determine which of the two probable configurations would need the fewest filesystem changes (see the sketch after this list).
  • Recreated correct configuration

    mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md3 /dev/sde3 /dev/sdd3 /dev/sdg3 /dev/sdf3 /dev/sdc3 /dev/sdb3
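
  • The xfs_repair -n comparison, roughly (the CANDIDATE_ORDER variable and the use of the reported problem count as the tie-breaker are assumptions):

    # For each remaining candidate order: recreate, open, dry-run repair, tear down.
    echo 'y' | mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md3 $CANDIDATE_ORDER
    echo "$PASSWORD" | cryptsetup -T1 luksOpen /dev/md3 crypt-md3
    xfs_repair -n /dev/mapper/crypt-md3 2>&1 | wc -l    # fewer reported problems wins
    cryptsetup luksClose crypt-md3
    mdadm --stop /dev/md3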

  • Ran an md check with the resync speed limited

    echo -n check > /sys/block/md3/md/sync_action
    echo -n 10000 > /proc/sys/dev/raid/speed_limit_max
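
  • Progress and any parity mismatches can be watched from procfs/sysfs while the check runs:

    cat /proc/mdstat                        # shows check progress and estimated finish time
    cat /sys/block/md3/md/mismatch_cnt      # non-zero means parity mismatches were found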

  • Opened the LUKS device and ran xfs_repair

xfs_repair /dev/mapper/crypt-md3

  • xfs_repair reported that the log needed to be replayed by mount/umounting, then rerunning xfs_repair

    mount /dev/mapper/crypt-md3 /mnt/md3
    umount /mnt/md3
    xfs_repair /dev/mapper/crypt-md3

  • Final mount

mount /mnt/md3
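
This relies on an /etc/fstab entry along these lines (the exact entry is an assumption):

/dev/mapper/crypt-md3   /mnt/md3   xfs   defaults   0 0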

Force Assemble?

  • Recover the array faster by forcing assembly: clear the faulty flag from enough disks to start it

    mdadm --assemble --force --scan /dev/md3

    mdadm: forcing event count in /dev/sdd3(2) from 5 upto 10
    mdadm: clearing FAULTY flag for device 3 in /dev/md3 for /dev/sdd3
    mdadm: /dev/md3 has been started with 5 drives (out of 6).

  • Mark as readonly

    mdadm -o /dev/md3

  • How do we forcibly re-add a failed drive?
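  • One likely answer, not tested here: --add puts the drive back as a spare and triggers a full rebuild, while --re-add avoids the full resync only when the array has a write-intent bitmap.

    mdadm /dev/md3 --add /dev/sdc3      # rebuilds onto the returned drive as a spare (full resync)
    cat /proc/mdstat                    # watch the recovery
    # mdadm /dev/md3 --re-add /dev/sdc3 would skip the resync only with a write-intent bitmap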