2009-06-29
Log Analysis
- Controller(?) timed out and sdc3 ejected
Jun 29 20:47:07 hastur kernel: ata11.00: failed to read SCR 1 (Emask=0x40)
Jun 29 20:48:49 hastur kernel: INFO: task md3_raid5:3352 blocked for more than 120 seconds
Jun 29 20:48:58 hastur kernel: ata11.02: hard resetting link
Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 2 (Emask=0x40)
Jun 29 20:48:58 hastur kernel: ata11.02: COMRESET failed (errno=-5)
Jun 29 20:48:58 hastur kernel: ata11.02: failed to read SCR 0 (Emask=0x40)
Jun 29 20:48:58 hastur kernel: ata11.02: reset failed, giving up
Jun 29 20:48:58 hastur kernel: ata11.02: failed to recover link after 3 tries, disabling
Jun 29 20:48:58 hastur kernel: ata11.02: disabled
Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
Jun 29 20:49:08 hastur kernel: ata11: EH complete
Jun 29 20:49:08 hastur kernel: sd 10:2:0:0: rejecting I/O to offline device
Jun 29 20:49:08 hastur kernel: raid5: Disk failure on sdc3, disabling device. Operation continuing on 5 devices
Jun 29 20:49:11 hastur kernel: ata11.02: detaching (SCSI 10:2:0:0)
- Hot-removed sdd3 from array after enclosure alarm
Jun 29 20:57:47 hastur kernel: ata11.03: disabled
Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: rejecting I/O to offline device
Jun 29 20:57:47 hastur kernel: sd 10:3:0:0: [sdd] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Jun 29 20:57:47 hastur kernel: end_request: I/O error, dev sdd, sector 404675832
Jun 29 20:57:47 hastur kernel: raid5:md3: read error not correctable (sector 298261272 on sdd3).
Jun 29 20:57:47 hastur kernel: raid5: Disk failure on sdd3, disabling device. Operation continuing on 4 devices
Jun 29 20:57:48 hastur kernel: ata11.03: detaching (SCSI 10:3:0:0)
Marked array as readonly
- Shut down and removed system for maintenance.
- On reboot disks were renumbered.
- Ran non-destructive read/write badblocks test on all disks (ALL CLEAN)
- Attempted to re-add failed disks to array.
- Somehow managed to rewrite superblocks on disks I was attempting to re-add.
data_offset
- Old version of mdadm created the original array with data offset of 136 sectors into each component device.
- Versions since mdadm-2.6 support a new bitmap feature which moves the data offset to 272 sectors.
- Hexediting the data offset and fixing the superblock checksum would be safe according to Neil Brown.
- In the end though I compiled from source the version of mdadm that was used to create the original array.
diff -u md3.sdb3.orig md3.sde3.new
--- md3.sdb3.orig 2009-07-07 10:14:25.000000000 +0100
+++ md3.sde3.new 2009-07-07 10:14:27.000000000 +0100
@@ -2,26 +2,26 @@
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
- Array UUID : 2b7ca9c9:c9fa9a28:086e0f83:90cbef62
+ Array UUID : 679bc68c:aeb0464c:8f11e607:c8e58161
Name : hastur:3 (local to host hastur)
- Creation Time : Thu Oct 18 14:46:47 2007
+ Creation Time : Sun Jul 5 03:32:38 2009
Raid Level : raid5
Raid Devices : 6
- Avail Dev Size : 870353369 (415.02 GiB 445.62 GB)
+ Avail Dev Size : 870353233 (415.02 GiB 445.62 GB)
Array Size : 4351765760 (2075.08 GiB 2228.10 GB)
Used Dev Size : 870353152 (415.02 GiB 445.62 GB)
- Data Offset : 136 sectors
+ Data Offset : 272 sectors
Super Offset : 8 sectors
State : clean
- Device UUID : c4983266:9ee820fd:106bbf9d:20a69333
+ Device UUID : 1b87acce:883de3fc:881f279e:e2b84a9b
- Update Time : Mon Jun 29 21:05:55 2009
- Checksum : 5a501eb5 - correct
- Events : 320840
+ Update Time : Sun Jul 5 03:32:38 2009
+ Checksum : 94ba3ae4 - correct
+ Events : 0
Layout : left-symmetric
Chunk Size : 128K
- Array Slot : 0 (failed, 1, 2, failed, failed, 4, 5)
- Array State : _uu_uu 3 failed
+ Array Slot : 0 (0, 1, 2, 3, 4, 5)
+ Array State : Uuuuuu
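The sizes in the diff can be cross-checked with a little shell arithmetic. This is only a sketch: it assumes that Avail Dev Size, Used Dev Size, Array Size and Data Offset are all in 512-byte sectors, and that Avail Dev Size is what remains of the partition after the data offset. The variable names are mine, not mdadm's.

```bash
#!/bin/bash
# Values copied from the diff above (assumed to be 512-byte sectors).
old_avail=870353369; old_offset=136
new_avail=870353233; new_offset=272
used=870353152
raid_devices=6

# Partition size implied by each superblock: available size plus data offset.
echo "old total: $((old_avail + old_offset)) sectors"
echo "new total: $((new_avail + new_offset)) sectors"

# How far the rewritten superblock moves the start of the data area.
echo "shift: $(( (new_offset - old_offset) * 512 )) bytes"

# RAID5 keeps one device's worth of parity, so capacity is (n - 1) * used size.
echo "array size: $(( (raid_devices - 1) * used )) sectors"
```

Both superblocks imply the same partition size, and the last line matches the Array Size field. The data start moves by 136 sectors (68 KiB), which is less than one 128K chunk, so every chunk boundary is misplaced when the array is read with the new offset.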
Loopback devices
- Created sparse loopback devices (first 50MB of each partition) to play with superblocks
#!/bin/bash
# bash rather than sh: {b..g} brace expansion is not POSIX
for i in {b..g}
do
    BLOCKS=$(sfdisk -s /dev/sd${i}3) # partition size in 1K blocks
    dd if=/dev/sd${i}3 of=isd${i}3 bs=512 count=102400 # copy first 50MB
    dd if=/dev/zero of=isd${i}3 bs=1k seek=$BLOCKS count=0 # extend sparsely to full partition size
    losetup -f isd${i}3
done
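The `dd ... seek=N count=0` step is what makes the image files sparse: it extends the file to the full partition size without writing any data and without discarding what was already copied. A minimal sketch that can be run without root, using a temporary file and a made-up 1MB size in place of a real partition (`make_sparse` is a hypothetical helper name; behaviour assumed to be that of GNU dd):

```bash
#!/bin/bash
# make_sparse FILE SIZE_KB: extend FILE to SIZE_KB kilobytes without writing data.
make_sparse() {
    dd if=/dev/zero of="$1" bs=1k seek="$2" count=0 2>/dev/null
}

img=$(mktemp)
printf 'header' > "$img"   # stands in for the copied first 50MB
make_sparse "$img" 1024    # hypothetical 1MB "partition"
wc -c < "$img"             # apparent size in bytes
head -c 6 "$img"; echo     # leading data survives the extension
du -k "$img"               # blocks actually allocated (much smaller on most filesystems)
rm -f "$img"
```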
Permute
- Quick c++ to permute order of devices
- Output space-separated, one permutation per line
permute-loop.cpp
- Permute [012345]
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
int main(void)
{
    vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
    v.push_back(5);
    cout << "0 1 2 3 4 5" << endl; // initial
    while (next_permutation(v.begin(), v.end())) { // Loop until all permutations are generated.
        copy(v.begin(), v.end(), ostream_iterator<int>(cout, " "));
        cout << endl;
    }
    return 0;
}
permute-real.cpp
- Permute [bcdefg]
#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>
using namespace std;
int main(void)
{
    vector<char> v;
    v.push_back('b');
    v.push_back('c');
    v.push_back('d');
    v.push_back('e');
    v.push_back('f');
    v.push_back('g');
    cout << "b c d e f g" << endl; // initial
    while (next_permutation(v.begin(), v.end())) { // Loop until all permutations are generated.
        copy(v.begin(), v.end(), ostream_iterator<char>(cout, " "));
        cout << endl;
    }
    return 0;
}
Compile
g++ -o permute-loop permute-loop.cpp
g++ -o permute-real permute-real.cpp
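If no compiler is to hand, the same list can be produced in bash alone. This is an alternative sketch, not what was used for the recovery: `permute` is a hypothetical recursive helper that picks each remaining item in turn and recurses on the rest. With sorted input it emits the orderings in the same lexicographic order as `next_permutation`.

```bash
#!/bin/bash
# permute PREFIX ITEMS...: print every ordering of ITEMS, one per line,
# space-separated, each preceded by the already-chosen PREFIX.
permute() {
    local prefix=$1; shift
    if (($# == 0)); then
        echo "${prefix# }"   # strip the leading space added on the first pick
        return
    fi
    local i
    for ((i = 1; i <= $#; i++)); do
        # Choose the i-th remaining item, then permute everything else.
        permute "$prefix ${!i}" "${@:1:i-1}" "${@:i+1}"
    done
}

permute "" b c d e f g | wc -l   # 6! = 720 orderings
```

It is much slower than the compiled version, but for six devices that does not matter.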
Recovery script
#!/bin/bash
# bash rather than sh: uses &> and (( ))
# PASSWORD (the LUKS passphrase) is not set here; it is read from the environment
ECHO= # set to echo to test
MDADM=mdadm-2.5.6 # old version for old superblock data_offset size
MD_DEV=md3
CRYPT_DEV=crypt-md3
./permute-real | while read b c d e f g
do
    echo /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3
    # answer 'y' to mdadm's "Continue creating array?" prompt
    echo 'y' |
    $ECHO $MDADM -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/$MD_DEV /dev/sd${b}3 /dev/sd${c}3 /dev/sd${d}3 /dev/sd${e}3 /dev/sd${f}3 /dev/sd${g}3 &> /dev/null
    if (($? == 0))
    then
        sleep 0.3s
        $ECHO mdadm -o /dev/$MD_DEV # read-only, so nothing is written while testing
        if ($ECHO cryptsetup isLuks /dev/$MD_DEV )
        then
            echo -n " LUKS "
            echo "$PASSWORD" |
            if ($ECHO cryptsetup -T1 luksOpen /dev/$MD_DEV $CRYPT_DEV )
            then
                echo -n " UNLOCKED "
                if ( $ECHO mount -o ro /dev/mapper/$CRYPT_DEV mnt )
                then
                    echo -n " MOUNTED "
                    $ECHO umount /dev/mapper/$CRYPT_DEV
                fi
                $ECHO cryptsetup luksClose $CRYPT_DEV
            fi
        fi
        sleep 0.3s
        $ECHO mdadm --stop /dev/$MD_DEV &> /dev/null
    fi
    echo ""
done
XFS Repair
- XFS wouldn't mount read-only if there were errors. (So the script was inconclusive).
- Ran xfs_repair -n to determine which (of the two probable) configurations would need the fewest filesystem changes.
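Comparing the two dry runs can be done roughly by saving each `xfs_repair -n` output to a file and seeing which is shorter. The sketch below assumes the number of output lines is a fair proxy for the number of changes, which is crude; `fewest_lines` and the log file names are hypothetical.

```bash
#!/bin/bash
# fewest_lines FILE...: print the name of the file with the fewest lines.
fewest_lines() {
    local f n best="" best_n=-1
    for f in "$@"; do
        n=$(wc -l < "$f")
        if ((best_n < 0 || n < best_n)); then
            best=$f
            best_n=$n
        fi
    done
    echo "$best"
}

# Real usage would be along these lines, once per candidate device order:
#   xfs_repair -n /dev/mapper/crypt-md3 > order-A.log 2>&1
#   fewest_lines order-A.log order-B.log
# Demo with stand-in files so the sketch runs without an array:
tmp=$(mktemp -d)
printf 'line\n%.0s' 1 2 3 4 5 > "$tmp/order-A.log"
printf 'line\n%.0s' 1 2 > "$tmp/order-B.log"
fewest_lines "$tmp/order-A.log" "$tmp/order-B.log"
rm -rf "$tmp"
```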
- Recreated correct configuration
mdadm-2.5.6 -C --assume-clean -f -e 1.2 -l 5 -p ls -c 128 -n6 /dev/md3 /dev/sde3 /dev/sdd3 /dev/sdg3 /dev/sdf3 /dev/sdc3 /dev/sdb3
- Run mdadm check, speed limit
echo -n check > /sys/block/md3/md/sync_action
echo -n 10000 > /proc/sys/dev/raid/speed_limit_max
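While the check runs, and after it finishes, the md sysfs directory shows what the array is doing and how many mismatches were found. A small sketch that reads `sync_action` and `mismatch_cnt` from a directory passed as an argument (`check_status` is a hypothetical helper; on the real system the directory would be /sys/block/md3/md):

```bash
#!/bin/bash
# check_status MD_SYSFS_DIR: print the current sync action and mismatch count.
check_status() {
    local dir=$1
    echo "sync_action=$(cat "$dir/sync_action") mismatch_cnt=$(cat "$dir/mismatch_cnt")"
}

# Real usage: check_status /sys/block/md3/md
# Demo against a stand-in directory so the sketch runs without an array:
fake=$(mktemp -d)
echo idle > "$fake/sync_action"
echo 0 > "$fake/mismatch_cnt"
check_status "$fake"
rm -rf "$fake"
```

`sync_action` reads `check` while the pass is in progress and `idle` once it is done. Progress itself is visible in /proc/mdstat.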
Open, mount and unmount XFS, xfs_repair
xfs_repair /dev/mapper/crypt-md3
- xfs_repair reported that the log needed to be replayed by mount/umounting, then rerunning xfs_repair
mount /dev/mapper/crypt-md3 /mnt/md3
umount /mnt/md3
xfs_repair /dev/mapper/crypt-md3
Final mount
mount /mnt/md3
Force Assemble?
- Recover array faster by forcing assemble: clear failed flag from enough disks to assemble
mdadm --assemble --force --scan /dev/md3
mdadm: forcing event count in /dev/sdd3(2) from 5 upto 10
mdadm: clearing FAULTY flag for device 3 in /dev/md3 for /dev/sdd3
mdadm: /dev/md3 has been started with 5 drives (out of 6).
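Before forcing an assemble it helps to see how far apart the event counts of the members are, since that is what mdadm overrides. A sketch that pulls the Events field out of `mdadm --examine` output; `events_of` is a hypothetical helper, and the sample text is taken from the superblock dump earlier on this page.

```bash
#!/bin/bash
# events_of: read `mdadm --examine` output on stdin, print the Events value.
events_of() {
    awk -F' *: *' '$1 ~ /^ *Events$/ { print $2 }'
}

# Real usage, per member (needs root):
#   for d in /dev/sd[b-g]3; do echo "$d $(mdadm --examine "$d" | events_of)"; done
# Demo on sample text so the sketch runs without the disks:
sample='    Update Time : Mon Jun 29 21:05:55 2009
       Checksum : 5a501eb5 - correct
         Events : 320840'
printf '%s\n' "$sample" | events_of
```

Members whose counts are only slightly behind the rest are the ones a forced assemble is likely to bring back with little stale data.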
- Mark as readonly
mdadm -o /dev/md3
How do we forcibly re-add a failed drive?