Fedora uses new mdadm command for SW RAID

Updated 20-October-2004, Tim Fredrick
When I rebuilt diskserver1 with Fedora, I did not have a copy of the old /etc/raidtab file. However, the rebuilt system recognized the RAID device and came up with /dev/md0 and the /pr1 filesystem intact.

My problem was that one of the disks had changed position in the enclosure to make room for a new CD-ROM drive, so the array showed up in /proc/mdstat as degraded.

It turns out that Fedora Linux now uses "mdadm" to manage its RAID filesystems, and doesn't really need /etc/raidtab any more. In fact, I never found a way to regenerate the file.

A helpful article at http://www.networknewz.com/2003/0113.html described mdadm and all that you can do with it in detail. The resulting /dev/md0 configuration always uses persistent superblocks, so the RAID configuration is recognized by mdadm at reboot.

Adding a new /dev/hdf1 partition to the existing RAID-5 configuration with mdadm was simple: mdadm /dev/md0 --add /dev/hdf1. Once I did that, I entered "cat /proc/mdstat" and got:


Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 hdf1[3] hdc1[1] hdd1[2]
      240107264 blocks level 5, 32k chunk, algorithm 0 [3/2] [_UU]
      [>....................]  recovery =  4.1% (4968188/120053632) finish=193.1min speed=9929K/sec
unused devices: <none>


showing the RAID being rebuilt with the new disk in place. (In fact, the old hdd1, which had been considered "failed" because the partition had been moved and no longer existed as /dev/hdd1, had been removed from the RAID configuration.)
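
The degraded state is visible in the bracketed status string: an underscore, as in [_UU], marks a missing member, while [UUU] would mean all three are present. As a minimal sketch of how that check could be scripted (not part of the original procedure), the function below prints the name of every degraded array. The name degraded_arrays is made up for illustration, and the sketch assumes arrays are named mdN as in the output above.

```shell
#!/bin/sh
# Print the name of every md array whose status string (e.g. [_UU])
# contains an underscore, meaning at least one member is missing.
# Reads mdstat-formatted text from the file given as $1
# (defaults to /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array named on the "mdN :" line
        name != "" && /\[[U_]+\]/ {          # status line belonging to that array
            if (match($0, /\[[U_]+\]/) && index(substr($0, RSTART, RLENGTH), "_"))
                print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat; for the output shown above it would print md0.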

Monitoring

I had been depending on a home-grown script to monitor software RAIDs, but mdadm has monitoring built in. The following command can be added to /etc/rc.local; it polls /dev/md0 every 300 seconds and mails the given address when it detects a problem:


nohup mdadm --monitor --mail=sysadmin@acd.ucar.edu --delay=300 /dev/md0 &



The /etc/mdadm.conf file

The /etc/raidtab file should not exist; if it does, it will confuse mdadm when it brings up a software RAID.

To preserve the RAID configuration, record it in /etc/mdadm.conf. Once a RAID has been built, run this command to generate the entry for /etc/mdadm.conf:

/home/fredrick> mdadm --detail --scan
ARRAY /dev/md0 level=raid5 num-devices=6 UUID=1ba7e95a:24d106f6:7756543e:82c7b110
   devices=/dev/hde1,/dev/hdf1,/dev/hdi1,/dev/hdj1,/dev/hdk1,/dev/hdl1

Add the entry as a single line (without line breaks) to /etc/mdadm.conf. While mdadm doesn't require this file to locate and assemble a functioning array, it is invaluable for troubleshooting when things go wrong.
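
One way to automate this step is sketched below, under stated assumptions: the helper names join_continuations and add_array_lines are hypothetical, and the input is assumed to look like the "mdadm --detail --scan" output shown above. The sketch joins the indented devices= continuation onto its ARRAY line and appends the entry only if its UUID is not already in the file, so it can be re-run safely.

```shell
#!/bin/sh
# Join continuation lines (those starting with whitespace) onto the
# previous line, so each ARRAY entry ends up on a single line.
join_continuations() {
    awk '
        /^[ \t]/ { sub(/^[ \t]+/, ""); buf = buf " " $0; next }
        { if (buf != "") print buf; buf = $0 }
        END { if (buf != "") print buf }
    '
}

# Read "mdadm --detail --scan" output on stdin and append each ARRAY
# entry to the config file $1, unless its UUID is already listed there.
add_array_lines() {
    conf=$1
    touch "$conf" || return 1
    join_continuations | while IFS= read -r line; do
        case $line in
            ARRAY*) ;;          # only ARRAY entries are of interest
            *) continue ;;
        esac
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([0-9a-fA-F:]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        grep -q "UUID=$uuid" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}
```

A typical call, run as root, might be: mdadm --detail --scan | add_array_lines /etc/mdadm.conf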

Troubleshooting

The problem that I found was that two disks had not come up properly, perhaps due to a cabling issue. The commands that helped me rebuild the RAID configuration are below. Note that mdadm --create writes new superblocks, so the existing data survives only when the level, device count, and device order match the original array:
sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=6 /dev/hde1 /dev/hdf1 /dev/hdi1 /dev/hdj1 /dev/hdk1 /dev/hdl1
sudo mount /dev/md0 /mnt
I may also have caused the problem with an errant /etc/raidtab file. For the record, /etc/raidtab should not exist.
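
Before re-creating an array, it may be worth trying a plain assemble, which reads the existing superblocks rather than rewriting them. As a hedged sketch, the helper below (assemble_cmd is a hypothetical name) prints, but does not run, an "mdadm --assemble" command built from the devices= list saved in /etc/mdadm.conf. It assumes the ARRAY entry sits on one line and carries a devices= keyword, as in the entry shown earlier.

```shell
#!/bin/sh
# Print (do not run) an "mdadm --assemble" command for the array $2,
# built from the devices= list of its ARRAY entry in the config file $1.
assemble_cmd() {
    awk -v md="$2" '
        $1 == "ARRAY" && $2 == md {
            for (i = 3; i <= NF; i++)
                if ($i ~ /^devices=/) {
                    devs = substr($i, 9)     # drop the "devices=" prefix
                    gsub(/,/, " ", devs)     # commas become spaces
                    print "mdadm --assemble " md " " devs
                }
        }
    ' "$1"
}
```

Running assemble_cmd /etc/mdadm.conf /dev/md0 lets you review the command and the device order before executing anything.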

When a RAID is degraded but you think the disk is okay

On nasbackup we had a problem after a reboot where the RAID was marked degraded (as seen by "cat /proc/mdstat"), but examination of the device (/dev/hde in our case) didn't reveal any obvious problem. I went ahead and hot-added the partition, which caused the RAID to start rebuilding (preserving its data). The commands were:
1      cat /proc/mdstat
2      sudo mdadm --examine /dev/hde1
3      sudo mdadm /dev/md0 -a /dev/hde1
4      cat /proc/mdstat

Command 1 shows that the RAID is degraded. Command 2 examines the status of the partition that dropped out of our 3-disk RAID Level 5 configuration. Command 3 is the "hot-add" command that returns the partition to the array. Command 4 shows that a rebuild is in progress (it can take up to 24 hours in our case).
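
Since a rebuild can run for many hours, it can help to pull just the progress figures out of /proc/mdstat. The following is a minimal sketch, assuming the recovery line has the same layout as the mdstat output shown earlier; the function name rebuild_progress is made up for illustration. It prints the percentage complete and the estimated minutes remaining for the first recovery or resync line it finds.

```shell
#!/bin/sh
# Print "<percent> <minutes-to-finish>" for the first recovery/resync
# line in mdstat-formatted text read from the file $1
# (defaults to /proc/mdstat). Prints nothing if no rebuild is running.
rebuild_progress() {
    awk '
        /recovery|resync/ && match($0, /[0-9.]+%/) {
            pct = substr($0, RSTART, RLENGTH - 1)            # strip the trailing %
            eta = ""
            if (match($0, /finish=[0-9.]+min/))
                eta = substr($0, RSTART + 7, RLENGTH - 10)   # between "finish=" and "min"
            print pct, eta
            exit
        }
    ' "${1:-/proc/mdstat}"
}
```

Something like "while sleep 300; do rebuild_progress; done" would then log the progress every five minutes.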

References