Updated: 7/10/2005
Disclaimer: These notes are a work in progress, written primarily for myself as I try to understand the process of implementing a Coraid EtherDrive. If you happen across these notes, treat any information contained herein with the appropriate distrust. Neither I nor my organization is responsible for errors in this document or their consequences.

Implementing a Coraid Etherdrive Storage Blade unit

We purchased an Etherdrive unit with 10 Parallel-ATA drive bays in order to provide disk storage for disk-to-disk backups and for supplemental storage for our web server.

For disk-to-disk backups, we use a daily, weekly, and monthly backup scheme and do not really need RAID storage at this time.

To connect the Etherdrive unit, I chose to create a separate LAN independent of our office LAN. I chose the 3COM 4226T switch because it meets CORAID's suggestion that the switch have 802.3ad (port aggregation or LACP) and 802.3x (flow control). The price was reasonable, and the switch has 24 ports in total, enough to dedicate a port to each blade and a port to each server even if we end up with a 1:1 correspondence between servers and blades.

We're starting with Fedora Core Linux 5, so I downloaded the 'aoe-2.6-31.tar.gz' from the CORAID support site.

The AOE drivers for Linux

To install AOE for our Fedora 2.6 kernel (SMP):
  1. sudo yum install kernel-smp-devel
  2. For kernel 2.6.17-1.2157_FC5smp I needed the following steps:
    • cd /lib/modules/2.6.17-1.2157_FC5smp
    • ln -sf ../../../usr/src/kernels/2.6.17-1.2157_FC5-smp-i686 build
  3. tar xzpvf aoe-2.6-31.tar.gz
  4. cd aoe-2.6-31
  5. I needed to fix the udev-install.sh script by adding the following line to fix the value of "rules_d"
    rules_d=/etc/udev/rules.d
    
  6. make -- The Makefile builds an aoe module for the currently-running kernel. The kernel sources in /lib/modules/`uname -r`/build are used. Enough drivers are made for 10 etherdrive blade shelves.
  7. sudo make install
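
Step 2 matters because the Makefile looks for kernel sources through the build link. The following is a minimal sketch of a check to run before make, assuming the directory layout described above. The function names are hypothetical and are not part of the CORAID package.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the CORAID driver package.

# Print the directory where the aoe Makefile expects kernel sources
# for a given kernel release (default: the running kernel).
kernel_build_dir() {
    release="${1:-$(uname -r)}"
    printf '/lib/modules/%s/build\n' "$release"
}

# Succeed only if that path resolves to an existing directory.
have_kernel_build_dir() {
    [ -d "$(kernel_build_dir "${1:-}")" ]
}

# Report the state of the build link before running make.
if have_kernel_build_dir; then
    echo "kernel build directory found: $(kernel_build_dir)"
else
    echo "missing $(kernel_build_dir); install the kernel devel package or fix the symlink"
fi
```

If the check fails, the symlink from step 2 is the first thing to look at.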

The AOEtools

The AOEtools are included in the aoe-2.6-31.tar.gz tar file. The tools are:
aoe-discover     trigger discovery of ATA over Ethernet devices
aoe-interfaces   restrict network interfaces used for AoE
aoe-mkdevs       create character and block device files
aoe-mkshelf      create block device files for one shelf address
aoe-stat         print status information for AoE devices
aoeping          simple userland communication with AoE devices
To install the AOEtools:
  1. tar xzvf aoetools-4.tar.gz
  2. cd aoetools-4
  3. make
  4. sudo make install -- the default install locations are in 'Makefile' -- I didn't change anything.

The Hardware

I populated the Etherdrive chassis with 10 200GB drives and placed it in our rack alongside the 3COM switch. I used the bottom row of switch connections for the drive shelves and the top row to connect to other machines.

I also used the switch for mailserver3 to send a heartbeat signal to mailserver4 and vice-versa, so this was a good test that the switch worked.

The Etherdrive unit itself needs a shelf address. This is set with DIP switches on the back of the Etherdrive unit. Since this is our only unit, I set the shelf address to 0. Shelf addresses allow multiple Etherdrive units to be connected to the same switch and LAN, and the shelf address is also part of the name of the /dev/etherd/ block device. For example, /dev/etherd/e0.1 refers to the blade in slot 1 on shelf 0.
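
The naming scheme can be sketched as a small shell function. The helper name aoe_dev_path is hypothetical; the path format follows the device and partition names used in these notes (for example e0.1 and e0.1p2).

```shell
# Hypothetical helper: build an AoE block device path from a shelf address,
# a slot number, and an optional partition number.
aoe_dev_path() {
    shelf="$1"
    slot="$2"
    part="${3:-}"
    dev="/dev/etherd/e${shelf}.${slot}"
    if [ -n "$part" ]; then
        dev="${dev}p${part}"
    fi
    printf '%s\n' "$dev"
}

aoe_dev_path 0 1      # whole device in slot 1 of shelf 0
aoe_dev_path 0 1 2    # partition 2 on that device
```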

Setting up a partition

I installed the etherdrive software on mailserver4 (running Fedora Core 3) and rebooted. The devices should show up in /dev/etherd. If they don't, make sure the following lines are in /etc/modprobe.conf.
alias block-major-152 aoe
alias char-major-152 aoe
If these lines are missing, it may not be a big deal -- Fedora Core 3 and 4 use 'udev' to create device nodes dynamically. The file /etc/udev/rules.d/60-aoe.rules should have been installed. The file looks like this:
# These rules tell udev what device nodes to create for aoe support.
# They may be installed along the following lines (adjusted to what
# you see on your system).
#
#   ecashin@makki ~$ su
#   Password:
#   bash# find /etc -type f -name udev.conf
#   /etc/udev/udev.conf
#   bash# grep udev_rules= /etc/udev/udev.conf
#   udev_rules="/etc/udev/rules.d/"
#   bash# ls /etc/udev/rules.d/
#   10-wacom.rules  50-udev.rules
#   bash# cp /path/to/linux-2.6.xx/Documentation/aoe/udev.txt \
#           /etc/udev/rules.d/60-aoe.rules
#

# aoe char devices
SUBSYSTEM="aoe", KERNEL="discover",     NAME="etherd/%k", GROUP="disk", MODE="0220"
SUBSYSTEM="aoe", KERNEL="err",          NAME="etherd/%k", GROUP="disk", MODE="0440"
SUBSYSTEM="aoe", KERNEL="interfaces",   NAME="etherd/%k", GROUP="disk", MODE="0220"

# aoe block devices
KERNEL="etherd*",       NAME="%k", GROUP="disk"
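
The check for the two alias lines in /etc/modprobe.conf mentioned earlier can be scripted. This is only a sketch: the function name has_aoe_aliases is hypothetical, and the match assumes each alias sits on its own line.

```shell
# Hypothetical helper: succeed if FILE contains both aoe alias lines.
has_aoe_aliases() {
    file="$1"
    grep -Eq '^[[:space:]]*alias[[:space:]]+block-major-152[[:space:]]+aoe[[:space:]]*$' "$file" &&
    grep -Eq '^[[:space:]]*alias[[:space:]]+char-major-152[[:space:]]+aoe[[:space:]]*$' "$file"
}

conf=/etc/modprobe.conf
if [ -r "$conf" ] && has_aoe_aliases "$conf"; then
    echo "aoe aliases present in $conf"
else
    echo "aoe aliases not found in $conf"
fi
```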

With the device driver in place and the /dev/etherd device nodes present, I was able to run fdisk /dev/etherd/e0.1 and create two partitions:


Disk /dev/etherd/e0.1: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

            Device Boot      Start         End      Blocks   Id  System
/dev/etherd/e0.1p1               1          31      248976   83  Linux
/dev/etherd/e0.1p2              32       30515   244862730   83  Linux

I then did an mkfs -t ext3 /dev/etherd/e0.1p1 command to create the ext3 filesystem. I did the same with e0.1p2.

In our disk-to-disk backup scheme, I wanted to use the autofs facility to mount our backup devices in /daily, /weekly, and /monthly. I wanted the "daily" backup device to be the 1st etherdrive blade, so on mailserver3 I added an entry to /etc/auto.master for the /daily directory like this:

# $Id: auto.master,v 1.2 1997/10/06 21:52:03 hpa Exp $
# Sample auto.master file
# Format of this file:
# mountpoint map options
# For details of the format look at autofs(8).
/daily  /etc/auto.daily
/misc   /etc/auto.misc
/r      /etc/auto.direct
/workstation    /etc/auto.workstation
The corresponding auto.daily file then looks like this:
mailserver3.acd.ucar.edu:/home/fredrick/aoetools-4> cat /etc/auto.daily
#
# $Id: auto.misc,v 1.2 2003/09/29 08:22:35 raven Exp $
#
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# Details may be found in the autofs(5) manpage
var           -fstype=ext3,rw         :/dev/etherd/e0.0p1
home          -fstype=ext3,rw         :/dev/etherd/e0.0p2
etc           -fstype=ext3,rw         :/dev/etherd/e0.0p3
usr           -fstype=ext3,rw         :/dev/etherd/e0.0p5
free          -fstype=ext3,rw         :/dev/etherd/e0.0p6
The result should be that any reference to /daily/home mounts the drive /dev/etherd/e0.0p2 and so forth.
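
Since every map line has the same shape, the entries can be generated rather than typed. A minimal sketch, assuming the "key -fstype=...,rw :device" layout of the auto.daily file above; the helper name autofs_entry is hypothetical.

```shell
# Hypothetical helper: print one automounter map line in the
# "key  -fstype=TYPE,rw  :device" form used in auto.daily.
autofs_entry() {
    key="$1"
    device="$2"
    fstype="${3:-ext3}"
    printf '%s\t-fstype=%s,rw\t:%s\n' "$key" "$fstype" "$device"
}

# Reproduce the first two entries of the map.
autofs_entry var  /dev/etherd/e0.0p1
autofs_entry home /dev/etherd/e0.0p2
```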

An rsync command verified that I could populate the partition with files. So it was a simple matter to create a daily backup script to rsync to the above filesystems.
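
A daily backup script along these lines might look like the sketch below. It is not the script actually used here: the directory list, the choice of rsync -a, and the names BACKUP_ROOT, RUN, and backup_cmd are all assumptions for illustration. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical sketch of a daily backup script; the directory list and
# rsync options are assumptions, not the script used in these notes.
BACKUP_ROOT=/daily

# Print the rsync command for one directory (e.g. /home -> /daily/home).
backup_cmd() {
    src="$1"
    name=$(basename "$src")
    printf 'rsync -a %s/ %s/%s/\n' "$src" "$BACKUP_ROOT" "$name"
}

# Dry run by default: print the commands. Set RUN=1 to execute them.
# Referencing /daily/<name> is what triggers the autofs mount.
for dir in /var /home /etc /usr; do
    cmd=$(backup_cmd "$dir")
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    else
        echo "$cmd"
    fi
done
```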

Unmounting early in shutdown

It is important for the shutdown and reboot scripts to unmount an AoE drive early -- before the network is lost. Otherwise, buffered data and open inodes may remain unwritten, with no network left over which to flush them to disk. Using autofs seemed like the best solution to me, since autofs is shut down before the network in most cases.

Note that when creating RAID devices, these devices must be initialized after the network is initialized. This cannot be done in /etc/raidtab, so CORAID recommends setting up a RAID configuration separately (such as in /etc/rt) and mounting from rc.local (e.g., raidstart -c /etc/rt /dev/md1 ; mount /dev/md1 /mnt/raid). See http://www.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html
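
One possible way to guard the rc.local commands is to wait until a device node exists before starting the RAID. This is a sketch under my own assumptions, not part of CORAID's recommendation: the helper wait_for_path, the 60-second timeout, and the device /dev/etherd/e0.0 are hypothetical choices.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for TARGET to exist.
wait_for_path() {
    target="$1"
    timeout="${2:-30}"
    waited=0
    while [ ! -e "$target" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example for rc.local, left commented out. The raidstart and mount
# commands are the ones from the CORAID example.
# if wait_for_path /dev/etherd/e0.0 60; then
#     raidstart -c /etc/rt /dev/md1 && mount /dev/md1 /mnt/raid
# fi
```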

Performance

Communicating with an AoE drive is of course quite a bit slower than communicating with a directly attached drive. Performance can be improved somewhat by striping multiple drives together in a RAID configuration, but the bottleneck is more or less the 100 Mbps network interface. Etherdrive partitions should not be used where performance is critical.

Also note that fsck's could take a very long time because of the slower performance of the drives. You can set these options:

mailserver4:/etc/rc.d> sudo tune2fs -i 300 /dev/etherd/e0.1p2
tune2fs 1.36 (05-Feb-2005)
Setting interval between check 25920000 seconds

mailserver4:/etc/rc.d> sudo tune2fs -c 400 /dev/etherd/e0.1p2
tune2fs 1.36 (05-Feb-2005)
Setting maximal mount count to 400

but it might be a good idea to set up some process whereby fsck can happen in a controlled fashion, perhaps manually as part of a regular maintenance schedule.
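
The interval reported by tune2fs is the number of days given to -i, converted to seconds. The arithmetic can be checked with a one-line function (the name days_to_seconds is hypothetical):

```shell
# Hypothetical helper: convert a number of days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 300   # prints 25920000, matching the tune2fs output above
```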

Moving drives between servers

An etherdrive (or etherdrives) may be moved between servers simply by unmounting the drive on one server and mounting it on another. It might also be possible for one server to have read/write access to a drive while another has read-only access, but I haven't yet explored that option.

Troubleshooting

The aoe-stat program may be used to see the status of the various etherdrive blades (as seen from the perspective of the Fedora Linux aoe device driver module). For example:
mailserver4:/etc/udev/rules.d> sudo aoe-stat
    e0.0            eth1              up
    e0.1            eth1              up
    e0.2            eth1              up
    e0.3            eth1              up
"up" means that the drive is functioning normally and ready for I/O.

CORAID's FAQ also contains a lot of useful troubleshooting information.
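
For a quick check from a script, the aoe-stat output can be filtered for blades that are not "up". A minimal sketch: the helper name aoe_not_up is hypothetical, it assumes the status is the last column as in the output above, and the sample input has e0.2 changed to "down" purely for illustration.

```shell
# Hypothetical helper: read aoe-stat output on stdin and print the names
# of devices whose last column is not "up".
aoe_not_up() {
    awk 'NF > 0 && $NF != "up" { print $1 }'
}

# Sample input in the format shown above (e0.2 altered for illustration).
aoe_not_up <<'EOF'
    e0.0            eth1              up
    e0.1            eth1              up
    e0.2            eth1              down
    e0.3            eth1              up
EOF
```

On a live system the same function would be fed with: sudo aoe-stat | aoe_not_up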

Another common problem is a module built for a different kernel than the one that is running. The kernel source for single-CPU kernels is installed with the command yum install kernel-devel and the kernel source for SMP kernels is installed with the command yum install kernel-smp-devel. Taking a look at the output of /sbin/modinfo /lib/modules/2.6.17-1.2157_FC5smp/kernel/drivers/block/aoe/aoe.ko may help. In particular, compare the "vermagic" string with the output of uname -r.
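
The vermagic comparison can be sketched as follows. The helper name vermagic_matches is hypothetical, and the sketch assumes that the vermagic string begins with the kernel release followed by a space. The module path follows the one given above.

```shell
# Hypothetical helper: succeed if the kernel release at the start of a
# vermagic string equals the given release.
vermagic_matches() {
    vermagic="$1"
    release="$2"
    built_for=${vermagic%% *}   # text before the first space
    [ "$built_for" = "$release" ]
}

# Compare the installed aoe module with the running kernel, if both the
# module file and modinfo are present.
module="/lib/modules/$(uname -r)/kernel/drivers/block/aoe/aoe.ko"
if [ -f "$module" ] && [ -x /sbin/modinfo ]; then
    if vermagic_matches "$(/sbin/modinfo -F vermagic "$module")" "$(uname -r)"; then
        echo "aoe module matches the running kernel"
    else
        echo "aoe module was built for a different kernel"
    fi
fi
```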

References