February 19, 2013

Linux RAID with mdadm - Do's and Don'ts

I've been keeping my personal data safe with linux software raids for almost a decade. I've even convinced many friends to do the same. Lost data is so frustrating... in fact, losing data was one of the forces that pushed me to abandon M$ windowz and become a daily linux user.

History

It was the early 2000’s and everyone was sharing multimedia files. The p2p networks had anything you could want, and DVD writers + sneakernet allowed even the bandwidth-poor to get anything from friends.

I bought a huge 160GB drive to act as the primary dumping ground for my new digital treasure and I filled it to the brim, or so I thought. While editing some videos with a friend, I discovered chunks of other video files interspersed with the file I was editing. Turns out windowz was only using 28-bit LBA addressing on my disk, so once I passed 128 GiB it started truncating addresses and writing over other files' data while reporting that everything was OK.

I thought RAID 5 would be cool, until I realized how expensive and loud that would be. Taking a step back, I decided I could afford two drives in a mirrored configuration, called RAID 1. I originally RAIDed my drives with the so-called raid controller on my motherboard. Bad idea - it wasn't a real raid controller: it used the CPU to do all the heavy lifting, the admin interface sucked, and no other vendor's controller wanted anything to do with those drives. Linux to the rescue!

Glorious Software RAID

Creating a software RAID in linux is easy and has allowed me to move the same pair of drives into multiple systems with different hardware and kernels without a problem for nearly a decade. Additionally, in a pinch I can mount each drive independently without any raid software at all.

You may need to install mdadm. On Ubuntu or Debian use:

sudo apt-get install mdadm

Preparing Your Drives

If your drives are brand new, you can skip to step 2. For this example I'll call the drives /dev/sdX and /dev/sdY, but replace them with the actual device names from your system. These instructions also assume that the entire drive will be used for your RAID.
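If you're not sure which device names the kernel assigned to your drives, lsblk (or sudo fdisk -l) will list every block device along with its size, which is usually enough to pick out the right pair:

> lsblk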

1) Check for previous raid superblocks

If you get the following response, then you are likely in good shape.

> sudo mdadm --misc --examine /dev/sdX
mdadm: No md superblock detected on /dev/sdX.

But if you get the following, you'll need to do some cleanup first.

> sudo mdadm --misc --examine /dev/sdY
/dev/sdY:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 2a321d73:92a29f89:91a9f934:c8ab7b11 (local to host)
  Creation Time : Tue Jan 25 17:02:50 2011
     Raid Level : raid1
  Used Dev Size : 625131776 (596.17 GiB 640.13 GB)
     Array Size : 625131776 (596.17 GiB 640.13 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1

    Update Time : Sun Feb 10 11:39:41 2013
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 83ef590e - correct
         Events : 122


      Number   Major   Minor   RaidDevice State
this     1       8       48        1      active sync   /dev/sdY

Remove the old superblock.

> sudo mdadm --misc --zero-superblock /dev/sdY

Erase any remaining metadata at the start of the drive.

> sudo dd if=/dev/zero of=/dev/sdY bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 1.86259 s, 56.3 MB/s
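If your util-linux ships wipefs, that's a more surgical complement - it finds and erases known filesystem and raid signatures wherever they sit on the disk (old 0.90 superblocks live near the end of the drive, out of reach of the dd above):

> sudo wipefs -a /dev/sdY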

2) Create partitions

You'll want to do these steps on both drives.

> sudo fdisk /dev/sdX
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xe0fd90a7.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

> Command (m for help): o
Building a new DOS disklabel with disk identifier 0xacab143c.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

> Command (m for help): p
Disk /dev/sdX: 640.1 GB, 640133946880 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250261615 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xacab143c

   Device Boot      Start         End      Blocks   Id  System

> Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
> Select (default p): p
> Partition number (1-4, default 1): 1
> First sector (2048-1250261614, default 2048): 2048

Do not set up your partition all the way to the last sector. I left 2 GiB free at the end of mine. I once had a drive run out of spare sectors; rather than warning me, it silently shrank itself, truncating data, and suddenly it was too small to pair with the other drive. Leaving even more free space will help if you ever need to replace a failed disk with one that is not identical to the survivor. (see Fixing a Busted RAID Array below)

As you can see above, my sectors are 512 bytes each and I wanted 2 GiB free.
1024 * 1024 * 1024 * 2 = 2147483648 bytes to save
2147483648 / 512 = 4194304 sectors to save

last sector - sectors to save = end of the partition
1250261614 - 4194304 = 1246067310 last sector
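You can let the shell double-check the arithmetic:

> echo $(( 1250261614 - 4194304 ))
1246067310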

> Last sector, +sectors or +size{K,M,G} (2048-1250261614, default 1250261614): 1246067310

> Command (m for help): t
Selected partition 1
> Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

> Command (m for help): p
Disk /dev/sdX: 640.1 GB, 640133946880 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250261615 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xacab143c

   Device Boot      Start         End      Blocks   Id  System
/dev/sdX1            2048  1246067310   623032631+  fd  Linux raid autodetect

> Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Repeat this process for the second drive, but use the same last sector value that you used for the first drive so the partitions are identical.
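If you'd rather not repeat the interactive session, sfdisk can clone the finished partition table from the first drive to the second (assuming the second drive is at least as large):

> sudo sfdisk -d /dev/sdX | sudo sfdisk /dev/sdY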

Create the RAID array

1) Create the RAID array

Note: You want to build your raid from the partitions, not the raw disks, so use /dev/sdX1, not /dev/sdX.

> sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdX1 /dev/sdY1
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: size set to 622901376K
> Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
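The two halves of the mirror will now sync in the background. You can keep an eye on the progress (it looks much like the recovery output shown near the end of this post):

> cat /proc/mdstat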

2) Format the partition

I used the default linux filesystem ext4.

> sudo mkfs.ext4 -v /dev/md0
mke2fs 1.42 (29-Nov-2011)
fs_types for mke2fs.conf resolution: 'ext4'
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
38936576 inodes, 155725344 blocks
7786267 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4753 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
    102400000

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done     
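One optional tweak: ext4 reserves 5% of the blocks for root by default, which is a lot of space on a big data-only volume. tune2fs can shrink the reservation (1% here is my arbitrary choice, not a requirement):

> sudo tune2fs -m 1 /dev/md0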

Configure for use

1) Setup mdadm.conf

Each raid array is given a unique id which mdadm can use to assemble the array during boot. You can use mdadm to get the unique id of your array.

> sudo mdadm --misc --detail /dev/md0
    /dev/md0:
            Version : 1.2
      Creation Time : Mon Feb 18 10:26:45 2013
         Raid Level : raid1
         Array Size : 622901376 (594.05 GiB 637.85 GB)
      Used Dev Size : 622901376 (594.05 GiB 637.85 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent

        Update Time : Mon Feb 18 15:38:08 2013
              State : clean
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0

               UUID : 449ec249:ca6af101:8ca61121:a4f427b9  <--- unique id
             Events : 4394

        Number   Major   Minor   RaidDevice State
           0       8       33        0      active sync   /dev/sdX1
           2       8       49        1      active sync   /dev/sdY1

Update mdadm.conf

> sudo vim /etc/mdadm/mdadm.conf

Then in the array section I added the line:

ARRAY /dev/md0 UUID=449ec249:ca6af101:8ca61121:a4f427b9
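If you'd rather not build that line by hand, mdadm will print a ready-made ARRAY line for every running array; you can paste its output straight into mdadm.conf:

> sudo mdadm --detail --scan

On Ubuntu or Debian it's also worth running sudo update-initramfs -u afterwards so the updated config is picked up early in the boot process.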

2) Mount and Use It

I'll let you take it from here. Enjoy! Don't forget /etc/fstab.
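For completeness, a minimal sketch, assuming a mount point of /mnt/raid (pick whatever suits you):

> sudo mkdir -p /mnt/raid
> sudo mount /dev/md0 /mnt/raid

And a matching /etc/fstab entry so it mounts at boot:

/dev/md0    /mnt/raid    ext4    defaults    0    2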

Fixing a Busted RAID Array

As mentioned above, I once made the mistake of using the entire drive for my raid array and then found myself with two drives that would not assemble.

Below is how I fixed it.

1) Make a backup

If you can make an additional backup, do it. I didn't have enough space, so I backed up what I cared about most - but the process below should allow you to rebuild your raid array without losing any data.

2) Free up one drive in the array

If your array is already degraded, you can skip to step 3 and use the already-rejected drive.

If your array has not yet failed but you know it is flawed, you can manually fail one of the drives. This will leave the array running in degraded mode on a single drive.

> sudo mdadm --manage /dev/md0 --fail /dev/sdX1
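Once it's marked as failed, remove it from the array so the drive is free to be reused:

> sudo mdadm --manage /dev/md0 --remove /dev/sdX1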

3) Clean up and configure

Follow the steps under Preparing Your Drives to clean up and configure /dev/sdX.

4) Create a new incomplete raid array

Using the newly prepped partition, create a new degraded raid array, listing the second slot as missing for now.

> sudo mdadm --create /dev/md1 -v --level=1 --raid-devices=2 /dev/sdX1 missing
mdadm: Note: this array has metadata at the start and
    may not be suitable as a boot device.  If you plan to
    store '/boot' on this device please ensure that
    your boot-loader understands md/v1.x metadata, or use
    --metadata=0.90
mdadm: size set to 622901376K
>Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md1 started.

Format the new array as well.

> sudo mkfs.ext4 /dev/md1

5) Mount and copy the data

I’d mount the original array /dev/md0 read-only and the new array /dev/md1 read-write, then copy all the data from the old array to the new one.
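Something along these lines, with /mnt/old and /mnt/new standing in for whatever mount points you prefer (rsync's -a preserves permissions and timestamps, -H keeps hard links intact):

> sudo mkdir -p /mnt/old /mnt/new
> sudo mount -o ro /dev/md0 /mnt/old
> sudo mount /dev/md1 /mnt/new
> sudo rsync -aH /mnt/old/ /mnt/new/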

6) Check everything is copied

I’m a little paranoid, so I dumped the file paths and sizes from each array and compared them with diff.

> sudo find mount_point/of_md0 -type f -printf "%P %s\n" | sort > drive_md0.txt
> sudo find mount_point/of_md1 -type f -printf "%P %s\n" | sort > drive_md1.txt
> diff -u drive_md0.txt drive_md1.txt
--- drive_md0.txt   2013-02-18 15:18:55.721848694 -0500
+++ drive_md1.txt   2013-02-18 15:19:30.909173089 -0500
@@ -33439,7 +33439,6 @@
 afolder/file-x.jpg 57344
 afolder/file-y.jpg 430729
 afolder/file-z.jpg 46778
-.DS_Store 6148
 exported/img-040.jpg 1442109
 exported/img-180.jpg 1703897
 exported/img-186.jpg 2062038    

As you can see, I managed to leave behind a hidden .DS_Store file that I don’t care about.

7) Kill the original array

We are now going to kill the original array and reclaim its last disk. It’s OK to triple-check your backups first - I did.

Unmount /dev/md0, then stop it.

> sudo mdadm --misc --stop /dev/md0
mdadm: stopped /dev/md0

Stopping the array frees the last remaining drive, /dev/sdY, from the original raid array.

8) Clean up and configure

Follow the steps under Preparing Your Drives to clean up and configure /dev/sdY.

9) Add the other drive to the new array

This will add the drive to the new array so it can begin syncing.

> sudo mdadm --manage /dev/md1 --add /dev/sdY1
mdadm: added /dev/sdY1
> cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sdY1[2] sdX1[0]
      622901376 blocks super 1.2 [2/1] [U_]
      [>....................]  recovery =  0.7% (4703744/622901376) finish=93.7min speed=109830K/sec

10) Follow the steps in Configure for use

And update mdadm.conf and /etc/fstab to reference the new array.