Growing an mdadm RAID by replacing disks

Introduction

As described in my earlier post, Replacing a failed disk in a mdadm RAID, I have a 4-disk RAID 5 setup which I initially populated with 1TB WD GREEN disks (cheap, but not really suited for NAS operation). After a few years I started to fill up the file system, so I wanted to grow my RAID by upgrading the disks to WD RED 3TB disks. The WD RED disks are tailored to NAS workloads. Growing the mdadm RAID is done through the following steps:

  • Fail, remove and replace each 1TB disk with a 3TB disk. After each disk I have to wait for the RAID to resync to the new disk.
  • I then have to grow the RAID to use all the space on each of the 3TB disks.
  • Finally, I have to grow the filesystem to use the available space on the RAID device.

The following is similar to my previous article Replacing a failed disk in a mdadm RAID, but I have included it here for completeness.

Removing the old drive

The enclosure I have does not support hot-swap and has no separate lights for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:

# hdparm -i /dev/sde | grep SerialNo
 Model=WDC WD10EARS-003BB1, FwRev=80.00A80, SerialNo=WD-WCAV5K430328

Luckily, the Western Digital disks I have came with a small sticker which shows the serial number on the disk. Now that I knew the serial number of the disk I wanted to replace, I marked it as failed in mdadm and removed it from the RAID before shutting down and replacing the disk:

mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1

Adding the new drive

Having replaced the old disk with the new one, I compared the serial number printed on the back of the new disk to the serial reported for /dev/sde to make sure I was about to format the right disk:

# hdparm -i /dev/sde | grep SerialNo
Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166
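Since the serial is compared by eye in both steps, it can help to print only the serial itself. The following is a minimal sketch, not part of my original procedure; the function name serial_from_hdparm is made up for illustration, and it assumes the hdparm output has the SerialNo= field in the format shown above.

```shell
#!/bin/sh
# Print only the serial number from `hdparm -i` output read on stdin.
# Hypothetical helper; assumes the "SerialNo=<serial>" format shown above.
serial_from_hdparm() {
    sed -n 's/.*SerialNo=\([^, ]*\).*/\1/p'
}

# Demo on the line captured from the new disk.
# On a real system: hdparm -i /dev/sde | serial_from_hdparm
echo ' Model=WDC WD30EFRX-68EUZN0, FwRev=80.00A80, SerialNo=WD-WMC4N1096166' | serial_from_hdparm
```

The demo line prints WD-WMC4N1096166, which can then be checked against the sticker on the disk.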

Partitioning a disk over 2TB does not work with an MSDOS partition table, so I needed to use parted (instead of fdisk) and a GPT label to partition the disk correctly. The “-a optimal” option makes parted use the optimum alignment as given by the disk topology information. This aligns partitions to a multiple of the physical block size, which guarantees optimal performance.

# parted -a optimal /dev/sde 
(parted) mklabel gpt
(parted) mkpart primary 2048s 100%
(parted) align-check optimal 1
1 aligned
(parted) set 1 raid on                                                    
(parted) print                                                                
Model: ATA WDC WD30EFRX-68E (scsi)
Disk /dev/sde: 3001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
 
Number  Start   End     Size    File system  Name     Flags
 1      1049kB  3001GB  3001GB               primary  raid
 
(parted) quit                                                             
Information: You may need to update /etc/fstab.
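The 2TB limit and the 2048s start sector used in the parted session can be checked with a little arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes the 512-byte logical and 4096-byte physical sector sizes that parted reported for this disk, plus a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# An MSDOS (MBR) partition table stores sector counts in 32 bits,
# so the largest addressable size is 2^32 sectors.
mbr_limit_bytes() {
    sector_bytes=$1
    echo $(( 4294967296 * sector_bytes ))
}

# Byte offset of a partition whose start is given in sectors.
start_offset_bytes() {
    start_sector=$1
    sector_bytes=$2
    echo $(( start_sector * sector_bytes ))
}

# Succeeds if the offset is a multiple of the physical block size.
is_aligned() {
    offset=$1
    physical_bytes=$2
    [ $(( offset % physical_bytes )) -eq 0 ]
}

mbr_limit_bytes 512            # 2199023255552 bytes, i.e. 2 TiB
start_offset_bytes 2048 512    # 1048576 bytes, the 1049kB start shown by parted
is_aligned 1048576 4096 && echo "aligned"
```

A 3001GB disk is well above the 2 TiB limit, which is why the GPT label is needed, and a start at sector 2048 lands on a 1 MiB boundary that is a multiple of the 4096-byte physical block size.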

Now the disk was ready for inclusion in the raid:

mdadm --manage /dev/md0 --add /dev/sde1

Over the next 3 hours I could monitor the rebuild using the following command:

[root@kelvin ~][20:43]# cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4] 
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
      [>....................]  recovery =  0.5% (4893636/976760320) finish=176.9min speed=91536K/sec
      bitmap: 4/8 pages [16KB], 65536KB chunk
 
unused devices: <none>
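The finish estimate in this output can be reproduced from the other numbers on the recovery line: the remaining blocks divided by the current speed. A minimal sketch, with a made-up function name, assuming the block counts are in 1K units and the speed is in K/sec as mdstat prints them:

```shell
#!/bin/sh
# Remaining rebuild time in whole minutes.
# Arguments: total blocks, blocks already recovered, speed in K/sec,
# all taken from the recovery line in /proc/mdstat.
rebuild_minutes_left() {
    total=$1
    done_blocks=$2
    speed=$3
    echo $(( (total - done_blocks) / speed / 60 ))
}

rebuild_minutes_left 976760320 4893636 91536   # prints 176
```

The result of 176 whole minutes agrees with the finish=176.9min reported above. The speed changes during the rebuild, so the estimate is only a rough guide.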

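This takes around 3 hours per disk in my case, and it is very important to wait for the array to finish rebuilding after each replacement: a RAID 5 array only survives the loss of a single disk, so removing the next disk before the rebuild has completed would destroy the array. Once all 4 disks have been replaced and the RAID has resynced, I can continue.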
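Before pulling the next disk it is worth checking that the array really shows all members up. The following sketch reads mdstat-formatted text on stdin and looks at the [UUUU] field; the function name array_is_healthy is made up for illustration, and the parsing assumes the mdstat layout shown above. As far as I know, mdadm --wait /dev/md0 can also be used to block until a recovery has finished.

```shell
#!/bin/sh
# Succeeds (exit 0) when the named array shows every member up, e.g. [UUUU].
# Returns 1 if a member is missing or rebuilding, 2 if the array is not found.
# Reads mdstat-formatted text on stdin.
array_is_healthy() {
    array=$1
    # Find the array's block and pull out its [UU_U]-style status field.
    status=$(awk -v a="$array" '
        $1 == a { inblock = 1 }
        inblock && /^[ \t]*$/ { inblock = 0 }
        inblock && match($0, /\[[U_]+\]/) { print substr($0, RSTART, RLENGTH); exit }
    ')
    [ -n "$status" ] || return 2
    case $status in
        *_*) return 1 ;;
        *)   return 0 ;;
    esac
}

# Demo on the degraded state captured during the rebuild.
# On a real system: array_is_healthy md0 < /proc/mdstat
array_is_healthy md0 <<'EOF' || echo "md0 is still degraded or rebuilding"
md0 : active raid5 sde1[5] sdc1[1] sdb1[3] sdd1[4]
      2930280960 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/3] [_UUU]
EOF
```

Only when the check succeeds is it safe to fail and remove the next disk.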

Resize the array to the new maximal size

Now all the disks have been replaced with larger 3TB disks, but the RAID device is not using the extra space yet. To instruct mdadm to use all the available space I issue the following commands:

mdadm --grow /dev/md0 --bitmap none
mdadm --grow /dev/md0 --size=max

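This also takes quite a while to complete, several hours in my case. The RAID is still usable while this is happening. The first command removes the write-intent bitmap before the resize; once the resync has finished it can be re-created with mdadm --grow /dev/md0 --bitmap internal.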
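The size to expect after the grow follows from how RAID 5 works: one member's worth of space goes to parity, so the usable capacity is the number of members minus one, times the size of a member. A small sketch with a made-up function name; the 3001 figure is the rounded partition size in GB from the parted output, so the result for the new array is approximate.

```shell
#!/bin/sh
# Usable capacity of a RAID 5 array: (members - 1) * size of one member.
raid5_usable() {
    members=$1
    per_member=$2
    echo $(( (members - 1) * per_member ))
}

raid5_usable 4 976760320   # old array in 1K blocks: prints 2930280960
raid5_usable 4 3001        # new array in GB: prints 9003
```

The first result matches the block count mdstat reported for the old array. The second is roughly 9TB (about 8.2 TiB); the real figure will be slightly lower because of md metadata on each member.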

Resize the filesystem

Finally I had to grow the filesystem to use the newly available space on the array. My array is mounted under /home, so I have to unmount the filesystem first:

umount /home

To make sure everything is okay I force a check of the filesystem before the resizing:

fsck.ext4 -f /dev/md0

Finally I start the resizing of the file system. The resize itself is very quick, because the majority of the work is done later by a kernel thread called ext4lazyinit once the filesystem is mounted again; ext4lazyinit took almost a full day to complete. The resize command is:

resize2fs /dev/md0

Related posts

http://rainbow.chard.org/2013/01/30/how-to-align-partitions-for-best-performance-using-parted/
http://zackreed.me/articles/69-mdadm-replace-smaller-disks-with-larger-ones

Comments

  1. It seems using Raid5 with big hard drives is not recommended due to prevalence of URE relative to drive size. Are you not concerned about using Raid5 with 3TB disks?

  2. I might have done it differently if I had started the server now, but the setup is 5 years old and working fine, so I am not planning to tinker with it now.

    The reason I am not too worried is that I have a USB connected 8TB external disk I do rsnapshot backups to every month and a daily offsite backup through rsync to another server.

      1. I grew a 25TB array to 50TB this way, but without disabling the bitmap, and it worked, though it took about 10 hours. Looks like disabling the bitmap may help with speed, but if the array needs to restart during the resync process, the bitmap will help it restart and finish more quickly rather than starting over.

  3. Thomas, you are an absolute champ. While I ultimately decided to back up, reformat and restore, the general process of adding disks and expanding the file system worked great. Thank you for posting these valuable details!
