Replacing a Failed Disk on a Network Appliance (NetApp)

Replacing a failed disk on any system can be a scary thought, but NetApp has built in a lot of technology to make this as easy and painless as possible.
Without going into RAID levels or the use of DP (dual parity) disks, there is an easy procedure for replacement. If the disk has not completely failed, you may see messages like:

Disk Inventory Monitor: Dual Loop Configuration WARNING!!!
The system has detected an inconsistency in the device maps
of paired-channels listed below. Review the device maps of
both channels to check for disks which may not be responding
on both A and B device ports.
channels: 0c and 0a.

In this case, you will want to find out which device is failing. Run:

storage show disk -p

The output will look like:

PRIMARY PORT  SECONDARY PORT SHELF BAY
------- ----  --------- ---- ---------
0a.16    A    0c.16      B     1    0
0c.17    B    0a.17      A     1    1
0c.18    B    0a.18      A     1    2
0c.19    B    0a.19      A     1    3
0a.20    A    0c.20      B     1    4
0a.21    A    0c.21      B     1    5
0c.22    B    0a.22      A     1    6
0c.23    B    0a.23      A     1    7
0c.24    B    0a.24      A     1    8
0c.25    B    0a.25      A     1    9
0c.26    B    0a.26      A     1   10
0c.27    B    0a.27      A     1   11
0c.28    B    0a.28      A     1   12
0c.29    B    0a.29      A     1   13
0c.32    B                     2    0
0c.33    B    0a.33      A     2    1
0a.34    A    0c.34      B     2    2
0a.35    A    0c.35      B     2    3
0a.36    A    0c.36      B     2    4
0a.37    A    0c.37      B     2    5
0a.38    A    0c.38      B     2    6
0a.39    A    0c.39      B     2    7
0a.40    A    0c.40      B     2    8
0a.41    A    0c.41      B     2    9
0a.42    A    0c.42      B     2   10
0a.43    A    0c.43      B     2   11
0c.44    B    0a.44      A     2   12
0c.45    B    0a.45      A     2   13
0d.16    B    0b.16      A     1    0
0b.17    A    0d.17      B     1    1
0d.18    B    0b.18      A     1    2
0b.19    A    0d.19      B     1    3
0b.20    A    0d.20      B     1    4
0b.21    A    0d.21      B     1    5
0d.22    B    0b.22      A     1    6
0d.23    B    0b.23      A     1    7
0b.24    A    0d.24      B     1    8
0d.25    B    0b.25      A     1    9
0b.26    A    0d.26      B     1   10
0d.27    B    0b.27      A     1   11
0d.28    B    0b.28      A     1   12
0d.29    B    0b.29      A     1   13
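
On a system with many shelves, scanning this listing by eye gets tedious. As a minimal sketch, assuming you have saved the command output as text and that it follows the column layout shown above, a few lines of Python can flag any disk whose row has no secondary path. The function name single_path_disks and the SAMPLE text are illustrative only; this is not a NetApp tool.

```python
import re

# A few rows copied from the listing above, used as sample input.
SAMPLE = """\
PRIMARY PORT  SECONDARY PORT SHELF BAY
------- ----  --------- ---- ---------
0a.16    A    0c.16      B     1    0
0c.32    B                     2    0
0c.33    B    0a.33      A     2    1
"""

# Matches disk names such as 0a.16 or 0c.32 (adapter, then device ID).
DISK = re.compile(r"^\d+[a-z]\.\d+$")


def single_path_disks(text):
    """Return (disk, shelf, bay) for rows that list no secondary path."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, and blank lines.
        if not fields or not DISK.match(fields[0]):
            continue
        # A dual-path row has 6 fields; a single-path row has only 4
        # (primary disk, port, shelf, bay).
        if len(fields) == 4:
            found.append((fields[0], int(fields[2]), int(fields[3])))
    return found


print(single_path_disks(SAMPLE))
```

Splitting on whitespace is enough here because the missing secondary path leaves the row with four fields instead of six; if your release prints the columns differently, the field counts would need adjusting.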

Notice that disk 0c.32 shows up on only one channel: its SECONDARY column is empty. That is the disk causing the problem. Let’s fail it over to the spare (0d.29 in this example):

disk replace start 0c.32 0d.29

The system will prompt you with the usual “are you sure” message. Once you confirm, it will begin replacing the failing drive by copying its contents onto the spare.

You can monitor the progress with “sysconfig -r”.

Once the replacement has completed, you will want to remove the failed drive. To help identify it, you can make the red LED on the drive blink steadily so that it is obvious to whoever will be pulling the drive.

priv set advanced
blink_on 0c.32

priv set admin

If the disk is completely broken and its LED won’t blink, turn on the red LEDs of the disks on either side of the bad disk instead. That way, the one in the middle (not blinking) is easy to find and replace.

Now that we are ready to replace the drive, execute:

disk swap

Pull the failed drive from the system, insert the new drive, and run the command again:

disk swap

Your new drive has now been added to the system as a spare. Confirm this with ‘vol status -r’ or ‘sysconfig -r’.
