How to: Replace a dead physical disk in a Proxmox (PVE) ZFS pool easily

Warning: You are responsible for your data. If the data is important to you, we highly suggest backing up/copying all of it somewhere safe before proceeding.

The Error

One of the physical disks used by the ZFS pool is dead/faulty, and the pool is in the DEGRADED state.

Log in to the PVE web GUI and navigate to Datacenter -> node name -> Disks -> ZFS.

Proxmox (PVE) ZFS pool, physical disk dead, DEGRADED

Now we have to replace this disk (note that the message says the dead disk “was /dev/sdc1”).
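The same information is available from the shell: `zpool status` lists every member device with its state, and a dead disk typically shows up as UNAVAIL, FAULTED or REMOVED, often under a numeric GUID followed by a note such as "was /dev/sdc1". The function below is a minimal sketch (the name `failed_vdevs` is hypothetical) that reads `zpool status` output on standard input and prints the name or GUID of each device in one of those states. It assumes the usual layout, where the device table follows the `config:` line and ends at the `errors:` line.

```shell
# Hypothetical helper: print each device that `zpool status` reports as
# UNAVAIL, FAULTED or REMOVED. Reads `zpool status` output on stdin.
failed_vdevs() {
  awk '
    /^config:/ { in_config = 1; next }   # device table starts after "config:"
    /^errors:/ { in_config = 0 }         # and ends at "errors:"
    in_config && $2 ~ /^(UNAVAIL|FAULTED|REMOVED)$/ { print $1 }
  '
}
```

For example, `zpool status rpool | failed_vdevs` should print the GUID of the missing disk in a situation like the one above. That GUID can also be given to `zpool replace` as the old device (`zpool replace rpool <GUID> /dev/sdc`) if the new disk ends up under a different device name.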

Note down the name of the affected ZFS pool, “rpool” in this case.
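The pool name can also be found from the terminal: `zpool status -x` shows only pools that have problems, and `zpool list -H -o name,health` prints one tab-separated name/health pair per pool with no header. A minimal filter over the latter, as a sketch (the function name `unhealthy_pools` is hypothetical):

```shell
# Hypothetical helper: print the name of every pool whose health is not ONLINE.
# Expects the output of `zpool list -H -o name,health` on stdin
# (-H: no header line, columns separated by a single tab).
unhealthy_pools() {
  awk -F '\t' '$2 != "ONLINE" { print $1 }'
}
```

Usage: `zpool list -H -o name,health | unhealthy_pools` should print `rpool` in this case.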

The Fix

In this guide the ZFS pool runs in mirror mode; the process for other redundant layouts (such as RAID-Z) is similar. Note that ZFS Stripe/RAID0 has no redundancy, so a dead disk in such a pool cannot be replaced this way without losing the pool's data.
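Whether a pool has redundancy can be read from the `config:` section of `zpool status`: redundant top-level vdevs appear as entries such as `mirror-0` or `raidz1-0` (or dRAID groups), while a stripe lists its disks directly under the pool name. The sketch below (hypothetical name `pool_has_redundancy`) succeeds if at least one such vdev is present. It is only a rough check: it does not detect a mixed pool in which some disks sit outside any mirror or RAID-Z group, and a mirrored log device would also count.

```shell
# Hypothetical helper: exit 0 if the `zpool status` output on stdin contains at
# least one mirror, RAID-Z or dRAID vdev, exit 1 otherwise (e.g. a plain stripe).
pool_has_redundancy() {
  awk '
    /^config:/ { in_config = 1; next }   # device table starts after "config:"
    /^errors:/ { in_config = 0 }         # and ends at "errors:"
    in_config && $1 ~ /^(mirror|raidz[1-3]?|draid[1-3]?)[-:]/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage: `zpool status rpool | pool_has_redundancy && echo "redundant layout"`.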

1 Power off the system if it does not support hot swap

2 Replace the dead physical disk. Make sure not to touch the other drives; it is easier if the new disk goes into the same port/location.

3 Power on the system

4 Log in to the PVE web GUI, navigate to Datacenter -> node name -> Disks, find the new disk's value under “Device” and note it down; in this case it is “/dev/sdc”.

Proxmox (PVE) Disks
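Kernel names such as /dev/sdc are assigned at boot and can change, so it is worth confirming which physical drive /dev/sdc really is before running anything against it. The symlinks under /dev/disk/by-id include the model and serial number, which can be compared with the label on the new drive. A small sketch (the `disk_by_id` name and its optional directory argument are illustrative assumptions) that lists the by-id links pointing at a given kernel name:

```shell
# Hypothetical helper: list the /dev/disk/by-id symlinks that point at a disk.
# $1: kernel name or path of the disk (e.g. sdc or /dev/sdc)
# $2: directory to search (optional, defaults to /dev/disk/by-id)
disk_by_id() {
  name=$(basename "$1")
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/*; do
    [ -L "$link" ] || continue           # skip anything that is not a symlink
    # by-id links are relative (e.g. ../../sdc); compare only the final name,
    # so partition links such as ../../sdc1 are not matched.
    if [ "$(basename "$(readlink "$link")")" = "$name" ]; then
      printf '%s\n' "$link"
    fi
  done
}
```

Usage: `disk_by_id sdc`. Several links (for example `ata-…` and `wwn-…`) may point at the same disk; any of them can be used in place of /dev/sdc in ZFS commands.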

5 Log in to the Proxmox (PVE) terminal, directly or via SSH. Do NOT use the web GUI console!

6 Use the following command to add the replacement disk back to the ZFS pool:

Warning: After the command is executed, ZFS resilvers the new disk and we have to wait until this finishes, so plan ahead and do this at an appropriate time. Large disks and datasets can take a very long time, so be prepared!

Warning: Make absolutely sure the device label you are entering is correct!

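# zpool replace [affected ZFS pool name] [device label of the disk in the failed slot]
# (with only one device given, ZFS treats it as both the old and the new device)
zpool replace rpool /dev/sdc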
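Because a typo in the device name can target the wrong disk, the command can be wrapped in a few sanity checks. The function below is a hypothetical sketch (the name `safe_replace` and its checks are assumptions, not part of the original procedure): it refuses to run unless the target is a block device without existing partitions, which is what a brand-new disk usually looks like, and only then calls `zpool replace`. If rpool is also the boot pool, the Proxmox VE administration guide describes a separate procedure for bootable disks (copying the partition table from a healthy disk, replacing the ZFS partition and setting up the boot partition with proxmox-boot-tool); this sketch covers only the whole-disk replacement used in this guide.

```shell
# Hypothetical wrapper: run `zpool replace` only after basic checks pass.
# $1: pool name (e.g. rpool)   $2: new disk (e.g. /dev/sdc)
safe_replace() {
  pool=$1
  new_disk=$2

  # The target must exist as a block device.
  if [ ! -b "$new_disk" ]; then
    echo "error: $new_disk is not a block device" >&2
    return 1
  fi

  # A fresh disk normally has no partitions; any child entries listed by
  # lsblk suggest the disk is in use or is the wrong disk.
  children=$(lsblk -n -o NAME "$new_disk" | tail -n +2)
  if [ -n "$children" ]; then
    echo "error: $new_disk already has partitions:" >&2
    printf '%s\n' "$children" >&2
    return 1
  fi

  zpool replace "$pool" "$new_disk"
}
```

Usage: `safe_replace rpool /dev/sdc`.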

7 Once the above command has been executed, go back to the Proxmox (PVE) web GUI to check the resilver status (navigate to Datacenter -> node name -> Disks -> ZFS).

ZFS resilver

(In this case there was 273 MB of data, which took only 1 second to resilver.)
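Resilver progress can also be followed from the terminal: while it runs, the `scan:` line of `zpool status` reads "resilver in progress", and once finished it reports how much data was resilvered. A minimal check (hypothetical name `resilver_in_progress`) that can drive a wait loop:

```shell
# Hypothetical helper: exit 0 while the `zpool status` output on stdin
# reports a resilver in progress on its "scan:" line, exit 1 otherwise.
resilver_in_progress() {
  grep -q '^ *scan: resilver in progress'
}
```

Usage: `while zpool status rpool | resilver_in_progress; do sleep 60; done`. On OpenZFS 2.0 or later, `zpool wait -t resilver rpool` blocks until the resilver finishes without any parsing.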

8 Wait until everything is ONLINE with a green check mark; the system is then fully recovered and ready to use.
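To confirm the recovery from the shell, the `state:` line of `zpool status` should read ONLINE again. A sketch of a helper that extracts it (the name `pool_state` is hypothetical):

```shell
# Hypothetical helper: print the pool state (ONLINE, DEGRADED, ...) from the
# "state:" line of the `zpool status` output on stdin.
pool_state() {
  awk '$1 == "state:" { print $2; exit }'
}
```

Usage: `[ "$(zpool status rpool | pool_state)" = ONLINE ] && echo "rpool is healthy"`.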

Warning: Do not experiment with random commands/methods unless you have a full backup, are in a testing environment, or the data is not important at all.
