I have a simple 5x1TB RAIDz1 configuration (tank? pool? vdev?), with a global hot spare assigned to it. One of the 5 drives in the array is listed in a FAULTED state (corrupted data), and the spare is listed as AVAIL. The array lists as DEGRADED. Clearly the array is not failing over to the spare gracefully on its own, so how do I force the failover?
I have read many forum posts from many locations discussing detaching the drive, replacing the drive with the spare, physically removing the drive, moving the spare to the same slot, etc.
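The command forms those posts suggest boil down to something like the following (pool and device names here are placeholders, not my actual ones):

# Manually swap the faulted device for the hot spare
zpool replace <pool> <faulted-device-or-guid> <spare-device>

# Or detach the faulted device outright (only valid for mirror/replacing vdevs)
zpool detach <pool> <faulted-device-or-guid>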
The replace command tells me it cannot replace the drive because it is already in a replacing/spare config, and to try detach instead.
The detach command tells me it is only applicable to mirror and replacing vdevs.
There is no indication that the spare is being used to rebuild the array.
I am not keen to start physically moving drives around - neither the current array member nor the functioning hot spare - as I'd prefer not to interrupt anything.
I'd also prefer not to bring the array down, restart the server, etc. The system is designed to recover transparently without any of that, and I want to learn how. The data is backed up, so I have free rein.
Linux Kernel: 3.10.0-1160
ZFS Version: 5
Update:
Output from the replace command:
[root@localhost ~]# zpool replace <name> 4896358983234274072 ata-WDC_WD10EFRX-68PJCN0_WD-<serial>
cannot replace 4896358983234274072 with ata-WDC_WD10EFRX-68PJCN0_WD-<serial>: already in replacing/spare config; wait for completion or use 'zpool detach'
Output from the detach command:
[root@localhost ~]# zpool detach <name> 4896358983234274072
cannot detach 4896358983234274072: only applicable to mirror and replacing vdevs
ZFS version:
[root@localhost ~]# zfs upgrade
This system is currently running ZFS filesystem version 5.
All filesystems are formatted with the current version.
[root@localhost ~]# modinfo zfs | grep version
version: 0.8.2-1
rhelversion: 7.9
srcversion: 29C160FF878154256C93164
vermagic: 3.10.0-1160.49.1.el7.x86_64 SMP mod_unload modversions
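For clarity, the "filesystem version 5" reported by zfs upgrade above is the on-disk format version, not the release of ZFS on Linux itself; the module release is the 0.8.2-1 shown by modinfo. With the module loaded, the same value should also be readable straight from sysfs, e.g.:

cat /sys/module/zfs/version
# expected to print something like 0.8.2-1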
zpool status:
[root@localhost ~]# zpool status <name>
pool: <name>
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 0h18m with 0 errors on Mon Apr 4 13:29:39 2022
config:
NAME                                               STATE     READ WRITE CKSUM
<name>                                             DEGRADED     0     0     0
  raidz1-0                                         DEGRADED     0     0     0
    pci-0000:01:00.0-sas-0x443322110c000000-lun-0  ONLINE       0     0     0
    ata-WDC_WD10EFRX-68FYTN0_WD-<serial>           ONLINE       0     0     0
    pci-0000:01:00.0-sas-0x4433221109000000-lun-0  ONLINE       0     0     0
    4896358983234274072                            FAULTED      0     0     0  corrupted data
    pci-0000:01:00.0-sas-0x443322110b000000-lun-0  ONLINE       0     0     0
spares
  ata-WDC_WD10EFRX-68PJCN0_WD-<serial>             AVAIL
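For comparison, my understanding is that if the spare had actually kicked in, the status output would group it with the faulted device under a spare-N vdev and mark it INUSE, roughly like this (illustrative sketch, not real output from this system):

  raidz1-0                                  DEGRADED     0     0     0
    ...
    spare-3                                 DEGRADED     0     0     0
      4896358983234274072                   FAULTED      0     0     0  corrupted data
      ata-WDC_WD10EFRX-68PJCN0_WD-<serial>  ONLINE       0     0     0
spares
  ata-WDC_WD10EFRX-68PJCN0_WD-<serial>      INUSE     currently in use

Nothing like that appears here, so the spare has not been pulled in at all.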
Update 2:
Restarting the server allowed the replace operation to be carried out without interference or issue. I am now looking into updating ZFS, and potentially the kernel, and want to make sure that is a safe operation to perform on an existing array built under the older versions.
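For completeness, my understanding from zpool(8) of the cleanup once the resilver finishes is roughly the following (same placeholder names as above):

# Make the spare a permanent member of the raidz1 vdev by detaching the dead device
zpool detach <name> 4896358983234274072

# Or, detach the spare itself to send it back to AVAIL and keep waiting
# for a proper replacement of the faulted drive
zpool detach <name> ata-WDC_WD10EFRX-68PJCN0_WD-<serial>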
Can you add the output of zpool status to your question? And the output (including any error messages) of the commands you ran. Also, where did you get ZFS version 5 from? The latest version of ZFS on Linux is 2.1.4 – cas Apr 06 '22 at 01:13