You can read more on my homelab and datahoarding problem here and here.
Today I scored two recertified 10TB HGST drives for very little. Normally I’d go for the brand new stuff, but this deal was too good to be true.
My main goal is to check if these recertified disks are worth the money and effort. Next up, I want to experiment with some new ZFS pool setups. (You still cannot remove a raidz vdev from your pool in 2024¹.)
Current state of affairs#
Right now my main ZFS storage pool looks like this:
tank
  raidz1
    4x 3TB WD Red
  raidz1
    4x 8TB WD White
  raidz1
    4x 14TB WD White
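If you want to see that layout (and per-vdev capacity) straight from ZFS, zpool will print it for you; a quick check, assuming a pool called tank like mine:
$ sudo zpool status tank
$ sudo zpool list -v tank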
All in all that’s good for 100TB of raw storage space and roughly 75TB of usable storage. And yet, it’s getting full. Linux ISOs take up a lot of space. I also have a 2TB (2x 1TB SSD mirrors) pool for VM storage and a single 3TB drive as a backup intermediate disk (i.e. backups are copied there, then uploaded elsewhere).
As stated, I cannot remove any of those raidz1 vdevs. My only options are to build a new pool with new disks or replace the disks in this pool.
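For completeness, the replace-in-place route would look roughly like the sketch below. The device paths are placeholders, and the vdev only grows once every disk in it has been replaced and resilvered:
$ sudo zpool set autoexpand=on tank
$ sudo zpool replace tank /dev/disk/by-id/ata-OLD /dev/disk/by-id/ata-NEW
$ sudo zpool status tank    # wait for the resilver to finish before swapping the next disk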
It’s all about trust#
So, new drives I normally trust. If they don’t work, they don’t work. But if they spin up, I’ve always assumed they’d be okay. Yeah, I know.
But, since I now have a pair of 10TB recertified drives, I’d like to be sure they’re good to go. Recertified in this context probably means they were retired from a data center somewhere. Their power-on time is a little over 5 years, with a production date of December 2017.
To make sure these disks are good to go, I’m going to run a bunch of tests against them to see if they hold up.
- SMART conveyance test
- SMART extended test
- badblocks
S.M.A.R.T.#
If you don’t know about the different SMART tests, here’s a refresher. I’m skipping the short test, because I’m running the long test anyway.
- Short: Checks the electrical and mechanical performance as well as the read performance of the disk. Electrical tests might include a test of buffer RAM, a read/write circuitry test, or a test of the read/write head elements. Mechanical tests include seeking and servo on data tracks. Scans small parts of the drive’s surface (the area is vendor-specific and there is a time limit on the test) and checks the list of pending sectors that may have read errors. Usually takes under two minutes.
- Long/extended: A longer and more thorough version of the short self-test, scanning the entire disk surface with no time limit. This test usually takes several hours, depending on the read/write speed of the drive and its size.
- Conveyance: Intended as a quick test to identify damage incurred during transporting of the device from the drive manufacturer to the computer manufacturer. Only available on ATA drives, and it usually takes several minutes.
Running these is as easy as:
$ sudo smartctl -t <short|long|conveyance> /dev/sda
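Since the conveyance test only takes a few minutes, one way to go about it is to fire that off first, keep an eye on the self-test log until it’s done, and then kick off the extended test. A rough sketch:
$ sudo smartctl -t conveyance /dev/sda
$ watch -n 60 'sudo smartctl -l selftest /dev/sda'
$ sudo smartctl -t long /dev/sda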
If you want to know how long these tests are going to take:
$ sudo smartctl -c /dev/sda
...
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1144) minutes.
...
So it’ll take a little over 19 hours to complete the full extended SMART test.
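Once a test finishes, the result lands in the self-test log, and the attribute table is worth a look as well. On most ATA drives the usual suspects to keep an eye on are Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable:
$ sudo smartctl -l selftest /dev/sda
$ sudo smartctl -A /dev/sda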
Badblocks#
Badblocks is a utility that, well, searches a disk for bad blocks. It will write data to the entire disk and verify that each block reads back correctly.
Naively I ran the following to do a full write test with a progress indicator and verbose output:
$ sudo badblocks -wsv /dev/sda
badblocks: Value too large for defined data type invalid end block (9766436864): must be 32-bit value
As it turns out, badblocks uses a default block size of 1024 bytes, meaning it cannot, out of the box, handle a disk this big: the end block has to fit in a 32-bit value, which at 1024-byte blocks works out to roughly 4TB. Let’s figure out what block size our disk uses, and plug that information into badblocks.
$ sudo blockdev --getbsz /dev/sda
4096
$ sudo badblocks -t random -w -v -s -b 4096 /dev/sda
Checking for bad blocks in read-write mode
From block 0 to 2441609215
Testing with random pattern: 5.97% done, 39:45 elapsed. (0/0/0 errors)
And we’re in business. Well, now we wait a few hours (or days) for badblocks to complete. I might even do a second pass just for the fun of it.
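As it happens, badblocks can roll that second pass into the same run: the -p flag keeps rescanning until the given number of consecutive passes turn up no new bad blocks, so something like this should do it:
$ sudo badblocks -t random -w -v -s -b 4096 -p 2 /dev/sda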
What’s next?#
After I’ve run at least two badblocks passes and both the conveyance and extended SMART tests on this disk, I’m going to do the same on the other one. If that all goes well, I’ll probably put them as a ZFS mirror pair in a new pool and do some testing there.
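Creating that mirror pool is a one-liner; a sketch, with the pool name and the by-id paths as placeholders (ashift=12 assumes 4K physical sectors, which is a safe bet for drives like these):
$ sudo zpool create -o ashift=12 tank2 mirror /dev/disk/by-id/ata-HGST_DISK1 /dev/disk/by-id/ata-HGST_DISK2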
There are good technical reasons for that, I know. Wish I’d known about it before I built my pool, though. ↩︎