May 17 2020
 

There are two aspects of ZFS that I will be covering here – checksums and error-correcting memory. The first is a feature of ZFS itself; the second is a feature of the hardware you are running, and some claim that it is required for ZFS.

Checksums

By default ZFS keeps a checksum of every block of data it writes, so that it can later verify that the block hasn’t been subject to silent corruption. If it detects corruption, it can use redundancy (a mirror, RAID-Z, or extra copies), if there is any, to correct it; otherwise it can at least report that there’s a problem.

If you have only one disk and don’t ask to keep multiple copies of each block, then checksums will do little more than protect the most important metadata and tell you when things go wrong.
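To make the detect-and-repair idea concrete, here is a minimal sketch in Python. It is an illustration of the principle only, not how ZFS is actually implemented: SHA-256 stands in for whatever checksum the pool uses, and the names read_block and mirror are hypothetical ones invented for this example.

```python
import hashlib


def checksum(block: bytes) -> str:
    # Assumption: SHA-256 stands in for the pool's real checksum algorithm.
    return hashlib.sha256(block).hexdigest()


def read_block(copies: list, expected: str) -> bytes:
    """Return data from the first copy that verifies, repairing any bad copies.

    `copies` is a list of bytearrays, one per redundant copy of the block.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        # No intact copy to repair from: all we can do is report the problem.
        raise IOError("checksum mismatch on every copy: detected but not repairable")
    for c in copies:
        if checksum(bytes(c)) != expected:
            c[:] = good  # overwrite the damaged copy with the verified data
    return good


data = b"important data"
stored_checksum = checksum(data)      # kept separately from the data block itself
mirror = [bytearray(data), bytearray(data)]

mirror[0][0] ^= 0x01                  # simulate a silent single-bit flip on one disk
result = read_block(mirror, stored_checksum)
```

With two copies, the read returns the intact data and the damaged copy is rewritten. With only one copy and no redundancy, the same function can do nothing but raise the error, which is the single-disk situation described above.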

All that checksum calculation does make file operations slightly slower but frankly without benchmarks you are unlikely to notice. And it gives extra protection to your data.

For those who do not believe that silent data corruption exists, take a look at the relevant Wikipedia page. Everyone who has old enough files has come across occasional weird corruption in them, and whilst there are many possible causes, silent data corruption is certainly one of them.

Personally I feel that a probably unnoticeable loss of performance is more than balanced by greater data resilience.

Error-Correcting Memory

(Henceforth “ECC”)

I’m an enthusiast for ECC memory – my main workstation has a ton of it, and I’ve insisted on ECC memory for years. I’ve seen errors being corrected (although that was back when I was running an SGI Indigo2). Reliability is everything.

However there are those who will claim you cannot run ZFS without ECC memory. Or that ZFS without ECC is more dangerous than any other file system format without ECC.

Not really.

Part of the problem is that those with the most experience of ZFS are salty old Unix veterans who are justifiably contemptuous of server hardware that lacks ECC memory (that includes me). We would no sooner consider running a serious file server on hardware that lacks ECC memory than rely on disk ‘reliability’ and not mirror or RAID those fallible pieces of spinning rust.

ZFS will run fine without ECC memory.

But does the lack of ECC make ZFS more dangerous than any other file system?

It’s exceptionally unlikely – there are arguable examples of exceptionally esoteric failure conditions that may make things worse (the “scrub of death”) but I side with those who feel that such situations are not likely to occur in the real world.

And as always, why isn’t your data backed up anyway?