Nov 072007
 

I have been spending some time looking up information on ZFS for OSX because I’ve used ZFS under Solaris and would quite like it on my new Macbook. In many of the places I looked, there were tons of comments wondering why ZFS would be of any use for ordinary users. Oddly the responders indicating features that are more useful for servers than workstations. The doubters were responding with “So?”.

This is perhaps understandable because most of the information out there is for Solaris ZFS and tends to concentrate on the advantages for the server (and the server administrator). This is perhaps unfortunate because I can see plenty of advantages for ordinary users.

I will go through some of the advantages of ZFS that may work for ordinary users. In some cases I will give examples using a command-line. Apple will undoubtedly come up with a GUI for doing much of this, but I don’t have access to that version of OSX and the command-line still works.

ZFS Checks Writes

Unlike most conventional filesystems, ZFS does not assume that hard disks are perfect and uses checks on the data it writes to ensure that what gets read back is what was written. As each “block” is written to disk, ZFS will also write a checksum; when reading a “block” ZFS will verify that the block read matches the checksum.

This has already been commented on by people using ZFS under Solaris as showing up problematic disks that were thought to be fine. Who wants to lose data ?

This checksum checking that zfs does will not protect from the most common forms of data loss … hard disk failures or accidentally removing files. But it does protect against silent data corruption. As someone who has seen this personally, I can tell you it is more than a little scary with mysterious problems becoming more and more common. Protecting against this is probably the biggest feature of ZFS although it is not something that is immediately obvious.
ZFS Filesystems Are Easy To Create

So easy in fact that it frequently makes sense to create a filesystem where in the past we would create a directory. Why? So that it is very easy and quick to see who or what is using all that disk space that got eaten up since last week.

Lets assume you currently have a directory structure like :-

/Users/mike
/Users/john
/Users/stuart
/Users/stuart/music
/Users/stuart/photos

If those directories were ZFS filesystems you could instantly see how much disk space is in use for each with the command zfs list

% zfs list
NAME                                 USED   AVAIL   REFER   MOUNTPOINT
zpool0                               3.92G  23G     3.91M   /zpool0
zpool0/Users/mike                    112M   23G     112M    /Users/mike
zpool0/Users/john                    919M   23G     919M    /Users/john
zpool0/Users/stuart                  309M   23G     309M    /Users/stuart
zpool0/Users/stuart/music            78G    23G     78G     /Users/stuart/music
zpool0/Users/stuart/photos           12G    23G     12G     /Users/stuart/photos

With one very simple (and quick) command you can see that Stuart is using the most space in his ‘music’ folder … perhaps he has discovered Bittorrent! The equivalent for a series of directories on a normal filesystem can take a long time to complete.

With any luck Apple will modify the Finder so that alongside the option to create a new folder is a new option to “create a new folder as a ZFS filesstem” (or something more user-friendly).

It may seem silly to have many filesystems when we are used to filesystems that are fixed in size (or are adjustable but in limited ways), but zfs filesystems are allocated out of a common storage pool and grow and shrink as required.

ZFS Supports Snapshots

Heard of “Time Machine” ? Nifty isn’t it ?

Well ZFS snapshots do the same thing … only better. Time Machine is pretty much limited to an external hard disk which is all very well if you happen to have one with you, but not much use when you only have a single disk. ZFS snapshots work “in place” and are instantaneous. In addition you can create a snapshot when you want to … for instance just before starting to revise a large document so that if everything goes wrong you can quickly revert.

Time Machine has one little disadvantage … if you modify a very large file, it will need to duplicate the entire file multiple times. For instance if you have a 1Gbyte video that you are editing over multiple days, Time Machine will store the entire video every time it ‘checkpoints’ the filesystem. This can add up pretty quick, and could be a problem if you work on very large files. Zfs snapshots stores only the changes to the file (although an application can accidentally ‘break’ this) making it far more space efficient.

One thing that zfs snapshots does not do that Time Machine does, is to ensure you have a backup of your data on an external hard disk. The zfs equivalent is the zfs send command which sends a zfs snapshot “somewhere”. The somewhere could be to a zfs storage pool on an external hard disk, to a zfs pool on a remote server somewhere (for instance an external hard disk attached to your Mac at work to give you offsite backups), or even to a storage server that does not understand ZFS! And yes you can send “incrementals” in much the same way too.

Currently using zfs send (and the opposite zfs receive) requires inscrutable Unix commands, but somebody will soon come up with a friendlier way of doing it. Oh! It seems they already have!

Unfortunately I’ve found out that using ZFS with Leopard is currently (10.5.0) pretty difficult … the beta code for ZFS is hard to get hold of, and may not be too reliable. Funnily enough this mirrors what happened when Solaris 10 first came out … ZFS was not ready until the first update of Solaris 10!

Unfortunately it seems that Apple have retreated back from using ZFS in OSX which is a great shame, and until they come up with something better, we are stuck with HFS+, which means not only do we lack the features of a modern filesystem, but we are also stuck with slow fsck times. Ever wonder why sometimes that blue screen of a Mac starting sometimes takes much longer ? The chances are that it is because a filesystem is being checked – something that isn’t necessary with a modern filesystem.

Aug 252007
 

If you’re hoping to read about Linux finally getting ZFS (except as a FUSE module) then you are going to be disappointed … this is merely a rant about the foolishness shown by the open-source world. It seems that the reason we won’t see ZFS in the Linux kernel is not because of technical issues but because of licensing issues … the two open-source licenses (GPL and CDDL) are allegedly incompatible!

Now some may wonder why ZFS is so great given that most of the features are available in other storage/filesystem solutions. Well as an old Unix systems administrator, I have seen many different storage and filesystem solutions over time … Veritas, Solaris Volume Manager, the AIX logical volume manager, Linux software RAID, Linux LVM, …, and none come as close to perfection as ZFS. In particular ZFS is insanely simple to manage, and those who have never managed a server with hundreds of disks may not appreciate just how desireable this simplicity is.

Lets take a relatively common example from Linux; we have two disks and no RAID controller so it makes sense to use Linux software RAID to create a virtual disk that is a mirror of the two physical disks. Not a difficult task. Now we want to split that disk up into seperate virtual disks to put filesystems on; we don’t know how large the different filesystems will become so we need to have some facility to grow and shrink those virtual disks. So we use LVM and make that software RAID virtual disk into an LVM “physical volume”, add the “physical volume” to a volume group, and finally create “logical volumes” for each filesystem we want. Then of course we need to put a filesystem on each “logical volume”. None of these steps are particularly difficult, but there are 5 seperate steps, and the separate software components are isolated from each other … which imposes some limitations.

Now imagine doing the same thing with ZFS … we create a storage pool consisting of two mirrored physical disks with a single command. This storage pool is automatically mounted as a filesystem ready for immediate use. If we need separate filesystems, we can create each with a single command. Now we come to the advantages … filesystem ‘snapshots’ are almost instantaneous and do not consume additional disk space until changes are made to the original filesystem at which point the increase in size is directly proportional to the changes made. Each ZFS filesystem shares the storage pool with the size being totally dynamic (by default) so that you do not have a set size reserved for each filesystem … essentially the free space on every single filesystem is available to all filesystems.

So what is the reason for not having ZFS under Linux ? It is open-source so it is technically possible to add to the Linux kernel. It has already been added to the FreeBSD kernel (in “-CURRENT”) and will shortly be added to the released version of OSX. Allegedly because the license is incompatible. The ZFS code from Sun is licensed under the CDDL license and the Linux kernel is licensed under the GPL license. I’m not sure how they are incompatible because frankly I have better things to do with my time than read license small-print and try to determine the effects.

But Linux (reluctantly admittedly) allows binary kernel modules to be loaded into the kernel and the license on those certainly isn’t the GPL! So why is not possible to allow GPLed code and CDDLed code to co-exist peacefully ? After all it seems that if ZFS were compiled as a kernel module and released as a binary blob, it could then be used … which is insane!

The suspicion I have is that there is a certain amount of “not invented here” going on.