ZFS allocates space for metadata the same way it allocates space for data. If you have a long-running history of volume creation, snapshots, clones, &c. in any great quantity (think self-service wrappers for lots of folks), operations like 'zfs list' take progressively longer over time. The only fix is to send the volume data elsewhere, recreate the pool, and recv the data back into it. There have also been bugs over the last handful of years where soft errors on particular types of devices caused an inordinate number of disks to be falsely declared failed. Many of those bugs have been fixed, but the fundamental flaw of metadata "fragmented" across the pool does not appear to have been addressed, in spite of corporate sponsorship from folks like Joyent and Delphix. We struggled for years (2008 to 2012) trying to work around this problem in ZFS, but eventually gave up and dumped it.
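
For anyone who hasn't been through it, that send/recreate/recv cycle looks roughly like the sketch below; the pool name 'tank', the host 'backuphost', and the vdev layout are just placeholders, so adjust to your own setup:

    # snapshot everything and replicate it to a scratch pool elsewhere
    zfs snapshot -r tank@migrate
    zfs send -R tank@migrate | ssh backuphost zfs receive -F backup/tank

    # destroy and recreate the pool, which gives you a fresh metadata layout
    zpool destroy tank
    zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0

    # pull everything back, snapshots and properties included
    ssh backuphost zfs send -R backup/tank@migrate | zfs receive -F tank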
For secondary storage we managed to glue together Coraid for the disk fabric, rack-mount servers running Veritas as NFS heads, and an Avere cluster in front of that for working-set performance. It's completely unsupported by Symantec, but it works fine as long as you don't need I/O fencing (SCSI-3 PR doesn't work with Coraid, the non-PR method doesn't appear to work either, and we can't get Symantec to look at it because Coraid isn't on the HCL yet). The 1PB of Coraid ran me $0.25/GB, and after the rack-mount servers and the Avere cluster the TCO was $0.52/GB. Not bad for a rack and 7 kW. Not as reliable as Infinidat's seven-nines figures, but pretty okay for a solution that wasn't formally engineered.
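
For anyone checking the math, the per-GB figures pencil out to roughly the following, assuming the 1PB is counted as a flat 1,000,000 GB (the split between the Coraid shelves and the heads/Avere is just inferred from the two per-GB numbers):

    # Coraid shelves alone at $0.25/GB
    echo "1000000 * 0.25" | bc    # ~$250,000
    # all-in TCO at $0.52/GB once the NFS heads and Avere are included
    echo "1000000 * 0.52" | bc    # ~$520,000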