I did find a white paper that also gives some other strategies:
http://www.netapp.com/library/tr/3483.pdf
-- Adam Fox adamfox@netapp.com
________________________________
From: owner-dl-toasters@jhereg.corp.netapp.com [mailto:owner-dl-toasters@jhereg.corp.netapp.com] On Behalf Of Fox, Adam Sent: Thursday, March 08, 2007 9:44 PM To: Steven Mandrake; toasters@mathworks.com Subject: RE: mounting snapshots
As one who struggled with being able to explain space reservations for a long time, I sympathize with the techpubs folks so I'm going to give it a shot.
WARNING: If you understand how space reservations work and don't want to read a rather long-winded explanation, feel free to delete this now.
The first basic concept is that pretty much all SANs virtualize LUNs somehow. I mean, to me, that's the point of a SAN. If I didn't want to virtualize my LUNs, I'd just hook up a direct attached JBOD and be done with it. So, WAFL virtualizes LUNs as files with special attributes. But if you understand how snapshots work in a NAS context, you're about 70% of the way there when it comes to the SAN context.
First things first. You only need space reservations (or some alternative) if your volume has LUNs and that volume has snapshots. If you have LUNs but the volume is not being snapshot'ed, then definitely turn off space reservations, as they will never be used. Without snapshots, WAFL behaves exactly like most traditional filesystems: it allocates new blocks on writes and frees blocks as the client (or in this case the host) requests.
Now, as soon as you introduce snapshots into this game, the rules change. Just like in the NAS context, when a volume is snapshot'ed all of the data blocks are, in effect, frozen, and if a block is over-written a new block is allocated instead in order to preserve the data in the snapshot. So, the question becomes: where do these new blocks come from?
At first, WAFL treats the LUN just like any other file. So, if there is a snap reserve in the volume, it would account for (not move) the old block in snap reserve, then allocate a new block. This leaves the usable filesystem size exactly the way it was when you started, since the active filesystem can't write to the snap reserve. However, most SAN volumes don't have a snap reserve, so this step is usually skipped. If they did, it's important to realize that this would work exactly the same way as in a NAS volume.
Next, if there's no snap reserve to play with (which is the typical case), the block will be allocated from the free space in the volume. Again, no change from the NAS context. If your snap reserve is full, the snapshot space "bleeds into" data space and the net result is that the available space in the volume goes down (unlike the snap reserve case, where it stays the same).
Hopefully, everyone is following because at this point things start to change.
Unlike NAS clients, SAN hosts see free space with respect to their local filesystem, not WAFL. The host does not see the underlying WAFL layer and ONTAP does not see the filesystem inside the LUN. So at some point, if at least one snapshot exists, there can be the case where the underlying WAFL volume runs out of blocks, but the filesystem inside the LUN is reporting free space. If you've followed so far, hopefully you see why. WAFL must be able to keep the older data blocks because they are in a snapshot, but the filesystem inside the LUN has no knowledge of that, so it thinks the blocks were freed. Well, SAN hosts REALLY don't like it when they are told there is space to write, so they write, and are then told ENOSPC (out of space). Many apps will crash, blue screens can abound, and generally a bad time is had by all. It's not the host's fault; it has no way of knowing what's going on.
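To make the mismatch between the two views concrete, here is a minimal Python sketch of the scenario just described. It is a toy accounting model, not ONTAP behavior in detail: the function name and parameters are hypothetical, sizes are counted in blocks, and it assumes a volume with no snap reserve and no space reservation.

```python
def simulate_host_rewrite(volume_blocks: int, lun_blocks: int, deleted_blocks: int):
    """Toy model: a fully written LUN is snapshotted, then the host deletes
    `deleted_blocks` worth of files and tries to write that much new data.

    Assumes no snap reserve and no space reservation (hypothetical model).
    Returns (writes_ok, host_free, wafl_free), all in blocks.
    """
    wafl_used = lun_blocks       # every LUN block is now held by the snapshot
    host_free = deleted_blocks   # the delete only changes the host's own view
    writes_ok = 0
    for _ in range(deleted_blocks):
        if volume_blocks - wafl_used == 0:
            break                # the volume is out of blocks: this write would fail
        wafl_used += 1           # new block allocated, old one stays in the snapshot
        host_free -= 1
        writes_ok += 1
    return writes_ok, host_free, volume_blocks - wafl_used


result = simulate_host_rewrite(volume_blocks=150, lun_blocks=100, deleted_blocks=60)
print(result)
```

With these made-up numbers the host gets 50 writes through and still believes it has 10 blocks free, while the volume has none left. That gap is the failure case the rest of this explanation is about.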
There are several ways to handle this situation. Each involves sacrificing something. You just have to decide which fits your needs.
The first way is through space reservations. If we hold back enough space to cover the entire LUN and only use it when we hit this condition (no snap reserve space, no free space in the volume), then writes will never fail, because we can over-write the entire LUN and still have blocks to put the new data in. But there's a price, and in this case, the price is new snapshots. Once a volume goes into this condition where it's using space reservations for new writes, no new snapshots can be created until more space is allocated or snapshots are deleted to free up more space.
Now the thing most folks don't like about full space reservations is holding all of that space back, especially if it's for data where the likelihood of such a condition happening is not very high. So, for them, WAFL has the concept of a fractional reserve. This is where you don't reserve 100%, but a smaller amount. But using this without any other methods is kind of like using thin provisioning all by itself. If you don't watch what's going on, you could hit the bad condition where your host dies because the volume is out of space. But if you know your data patterns, maybe this is an option.
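The trade-off can be put into numbers with a short Python sketch. It assumes the simplification used in this explanation, namely that the reserve is just a percentage of the LUN size and that there is no snap reserve; the function names are hypothetical and the real accounting in ONTAP may differ in detail.

```python
def overwrite_headroom_gb(lun_size_gb: float,
                          fractional_reserve_pct: float,
                          volume_free_gb: float) -> float:
    """GB of snapshotted LUN data that can be over-written before writes fail.

    Simplified model (an assumption): reserve = percentage of the LUN size.
    """
    reserve_gb = lun_size_gb * fractional_reserve_pct / 100.0
    return volume_free_gb + reserve_gb


def writes_can_fail(lun_size_gb: float,
                    fractional_reserve_pct: float,
                    volume_free_gb: float) -> bool:
    """True if over-writing the whole LUN could run the volume out of space."""
    headroom = overwrite_headroom_gb(lun_size_gb, fractional_reserve_pct, volume_free_gb)
    return headroom < lun_size_gb


# Full reservation: a 100 GB LUN can be over-written completely, even with no free space.
print(writes_can_fail(100, 100, 0))
# 50% fractional reserve plus 10 GB free: only 60 GB of change is covered.
print(overwrite_headroom_gb(100, 50, 10), writes_can_fail(100, 50, 10))
```

In this model a fractional reserve is safe only as long as the amount of data that changes between snapshot deletions stays below the headroom, which is why knowing your change rate matters.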
Starting in ONTAP 7.1, ONTAP introduced some alternative methods. These can be used by themselves or in combination with the other methods of handling this condition. But, again, each has a price (there's no free lunch).
One is called autogrow. You can set up a policy where when a FlexVol gets full, it will automatically grow. So it's possible to use this as a backstop against your fractional reserve, or to eliminate space reservations altogether and just make sure you've got space in your aggregate to handle it. The price here is obviously aggregate space, and using this by itself does not guarantee you will never hit the condition, because you could run out of space in the aggregate or hit the policy limit.
Another method is snapshot autodelete. You can set up a policy so that when a volume gets full, WAFL will begin to delete old snapshots until there is enough space. There are exceptions to the rule (I believe SnapMirror baselines and active NDMP dump snapshots won't be deleted, but you get the idea). So by deleting older snapshots you free up blocks to allow the writes. Of course the price here is you might have wanted those old snapshots around.
You can use these methods together so WAFL will try to grow first, then delete snapshots or vice versa. And both of these methods can be used with or without space reservations (full or fractional).
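A toy Python sketch of how the two policies might combine, and how the order changes the outcome. Everything here is hypothetical (class, field, and method names, and the block counts); it only models the accounting described above, including the two ways autogrow can stop helping: the policy limit and the aggregate running out of space.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyFlexVol:
    """Hypothetical accounting model; all sizes are in blocks."""
    size: int
    used: int
    max_size: int          # autogrow policy limit
    grow_step: int         # blocks added per autogrow attempt
    aggregate_free: int    # blocks left in the containing aggregate
    snapshots: List[int] = field(default_factory=list)  # blocks held per snapshot, oldest first

    def free(self) -> int:
        return self.size - self.used

    def try_grow(self) -> bool:
        step = min(self.grow_step, self.max_size - self.size, self.aggregate_free)
        if step <= 0:
            return False   # hit the policy limit or the aggregate is full
        self.size += step
        self.aggregate_free -= step
        return True

    def try_delete_snapshot(self) -> bool:
        if not self.snapshots:
            return False   # nothing left to delete
        self.used -= self.snapshots.pop(0)   # the oldest snapshot goes first
        return True

    def make_room(self, needed: int, order: Tuple[str, str] = ("grow", "delete")) -> bool:
        """Apply the policies in the given order until `needed` blocks are free."""
        actions = {"grow": self.try_grow, "delete": self.try_delete_snapshot}
        for name in order:
            while self.free() < needed and actions[name]():
                pass
        return self.free() >= needed


def full_volume() -> ToyFlexVol:
    return ToyFlexVol(size=100, used=100, max_size=120, grow_step=10,
                      aggregate_free=50, snapshots=[20, 10])


grow_first, delete_first = full_volume(), full_volume()
print(grow_first.make_room(25, ("grow", "delete")), grow_first.size, grow_first.snapshots)
print(delete_first.make_room(25, ("delete", "grow")), delete_first.size, delete_first.snapshots)
```

In this example both orders find the 25 blocks, but growing first costs 20 blocks of aggregate space and one snapshot, while deleting first costs both snapshots and no aggregate space. That is the "price" each method carries.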
So, bottom line: the order of block allocation when snapshots are involved is 1) snap reserve, 2) free space in volume, 3) space reservations. If you understand this, you'll usually get what's going on. Which method (or combination of methods) you use to deal with this space full condition is totally up to you, your site, and your priorities. There's no one answer as to which is the "best practice". The best practice is the one that makes sense to you and your data. And, of course, you could deploy multiple methods on multiple aggrs and/or volumes as the data and SLAs dictate.
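The allocation order can be written down as a small Python sketch. This is a toy model with hypothetical names, counting in blocks; it only shows which pool absorbs each over-write of a block that a snapshot is holding, and what happens when all three pools are empty.

```python
from dataclasses import dataclass


@dataclass
class ToyVolume:
    """Hypothetical accounting model; all sizes are in blocks."""
    snap_reserve_free: int   # unused blocks in the snap reserve
    volume_free: int         # free blocks in the volume
    space_reservation: int   # blocks held back for LUN over-writes

    def overwrite_block(self) -> str:
        """Account for one over-write of a block that a snapshot is holding.

        Returns the name of the pool that absorbed the over-write.
        """
        if self.snap_reserve_free > 0:
            self.snap_reserve_free -= 1
            return "snap reserve"
        if self.volume_free > 0:
            self.volume_free -= 1
            return "free space in volume"
        if self.space_reservation > 0:
            self.space_reservation -= 1
            return "space reservation"
        raise OSError("ENOSPC: nothing left, the host's write fails")


vol = ToyVolume(snap_reserve_free=1, volume_free=1, space_reservation=1)
order = [vol.overwrite_block() for _ in range(3)]
print(order)
```

A fourth over-write on this toy volume raises the out-of-space error, which is the condition that space reservations, autogrow, and snapshot autodelete are each meant to prevent.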
Anyway, I hope this is clear. If nothing else you get some sympathy for the tech writers since this is not an easy subject. It took me a couple of years to be able to explain it this way and have anyone else understand it.
-- Adam Fox adamfox@netapp.com
________________________________
From: owner-dl-toasters@jhereg.corp.netapp.com [mailto:owner-dl-toasters@jhereg.corp.netapp.com] On Behalf Of Steven Mandrake Sent: Thursday, March 08, 2007 7:36 PM To: toasters@mathworks.com Subject: Re: mounting snapshots
Hi Everyone,
Thank you for the very beneficial lesson. So I understand now that fractional reserve is to allow for new writes to a lun (especially if you are taking snapshots).
But what is the best way to set this? --- It appears that vol fractional reserve is set at the CLI level, but seems like I can also set this up in SME & SD. If I set fractional reserve to 50%, it shows in SME/SD that all the luns are set to 50%. But what does this do to checking/unchecking 'lun space reserve' in FilerView --- What is the point of doing this?
I have read all the tech docs from ntap, and they only explain these in the most convoluted way possible. Wish they put a doc on best practices on when to turn these on/off.
Is anything ever written & deleted from the fractional reserve space?? Or is it just there, in case things go haywire?
On 3/8/07, Fox, Adam Adam.Fox@netapp.com wrote:
Don't feel too bad...space reservations take time to digest. Space reservations don't affect your ability to mount snapshots. Doing fractional reserves is one way to handle snapshots on LUN volumes. Doing so carries some risk in that the volume can run out of space before the LUN does. But if you know your change rate, you can use fractional reserve if it fits your needs. Another method you will find customers on 7.1 and later using is to not use space reservations at all, leaving a reasonable amount of free space in the volume, then setting up policies for autogrow and/or auto snapshot delete in case you run over.
-- Adam Fox NGS Tools Developer adamfox@netapp.com
________________________________
From: owner-dl-toasters@jhereg.corp.netapp.com [mailto:owner-dl-toasters@jhereg.corp.netapp.com] On Behalf Of Steven Mandrake Sent: Thursday, March 08, 2007 12:16 AM To: toasters@mathworks.com Subject: mounting snapshots
Hello,
I am having the hardest time understanding fractional space reservations for volumes and lun space reservations. If I reduce fractional space reservations to a volume that contain luns, what happens to lun space reservations? Is there a way to throttle lun space reservations -- as far as I can tell it's just a check box in Filerview....
The main question I have is --- does setting fractional space reservation affect my ability to mount snapshots. I assume that mounting snapshots does not take any space since it's all read-only.... or am i wrong in assuming this? My assumption right now is that fractional space reserve is there if I ever need to mount a snapshot to be read/write --- am I dead wrong?
Thanks, Steve