Thank you to the list member who sent a NetApp docs pointer, and for the opportunity to do a little RTFM before I reply to the list!
Looks like my colleague’s recollection of earlier versions still applies to 8.3. Essentially, we've been snapmirroring un-deduped data.
My next question is, is it realistic to run dedupes hourly on 30-40 volumes totalling ~100TB? Because that's a much easier proposition than amending our SLAs to lengthen our snapshot cycles.
And to get both on the same cycle, is it possible to make snapshots dependent on dedupe finishing, or do we just assume dedupe will complete? If it doesn't, what are the consequences? For example, if a dedupe that usually finishes at 5 minutes after the hour isn't done and snapmirror runs then, will snapmirror be syncing a full hour of full-sized changes?
Last question for now: assuming both are on the same schedule, how do I get the current lost space back? Will it be reclaimed when the schedules are synced and the snapshots have rolled off? Or do I need to destroy and recreate the target volumes?
Hope to hear from you, especially from any other shop running both snapmirror and dedupe.
Randy
(And if a solution requires DOT 9, we do have an upgrade on our roadmap.)
Replying to just you for now. Hopefully this will help and you can report back. Look at this... https://library.netapp.com/ecm/ecm_download_file/ECMLP2348026
Page 142.
I think you may need to work out some scheduling and that may help. Going to look a bit more... I have another idea, but that may only be ONTAP 9 related.
From: Rue, Randy <rrue@fredhutch.org>
Sent: Saturday, December 17, 2016 11:14 AM
Subject: snapmirror source and target aggregate usage don't match?
To: <toasters@teaparty.net>
Hello All,
We run two 8.3 filers with a list of vservers and their associated volumes, with each volume snapmirrored (volume level) from the active primary cluster to matching vserver/volumes on the passive secondary.
Both clusters have a similar set of aggregates of just about equal size. Both clusters’ aggregates contain the same list of volumes of the same size, with the same space total/used/available on both sets.
But on the target cluster the same aggregates are reporting 30% more used space.
This is about on par with the dedupe savings we’re getting on the primary, so when I first noticed this my thought was to check that dedupe was OK on the target. But the webUI reports that no “storage efficiency” information is available on a replication target, and I ended up thinking this meant the secondary data would have to be full-sized. I even recall asking someone and having this confirmed, but can’t recall whether that came from the vendor SE, our VAR SE, or a support tech.
Now we’re approaching the space limit of the secondary cluster and I’m looking deeper. At this point, as it appears that for each volume the total/used/free space matches after dedupe on the source, I’m thinking that dedupe properties aren’t exposed on the target but the data is still a true copy of the deduped original. This is supported by being able to view dedupe stats on the target via the CLI that show the same savings as on the source.
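For reference, this is roughly what I'm looking at on the target via the CLI (exact field names typed from memory, so treat them as approximate):

    ::> volume efficiency show -vserver <dst_vserver> -volume <dst_volume>
    ::> volume show -vserver <dst_vserver> -volume <dst_volume> -fields sis-space-saved,sis-space-saved-percent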
Note that we’re also snapshotting these volumes, and while we’re deduping daily, we’re snapshotting hourly. A colleague mentioned remembering that this could mean mirrored data that’s not deduped yet is being replicated full-size. But if so, wouldn’t this be reflected in the dedupe stats on the target?
OK, just found that “storage aggregate show -fields usedsize,physical-used” on the primary/source cluster shows that used and physical-used are about identical for all aggrs. On the secondary/target, used is consistently larger than physical-used and the total difference makes up the 30% I’m “missing.”
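Spelled out, the commands I'm comparing, plus a per-aggregate space breakdown I plan to look at next (if I have the command name right):

    ::> storage aggregate show -fields usedsize,physical-used
    ::> storage aggregate show-space -aggregate <aggr_name>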
Is this a problem with my reporting? Are we actually OK and I need to look at physical-used instead of used? Or if we’re not OK, where is the space being used and can I get it back?
Thanks in advance for your guidance…
Randy
Trial by fire?
You will likely need to get a rough baseline of how long a single dedupe run takes on a source volume. Then look at doing multiple volumes and see how much the process slows down. With spinning media, the dedupe process can take a while on larger volumes.
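To get that baseline, kicking a run off by hand and watching it is probably easiest. Something like this, with placeholders for your names (field list from memory, check it against your 8.3 man pages):

    ::> volume efficiency start -vserver <vserver> -volume <volume>
    ::> volume efficiency show -vserver <vserver> -volume <volume> -fields state,progress,last-op-begin,last-op-end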
You may need to get a little creative: group some of the volumes together and create schedules for the deduping to happen at specific times. I really doubt you are going to be able to dedupe 30-40 volumes totalling ~100TB every hour. You should probably stagger them and run throughout the day on different schedules.
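A rough sketch of the staggering idea, with the schedule names and hours purely as examples:

    ::> job schedule cron create -name dedupe_group1 -hour 1,7,13,19 -minute 0
    ::> job schedule cron create -name dedupe_group2 -hour 3,9,15,21 -minute 0
    ::> job schedule cron create -name dedupe_group3 -hour 5,11,17,23 -minute 0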
You can create custom volume efficiency policies that only run for a certain duration ("-" is the default, meaning no limit; otherwise whole numbers of hours, up to 999), and the qos-policy (the one for volume efficiency operations!) can be set to background (run, but do not impede) or best-effort (may slightly impact operations).
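Tying a policy to one of those schedules would look something like this (syntax from memory, so verify the exact -duration and -qos-policy values your 8.3 release accepts):

    ::> volume efficiency policy create -vserver <vserver> -policy dedupe_group1_pol -schedule dedupe_group1 -duration 2 -qos-policy background
    ::> volume efficiency modify -vserver <vserver> -volume <volume> -policy dedupe_group1_pol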
As far as the timing goes, you may want to look at scripting with PowerShell or the NetApp SDK: wait for a volume efficiency operation to finish, then do the snapshots and/or mirroring.
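Whatever you script it in, the logic is basically: poll the first command below until the volume's efficiency state goes idle, then fire the snapshot and the mirror update (paths and names are placeholders):

    ::> volume efficiency show -vserver <vserver> -volume <volume> -fields state
    ::> volume snapshot create -vserver <vserver> -volume <volume> -snapshot <snapshot_name>
    ::> snapmirror update -destination-path <dst_vserver>:<dst_volume>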
We could spend lots of time on this. I hope this helps at least a little.
--tmac
Tim McCarthy, Principal Consultant
Proud Member of the #NetAppATeam https://twitter.com/NetAppATeam
I Blog at TMACsRack https://tmacsrack.wordpress.com/