NetApp is definitely spoiling me with their feature set!
The dedupe thoughts were specific to NetApp - i.e., if NetApp were to
perform dedupe on the source, what would life be like? I wasn't aware
that Data Domain couldn't use NetApp back-end storage... deduping
without some impressive fault tolerance is NOT a good idea!
Glenn
________________________________
From: Glenn Dekhayser [mailto:gdekhayser@voyantinc.com]
Sent: Tuesday, February 27, 2007 9:26 AM
To: Glenn Walker; Darren Soothill; toasters(a)mathworks.com
Subject: RE: data management through compression for Netapp
Ah, Glenn, you're so spoiled with Netapp! :-)
Problem is, with things like Data Domain, unless you're going to use the
NetApp as back-end FC storage (which isn't supported yet), we're not in
WAFL-land, so in most cases you are overwriting blocks in place. And
without RAID-DP, a single bit error during a RAID rebuild could have a
much bigger impact than normal if that particular bit belongs to a block
that's referenced multiple times in the de-duping scheme. Basically, I'm
making the argument that if you're going to use de-duping, RAID-DP and
WAFL are a MUST!
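To put that blast-radius point in toy terms, here's a quick Python
sketch (purely illustrative - nothing to do with Data Domain's or
NetApp's actual on-disk formats):

    # Toy model only: one physical copy of a block backing many logical
    # references means one unrecoverable block error damages every file
    # that referenced it.
    import hashlib

    store = {}   # fingerprint -> the single physical copy of a block
    files = {}   # filename -> list of block fingerprints (logical view)

    def write_file(name, blocks):
        refs = []
        for data in blocks:
            fp = hashlib.sha1(data).hexdigest()
            store.setdefault(fp, data)   # de-dupe: one copy per unique block
            refs.append(fp)
        files[name] = refs

    # 100 VMs sharing one common OS block plus one unique block each
    for i in range(100):
        write_file("vm%03d" % i, [b"common-os-block", b"unique-%03d" % i])

    # an unrecoverable error hits the single shared physical block...
    del store[hashlib.sha1(b"common-os-block").hexdigest()]

    damaged = [n for n, refs in files.items()
               if any(fp not in store for fp in refs)]
    print("%d physical blocks left, %d files damaged"
          % (len(store), len(damaged)))   # -> 100 files damaged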
Regarding the caching: de-duping would eliminate the efficiencies of
WAFL read-ahead, since the blocks wouldn't be laid out together any
more. While the read-ahead would pick up the stub, the de-duping
software (not being NetApp-aware) would need to go back and retrieve the
actual data referenced by the stub, and there goes your cache hit %.
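A toy model of that read-ahead point (the layouts, window size, and
numbers below are assumptions for illustration, not WAFL's actual
prefetch behaviour):

    # Prefetch the physically adjacent blocks after each read and see
    # how often the next logical block was already fetched.
    import random
    from collections import deque

    random.seed(1)
    N = 10_000
    sequential = list(range(N))                  # logical i -> physical i
    scattered = [random.randrange(N) for _ in range(N)]   # de-duped/shared

    def readahead_hit_rate(mapping, window=32):
        hits = 0
        prefetched = deque(maxlen=window * 4)    # small prefetch buffer
        for logical in range(N):
            phys = mapping[logical]
            if phys in prefetched:
                hits += 1
            prefetched.extend(range(phys + 1, phys + 1 + window))
        return hits / N

    print("contiguous layout : %.0f%% read-ahead hits"
          % (100 * readahead_hit_rate(sequential)))
    print("de-duped/scattered: %.0f%% read-ahead hits"
          % (100 * readahead_hit_rate(scattered)))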
But, sometimes performance is not important.
Glenn (the other one)
________________________________
From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com]
On Behalf Of Glenn Walker
Sent: Tuesday, February 27, 2007 8:41 AM
To: Darren Soothill; toasters(a)mathworks.com
Subject: RE: data management through compression for Netapp
I think that all of the solutions have their place - I'm not as
interested in 'compression' as I am in 'deDupe' - they are separate, and
not entirely equal.
The only issue I can foresee with deDupe, besides adding overhead, is
the chance of failure: if the data is stored in a single block and that
block is lost, so are you. But that's why you have parity, so it
shouldn't be any more dangerous than normal data storage (provided the
software works adequately). As for overhead, the worry is that having
multiple individuals 'hit' the same block would cause performance issues
- but I'm not so convinced. There's really no such thing as 'overwrite'
in ONTAP; it's always writing to a free block. And as for reading the
single block that has been deDuped: even though multiple machines
hitting the same block should ordinarily cause higher latencies for
everyone, the simple fact that one block gets that many accesses means
it would already be cached, and caching it in memory should remove the
read latency penalty.
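Here's a rough sketch of that caching intuition (a plain LRU in Python,
purely illustrative - not how ONTAP's buffer cache actually works):

    # A block that a large share of reads keep touching never ages out,
    # so reads of the shared de-duped block are almost always hits.
    import random
    from collections import OrderedDict

    random.seed(0)

    class LRUCache:
        def __init__(self, size):
            self.size, self.blocks = size, OrderedDict()
        def read(self, block):
            hit = block in self.blocks
            if hit:
                self.blocks.move_to_end(block)
            else:
                self.blocks[block] = True
                if len(self.blocks) > self.size:
                    self.blocks.popitem(last=False)
            return hit

    cache = LRUCache(size=1_000)
    shared_hits = shared_reads = 0
    for _ in range(100_000):
        if random.random() < 0.30:   # 30% of reads touch the shared block
            shared_reads += 1
            shared_hits += cache.read("shared-deduped-block")
        else:
            cache.read("cold-%d" % random.randrange(1_000_000))

    print("shared block hit rate: %.1f%%"
          % (100 * shared_hits / shared_reads))   # essentially 100%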
Stubs carry their own penalties, btw: when the stub is seen, the filer
must use the fpolicy service to communicate with the hosting system to
redirect and/or obtain the actual data. That's multiple layers of
network communication, memory lookup, and disk access before the client
gets the requested data - in theory this should be slower than a
self-contained system in almost every case.
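To make the 'multiple layers' point concrete, a back-of-the-envelope in
Python - every latency figure is a made-up assumption, just to show
where the extra hops land:

    local_read_ms = (
          0.5   # WAFL/inode lookup in filer memory
        + 8.0   # one disk read
    )

    stub_read_ms = (
          0.5   # filer sees the stub, fpolicy screen fires
        + 1.0   # network round trip to the fpolicy/HSM server
        + 1.0   # HSM catalog lookup
        + 8.0   # disk read on the HSM tier (far worse if it's tape)
        + 1.0   # data handed back / client redirected
    )

    print("self-contained read : %.1f ms" % local_read_ms)   # 8.5 ms
    print("stub-redirected read: %.1f ms" % stub_read_ms)    # 11.5 ms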
VTLs are definitely a good solution for backups, and we're looking into
them now. But HSM products, and archiving products of all types, still
have one common flaw: they move the data around, and it must then be
managed. While that is probably the best way of dealing with the massive
amounts of data in anyone's environment, it would still be nice to keep
it all in one place. That's just a dream anyway... I'm not sure there
would be enough savings through deDupe to actually accomplish anything
like this. On the other hand, if the filer were able to move data to
cheaper ATA storage within the same system and simply use a redirector
in the filesystem (a pointer through the vvbn), that could prove useful:
you'd have the same vvbn-to-pvbn lookup to find the data, but the actual
blocks would live elsewhere within the system - no additional layers of
stubs, and no redirection through some other HSM system.
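In toy form, something like this (the names and mapping are purely
illustrative; the real vvbn/pvbn machinery is internal to WAFL):

    # One level of pointer indirection lets blocks migrate to a cheaper
    # tier inside the same system, with no external stub or HSM layer.

    # virtual volume block number -> (aggregate/tier, physical block no.)
    vvbn_to_pvbn = {
        0: ("fc", 1001),    # hot block on FC disks
        1: ("fc", 1002),
        2: ("ata", 5003),   # cold block already moved to ATA
    }

    def read_block(vvbn):
        tier, pvbn = vvbn_to_pvbn[vvbn]   # same single lookup as today
        return "read pvbn %d from the %s aggregate" % (pvbn, tier)

    def migrate_block(vvbn, tier, new_pvbn):
        # only the pointer changes; the client-visible block does not
        vvbn_to_pvbn[vvbn] = (tier, new_pvbn)

    print(read_block(2))
    migrate_block(1, "ata", 5004)
    print(read_block(1))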
Sorry for the long email this early in the morning,
Glenn
________________________________
From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com]
On Behalf Of Darren Soothill
Sent: Tuesday, February 27, 2007 12:36 AM
To: toasters(a)mathworks.com
Subject: RE: data management through compression for Netapp
The issue with any compression technology is the latency it adds to any
workload, and what happens when a large number of people are doing the
same thing at once.
Would you not be better off looking at an archiving product that can
archive off the older files, leaving stub files behind, so any
application can carry on working with no interruption? That assumes the
data is stored as lots of files.
Perhaps you need to look at a traditional HSM-type product that
integrates with a tape library, except that instead of a physical tape
library you use a VTL, which will do the deduplication and compression
for you. You then have the added advantage that the VTL can spit out
copies of the virtual tapes to be taken off site for DR. This type of
solution should also be platform-independent.
________________________________
From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com]
On Behalf Of Glenn Walker
Sent: 27 February 2007 03:18
To: Glenn Dekhayser; Wilbur Castro; toasters(a)mathworks.com
Subject: RE: data management through compression for Netapp
I've also heard good things about Data Domain.
On the other hand, does anyone have any experience with the A-SIS deDupe
technology from NetApp? Sounds great, except for the 'out of band'
nature of it.
Glenn
________________________________
From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com]
On Behalf Of Glenn Dekhayser
Sent: Monday, February 26, 2007 8:36 PM
To: Wilbur Castro; toasters(a)mathworks.com
Subject: RE: data management through compression for Netapp
Wilbur:
What are your performance requirements? Adding compression/deDupe into
this has a serious impact on I/O performance - it will pretty much
totally screw any read-ahead optimization.
If you still want to go ahead, look at Data Domain's Gateway series. It
doesn't formally support NetApp, but I'm sure you could use it. This
will do exactly what you want, as long as you want NFS!
Glenn (the other one)
________________________________
From: owner-toasters(a)mathworks.com [mailto:owner-toasters@mathworks.com]
On Behalf Of Wilbur Castro
Sent: Monday, February 26, 2007 4:55 PM
To: toasters(a)mathworks.com
Subject: data management through compression for Netapp
Hi toasters,
We have a couple of hundred TB of heterogeneous storage (some from
NetApp), and our storage grows close to 60% year over year. We were
looking at alternatives for managing this data growth. Compression was
one of the techniques we were considering for our nearline and
(possibly) primary storage. Our applications cannot be changed to do
their own compression, so it boils down to doing this in the storage
layer or through an external device. Also, we'd like compression to
happen transparently, with no performance impact. Deduplication
technology from storage vendors would help, but it is not a
heterogeneous solution. I am not aware of any compression technology
from NetApp. Are you folks aware of any solutions? I would love to hear
about your experience with those, or other ways you deal with the
storage growth problem while managing costs.
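For a sense of what that growth rate means, here is a quick
back-of-the-envelope in Python (the 200 TB starting point is just an
assumption standing in for "a couple of hundred TB"):

    # ~200 TB today, growing ~60% year over year
    capacity_tb = 200.0
    for year in range(1, 6):
        capacity_tb *= 1.60
        print("year %d: ~%d TB" % (year, round(capacity_tb)))
    # -> roughly 320, 512, 819, 1311, 2097 TB: doubling about every
    #    18 months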
Thx,
Wilbur