NetApp is definitely spoiling me with their feature set!
The dedupe thoughts were specific to NetApp – i.e., if NetApp were to perform dedupe on the source, what would life be like. I wasn't aware that Data Domain wouldn't use NetApp back-end storage… deduping without some impressive fault tolerance behind it is NOT a good idea!
Glenn
From: Glenn Dekhayser [mailto:gdekhayser@voyantinc.com]
Sent: Tuesday, February 27, 2007 9:26 AM
To: Glenn Walker; Darren Soothill; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
Ah, Glenn, you're so spoiled with Netapp! :-)
Problem is, with things like Data Domain, unless you're going to use the NetApp as back-end FC storage (which isn't supported yet), we're not in WAFL-land, so in most cases you are writing to the same place. And without RAID-DP, a single bit error on a RAID rebuild could have a much bigger impact than normal, if that particular bit belongs to a block that's referenced multiple times in the de-duping scheme. Basically, I'm making the argument that if you're going to use de-duping, RAID-DP and WAFL are a MUST!
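
To put rough numbers on that blast-radius point, here is a quick back-of-the-envelope sketch in Python. The disk size, unrecoverable-read-error rate, and reference counts are all illustrative assumptions, not measurements of any real array:

# During a single-parity RAID rebuild, an unrecoverable read error (URE) on a
# surviving disk costs you a block. Without dedupe that block belongs to one
# file; with dedupe it may be referenced by many files, so the same error
# damages many more logical objects.

BLOCK_SIZE = 4096                    # assume WAFL-style 4 KB blocks
DISK_SIZE = 500 * 10**9              # assume a 500 GB disk, 2007-era scale
URE_RATE = 1e-14                     # assume ~1 unrecoverable error per 10^14 bits read

expected_ure_per_rebuild = DISK_SIZE * 8 * URE_RATE   # expected bad blocks while re-reading one disk

for refs_per_block in (1, 50, 500):  # 1 = no dedupe; >1 = a block shared by many files
    damaged_refs = expected_ure_per_rebuild * refs_per_block
    print(f"refs/block={refs_per_block:4d} -> ~{damaged_refs:.2f} damaged logical references per rebuild")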
Regarding the caching: de-duping would eliminate the efficiencies of WAFL read-ahead, since the blocks wouldn't be temporally located. While the read-ahead would pick up the stub, the de-duping software (not being NetApp-aware) would need to go back and retrieve the actual data referenced by the stub, and there goes your cache hit %.
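
A toy model of that read-ahead point, purely illustrative (the layout sizes and prefetch window are invented), showing how a sequential read over contiguous blocks keeps the prefetch cache hot while the same read over dedupe-scattered blocks does not:

import random

def readahead_hit_rate(physical_layout, prefetch_window=8):
    # After each block is read, prefetch the next few physically adjacent
    # blocks; count how many subsequent reads are served from that prefetch.
    hits = 0
    prefetched = set()
    for pbn in physical_layout:
        if pbn in prefetched:
            hits += 1
        prefetched = {pbn + i for i in range(1, prefetch_window + 1)}
    return hits / len(physical_layout)

contiguous = list(range(1000, 2000))           # file laid out sequentially
scattered = random.sample(range(10**6), 1000)  # same blocks remapped by dedupe

print(f"contiguous layout: {readahead_hit_rate(contiguous):.0%} read-ahead hits")
print(f"scattered layout : {readahead_hit_rate(scattered):.0%} read-ahead hits")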
But sometimes performance is not important.
Glenn (the other one)
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: Tuesday, February 27, 2007 8:41 AM
To: Darren Soothill; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
I think that all of the solutions have their place – I'm not as interested in 'compression' as I am in 'deDupe' – they are separate things, and not entirely equivalent.
The only issue I can foresee with deDupe, besides adding overhead, is the chance of failure: if the data is stored in a single block and that block is lost, so are you – but that's why you have parity, so it shouldn't be any more dangerous than normal data storage (provided the software works adequately). As for overhead, the concern is that having multiple individuals 'hit' the same block would cause performance issues – but I'm not so convinced. There's really no such thing as 'overwrite' in ONTAP; it's always writing to a free block. As for reading the single block that has been deDuped, even though multiple machines hitting the same block should ordinarily cause higher latencies for everyone, I would think that the simple fact that there is one block with many accesses means it would already be cached – and caching it in memory should remove the read latency penalty.
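
For what it's worth, here's a small illustration of that "it would already be cached" intuition: with a plain LRU buffer cache, a single heavily shared deduped block stays resident, so only the first access pays the disk penalty. The cache size, access mix, and block names are made up:

from collections import OrderedDict
import random

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def access(self, block):
        hit = block in self.store
        if hit:
            self.store.move_to_end(block)       # refresh recency
        else:
            self.store[block] = True
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used
        return hit

cache = LRUCache(capacity=1024)
shared_block = "dedup-block-42"                 # one block referenced by many files

# Assume ~30% of all reads land on the shared block, the rest are random.
reads = [shared_block if random.random() < 0.3 else f"blk-{random.randrange(10**6)}"
         for _ in range(100_000)]
hits = sum(cache.access(b) for b in reads)
print(f"overall hit rate with one hot shared block: {hits / len(reads):.0%}")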
Stubs carry their own penalties, btw: when the stub is seen, the filer must use the fpolicy service to communicate with the hosting system to redirect and/or obtain the actual data. That means multiple layers of network communication, memory lookup, and disk access before the client gets the requested data – which (in theory) should be slower than a self-contained system in almost every case.
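
To make the stub argument concrete, a quick back-of-the-envelope comparison; every latency figure here is an assumed order of magnitude, not a measurement of fpolicy or any particular HSM product:

# Serving a read directly from the filer vs. chasing a stub through an
# external HSM server. Sums of assumed per-hop latencies, in milliseconds.

direct_read_ms = {
    "filer metadata/memory lookup": 0.1,
    "filer disk read": 8.0,
}

stub_read_ms = {
    "filer sees stub, fpolicy callout to HSM server": 1.0,
    "HSM server metadata lookup": 0.5,
    "HSM back-end disk (or worse, tape) read": 12.0,
    "data shipped back over the network": 1.5,
}

print(f"direct read: ~{sum(direct_read_ms.values()):.1f} ms")
print(f"via stub   : ~{sum(stub_read_ms.values()):.1f} ms")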
VTLs are definitely a good solution for backups, and we're looking into them now. But HSM products, and archiving products of all types, still have one common flaw: they move the data around, and it must then be managed. While this is probably the best way of dealing with the massive amounts of data in anyone's environment, it would still be nice to keep it all in one place. That's just a dream, anyway… I'm not sure there would be enough savings through deDupe to actually accomplish anything like that. On the other hand, if the filer were able to move data to cheaper ATA storage within the same system, and simply use a redirector in the filesystem (a pointer through the vvbn), that could prove useful: you'd have the same vvbn-to-pvbn lookup in order to find the data, but the actual blocks would be elsewhere within the system. No additional layers of stubs, and no redirection through some other HSM system.
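
A sketch of that redirector idea; the class and field names are invented for illustration, and this is not how WAFL actually implements its vvbn-to-pvbn mapping:

from dataclasses import dataclass

@dataclass
class PhysicalLocation:
    tier: str     # e.g. "FC" or "ATA"
    pvbn: int     # physical block number within that tier

class BlockMap:
    """One indirection table: files and clients only ever see vvbns."""
    def __init__(self):
        self.vvbn_to_pvbn = {}

    def place(self, vvbn, tier, pvbn):
        self.vvbn_to_pvbn[vvbn] = PhysicalLocation(tier, pvbn)

    def migrate(self, vvbn, new_tier, new_pvbn):
        # Move cold data to cheaper storage by rewriting only the pointer;
        # the vvbn the filesystem addresses never changes.
        self.vvbn_to_pvbn[vvbn] = PhysicalLocation(new_tier, new_pvbn)

    def resolve(self, vvbn):
        return self.vvbn_to_pvbn[vvbn]

bmap = BlockMap()
bmap.place(vvbn=1001, tier="FC", pvbn=555)
print(bmap.resolve(1001))                        # block lives on FC today
bmap.migrate(vvbn=1001, new_tier="ATA", new_pvbn=98765)
print(bmap.resolve(1001))                        # same vvbn, now backed by ATA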
Sorry for the long email this early in the morning,
Glenn
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Darren Soothill
Sent: Tuesday, February 27, 2007 12:36 AM
To: toasters@mathworks.com
Subject: RE: data management through compression for Netapp
The issue with any compression technology is the latency it adds to any workload, and what happens when a large number of people are doing the same thing.
Would you not be better off looking at an archiving product that can archive off the older files, leaving stub files behind so any application can carry on working with no interruption, assuming that the data is stored as lots of files?
Perhaps you need to look at a traditional HSM-type product that integrates with a tape library, except that instead of using a physical tape library you use a VTL, which will do the deduplication and compression for you. You then have the added advantage that the VTL can spit out copies of the virtual tapes to be taken off-site for DR. This type of solution should then be platform independent.
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: 27 February 2007 03:18
To: Glenn Dekhayser; Wilbur Castro; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
I've also heard good things about Data Domain.
On the other hand, does anyone have any experience with the A-SIS deDupe technology from NetApp? Sounds great, except for the 'out of band' nature of it.
Glenn
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Dekhayser
Sent: Monday, February 26, 2007 8:36 PM
To: Wilbur Castro; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
Wilbur:
What are your performance requirements? Adding compression/deDupe into this has a serious impact on I/O performance; it will pretty much totally screw any read-ahead optimization.
If you still want to go ahead, look at Data Domain's Gateway series. It doesn't formally support NetApp, but I'm sure you could use it. This will do exactly what you want, as long as you want NFS!
Glenn (the other one)
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Wilbur Castro
Sent: Monday, February 26, 2007 4:55 PM
To: toasters@mathworks.com
Subject: data management through compression for Netapp
Hi toasters,
We have a couple of hundred TBs of heterogeneous storage (some from NetApp), and we see our storage growing close to 60% year over year. We were looking at alternatives for managing this data growth. Compression was one of the techniques we were considering for our nearline and (possibly) primary storage. Our applications cannot change to do their own compression, so it boils down to doing this in the storage layer or through an external device. Also, we'd like compression to happen transparently, without any performance impact. Deduplication technology from storage vendors would help, but it is not a heterogeneous solution.
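
(Quick arithmetic on that growth rate, assuming roughly 200 TB today as a starting point: at 60% a year the footprint roughly doubles every year and a half.)

capacity_tb = 200          # assumed starting point ("a couple of hundred TBs")
growth = 0.60              # ~60% year-over-year growth

for year in range(1, 6):
    capacity_tb *= 1 + growth
    print(f"year {year}: ~{capacity_tb:,.0f} TB")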
I am not aware of any compression technology from NetApp. Are you folks aware of any solutions? I would love to hear your experience with those, or other ways you deal with the storage growth problem while managing costs.
Thx,
Wilbur