I think that all of the solutions have their place. I’m not as interested in ‘compression’ as I am in ‘deDupe’; they are separate techniques, and not entirely equivalent.
The only issue I can foresee with deDupe, besides adding overhead, is the chance of failure: if the data is stored in a single block and that block is lost, so are all the files that shared it. But that’s why you have parity, so it shouldn’t be any more dangerous than normal data storage (provided the software works adequately). As for overhead, the concern is that having multiple individuals ‘hit’ the same block would cause performance issues, but I’m not so convinced. There’s really no such thing as ‘overwrite’ in ONTAP; it’s always writing to a free block. As for reading the single block that has been deDuped: even though multiple machines hitting the same block should ordinarily cause higher latencies for everyone, I would think that the simple fact that there is one block with many accesses means it would already be cached, and caching it in memory should remove the read latency penalty.
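To make the caching point concrete, here’s a rough Python sketch of block-level deDupe with reference counts and a tiny read cache (my own toy model, nothing to do with how ONTAP/WAFL actually works internally): a hundred logical writes collapse into one physical block, and after the first read that one hot block is served from cache.

import hashlib

class DedupeStore:
    """Toy block store: identical blocks are stored once and reference-counted."""

    def __init__(self):
        self.blocks = {}      # fingerprint -> block data (the single physical copy)
        self.refcount = {}    # fingerprint -> number of logical references
        self.cache = {}       # fingerprint -> block data (read cache)
        self.disk_reads = 0

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:              # new data: allocate a free block
            self.blocks[fp] = data
            self.refcount[fp] = 0
        self.refcount[fp] += 1                 # another logical block points here
        return fp                              # caller keeps this "pointer"

    def read(self, fp: str) -> bytes:
        if fp in self.cache:                   # hot shared block: no disk read
            return self.cache[fp]
        self.disk_reads += 1                   # cold: one disk read, then cached
        self.cache[fp] = self.blocks[fp]
        return self.cache[fp]

store = DedupeStore()
# 100 "machines" write the same 4 KB pattern; only one physical copy exists.
ptrs = [store.write(b"A" * 4096) for _ in range(100)]
for p in ptrs:                                 # 100 reads, but only 1 disk read
    store.read(p)
print(len(store.blocks), store.refcount[ptrs[0]], store.disk_reads)   # 1 100 1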
Stubs carry their own penalties, btw: when the stub is seen, the filer must use the fpolicy service to communicate with the hosting system to redirect and/or obtain the actual data. That is multiple layers of network communication, memory lookup, and disk access before the client gets the requested data, so this (in theory) should be slower than a self-contained system in almost every case.
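Back-of-the-envelope version of why I’d expect the stub path to lose, with completely made-up latency numbers just to count the layers involved:

# Made-up, order-of-magnitude latencies (in milliseconds), illustration only.
CLIENT_TO_FILER    = 0.5    # client request/response over the network
FILER_DISK_READ    = 5.0    # filer reads a resident block from its own disk
FPOLICY_ROUND_TRIP = 1.0    # filer <-> fpolicy server notification and redirect
ARCHIVE_LOOKUP     = 2.0    # archive host locates the migrated data
ARCHIVE_DISK_READ  = 10.0   # archive host reads it (often slower storage)
ARCHIVE_TO_FILER   = 1.0    # data shipped back before the client is answered

self_contained = CLIENT_TO_FILER + FILER_DISK_READ
stub_recall = (CLIENT_TO_FILER + FPOLICY_ROUND_TRIP + ARCHIVE_LOOKUP
               + ARCHIVE_DISK_READ + ARCHIVE_TO_FILER)

print(f"self-contained read: {self_contained:.1f} ms")   # 5.5 ms
print(f"stub recall:         {stub_recall:.1f} ms")      # 14.5 ms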
VTL is definitely a good solution for backups, and we’re looking into them now. But HSM products, and archiving products of all types, still have one common flaw: they move the data around, and it must then be managed. While this is probably the best way of dealing with the massive amounts of data in anyone’s environment, it would still be nice to keep it all in one place. That’s just a dream, anyway; I’m not sure that there would be enough savings through deDupe to actually accomplish anything like this. On the other hand, if the filer were able to move data to cheaper ATA storage within the same system, and simply use a redirector in the filesystem (a pointer through the vvbn), that could prove useful: you’d have the same vvbn-to-pvbn lookup to find the data, but the actual blocks would be elsewhere within the system. No additional layers of stubs, and no redirection through some other HSM system.
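To show what I mean by ‘pointer through vvbn’, here’s a toy model (my own invention, not how WAFL actually lays anything out): the file keeps a stable virtual block number, and migrating cold data to ATA only rewrites the vvbn-to-pvbn map entry, so there’s no stub and no external HSM in the read path.

class Volume:
    """Toy volume: files see stable virtual block numbers (vvbn); the
    vvbn -> pvbn map decides which tier the data actually lives on."""

    def __init__(self):
        self.tiers = {"fc": {}, "ata": {}}   # pvbn -> data, per tier
        self.vmap = {}                       # vvbn -> (tier, pvbn)
        self.next_vvbn = 0
        self.next_pvbn = 0

    def write(self, data: bytes, tier: str = "fc") -> int:
        vvbn, pvbn = self.next_vvbn, self.next_pvbn
        self.next_vvbn += 1
        self.next_pvbn += 1
        self.tiers[tier][pvbn] = data
        self.vmap[vvbn] = (tier, pvbn)
        return vvbn                          # the file only ever keeps this

    def read(self, vvbn: int) -> bytes:
        tier, pvbn = self.vmap[vvbn]         # same lookup whichever tier holds it
        return self.tiers[tier][pvbn]

    def migrate(self, vvbn: int, dst: str) -> None:
        """Move cold data to cheaper disk; the vvbn (and the file) never changes."""
        src, pvbn = self.vmap[vvbn]
        data = self.tiers[src].pop(pvbn)
        new_pvbn = self.next_pvbn
        self.next_pvbn += 1
        self.tiers[dst][new_pvbn] = data
        self.vmap[vvbn] = (dst, new_pvbn)

vol = Volume()
blk = vol.write(b"rarely used report", tier="fc")
vol.migrate(blk, "ata")                      # data now on ATA, no stub involved
assert vol.read(blk) == b"rarely used report"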
Sorry for the long email this early in the morning,
Glenn
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Darren Soothill
Sent: Tuesday, February 27, 2007 12:36 AM
To: toasters@mathworks.com
Subject: RE: data management through compression for Netapp
The issue with any compression technology is the latency it adds to any workload, and the question of what happens when a large number of people are doing the same thing at once.
Would you not be better off looking at an archiving product that can archive off the older files, leaving stub files behind so any application can carry on working with no interruption? Assuming, that is, that the data is stored as lots of files.
Perhaps you need to look at a traditional HSM-type product that integrates with a tape library, except that instead of using a physical tape library you use a VTL, which will do the deduplication and compression for you. You then have the added advantage that the VTL can spit out copies of the virtual tapes to be taken off-site for DR. This type of solution should then be platform independent.
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: 27 February 2007 03:18
To: Glenn Dekhayser; Wilbur Castro; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
I’ve also heard good things about Data Domain.
On the other hand, does anyone have any experience with the A-SIS deDupe technology from NetApp? Sounds great, except for the ‘out of band’ nature of it.
Glenn
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Dekhayser
Sent: Monday, February 26, 2007 8:36 PM
To: Wilbur Castro; toasters@mathworks.com
Subject: RE: data management through compression for Netapp
Wilbur:
What are your performance requirements? Adding compression/deDup into this has a serious impact on I/O performance; it will pretty much totally screw any read-ahead optimization.
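(Rough illustration of the read-ahead point, with an invented block layout: once dedupe collapses duplicate blocks, logically sequential blocks stop being physically sequential, so a simple sequential prefetch mostly pulls in the wrong neighbours.)

# Invented example: 8 logical blocks of a file, where some are duplicates
# of earlier data and get collapsed by dedupe.
logical = ["A", "B", "C", "A", "D", "B", "E", "C"]

physical = []               # unique blocks, in the order they were first written
l2p = []                    # logical index -> physical block number
for data in logical:
    if data not in physical:
        physical.append(data)
    l2p.append(physical.index(data))

print(l2p)                  # [0, 1, 2, 0, 3, 1, 4, 2] -- no longer sequential
# A sequential read-ahead that prefetches physical blocks n, n+1, n+2 ...
# keeps jumping backwards instead, so most of the prefetched data is wasted.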
If you still want to go ahead, look at Data Domain's Gateway series. It doesn't formally support NetApp, but I'm sure you could use it. This will do exactly what you want, as long as you want NFS!
Glenn (the other one)
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Wilbur Castro
Sent: Monday, February 26, 2007 4:55 PM
To: toasters@mathworks.com
Subject: data management through compression for Netapp
Hi toasters,
We have a couple of hundred TBs of heterogeneous storage (some from NetApp), and we see our storage growing by close to 60% year over year. We were looking at alternatives for managing this data growth. Compression was one of the techniques we were considering for our nearline and (possibly) primary storage. Our applications cannot be changed to do their own compression, so it boils down to doing this in the storage layer or through an external device. Also, we'd like compression to happen transparently, without any performance impact. Deduplication technology from storage vendors would help, but it is not a heterogeneous solution.
I am not aware of any compression technology from NetApp. Are you folks aware of any solutions? I would love to hear your experience with those, or other alternative ways you deal with the storage growth problem while managing costs.
Thx,
Wilbur