FYI - we've just completed testing with NFS and dedupe - about 72% savings (VMDK on NFS is thin provisioned by default anyways, so no big shock there). About 4% performance reduction, easily acceptable.
One thing to keep in mind - the busier the drives, the sharper the performance loss (as would be expected): We only have a 21 disk I/O pool (14D+2P, 7D+2P AGGR), and we're getting about 38MB/s for an 8K transfer size, 100% random, 25%/75% r/w mix with a 4GB test file per VM Guest - 12 guests total. Disks are almost maxed out (95% avg I'd say - disks are upsetting the NVRAM pretty quickly) and adding de-dupe dropped it to 36.8MB/s. Kicking it up to 15 guests creates 100% disk I/O and adding de-dupe gives about a 25% performance loss. Testing with 2MB transfer gives us about 112MB/s, so the testing is pretty subjective - as is the performance loss.
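For anyone who wants to compare these numbers against their own spindles, here is a rough sketch of the arithmetic behind the figures above. It assumes "MB/s" means MiB/s and "8K" means 8 KiB, and it reads the 21 disk pool as 14 + 7 data disks with parity excluded; the function names are made up for illustration.

```python
# Rough arithmetic on the IOMeter figures quoted above.
# Assumptions (not stated in the original post): MB/s is treated as MiB/s,
# "8K" as 8 KiB, and the 21-disk pool as 14 + 7 data disks (parity excluded).

def iops(throughput_mb_s: float, transfer_kb: float) -> float:
    """Convert throughput at a fixed transfer size into I/O operations per second."""
    return throughput_mb_s * 1024 / transfer_kb


def percent_loss(before: float, after: float) -> float:
    """Relative drop from 'before' to 'after', as a percentage."""
    return (before - after) / before * 100


data_disks = 14 + 7

baseline_iops = iops(38.0, 8)         # 12 guests, 8K transfers, no de-dupe
dedupe_iops = iops(36.8, 8)           # same test with de-dupe enabled
large_block_iops = iops(112.0, 2048)  # the 2MB transfer test

print(f"8K baseline: {baseline_iops:.0f} IOPS, "
      f"{baseline_iops / data_disks:.0f} per data disk")
print(f"8K with de-dupe: {dedupe_iops:.0f} IOPS")
print(f"Throughput loss with de-dupe: {percent_loss(38.0, 36.8):.1f}%")
print(f"2MB transfers: {large_block_iops:.0f} IOPS")
```

Under those assumptions the 8K baseline works out to 4864 IOPS, roughly 232 per data disk, and the drop from 38MB/s to 36.8MB/s is about 3.2%, in the same ballpark as the roughly 4% quoted.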
The thing to keep in mind: performance will _always_ suck when 100% disk utilization kicks in. The 4% performance loss we've seen for NFS w/ de-dupe and the 7% loss with FCP de-dupe were both measured with the disks almost completely maxed out. Those are cases where more disk I/O capacity would help, and I'd guess that with it we'd see little to no performance degradation.
We've found that the performance curves for NFS and FCP dedupe are about the same - performance for both in a VMWare environment is pretty close as well.
Hope this info helps some of you...
Glenn
________________________________
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: Wednesday, February 13, 2008 12:50 PM
To: Daniel Keisling; Bill Holland; toasters@mathworks.com
Subject: RE: De-dup'ing Primary Storage
Just keep in mind that we were using IOMeter to test, not actual real world workloads - we were also very much disk bound in our testing. If you are running a real-world workload rather than IOMeter and are not disk bound, I can't foresee it being a huge problem.
We'll be testing with NFS dedupe a bit later today and I'll gladly share that info if you want. Same number of disks, so same stipulations exist.
And because the question did come up, the VMWare server guy didn't build the VMDKs with thin provisioning, so it may have also impacted the testing.
Glenn
________________________________
From: Daniel Keisling [mailto:daniel.keisling@austin.ppdi.com]
Sent: Wednesday, February 13, 2008 12:37 PM
To: Glenn Walker; Bill Holland; toasters@mathworks.com
Subject: RE: De-dup'ing Primary Storage
Thanks for the stats, I'll be de-duping VMWare data soon too.
My NetApp storage tech says that read cache performance will increase since you're reducing the total number of actual blocks. I have not heard of any performance degradation with A-SIS, other than filer overhead (CPU) when SIS is actually de-duplicating. My A-SIS schedules run during off-peak hours, so it's not a concern of mine.
Daniel
________________________________
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Glenn Walker
Sent: Wednesday, February 13, 2008 9:42 AM
To: Bill Holland; toasters@mathworks.com
Subject: RE: De-dup'ing Primary Storage
We've been doing some VMWare testing with FCP LUNs and A-SIS.
We saw a reduction from 471GB to 21GB with only about a 7% reduction in performance. More than a fair trade-off in my opinion.
However, our test setup may have affected performance more than the de-dupe did.
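To put the 471GB to 21GB reduction in the same terms as a percentage savings figure, here is a small sketch of the conversion. The helper name is hypothetical and the sizes are simply the two numbers quoted above.

```python
# Convert before/after sizes into percent saved and a reduction ratio.
# space_savings is a hypothetical helper, not part of any NetApp tooling.

def space_savings(before_gb: float, after_gb: float) -> tuple[float, float]:
    """Return (percent saved, reduction ratio) for a volume before and after de-dupe."""
    percent = (before_gb - after_gb) / before_gb * 100
    ratio = before_gb / after_gb
    return percent, ratio


percent, ratio = space_savings(471, 21)
print(f"{percent:.1f}% saved, about {ratio:.1f}:1")
```

For these two sizes that comes to roughly 95.5% saved, or a little over 22:1.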
Glenn
________________________________
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Bill Holland
Sent: Wednesday, February 13, 2008 6:50 AM
To: toasters@mathworks.com
Subject: De-dup'ing Primary Storage
To those of you that have implemented Nearstore and A-SIS on your primary storage:
1. Have you seen any difference in overall filer performance?
2. If you have LUNs, how are your space savings on those volumes?
I know that enabling Nearstore does some system tweaking in the background to increase the number of concurrent backup streams that can be running, but I don't know what else it tweaks that may adversely affect performance of a primary storage system. After all, it was originally designed to run as a secondary storage platform.