I found the bug I was thinking of:
http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=657692
It certainly caused some headaches for me after upgrading to 8.1.2P4.
To veer further off-topic, this does explain the increase in disk usage
I was seeing on some volumes running 8.1.2P3. I ran sis start -s on
them and got back 15-30%. Annoyingly, this went straight into snapshot
usage; it seems a bit silly that the dedupe metadata is snapshotted.
Interestingly, there was only a small saving on the snapvault secondary
volumes, which are also on a filer running 8.1.2P3.
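
In case anyone wants to repeat this, the sequence I used was roughly the following (volume name is just a placeholder):

```
sis start -s /vol/myvol     # rescan existing data, not only new writes
sis status /vol/myvol       # check scan progress
df -s /vol/myvol            # show space saved by dedupe on the volume
```

(df -s is the easiest way to see the saved percentage; the saving only becomes usable space once the pre-dedupe snapshots age out.)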
--
James Andrewartha
Network & Projects Engineer
Christ Church Grammar School
Claremont, Western Australia
Ph. (08) 9442 1757
Mob. 0424 160 877
> ________________________________________
> From: toasters-bounces@teaparty.net [toasters-bounces@teaparty.net] on behalf of Jordan Slingerland [Jordan.Slingerland@independenthealth.com]
> Sent: Saturday, July 27, 2013 12:00 PM
> To: Scott Eno; Chris Picton
> Cc: toasters@teaparty.net
> Subject: RE: sis start -s causing system slowdown
>
> A few things: if you have gone from 8.1.2P3 to P4, you are probably going to want to do a sis start -s on all volumes over 50% or so full (NetApp says 70, I say 50), and on any volumes on aggregates in the same boat.
>
> I do not have the bug number to hand, but there is a deduplication bug fixed in 8.1.2P4 that causes your next dedupe run to inflate the volumes by as much as 30%. I can't help wondering if this bug could be related to your issue. If you can't figure out which bug I am talking about, I can dig through my emails.
>
> Also, just a shot in the dark: have you run a statit to see whether a particular disk could be the source of a bottleneck?
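>
> In case it helps, the statit run on 7-mode goes something like this (needs advanced privilege; this is a sketch from memory, so double check):
>
> ```
> priv set advanced
> statit -b        # begin collecting per-disk statistics
> # wait a few minutes under the problem workload
> statit -e        # end and print the report; check the disk ut% column
> priv set admin
> ```
>
> A single disk sitting at much higher ut% than its raid group neighbours is a good bottleneck candidate.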
>
>
>
> ________________________________________
> From: toasters-bounces@teaparty.net [toasters-bounces@teaparty.net] on behalf of Scott Eno [s.eno@me.com]
> Sent: Saturday, July 27, 2013 9:43 AM
> To: Chris Picton
> Cc: toasters@teaparty.net
> Subject: Re: sis start -s causing system slowdown
>
> Hi Chris,
>
> I have seen this same behavior on a 3160 (8.1.2P3) trying to dedupe a single VMware datastore.
>
> This datastore lived on an aggregate made up of 90 10k disks in 6 raid groups of 15 disks each. Even with reallocate keeping the data spread out, every time a dedupe job started, all 4 CPU cores would go to 100% until it finished (days later) or I stopped it. SNMP would stop responding, Performance Manager would have huge gaps in data collection for the controller, etc.
>
> I figured it was either too much work for the CPU to handle the "math" of deduping data across that many disks, or a bug. But, as you say, other dedupe jobs on the same controller, on smaller aggrs, work fine.
>
> There was 30+ TB of empty space on the aggr, so I just disabled dedupe and let the VMware volumes grow.
>
> Sorry I don't have a solution for you, but wanted to let you know you weren't alone.
>
> ----
> Scott Eno
> s.eno@me.com
>
> On Jul 27, 2013, at 12:04 AM, Chris Picton <chris@picton.nom.za> wrote:
>
>> Hi all
>>
>> One of the volumes exported via NFS from my fas3210 didn't have dedupe enabled when commissioned. It is 250GB, and hosts ploop-backed OpenVZ VMs. It is currently using about 210GB, and the hourly snapshot size is about 6GB.
>>
>> When I run sis start -s on this volume, the entire system slows to a crawl. My SNMP monitoring starts timing out, SSH access to the system is hit and miss (taking over a minute to log in), and once logged in, command response is sluggish. I also get the following error in the logs for all SnapMirror pairs:
>>
>> SnapMirror: source transfer from TEST_TESTVOL to xx.yy.zz:TEST_TESTVOL : request denied, previous request still processing.
>>
>> Fortunately, disk access from clients on this and other volumes is not detrimentally affected, but IO response times do go up by about 100ms.
>>
>> After running overnight for 11 hours, sis status reports
>> Progress: 19333120 KB Scanned
>> Change Log Usage: 88%
>> Logical Data: 151 GB/49 TB (0%)
>>
>>
>> At this rate, it will take about 5 days to finish scanning, leaving me barely able to manage the system effectively while this is happening.
>>
>> Is this normal behaviour? Do I just have to wait it out, or can I stop it and correct something before trying again? Also, is the change log filling up towards 100% something to worry about?
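>>
>> If stopping mid-scan is an option, I assume it would be something like this (guessing at the exact form; volume name is a placeholder):
>>
>> ```
>> sis stop /vol/myvol        # abort the running scan
>> sis status -l /vol/myvol   # detailed status, including any checkpoint
>> ```
>>
>> though I don't know whether a later sis start -s resumes from a checkpoint or starts the scan from scratch.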
>>
>> Regards
>> Chris