OK, just wondering if anybody can shed light on this.
We just had a massive performance problem on one of our
netapps. One aggregate was amazingly busy, with disk drives
100% busy all the time. IO latency went through the roof
for all the volumes on the aggregate.
We spent a bit of time on this and we are not novice netapp users.
In the end we could not identify the problem. There appeared to be no
relationship between the amount of I/O coming from clients and
the amount of I/O on the aggregate.
So, we called netapp support. It took them a while (a few days)
but eventually they suggested changing the snapshot schedules
and removing snapshots off the aggregates and also removing
hourly snapshots off the volumes on the aggregate. We complied.
It took a while, but after 5 hours or so, relatively suddenly
the problem went away and seems to have stayed away for a day now.
I am assuming the snapshot cleanup was the problem and
the problem stopped when it finally caught up.
So, I guess my question is: why is snapshot cleanup so expensive?
Do blocks freed from snapshots need to be written? If so, why?
It seems snapshot creation is cheap, but deletion expensive, which
makes the entire snapshot management cycle rather more trouble
than you might hope.
Regards,
pdg