Are you sure the latency is less than 1 millisecond? I would expect that kind of latency from flash-enabled storage (Flash Cache, SSD, Flash Pool, ...). It is also strange that you only have 22 SAS disks in your appliance; SAS shelves come with 24 disks. You probably mean there are 22 disks in the aggregate.
Did you remove the broken disk? We have had cases in the past where one broken disk slowed down the system until that disk was removed. That was in an FC-AL attached disk shelf, though; this kind of problem should not occur with SAS-attached shelves because SAS is a point-to-point connection.
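In case it helps, on 7-Mode you can list the failed disks before pulling them (a quick sketch; the disk and aggregate names on your filer will differ):

```
filer> vol status -f          # list broken/failed disks
filer> aggr status -r aggrX   # RAID layout per raid group, shows which disk is missing or rebuilding
```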
We have seen that reconstruction of a disk can cause some additional load, but that should last only a few hours. I can't say we have noticed a real impact caused by the system having to use the parity disks.
Adding more spares only helps when a spare disk fails in addition to a data disk; it ensures that a rebuild can start, but it does nothing for the performance impact. Reconstruction time is not affected by the number of spare disks. Also, by default only two raid groups can be reconstructed at the same time, so if you had failures in 3 raid groups, the third would have to wait until the first raid group finished reconstructing.
The option raid.reconstruct.perf_impact only affects the rebuild priority; if there is no spare disk to start the rebuild, it has no effect on performance. If the disk still isn't reconstructed after two days, I think there could be another problem.
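A quick way to check all of this on a 7-Mode filer (a sketch; aggrX is the aggregate name from the original mail):

```
filer> aggr status -s                         # available spare disks
filer> aggr status -r aggrX                   # per-raid-group status and reconstruction progress
filer> options raid.reconstruct.perf_impact   # current rebuild priority setting
```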
Kind regards, Wouter Vervloesem
Neoria - Uptime Group Veldkant 35D B-2550 Kontich
Tel: +32 (0)3 451 23 82 Mailto: wouter.vervloesem@neoria.be Web: http://www.neoria.be
On 8 Feb 2013, at 15:35, Sebastian Goetze <spgoetze@gmail.com> wrote:
Hi Rafal,
Performance degradation is expected and normal while a system is in degraded mode or recovering from a failure. (Apart from that, I think a latency of 0.4 ms is not too bad...)
2 possible solutions to reduce the impact:
• more spare disks
• give the reconstruct a lower priority: options raid.reconstruct.perf_impact low

Hope that helps
Sebastian
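For reference, the priority change is a single option on the filer (a sketch; if I remember correctly the default is medium, so you can restore it once the rebuild is done):

```
filer> options raid.reconstruct.perf_impact low     # reduce rebuild impact on client I/O
filer> options raid.reconstruct.perf_impact medium  # restore the default afterwards
```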
On 05.02.2013 09:04, Rafał Radecki wrote:
Hi All.
I am new to this forum and quite new to NetApp. I work on a FAS3210 with 8.0.2 7-Mode. We have 22 SAS disks in our appliance. The aggregate is:
aggr status
Aggr    State    Status          Options
aggrX   online   raid_dp, aggr   root
                 degraded
                 redirect
                 32-bit

because we lost 2 disks 2 days ago (one spare and one data disk; NetApp has sent new disks to us). The problem is that the appliance is performing badly in this situation. NFS latency in file operations changed from, for example, 0.04 ms to 0.4 ms. I know that the aggregate is now using the parity disks in place of one of the failed disks, but is this performance drop common? Are there any known ways/howtos to cut this problem down?
Best regards,
Rafal Radecki.
_______________________________________________
Toasters mailing list
Toasters@teaparty.net
http://www.teaparty.net/mailman/listinfo/toasters