We've been chasing some NFS timeout issues here as well. The output of sysstat 1 on the filer doesn't look very bad. However, I ran wafl_susp -z followed by wafl_susp -w after a few minutes (approx. 5). The value at that time was 1, and it continued to increase over time. The recommendation below was to wait 30 seconds before checking the value, but does the length of the wait matter? Does a value other than 0 in this parameter mean the filer is denying write requests to users?
Kelvin Edwards, System Admin, Jefferson Lab
Brian Long wrote:
The other command you can use to gather meaningful data is wafl_susp. Run wafl_susp -z to reset the stats. Wait 30 seconds or more and then run wafl_susp -w (you may want to do this via rsh and save the output to a file).
In the output, there is a field called "cp_from_cp". If this count is anything other than zero, your NVRAM is overflowing and denying write requests to users. cp_from_cp means you're in the middle of checkpointing the first half of NVRAM and yet the second half has already filled up and needs to checkpoint as well. This is VERY bad.
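
If the wafl_susp -w output has been saved to a file as suggested above, a small script can pull out the counter instead of reading it by eye. This is only a rough sketch: it assumes the counter appears on its own line as "cp_from_cp = <number>", which is a guess based on the "= 1" reported in this thread and not a documented format. The function names are made up for illustration.

```python
import re

# ASSUMPTION: the saved output contains a line shaped like "cp_from_cp = 3".
# The real wafl_susp -w layout may differ; adjust the pattern to match it.
_CP_FROM_CP_RE = re.compile(r"^\s*cp_from_cp\s*=\s*(\d+)\s*$", re.MULTILINE)


def cp_from_cp_count(output):
    """Return the cp_from_cp counter as an int, or None if no such line is found."""
    match = _CP_FROM_CP_RE.search(output)
    return int(match.group(1)) if match else None


def nvram_overflowed(output):
    """True if the counter is present and nonzero since the last wafl_susp -z."""
    count = cp_from_cp_count(output)
    return count is not None and count > 0
```

To use it, read the saved file into a string and pass it to nvram_overflowed(). Because wafl_susp -z resets the stats, the number is a count since the reset, so a longer wait simply gives the counter more time to accumulate.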