Fletcher,
What ONTAP version are you running?
We've had a case open since we swapped our heads out to 3270s, with cp_slo_vols that seemed to be happening at random, but we thought we'd narrowed it down to times when large metadata writes are occurring - deletions into snapshots, for example. Latterly I could reliably trigger it with storage vMotions; it would normally occur at the end of the process (i.e. when VMware deletes the files and they end up in snapshots).
NetApp had us upgrade to 8.1.2RC2 with reference to some bug IDs I don't have to hand at the moment. We thought this had fixed it - certainly storage vMotions were no longer triggering it - however, it reappeared the other day when a number of LUNs were deleted at once.
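In the meantime, you may be able to get a timeline of the slow CPs out of the EMS log you already have rather than waiting to catch one live. Something like the quick script below - a rough sketch only: it assumes the EMS export is a flat text file of <LR ...> records like the two you pasted, and the /tmp/ems.log path is just a placeholder. Lining the timestamps and total_ms values up against your NFS latency graph should show whether every spike really corresponds to a long CP on aggr2.

import re
import sys

# Pull the slow-CP records out of an EMS log export so CP durations can be
# lined up against the NFS latency spikes. Regex rather than an XML parser,
# since EMS dumps are not always well-formed XML.
EMS_LOG = sys.argv[1] if len(sys.argv) > 1 else "/tmp/ems.log"  # placeholder path

# d="..." is the human-readable timestamp on the enclosing <LR> record
lr_re = re.compile(r'<LR\s+d="([^"]+)"[^>]*>(.*?)</LR>', re.S)
toolong_re = re.compile(r'<wafl_cp_toolong_warning_1[^>]*\btotal_ms="(\d+)"')
slovol_re = re.compile(r'<wafl_cp_slovol_warning_1[^>]*\bvolname="([^"]*)"')

with open(EMS_LOG) as f:
    text = f.read()

for when, body in lr_re.findall(text):
    toolong = toolong_re.search(body)
    slovol = slovol_re.search(body)
    if toolong:
        print(f"{when}  CP took {int(toolong.group(1)) / 1000.0:.1f}s")
    elif slovol:
        print(f"{when}  slow volume holding up CP: {slovol.group(1)}")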
Regards,
Tim
On 15 January 2013 21:12, Fletcher Cocquyt <fcocquyt@stanford.edu> wrote:
resending this without the 80kb chart
Yesterday morning one of the heads on our 3270 experienced large NFS latency spikes, causing our VMware hosts and their VMs to log storage timeouts. The latency does not correlate with any external metrics such as CPU, network, or OPS.
But the logs do show CP events on the aggregate hosting the VMs:
Jan 14 05:27:56 [n04:wafl.cp.slovol:warning]: aggregate aggr2 is holding up the CP.
And the EMS log has CP events logged for the duration of the episode - what can we do to prevent these issues?
<wafl_cp_toolong_warning_1 total_ms="117825" total_dbufs="32276" clean="4312" v_ino="3" v_bm="29" a_ino="0" a_bm="3428" flush="1209"/>
<LR d="14Jan2013 05:19:38" n="irt-na04" t="1358169578" id="1335304168/148007" p="4" s="Ok" o="wafl_CP_proc" vf="" type="0" seq="633232" > <wafl_cp_slovol_warning_1 voltype="aggregate" volowner="" volname="aggr2" volident="" nt="35" nb="22045" clean="1346852" v_ino="0" v_bm="113" a_ino="0" a_bm="4" flush="0" rgid="2"/>
NetApp support wants me to run perfstat, but the issue is not ongoing - things are idle.
thanks