On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A). We had been on 7.2.6.1 for two days, but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.string:notice]: Panic string: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.at:notice]: Panic occurred at: Mon Jan 12 14:48:38 2009
Mon Jan 12 15:03:59 EST [naga: mgr.stack.proc:notice]: Panic in process: wafl_lopri
Mon Jan 12 15:03:59 EST [naga: mgr.stack.framename:notice]: Stack frame 0: sk_vpanic(0xd84ccc) + 0x0
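For context, an "Assertion failure" panic is an internal consistency check failing inside the kernel. NetApp's quota.c source is not public, so the sketch below is purely hypothetical C -- the struct, function, and invariant are all invented for illustration -- but it shows the general pattern: when accounting state is found to be corrupt, the kernel's equivalent of assert() deliberately panics the box rather than run on with bad quota data.

    #include <assert.h>

    /* Hypothetical quota bookkeeping entry; the real layout is
     * NetApp-internal. */
    struct quota_entry {
        long long used_blocks;   /* blocks currently charged */
        long long limit_blocks;  /* configured hard limit */
    };

    /* Charge delta blocks to an entry. If usage ever goes negative,
     * the accounting is corrupt; in kernel code the analogue of this
     * assert() panics the system, producing a message much like the
     * quota.c:1684 one above. */
    static void quota_charge(struct quota_entry *q, long long delta)
    {
        q->used_blocks += delta;
        assert(q->used_blocks >= 0);
    }

    int main(void)
    {
        struct quota_entry q = { .used_blocks = 4, .limit_blocks = 100 };
        quota_charge(&q, -8);   /* drives usage negative: assert fires */
        return 0;
    }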
Anyone else seen this?
A quick "me too", and caution to others.
Last summer we upgraded to 7.2.5 and got hit by a quota-crashing bug. The effects on our service (20,000 accounts) were bad... very bad. This was fixed by a patch release. (I understand this was then properly fixed in 7.2.5.1, but we stayed on our patch release.)
Moving forward to a few days ago...
Last week, we upgraded to 7.2.6.1, having been assured that all would be well and that such problems could not recur. (You can see it coming, can't you?) Over the weekend, we were hit by similar quota-crashing, with similarly bad effects on the service.
Might it be that the 7.2.5-series bugfix didn't get rolled forward into the 7.2.6 series? Or is it a new, different, quota-related bug?
Naturally, we will be pursuing this, with vigour, with NetApp.
Meanwhile, if you use quotas, I would suggest caution before going to the 7.2.6 series for the moment.
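For anyone weighing the same decision: if you can live without enforcement for a while, quotas can be switched off per volume from the console until a fixed release is out. A rough sketch, from memory of the 7-mode syntax (the volume name is illustrative, and note that turning quotas back on kicks off a quota scan of the volume):

    naga> quota report            # see which volumes/qtrees have quotas
    naga> quota off vol_home      # stop quota enforcement on one volume
    ... revert to, or wait for, a fixed release ...
    naga> quota on vol_home       # re-enable; triggers a quota scan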