Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.string:notice]: Panic string: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.at:notice]: Panic occurred at: Mon Jan 12 14:48:38 2009
Mon Jan 12 15:03:59 EST [naga: mgr.stack.proc:notice]: Panic in process: wafl_lopri
Mon Jan 12 15:03:59 EST [naga: mgr.stack.framename:notice]: Stack frame 0: sk_vpanic(0xd84ccc) + 0x0
Anyone else seen this?
Regards, pdg
--
See mail headers for contact information.
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
Greetings,
Reinoud
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Peter D. Gray Sent: Monday, January 12, 2009 6:42 To: toasters@mathworks.com Subject: panics with 7.2.6.1
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Regards, pdg
--
See mail headers for contact information.
On Mon, Jan 12, 2009 at 09:06:01AM +0100, Reinoud Reynders wrote:
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
What, the workaround is to disable quotas? What kind of a workaround is that? Do you have a bug ID?
RELEASE: 7.2.6.1 BUG: 327361
Contact support to confirm this is the right bug for your problem; if so, they will also send you the workaround.
Reinoud
-----Original Message----- From: Jan Pieter Cornet [mailto:johnpc@xs4all.nl] Sent: Monday, January 12, 2009 9:39 To: Reinoud Reynders CC: toasters@mathworks.com Subject: Re: panics with 7.2.6.1
On Mon, Jan 12, 2009 at 09:06:01AM +0100, Reinoud Reynders wrote:
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
What, the workaround is to disable quotas? What kind of a workaround is that? Do you have a bug ID?
-- Jan-Pieter Cornet johnpc@xs4all.nl !! Disclamer: The addressee of this email is not the intended recipient. !! !! This is only a test of the echelon and data retention systems. Please !! !! archive this message indefinitely to allow verification of the logs. !!
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
A quick "me too", and caution to others.
Last summer we upgraded to 7.2.5 and got hit by a quota-crashing bug. The effects on our service (20,000 accounts) were bad... very bad. This was fixed by a patch release. (Apparently this was then properly fixed in 7.2.5.1, but we stayed on our patch release.)
Moving forward to a few days ago...
Last week, we upgraded to 7.2.6.1, having been assured that all would be well, and that such problems could not recur. (You can see it coming, can't you?) Over the weekend, we were hit by similar quota-crashing with similar bad effects on the service.
Might it be that the 7.2.5-series bugfix didn't get rolled forward into the 7.2.6 series? Or is it a new, different, quota-related bug?
Naturally, we will be pursuing this, with vigour, with NetApp.
Meanwhile, if you use quotas, I would suggest caution before going to the 7.2.6 series for the moment.
On Mon, 12 Jan 2009, David Lee wrote:
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
We have a case open with NetApp about it, but I think we will be downgrading fairly soon.
On Tue, 13 Jan 2009, Darren Miller wrote:
On Mon, 12 Jan 2009, David Lee wrote:
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
On Tue, 13 Jan 2009, David Lee wrote:
On Tue, 13 Jan 2009, Darren Miller wrote:
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
That's the one. The filer hasn't crashed any more, but I am having to do a quota off/quota on on the affected volumes every night, which takes several hours. I'd keep an eye on actual disk space compared to what the quota says if I were you.
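For anyone scripting that nightly cycle, a minimal sketch might look like the following. The filer name, volume list, and rsh access are all assumptions (substitute your own), and the DRY_RUN guard only prints the console commands rather than issuing them; "quota report" and "df" at the end are just one way of eyeballing claimed quota usage against real disk usage.

```shell
#!/bin/sh
# Nightly quota off/on cycle for the affected volumes, followed by a
# quick quota-vs-disk sanity check. FILER and VOLUMES are hypothetical
# names; with DRY_RUN=1 the rsh commands are only printed, not run.
FILER=filer1
VOLUMES="vol1 vol2"
DRY_RUN=1

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: rsh $FILER $*"
    else
        rsh "$FILER" "$@"
    fi
}

for vol in $VOLUMES; do
    run quota off "$vol"
    run quota on "$vol"
done
# afterwards, compare what the quotas claim against actual disk usage
run quota report
run df
```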
On Tue, 13 Jan 2009, David Lee wrote:
On Tue, 13 Jan 2009, Darren Miller wrote:
[...] Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
Warning: side-effect: CPU gobbling by "quota resize".
Our operational procedures include handling Servicedesk (Helpdesk) requests from users for increased quotas. Our staff use a UNIX script which (1) edits the NetApp "quotas" file and (2) invokes "quota resize" on the relevant volume. This has worked happily for years. But since applying that "wafl_..." workaround yesterday, we've had bursts of very poor response from the filers. Closer examination, and a controlled test, show that these coincide with invoking "quota resize".
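For context, the kind of helper script described above is roughly the following sketch. All names here are hypothetical, the quotas-file path and entry format follow the 7-mode /etc/quotas conventions as I understand them, and DRY_RUN=1 only prints the filer command instead of issuing it.

```shell
#!/bin/sh
# Sketch of a quota-bump helper: rewrite a user's entry in the filer's
# quotas file (assumed NFS-mounted locally) and then trigger
# "quota resize" on that volume. Names are hypothetical.
FILER=filer1
QUOTAS_FILE=./quotas   # normally the filer's /etc/quotas
DRY_RUN=1              # set to 0 to actually issue the rsh command

bump_quota() {  # usage: bump_quota <user> <volume> <new-disk-limit>
    user=$1; vol=$2; limit=$3
    tmp="$QUOTAS_FILE.tmp.$$"
    # drop any existing entry for this user on this volume...
    grep -v "^$user[[:space:]].*user@/vol/$vol" "$QUOTAS_FILE" > "$tmp"
    # ...and append the new one (7-mode format: target  type  disk-limit)
    printf '%s\tuser@/vol/%s\t%s\n' "$user" "$vol" "$limit" >> "$tmp"
    mv "$tmp" "$QUOTAS_FILE"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: rsh $FILER quota resize $vol"
    else
        rsh "$FILER" quota resize "$vol"
    fi
}

: > "$QUOTAS_FILE"          # demo: start from an empty quotas file
bump_quota alice vol1 500M
```

Given the CPU cost of "quota resize" under the workaround, batching several edits and issuing a single resize per volume might also soften the hit.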
It looks as though that "wafl_..." workaround and "quota resize" are most unhappy partners.
I hope that the proper fix to the quota-crash problem won't introduce this performance-hit as a side-effect...
Is there a NetApp person on this list taking an overview of this particular issue, and collating information?