Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.string:notice]: Panic string: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
Mon Jan 12 15:03:59 EST [naga: mgr.stack.at:notice]: Panic occurred at: Mon Jan 12 14:48:38 2009
Mon Jan 12 15:03:59 EST [naga: mgr.stack.proc:notice]: Panic in process: wafl_lopri
Mon Jan 12 15:03:59 EST [naga: mgr.stack.framename:notice]: Stack frame 0: sk_vpanic(0xd84ccc) + 0x0
Anyone else seen this?
Regards, pdg
--
See mail headers for contact information.
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
Greetings,
Reinoud
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Peter D. Gray Sent: Monday, January 12, 2009 6:42 To: toasters@mathworks.com Subject: panics with 7.2.6.1
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Regards, pdg
--
See mail headers for contact information.
On Mon, Jan 12, 2009 at 09:06:01AM +0100, Reinoud Reynders wrote:
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
What, the workaround is to disable quotas? What kind of a workaround is that? Do you have a bug ID?
RELEASE: 7.2.6.1 BUG: 327361
Contact support to confirm this is the right bug for your problem; if so, they will also send you the workaround.
Reinoud
-----Original Message----- From: Jan Pieter Cornet [mailto:johnpc@xs4all.nl] Sent: Monday, January 12, 2009 9:39 To: Reinoud Reynders CC: toasters@mathworks.com Subject: Re: panics with 7.2.6.1
On Mon, Jan 12, 2009 at 09:06:01AM +0100, Reinoud Reynders wrote:
Yes, this is a known bug. Disable quotas for now; there is a workaround to fix this.
What, the workaround is to disable quotas? What kind of a workaround is that? Do you have a bug ID?
-- Jan-Pieter Cornet johnpc@xs4all.nl !! Disclamer: The addressee of this email is not the intended recipient. !! !! This is only a test of the echelon and data retention systems. Please !! !! archive this message indefinitely to allow verification of the logs. !!
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
A quick "me too", and caution to others.
Last summer we upgraded to 7.2.5 and got hit by a quota-crashing bug. The effects on our service (20,000 accounts) were bad... very bad. This was fixed by a patch release. (Apparently this was then properly fixed in 7.2.5.1, but we stayed on our patch release.)
Moving forward to a few days ago...
Last week, we upgraded to 7.2.6.1, having been assured that all would be well, and that such problems could not recur. (You can see it coming, can't you?) Over the weekend, we were hit by similar quota-crashing with similar bad effects on the service.
Might it be that the 7.2.5-series bugfix didn't get rolled forward into the 7.2.6 series? Or is it a new, different, quota-related bug?
Naturally, we will be pursuing this, with vigour, with NetApp.
Meanwhile, if you use quotas, I would suggest caution before going to the 7.2.6 series for the moment.
On Mon, 12 Jan 2009, David Lee wrote:
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
We have a case open with NetApp about it, but I think we will be downgrading fairly soon.
On Tue, 13 Jan 2009, Darren Miller wrote:
On Mon, 12 Jan 2009, David Lee wrote:
On Mon, 12 Jan 2009, Peter D. Gray wrote:
Just reverted all our filers to 7.2.5.1 after we had 7 panics in one day on 7.2.6.1 (6 on a 270A and 1 on a 3170A) We had been on 7.2.6.1 for 2 days but today was the first work day.
Looks like a problem in the quota handling.
Mon Jan 12 14:48:38 EST [naga: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/quotas/quota.c:1684: Assertion failure. in process wafl_lopri on release NetApp Release 7.2.6.1
....
Anyone else seen this?
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
On Tue, 13 Jan 2009, David Lee wrote:
On Tue, 13 Jan 2009, Darren Miller wrote:
Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
That's the one. The filer hasn't crashed any more, but I am having to do a quota off/quota on on the affected volumes every night, which takes several hours. I'd keep an eye on actual disk space compared to what the quota says if I were you.
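For anyone scripting that nightly cycle, a minimal sketch might look like the following. The filer name, volume list, and rsh access are all assumptions (substitute your own), and the DRY_RUN guard only prints the console commands rather than issuing them; "quota report" and "df" at the end are just one way of eyeballing claimed quota usage against real disk usage.

```shell
#!/bin/sh
# Nightly quota off/on cycle for the affected volumes, followed by a
# quick quota-vs-disk sanity check. FILER and VOLUMES are hypothetical
# names; with DRY_RUN=1 the rsh commands are only printed, not run.
FILER=filer1
VOLUMES="vol1 vol2"
DRY_RUN=1

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: rsh $FILER $*"
    else
        rsh "$FILER" "$@"
    fi
}

for vol in $VOLUMES; do
    run quota off "$vol"
    run quota on "$vol"
done
# afterwards, compare what the quotas claim against actual disk usage
run quota report
run df
```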
On Tue, 13 Jan 2009, David Lee wrote:
On Tue, 13 Jan 2009, Darren Miller wrote:
[...] Me too. We contacted NetApp and got the fix, which allowed us to turn quotas back on. However, quotas are now "leaking": reported quota usage is higher than the actual disk usage, causing many of our users to hit their quota limits. It only affects CIFS; NFS quotas are fine.
Ugh. Thanks for the heads-up.
Would that fix happen to be 'wafl_enable_allocation_size 0'? We applied that fix and restarted quotas earlier today, and are holding our breath...
Warning: side-effect: CPU gobbling by "quota resize".
Our operational procedures include handling Servicedesk (Helpdesk) requests from users for increased quotas. Our staff use a UNIX script which (1) edits the NetApp "quotas" file and (2) invokes "quota resize" on the relevant volume. This has worked happily for years. But since applying that "wafl_..." workaround yesterday, we've had bursts of very poor response from the filers. Closer examination, and a controlled test, show that these coincide with invoking "quota resize".
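For context, the kind of helper script described above is roughly the following sketch. All names here are hypothetical, the quotas-file path and entry format follow the 7-mode /etc/quotas conventions as I understand them, and DRY_RUN=1 only prints the filer command instead of issuing it.

```shell
#!/bin/sh
# Sketch of a quota-bump helper: rewrite a user's entry in the filer's
# quotas file (assumed NFS-mounted locally) and then trigger
# "quota resize" on that volume. Names are hypothetical.
FILER=filer1
QUOTAS_FILE=./quotas   # normally the filer's /etc/quotas
DRY_RUN=1              # set to 0 to actually issue the rsh command

bump_quota() {  # usage: bump_quota <user> <volume> <new-disk-limit>
    user=$1; vol=$2; limit=$3
    tmp="$QUOTAS_FILE.tmp.$$"
    # drop any existing entry for this user on this volume...
    grep -v "^$user[[:space:]].*user@/vol/$vol" "$QUOTAS_FILE" > "$tmp"
    # ...and append the new one (7-mode format: target  type  disk-limit)
    printf '%s\tuser@/vol/%s\t%s\n' "$user" "$vol" "$limit" >> "$tmp"
    mv "$tmp" "$QUOTAS_FILE"
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: rsh $FILER quota resize $vol"
    else
        rsh "$FILER" quota resize "$vol"
    fi
}

: > "$QUOTAS_FILE"          # demo: start from an empty quotas file
bump_quota alice vol1 500M
```

Given the CPU cost of "quota resize" under the workaround, batching several edits and issuing a single resize per volume might also soften the hit.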
It looks as though that "wafl_..." workaround and "quota resize" are most unhappy partners.
I hope that the proper fix to the quota-crash problem won't introduce this performance-hit as a side-effect...
Is there a NetApp person on this list taking an overview of this particular issue, and collating information?