If you are running A-SIS with 7.2.3 or 7.2.3, there are a few bugs that can cause the filer to panic and mark the affected volume offline, forcing a wafl_check on the containing aggregate. Not only that, but we had to netboot a developer release and suffered extensive outages.
We have been told that 7.2.4 fixes these bugs, but if you have run A-SIS on an older version, the bug may still be there, dormant.
I believe these were the bugs that were identified in our incident report.
Bug 256799, Bug 266312, Bug 276084, Bug 266312, Bug 251673
The bug hit our primary storage, it also hit the VSM volumes on our DR filer, and it would be in the ACL/directory of the backup tapes as well.
Darragh, Stephen J (CSC) (US SSA) wrote:
If you are running a-sis with 7.2.3 or 7.2.3,
I was under the impression that A-SIS required >=7.2.4 (as per one of our SEs, I believe). I guess this isn't the case?
Regardless, thanks for the heads up to the list.
It runs on 7.2.3 (first version?). 7.2.4 is the way to go, however.
We were lucky in that we've only deployed 7.2.4 for A-SIS systems and are having pretty good luck. No consistency issues, great space savings, little-to-no performance degradation.
I think perhaps your SE was trying to save you some grief. :)
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Nick Silkey Sent: Friday, April 11, 2008 8:50 AM To: toasters@mathworks.com Subject: Re: Serious A-sis bugs
Darragh, Stephen J (CSC) (US SSA) wrote:
If you are running a-sis with 7.2.3 or 7.2.3,
I was under the impression that A-SIS required >=7.2.4 (as per one of our SEs, I believe). I guess this isn't the case?
Regardless, thanks for the heads up to the list.
We have been running A-SIS on 7.2.3 for 4 months with no problems and great results, although I am upgrading to 7.2.4 this weekend.
Glenn Walker wrote:
It runs on 7.2.3 (first version?). 7.2.4 is the way to go, however.
We were lucky in that we've only deployed 7.2.4 for A-SIS systems and are having pretty good luck. No consistency issues, great space savings, little-to-no performance degradation.
I think perhaps your SE was trying to save you some grief. :)
Hi guys,
I've upgraded our FAS3020c to 7.3RC1 and activated A-SIS on some ESX datastore NFS volumes. At first I was very impressed:
robin> df -sh
Filesystem              used    saved   %saved
/vol/esx_prod2/        171GB   1063GB      86%
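For anyone checking the numbers: the %saved column looks like saved / (used + saved). Here is a quick sketch in Python, assuming that is how the percentage is derived (it matches the figures above, but the formula and the function name are my own and not taken from NetApp documentation):

```python
def percent_saved(used_gb: float, saved_gb: float) -> float:
    """Space savings as a percentage of the logical (pre-dedup) size.

    Assumption: %saved = saved / (used + saved). This matches the
    figures quoted above but is not taken from NetApp documentation.
    """
    logical_gb = used_gb + saved_gb
    if logical_gb == 0:
        return 0.0
    return 100.0 * saved_gb / logical_gb


# Figures from the df -sh output for /vol/esx_prod2/
print(round(percent_saved(171, 1063)))  # prints 86
```

In other words, roughly 1234GB of logical data is being stored in 171GB.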
But unfortunately, one day later the 3020c went into takeover mode: robin had panicked. I tracked the problem down to A-SIS. Every time the scheduler runs now, the filer panics with the following error message:
Tue Apr 15 01:08:44 CEST [robin: sk.assert:ALERT]: replayed event: ../common/wafl/buftree.c:6558(NULL)
Tue Apr 15 01:08:44 CEST [robin: sk.panic:ALERT]: replayed event: Panic String: ../common/wafl/buftree.c:6558: Assertion failure. in process wafl_lopri on release NetApp Release 7.3RC1
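If you want to compare panic strings across filers or against a bug report, it can help to pull the fields out of the sk.panic line. This is a rough sketch in Python; the regular expression is only fitted to the one message quoted above, and the function and field names are my own, so other panic strings may well need a different pattern:

```python
import re

# Fitted to the single sk.panic message quoted above; this is an
# assumption about the layout, not a documented syslog format.
PANIC_RE = re.compile(
    r"Panic String: (?P<file>\S+):(?P<line>\d+): (?P<reason>.+?) in process "
    r"(?P<process>\S+) on release (?P<release>.+)$"
)


def parse_panic(message: str):
    """Return the panic fields as a dict, or None if the line does not match."""
    match = PANIC_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    return info


sample = (
    "Tue Apr 15 01:08:44 CEST [robin: sk.panic:ALERT]: replayed event: "
    "Panic String: ../common/wafl/buftree.c:6558: Assertion failure. "
    "in process wafl_lopri on release NetApp Release 7.3RC1"
)
print(parse_panic(sample))
```

The source file, line number and release are probably the most useful things to quote when opening a support case.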
I ran wafl_check, but it couldn't find any problems. I also couldn't find anything on the NOW site regarding this issue or error message.
Any ideas?
As a workaround I turned off the scheduler, and no more panics occurred.
Regards, alex
-----Original Message----- From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Darragh, Stephen J (CSC) (US SSA) Sent: Thursday, April 10, 2008 5:05 PM To: toasters@mathworks.com Subject: Serious A-sis bugs
If you are running A-SIS with 7.2.3 or 7.2.3, there are a few bugs that can cause the filer to panic and mark the affected volume offline, forcing a wafl_check on the containing aggregate. Not only that, but we had to netboot a developer release and suffered extensive outages.
We have been told that 7.2.4 fixes these bugs, but if you have run A-SIS on an older version, the bug may still be there, dormant.
I believe these were the bugs that were identified in our incident report.
Bug 256799, Bug 266312, Bug 276084, Bug 266312, Bug 251673
The bug hit our primary storage, it also hit the VSM volumes on our DR filer, and it would be in the ACL/directory of the backup tapes as well.
Especially since this is an RC release, I would open a case with the Support Center and ask them specifically what they would need to troubleshoot this for you. It's likely that they will want the core file, but they would be best placed to drive this.
Cheers ..........
Stetson M. Webster Onsite Professional Services Engineer PS - North Amer. - East
NetApp 919.250.0052 Mobile Stetson.Webster@netapp.com www.netapp.com
-----Original Message----- From: Alexander Schalek [mailto:as@bacher.at] Sent: Tuesday, April 15, 2008 1:24 PM To: Darragh, Stephen J (CSC) (US SSA); toasters@mathworks.com Subject: AW: Serious A-sis bugs
Hi,
According to NetApp Support, I've hit bug 287105, which will be fixed in RC2.
Here is the workaround for this panic:
- Issue "sis start -s" on the volume (wait a couple of minutes for it to start)
- Issue "sis stop" on that volume.
Explanation: "sis start -s" creates a fresh inactive changelog before starting the gatherer scanner, so the fingerprint records that cause this panic are deleted. That way the panic is avoided on subsequent sis runs.
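To apply the workaround to several volumes, the two steps can be written out per volume. This is a small sketch in Python that only builds the command strings in order; the helper name and the example path are mine, and it does not talk to the filer or handle the wait between the two steps:

```python
from typing import List


def sis_workaround_commands(volume_path: str) -> List[str]:
    """Return the two console commands for the workaround, in order.

    Only "sis start -s" and "sis stop" come from the message above;
    this helper and its name are illustrative.
    """
    if not volume_path.startswith("/vol/"):
        raise ValueError("expected a path like /vol/<volume>")
    return [
        f"sis start -s {volume_path}",  # rescan; creates a fresh changelog
        f"sis stop {volume_path}",      # issue once the scan has started
    ]


# Example volume taken from the df output earlier in the thread
for command in sis_workaround_commands("/vol/esx_prod2"):
    print(command)
```

Leave a couple of minutes between the two commands so the scan actually gets started before it is stopped.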
Regards, alex
-- Alexander Schalek, Technical Services / IT-Infrastruktur Bacher Systems EDV GmbH FN 54202i, Handelsgericht Wien Clemens-Holzmeister-Strasse 4 A-1100 Wien, Business Park Vienna, Austria phone: +43 (1) 60 126-34 | fax: +43 (1) 60 126-4 e-mail: as@bacher.at | web: www.bacher.at