You should get that 4th shelf for free. :)
-Blake
On 8/13/07, Paul Letta <letta@jlab.org> wrote:
Hello, I can't stress the bulletin below enough. Our site had a 12+ hour outage on a FAS3040's first day of production because of this bug.
On Sunday, I migrated from 2 FAS940s and 1 F880 to a new FAS3040 with 3 SATA shelves. The first day of production on Monday went well: the 3040 took the load with ease and response times were great. Level 0 backups were due, so at about 5pm on Monday I started level 0s of the volumes on the new 3040. At 7pm, it panicked and halted. It reported that it had lost 1 disk in the root aggr and was rebuilding, and during the rebuild it reported losing 2 more.
I hadn't been on the phone with L3 tech support for more than 5 minutes before they told me I had been hit by this firmware bug. The shelves shipped with V33, and I hadn't yet put them into production when the May 31st email alert about the FW upgrade to V34 came out.
So by around 11pm Monday we had started WAFL_check on the root aggr, and it finished at about 5am. But there were a lot of lost files, and support did not want us to accept the changes made by this WAFL_check; they wanted us to run it against a snapshot from before the crash instead. We elected to accept the lost files, then search the snapshot and copy the lost files back to the live file system ourselves. But the file system was offline until about 11am, 3 hours into a work day for 2200 users, which was not good. And then the 3040 was so slow that day, doing a double reconstruction, that many services were not responsive at all, our email spool being the most visible.
So, if you have shelves with this firmware, update it as soon as possible!
The firmware upgrade was the first thing I did the night of the crash, and it took only about 15 minutes to upgrade 3 shelves with 2 modules each.
The good news is that we got a 4th shelf that came later than the rest, and it came with V34 firmware.
Paul
9 August 2007 Customer Support Bulletin CSB-0708-03
Critical Upgrade – AT-FCX FW to v34
*Summary:*
This Customer Support Bulletin is meant to reinforce and amplify the need to expeditiously upgrade to the AT-FCX v34 Firmware Release that is described in CSB-0705-03. Customers are required to schedule a maintenance window and upgrade to AT-FCX FW v34 as soon as possible.
*What is the potential service impact if AT-FCX firmware is not upgraded to v34?*
Systems with AT-FCX I/O modules (X/SP-5612A-R5) not upgraded to firmware v34 can experience the following problems (see CSB-0705-03 for more details):
* WAFL(r) file system inconsistency
* Write-data loss
* Extended system outages
* The need to run WAFL recovery utilities to restore data integrity
* Unexpected reservation conflicts that in turn can cause additional service outages
*What are the failure symptoms if AT-FCX firmware is not upgraded to v34?*
Systems with AT-FCX I/O modules (X/SP-5612A-R5) not upgraded to firmware v34 would see the following failure symptoms related to pre-v34 firmware:
o Multiple RAID lost-write or checksum errors are seen on multiple drives attached to an AT-FCX shelf running shelf I/O module firmware version 33 or lower. These errors are caused by misdirected writes and/or reads to a disk. In some cases, not only is the location wrong, but the wrong disk is involved.
o The AT-FCX I/O module can occasionally panic during shelf firmware downloading. This will result in loss of disk connectivity and a storage multi-disk panic.
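A minimal sketch of the version check the bulletin implies, assuming you have already written down each shelf's I/O module firmware version by hand: it flags any module at v33 or lower, the range identified above as affected. The inventory layout, the sample shelf names, and the function `modules_needing_upgrade` are hypothetical; this does not parse output from any Data ONTAP command.

```python
from typing import Iterable, List, Tuple

# Per the bulletin, firmware v33 or lower is affected; v34 is the fixed release.
MIN_SAFE_AT_FCX_FW = 34

# Hypothetical record: (shelf name, I/O module id, firmware version as an int).
Module = Tuple[str, str, int]


def modules_needing_upgrade(inventory: Iterable[Module]) -> List[Module]:
    """Return the AT-FCX I/O modules whose firmware is below v34."""
    return [m for m in inventory if m[2] < MIN_SAFE_AT_FCX_FW]


if __name__ == "__main__":
    # Hypothetical sample data: two modules per shelf.
    inventory = [
        ("shelf1", "A", 33),
        ("shelf1", "B", 33),
        ("shelf4", "A", 34),
        ("shelf4", "B", 34),
    ]
    for shelf, module, fw in modules_needing_upgrade(inventory):
        print(f"{shelf} module {module}: v{fw}, upgrade to v{MIN_SAFE_AT_FCX_FW}")
```

Comparing plain integers keeps the check simple; if your records carry version strings such as "V33", strip the prefix before comparing.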
*Additional Benefits*
In addition to protection against the above-mentioned problems, the following fixes are included in firmware v34:
* Corrective action that recognizes the failure of the first I/O AFTER a cache flush has failed. This I/O is returned with a senseKey/asc/ascq of 9/88/01 error that causes Data ONTAP(r) to take the drive out of service. The data on the drive will then be reconstructed normally without data loss.
* Corrects an issue in which the reservation database is not updated properly, resulting in unexpected reservation conflicts that in turn can cause service outages.
* Avoids a known problem in which a controller failover can occur while downloading shelf firmware.
*Support*
Please contact NetApp Global Services and/or Professional Services for questions or assistance concerning the firmware upgrade of AT-FCX (X/SP-5612A-R5) to v34.
*NOW Self-Service Options*
* To register online for NOW access, go to http://now.netapp.com/newuser/
* To unsubscribe from the Customer Product Announcement distribution list, deselect the Field Alerts option in the Subscriptions area on NOW (https://now.netapp.com/eservice/personal/loadSubscription.do?moduleName=MYPROFILE).
* To review, order, and renew software subscriptions, click Manage Service Contracts on the Service and Support <http://now.netapp.com/Self-Service/Forms/SupportHome.asp> home page on NOW.
The purpose of this communication is for NetApp Global Services to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution list are Network Appliance™ confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this email distribution list.
(c) 2007 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, Data ONTAP, and WAFL are registered trademarks and Network Appliance and NOW are trademarks of Network Appliance, Inc. in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.