I wonder what type of testing is done before firmware versions are released. I know they cannot simulate every scenario that might occur, but I would assume normal tasks like data backups would be tested.
James
-----Original Message-----
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Blake Golliher
Sent: Monday, August 13, 2007 10:36 AM
To: letta@jlab.org
Cc: toasters@mathworks.com
Subject: Re: [Fwd: Network Appliance Customer Support Bulletin: Critical Upgrade - AT-FCX FW to v34 (CSB-0708-03)]
You should get that 4th shelf for free. :)
-Blake
On 8/13/07, Paul Letta letta@jlab.org wrote:
Hello, I can't stress the below bulletin enough. Our site experienced a 12+ hour outage on a FAS3040's first day of production due to this bug.
On Sunday, I did a migration from 2 FAS940's and 1 F880 to a new FAS3040 with 3 SATA shelves. The first day of production on Monday was great: it took the load with ease and response was great. Level 0 backups were due, so at about 5pm on Monday I started level 0's of the volumes on the new 3040. At 7pm, it panicked and halted. It said it lost 1 disk in the root aggr and was rebuilding, during which time it said it lost 2 more.
I wasn't on the phone with L3 tech support more than 5 minutes before they said I was hit with this firmware bug. The shelves came with V33, and I hadn't put them in production before the May 31st email alert about the FW upgrade to V34 came out.
So by around 11pm Monday we started the WAFL_check on the root aggr, and it finished about 5am. But there were a lot of lost files, and support did not want us to accept the changes from this WAFL_check; they wanted us to run it against a snapshot from before the crash instead. We elected to accept the lost files and then search the snapshot and copy the lost files back to the live file system ourselves. But the file system was offline until about 11am, 3 hours into a work day for 2200 users -- not good. And then the 3040 was so slow during that day doing a double reconstruction that many services were not responsive at all, our email spool being the most visible.
So, if you have shelves with this firmware -- update it as soon as possible!
The firmware upgrade was the first thing I did the night of the crash, and it only took about 15 minutes to do 3 shelves with 2 modules each.
The good news is, we got a 4th shelf that came later than the rest, and it came with V34 firmware.
Paul
9 August 2007 Customer Support Bulletin CSB-0708-03
Critical Upgrade - AT-FCX FW to v34

*Summary:*
This Customer Support Bulletin is meant to reinforce and amplify the need to expeditiously upgrade to the AT-FCX v34 Firmware Release that is described in CSB-0705-03. Customers are required to schedule a maintenance window and upgrade to AT-FCX FW v34 as soon as possible.
*What is the potential service impact if AT-FCX firmware is not upgraded to v34?*
Systems with AT-FCX I/O modules (X/SP-5612A-R5) not upgraded to firmware v34 can experience the following problems (see link above for more details):
* WAFL(r) file system inconsistency
* Write-data loss
* Extended system outages
* The need to run WAFL recovery utilities to restore data integrity
* Unexpected reservation conflicts that in turn can cause additional service outages
*What are the failure symptoms if AT-FCX firmware is not upgraded to v34?*
Systems with AT-FCX I/O modules (X/SP-5612A-R5) not upgraded to firmware v34 would see the following failure symptoms related to pre-v34 firmware:
o Multiple RAID lost-write or checksum errors are seen on multiple drives attached to an AT-FCX shelf running shelf I/O module firmware version 33 or lower. These errors are caused by misdirected writes and/or reads to a disk. In some cases, not only is the location wrong, but the wrong disk is involved.
o The AT-FCX I/O module can occasionally panic during shelf firmware downloading. This will result in loss of disk connectivity and storage multi-disk panic.
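As a rough illustration of the version rule stated above (shelf I/O module firmware version 33 or lower is affected), here is a minimal Python sketch that flags modules still needing the upgrade. The inventory format, the function name, and the sample data are hypothetical; the bulletin does not describe any such tool, and real version numbers would have to be collected from your own systems.

```python
# Firmware threshold taken from the bulletin: version 33 or lower is affected.
MIN_SAFE_VERSION = 34


def modules_needing_upgrade(inventory, min_safe=MIN_SAFE_VERSION):
    """Return the (shelf, module, version) entries running firmware below min_safe."""
    return [entry for entry in inventory if entry[2] < min_safe]


# Hypothetical inventory: (shelf number, I/O module, firmware version).
# The layout loosely mirrors the site described in this thread:
# three shelves on v33 and a later fourth shelf that shipped with v34.
inventory = [
    (1, "A", 33), (1, "B", 33),
    (2, "A", 33), (2, "B", 33),
    (3, "A", 33), (3, "B", 33),
    (4, "A", 34), (4, "B", 34),
]

for shelf, module, version in modules_needing_upgrade(inventory):
    print(f"shelf {shelf} module {module}: v{version} -> upgrade to v{MIN_SAFE_VERSION}")
```

With the sample data, the six modules on shelves 1 through 3 are flagged and the fourth shelf is not.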
*Additional Benefits*
In addition to protection against the above-mentioned problems, the following fixes are included in firmware v34:
* Corrective action that recognizes the failure of the first I/O AFTER a cache flush has failed. This I/O is returned with a senseKey/asc/ascq of 9/88/01 error that causes Data ONTAP(r) to take the drive out of service. The data on the drive will then be reconstructed normally without data loss.
* Corrects an issue in which the reservation database is not updated properly, resulting in unexpected reservation conflicts that in turn can cause service outages.
* Avoids a known problem in which a controller failover can occur while downloading shelf firmware.
*Support*
Please contact NetApp Global Services and/or Professional Services for questions or assistance concerning the firmware upgrade of AT-FCX (X/SP-5612A-R5) to v34.
*NOW Self-Service Options*
* To register online for NOW access, go to http://now.netapp.com/newuser/
* To unsubscribe from the Customer Product Announcement distribution list, deselect the Field Alerts option in the Subscriptions <https://now.netapp.com/eservice/personal/loadSubscription.do?moduleName=MYPROFILE> area on NOW.
* To review, order, and renew software subscriptions, click Manage Service Contracts on the Service and Support <http://now.netapp.com/Self-Service/Forms/SupportHome.asp> home page on NOW.
The purpose of this communication is for NetApp Global Services to notify its installed base end users about urgent and important product information that may affect product performance or reliability. The information contained herein and the distribution list are Network Appliance(tm) confidential materials that are subject to restrictions on redistribution and that cannot be shared outside of this email distribution list.
(c) 2007 Network Appliance, Inc. All rights reserved. Specifications subject to change without notice. NetApp, the Network Appliance logo, Data ONTAP, and WAFL are registered trademarks and Network Appliance and NOW are trademarks of Network Appliance, Inc. in the U.S. and other countries. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.