I was reading the docs before upgrading to 5.2.3, and I realized that I get to upgrade everything: OS, filer firmware, and disk firmware.
The thing that struck me as unfortunate was this, from na_disk_fw_update(1):
This command makes disks inaccessible for up to 2 minutes, so network sessions using the filer should be closed down before running it.
I've only got 4 shelves, so it's not a huge deal, but I could see this having a big impact in larger environments.
Obviously the disk has to be inaccessible during the firmware upgrade, but is disabling file service really necessary? The reason I ask is that the filer can already run one disk short following a failure, so why not reuse that functionality so that disk firmware updates can be done without service disruption?
Hi Kevin.
You are correct that a RAID system protects against a single disk failure; the raid group is placed into degraded mode and the contents rebuilt onto a spare.
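To make the "rebuilt onto a spare" step concrete, here is a minimal sketch of single-parity reconstruction, assuming an XOR parity scheme such as RAID 4. The function names are hypothetical and this is an illustration of the general technique, not OnTap code: parity is the XOR of the data blocks in a stripe, so any one missing block equals the XOR of all the surviving blocks, parity included.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block for one stripe (hypothetical single-parity layout)."""
    return xor_blocks(data_blocks)


def reconstruct_missing(surviving_blocks):
    """Rebuild the single missing block from all survivors, parity included."""
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xaa\xbb\xcc"]
    parity = compute_parity(data)
    # Simulate losing disk 1: its block is recovered from disks 0, 2 and parity.
    rebuilt = reconstruct_missing([data[0], data[2], parity])
    assert rebuilt == data[1]
    print("rebuilt disk 1:", rebuilt.hex())
```

A real reconstruction repeats this for every stripe on the failed disk and writes the result to the spare, which is why it takes a while and why it is only worth doing for a disk that is actually gone.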
Now, to address your point: yes, it's technically possible to place a raid group into a "pseudo-degraded" mode, where a disk is temporarily taken offline but no reconstruction is started. Note that OnTap doesn't work this way, and making it work this way would take quite a bit of doing. For one thing, re-syncing the disk that was temporarily taken offline and later brought back would require considerable bookkeeping, with all the associated headaches.
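The bookkeeping mentioned here can be sketched as follows, assuming the array records which stripes changed while a member disk was offline and rewrites only those stripes when the disk returns. Everything in this sketch (the class, its methods, the in-memory set standing in for a dirty-region bitmap) is hypothetical and does not describe OnTap. A stripe counts as stale for the offline disk whenever that disk's block in it would have changed, which includes every write when the offline disk holds parity.

```python
class PseudoDegradedGroup:
    """Hypothetical raid group with one member disk temporarily offline.

    A real implementation would keep the dirty set in persistent storage
    so it survives a crash or reboot; this sketch keeps it in memory only.
    """

    def __init__(self, num_stripes):
        self.num_stripes = num_stripes
        self.offline_disk = None
        self.dirty_stripes = set()

    def take_offline(self, disk):
        if self.offline_disk is not None:
            raise RuntimeError("only one disk may be offline at a time")
        self.offline_disk = disk
        self.dirty_stripes.clear()

    def write(self, stripe, disks_touched):
        """Record a write; disks_touched lists every member disk whose block
        in this stripe changes, including the parity disk."""
        if not 0 <= stripe < self.num_stripes:
            raise ValueError("stripe out of range")
        if self.offline_disk is not None and self.offline_disk in disks_touched:
            self.dirty_stripes.add(stripe)

    def bring_online(self):
        """Return the stripes that must be resynced, then clear the state."""
        to_resync = sorted(self.dirty_stripes)
        self.offline_disk = None
        self.dirty_stripes.clear()
        return to_resync


# Disks 0-3 hold data, disk 4 holds parity; disk 2 goes offline for a firmware update.
group = PseudoDegradedGroup(num_stripes=8)
group.take_offline(2)
group.write(0, {0, 4})  # disk 2 unaffected, nothing to resync
group.write(5, {2, 4})  # disk 2 missed this write
group.write(3, {2, 4})
print(group.bring_online())  # stripes to resync: [3, 5]
```

Tracking only the stale stripes is what separates this from a full reconstruction; the price is keeping that record correct across every write path and across a crash during the update window.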
*If* the above machinery were in place, disk firmware could be updated one disk at a time without any disruption of service. However, one has to ask whether it's worth it. How often does one upgrade disk firmware?
Hope this answers your question. Let me know otherwise.