How long should it take a filer to fail over from one head to the other? When I force a takeover (cf forcetakeover) from one head, the other head goes down for minutes. Here is what I see on the console. This is a new filer with very little traffic going to it, and FC is not set up yet, so it is all NFS/CIFS.
array01> cf forcetakeover
cf forcetakeover may lead to data corruption; really force a takeover? yes
cf: forcetakeover initiated by operator
array01> Mon May 5 12:27:48 EST [array01: cf.misc.operatorForcedTakeover:warning]: Cluster monitor: forced takeover initiated by operator
Mon May 5 12:27:48 EST [array01: cf.fsm.takeover.forced:info]: Cluster monitor: takeover attempted after cf forcetakeover command
Mon May 5 12:27:48 EST [array01: cf.fsm.stateTransit:warning]: Cluster monitor: UP --> TAKEOVER
Mon May 5 12:27:48 EST [array01: cf.fm.takeoverStarted:warning]: Cluster monitor: takeover started
Mon May 5 12:27:48 EST [array02/array01: coredump.spare.none:info]: No sparecore disk was found.
Mon May 5 12:27:51 EST [array01: raid.vol.replay.nvram:info]: Performing raid replay on volume(s)
Mon May 5 12:27:51 EST [array01: raid.cksum.replay.summary:info]: Replayed 0 checksum blocks.
Mon May 5 12:27:51 EST [array01: raid.stripe.replay.summary:info]: Replayed 0 stripes.
Mon May 5 12:27:54 EST [array02/array01: wafl.replay.done:info]: WAFL log replay completed, 2 seconds
ifconfig: no such media type <xxx>
media type options are: <tp> <tp-fd> <100tx> <100tx-fd> <1000fx> <auto> <10g-sr>
ifconfig: Unable to determine primary for interface e0a.
ifconfig: e0a: no such interface
ifconfig: Unable to determine primary for interface e0b.
ifconfig: e0b: no such interface
ifconfig: Unable to determine primary for interface e0c.
ifconfig: e0c: no such interface
ifconfig: Unable to determine primary for interface e0d.
ifconfig: e0d: no such interface
ifconfig: Unable to determine primary for interface e2a.
ifconfig: e2a: no such interface
ifconfig: Unable to determine primary for interface e2b.
ifconfig: e2b: no such interface
add net default: gateway 10.28.17.1: network unreachable
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0a.
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0b.
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0c.
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e0d.
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e2a.
Mon May 5 12:27:55 EST [array02/array01: net.ifconfig.noLocal:error]: ifconfig: Unable to determine primary for interface e2b.
Mon May 5 12:27:55 EST [array02/array01: nis.servers.not.available:error]: NIS server(s) not available.
Mon May 5 12:27:55 EST [array02/array01: cf_takeover:info]: relog syslog Mon May 5 12:26:00 EST [array02: monitor.globalStatus.ok:info]: The system's global status is normal.
Mon May 5 12:27:55 EST [array02/array01: cf_takeover:info]: relog syslog Mon May 5 12:27:47 EST [array02: cf.fsm.takeoverOfPartnerDisabled:notice]: Cluster monitor: takeover of array
There are 68 spare disks; you may want to use the vol or aggr command
to create new volumes or aggregates or add disks to the existing aggregate.
FCP service stopped.
Mon May 5 12:27:55 EST [array01: net.ifconfig.takeoverError:warning]: WARNING: 6 errors detected during network takeover processing WARNING: Some network clients may not be able to access the cluster during takeover
Mon May 5 12:27:55 EST [array01: cf.rsrc.takeoverOpFail:error]: Cluster monitor: takeover during ifconfig_2 failed; takeover continuing...
CIFS partner server is running.
Mon May 5 12:27:55 EST [array01 (takeover): cf.rsrc.transitTime:notice]: Top Takeover transit times wafl_replay=2383 {replay_log=2353, mark_replaying=29}, raid=832, rc=410 {hostname=51, ifconfig=46, options=23, options=14, options=10, options=9, ifconfig=1, ifconfig=1, ifconfig=1, route=1}, wafl=405, registry_postrc_phase1=227, raid_replay=179, registry_prerc=115, wafl_sync=74, fmdisk_reserve=70, cifs=70
Mon May 5 12:27:55 EST [array01 (takeover): cf.fm.takeoverComplete:warning]: Cluster monitor: takeover completed
Mon May 5 12:27:55 EST [array01 (takeover): cf.fm.takeoverDuration:warning]: Cluster monitor: takeover duration time is 7 seconds
Mon May 5 12:27:58 EST [array02/array01: asup.smtp.host:info]: Autosupport cannot connect to host smtp.danahermail.com (Network comm problem) for message: REBOOT (CLUSTER TAKEOVER)
Mon May 5 12:27:58 EST [array02/array01: asup.smtp.unreach:error]: Autosupport mail was not sent because the system cannot reach any of the mail hosts from the autosupport.mailhost option. (REBOOT (CLUSTER TAKEOVER))
Mon May 5 12:28:00 EST [array01 (takeover): monitor.globalStatus.critical:CRITICAL]: This node has taken over array02.
Mon May 5 12:28:00 EST [array02/array01: monitor.globalStatus.critical:CRITICAL]: array01 has taken over this node.
Mon May 5 12:28:05 EST [array02/array01: nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the partner server.
Mon May 5 12:28:07 EST [array01 (takeover): asup.post.sent:notice]: Cluster Notification message posted to IBM: Cluster Notification from array01 (CLUSTER TAKEOVER COMPLETE MANUAL) INFO
Mon May 5 12:32:07 EST [array02/array01: asup.smtp.host:info]: Autosupport cannot connect to host smtp.danahermail.com (Network comm problem) for message: REBOOT (CLUSTER TAKEOVER)
Mon May 5 12:32:07 EST [array02/array01: asup.smtp.unreach:error]: Autosupport mail was not sent because the system cannot reach any of the mail hosts from the autosupport.mailhost option. (REBOOT (CLUSTER TAKEOVER))
Here is the ifconfig -a from the node that stayed up:
array01(takeover)> ifconfig -a
e0a: flags=948043<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 02:a0:98:08:22:b7 (auto-1000t-fd-up) flowcontrol full
trunked lan0
e0b: flags=108042<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:08:22:b6 (auto-unknown-cfg_down) flowcontrol full
e0c: flags=948043<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 02:a0:98:08:22:b7 (auto-1000t-fd-up) flowcontrol full
trunked lan0
e0d: flags=108042<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:a0:98:08:22:b4 (auto-unknown-cfg_down) flowcontrol full
e2a: flags=108042<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:07:43:05:16:98 (auto-10g_sr-fd-cfg_down) flowcontrol full
e2b: flags=108042<BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
ether 00:07:43:05:16:99 (auto-10g_sr-fd-cfg_down) flowcontrol full
lo: flags=19e8049<UP,LOOPBACK,RUNNING,MULTICAST,MULTIHOST,PARTNER_UP,TCPCKSUM> mtu 8160
inet 127.0.0.1 netmask 0xff000000 broadcast 127.0.0.1
ether 00:00:00:00:00:00 (VIA Provider)
lan0: flags=948043<UP,BROADCAST,RUNNING,MULTICAST,TCPCKSUM> mtu 1500
inet 10.28.17.213 netmask 0xffffff00 broadcast 10.28.17.255
partner lan0 (not in use)
ether 02:a0:98:08:22:b7 (Enabled virtual interface)
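
For comparing what the console reports against the outage I am seeing, here is a minimal sketch in Python that pulls the time-of-day and event name out of each syslog-style line and computes the gap between two events. The event names (cf.fm.takeoverStarted, cf.fm.takeoverComplete, nbt.nbns.registrationComplete) come straight from the log above; the function names, the regular expression, and the assumption that all events fall on the same day are mine, not anything provided by the filer.

```python
import re
from datetime import datetime

# Matches e.g. "12:27:48 EST [array01: cf.fm.takeoverStarted:warning]"
# and captures the time-of-day and the event name.
LOG_LINE = re.compile(r"(\d{2}:\d{2}:\d{2}) \w+ \[[^\]]*?: ([\w.]+):\w+\]")


def event_times(log_text):
    """Map each event name to the first time-of-day at which it appears."""
    times = {}
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%H:%M:%S")
            # Keep only the first occurrence of each event.
            times.setdefault(match.group(2), stamp)
    return times


def seconds_between(times, start_event, end_event):
    """Elapsed seconds between two events (assumes both are on the same day)."""
    return int((times[end_event] - times[start_event]).total_seconds())


# Three lines copied from the console output above.
SAMPLE = """\
Mon May 5 12:27:48 EST [array01: cf.fm.takeoverStarted:warning]: Cluster monitor: takeover started
Mon May 5 12:27:55 EST [array01 (takeover): cf.fm.takeoverComplete:warning]: Cluster monitor: takeover completed
Mon May 5 12:28:05 EST [array02/array01: nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the partner server.
"""

if __name__ == "__main__":
    t = event_times(SAMPLE)
    # Takeover start to takeover complete
    print(seconds_between(t, "cf.fm.takeoverStarted", "cf.fm.takeoverComplete"))  # 7
    # Takeover start to partner CIFS name registration complete
    print(seconds_between(t, "cf.fm.takeoverStarted", "nbt.nbns.registrationComplete"))  # 17
```

The first figure agrees with the "takeover duration time is 7 seconds" message in the log, so the takeover itself does not account for a multi-minute outage.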