Last night we had an issue with our production filer: controller A went down, but controller B did not detect that A was down and did not take over. Only after I manually powered off controller A did controller B take over. I then brought
controller A back up and did the giveback. The filer worked fine for about 3 hours, and then the same thing happened again. This time I shut down controller A and left it down, so right now we are operating off controller B only. Per NetApp, there is nothing
in the logs that I sent them or in AutoSupport to suggest the root cause. They are asking us to bring controller A up again, wait for the problem to recur, and collect a core dump if it does. This option is not acceptable to us because it affects production. Has
anyone seen this before, or does anyone have any ideas?
The filer's failover settings are all correct. We are running CDOT 8.3.1 on a FAS8040.
Mustafa Sayla