RE: Problem on 840 Cluster.. - toasters

20 Dec 2000


Premanshu,
I assume you meant 5.3.6H1R2, not that it really matters. As
for the failed takeovers, it should fail in both directions, as
you have a volume on each node with the same fsid. The fsid
is set when the volume is created and is based on the system id
of the filer, which is extracted from the NVRAM card. The most
likely way this occurred is that vol0 was created on one
node and then moved to the other node, and another vol0 was
then created on the first node. Note that when I say "moved",
it could have happened if someone was playing with the cabling
and switched shelves around.
As for getting out of this, the easiest way is to redo the
filesystems, assuming you haven't populated the volumes with
data yet. Otherwise, I'd call NetApp support; they should
walk you through rewriting the fsids on one of the volumes,
namely the one that doesn't match the system id of its host node.
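The failure mode described above can be modelled in a few lines. This is a hypothetical sketch, not Data ONTAP code: it assumes, purely for illustration, that an FSID is the pair (system id of the filer that created the volume, volume name) and that it never changes after creation. The names `create_volume`, `fsid_conflicts` and `foreign_volumes`, and the two system ids, are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    name: str
    # Hypothetical FSID: (system id of the creating filer, volume name).
    fsid: tuple[int, str]


def create_volume(name: str, system_id: int) -> Volume:
    # Assumption for illustration: the FSID is fixed at creation time from the
    # creating filer's system id and does not change if the disks are later
    # moved (e.g. shelves recabled) to the other node.
    return Volume(name, (system_id, name))


def fsid_conflicts(survivor: list[Volume], partner: list[Volume]) -> list[str]:
    """Names of partner volumes whose FSID collides with one the survivor already has."""
    own = {v.fsid for v in survivor}
    return [v.name for v in partner if v.fsid in own]


def foreign_volumes(volumes: list[Volume], host_system_id: int) -> list[str]:
    """Volumes whose FSID does not match the system id of the node hosting them."""
    return [v.name for v in volumes if v.fsid[0] != host_system_id]


# Hypothetical system ids for the two cluster nodes.
A_ID, B_ID = 0x1111, 0x2222

node_b = [create_volume("vol0", A_ID)]  # vol0 created on A, then moved to B
node_a = [create_volume("vol0", A_ID)]  # a new vol0 later created on A

print(fsid_conflicts(node_a, node_b))  # A taking over B: ['vol0']
print(fsid_conflicts(node_b, node_a))  # B taking over A: ['vol0']
print(foreign_volumes(node_b, B_ID))   # candidate for an FSID rewrite: ['vol0']
```

In this model both takeover directions report the same collision, and `foreign_volumes` picks out the volume whose FSID does not match its host node's system id, i.e. the one whose fsid would need rewriting.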
-Steve
...
-----Original Message-----
From: Premanshu Jain [mailto:PrJain@shastanets.com]
Sent: Tuesday, December 19, 2000 5:34 PM
To: toasters
Subject: Problem on 840 Cluster..
I feel toasters will be faster to help...
Here is the description of my problem:
I have an 840 cluster running 5.3.5H1 R2. When I initiate a
manual takeover from A, it successfully takes over B. However, the
reverse is not true. If I initiate a manual takeover from B
(cf takeover), it fails and tends to hang A in the rebooting
stage. Here are the typical error messages:
___________________________________________
Here is the output when I initiate a manual takeover:
___________________ 
Tue Dec 19 16:31:52 PST [rc]: Cluster monitor: takeover initiated by operator
Tue Dec 19 16:31:52 PST [cf_main]: Cluster monitor: takeover attempted after cf takeover command
Tue Dec 19 16:31:52 PST [cf_main]: Cluster monitor: UP --> TAKEOVER
Tue Dec 19 16:31:52 PST [cf_takeover]: Cluster monitor: takeover started
Tue Dec 19 16:32:04 PST [disk_admin]: Resetting all devices on ISP2100 in slot 9
Tue Dec 19 16:32:12 PST [raid_disk_admin]: Root volume vol0 has the same FSID as another volume on this system.
Tue Dec 19 16:32:13 PST last message repeated 5 times
Tue Dec 19 16:32:13 PST [raid_disk_admin]: RAID takeover failed: Partner disk label processing failed!
Tue Dec 19 16:32:13 PST [cf_takeover]: Cluster monitor: takeover during raid failed; takeover cancelled
Tue Dec 19 16:32:13 PST [cf_takeover]: Cluster monitor: takeover failed 'unable to start partner'
Tue Dec 19 16:32:13 PST [cf_giveback]: Cluster monitor: giveback started
Tue Dec 19 16:32:13 PST [disk_admin]: Resetting all devices on ISP2100 in slot 8
Tue Dec 19 16:32:13 PST [disk_admin]: Resetting all devices on ISP2100 in slot 9
Tue Dec 19 16:32:14 PST [asup_main]: Cluster Notification mail sent
Tue Dec 19 16:32:20 PST [cf_giveback]: Cluster monitor: giveback completed
Tue Dec 19 16:32:20 PST [cf_main]: Cluster monitor: TAKEOVER --> UP
Tue Dec 19 16:32:20 PST [cf_main]: Cluster monitor: partner not responding
bot> cf status 
Waiting for store to recover. 
bot has disabled takeover by store (unsynchronized log) 
________________________
Other question:
I can't type from the keyboard in Windows HyperTerminal when the
console is plugged directly into the COM port of a Windows PC. I was
able to do it earlier, but not now.
prem
Premanshu Jain
Nortel Networks, Content Networks(Shasta)
2305, Mission College Blvd, Santa Clara, CA 95054
Direct: (408) 565-3573    ESN: 655-3573
eMail: prjain@shastanets.com     Web: http://www.nortelnetworks.com/ipservices