Aaron,
Thanks for the information. We found out that one of our
DCs was upgraded and moved to a different domain. WINS was never updated to
reflect this change, so the filers still see this DC as a valid DC in their
domain. I'm going to use prefdc until our Windows crew removes this system from
WINS.
I'm running 6.5.5 on the majority of the filers we saw
impact on. I also saw issues on two filers running 7.02. The cifs resetdc
worked on some of the filers but not all of them. On two of the filers I had
to actually terminate cifs and do a cifs setup to bring it back to life. I had
to use prefdc on one filer because it kept trying to rebind to the DC that
shouldn't be listed in our domain. This was the only workaround I could think
of at the time.
If I use prefdc and all of the DCs in my list become
unavailable, will the filer automatically broadcast to find a
DC?
-Mark
Yeah, we saw something like this a few times last year with Win2k AD
DCs. The issue was intermittent and could affect one filer
in a cluster even though both had the same settings.
Deep analysis was done involving packet traces,
Microsoft/NetApp escalation concalls and stubbornly attempting to reproduce
the issue in a lab without success.
The root cause was never fully determined and the only fix
we had was reactive: running cifs resetdc to re-establish the authenticated
pipe with the DCs.
Essentially, anyone with an existing CIFS session
established was ok. They could operate as per normal. It was clients attempting
new connections that were met with the "Access is denied" message. Do you
know if ALL sessions were affected, existing AND new?
Our problem may have been a little different from yours in
that we had a defined prefdc list of 5 DCs. It seemed that if one DC's
"authentication pipe" with the filer shut down, the filer was not smart enough to
move on to the next one in the list. This is essentially because the only way a
DC was tested as being "available" was by a ping test, something they call
DCPING, as part of the filer's AD site awareness.
NetApp did release a fix in 6.5.6 (or maybe a slightly
earlier P release) that made the DC test a little smarter. I believe it now uses
the ability to establish a pipe as the criterion for whether a DC is available.
We haven't had any reported incidents in the past few months, so it may have
helped. It is hard to say due to the intermittent nature.
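
To make the difference between the two availability tests concrete, here is a rough sketch of "walk the preferred list and take the first DC that accepts a real connection" rather than one that merely answers a ping. This is not how Data ONTAP implements it; the function names are made up for illustration, and a plain TCP connect to port 445 is only a stand-in for "can establish an authenticated pipe".

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: an SMB port connect is used here as a rough proxy for
    "the DC will accept a pipe"; a ping reply alone says nothing about that.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_usable_dc(
    preferred: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> Optional[str]:
    """Walk the preferred list in order and return the first DC that passes
    the probe, or None if none of them do (hypothetical helper)."""
    for dc in preferred:
        if probe(dc):
            return dc
    return None
```

With a probe like this, a DC that still pings but refuses connections gets skipped and the next entry in the list is tried, which is the behaviour we were missing.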
I will try and track down our original case number if it
helps.
Thu Jan 26 15:03:35 MST [cifs.trace.GSS:error]: AUTH: Unable to acquire filer credentials: (0x96c73a44) KRB5 error code 68.
-----> This is very similar to the error we would see just before connections
dropped for the clients.
Thu Jan 26 15:03:35 MST [cifs.server.infoMsg:info]: CIFS: Warning for server \\WNTNANI: Connection terminated
-----> This happens regularly and isn't anything to be concerned about, EXCEPT
apparently when it follows the previous message and no sessions are attempted
with other DCs.
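
For what it's worth, the hex value and the "error code 68" in that first message are the same number seen two ways. A small sketch, assuming the filer reports the code in the MIT krb5 error-table layout (table base 0x96C73A00, i.e. -1765328384 as a signed 32-bit value); the function name is made up:

```python
# Assumption: MIT krb5 error-table base; codes in this table are base + offset.
KRB5_TABLE_BASE = 0x96C73A00


def krb5_table_offset(raw: int) -> int:
    """Convert a raw krb5 error value (signed or unsigned 32-bit) to its
    offset within the krb5 error table."""
    offset = (raw & 0xFFFFFFFF) - KRB5_TABLE_BASE
    if not 0 <= offset < 256:
        raise ValueError(f"0x{raw & 0xFFFFFFFF:08x} is not in the krb5 error table")
    return offset


print(krb5_table_offset(0x96C73A44))  # the value from /etc/messages
```

That prints 68, which RFC 4120 lists as KDC_ERR_WRONG_REALM. I can't say for certain that is what the filer means by it, but it would be worth checking which realm/domain the DC it was talking to actually belongs to.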
The other messages look familiar, but I can't be
sure.
What does your "testdc" return?
Does this affect all filers or only some? For example, is one cluster partner
affected without the other?
Are your DC's experiencing any issues? i.e. Are the Domain Controller
services and dependent services all up and running? Can you connect to a share
on your domain controllers directly when this happens?
Good luck,
Aaron
Are the times synced up between filer and the domain
controllers? Kerberos is very time sensitive.
Derek
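
A quick way to sanity-check the clocks, as a sketch only: compare the time reported by the filer against the time reported by a DC and see whether the gap is within the Kerberos tolerance. The 5-minute limit below is the usual default maximum clock skew; I am assuming your domain has not changed it, and the function name is made up.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the common default Kerberos tolerance of 5 minutes.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(
    filer_time: datetime,
    dc_time: datetime,
    max_skew: timedelta = MAX_SKEW,
) -> bool:
    """Return True if the two clocks differ by no more than max_skew.

    Both datetimes should be timezone-aware so that a filer in MST and a
    DC reporting UTC are compared correctly.
    """
    return abs(filer_time - dc_time) <= max_skew


# Example values only, not taken from the filers in this thread.
filer = datetime(2006, 1, 26, 22, 3, 35, tzinfo=timezone.utc)
dc = datetime(2006, 1, 26, 22, 10, 0, tzinfo=timezone.utc)
print(within_kerberos_skew(filer, dc))
```

In this made-up example the clocks are 6 minutes 25 seconds apart, so the check returns False.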
Today I had several filers at different OS levels
stop serving data via CIFS with the following errors. I had to stop and restart
CIFS to get it working properly again. The filers acted as if their Kerberos
ticket wasn't valid with any of the domain controllers. Has anyone seen this
before?
cifs domaininfo output:
Not currently connected to any DCs
Preferred Addresses:
    None
Favored Addresses:
    ip  WNTNABO2     PDCBROKEN
    ip  WNTNABO1     PDCBROKEN
Other Addresses:
    ip  WNTNABO4     PDCBROKEN
    ip  WNTNACRUBO1  PDCBROKEN
/etc/messages output:
Thu Jan 26 15:03:35 MST [cifs.trace.GSS:error]: AUTH: Unable to acquire filer credentials: (0x96c73a44) KRB5 error code 68.
Thu Jan 26 15:03:35 MST [cifs.server.infoMsg:info]: CIFS: Warning for server \\WNTNANI: Connection terminated.
Thu Jan 26 15:04:07 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:04:32 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Thu Jan 26 15:04:39 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:05:11 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for .
Thu Jan 26 15:05:43 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:07:01 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Thu Jan 26 15:09:32 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Any insight into this would really help me out. I
have a case open with NetApp as well.
-Mark