Yeah, we saw something like this a few times last year with Win2k AD DCs. The issue was intermittent and could affect one filer in a cluster but not its partner, even though both had the same settings.
Deep analysis was done, involving packet traces, Microsoft/NetApp escalation concalls and stubborn attempts to reproduce the issue in a lab, without success.
The root cause was never fully determined, and the only fix we had was reactive: running cifs resetdc to re-establish the authenticated pipe with the DCs.
Essentially, anyone with an existing CIFS session established was ok. They could operate as per normal. It was clients attempting new connections that were met with the "Access is denied" message. Do you know if ALL sessions were affected, existing AND new?
Our problem may have been a little different from yours in that we had a defined prefdc list of 5 DCs. It seemed that if one DC's "authentication pipe" with the filer shut down, the filer was not smart enough to move on to the next DC in the list. This is essentially because the only way a DC was tested as being "available" was by a ping test, something they call DCPING, as part of the filer's AD site awareness.
NetApp did release a fix in 6.5.6 (or maybe a slightly earlier P release) that made the DC test a little smarter. I believe it now uses the ability to establish a pipe as the criterion for an available DC. We haven't had any reported incidents in the past few months, so it may have helped. It is hard to say due to the intermittent nature.
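To illustrate the difference between "answers a ping" and "can actually be talked to", here is a rough Python sketch. It is not how Data ONTAP does it. It just walks a preferred DC list in order and takes the first one that passes a probe. The default probe, a plain TCP connect to port 445, is my assumption, and it is still weaker than establishing an authenticated pipe. The function names and host names are made up for the example.

```python
import socket
from typing import Callable, Iterable, Optional

SMB_PORT = 445  # assumption: probe direct-hosted SMB over TCP


def can_open_smb(host: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SMB port succeeds."""
    try:
        with socket.create_connection((host, SMB_PORT), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable, etc.: treat the DC as unusable.
        return False


def first_usable_dc(
    preferred: Iterable[str],
    probe: Callable[[str], bool] = can_open_smb,
) -> Optional[str]:
    """Walk the preferred list in order; return the first DC passing the probe."""
    for dc in preferred:
        if probe(dc):
            return dc
    return None  # nothing in the list is usable
```

The probe is passed in as a parameter so a stricter check (for example, one that really sets up a session) can be swapped in without touching the failover loop.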
I will try and track down our original case number if it helps.
Thu Jan 26 15:03:35 MST [cifs.trace.GSS:error]: AUTH: Unable to acquire filer credentials: (0x96c73a44) KRB5 error code 68. -----> This is very similar to the error we would see just before client connections dropped.
Thu Jan 26 15:03:35 MST [cifs.server.infoMsg:info]: CIFS: Warning for server \\WNTNANI: Connection terminated. -----> This happens regularly and isn't anything to be concerned about, EXCEPT apparently when it follows the previous message and the filer makes no attempt to establish sessions with other DCs.
The other messages look familiar, but I can't be sure.
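If you want to check how often that pair of messages shows up together, something like the following Python sketch could scan a copy of /etc/messages. The line layout is assumed from the excerpts in this thread only. The 5-second window, the default year (the log lines carry no year) and the function names are my own choices, not anything NetApp provides.

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the excerpts: "Thu Jan 26 15:03:35 MST [event:sev]: message"
LINE_RE = re.compile(
    r"^\w{3} (?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \w+ "
    r"\[(?P<event>[^\]]+)\]: (?P<msg>.*)$"
)


def parse_line(line, year=2006):
    """Return (timestamp, event, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = " ".join(m.group("ts").split())  # collapse padded day numbers
    ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return ts, m.group("event"), m.group("msg")


def credential_failures_before_disconnect(lines, window=timedelta(seconds=5), year=2006):
    """Find 'Connection terminated' messages that closely follow a GSS credential error."""
    hits = []
    last_gss = None
    for line in lines:
        parsed = parse_line(line, year)
        if parsed is None:
            continue
        ts, event, msg = parsed
        if event == "cifs.trace.GSS:error" and "Unable to acquire filer credentials" in msg:
            last_gss = ts
        elif (
            event.startswith("cifs.server.infoMsg")
            and "Connection terminated" in msg
            and last_gss is not None
            and timedelta(0) <= ts - last_gss <= window
        ):
            hits.append((last_gss, ts))
            last_gss = None  # count each credential error at most once
    return hits
```

Usage would be along the lines of `credential_failures_before_disconnect(open("messages"))`, with the file copied off the filer first.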
What does your "testdc" return? Does this affect all filers or only some? For example, is one cluster partner affected and not the other? Are your DCs experiencing any issues, i.e. are the Domain Controller services and dependent services all up and running? Can you connect to a share on your domain controllers directly when this happens?
Good luck, Aaron
_____
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of Lai, Derek
Sent: Friday, 27 January 2006 10:14 AM
To: markallen@micron.com; toasters@mathworks.com
Subject: RE: CIFS PDC Issues
Are the times synced up between filer and the domain controllers? Kerberos is very time sensitive.
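As a rough illustration of the check, here is a small Python sketch that compares two clock readings against a tolerance. The 5-minute value is the commonly used Kerberos default. Your domain policy may be set differently, so treat it as an assumption. The function name is made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the common Kerberos default tolerance; check your domain policy.
MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(filer_time, dc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(filer_time - dc_time) <= max_skew


# Example readings, both in UTC so timezone offsets don't confuse the comparison.
dc = datetime(2006, 1, 26, 22, 3, 35, tzinfo=timezone.utc)
filer = dc + timedelta(minutes=7)
print(within_kerberos_skew(filer, dc))  # False
```

Comparing in UTC matters here because a filer and a DC that show different local times may still be in sync once timezones are accounted for.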
Derek
_____
From: owner-toasters@mathworks.com [mailto:owner-toasters@mathworks.com] On Behalf Of markallen@micron.com
Sent: Thursday, January 26, 2006 2:13 PM
To: toasters@mathworks.com
Subject: CIFS PDC Issues
Today I had several filers at different OS levels stop serving data via CIFS with the following errors. I had to stop and restart CIFS to get it working properly again. The filers acted as if their Kerberos tickets weren't valid with any of the domain controllers. Has anyone seen this before?
Cifs domain info command:
Not currently connected to any DCs
Preferred Addresses:
    None
Favored Addresses:
    ip WNTNABO2 PDCBROKEN
    ip WNTNABO1 PDCBROKEN
Other Addresses:
    ip WNTNABO4 PDCBROKEN
    ip WNTNACRUBO1 PDCBROKEN
/etc/messages output:
Thu Jan 26 15:03:35 MST [cifs.trace.GSS:error]: AUTH: Unable to acquire filer credentials: (0x96c73a44) KRB5 error code 68.
Thu Jan 26 15:03:35 MST [cifs.server.infoMsg:info]: CIFS: Warning for server \\WNTNANI: Connection terminated.
Thu Jan 26 15:04:07 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:04:32 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Thu Jan 26 15:04:39 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:05:11 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for .
Thu Jan 26 15:05:43 MST [auth.dc.GetDCName.failed:error]: AUTH: Error 0x0 while trying to get Domain Controller name for
Thu Jan 26 15:07:01 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Thu Jan 26 15:09:32 MST [sshd_0:info]: Did not receive identification string from x.x.x.x
Any insight into this would really help me out. I have a case open with NetApp as well.
-Mark