As a wild guess - have you checked that the clocks are synchronized between the DCs and the filer? Is it possible that the filer's clock drifts too far between the time daemon's scheduled updates?
-andrey
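A rough sketch of the check suggested above, in Python. It assumes the common default Kerberos tolerance of 5 minutes between client and DC clocks (the real limit comes from the domain policy), and the function name and sample timestamps are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: 5 minutes is the usual default clock-skew tolerance for
# Kerberos; the actual value is whatever the domain policy says.
MAX_SKEW = timedelta(minutes=5)


def skew_exceeds_tolerance(filer_time, dc_time, max_skew=MAX_SKEW):
    """Return True if the two clocks differ by more than max_skew."""
    return abs(filer_time - dc_time) > max_skew


# Hypothetical readings taken from the DC and the filer at the same moment.
dc = datetime(2006, 11, 29, 14, 0, 0, tzinfo=timezone.utc)
filer_ok = dc + timedelta(seconds=40)
filer_bad = dc - timedelta(minutes=7)

print(skew_exceeds_tolerance(filer_ok, dc))
print(skew_exceeds_tolerance(filer_bad, dc))
```

If the drift between two time daemon updates can approach the tolerance, shortening the update interval would be the obvious thing to try.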
________________________________
From: Glenn Walker [mailto:ggwalker@mindspring.com]
Sent: Wed 11/29/2006 2:08 PM
To: Simon Vallet
Cc: Borzenkov, Andrey; toasters@mathworks.com
Subject: RE: Intermittent "Permission denied" on NTFS qtree
More logs would be helpful - by default, the filer will try to 'improve' DC connectivity by searching every 4 hours. The last part of the logs that you posted may very well be that, or more likely it is trying to regain connectivity after an error (security context expiring is not something that is seen frequently - this is abnormal!).
The biggest problems are the context expiring, the GSSAPI security context error (result of the security context expiring, no doubt), and the RPC rejection.
I'm assuming that the RPC rejection and the GSSAPI errors are a direct result of the context expiration.
Glenn
-----Original Message-----
From: Simon Vallet [mailto:svallet@genoscope.cns.fr]
Sent: Wednesday, November 29, 2006 6:10 AM
To: Glenn Walker
Cc: Andrey.Borzenkov@fujitsu-siemens.com; toasters@mathworks.com
Subject: Re: Intermittent "Permission denied" on NTFS qtree
Hi,
On Tue, 28 Nov 2006 07:41:58 -0500 "Glenn Walker" <ggwalker@mindspring.com> wrote:
> Enable the option (temporarily) 'cifs.trace_dc_connection'. The
> output (via screen\messages file) will help.
> It may not be an issue with complete connectivity drop, but the DC is
> definitely rejecting the RPC request to look up group membership
> (SamrGetAliasMembership).
Apparently, there are some connectivity problems, but they seem to be quite random -- a trace of network traffic between the filer and the PDC reveals some unexpected TCP resets issued by the DC:
[...]
filer -> DC [FIN,ACK]
DC -> filer [ACK]
DC -> filer [RST,ACK]
[...]
This shouldn't be a problem, since the filer had already sent a FIN anyway, but the coincidence in timing is troubling...
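For scanning a longer capture, something like the following Python sketch could separate harmless resets (sent to a side that has already sent its FIN, as in the excerpt above) from resets that hit a connection still in use. It assumes a single connection, with packets already decoded into (source, destination, flags) tuples; the function name is invented for illustration:

```python
def unexpected_resets(packets):
    """Return indices of RST packets whose receiver had not yet sent a FIN.

    packets: list of (src, dst, flags) tuples in capture order, for one
    TCP connection; flags is a set such as {"FIN", "ACK"}.
    """
    fin_sent = set()   # hosts that have already sent a FIN
    suspicious = []
    for index, (src, dst, flags) in enumerate(packets):
        if "FIN" in flags:
            fin_sent.add(src)
        if "RST" in flags and dst not in fin_sent:
            suspicious.append(index)
    return suspicious


# The sequence seen in the trace: the filer closes first, then the DC resets.
trace = [
    ("filer", "DC", {"FIN", "ACK"}),
    ("DC", "filer", {"ACK"}),
    ("DC", "filer", {"RST", "ACK"}),
]
print(unexpected_resets(trace))
```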
Enabling cifs.trace_dc_connection and cifs.trace_login yields some more information:
AUTH: notice- The context has expired.
AUTH: notice- No error.
AUTH: notice- Unexpected GSSAPI security context error.
AUTH: notice- The context has expired.
AUTH: notice- No error.
CIFSRPC SamrGetAliasMembership: Exception rpc_s_unknown_reject caught.
AUTH: Error looking up domain groups during login from 192.168.x.x:RPC_NT_CALL_FAILED (0xc002001b).
and ten seconds later:
AUTH: TraceLDAPServer- Starting AD LDAP server address discovery for domain.tld
AUTH: TraceLDAPServer- Found 2 AD LDAP server addresses using generic DNS query.
AUTH: TraceLDAPServer- AD LDAP server address discovery for domain.tld complete. 2 unique addresses found.
AUTH: notice- Unexpected GSSAPI security context error.
[...]
This goes on for ten minutes, then the filer tries to locate a DC again, and after that everything works fine again:
AUTH: TraceDC- Starting DC address discovery for domain.
AUTH: TraceDC- Filer is not a member of a site.
AUTH: TraceDC- Found 2 addresses using generic DNS query.
AUTH: TraceDC- Starting WINS queries.
AUTH: TraceDC- Found 2 BDC addresses through WINS.
AUTH: TraceDC- Found 1 PDC addresses through WINS.
AUTH: TraceDC- DC address discovery for PC complete. 2 unique addresses found.
I'm not really sure what *should* happen, but this definitely does *not* look good... I understand that a security context expires sometimes, but I wonder why it takes so long to renegotiate.
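To get a feel for how the trace messages quoted above are distributed over one of these ten-minute windows, they could be tallied with a rough Python sketch like this one. It assumes every message starts with an `AUTH:` or `CIFSRPC` prefix, as in the excerpts; the function names and category labels are made up for illustration:

```python
import re
from collections import Counter

# Assumption: each trace message starts with "AUTH:" or "CIFSRPC", as in
# the excerpts in this thread. The pattern also copes with messages that
# were pasted onto a single line.
_MESSAGE = re.compile(
    r"(?:AUTH:|CIFSRPC\b).*?(?=(?:AUTH:|CIFSRPC\b)|$)", re.DOTALL
)


def split_messages(text):
    """Split raw trace text into individual messages."""
    return [m.group(0).strip() for m in _MESSAGE.finditer(text)]


def tally(text):
    """Count messages per rough category (labels are illustrative)."""
    counts = Counter()
    for message in split_messages(text):
        if "context has expired" in message:
            counts["context expired"] += 1
        elif "GSSAPI" in message:
            counts["gssapi error"] += 1
        elif "rpc_s_unknown_reject" in message or "RPC_NT_CALL_FAILED" in message:
            counts["rpc failure"] += 1
        else:
            counts["other"] += 1
    return counts


sample = (
    "AUTH: notice- The context has expired. AUTH: notice- No error. "
    "AUTH: notice- Unexpected GSSAPI security context error. "
    "AUTH: notice- The context has expired. AUTH: notice- No error. "
    "CIFSRPC SamrGetAliasMembership: Exception rpc_s_unknown_reject caught. "
    "AUTH: Error looking up domain groups during login from "
    "192.168.x.x:RPC_NT_CALL_FAILED (0xc002001b)."
)
for category, count in sorted(tally(sample).items()):
    print(category, count)
```

Run over the whole messages file, a tally like this would show whether the GSSAPI errors continue right up to the DC rediscovery or tail off earlier.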
Simon