Hi,
On Tue, 28 Nov 2006 07:41:58 -0500 "Glenn Walker" ggwalker@mindspring.com wrote:
Temporarily enable the option 'cifs.trace_dc_connection'. The output (on the console or in the messages file) will help.
It may not be a complete loss of connectivity, but the DC is definitely rejecting the RPC request to look up group membership (SamrGetAliasMembership).
Apparently, there are some connectivity problems, but they seem to be quite random -- a trace of network traffic between the filer and the PDC reveals some unexpected TCP resets issued by the DC:
[...]
filer -> DC [FIN,ACK]
DC -> filer [ACK]
DC -> filer [RST,ACK]
[...]
This shouldn't be a problem, since the filer had already sent a FIN anyway, but the coincidence in timing is troubling...
Enabling cifs.trace_dc_connection and cifs.trace_login yields some more information:
AUTH: notice- The context has expired.
AUTH: notice- No error.
AUTH: notice- Unexpected GSSAPI security context error.
AUTH: notice- The context has expired.
AUTH: notice- No error.
CIFSRPC SamrGetAliasMembership: Exception rpc_s_unknown_reject caught.
AUTH: Error looking up domain groups during login from 192.168.x.x:RPC_NT_CALL_FAILED (0xc002001b).
and ten seconds later:
AUTH: TraceLDAPServer- Starting AD LDAP server address discovery for domain.tld
AUTH: TraceLDAPServer- Found 2 AD LDAP server addresses using generic DNS query.
AUTH: TraceLDAPServer- AD LDAP server address discovery for domain.tld complete. 2 unique addresses found.
AUTH: notice- Unexpected GSSAPI security context error.
[...]
This goes on for ten minutes, then the filer tries to locate a DC again, and after that everything works fine again:
AUTH: TraceDC- Starting DC address discovery for domain.
AUTH: TraceDC- Filer is not a member of a site.
AUTH: TraceDC- Found 2 addresses using generic DNS query.
AUTH: TraceDC- Starting WINS queries.
AUTH: TraceDC- Found 2 BDC addresses through WINS.
AUTH: TraceDC- Found 1 PDC addresses through WINS.
AUTH: TraceDC- DC address discovery for PC complete. 2 unique addresses found.
I'm not really sure what *should* happen, but this definitely does *not* look good... I understand that a security context expires sometimes, but I wonder why it takes so long to renegotiate.
Simon
Just a guess here, but is the time on the filer and PDC synchronized?
I have noticed that even though filers use ntp, they do not correct the time continuously. Instead they periodically reset the local clock according to the ntp server's clock. By default this happens once an hour. Between hourly time resets, the filer uses some internal heuristics to try to keep its local time correct, but sometimes it doesn't do a very good job and the time on the filer may gradually drift forward or backward. We had a filer that would drift forward between 5 and 10 seconds over the course of an hour. Our workaround was this:
options timed.sched 10m
which resets the time every 10 minutes instead of every hour. Now the filer keeps much better time.
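
To put rough numbers on that, here is a small Python sketch of the arithmetic. It assumes the drift is roughly linear between resets and that each reset brings the clock fully back in line; both are simplifications, and the function name max_drift_seconds is just made up for illustration.

```python
def max_drift_seconds(drift_per_hour_s: float, sync_interval_min: float) -> float:
    """Worst-case offset reached just before the next clock reset.

    Assumes linear drift and a reset that fully corrects the clock
    (illustrative assumptions, not measured filer behaviour).
    """
    return drift_per_hour_s * (sync_interval_min / 60.0)


if __name__ == "__main__":
    # Drift rates of 5 and 10 s/hour are the figures observed above;
    # 60 and 10 minutes are the default and the shortened reset interval.
    for interval_min in (60, 10):
        for rate in (5.0, 10.0):
            worst = max_drift_seconds(rate, interval_min)
            print(f"reset every {interval_min:>2} min, drift {rate:>4.1f} s/h: "
                  f"up to {worst:.2f} s off")
```

Under those assumptions, going from an hourly reset to a 10-minute one cuts the worst-case offset to one sixth, so a 5 to 10 second drift shrinks to under 2 seconds.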
Steve Losen scl@virginia.edu phone: 434-924-0640
University of Virginia ITC Unix Support