On Wed, 29 Nov 2006 08:08:22 -0500 "Glenn Walker" ggwalker@mindspring.com wrote:
More logs would be helpful - by default, the filer will try to 'improve' DC connectivity by searching every 4 hours. The last part of the logs that you posted may very well be that, or more likely it is trying to regain connectivity after an error (security context expiring is not something that is seen frequently - this is abnormal!).
Yes -- actually, the filer did try to "improve" its LDAP connection *just before* the first GSSAPI error:
06:33:05 CET -- AUTH: TraceLDAPServer- Attempting to improve AD LDAP connection for domain.tld 06:33:05 CET -- AUTH: TraceLDAPServer- AD LDAP connection improved for PC.LOCAL. 06:33:59 CET -- AUTH: notice- Unexpected GSSAPI security context error.
this might be related
The biggest problems are the context expiring, the GSSAPI security context error (result of the security context expiring, no doubt), and the RPC rejection.
I'm assuming that the RPC rejection and the GSSAPI errors are a direct result of the context expiration.
Digging a bit more, I looked into the details of what is going over the wire during that time:
From 06:34:00 (first GSSAPI error) until 06:38 (no GSSAPI error seen after this), the filer constantly issues LDAP Bind requests, to which the DC replies with 'saslBindInProgress'
From what I know of the GSSAPI/Krb5 negociation, the filer should then continue the challeng-response until it has a valid Bind response. But instead, it issues an Unbind request, suggesting it doesn't have valid Kerberos credentials and is giving up :
06:34:00 filer -> DC -- Bind request (Initial: MsgId=1) 06:34:00 DC->filer -- saslBindInProgress 06:34:00 filer->DC -- Unbind request (answer: MsgId=2)
At 06:38:10, it seems to re-negociate a Kerberos ticket with the DC, and the following LDAP Bind request then proceeds as (I think) it should:
06:38:11 filer -> DC -- Bind request (Initial: MsgId=1) 06:38:11 DC->filer -- saslBindInProgress 06:38:11 filer->DC -- Bind request (answer: MsgId=2) 06:38:11 DC->filer -- saslBindInProgress 06:38:11 filer->DC -- Bind request (answer: MsgId=3) 06:38:11 DC->filer -- Bind success
So it seems the question is indeed: why does the filer looses its Kerberos credentials ? I'm thinking with Andrew that this could be due to buggy MS patches : we did indeed apply several patches two weeks ago, and that's were the problems seem to start. But identifying the incorrect patch will not be easy, which is why I wanted to check the general behaviour first.
The reason why I'm thinking this is that the successful LDAP connection above is brutally interrupted by the DC at 06:54:29 with a RST in reply to an apparently typical LDAP query -- I'll have to dig a bit more into Kerberos to check the credentials are still valid at this time : We have another RST-interrupted LDAP session between 06:54:29 and 07:18:31, and then another one , starting at 07:18:31, which didn't get interrupted until the end of the dump, at 11:44.
This all thing is very strange: the RST could be due to wrong/expired credentials or a buggy MS patch : oddly enough, every "Bind success" issued by the DC can't be completely decoded by Ethereal (ethereal bug or MS bug ?) Maybe the ticket expiration time somehow ends badly encoded ?
Thanks for the input so far, Simon