DEMinSoCAL

macrumors 603
Original poster
Sep 27, 2005
5,081
7,315
We have 2 SL servers (10.6.8). One is an OD master and the other holds a replica.

Recently, I created a new user account for a new employee. We started having problems with the employee being unable to log in or connect to shares. If we reboot the server, the user can log in (for a while).

When the user can't log in, looking at Workgroup Manager (WM) on the OD master, the account is there. On the OD replica, it's gone. If I "refresh" WM, the account shows up, but the user still can't log in. Reboot the replica server, and the user can log in and the account shows up in WM without having to "refresh". After some amount of time (less than 24 hours), the same problem returns.

At first I thought there was some sort of communication problem between the servers, but I don't think there is. When it's not working and the account is missing in WM on the replica server, if I make a change to the user account on the OD master (like changing the user's picture or other info in the account), then go to the replica server, start WM, and refresh so the account shows up, the change is there! If I make more changes at this point, they show up immediately on both servers. So, they are talking.

Any ideas? It's driving me crazy: why is it happening, why does restarting the replica server fix it, and why does it stop working again after a day or so?
 

DEMinSoCAL

Sure. I think this speaks to the level in which Apple does not give a #$%@ about server. I have had similar issues and have gone back to Windows. Try it, it works really well.

I wish I could. All my clients have Windows Server and Exchange Server except for this one, which I came into with a mixture of Mac and Windows clients, and all Mac servers. After working a year with this, trust me...I'd LOVE to replace the servers with SBS 2011, but at this time, it isn't possible.

In the meantime, I'd love to find a fix. I believe it's an LDAP problem. Looking at the console logs on the replica server, the LDAP log shows constant entries stating "LDAP Server not found".

Why it works for a bit, then stops...I'm not sure.
 

StevenMeyer

macrumors member
Dec 17, 2011
90
0
New York... Where Else?
Any way you can post logs? I'll try to hash through them. LDAP is always a tricky protocol. You can also do something called an LDAP trace. I know the commands for Unix and Linux but not OS X (I'll link another page). Wireshark should also help you look for bad packets.
http://forums.novell.com/netiq/neti...s-when-one-replica-becomes-unavailable-2.html
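
Before digging into traces, it might be worth checking whether the replica can open a TCP connection to the master's LDAP ports at all. A minimal Python sketch, assuming the standard ports 389 (LDAP) and 636 (LDAPS); the host name `odmaster.example.com` is a made-up placeholder, not something from this thread:

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

def probe_ldap(host, ports=(389, 636)):
    """Map each port to whether it accepted a connection."""
    return {port: can_connect(host, port) for port in ports}

# Example (hypothetical host name, run from the replica):
#   print(probe_ldap("odmaster.example.com"))
```

This only shows that the port is reachable. It says nothing about whether the bind or the replication itself succeeds.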
 

StevenMeyer

After re-reading this, it really looks like a blacklisting. Have you tried having the user log in from a terminal you know is problem-free? Every once in a while, when a Mac's network card goes bad, it starts shooting out tons of crap that gets it locked out of things (it would be rare, but SL could think it's the user trying to attack the server).
 

DEMinSoCAL

It's not the workstation (which is a MacBook Pro). Even when trying to use the user's account to connect to a share FROM THE SERVER TO THE OTHER SERVER, it will not accept that user's account (but will accept other users' accounts). Also, so that this new employee can access the shares, she is successfully using the previous employee's credentials when connecting to shares.

What I cannot understand is why this one user, and why it works fine for a while and then stops working?

Here's a sample of the ldap log on the replica server JUST AFTER A RESTART:


Mar 3 12:56:19 art slapd[76]: @(#) $OpenLDAP: slapd 2.4.11 (Aug 12 2010 17:17:10) $
Mar 3 12:56:19 art slapd[76]: daemon: SLAP_SOCK_INIT: dtblsize=8192
Mar 3 12:56:23 art slapd[76]: bdb_monitor_db_open: monitoring disabled; configure monitor database to enable
Mar 3 12:56:23 art slapd[76]: slapd starting
Mar 3 13:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 13:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 14:09:46 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 14:09:46 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 15:09:45 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 15:09:45 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 16:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 16:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 17:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 17:09:47 art slapd[76]: do_syncrepl: rid=183 retrying


Then, after a few days I start getting this:

Mar 6 02:11:34 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 02:11:34 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 6 02:22:46 art slapd[76]: SASL [conn=744] Failure: Have neither type of secret
Mar 6 03:09:58 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 03:09:58 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 6 03:11:33 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 6 03:11:33 art slapd[76]: do_syncrepl: rid=183 retrying


Maybe you can make some sense of this? Thanks!
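
To see how regular those failures are, the gaps between the "Can't contact LDAP server" lines can be computed. A rough Python sketch using the Mar 3 lines above; syslog timestamps carry no year, so the `year=2012` default is only an assumption to make the dates parseable:

```python
from datetime import datetime

LOG = """\
Mar 3 13:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 13:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 14:09:46 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 14:09:46 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 15:09:45 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 15:09:45 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 16:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 16:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
Mar 3 17:09:47 art slapd[76]: do_syncrep2: rid=183 (-1) Can't contact LDAP server
Mar 3 17:09:47 art slapd[76]: do_syncrepl: rid=183 retrying
"""

def failure_times(log_text, year=2012):
    """Timestamps of every "Can't contact LDAP server" line (year is assumed)."""
    times = []
    for line in log_text.splitlines():
        if "Can't contact LDAP server" not in line:
            continue
        month, day, clock = line.split()[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def gaps_in_seconds(times):
    """Seconds between each failure and the next one."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

gaps = gaps_in_seconds(failure_times(LOG))
print(gaps)  # prints [3599, 3599, 3602, 3600]
```

The gaps come out within a few seconds of 3600, so whatever is failing is on an hourly cycle.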
 

matspekkie

macrumors member
Oct 19, 2010
97
0
Have you tried to make a new user and move all the files from the old account to the new and then delete the old account?
 

cg0def

macrumors regular
Feb 9, 2009
141
0
You should check for a hardware failure. Not sure what hardware you're running on, but if it's a Mac mini or anything that does not have ECC RAM, a memory failure can manifest in very, very strange ways. I'm speaking from personal experience. The OS does not always crash, and memory faults often show up looking like software bugs.

Anyway, since you are on SL, you can use AppleJack, but generally you should use Apple Service Diagnostics or the Apple Hardware Test tool. Apple Service Diagnostics is rather hard to get if you don't already have access to it. AHT comes as part of your SL installation (hold the D key before the gray startup screen).
 

DEMinSoCAL

They are running on 2009 Mac Pros. I hear what you're saying, but it sure seems unlikely that it's a hardware problem!
 

StevenMeyer

This is 90% a cert issue. When LDAP servers replicate, they use SSL or TLS authentication. Remove your certs and try again.
 

cg0def

You are absolutely correct that it is not a hardware-related issue, and I jumped the gun a bit there. I just took a better look at the log file that you posted.
Is there any reason why your slapd times out every hour? It would seem like you are having a session timeout problem, which is usually caused by a configuration problem in the TCP stack, more specifically the TCP keepalive parameter.

I have no idea why your firewall goes crazy after a couple of days, but it's probably counting a number of failed attempts before deciding that it's an attack. Anyway, a quick and dirty solution would probably be to adjust the TCP keepalive parameter on the client machine.

Rather than me describing how to do this, here's a very good description:

http://www.gnugk.org/keepalive.html

Take a look under the FreeBSD and MacOS section. You will need to make sure that the TCP keepalive (which on OS X is net.inet.tcp.keepidle + (net.inet.tcp.keepintvl x 8)) is not larger than the allowed connection time period on the server. I think your server is set to max 60 min because you get errors every 61st minute. If you don't want to keep changing the setting on every new computer that you guys get, you might want to relax the server settings a bit.
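
To make that arithmetic concrete, here is a small Python sketch of the keepidle + (keepintvl x 8) sum. It assumes both sysctl values are reported in milliseconds, and the two "default" numbers are assumptions as well, so check the real ones with `sysctl net.inet.tcp.keepidle net.inet.tcp.keepintvl` before trusting the result:

```python
def keepalive_timeout_minutes(keepidle_ms, keepintvl_ms, probes=8):
    """Idle time plus all unanswered probes, converted to minutes."""
    return (keepidle_ms + keepintvl_ms * probes) / 60000

ASSUMED_KEEPIDLE_MS = 7_200_000  # assumed default: 2 hours of idle time
ASSUMED_KEEPINTVL_MS = 75_000    # assumed default: 75 seconds between probes
SERVER_LIMIT_MIN = 60            # the 60-minute server limit guessed at above

total = keepalive_timeout_minutes(ASSUMED_KEEPIDLE_MS, ASSUMED_KEEPINTVL_MS)
verdict = "too long" if total > SERVER_LIMIT_MIN else "ok"
print(f"{total:.0f} min keepalive vs {SERVER_LIMIT_MIN} min limit: {verdict}")
```

With those assumed values the total comes to 130 minutes, which is well over a 60-minute limit. Dropping keepidle to 3,000,000 ms would bring it to exactly 60.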
 

StevenMeyer

+1 +cert issue
 

matspekkie

You probably checked this already, but anyway: the clocks on the servers should be in sync. Anything greater than a 5-minute drift will stop Open Directory from working.
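
As a small illustration of that check, here is a Python sketch that compares two clock readings against a 5-minute tolerance. The timestamps are made up, and how you collect the readings (for example by running `date` on each server at the same moment) is left out:

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(minutes=5)  # the 5-minute limit mentioned above

def clock_skew(master_time, replica_time):
    """Absolute difference between the two clock readings."""
    return abs(master_time - replica_time)

def within_tolerance(master_time, replica_time, max_skew=MAX_SKEW):
    """True if the two clocks differ by no more than max_skew."""
    return clock_skew(master_time, replica_time) <= max_skew

# Made-up readings, taken at the same moment on each server.
master = datetime(2012, 3, 6, 2, 11, 34)
replica = datetime(2012, 3, 6, 2, 11, 37)
print(clock_skew(master, replica), within_tolerance(master, replica))
```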
 

DEMinSoCAL

Just checked, and the time and date on both servers are correct within a few seconds. Thanks, though!

I wasn't sure if slapd was timing out every hour, or if it just only checks every hour in order to replicate. It's interesting, though, that slapd is having a problem, because a month or two ago we had an ongoing problem with slapd ON BOTH SERVERS constantly using 25-50% CPU. Rebooting did not fix it. Then, for some reason, after rebooting the servers one weekend, slapd was no longer stuck on high CPU.

The way I see it now is, the servers DO see each other, as I can make account changes to this one account and the other server picks it up immediately. Now, this account does show up in WM on both servers, but is still not usable. I still cannot use that account to mount a share from either server unless I reboot the replica server.

I don't see this as a workstation issue, as I can replicate the problem completely using only the two servers (connect to share on one server from the other server using this trouble account). No client station is involved.

I can look at the certs being an issue. Whereabouts would I go to look at that, and if I remove them and reinstall them, what are the ramifications of doing so?

Thanks all for your help! :)
 

StevenMeyer

http://support.apple.com/kb/HT4183
If you disable all certs then restart, it may fix it.
Either way, post the log again to see if that changed anything. (Unless it did fix it, in which case I'm a wizard!)
 

DEMinSoCAL

I will look at that KB article. One thing I did a couple of days ago (and, so far, so good) is change the user's password type from Crypt Password to Open Directory. Her account still works! But I do have other user accounts that were created with Crypt Password and have never had a problem with them. Most user accounts look like they were created with the Open Directory password type.

Could the certs still be a problem given these outcomes? When the account was first created it was OD. In troubleshooting this mess, we did change to Crypt, but obviously that didn't fix it. Changing it back to OD seems to have made something work again (and I didn't have to reboot the server to get the account to work).

It's only been a couple of days, but I'm keeping my fingers crossed.
 

StevenMeyer

How deep in this rabbit hole do you want to go? I think the certs are on a per user basis. I can research it if you really want to KNOW.
 

DEMinSoCAL

Well, keep in mind that this not only affects the client PC but also server-to-server, using the account credentials when asked. The article you referenced seemed to indicate server-to-client SSL OD binding, but that may not be the root issue here.

So, not sure if we're looking down the right rabbit hole or not! :)
 