[GE users] Troubleshooting NIS errors (SGE 6.1u3 / Linux)

Joe Landman landman at scalableinformatics.com
Fri Jan 11 16:00:39 GMT 2008



Hi Chris:

   Sorry for the delay ... meetings and travel.

Chris Dagdigian wrote:
> Hi folks,
> 
> I'm stuck troubleshooting a "can't submit jobs" problem that seems to be 
> NIS related and would appreciate some debug tips if anyone has them.
> 
> The system is 6.1u3 running on Linux and the basic summary is:
> 
> - Local user accounts with local homedirs are always successful with 
> qsub and qrsh

Good.

> - Local user accounts with NFS shared homedirs are always successful 
> with qsub and qrsh
> - User accounts found in NIS are *never* successful with qsub or qrsh

Hmmm.  Do they seem to time out?  If you ssh into a machine, does it 
take longer than a few tenths of a second (assuming passwordless ssh 
is set up)?

> - NIS seems happy according to "getent" on Linux

Ok ... sanity checks

1) ypwhich

2) ypcat passwd | grep -i some_user_in_nis

Do both of these return the correct result on both the server and the 
client?
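
A rough sketch of those two checks rolled into one shell function, so 
the same test can be repeated on the NIS server, the qmaster and each 
exec host.  The function name nis_check_user is made up for 
illustration, and it uses an exact match on the login field rather 
than the looser grep -i above:

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or the NIS tools): compare what
# the NIS passwd map and the libc resolver report for a single user.
nis_check_user() {
    user="$1"
    # Which NIS server is this host currently bound to?
    echo "bound to: $(ypwhich 2>&1)"
    # Exact match on the login field of the NIS passwd map
    nis_entry=$(ypcat passwd 2>/dev/null | awk -F: -v u="$user" '$1 == u')
    # What the C library resolver returns (follows /etc/nsswitch.conf)
    nss_entry=$(getent passwd "$user" 2>/dev/null)
    echo "ypcat : ${nis_entry:-<no entry>}"
    echo "getent: ${nss_entry:-<no entry>}"
    # Succeed only if both sources know the user
    [ -n "$nis_entry" ] && [ -n "$nss_entry" ]
}
```

Run it as "nis_check_user some_user_in_nis" on each host.  A host where 
the ypcat line and the getent line disagree is the one to look at first.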

> - SGE utilbin program "uidgid" report success when we run them against 
> NIS resident usernames
> 
> That said though, any time a user configured within NIS tries to use 
> qrsh we see this error in the logs:
>> can't get password entry

Try the ypcat passwd check above.  We have sometimes seen an error 
along the lines of "database permissions too rigid" or something like 
that.

> 
> Similar error occurs with qsub, the job pends forever because:
> 
>> error reason 1: can't get password entry for user "XXX". Either the 
>> user does not exist or NIS error!

The ypcat output is the first thing to look at.  Also, do you have nscd 
running on the clients?  If so, I would suggest turning it off.
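
A minimal sketch for checking nscd on a client.  The function name 
nscd_status is made up, and the init script path in the comments is an 
assumption that varies by distribution:

```shell
#!/bin/sh
# Hypothetical helper: report whether the name service cache daemon
# is running on this host.
nscd_status() {
    if pgrep -x nscd >/dev/null 2>&1; then
        echo "nscd: running"
    else
        echo "nscd: not running"
    fi
}

# To rule the cache out (as root), either flush the cached passwd table:
#   nscd -i passwd
# or stop the daemon through the distribution's init script, e.g.
# (path is an assumption):
#   /etc/init.d/nscd stop
```

If submissions start working once the cache is flushed or the daemon is 
stopped, stale cached entries were at least part of the problem.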

> 
> 
> 
> I've done as much as I can with Linux to confirm that NIS is happy and 
> functioning for user. I've done the same (minimal) work with the SGE 
> utilbin binaries and we can't seem to discover any actual trouble.
> 
> Anyone have any additional hints for working out NIS issues with Grid 
> Engine? Thanks!

strace is your friend.  If you can strace the qsub or qrsh process, 
that would help.
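
Something along these lines, as a sketch only.  The wrapper name 
trace_submit, the TRACE_OUT variable and the default trace path are all 
made up for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper: run a submit command under strace and pull out
# the lines that touch the name-service machinery.
trace_submit() {
    out="${TRACE_OUT:-/tmp/sge_submit.trace}"
    # -f follows child processes, -o writes the trace to a file
    strace -f -o "$out" "$@"
    # Keep only lines mentioning nsswitch, the NSS libraries,
    # the nscd socket, or the NIS binding directory
    grep -E 'nsswitch|libnss|nscd|/var/yp' "$out"
}
```

For example, "trace_submit qrsh hostname" as one of the NIS users.  If 
the client-side trace looks clean, the lookup that fails may be 
happening in whichever daemon logged the error, and attaching to that 
process with strace -p may be worth a try.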

Joe

> 
> 
> Regards,
> Chris
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net


-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615
