[GE users] SGE on latest Mac OS X Server 10.5.4 - help with non-root users

Chris Dagdigian dag at sonsorol.org
Thu Jul 3 22:45:57 BST 2008

Hi folks,

Skip this message if you don't want to be overwhelmed with SGE debug  
output ...

I've got a brand new OS X Apple cluster running the 10.5.4 server  
release that only came out a few days ago.

Right from the beginning I had "can't get password entry for user..."  
errors so I stripped the system down to the bare essentials:

- No open directory / LDAP
- No NFS
- All user accounts local
- All user accounts using UIDs less than 1024

My test account 'dag' is local, and system commands like 'id' and
'finger', and even OS X command-line tools like 'dscl', all resolve
the account info perfectly fine. The system search path is correct as
well, pointing at /Local/Default with no LDAP servers.
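
For anyone who wants to repeat the account check from a script rather than
with 'id' or 'dscl', here is a minimal sketch that queries the passwd
database by UID and by name through Python's standard pwd module. It
assumes the failing SGE lookup ultimately goes through the same C library
getpwnam-style calls, which is an assumption and not something confirmed
in this message. The helper names lookup_user and lookup_uid are
hypothetical.

```python
import os
import pwd


def _fields(entry):
    """Pick out the passwd fields that matter for this comparison."""
    return {"name": entry.pw_name, "uid": entry.pw_uid, "gid": entry.pw_gid}


def lookup_user(name):
    """Return passwd fields for `name`, or None if no entry is found."""
    try:
        return _fields(pwd.getpwnam(name))  # wraps the C library getpwnam()
    except KeyError:
        return None


def lookup_uid(uid):
    """Return passwd fields for `uid`, or None if no entry is found."""
    try:
        return _fields(pwd.getpwuid(uid))  # wraps the C library getpwuid()
    except KeyError:
        return None


if __name__ == "__main__":
    # Resolve the calling user by UID first, then by the name that came back.
    me = lookup_uid(os.getuid())
    print("by uid :", me)
    if me is not None:
        print("by name:", lookup_user(me["name"]))
```

If both lookups succeed for the non-root account here, that only shows the
passwd database answers for this process; it does not by itself explain
why an SGE component running in a different context might fail.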

Even in a single-node, no-NFS, no-LDAP environment I still can't get  
SGE 6.0, 6.1 or 6.2beta2 to function for non-root users.

With courtesy binaries, "qrsh hostname" will hang forever and the  
qmaster logs will simply show the same old "can't get password entry  
for user "dag". Either the user does not exist or NIS error!" error.

If I take the SGE 6.1 source code and patch it according to the blog  
article here:

... then it still does not work, but at least I get the "can't get
password entry" error on STDOUT instead of qrsh hanging.

What is pretty interesting, though, is what happens when I run
"qrsh hostname" with debug mode turned on, using the patched binaries.

It seems that some parts of SGE are able to resolve my username and UID
just fine and other parts (qrsh starter perhaps) are not able to.

Cutting from the verbose output, this is the interesting bit:

>    163   8332 -1602449504     qlogin_starter sent: 1:can't get  
> password entry for user "dag". Either the user does not exist or NIS  
> error!
>    164   8332 -1602449504     ../clients/qsh/qsh.c 890 1: can't get  
> password entry for user "dag". Either the user does not exist or NIS  
> error!
>    165   8332 -1602449504     sge_set_auth_info: username(uid) =  
> dag(511), groupname = staff(20)

So sge_set_auth_info correctly resolves my non-root user and treats it  
as if it exists, yet right above that line is the "you don't exist"  
error message ...
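
To make the same comparison across the whole debug file rather than three
lines, a small filter along these lines could separate the trace entries
that report a failed password lookup from the ones that show a resolved
username and UID. This is only a sketch: the three leading numeric columns
and the two message patterns are assumptions inferred from the lines
quoted above, the input is expected to be unwrapped (one trace entry per
line), and the function name classify is hypothetical.

```python
import re

# Assumed layout: three leading numeric columns, then the message text.
TRACE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(-?\d+)\s+(.*)$")
# Message patterns copied from the quoted trace lines.
FAIL_RE = re.compile(r"can't get\s+password entry for user \"([^\"]+)\"")
OK_RE = re.compile(r"(\w+): username\(uid\) =\s+(\w+)\((\d+)\)")


def classify(lines):
    """Split trace lines into failed lookups and resolved users.

    Returns (failed, resolved): failed holds (seq, user) tuples and
    resolved holds (seq, function, user, uid) tuples.
    """
    failed, resolved = [], []
    for line in lines:
        m = TRACE_RE.match(line)
        if not m:
            continue  # not a trace entry in the assumed layout
        seq, message = int(m.group(1)), m.group(4)
        fail = FAIL_RE.search(message)
        if fail:
            failed.append((seq, fail.group(1)))
            continue
        ok = OK_RE.search(message)
        if ok:
            resolved.append((seq, ok.group(1), ok.group(2), int(ok.group(3))))
    return failed, resolved


if __name__ == "__main__":
    sample = [
        "   163   8332 -1602449504     qlogin_starter sent: 1:can't get "
        "password entry for user \"dag\". Either the user does not exist or "
        "NIS error!",
        "   165   8332 -1602449504     sge_set_auth_info: username(uid) = "
        "dag(511), groupname = staff(20)",
    ]
    failed, resolved = classify(sample)
    print("failed  :", failed)
    print("resolved:", resolved)
```

Listing the two groups side by side should make it easier to see which
components report the user as missing and which ones resolve it.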

I'm going to attach a text file with the full debug output from a
"qrsh hostname" command below. I'm hoping someone will have some
pointers or insights as to how to keep on troubleshooting this ...


    [ Part 2, Text/PLAIN (Name: "sge-error.txt") 338 lines. ]
    [ Unable to print this part. ]

