[GE users] SGE on latest Mac OS X Server 10.5.4 - help with non-root users

Chansup Byun Chansup.Byun at Sun.COM
Tue Jul 8 16:04:06 BST 2008


Hi Chris,

Have you tried to capture packet traces when you got the error?

http://docs.info.apple.com/article.html?artnum=107952

This may reveal what's going on.
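
As a complement to the trace, it may also be worth checking what the plain system passwd lookup returns outside of SGE, once as root and once as the affected user. The following is only a minimal sketch: it assumes Python is available on the node and that the SGE error ultimately comes from a getpwnam()-style lookup failing, which I have not verified in the source. The helper name check_user is made up for illustration.

```python
import pwd
import sys


def check_user(name):
    """Look up `name` in the system passwd database.

    Returns a dict describing what resolved. A missing user is reported
    as None values instead of raising.
    """
    result = {"name": name, "by_name": None, "by_uid": None}
    try:
        entry = pwd.getpwnam(name)  # name -> passwd entry
    except KeyError:
        return result  # the name did not resolve at all
    result["by_name"] = (entry.pw_uid, entry.pw_gid)
    try:
        # Reverse lookup: does the UID map back to a user name?
        result["by_uid"] = pwd.getpwuid(entry.pw_uid).pw_name
    except KeyError:
        pass
    return result


if __name__ == "__main__":
    # Users to check come from the command line; default to root.
    for user in sys.argv[1:] or ["root"]:
        print(check_user(user))
```

Saved under a hypothetical name such as check_user.py, it could be run as "python check_user.py dag sgeadmin". If the lookups succeed here but SGE still reports the error, that would at least suggest the problem is in the context the SGE daemons run in rather than in the account records themselves.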

Regards,

- Chansup


Chris Dagdigian wrote:
> Hi Ian,
>
> Already saw the bug report -- the comments and debug output attached 
> to that issue were added by me ('craff' on sunsource). I was able to 
> reproduce the problem on a clean Mac OS X 10.5.4 Server over the 
> weekend without OpenDirectory/LDAP and without NFS so the problem is 
> strictly with the OS and almost certainly with something that changed 
> in a recent OS X Update. We can't reproduce the issue on our 10.5.4 
> laptops running OS X Client which is a bit strange.
>
> For a while my hypothesis was that accounts created using Workgroup 
> Manager (both in LDAP or local mode) were somehow broken but I was 
> able to create user and sgeadmin accounts using nothing but the 
> command line 'dscl' program and those accounts also cannot be
> resolved by SGE.
>
> The issue is pretty consistent once you can get it to break (I have a 
> different mac mini running OS X client 10.5.4 where SGE works 
> perfectly) and it boils down to this:
>
> - When installed as root, all user jobs fail with the 'can't get
> password entry for user <username>' error
> - When installed with a non-root admin user, all jobs fail with 'admin 
> user <username> does not exist'
>
> The good news is that people far smarter than me are taking a look at 
> it and I've made my server system accessible to a few people who are 
> looking into things.
>
> The bad news is I may have to migrate a new client cluster to Platform 
> LSF as not being able to get SGE to run for more than a week is pretty 
> embarrassing.
>
> -Chris
>
>
> On Jul 8, 2008, at 9:18 AM, Ian Levesque wrote:
>
>> Hi Chris,
>>
>> I posted to the list about this problem recently, you should see the 
>> thread in the archives. I created a bug report on sunsource if you'd 
>> like to add your observations: 
>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2636
>>
>> Cheers,
>> Ian
>>
>>
>> On Jul 3, 2008, at 5:45 PM, Chris Dagdigian wrote:
>>
>>> Hi folks,
>>>
>>> Skip this message if you don't want to be overwhelmed with SGE debug 
>>> output ...
>>>
>>> I've got a brand new OS X Apple cluster running the 10.5.4 server 
>>> release that only came out a few days ago.
>>>
>>> Right from the beginning I had "can't get password entry for 
>>> user..." errors so I stripped the system down to the bare essentials:
>>>
>>> - No open directory / LDAP
>>> - No NFS
>>> - All user accounts local
>>> - All user accounts using UIDs less than 1024
>>>
>>> My test account 'dag' is local and all system commands like 'id', 
>>> 'finger' and even the OS X command line commands like 'dscl' all 
>>> resolve the account info perfectly fine. The system search path is 
>>> correct as well - pointing at /Local/Default and no LDAP servers.
>>>
>>> Even in a single-node, no-NFS, no-LDAP environment I still can't get 
>>> SGE 6.0, 6.1 or 6.2beta2 to function for non-root users.
>>>
>>> With courtesy binaries, "qrsh hostname" will hang forever and the 
>>> qmaster logs will simply show the same old "can't get password entry 
>>> for user "dag". Either the user does not exist or NIS error!" error.
>>>
>>> If I take the SGE 6.1 source code and patch it according to the blog 
>>> article here:
>>> http://gridengine.info/articles/2008/03/03/building-6-1u3-on-mac-osx-10-5-2-leopard-server 
>>>
>>>
>>> ... then it still does not work but at least I get the "can't get
>>> password entry" error coming to STDOUT instead of hanging the qrsh
>>> process.
>>>
>>> What is pretty interesting though is if I run "qrsh hostname" with 
>>> debug mode turned on, using the patched binaries.
>>>
>>> It seems that some parts of SGE are able to resolve my username and
>>> UID just fine and other parts (qrsh starter perhaps) are not able to.
>>>
>>> Cutting from the verbose output, this is the interesting bit:
>>>
>>>>  163   8332 -1602449504     qlogin_starter sent: 1:can't get 
>>>> password entry for user "dag". Either the user does not exist or 
>>>> NIS error!
>>>>  164   8332 -1602449504     ../clients/qsh/qsh.c 890 1: can't get 
>>>> password entry for user "dag". Either the user does not exist or 
>>>> NIS error!
>>>>
>>>>  165   8332 -1602449504     sge_set_auth_info: username(uid) = 
>>>> dag(511), groupname = staff(20)
>>>
>>>
>>> So sge_set_auth_info correctly resolves my non-root user and treats 
>>> it as if it exists, yet right above that line is the "you don't 
>>> exist" error message ...
>>>
>>>
>>> I'm going to attach a text file with the full debug output from a
>>> "qrsh hostname" command below. I'm hoping someone will have some
>>> pointers or insights as to how to keep on troubleshooting this ...
>>>
>>> Regards,
>>> Chris
>>>
>>> <sge-error.txt>
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>
>
>





