[GE users] User does not exist problems on Leopard Was: [GE users] Job does not exist

Jonathan Hunt jjh at 42quarks.com
Wed Sep 3 07:57:16 BST 2008


Hi,

Just to recap: I am trying to set up SGE on Leopard 10.5.4 with NFS
shares and OpenDirectory users. The nodes work if I log in (under user
jhunt) and run
sudo killall sge_execd
sudo $SGE_ROOT/default/common/sgeexecd

As soon as I log out, the nodes fail with errors saying that the jobs
do not exist.

I found a corresponding error in the qmaster log files which I think
helps me understand a bit of what's going on. It says:

09/03/2008 16:34:01|worker|qbi-xgrid-01|W|job 71.1 failed on host
qbi-xgrid-02.qbi.uq.edu.au general before job because: 09/03/2008
16:34:00 [0:64986]: can't get password entry for user "jhunt"
09/03/2008 16:34:01|worker|qbi-xgrid-01|W|rescheduling job 71.1
09/03/2008 16:34:01|worker|qbi-xgrid-01|E|queue all.q marked QERROR as
result of job 71's failure at host qbi-xgrid-02.qbi.uq.edu.au
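
To find every user affected by this error, the qmaster messages file
can be scanned for the same message. The following is a minimal sketch,
assuming each entry sits on a single line in the actual file (the
excerpt above is wrapped by the mail client) and uses the
pipe-separated layout shown: timestamp|thread|host|level|message. The
function names are made up for illustration and are not part of SGE.

```python
import re

# Message text copied from the qmaster log excerpt; captures the user name.
PW_ERROR = re.compile(r'''can't get password entry for user "([^"]+)"''')


def parse_qmaster_line(line):
    """Split one log line into its five pipe-separated fields.

    Assumed layout (taken from the excerpt, not from SGE documentation):
    timestamp|thread|host|level|message
    Returns None if the line has fewer than five fields.
    """
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) < 5:
        return None
    timestamp, thread, host, level, message = parts
    return {
        "timestamp": timestamp,
        "thread": thread,
        "host": host,
        "level": level,
        "message": message,
    }


def users_without_passwd_entry(lines):
    """Return the set of user names named in password-entry errors."""
    users = set()
    for line in lines:
        match = PW_ERROR.search(line)
        if match:
            users.add(match.group(1))
    return users
```

Passing an open messages file to users_without_passwd_entry() returns
the set of user names that the exec hosts could not resolve.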

The user jhunt is an OpenDirectory user. I can ssh into the box as
that user with no problems, so logging out somehow breaks the lookup
of my user's password entry. From Googling, it appears that this
problem was encountered when SGE was first ported to Leopard. Does
anyone know how to fix it now? If anyone knows of binaries posted
online for SGE 6.2 that might work better than mine, please let me
know.
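
One way to narrow this down is to test the password-entry lookup
directly on an exec node, once while logged in and once after logging
out (for example from a job started by cron or launchd). The sketch
below assumes the failing lookup goes through the standard getpwnam()
call, which the error message suggests but which I have not confirmed
in the SGE source. Python's pwd module wraps that call; the helper
names are only illustrative.

```python
import os
import pwd


def lookup_user(name):
    """Return the passwd entry for name, or None if getpwnam() finds nothing."""
    try:
        return pwd.getpwnam(name)
    except KeyError:
        # pwd.getpwnam raises KeyError when no entry can be found.
        return None


def describe_lookup(name):
    """Return a one-line, human-readable result of the lookup."""
    entry = lookup_user(name)
    if entry is None:
        return "no password entry for %r (euid=%d)" % (name, os.geteuid())
    return "%s: uid=%d gid=%d home=%s" % (
        entry.pw_name, entry.pw_uid, entry.pw_gid, entry.pw_dir)
```

Calling print(describe_lookup("jhunt")) in both situations would show
whether the directory lookup itself stops working after logout, or
whether the problem is specific to how the execd was started.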

Any help appreciated.
Jonny


On Tue, Sep 2, 2008 at 7:48 PM, Jonathan Hunt <jjh at 42quarks.com> wrote:
> On Tue, Sep 2, 2008 at 7:44 PM, Ravi Chandra Nallan
> <Ravichandra.Nallan at sun.com> wrote:
>> Jonathan Hunt wrote:
>>>
>>> On Tue, Sep 2, 2008 at 2:17 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>
>>>>
>>>> do you have the spool directory of the nodes local or also on NFS?
>>>>
>>>> -- Reuti
>>>>
>>>
>>> I have tried both local and on NFS and get the same problem.
>>>
>>> Thanks,
>>> Jonny
>>>
>>>
>>
>> Can you check the permissions of the local spool directory on the exec node?
>>
>> --
>> regards,
>> ~Ravi
>
> qbi-xgrid-02:qbi-xgrid-02 jhunt$ pwd
> /sge/default/spool/qbi-xgrid-02
> qbi-xgrid-02:qbi-xgrid-02 jhunt$ ls -le
> total 8
> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 active_jobs
> -rw-r--r--  1 nobody  nobody     6 Sep  1 23:39 execd.pid
> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 job_scripts
> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 jobs
> -rw-r--r--  1 nobody  nobody  1930 Sep  1 23:46 messages
> qbi-xgrid-02:qbi-xgrid-02 jhunt$
>
>
> Thanks for trying to help. Any conclusions much appreciated.
> Jonny
>
> --
> Jonathan J Hunt <jjh at 42quarks.com>
> Homepage: http://www.42quarks.net.nz/wiki/JJH
> (Further contact details there)
> "Physics isn't the most important thing. Love is." Richard Feynman
>


