[GE users] Re: User does not exist problems on Leopard Was: [GE users] Job does not exist

Jonathan Hunt jjh at 42quarks.com
Wed Sep 3 08:12:35 BST 2008



Hi all,

I have just found the answer to my problem here:
http://blog.bioteam.net/2008/07/15/sge-launchd-script-maker-for-apple-os-x-105-leopard/
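
As the URL says, the post is about starting SGE from launchd on Leopard instead of from a login session. For the archive, here is a rough sketch of that idea. It is not copied from the post: the helper name make_sge_execd_plist and the job label are made up, the arch directory (for example darwin-x86, as printed by $SGE_ROOT/util/arch) has to match your install, and I am assuming that setting SGE_ND keeps sge_execd in the foreground so launchd can track it. Depending on the setup, the SGE port variables may need to be added as well.

```shell
# Hypothetical helper (not taken from the linked post): print a launchd
# plist that runs sge_execd at load time.
# Arguments: SGE root, cell name (defaults to "default"), arch directory under bin/.
make_sge_execd_plist() {
  sge_root="$1"
  sge_cell="${2:-default}"
  sge_arch="$3"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>net.sunsource.gridengine.sge_execd</string>
  <key>ProgramArguments</key>
  <array>
    <string>${sge_root}/bin/${sge_arch}/sge_execd</string>
  </array>
  <key>EnvironmentVariables</key>
  <dict>
    <key>SGE_ROOT</key>
    <string>${sge_root}</string>
    <key>SGE_CELL</key>
    <string>${sge_cell}</string>
    <!-- Assumption: SGE_ND asks the daemon not to fork into the background -->
    <key>SGE_ND</key>
    <string>1</string>
  </dict>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
EOF
}
```

Something like "make_sge_execd_plist /sge default darwin-x86 > sge_execd.plist" would write the file, which could then be copied into /Library/LaunchDaemons and loaded with "sudo launchctl load". Treat it as a starting point only; the script maker in the post is the tested route.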

Thanks for everyone's help. I'm happy now; I really needed this cluster!

Cheers,
Jonny

On Wed, Sep 3, 2008 at 4:57 PM, Jonathan Hunt <jjh at 42quarks.com> wrote:
> Hi,
>
> Just to recap. I am trying to set up SGE on Leopard 10.5.4 with NFS
> shares and OpenDirectory users. The nodes work if I log in (under user
> jhunt) and run
> sudo killall sge_execd
> sudo $SGE_ROOT/default/common/sgeexecd
>
> As soon as I log out the nodes fail with errors that the jobs do not exist.
>
> I found a corresponding error in the qmaster log files which I think
> helps me understand a bit of what's going on. It says:
>
> 09/03/2008 16:34:01|worker|qbi-xgrid-01|W|job 71.1 failed on host
> qbi-xgrid-02.qbi.uq.edu.au general before job because: 09/03/2008
> 16:34:00 [0:64986]: can't get password entry for user "jhunt"
> 09/03/2008 16:34:01|worker|qbi-xgrid-01|W|rescheduling job 71.1
> 09/03/2008 16:34:01|worker|qbi-xgrid-01|E|queue all.q marked QERROR as
> result of job 71's failure at host qbi-xgrid-02.qbi.uq.edu.au
>
> The user jhunt is an OpenDirectory user. I can ssh into the box as
> that user with no problems.  So somehow logging out is causing
> problems finding my user password etc. It appears from Googling that
> this problem was encountered when first porting SGE to Leopard. Does
> anyone know how to fix it now? If anyone knows of binaries posted
> online for SGE 6.2 that might work better than mine please let me
> know.
>
> Any help appreciated.
> Jonny
>
>
> On Tue, Sep 2, 2008 at 7:48 PM, Jonathan Hunt <jjh at 42quarks.com> wrote:
>> On Tue, Sep 2, 2008 at 7:44 PM, Ravi Chandra Nallan
>> <Ravichandra.Nallan at sun.com> wrote:
>>> Jonathan Hunt wrote:
>>>>
>>>> On Tue, Sep 2, 2008 at 2:17 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>>>
>>>>> do you have the spool directory of the nodes local or also on NFS?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>
>>>> I have tried both local and on NFS and get the same problem.
>>>>
>>>> Thanks,
>>>> Jonny
>>>>
>>>>
>>>
>>> Can you check the permissions of the local spool directory on the exec node?
>>>
>>> --
>>> regards,
>>> ~Ravi
>>
>> qbi-xgrid-02:qbi-xgrid-02 jhunt$ pwd
>> /sge/default/spool/qbi-xgrid-02
>> qbi-xgrid-02:qbi-xgrid-02 jhunt$ ls -le
>> total 8
>> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 active_jobs
>> -rw-r--r--  1 nobody  nobody     6 Sep  1 23:39 execd.pid
>> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 job_scripts
>> drwxr-xr-x  2 nobody  nobody    68 Sep  1 23:46 jobs
>> -rw-r--r--  1 nobody  nobody  1930 Sep  1 23:46 messages
>> qbi-xgrid-02:qbi-xgrid-02 jhunt$
>>
>>
>> Thanks for trying to help. Any conclusions are much appreciated.
>> Jonny
>>
>> --
>> Jonathan J Hunt <jjh at 42quarks.com>
>> Homepage: http://www.42quarks.net.nz/wiki/JJH
>> (Further contact details there)
>> "Physics isn't the most important thing. Love is." Richard Feynman
>>
>
>
>
>
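
For anyone who lands here with the same "can't get password entry for user" message: it means the execd could not look up the account on that node. A quick way to test the lookup is sketched below. The function name user_resolves is made up for illustration, and the suggestion to run it in the same context as the daemon (for example over ssh after logging out of the console) is my own, not something from the replies above.

```shell
# Succeeds only if the account can be resolved by the system's user lookup.
user_resolves() {
  id -u "$1" >/dev/null 2>&1
}

# Report the result for the user named in the qmaster log.
if user_resolves jhunt; then
  echo "jhunt resolves"
else
  echo "jhunt does NOT resolve"
fi
```

If the lookup works while someone is logged in but fails after logout, that matches the behaviour described in the quoted message.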



