[GE users] Help reg SGE 6.0 Globus 3.2 integration

Shuja Parvez shshgs01 at fht-esslingen.de
Wed Sep 29 10:40:52 BST 2004


Yes, I applied that fix during installation.
I really don't know what is going wrong, because normal jobs on both nodes
run just fine. It is only when I submit through the Globus job manager that
the job goes into an error state.
Regards,
Shuja

> Hmmm... I'm fresh out of clever ideas.  I see that you're on a 2.6
> kernel.  Are you using the symbolic link fix?
>
> Daniel
>
> Shuja Parvez wrote:
>
>>Yes, I can. I tried them, and they all worked. :(
>>
>>
>>
>>>If you log in as sgeadmin on node2 and source the settings file, can you
>>>execute the following successfully?
>>>
>>>% mkdir $SGE_ROOT/default/spool/node2/active_jobs/mytest
>>>% touch $SGE_ROOT/default/spool/node2/active_jobs/mytest/test
>>>% ls -R $SGE_ROOT/default/spool/node2/active_jobs/mytest
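The three commands above can be wrapped into a small reusable check. This is only a sketch: the function name check_spool_writable and the scratch directory name are made up for illustration and are not part of SGE. It should be run as the user that sge_execd runs as.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): verify that the current user can
# create a directory and a file below the given spool directory.
check_spool_writable() {
    spool_dir=$1
    test_dir="$spool_dir/mytest.$$"      # $$ keeps the scratch name unique

    # Same steps as the manual test: mkdir, touch, ls -R.
    mkdir "$test_dir" || return 1
    touch "$test_dir/test" || { rmdir "$test_dir"; return 1; }
    ls -R "$test_dir"

    # Clean up so that no stray entry is left next to real job directories.
    rm -rf "$test_dir"
}
```

For the setup in this thread, the call would be `check_spool_writable "$SGE_ROOT/default/spool/node2/active_jobs"` after sourcing the settings file. A non-zero exit status means the directory or file could not be created.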
>>>
>>>Daniel
>>>
>>>Shuja Parvez wrote:
>>>
>>>
>>>
>>>>Hello.
>>>>Thank you for the reply.
>>>>
>>>>
>>>>
>>>>
>>>>>I helped someone with the exact same symptoms last week.  His problem
>>>>>was that the sge_execd was not running as the right user.  Did you
>>>>>check that node2's execd is run as root or as the $SGE_ROOT directory
>>>>>owner?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>sge_execd runs as user sgeadmin:
>>>>sgeadmin  4768     1  0 15:48 ?      00:00:00 /home/sgeadmin/bin/lx26-x86/sge_execd
>>>>
>>>>$SGE_ROOT is /home/sgeadmin, which is NFS-mounted from node1.
>>>>On node2 it looks like this:
>>>>drwxrwxrwx  20   1003 globus 4096 2004-09-28 16:33 sgeadmin
>>>>
>>>>I have the user sgeadmin on both machines, and sgeadmin belongs to the
>>>>group sgeadmin. But when I mount it over NFS, the group is shown as
>>>>globus, which also confuses me.
>>>>
>>>>This is how /home/sgeadmin looks on node1
>>>>drwxrwxrwx  20 sgeadmin sgeadmin 4096 2004-09-28 16:33 sgeadmin
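The two listings show different owner and group names for the same directory, so it may help to compare numeric IDs instead of names on each node. The sketch below is an illustration only: owner_matches is a hypothetical helper, not an SGE or NFS tool, and it assumes that `ls -ldn` prints the numeric uid and gid in the third and fourth columns, as it does on Linux.

```shell
#!/bin/sh
# Hypothetical helper: compare the numeric owner and group of a directory
# with an expected uid and gid.
# Returns 0 on match, 1 on mismatch, 2 if the directory cannot be listed.
owner_matches() {
    dir=$1
    want_uid=$2
    want_gid=$3

    # ls -n prints numeric IDs instead of names; -d lists the directory itself.
    have=$(ls -ldn "$dir" | awk '{print $3 ":" $4}')
    [ -n "$have" ] || return 2

    echo "$dir is owned by $have, expected $want_uid:$want_gid"
    [ "$have" = "$want_uid:$want_gid" ]
}
```

Run on node2 as `owner_matches "$SGE_ROOT" "$(id -u sgeadmin)" "$(id -g sgeadmin)"`. A mismatch would be consistent with the sgeadmin account having different numeric IDs on node1 and node2, which is one possible reading of the listings above and not a confirmed diagnosis.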
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Another thought would be to check that the SGE user on node2 has
>>>>>permission to write to the $SGE_ROOT over NFS.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>It does. I confirmed that.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>If SGE is running as root, this can be an issue, since root turns into
>>>>>nobody when it crosses NFS boundaries.
>>>>>
>>>>>Daniel
>>>>>
>>>>>Shuja Parvez wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>Hi,
>>>>>>I have 2 nodes:
>>>>>>node1: SGE master, which is also the Globus gatekeeper and an
>>>>>>execution host.
>>>>>>node2: execution host.
>>>>>>I installed SGE successfully and jobs run on the queues on both
>>>>>>machines. But when I submit a job from Globus to Sun Grid Engine, the
>>>>>>job goes into the error state and I get the following messages in the
>>>>>>spool directory:
>>>>>>===8X message on node2===
>>>>>>09/28/2004 15:57:23|execd|node2|E|shepherd of job 172.1 exited with exit status = 26
>>>>>>09/28/2004 15:57:23|execd|node2|E|can't open usage file "active_jobs/172.1/usage" for job 172.1: No such file or directory
>>>>>>09/28/2004 15:57:23|execd|node2|E|"can't read usage file for job 172.1
>>>>>>===8X ===
>>>>>>===8X message on qmaster ===
>>>>>>09/28/2004 16:01:44|qmaster|node1|W|job 172.1 failed on host node2.cfd1.honda-ri.de general opening input/output file because: can't read usage file for job 172.1
>>>>>>09/28/2004 16:01:44|qmaster|node1|W|rescheduling job 172.1
>>>>>>===8X ===
>>>>>>The jobs always go into the error state, and when I clear the error
>>>>>>through qmon, the jobs are rescheduled on node1 and then continue.
>>>>>>
>>>>>>Could anyone please help me out with this?
>>>>>>Regards
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>--
>>>>>*******************************************************
>>>>>*          Daniel Templeton   ERGB01 x60220           *
>>>>>*         Staff Engineer, Sun N1 Grid Engine          *
>>>>>*******************************************************
>>>>>*    "Camera one closes in, the soundtrack starts,    *
>>>>>*     The scene begins.  You're playing you now."     *
>>>>>*                -Josh Joplin Group, "Camera One"     *
>>>>>*******************************************************
>>>>>
>>>>>
>>>>>
>>>>>---------------------------------------------------------------------
>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>
>>
>>
>>
>


-- 
Shuja Parvez
Msc IT and Automation Systems (2003-2005),
FH Esslingen.
Residence Address: AschaffenburgerStrasse 120,
D-63073, Offenbach am Main, Germany
Email: shshgs01 at fht-esslingen.de
Handy: +49 176 700 395 00
