[GE users] Help reg SGE 6.0 Globus 3.2 integration

Shuja Parvez shshgs01 at fht-esslingen.de
Tue Sep 28 15:46:41 BST 2004


Yes, I can. I tried them and they all worked. :(
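
For repeat runs, the same mkdir/touch/ls check quoted below can be wrapped in a small function. This is only a sketch: `check_spool_write` is a made-up helper name, not an SGE command, and it assumes a POSIX shell. Unlike the manual commands, it removes the probe directory again afterwards.

```shell
#!/bin/sh
# Sketch: repeat the mkdir/touch/ls write check for a given directory.
# check_spool_write is a hypothetical helper name, not part of SGE.
check_spool_write() {
    spool_dir=$1
    probe="$spool_dir/mytest.$$"    # $$ keeps the probe name unique per shell

    # Fail early if the directory cannot be created
    mkdir "$probe" || return 1

    # Fail (and clean up) if a file cannot be created inside it
    touch "$probe/test" || { rmdir "$probe"; return 1; }

    # Show what was created, then remove the probe again
    ls -R "$probe"
    rm -rf "$probe"
}

# Example (path taken from the quoted commands; run it as the user that
# sge_execd runs as, after sourcing the settings file):
# check_spool_write "$SGE_ROOT/default/spool/node2/active_jobs"
```

A zero exit status only shows that this particular user can write there from an interactive login; it says nothing about the user a Globus-submitted job runs as.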

> If you log in as sgeadmin on node2 and source the settings file, can you
> execute the following successfully?
>
> % mkdir $SGE_ROOT/default/spool/node2/active_jobs/mytest
> % touch $SGE_ROOT/default/spool/node2/active_jobs/mytest/test
> % ls -R $SGE_ROOT/default/spool/node2/active_jobs/mytest
>
> Daniel
>
> Shuja Parvez wrote:
>
>>Hello.
>>Thank you for the reply.
>>
>>
>>>I helped someone with the exact same symptoms last week.  His problem
>>>was that the sge_execd was not running as the right user.  Did you check
>>>that node2's execd is run as root or as the $SGE_ROOT directory owner?
>>>
>>>
>>sge_execd runs as user sgeadmin
>>sgeadmin  4768     1  0 15:48 ?      00:00:00 /home/sgeadmin/bin/lx26-x86/sge_execd
>>
>>$SGE_ROOT is /home/sgeadmin, which is NFS-mounted from node1.
>>On node2 it looks like this:
>>drwxrwxrwx  20   1003 globus 4096 2004-09-28 16:33 sgeadmin
>>
>>I have the user sgeadmin on both machines, and sgeadmin belongs to the
>>group sgeadmin. But over NFS the group is shown as globus, which
>>confuses me as well.
>>
>>This is how /home/sgeadmin looks on node1:
>>drwxrwxrwx  20 sgeadmin sgeadmin 4096 2004-09-28 16:33 sgeadmin
>>
>>
>>
>>>Another thought would be to check that the SGE user on node2 has
>>>permission to write to the $SGE_ROOT over NFS.
>>>
>>>
>>It does. I confirmed that.
>>
>>
>>
>>>If SGE is running as
>>>root, this can be an issue since root turns into nobody when it crosses
>>>NFS boundaries.
>>>
>>>Daniel
>>>
>>>Shuja Parvez wrote:
>>>
>>>
>>>
>>>>Hi
>>>>I have 2 nodes:
>>>>node1: SGE master, which is also the Globus gatekeeper and an
>>>>execution host.
>>>>node2: execution host.
>>>>I installed SGE successfully and jobs run on queues on both machines.
>>>>But when I submit a job from Globus to Sun Grid Engine, the job goes
>>>>into the error state and I have the following messages in the spooler:
>>>>===8X message on node2===
>>>>09/28/2004 15:57:23|execd|node2|E|shepherd of job 172.1 exited with exit status = 26
>>>>09/28/2004 15:57:23|execd|node2|E|can't open usage file "active_jobs/172.1/usage" for job 172.1: No such file or directory
>>>>09/28/2004 15:57:23|execd|node2|E|"can't read usage file for job 172.1
>>>>===8X ===
>>>>===8X message on qmaster ===
>>>>09/28/2004 16:01:44|qmaster|node1|W|job 172.1 failed on host node2.cfd1.honda-ri.de general opening input/output file because: can't read usage file for job 172.1
>>>>09/28/2004 16:01:44|qmaster|node1|W|rescheduling job 172.1
>>>>===8X ===
>>>>The jobs always go into the error state. When I clear the error
>>>>through qmon, the jobs are rescheduled on node1 and then run normally.
>>>>
>>>>Could anyone please help me with this?
>>>>Regards
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>--
>>>*******************************************************
>>>*          Daniel Templeton   ERGB01 x60220           *
>>>*         Staff Engineer, Sun N1 Grid Engine          *
>>>*******************************************************
>>>*    "Camera one closes in, the soundtrack starts,    *
>>>*     The scene begins.  You're playing you now."     *
>>>*                -Josh Joplin Group, "Camera One"     *
>>>*******************************************************
>>>
>>>
>>>
>>>---------------------------------------------------------------------
>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>>
>>
>>
>>
>>
>
>

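One thing the two directory listings quoted above suggest checking: on node2 the owner of /home/sgeadmin is shown as the bare number 1003 and the group as globus, while on node1 both are sgeadmin. `ls` prints a number when no local account has that UID, and classic NFS (v2/v3) passes numeric IDs rather than names, so the sgeadmin account may have different numeric IDs on the two nodes. The sketch below prints the numeric owner and group of a directory so they can be compared with the output of `id sgeadmin` on each node. `owner_ids` is a made-up helper name, and a POSIX shell with awk is assumed.

```shell
#!/bin/sh
# Sketch: print the numeric owner and group of a directory as "uid:gid".
# owner_ids is a hypothetical helper name.
owner_ids() {
    # -d lists the directory itself, -n keeps the IDs numeric;
    # fields 3 and 4 of the long listing are owner and group
    ls -ldn "$1" | awk '{ print $3 ":" $4 }'
}

# Example, to be run on node1 and on node2 and compared:
# owner_ids /home/sgeadmin
# id sgeadmin
```

If the numbers reported by `id sgeadmin` differ between node1 and node2, files that are not world-writable may behave differently for the execd on node2 than the rwxrwxrwx top-level directory suggests.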

-- 
Shuja Parvez
Msc IT and Automation Systems (2003-2005),
FH Esslingen.
Residence Address: AschaffenburgerStrasse 120,
D-63073, Offenbach am Main, Germany
Email: shshgs01 at fht-esslingen.de
Handy: +49 176 700 395 00
