[GE users] Help reg SGE 6.0 Globus 3.2 integration

Daniel Templeton Dan.Templeton at Sun.COM
Wed Sep 29 10:15:52 BST 2004


Hmmm... I'm fresh out of clever ideas.  I see that you're on a 2.6 
kernel.  Are you using the symbolic link fix?
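
One more thing that might be worth ruling out, going by the listing you
sent: on node2 the NFS-mounted directory shows its owner as the bare
number 1003 and its group as globus, which usually means that sgeadmin's
UID and GID are not the same on the two machines.  Below is a rough
sketch of a check you could run on each node.  The name check_owner is
just something I made up for illustration, and it assumes GNU stat (as
found on Linux).

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): report whether the numeric owner
# of a directory matches the numeric UID of a named local user.
check_owner() {
    dir=$1
    user=$2
    # Numeric UID that owns the directory (GNU stat syntax; an assumption).
    dir_uid=$(stat -c %u "$dir") || return 2
    # Numeric UID of the named user on this host.
    user_uid=$(id -u "$user") || return 2
    if [ "$dir_uid" = "$user_uid" ]; then
        echo "match: $dir is owned by UID $dir_uid ($user)"
    else
        echo "MISMATCH: $dir is owned by UID $dir_uid, but $user is UID $user_uid here"
        return 1
    fi
}
```

On each node, after sourcing the settings file, you would call it as
check_owner "$SGE_ROOT" sgeadmin.  If node2 reports a mismatch, lining up
the UID and GID of sgeadmin on both hosts would be the thing to try,
though I can't promise that this is what is behind the missing usage
file.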

Daniel

Shuja Parvez wrote:

>Yes, I can. I tried them. All worked. :(
>
>>If you log in as sgeadmin on node2 and source the settings file, can you
>>execute the following successfully?
>>
>>% mkdir $SGE_ROOT/default/spool/node2/active_jobs/mytest
>>% touch $SGE_ROOT/default/spool/node2/active_jobs/mytest/test
>>% ls -R $SGE_ROOT/default/spool/node2/active_jobs/mytest
>>
>>Daniel
>>
>>Shuja Parvez wrote:
>>
>>>Hello.
>>>Thank you for the reply.
>>>
>>>>I helped someone with the exact same symptoms last week.  His problem
>>>>was that the sge_execd was not running as the right user.  Did you check
>>>>that node2's execd is run as root or as the $SGE_ROOT directory owner?
>>>>
>>>sge_execd runs as user sgeadmin
>>>sgeadmin  4768     1  0 15:48 ?      00:00:00 /home/sgeadmin/bin/lx26-x86/sge_execd
>>>
>>>And $SGE_ROOT=/home/sgeadmin, which is NFS-mounted from node1.
>>>On node2 it looks as follows:
>>>drwxrwxrwx  20   1003 globus 4096 2004-09-28 16:33 sgeadmin
>>>
>>>Now, I have user sgeadmin on both machines, and sgeadmin belongs to the
>>>group sgeadmin. But when I mount it over NFS, the group is shown as
>>>globus, which also confuses me.
>>>
>>>This is how /home/sgeadmin looks on node1
>>>drwxrwxrwx  20 sgeadmin sgeadmin 4096 2004-09-28 16:33 sgeadmin
>>>
>>>>Another thought would be to check that the SGE user on node2 has
>>>>permission to write to the $SGE_ROOT over NFS.
>>>>
>>>It does. I confirmed that.
>>>
>>>>If SGE is running as
>>>>root, this can be an issue since root turns into nobody when it crosses
>>>>NFS boundaries.
>>>>
>>>>Daniel
>>>>
>>>>Shuja Parvez wrote:
>>>>
>>>>>Hi
>>>>>I have 2 nodes:
>>>>>node1. SGE master, which is also the Globus gatekeeper and an
>>>>>execution host.
>>>>>node2. Execution host.
>>>>>I installed SGE successfully and the jobs run on queues on both
>>>>>machines.
>>>>>But when I submit a job from Globus to the Sun Grid Engine, the job
>>>>>goes into the error state and I have the following message in the
>>>>>spooler:
>>>>>===8X message on node2===
>>>>>09/28/2004 15:57:23|execd|node2|E|shepherd of job 172.1 exited with exit status = 26
>>>>>09/28/2004 15:57:23|execd|node2|E|can't open usage file "active_jobs/172.1/usage" for job 172.1: No such file or directory
>>>>>09/28/2004 15:57:23|execd|node2|E|"can't read usage file for job 172.1
>>>>>===8X ===
>>>>>===8X message on qmaster ===
>>>>>09/28/2004 16:01:44|qmaster|node1|W|job 172.1 failed on host node2.cfd1.honda-ri.de general opening input/output file because: can't read usage file for job 172.1
>>>>>09/28/2004 16:01:44|qmaster|node1|W|rescheduling job 172.1
>>>>>===8X ===
>>>>>The jobs always go into the error state, and when I clear the error
>>>>>through qmon, the jobs are rescheduled on node1 and then continue.
>>>>>
>>>>>Could anyone please help me out with this?
>>>>>Regards
>>>>>
>>>>--
>>>>*******************************************************
>>>>*          Daniel Templeton   ERGB01 x60220           *
>>>>*         Staff Engineer, Sun N1 Grid Engine          *
>>>>*******************************************************
>>>>*    "Camera one closes in, the soundtrack starts,    *
>>>>*     The scene begins.  You're playing you now."     *
>>>>*                -Josh Joplin Group, "Camera One"     *
>>>>*******************************************************
>>>>
>>>>
>>>>
>>>>---------------------------------------------------------------------
>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
*******************************************************
*          Daniel Templeton   ERGB01 x60220           *
*         Staff Engineer, Sun N1 Grid Engine          *
*******************************************************
*    "Camera one closes in, the soundtrack starts,    *
*     The scene begins.  You're playing you now."     *
*                -Josh Joplin Group, "Camera One"     *
*******************************************************


