[GE users] Help reg SGE 6.0 Globus 3.2 integration

Daniel Templeton Dan.Templeton at Sun.COM
Tue Sep 28 15:43:55 BST 2004


If you log in as sgeadmin on node2 and source the settings file, can you 
execute the following successfully?

% mkdir $SGE_ROOT/default/spool/node2/active_jobs/mytest
% touch $SGE_ROOT/default/spool/node2/active_jobs/mytest/test
% ls -R $SGE_ROOT/default/spool/node2/active_jobs/mytest
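
The same three steps can be wrapped in a small function that cleans up after itself. This is only a sketch: check_spool_write is a hypothetical helper, not part of Grid Engine, and it assumes a POSIX shell and the spool path from the commands above.

```shell
# Hypothetical helper (not shipped with Grid Engine): try to create a
# directory and a file under the given parent, list them, then clean up.
# Returns 0 if both the mkdir and the touch succeeded, 1 otherwise.
check_spool_write() {
    parent="$1"
    testdir="$parent/mytest.$$"        # $$ keeps the name unique per shell
    mkdir "$testdir" || return 1
    if touch "$testdir/test"; then
        ls -ln "$testdir"              # -n prints numeric UID/GID
        status=0
    else
        status=1
    fi
    rm -rf "$testdir"
    return "$status"
}

# Example call, run as sgeadmin on node2 after sourcing the settings file:
# check_spool_write "$SGE_ROOT/default/spool/node2/active_jobs" && echo writable
```

The numeric listing is deliberate: it shows which UID and GID the new file actually gets on the NFS mount, regardless of how names are mapped on the local host.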

Daniel

Shuja Parvez wrote:

>Hello.
>Thank you for the reply.
>  
>
>>I helped someone with the exact same symptoms last week.  His problem
>>was that the sge_execd was not running as the right user.  Did you check
>>that node2's execd is run as root or as the $SGE_ROOT directory owner?
>>    
>>
>sge_execd runs as user sgeadmin:
>sgeadmin  4768     1  0 15:48 ?      00:00:00 /home/sgeadmin/bin/lx26-x86/sge_execd
>
>$SGE_ROOT is /home/sgeadmin, which is NFS-mounted from node1.
>On node2 it looks like this:
>drwxrwxrwx  20   1003 globus 4096 2004-09-28 16:33 sgeadmin
>
>I have user sgeadmin on both machines, and sgeadmin belongs to the
>group sgeadmin. But over NFS the group is shown as globus, which
>also confuses me.
>
>This is how /home/sgeadmin looks on node1:
>drwxrwxrwx  20 sgeadmin sgeadmin 4096 2004-09-28 16:33 sgeadmin
>
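
One possible reading of the two listings above: ls prints a bare number such as 1003 when the local host has no account with that UID, so sgeadmin may have different numeric IDs on node1 and node2. A minimal sketch for checking this follows; owner_uid and owner_is_me are hypothetical helper names, and the path in the example call is the one quoted above.

```shell
# Hypothetical helpers: compare the numeric owner of a directory with the
# numeric UID of the user running the check.
owner_uid() {
    # ls -n prints numeric UID/GID; field 3 of the long listing is the owner
    ls -ldn "$1" | awk '{print $3}'
}

owner_is_me() {
    [ "$(owner_uid "$1")" = "$(id -u)" ]
}

# Example call, run as sgeadmin on each node:
# id sgeadmin
# owner_is_me /home/sgeadmin && echo "UID matches" || echo "UID differs"
```

If the numbers printed by id differ between the two nodes, files owned by sgeadmin on node1 would belong to a different user when seen from node2.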
>  
>
>>Another thought would be to check that the SGE user on node2 has
>>permission to write to the $SGE_ROOT over NFS.
>>    
>>
>It does. I confirmed that.
>
>  
>
>>If SGE is running as
>>root, this can be an issue since root turns into nobody when it crosses
>>NFS boundaries.
>>
>>Daniel
>>
>>Shuja Parvez wrote:
>>
>>    
>>
>>>Hi,
>>>I have 2 nodes:
>>>node1. SGE master, which is also the Globus gatekeeper and an
>>>execution host.
>>>node2. Execution host.
>>>I installed SGE successfully, and jobs run on the queues on both machines.
>>>But when I submit a job from Globus to Sun Grid Engine, the job goes
>>>into the error state and I see the following messages in the spool:
>>>===8X message on node2===
>>>09/28/2004 15:57:23|execd|node2|E|shepherd of job 172.1 exited with exit status = 26
>>>09/28/2004 15:57:23|execd|node2|E|can't open usage file "active_jobs/172.1/usage" for job 172.1: No such file or directory
>>>09/28/2004 15:57:23|execd|node2|E|"can't read usage file for job 172.1
>>>===8X ===
>>>===8X message on qmaster ===
>>>09/28/2004 16:01:44|qmaster|node1|W|job 172.1 failed on host node2.cfd1.honda-ri.de general opening input/output file because: can't read usage file for job 172.1
>>>09/28/2004 16:01:44|qmaster|node1|W|rescheduling job 172.1
>>>===8X ===
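
Judging only from the lines quoted above, each message is a single line with fields separated by "|": timestamp, daemon, host, level, text. Under that assumption, error-level entries can be pulled out with a short filter. sge_errors is a hypothetical name, and the messages file location in the example call is an assumption based on the spool path used in this thread.

```shell
# Hypothetical filter: print only lines whose fourth "|"-separated field
# is "E" (error). Reads the named files, or stdin if none are given.
sge_errors() {
    awk -F'|' '$4 == "E"' "$@"
}

# Example call (the file location is an assumption):
# sge_errors "$SGE_ROOT/default/spool/node2/messages"
```

Warning lines (level W), such as the qmaster rescheduling notice, are skipped by this filter.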
>>>The jobs always go into the error state, and when I clear the error
>>>through qmon, the jobs are rescheduled on node1 and then continue.
>>>
>>>Could anyone please help me with this?
>>>Regards
>>--
>>*******************************************************
>>*          Daniel Templeton   ERGB01 x60220           *
>>*         Staff Engineer, Sun N1 Grid Engine          *
>>*******************************************************
>>*    "Camera one closes in, the soundtrack starts,    *
>>*     The scene begins.  You're playing you now."     *
>>*                -Josh Joplin Group, "Camera One"     *
>>*******************************************************
>>
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net
