[GE issues] 6_2U3 issue/question

reuti reuti at staff.uni-marburg.de
Mon Jan 18 13:04:00 GMT 2010



Hi,

On 15.01.2010 at 15:32, gtatachar wrote:

> Hi,
>
> I am running 6_2U3 grid engine and occasionally see the following  
> error:
>
> Shepherd error:
> 01/13/2010 19:22:09 [3012:13075]: error: can't open output file "/home/qeoptuat/grid/20100113_191437_589963000_EST_nyqspla126v.ny.gsam.gs.com_932/10.out.txt": No such file or directory
>
> The messages file in the spool directory contains the following entries:
>
> 01/13/2010 19:22:10|  main|nyqspla116v|E|shepherd of job 35971.10 exited with exit status = 26
> 01/13/2010 19:22:10|  main|nyqspla116v|E|can't open usage file "active_jobs/35971.10/usage" for job 35971.10: No such file or directory
> 01/13/2010 19:22:10|  main|nyqspla116v|E|01/13/2010 19:22:09 [3012:13075]: error: can't open output file "/home/qeoptuat/grid/20100113_191437_589963000_EST_nyqspla126v.ny.gsam.gs.com_932/10.out.txt": No such file or directory
>
> Can anyone shed more light on exit status 26? Is it somehow
> related to NFS? (I am using NFS spooling.)

You mean SGE's spool directory is in /usr/sge or similar? It's best to
have it local on each node, in a location like /var/spool/sge.
Otherwise the job is first sent from the qmaster to a node over SGE's
own protocol and then stored by the execd in the spool directory,
which means transferring it to the file server. If the qmaster runs
on the same machine as the file server, the data is transferred back
and forth.

http://gridengine.sunsource.net/howto/nfsreduce.html
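To check whether a spool directory actually sits on NFS, one option on Linux is to find the longest mount point in /proc/mounts that contains the path and look at its filesystem type. The following is a minimal Python sketch, assuming the Linux /proc/mounts layout (device, mount point, filesystem type, ...); the function names are hypothetical and it does not resolve symlinks, so pass it a real path.

```python
import os


def parse_mounts(mounts_text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Field 2 is the mount point, field 3 the filesystem type.
        # /proc/mounts escapes spaces as \040; decode that common case.
        mount_point = fields[1].replace("\\040", " ")
        entries.append((mount_point, fields[2]))
    return entries


def fs_type_for(path, mounts_text):
    """Return the fs type of the longest mount point containing path, or None."""
    path = os.path.normpath(path)  # lexical only: symlinks are NOT resolved
    best, best_type = "", None
    for mount_point, fs_type in parse_mounts(mounts_text):
        mp = os.path.normpath(mount_point)
        # Match whole path components so /homework is not treated as under /home.
        inside = path == mp or path.startswith(mp.rstrip("/") + "/")
        if inside and len(mp) > len(best):
            best, best_type = mp, fs_type
    return best_type


def is_on_nfs(path, mounts_text):
    """True if path lies on an nfs/nfs4 mount according to mounts_text."""
    fs = fs_type_for(path, mounts_text)
    return fs is not None and fs.startswith("nfs")
```

For example, on an execution host one could read /proc/mounts and call is_on_nfs(os.path.realpath(spool), text), where spool is the execd_spool_dir value shown by qconf -sconf for that host.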

-- Reuti


> Two more things to note:
> 1.	The error seems to be caused by a long delay between when a
> child job finishes on the grid and when the parent job is able to
> locate the child job's output file on NFS.
> 2.	We never experienced this on our N1GE 6 grid using Berkeley DB.
>
> Any information is greatly appreciated.
>
> Thanks
> Gopi
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=238980
>
> To unsubscribe from this discussion, e-mail: [issues-unsubscribe at gridengine.sunsource.net].
>
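Regarding point 1 in the quoted message: if the parent job has to cope with NFS clients that see a file created on another host only after a delay, one workaround is to poll for the child's output file with a timeout instead of failing on the first lookup. This is a hedged Python sketch, not Grid Engine functionality; the function name and its default timeout and interval values are hypothetical. How long the delay lasts depends on the client's NFS attribute and lookup caching (on Linux, mount options such as actimeo and lookupcache).

```python
import os
import time


def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll until path exists or timeout seconds elapse.

    Returns True if the file became visible, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Each call issues a new stat(); the NFS client may still answer from
        # its attribute/lookup cache until that cache entry expires.
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Polling only hides the symptom; if the delay is long, shortening the cache timeouts on the affected mount is the more direct fix.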

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=239518

To unsubscribe from this discussion, e-mail: [issues-unsubscribe at gridengine.sunsource.net].


