[GE users] Machines constantly in Error state, and won't stay cleared...

templedf dan.templeton at sun.com
Tue Jan 19 20:22:39 GMT 2010


Look in the "messages" file in the execd's spool directory. It's 
probably located at $SGE_ROOT/$SGE_CELL/spool/air/messages. If it's not 
there, check the execd_spool_dir setting in the output of 
qconf -sconf air (host-specific configuration) or qconf -sconf (global 
configuration) to find the location of the spool directory.
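
As a rough sketch of that lookup, the helper functions below (hypothetical names, not part of SGE) read execd_spool_dir from the qconf output and build the path to the messages file. It assumes the spool subdirectory is named after the short hostname ("air" here) and falls back to the default location when qconf returns nothing:

```shell
#!/bin/sh
# Extract the execd_spool_dir value from `qconf -sconf` output on stdin.
spool_dir_from_conf() {
    awk '$1 == "execd_spool_dir" { print $2; exit }'
}

# Print the expected path of the execd messages file for a host.
# Hypothetical helper; assumes the spool subdirectory is the short hostname.
execd_messages_path() {
    host=$1
    # Host-specific configuration first, then the global configuration.
    dir=$(qconf -sconf "$host" 2>/dev/null | spool_dir_from_conf)
    [ -n "$dir" ] || dir=$(qconf -sconf 2>/dev/null | spool_dir_from_conf)
    # Fall back to the default location if nothing was found.
    [ -n "$dir" ] || dir="$SGE_ROOT/$SGE_CELL/spool"
    printf '%s/%s/messages\n' "$dir" "$host"
}
```

On the execution host, something like tail -n 50 "$(execd_messages_path air)" should then show the most recent execd log entries, including the reason the jobs failed.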

Daniel

sgenedharvey wrote:
> I can't figure out what's causing this problem. I installed execution 
> host on 4 machines, precisely the same, by copying & pasting the 
> commands precisely the same on each machine. Two of the machines are 
> working flawlessly, and two of the machines keep going into Error 
> state. Even if I uninstall/reinstall execution host daemon, completely 
> removing all SGE files from the systems ... Even after clearing the 
> Error state from the queues ... The first time the machine is 
> scheduled to run a job, the job fails, and the machine returns to 
> Error state.
>
>
> If I check the reason for Error state:
> [eharvey@air gridout]$ qstat -explain E
> queuename qtype resv/used/tot. load_avg arch states
> ---------------------------------------------------------------------------------
> camb.q@air.lyricsemi.hdq BIP 0/0/2 2.11 lx24-amd64 E
> queue camb.q marked QERROR as result of job 367's failure at host 
> air.lyricsemi.hdq
> queue camb.q marked QERROR as result of job 369's failure at host 
> air.lyricsemi.hdq
>
>
> So, obviously, I want to know why those 2 jobs failed .... But can't 
> seem to find any record anywhere...
>
>
> If I check the man page, it says "Please check the error logfile of 
> that sge_execd"
> But I can't find any logfile ... Can anybody tell me where to find the 
> logfile? Or any other method to figure out why these machines keep 
> going into error state?
>
>
> I am running SGE 6.2u4
>
> Thanks....

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239790
