[GE users] Job puts entire cluster into Error state over misplaced pid file? Help!

Bevan C Bennett bevan at fulcrummicro.com
Sat Sep 8 23:58:04 BST 2007



Beadles, Jeff wrote:
> Have you tried looking in the messages file in spool directory for the 
> execution host? It should have the reason for why the system was put 
> into an error state.

Yes, that's the log I included. It shows that the exec host was looking 
for the jid file in the TEMP directory instead of in the regular 
location, and then failed because the file wasn't there.

I need to know -why- some jobs, seemingly at random, look in the wrong 
place for their jid file, and why SGE lets these jobs send EVERY SINGLE 
QUEUE into an unnecessary error state, one after another.




