[GE users] FW: my sge system is not working with the fault tolerance

tamara sgesystem at live.com
Tue Mar 3 11:32:43 GMT 2009



> you could either define it in $SGE_ROOT/default/common/sge_request or
> in the queue definition in the entry "rerun                 TRUE"

The only place I could find rerun is in the complex configuration in qmon (is that the same as the queue configuration you mentioned?), and I made it requestable there.
I also opened sge_request, but I can't find rerun in it.
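
For reference, here is a rough sketch of what the two options in the quoted advice would look like. It assumes that sge_request holds default qsub options (so rerun would be expressed there as the qsub switch "-r y" rather than as a "rerun" entry), and <queue_name> is a placeholder for the actual queue, not a name taken from this cluster:

```text
# Option 1: cluster-wide default for all submitted jobs.
# Add this line to $SGE_ROOT/default/common/sge_request
# (assumption: same effect as passing "-r y" to qsub for every job)
-r y

# Option 2: per queue.
# Open the queue configuration with: qconf -mq <queue_name>
# and set the rerun entry to TRUE:
rerun                 TRUE
```

If that assumption holds, it would explain why searching sge_request for the word "rerun" finds nothing, and the queue entry should be visible with qconf -sq <queue_name>.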

But I ran into several problems:

For example, job5 was sent to host1, and while it was running the network was disconnected. Instead of going back to pending and then on to another host, job5 went straight to the finished jobs. When I tried to find out more about the job (qacct -j job5), it said: error job id 5 is not found

Another problem is that the hosts view in queue control is missing a lot of information:
Arch, MemUsed, VirtUsed and VirtTotal
(but I'm not sure whether this problem is connected to configuring rerun).


