[GE users] SGE error ?

Reuti reuti at staff.uni-marburg.de
Tue Oct 26 19:21:01 BST 2004


> I have a weird problem when I submit a job to our Linux cluster.
> We are using Grid Engine Enterprise Edition 5.3p2.
> When I submit a parallel job using MPICH, SGE correctly assigns and
> transfers the job to the right nodes. But SGE drops the job right after
> it begins, and its running status disappears from "qstat".
> The job is still running on the nodes, though, and "qstat" shows only
> the "load_avg" value for each node.
> 
> Any idea?

0. Is there anything in the output files of the job or the PE?

1. Check what is in the messages file on the qmaster: 
$SGE_ROOT/default/spool/qmaster/messages

2. Do the same for the assigned master node of the job:
$SGE_ROOT/default/spool/<nodename>/messages

3. Read the appropriate Howto for tight integration at the sunsource.net site.
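
Steps 1 and 2 could be done with a small shell helper like the one below. This is only a sketch: the function name show_messages is made up for illustration, it assumes the cell is named "default" as in the paths above, and the 50-line tail is an arbitrary choice.

```shell
# Hypothetical helper (not part of Grid Engine): print the tail of a
# messages file under $SGE_ROOT/default/spool/<subdir>/messages.
# Assumes the cell name "default", as in the paths given above.
show_messages() {
    # $1 is the spool subdirectory: "qmaster" or a node's hostname
    file="${SGE_ROOT:?SGE_ROOT is not set}/default/spool/$1/messages"
    if [ -r "$file" ]; then
        # Show only the most recent entries; 50 is an arbitrary count
        tail -n 50 "$file"
    else
        echo "cannot read $file" >&2
        return 1
    fi
}
```

On the qmaster host this would be called as "show_messages qmaster", and on the master node of the job as "show_messages <nodename>" with the real hostname filled in.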

Cheers - Reuti

