[GE users] dropped because it is full

Michael Heinrich michael at vierpi.de
Tue Jun 27 14:21:22 BST 2006


Hugo Darío Barrera <hbarrera at iciq.es> writes:


> Hosts: tekla001 to tekla018
> PE called "big2"
> hostgroup @big2 has hosts tekla011 to tekla018
> queue control big2 has hostgroup @big2
> I run a job in big2.
> qstat -t tells me that the master is tekla012 and the slaves run on
> tekla011 to tekla018.
> But if I go to the master node and run "ps aux | grep vasp" (vasp is what
> I'm running), I can see that hosts tekla001 to tekla007 are running that job.
> So I go to, e.g., tekla001 and see that process running there.
>
> Why is qstat -t lying to me? :)
>
>

qstat is not lying - you're cheating qstat ;)

In your machinefile you have

tekla001
..
..
tekla018


right?

Therefore mpirun starts its slaves on tekla001 and onward, no matter which
hosts SGE actually granted to the job.  You have to generate a machinefile
based on the selection SGE makes for you.  All the information you need is
in the file named by the environment variable $PE_HOSTFILE.

Generate the machinefile dynamically in your job script, like this:

awk '{for (i=1; i<=$2; i++){print $1}}' "$PE_HOSTFILE" > machinefile
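A slightly fuller job-script sketch of the same idea, assuming an MPICH-style
mpirun that accepts -np and -machinefile (check your MPI's documentation). It
relies on variables SGE exports to parallel jobs: $PE_HOSTFILE (one line per
granted host: hostname, slot count, queue, processor range), $NSLOTS and
$TMPDIR.  The function name make_machinefile is only an illustrative name,
and the launch runs only when $PE_HOSTFILE is set, so the file can also be
tried outside SGE.

```shell
#!/bin/sh
# Sketch of the relevant part of an SGE job script (assumptions noted above).

# Expand a PE hostfile into a machinefile with one line per granted slot.
# Each line of $PE_HOSTFILE reads: <hostname> <slots> <queue> <processor range>
make_machinefile() {
    awk '{ for (i = 1; i <= $2; i++) print $1 }' "$1"
}

# Only do the real work when running under SGE, where PE_HOSTFILE is set.
if [ -n "$PE_HOSTFILE" ]; then
    MACHINEFILE="$TMPDIR/machinefile"
    make_machinefile "$PE_HOSTFILE" > "$MACHINEFILE"

    # Sanity check: the machinefile should list exactly $NSLOTS entries.
    if [ "$(awk 'END { print NR }' "$MACHINEFILE")" -ne "$NSLOTS" ]; then
        echo "machinefile does not match NSLOTS" >&2
        exit 1
    fi

    # Assumed MPICH-style launch line; adjust flags and binary to your setup.
    mpirun -np "$NSLOTS" -machinefile "$MACHINEFILE" vasp
fi
```

Submitted with, e.g., "qsub -pe big2 8 jobscript.sh", the vasp processes
should then start only on the hosts that qstat -t reports.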


Regards,
-- 
Michael Heinrich


More information about the gridengine-users mailing list