[GE users] dropped because it is full

Reuti reuti at staff.uni-marburg.de
Tue Jun 27 17:01:01 BST 2006


Hi again,

rather than creating the machinefile by hand in each job script, have
a look at the $SGE_ROOT/mpi directory and the Howto
http://gridengine.sunsource.net/howto/mpich-integration.html.
Otherwise the next problem might be that the MPI processes are not
under the control of SGE. The PE should take care of this (it is the
same in every script) and create the usual $TMPDIR/machines file for
you, so you can simply use it. A starting point can be found in the
SGE 6 Administration Guide, page 155 ff.
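
A minimal sketch of what such a PE start script does with the host
file, assuming the usual $PE_HOSTFILE layout of "host slots queue
processors" per line. The function name expand_pe_hostfile, the
mpirun options and the ./vasp path in the commented usage are
illustrative assumptions, not copied from the scripts shipped in
$SGE_ROOT/mpi:

```shell
# Expand an SGE PE host file into one host name per granted slot,
# which is the layout mpirun expects in a machine file.
# Assumed input format per line: "hostname slots queue processor-range".
expand_pe_hostfile() {
    awk '{ for (i = 1; i <= $2; i++) print $1 }' "$1"
}

# Hypothetical usage inside a job, where SGE sets PE_HOSTFILE, TMPDIR
# and NSLOTS (shown as comments so the sketch runs outside of SGE too):
#   expand_pe_hostfile "$PE_HOSTFILE" > "$TMPDIR/machines"
#   mpirun -np "$NSLOTS" -machinefile "$TMPDIR/machines" ./vasp
```

With the PE doing this once, the job script only needs the mpirun
line and never hard-codes host names.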

HTH - Reuti


On 27.06.2006 at 16:48, Hugo Darío Barrera wrote:

> Hi,
>
> that worked but...
>
> You forgot RTFM :D (although I don't know which one to read...)
>
> Thanks a lot, Michael!
>
> And thanks for the other answers too.
>
> Best regards!
>
> On Tuesday 27 June 2006 15:21, Michael Heinrich wrote:
>
> > Hugo Darío Barrera <hbarrera at iciq.es> writes:
> > > tekla001 to tekla018
> > > pe called "big2"
> > > hostgroup @big2 has hosts tekla011 to tekla018
> > > queue control big2 has hostgroup @big2
> > > I run a job in big2
> > > I get from qstat -t that the master job is on tekla012 and the
> > > running nodes are between tekla011 and tekla018.
> > > But if I go to the master job and do a "ps aux |grep vasp" (which
> > > is what I'm running), I can see that hosts tekla001 to tekla007
> > > are running that job. So I go to e.g. tekla001 and see that
> > > process running.
> > >
> > > Why is qstat -t lying to me? :)
> >
> > qstat is not lying - you're cheating qstat ;)
> >
> > In your machinefile you have
> >
> > tekla001
> > ..
> > ..
> > tekla018
> >
> > right?
> >
> > Therefore mpirun starts its slaves on tekla001 and so on. You have
> > to generate a machinefile based on the selection SGE does for you.
> > All the information you need is in the file that the env-variable
> > $PE_HOSTFILE points to.
> >
> > Generate the machinefile dynamically in your jobscript like:
> >
> > awk '{for (i=1; i<=$2; i++){print $1}}' $PE_HOSTFILE > machinefile
> >
> > Regards,
