[GE users] SGE and MM5

lukacm at pdx.edu lukacm at pdx.edu
Tue Apr 12 23:41:08 BST 2005


Reuti,

Quoting Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> to get a tight integration with SGE, did you set up a PE and request it for
> the job? There is a sample mpi installation in the SGE distribution and a
> Howto page available at sunsource.net. What was your submitted script and
> qsub command?

I followed the manual, but no job runs with either the lamtight or the
non-tight integration, and the error message is always:

Jobs cannot run because resources requested are not available for parallel job

However, all queues are empty and their respective load is 0.
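
A few checks that may help narrow this down. This is a sketch only: the queue
name all.q and the <job_id> placeholder are assumptions, and the pe_list
attribute applies to SGE 6.x queue configurations. "qstat -j <job_id>" prints
the scheduler's reasons for leaving a job pending, and "qconf -sq" shows
whether a queue offers the PE at all. The helper below is hypothetical:

```shell
# Hypothetical helper: reads `qconf -sq <queue>` output on stdin and
# succeeds if the named PE appears in the queue's pe_list line.
queue_offers_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            gsub(",", " ")    # tolerate comma-separated lists
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
        }
        END { exit !found }'
}

# Example use on a cluster (not run here):
#   qstat -j <job_id>
#   qconf -sq all.q | queue_offers_pe mpi || echo "all.q does not offer PE mpi"
```
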
>
> - Don't use -nolocal with SGE. You will get an uneven distribution.

I used -nolocal only because the person who installed the PG compiler set it
up with no local settings, so mpirun must be told; however, I do not use it
with qsub.

>
> - You didn't specify a "-machinefile $TMPDIR/machines" for your mpirun, so the
> nodes set up in "blabla/share/machines.LINUX" will be used, and not the ones
> SGE selected for a parallel job.

This is the content of mm5_submit.sh:

#$ -V
#$ -N mm5job
#$ -o /home/submitter/mm5/sge-output.txt -j y
#$ -pe mpi 4
#$ -v MPIR_HOME=/opt/mpich/gnu/bin
#$ -v MPICH_PROCESS_GROUP=no
#$ -v CONV_RSH=ssh
cd  /home/submitter/mm5
#$ -cwd
#$ -e ./
#$ -o ./
### Remember: only the home directory and /exports/visible are
### available throughout the cluster
/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile ./mmachine \
    /home/visible/MM5/Run/mm5.mpp
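
Following Reuti's hint about the machine file, a variant of the script might
look like the sketch below. It assumes the mpi PE uses the sample start script
from the SGE distribution, which writes the selected hosts to
$TMPDIR/machines, and that the job is submitted from /home/submitter/mm5. The
variable names and the readability check are additions for illustration, not
part of the original script.

```shell
#!/bin/sh
#$ -V
#$ -N mm5job
#$ -cwd
#$ -j y
#$ -o sge-output.txt
#$ -pe mpi 4
#$ -v MPIR_HOME=/opt/mpich/gnu/bin
#$ -v MPICH_PROCESS_GROUP=no
#$ -v CONV_RSH=ssh

MPIRUN=/opt/mpich/gnu/bin/mpirun
MM5=/home/visible/MM5/Run/mm5.mpp
# Assumption: the PE start script wrote the hosts SGE selected to this file.
MACHINES="$TMPDIR/machines"

if [ -r "$MACHINES" ]; then
    # NSLOTS is set by SGE to the number of slots granted to the job.
    "$MPIRUN" -np "$NSLOTS" -machinefile "$MACHINES" "$MM5"
else
    echo "no readable machine file at $MACHINES" >&2
fi
```

If the message about the missing machine file shows up in the job output, the
job was probably not started through the PE start procedure.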


Depending on when I run the job, I can also get this error message:


Jobs can not run because total slots of pe are not in range of job

Moreover, I managed to make the job run on two machines, but only on two.
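
One possible reading of this message is that the job asks for more slots than
the PE can grant. A sketch for checking that, assuming the PE is named mpi as
in the "#$ -pe mpi 4" line of the script; the helper functions are
hypothetical:

```shell
# Hypothetical helper: reads `qconf -sp <pe_name>` output on stdin and
# prints the value of the PE's "slots" attribute.
pe_slots() {
    awk '$1 == "slots" { print $2 }'
}

# Succeeds if the requested slot count fits within the PE's total slots.
request_fits() {
    [ "$1" -le "$2" ]
}

# Example use on a cluster (not run here):
#   slots=$(qconf -sp mpi | pe_slots)
#   request_fits 4 "$slots" || echo "PE mpi offers only $slots slots"
# A slot range such as "qsub -pe mpi 2-4 mm5_submit.sh" lets SGE start the
# job with fewer slots when four are not available.
```
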

Is this a problem with the SGE configuration or with MM5? Or does it require
tight integration?

thank you

martin



>
> CU - Reuti
>
>
> Quoting lukacm at pdx.edu:
>
> > Hello list,
> >
> > I have the following problem. We installed the MM5 program and it runs
> > using MPI. Thus if I do:
> > /opt/mpich/gnu/bin/mpirun -nolocal -np 4 /home/visible/MM5/Run/mm5.mpp
> > the program will start and run. However, when I submit it to SGE, the job
> > stays in the waiting state and never goes to the running state. It stays
> > there ... well, until it is removed.
> > Is there a way I can debug it?
> > Has anyone had the same problem?
> >
> > thank you
> >
> > martin
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >






