[GE users] SGE and physical machine slot allocation

lukacm at pdx.edu
Thu Apr 20 18:05:51 BST 2006



Hello,

Yes, the command line from the submit file is as follows:

$MPIR_HOME/mpirun -np $NSLOTS -v -machinefile /home/visible/mbmachines
/home/visible/apps/MrBayes/mb anolis.nex

All the variables are defined and set.
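(For anyone following along: a likely cause of the slot mismatch is the static -machinefile, which bypasses the host list SGE actually granted. A minimal sketch of the mpirun line, assuming the stock mpich PE where startmpi.sh writes the granted hosts to $TMPDIR/machines -- adjust to your installation:)

```shell
#!/bin/sh
# Job script sketch -- assumes the stock mpich PE, whose startmpi.sh
# writes the SGE-granted host list to $TMPDIR/machines.
#$ -pe mpich 4
#$ -v P4_RSHCOMMAND=ssh

# Use the machine file generated from SGE's slot allocation instead
# of a static, hand-maintained host list:
$MPIR_HOME/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
    /home/visible/apps/MrBayes/mb anolis.nex
```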

However, concerning this: "So you also renamed the created link in
startmpi.sh to create an ssh wrapper?" I am not sure. I did not find
anything about it in the FAQs. Is there any documentation on this? I
modified 'rsh' in the /gridengine/opt/ directory, and all links in
startmpi.sh in the same directory point correctly to that wrapper. So
I guess I am confused by your question.
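(If I understand Reuti's question correctly, it refers to the link that startmpi.sh creates in $TMPDIR: it is named "rsh" by default, so with P4_RSHCOMMAND=ssh the wrapper is never found. A sketch of the change -- the exact paths are assumptions based on a typical $SGE_ROOT/mpi layout:)

```shell
# startmpi.sh fragment (sketch; the exact paths depend on the
# installation -- $SGE_ROOT/mpi/rsh is the usual wrapper location).
# Original line, which creates the wrapper under the name "rsh":
#   ln -s $SGE_ROOT/mpi/rsh $TMPDIR/rsh
# Renamed so that MPICH with P4_RSHCOMMAND=ssh picks up the wrapper
# (it is found as "ssh" in $TMPDIR, which is prepended to $PATH):
ln -s $SGE_ROOT/mpi/rsh $TMPDIR/ssh
```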

martin


Quoting Reuti <reuti at staff.uni-marburg.de>:

> Am 19.04.2006 um 23:30 schrieb lukacm at pdx.edu:
>
> > Hello,
> >
> > yes, the job is running fine, but not as SGE scheduled it on the
> > physical machines, i.e. the parallel slots.
> >
> > the qsub command looks like qsub -pe mpich 4 mbsub.sh
> >
> > inside the main important flags are
> >
> > #$ -v P4_RSHCOMMAND=ssh
> > #$ -v P4_GLOBMEMSIZE=10000000
> > #$ -v MPICH_PROCESS_GROUP=no
> > #$ -v CONV_RSH=ssh
>
> So you also renamed the created link in startmpi.sh to create an ssh
> wrapper?
>
> Have you given any hostlist to the mpirun command? - Reuti
>
>
> >
> > I also did the tight integration of MPICH and SGE, using method
> > number 2.
> >
> > In general I would not mind this issue, but it becomes a problem
> > when I have to clean up a set of zombies from the same user and I
> > do not know which processes are zombies and which are not.
> >
> > martin
> >
> > Quoting Reuti <reuti at staff.uni-marburg.de>:
> >
> >> Hi,
> >>
> >> Am 19.04.2006 um 21:59 schrieb lukacm at pdx.edu:
> >>
> >>> Hello all,
> >>>
> >>> a job run with SGE generates the following strangeness.
> >>>
> >>> ----------------------------------------------------------------------------
> >>> arc.q at compute-0-11.local       BIPC  2/2       1.00     lx26-amd64
> >>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     2
> >>> ----------------------------------------------------------------------------
> >>> arc.q at compute-0-12.local       BIPC  1/2       0.00     lx26-amd64
> >>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     1
> >>> ----------------------------------------------------------------------------
> >>>
> >>> The slots allocated by SGE do not correspond to the queues shown
> >>> by qstat. Is there a remedy to tightly integrate SGE with the
> >>> physical machines?
> >>
> >> this does not seem to be a problem with SGE, but with the
> >> integration of your parallel job into SGE. So this job got three
> >> slots, but is only using one slot according to the load, you mean?
> >>
> >> What is your defined queue, PE, the defined scripts for this PE and
> >> your qsub command?
> >>
> >> Is your job running on nodes other than the intended ones?
> >>
> >> -- Reuti
> >>
> >>
> >>> thank you
> >>>
> >>>
> >>> martin
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >>
> >>
> >>
> >
> >
>
>
>
>





