[GE users] SGE and physical machine slot allocation

Reuti reuti at staff.uni-marburg.de
Thu Apr 20 18:18:35 BST 2006


On 20.04.2006 at 19:05, lukacm at pdx.edu wrote:

> Hello,
>
> Yes, the command line from the submit file is as follows:
>
> $MPIR_HOME/mpirun -np $NSLOTS -v -machinefile /home/visible/mbmachines
> /home/visible/apps/MrBayes/mb anolis.nex

Nope, this will use any machine in the cluster. The machinefile to
use is $TMPDIR/machines, which is created by the start procedure of
the PE.
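
For example, the line in your submit script would then look like this
(just a sketch, keeping the paths from your script):

  # use the machinefile generated by the PE start procedure,
  # not a static file covering the whole cluster
  $MPIR_HOME/mpirun -np $NSLOTS -v -machinefile $TMPDIR/machines \
      /home/visible/apps/MrBayes/mb anolis.nex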

Please have a look at the supplied mpi.sh script in $SGE_ROOT/mpi.
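
Regarding the ssh wrapper question quoted below: the start procedure
in startmpi.sh creates a link named rsh in $TMPDIR which points to the
rsh wrapper. To catch ssh calls instead, the name of this link has to
be changed, roughly like this (a sketch from memory; the exact lines
differ between SGE versions):

  # stock startmpi.sh creates the wrapper link under the name rsh:
  #   ln -s $SGE_ROOT/mpi/rsh $TMPDIR/rsh
  # with P4_RSHCOMMAND=ssh the link must be created as ssh instead:
  ln -s $SGE_ROOT/mpi/rsh $TMPDIR/ssh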

-- Reuti
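
P.S. Since the PE definition was asked about further down in this
thread: a PE set up for a tight MPICH integration typically looks
similar to the following (only a sketch; the slot count, paths and
allocation rule are assumptions and will differ per site):

  $ qconf -sp mpich
  pe_name           mpich
  slots             999
  user_lists        NONE
  xuser_lists       NONE
  start_proc_args   $SGE_ROOT/mpi/startmpi.sh -catch_rsh $pe_hostfile
  stop_proc_args    $SGE_ROOT/mpi/stopmpi.sh
  allocation_rule   $fill_up
  control_slaves    TRUE
  job_is_first_task FALSE

control_slaves TRUE and job_is_first_task FALSE are the settings
relevant for a tight MPICH integration.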


> All variables are defined and so on.
>
> However, concerning this: "So you also renamed the created link in
> startmpi.sh to create an ssh wrapper?" I am not sure; I did not find
> anything about it in the FAQs. Is there any doc on this? I modified
> 'rsh' in the /gridengine/opt/ directory. All links in startmpi.sh in
> the same directory are pointing correctly to that wrapper. So I guess
> I am confused about your question.
>
> martin
>
>
> Quoting Reuti <reuti at staff.uni-marburg.de>:
>
>> On 19.04.2006 at 23:30, lukacm at pdx.edu wrote:
>>
>>> Hello,
>>>
>>> Yes, the job is running fine, but not on the physical machines
>>> (i.e. the parallel slots) where SGE scheduled it.
>>>
>>> The qsub command looks like: qsub -pe mpich 4 mbsub.sh
>>>
>>> Inside the script, the important flags are:
>>>
>>> #$ -v P4_RSHCOMMAND=ssh
>>> #$ -v P4_GLOBMEMSIZE=10000000
>>> #$ -v MPICH_PROCESS_GROUP=no
>>> #$ -v CONV_RSH=ssh
>>
>> So you also renamed the created link in startmpi.sh to create an
>> ssh wrapper?
>>
>> Have you given any host list to the mpirun command? - Reuti
>>
>>
>>>
>>> I also did the tight integration of MPICH and SGE using method
>>> number 2.
>>>
>>> In general I would not mind this issue, but it becomes a problem when
>>> I have to clean up a set of zombie processes from the same user and
>>> cannot tell which processes are zombies and which are not.
>>>
>>> martin
>>>
>>> Quoting Reuti <reuti at staff.uni-marburg.de>:
>>>
>>>> Hi,
>>>>
>>>> On 19.04.2006 at 21:59, lukacm at pdx.edu wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> A job run with SGE produces the following strange output:
>>>>>
>>>>> ----------------------------------------------------------------------------
>>>>> arc.q@compute-0-11.local       BIPC  2/2       1.00     lx26-amd64
>>>>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     2
>>>>> ----------------------------------------------------------------------------
>>>>> arc.q@compute-0-12.local       BIPC  1/2       0.00     lx26-amd64
>>>>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     1
>>>>> ----------------------------------------------------------------------------
>>>>>
>>>>> The slots allocated by SGE do not correspond to the queues shown
>>>>> by qstat. Is there a remedy to tightly integrate SGE with the
>>>>> physical machines?
>>>>
>>>> This seems not to be a problem of SGE, but of the integration of
>>>> your parallel job into SGE. So you mean this job got three slots,
>>>> but is only using one slot, judging by the load?
>>>>
>>>> What are your defined queue, your PE, the scripts defined for this
>>>> PE, and your qsub command?
>>>>
>>>> Is your job instead running on nodes other than the intended ones?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> thank you
>>>>>
>>>>>
>>>>> martin
>>>>>




