[GE users] SGE and physical machine slot allocation

Reuti reuti at staff.uni-marburg.de
Thu Apr 20 09:24:11 BST 2006


On 19.04.2006 at 23:30, lukacm at pdx.edu wrote:

> Hello,
>
> yes, the job is running fine, but not on the physical machines
> (i.e. the parallel slots) where SGE scheduled it.
>
> The qsub command looks like this: qsub -pe mpich 4 mbsub.sh
>
> Inside the script, the important flags are:
>
> #$ -v P4_RSHCOMMAND=ssh
> #$ -v P4_GLOBMEMSIZE=10000000
> #$ -v MPICH_PROCESS_GROUP=no
> #$ -v CONV_RSH=ssh

So you also renamed the link that startmpi.sh creates, so that it acts
as an ssh wrapper?
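
For context: when called with -catch_rsh, the stock startmpi.sh links SGE's rsh wrapper into the job's $TMPDIR under the name rsh. With P4_RSHCOMMAND=ssh, MPICH calls ssh instead, so the link has to carry that name or the wrapper is never used. A minimal sketch of the idea, assuming the wrapper sits at $SGE_ROOT/mpi/rsh as in the default template; the function name link_rsh_wrapper is made up for illustration and is not part of SGE:

```shell
#!/bin/sh
# Hypothetical helper: expose SGE's rsh wrapper under the name MPICH will call.
#   $1 = path of the wrapper (assumed here: $SGE_ROOT/mpi/rsh)
#   $2 = the job's temporary directory ($TMPDIR inside a job)
#   $3 = command name the link should get (defaults to rsh)
link_rsh_wrapper() {
    wrapper=$1
    tmpdir=$2
    name=${3:-rsh}
    ln -s "$wrapper" "$tmpdir/$name"
}

# Inside startmpi.sh this would amount to something like:
#   link_rsh_wrapper "$SGE_ROOT/mpi/rsh" "$TMPDIR" ssh
```

The link is only picked up if $TMPDIR comes first in PATH inside the job script.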

Have you passed a host list to the mpirun command? - Reuti
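
To illustrate the host list question: with the mpich PE, startmpi.sh normally writes a machine file to $TMPDIR/machines, one line per granted slot, and the job script has to hand it to mpirun. Without it, MPICH typically falls back on its own default host list, and the processes can end up on nodes other than the ones qstat shows. A rough sketch under those assumptions; ./my_mpi_program is a placeholder, and the function only prints the command line so it can be inspected without starting anything:

```shell
#!/bin/sh
# Print the mpirun command line that uses SGE's slot count and machine file.
# NSLOTS and TMPDIR are set by SGE inside a running job; the machine file
# path assumes the default startmpi.sh behaviour.
mpirun_command() {
    echo "mpirun -np $NSLOTS -machinefile $TMPDIR/machines $*"
}

# In a real job script, call mpirun directly instead of echoing:
#   mpirun -np $NSLOTS -machinefile $TMPDIR/machines ./my_mpi_program
```

Comparing the contents of $TMPDIR/machines with the hosts listed by qstat is a quick way to see whether both agree.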


>
> I also did the tight integration of MPICH and SGE using method
> number 2.
>
> In general I would not mind this issue, but when I have to clean up a
> set of zombies from the same user, and I do not know which processes
> are zombies and which are not, it becomes a problem.
>
> martin
>
> Quoting Reuti <reuti at staff.uni-marburg.de>:
>
>> Hi,
>>
>> On 19.04.2006 at 21:59, lukacm at pdx.edu wrote:
>>
>>> Hello all,
>>>
>>> a job run with SGE generates the following strangeness.
>>>
>>> ----------------------------------------------------------------------------
>>> arc.q@compute-0-11.local       BIPC  2/2       1.00     lx26-amd64
>>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     2
>>> ----------------------------------------------------------------------------
>>> arc.q@compute-0-12.local       BIPC  1/2       0.00     lx26-amd64
>>>    3964 0.55500 tas        ruedas       r     04/19/2006 10:50:59     1
>>> ----------------------------------------------------------------------------
>>>
>>> The slots allocated by SGE do not correspond to the queues that are
>>> shown by qstat. Is there a remedy to tightly integrate SGE with the
>>> physical machines?
>>
>> This does not seem to be a problem with SGE, but with the integration
>> of your parallel job into SGE. So you mean this job got three slots,
>> but according to the load it is only using one?
>>
>> What are your queue and PE definitions, the scripts defined for this
>> PE, and your qsub command?
>>
>> Is your job instead running on other nodes than the intended ones?
>>
>> -- Reuti
>>
>>
>>> thank you
>>>
>>>
>>> martin
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



