[GE users] LAM / MPI : Reservation

Reuti reuti at staff.uni-marburg.de
Wed Feb 21 00:24:15 GMT 2007



Hi,

On 20.02.2007, at 19:13, christophe.caron at jouy.inra.fr wrote:

> Hello all,
>
>
> In fact, reservation seems to be OK (after running some basic MPI
> tests).
>
> Our mistake was in how we interpreted the way the allocation_rule
> ($fill_up) works.
>
> It seems that SGE "locks" the first node on which a job finishes, in
> order to fill up that node. If other nodes release slots, the MPI job
> does not run, even if the total number of free slots is greater than
> the number reserved.

I'm not sure whether this is really the intended behavior, as $fill_up
will still allocate slots from different machines (in contrast to
$pe_slots, which will only use slots from one machine).
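
To make the difference concrete, here is a minimal sketch in Python. It is
not how SGE is implemented, just a toy model of the two allocation rules as
described above. The function names, the free-slot counts and the host order
are assumptions for illustration (the real scheduler sorts hosts by its own
criteria); only the host names are taken from the qstat output.

```python
def allocate_pe_slots(free, wanted):
    """Toy model of $pe_slots: all slots must come from one host."""
    for host, slots in free.items():
        if slots >= wanted:
            return {host: wanted}
    return None  # no single host has enough free slots


def allocate_fill_up(free, wanted):
    """Toy model of $fill_up: use up a host's free slots, then go to the next."""
    if sum(free.values()) < wanted:
        return None  # not enough free slots in total
    granted = {}
    remaining = wanted
    for host, slots in free.items():
        if remaining == 0:
            break
        take = min(slots, remaining)
        if take > 0:
            granted[host] = take
            remaining -= take
    return granted


# Hypothetical free-slot counts per host (assumed values, not from the cluster).
free = {"n44": 1, "n48": 2, "n49": 1}

print(allocate_pe_slots(free, 4))  # None
print(allocate_fill_up(free, 4))   # {'n44': 1, 'n48': 2, 'n49': 1}
```

With only one or two free slots scattered over several hosts, a 4-slot job
can never start under $pe_slots in this model, while $fill_up can collect
the slots across hosts.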

Maybe other slots were already reserved for other jobs?

-- Reuti


> Thanks again Reuti.
>
> cc
>
>
>> Hi again,
>>
>> On 16.02.2007, at 18:49, christophe.caron at jouy.inra.fr wrote:
>>
>>> On Fri, 16 Feb 2007, Reuti wrote:
>>>> On 16.02.2007, at 17:39, christophe.caron at jouy.inra.fr wrote:
>>>>> Hello
>>>>> We use SGE 6u9 + MPI 7.1.1 and FSS scheduling.
>>>>> We would like to run MPI jobs with 4 slots, but other "normal" jobs
>>>>> are always started before the MPI jobs in the queue (even if the
>>>>> MPI job is first in the wait queue).
>>>>> * Reservation is enabled:
>>>>> max_reservation                   20
>>>>> * Submission:
>>>>> # qsub -R y -q long.q -pe lam711 4 job.sh
>>>> What does qstat -j <jobid> say (with scheduler info turned on)?
>>>> Is the PE attached to long.q? - Reuti
>>
>> okay, the PE definition is...? - Reuti
>>
>>> # qconf -sq long.q
>>> pe_list               make lam711
>>> ##################################################################################
>>> #qstat -j xxx
>>> ==============================================================
>>> job_number:                 1271926
>>> exec_file:                  job_scripts/1271926
>>> submission_time:            Fri Feb 16 18:38:58 2007
>>> owner:                      steletch
>>> uid:                        14962
>>> group:                      mig
>>> gid:                        233
>>> sge_o_home:                 /home/mig/steletch
>>> sge_o_log_name:             steletch
>>> sge_o_path:                 /opt/sge/bin/lx24-amd64:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/genome/arb-22-08-2003:/usr/local/public/pgaccess:/usr/local/genome/bin:/usr/local/www/bin:/usr/local/public/java/bin:/usr/sbin:/usr/local/public/bin:/usr/local/public/lam/bin:/usr/local/genome/emboss/bin:/usr/local/genome/phylip/bin:/usr/local/genome/mugen/bin:/usr/local/genome/mgadist:/usr/local/genome/MUMmer:/usr/local/genome/TMHMM/bin:/usr/local/genome/hmmer/bin:/usr/local/genome/fasta:/usr/local/genome/autodock/bin:/usr/local/genome/mcl64/mcl-05-321/bin:/usr/local/adm/bin:/usr/local/public/modeller6v2/bin:/home/mig/steletch/bin
>>> sge_o_shell:                /bin/bash
>>> sge_o_workdir:              /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>>> sge_o_host:                 d2r2
>>> account:                    sge
>>> cwd:                        /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>>> path_aliases:               /tmp_mnt/ * * /
>>> stderr_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.runlog
>>> reserve:                    y
>>> mail_options:               aes
>>> mail_list:                  steletch at jouy.inra.fr
>>> notify:                     FALSE
>>> job_name:                   OR1G1-qsub-md.sh
>>> stdout_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.out
>>> jobshare:                   0
>>> hard_queue_list:            long.q
>>> shell_list:                 /bin/bash
>>> ...
>>> script_file:                OR1G1-qsub-md.sh
>>> parallel environment:  lam711 range: 2
>>>                             queue instance "long.q@n49" dropped because it is full
>>>                             queue instance "long.q@n56" dropped because it is full
>>>                             queue instance "long.q@n48" dropped because it is full
>>>                             queue instance "long.q@n44" dropped because it is full
>>> ...
>>>                             queue instance "long.q@n65" dropped because it is full
>>>                             queue instance "long.q@n52" dropped because it is full
>>>                             queue instance "long.q@n53" dropped because it is full
>>> ...
>>>                             cannot run in PE "lam711" because it only offers 60 slots
>>> ...
>>> long.q has 120 slots and pe lam711 has only 16 slots.
>>> cc
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>> **************************************************************************
>>> Christophe Caron - INRA             | Tél: (+33) 013465 2888
>>> Mathematique, Informatique et Genome| Fax: (+33) 013465 2901
>>> Domaine de Vilvert                  | Email: christophe.caron at jouy.inra.fr
>>> F-78350 Jouy-en-Josas               | http://migale.jouy.inra.fr/
>>> **************************************************************************




