[GE users] LAM / MPI : Reservation

christophe.caron at jouy.inra.fr
Thu Feb 22 19:50:02 GMT 2007



Hello

With $pe_slots as the allocation_rule, MPI jobs seem to work fine in all cases
(tests, GROMACS applications...).

I don't understand why the $fill_up rule seemed OK to me before
(in fact, only with the mpihello test application).
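
As a rough illustration of the difference between the two rules as I understand it, here is a toy model in shell. It is only a sketch, not SGE's actual scheduler logic: the function names alloc_pe_slots and alloc_fill_up are hypothetical, and the host:free_slots pairs passed in are made-up numbers.

```shell
# Toy model of two PE allocation rules; NOT SGE's real scheduler logic.
# Usage: <function> <slots_needed> host:free_slots [host:free_slots ...]

# $pe_slots: every requested slot must come from one single host.
alloc_pe_slots() {
    need=$1; shift
    for pair in "$@"; do
        host=${pair%%:*}; free=${pair##*:}
        if [ "$free" -ge "$need" ]; then
            echo "$host:$need"
            return 0
        fi
    done
    return 1    # no single host has enough free slots
}

# $fill_up: use up the free slots of one host, then move on to the next,
# so the job may end up spread over several hosts.
alloc_fill_up() {
    need=$1; shift
    out=""
    for pair in "$@"; do
        host=${pair%%:*}; free=${pair##*:}
        [ "$free" -gt 0 ] || continue
        if [ "$free" -lt "$need" ]; then take=$free; else take=$need; fi
        out="${out:+$out }$host:$take"
        need=$((need - take))
        if [ "$need" -eq 0 ]; then
            echo "$out"
            return 0
        fi
    done
    return 1    # not enough free slots in total
}
```

With one free slot on n44, two on n48 and one on n49 (invented values), `alloc_fill_up 4 n44:1 n48:2 n49:1` prints `n44:1 n48:2 n49:1`, while `alloc_pe_slots 4 n44:1 n48:2 n49:1` fails because no single host has 4 free slots.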

I'm not using reservation for any jobs other than the MPI jobs.
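
For anyone checking the same thing: as far as I know, "qsub -R y" only has an effect when max_reservation in the scheduler configuration is greater than 0. The sketch below reads scheduler configuration text (in the format printed by "qconf -ssconf") from stdin. The helper names max_reservation and reservation_enabled are my own and are not part of SGE.

```shell
# Print the max_reservation value found in scheduler configuration text
# read from stdin; exit non-zero if the parameter is missing.
max_reservation() {
    awk '$1 == "max_reservation" { print $2; found = 1 } END { exit !found }'
}

# Succeed only if reservation is switched on, i.e. max_reservation > 0.
reservation_enabled() {
    n=$(max_reservation) || return 1
    [ "$n" -gt 0 ]
}

# On a live cluster (assumes qconf is in PATH):
#   qconf -ssconf | reservation_enabled && echo "reservation is on"
```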



cc


> Hi,
>
> Am 20.02.2007 um 19:13 schrieb christophe.caron at jouy.inra.fr:
>
>> Hello all,
>> 
>> 
>> In fact, it seems reservation is OK (after running some basic MPI tests).
>> 
>> The mistake was about how the allocation_rule ($fill_up) works, and came from
>> our interpretation of how it allocates slots.
>> 
>> It seems that SGE "locks" the first node that releases a job, in order to fill
>> up this node. If other nodes release slots, the MPI job does not run even if
>> the total number of free slots is greater than the reservation.
>
> I'm not sure whether this is really the intended behavior, as $fill_up will still 
> allocate slots from different machines (in contrast to $pe_slots, which will 
> only use slots from one machine).
>
> Maybe other slots were already reserved for other jobs?
>
> -- Reuti
>
>
>> Thanks again Reuti.
>> 
>> cc
>> 
>> 
>>> Hi again,
>>> 
>>> Am 16.02.2007 um 18:49 schrieb christophe.caron at jouy.inra.fr:
>>> 
>>>> On Fri, 16 Feb 2007, Reuti wrote:
>>>>> Am 16.02.2007 um 17:39 schrieb christophe.caron at jouy.inra.fr:
>>>>>> Hello
>>>>>> We use SGE 6u9+MPI 7.1.1 and FSS scheduling.
>>>>>> We would like to run MPI jobs with 4 slots, but other "normal" jobs are
>>>>>> always dispatched before the MPI jobs in the queue (even if the MPI job
>>>>>> is first in the wait queue).
>>>>>> * Reservation is enabled:
>>>>>> max_reservation                   20
>>>>>> * Submission:
>>>>>> #qsub -R y -q long.q -pe lam711 4 job.sh
>>>>> What does qstat -j <jobid> say (with scheduler info turned on)? Is the PE 
>>>>> attached to long.q? - Reuti
>>> 
>>> okay, the PE definition is...? - Reuti
>>> 
>>>> # qconf -sq long.q
>>>> pe_list               make lam711
>>>> ####################################################################################
>>>> #qstat -j xxx
>>>> ==============================================================
>>>> job_number:                 1271926
>>>> exec_file:                  job_scripts/1271926
>>>> submission_time:            Fri Feb 16 18:38:58 2007
>>>> owner:                      steletch
>>>> uid:                        14962
>>>> group:                      mig
>>>> gid:                        233
>>>> sge_o_home:                 /home/mig/steletch
>>>> sge_o_log_name:             steletch
>>>> sge_o_path:                 /opt/sge/bin/lx24-amd64:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/genome/arb-22-08-2003:/usr/local/public/pgaccess:/usr/local/genome/bin:/usr/local/www/bin:/usr/local/public/java/bin:/usr/sbin:/usr/local/public/bin:/usr/local/public/lam/bin:/usr/local/genome/emboss/bin:/usr/local/genome/phylip/bin:/usr/local/genome/mugen/bin:/usr/local/genome/mgadist:/usr/local/genome/MUMmer:/usr/local/genome/TMHMM/bin:/usr/local/genome/hmmer/bin:/usr/local/genome/fasta:/usr/local/genome/autodock/bin:/usr/local/genome/mcl64/mcl-05-321/bin:/usr/local/adm/bin:/usr/local/public/modeller6v2/bin:/home/mig/steletch/bin
>>>> sge_o_shell:                /bin/bash
>>>> sge_o_workdir:              /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>>>> sge_o_host:                 d2r2
>>>> account:                    sge
>>>> cwd:                        /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>>>> path_aliases:               /tmp_mnt/ * * /
>>>> stderr_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.runlog
>>>> reserve:                    y
>>>> mail_options:               aes
>>>> mail_list:                  steletch at jouy.inra.fr
>>>> notify:                     FALSE
>>>> job_name:                   OR1G1-qsub-md.sh
>>>> stdout_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.out
>>>> jobshare:                   0
>>>> hard_queue_list:            long.q
>>>> shell_list:                 /bin/bash
>>>> ...
>>>> script_file:                OR1G1-qsub-md.sh
>>>> parallel environment:  lam711 range: 2
>>>>                             queue instance "long.q@n49" dropped because it is full
>>>>                             queue instance "long.q@n56" dropped because it is full
>>>>                             queue instance "long.q@n48" dropped because it is full
>>>>                             queue instance "long.q@n44" dropped because it is full
>>>> ...
>>>>                             queue instance "long.q@n65" dropped because it is full
>>>>                             queue instance "long.q@n52" dropped because it is full
>>>>                             queue instance "long.q@n53" dropped because it is full
>>>> ...
>>>>                             cannot run in PE "lam711" because it only offers 60 slots
>>>> ...
>>>> long.q has 120 slots and pe lam711 has only 16 slots.
>>>> cc
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>> **************************************************************************
>>>> Christophe Caron - INRA 	    | Tél: (+33) 013465 2888
>>>> Mathematique, Informatique et Genome| Fax: (+33) 013465 2901
>>>> Domaine de Vilvert 		    | Email: christophe.caron at jouy.inra.fr
>>>> F-78350 Jouy-en-Josas		    | http://migale.jouy.inra.fr/
>>>> **************************************************************************



**************************************************************************
Christophe Caron - INRA 	    | Tél: (+33) 013465 2888
Mathematique, Informatique et Genome| Fax: (+33) 013465 2901
Domaine de Vilvert 		    | Email: christophe.caron at jouy.inra.fr
F-78350 Jouy-en-Josas		    | http://migale.jouy.inra.fr/
**************************************************************************


