[GE users] LAM / MPI : Reservation

christophe.caron at jouy.inra.fr
Tue Feb 20 18:13:59 GMT 2007



Hello all,


In fact, reservation seems to work correctly (after running some basic MPI tests).

The mistake concerned how the allocation_rule ($fill_up) works; it came from
our misinterpretation of how SGE applies it.

It seems that SGE "locks" the first node on which a running job finishes and
then waits to fill up that node. If other nodes release slots, the MPI job
still does not start, even if the total number of free slots is greater than
the reservation.
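For illustration, a minimal Python sketch of the documented $fill_up rule: starting from the best-suited host, all of its free slots are taken before the next host is used. This is a toy model, not SGE code. The host order and the slot counts are assumptions (real SGE ranks hosts by load or queue sequence number), and it does not model the reservation "locking" behaviour described above.

```python
from typing import Dict, List, Optional, Tuple


def fill_up(hosts: List[Tuple[str, int]], requested: int) -> Optional[Dict[str, int]]:
    """Toy model of the $fill_up allocation rule (not SGE source code).

    `hosts` is a list of (hostname, free_slots) assumed to be already
    sorted in the scheduler's preference order. Every free slot on the
    first host is used before moving to the next host, until `requested`
    slots are allocated. Returns host -> slots, or None if there are not
    enough free slots in total (the job would stay pending).
    """
    allocation: Dict[str, int] = {}
    remaining = requested
    for name, free in hosts:
        if remaining == 0:
            break
        # Take as many slots as this host offers, up to what is still needed.
        take = min(free, remaining)
        if take > 0:
            allocation[name] = take
            remaining -= take
    return allocation if remaining == 0 else None
```

With hypothetical free slots n44 = 2, n48 = 1 and n49 = 3, `fill_up([("n44", 2), ("n48", 1), ("n49", 3)], 4)` packs the 4-slot request as 2 + 1 + 1; with only 2 free slots in total it returns None.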

Thanks again Reuti.

cc


> Hi again,
>
> On 16.02.2007 at 18:49, christophe.caron at jouy.inra.fr wrote:
>
>> On Fri, 16 Feb 2007, Reuti wrote:
>> 
>>> On 16.02.2007 at 17:39, christophe.caron at jouy.inra.fr wrote:
>>> 
>>>> Hello,
>>>> We use SGE 6u9 + LAM/MPI 7.1.1 and FSS scheduling.
>>>> We would like to run MPI jobs with 4 slots, but other "normal" jobs are
>>>> always dispatched before the MPI jobs in the queue (even when the MPI job
>>>> is first in the pending list).
>>>> *Reservation is enabled:
>>>> max_reservation                   20
>>>> *Submission:
>>>> #qsub -R y -q long.q -pe lam711 4 job.sh
>>> 
>>> What does qstat -j <jobid> say (with scheduler info turned on)? Is the PE
>>> attached to long.q? - Reuti
>
> okay, the PE definition is...? - Reuti
>
>> 
>> # qconf -sq long.q
>> pe_list               make lam711
>> 
>> 
>> ####################################################################################
>> #qstat -j xxx
>> ==============================================================
>> job_number:                 1271926
>> exec_file:                  job_scripts/1271926
>> submission_time:            Fri Feb 16 18:38:58 2007
>> owner:                      steletch
>> uid:                        14962
>> group:                      mig
>> gid:                        233
>> sge_o_home:                 /home/mig/steletch
>> sge_o_log_name:             steletch
>> sge_o_path:                 /opt/sge/bin/lx24-amd64:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/usr/local/genome/arb-22-08-2003:/usr/local/public/pgaccess:/usr/local/genome/bin:/usr/local/www/bin:/usr/local/public/java/bin:/usr/sbin:/usr/local/public/bin:/usr/local/public/lam/bin:/usr/local/genome/emboss/bin:/usr/local/genome/phylip/bin:/usr/local/genome/mugen/bin:/usr/local/genome/mgadist:/usr/local/genome/MUMmer:/usr/local/genome/TMHMM/bin:/usr/local/genome/hmmer/bin:/usr/local/genome/fasta:/usr/local/genome/autodock/bin:/usr/local/genome/mcl64/mcl-05-321/bin:/usr/local/adm/bin:/usr/local/public/modeller6v2/bin:/home/mig/steletch/bin
>> sge_o_shell:                /bin/bash
>> sge_o_workdir:              /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>> sge_o_host:                 d2r2
>> account:                    sge
>> cwd:                        /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/md
>> path_aliases:               /tmp_mnt/ * * /
>> stderr_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.runlog
>> reserve:                    y
>> mail_options:               aes
>> mail_list:                  steletch at jouy.inra.fr
>> notify:                     FALSE
>> job_name:                   OR1G1-qsub-md.sh
>> stdout_path_list:           /projet/mig/steletch/20070213OR1G1_256lipids_nouveau_protocole/OR1G1-md.out
>> jobshare:                   0
>> hard_queue_list:            long.q
>> shell_list:                 /bin/bash
>> ...
>> script_file:                OR1G1-qsub-md.sh
>> parallel environment:  lam711 range: 2
>>                             queue instance "long.q@n49" dropped because it is full
>>                             queue instance "long.q@n56" dropped because it is full
>>                             queue instance "long.q@n48" dropped because it is full
>>                             queue instance "long.q@n44" dropped because it is full
>> ...
>>                             queue instance "long.q@n65" dropped because it is full
>>                             queue instance "long.q@n52" dropped because it is full
>>                             queue instance "long.q@n53" dropped because it is full
>> ...
>>                             cannot run in PE "lam711" because it only offers 60 slots
>> ...
>> 
>> 
>> long.q has 120 slots, but PE lam711 has only 16 slots.
>> 
>> 
>> cc
>> 
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>> 
>>> 
>> 
>> 
>> 
>> **************************************************************************
>> Christophe Caron - INRA 	    | Tél: (+33) 013465 2888
>> Mathematique, Informatique et Genome| Fax: (+33) 013465 2901
>> Domaine de Vilvert 		    | Email: christophe.caron at jouy.inra.fr
>> F-78350 Jouy-en-Josas		    | http://migale.jouy.inra.fr/
>> **************************************************************************
>> 
>
>
>
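The qstat output quoted above involves two separate limits: long.q offers 120 slots, while PE lam711 caps all of its jobs together at 16 slots. A parallel job can only be dispatched when it fits under both. The toy check below is plain Python, not SGE code; the function name and all example numbers except 16 are hypothetical.

```python
from typing import Dict


def can_dispatch(requested: int, pe_total: int, pe_in_use: int,
                 queue_free: Dict[str, int]) -> bool:
    """Toy admission check for a parallel job (hypothetical helper).

    The PE `slots` value is shared by every job running in that PE, so
    only `pe_total - pe_in_use` slots remain for a new job. The job must
    also find enough free slots in the queue instances it may use. The
    real scheduler additionally checks the allocation rule, resource
    requests and reservations, which are not modelled here.
    """
    pe_free = pe_total - pe_in_use
    return requested <= pe_free and requested <= sum(queue_free.values())
```

For example, if 14 of the 16 PE slots are already in use, `can_dispatch(4, 16, 14, {"n44": 2, "n48": 2})` refuses a 4-slot job even though the queue instances still have 4 free slots.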








