[GE users] Problems with Advanced Reservations

reuti reuti at staff.uni-marburg.de
Mon Oct 11 12:19:30 BST 2010


Hi,

On 11.10.2010 at 12:53, pablorey wrote:

>     We are running some tests with Advance Reservations (AR) and have hit a problem that we need to solve before we can start using them.
> 
>     We submitted the AR without problems and the nodes were reserved properly for the required time:
> 
> prey@fs001:~> qrsub -q small_queue*,medium_queue*,large_queue*,superdome* -l num_proc=16,s_rt=01:00:00,s_vmem=10G,h_fsize=20G -pe mpi 10 -a 10111115 -d 3:00:00
> 
> prey@fs001:~> qrstat -ar 83
> --------------------------------------------------------------------------------
> id                             83
> name
> owner                          prey
> state                          r
> start_time                     10/11/2010 11:15:00
> end_time                       10/11/2010 14:15:00
> duration                       03:00:00
> submission_time                10/11/2010 11:12:38
> group                          root
> account                        sge
> resource_list                  num_proc=16, s_rt=3600, s_vmem=10G, h_fsize=20G
> granted_slots_list             small_queue@cn008.null=1,medium_queue@cn014.null=1,medium_queue@cn015.null=1,medium_queue@cn026.null=1,medium_queue@cn027.null=1,medium_queue@cn028.null=1,medium_queue@cn029.null=1,medium_queue@cn030.null=1,medium_queue@cn032.null=1,medium_queue@cn033.null=1
> granted_parallel_environment   mpi slots 10
> 
>     The problem appears when we submit a job associated with this AR:
> 
> prey@fs001:~> qsub.orig -w v -ar 83 -l num_proc=16,s_rt=01:00:00,s_vmem=10G,h_fsize=20G -pe mpi 2 test2.sh
> Unable to run job: Job 2407116 cannot run in queue instance "all.q" because it was not reserved by advance reservation 83
> Job 2407116 cannot run in queue instance "meteogalicia_HP" because it was not reserved by advance reservation 83
> .....
> Job 2407116 cannot run in queue instance "failed_nodes" because it was not reserved by advance reservation 83
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn014.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn015.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn026.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn027.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn028.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn029.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn030.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn032.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 (-l h_fsize=20G,num_proc=16,s_rt=3600,s_vmem=10G) cannot run in queue "medium_queue@cn033.null" because it offers only qf:s_rt=00:00:00
> Job 2407116 cannot run in PE "mpi" because it only offers 1 slots
> verification: no suitable queues.
> Exiting.
> 
>     As you can see, the problem seems to affect 9 of the 10 reserved nodes (all of them in the same queue, medium_queue). We have tried requesting different s_rt values, without success. We have also tried requesting different numbers of mpi slots. It only works when we request "-pe mpi 1", because one node (small_queue@cn008.null) seems to be reserved properly.
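
A rough sketch of how the verification output above could be tallied by rejection reason, so that it is easy to see how many queue instances fail for the same cause. The function name summarize_rejections is hypothetical, and the pattern assumes the "offers only ..." wording shown in the qsub output above:

```shell
# summarize_rejections: read `qsub -w v` verification output on stdin and
# count how often each "offers only <resource>=<value>" reason occurs.
# (Hypothetical helper; it only matches the message wording quoted above.)
summarize_rejections() {
    grep -o 'offers only [^ ]*' | sort | uniq -c
}

# Possible usage, with the command line taken from the report above:
#   qsub.orig -w v -ar 83 -l num_proc=16,s_rt=01:00:00,s_vmem=10G,h_fsize=20G \
#       -pe mpi 2 test2.sh 2>&1 | summarize_rejections
```

Each output line is a count followed by the reason, so one line with a single s_rt value would suggest that all rejected instances share the same setting.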
> 
>     Any ideas? What should we check?

What is the setting of s_rt in the queue definition?
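
A minimal sketch of how that setting could be looked up, assuming the usual `qconf -sq` dump of a cluster queue and taking medium_queue from the output above as the example name. The helper name show_rt_limits is hypothetical:

```shell
# show_rt_limits QUEUE: print the soft and hard runtime limits (s_rt, h_rt)
# from the configuration of the given cluster queue.
# (Hypothetical helper around `qconf -sq`; the queue name is an assumption.)
show_rt_limits() {
    qconf -sq "$1" | grep -E '^(s_rt|h_rt)[[:space:]]'
}

# Possible usage:
#   show_rt_limits medium_queue
#   show_rt_limits small_queue
```

If s_rt shows 00:00:00 there, possibly as a per-host override, that would be consistent with the "offers only qf:s_rt=00:00:00" messages, but this is only a guess until the configuration has been checked.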

-- Reuti


> 
>     I am very sorry if this is a known issue. AR is something new for me.
> 
>     Thanks for reading.
> 
> -- 
> Pablo Rey Mayo
> Systems Technician
> Centro de Supercomputacion de Galicia (CESGA)
> Avda. de Vigo s/n (Campus Sur)
> 15705 Santiago de Compostela (Spain)
> Tel: +34 981 56 98 10 ext. 233; Fax: +34 981 59 46 16
> email: prey at cesga.es; http://www.cesga.es/
> ------------------------------------------------
> NOTE: This message was intentionally written without accents or
> special characters so that it displays correctly in any mail
> client and system.
> ------------------------------------------------

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286485
