[GE users] Problems with Advanced Reservations

reuti reuti at staff.uni-marburg.de
Wed Oct 13 13:52:51 BST 2010


Hi,

On 13.10.2010 at 10:57, pablorey wrote:

>     Hi Reuti,
> 
>     Yes, the "mpi_1p" has a fixed allocation rule of 1. We use it to be sure that GE assigns 1 MPI slot per node.
> 
>     In the attached document you can see the configuration of the mpi and mpi_1p parallel environments. We have also checked other parallel environments with different allocation rules (round_robin, 2, 4, ...) with the same results.
> 
>     You can also find two examples in the attached document. The first of them uses "mpi_1p" to reserve several nodes, and we cannot submit the jobs (except if we request only 1 slot with "-pe mpi_1p 1"). In the second example we use "mpi", so only 1 node is reserved. In this case everything works properly.
> 
>     Regarding the .sge_request file, we don't use it, so we don't request any queue by default. We specify the queues in the qrsub command because we only want to use nodes belonging to those queues.

What I can see in the attached document: did you change the relation of num_proc? By default it is "==", and as it just describes a feature of a machine it shouldn't be touched; the only reason to request it is to get the exact number of cores on certain machines. You are requesting "num_proc=1", which would mean single-core machines.
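
A quick way to check the relation is to look at the fourth column (relop) of the `qconf -sc` output. The following is only a sketch: the helper name `relop_of` is made up for illustration, and the sample line is what I would assume for an unmodified configuration, not something taken from your document.

```shell
# Print the relational operator (4th column of "qconf -sc" output)
# for the complex named in $1. "relop_of" is a hypothetical helper name.
relop_of() {
    awk -v name="$1" '$1 == name { print $4 }'
}

# On a live cluster you would run:  qconf -sc | relop_of num_proc
# Sample line, assumed to match an unmodified complex configuration:
printf 'num_proc p INT == YES NO 0 0\n' | relop_of num_proc
```

If this prints anything other than "==", the relation was changed and a request like "num_proc=1" will no longer mean "exactly one core".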

Near the end of page 2 of the document: you can submit the job with "-w n" but get "no suitable queue(s)" with "-w v". For a normal `qsub`, bypassing the verification would lead to a job that waits forever. But in my test the job started to run inside the AR even when I requested a far too large value for s_rt. Did you observe the same? I would judge it to be a bug, although I'm not sure yet how to phrase it in Issuezilla.

As for the real problem: don't request s_rt or the like in the actual `qsub`, or specify "-w n".
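
As a rough sketch, the submission into the AR could then look like the command printed below. The AR id 42, the slot count 4 and the script name job.sh are placeholders I made up, and `build_qsub_cmd` is a hypothetical helper that only prints the command line instead of running it.

```shell
# Placeholder values, not taken from the thread.
AR_ID=42        # id reported by qrsub when the reservation was created
PE=mpi_1p       # same PE that was requested in the qrsub command
SLOTS=4
SCRIPT=job.sh

# Build the qsub command line for a job inside the AR:
# same PE as the reservation, no s_rt request, verification off ("-w n").
build_qsub_cmd() {
    printf 'qsub -ar %s -pe %s %s -w n %s\n' "$1" "$2" "$3" "$4"
}

# Dry run: print the command instead of executing it.
build_qsub_cmd "$AR_ID" "$PE" "$SLOTS" "$SCRIPT"
```

With "-w n" the job is accepted without validation, so it is worth checking afterwards with `qstat` that it really got scheduled inside the reservation.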

-- Reuti



>     Thank you very much for your help,
>     Pablo
> 
> 
> 
> On 11/10/2010 19:32, reuti wrote:
>> On 11.10.2010 at 16:23, pablorey wrote:
>> 
>> 
>>>     Hi Reuti,
>>> 
>>>     Yes, I always request the same parallel environment used to submit the AR when I submit jobs (mpi_1p or mpi). The first test job is always submitted requesting the same resources used in the qrsub command. As it doesn't work, I change the requirements (num_proc, s_rt, s_vmem, ...) or the number of slots, but I always use the PE requested in the qrsub command.
>>> 
>> And the "mpi_1p" has a fixed allocation rule of 1 then?
>> 
>> For now I can't reproduce this. Can you force the execution with "-w n" instead of "-w v"?
>> 
>> Do you request any queues by default via an .sge_request file?
>> 
>> -- Reuti
>> 
>> 
> 
> -- 
> Pablo Rey Mayo
> Systems Technician
> Centro de Supercomputacion de Galicia (CESGA)
> Avda. de Vigo s/n (Campus Sur)
> 15705 Santiago de Compostela (Spain)
> Tel: +34 981 56 98 10 ext. 233; Fax: +34 981 59 46 16
> email: prey at cesga.es; http://www.cesga.es/
> ------------------------------------------------
> NOTE: This message has been intentionally written without
> accents or special characters, so that it can be displayed
> correctly in any mail client and on any system.
> ------------------------------------------------
> <xacobeo.jpg>
> <AR_problem.pdf>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286868
