[GE users] Resource reservation fails for large job

andreas andreas.haas at sun.com
Thu Jun 4 10:30:16 BST 2009


Hi Sabine,

On Wed, 3 Jun 2009, s_kreidl wrote:

> Hi Andreas,
>
>>> Where does SGE get the 70 seconds from?
>>
>> I think this is from default_duration in sched_conf(5):
>>
>>     http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/resource_reservation.txt
>>
>> to verify do a
>>
>>     # qconf -ssconf | grep default_duration
>
> Thanks for the link, I will have a look at this tonight.
> The "qconf -ssconf" delivers:
>   default_duration                  0:00:10
> so 10 seconds, not 70.

It's 10 seconds from default_duration plus 60 seconds duration_offset as described in the specification document above.

The duration_offset is there to compensate for the difference between net and gross job runtimes caused by scheduling overhead, job delivery, etc.
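
As a rough sketch of that arithmetic (the helper names are made up for illustration, and the 60-second offset is simply the value quoted above):

```shell
# Convert an h:mm:ss value, as printed by "qconf -ssconf", into seconds.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Runtime assumed for a job without its own runtime request:
# default_duration plus the duration offset (defaults to 0:01:00 here).
assumed_duration() {
    offset=${2:-0:01:00}
    echo $(( $(hms_to_seconds "$1") + $(hms_to_seconds "$offset") ))
}

assumed_duration 0:00:10    # prints 70
```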

>>> Even more irritating, the reservation times, e.g. 1244742740 (=Thu, 11 Jun 2009 17:52:20 GMT) indeed seem to take the queue's soft runtime limit of 10 days into account.
>>
>> Hm. I could look into this, but I need a more complete 'schedule' file.
>
> How much of a schedule file can you handle, and how much should I attach to my message? I'm getting approximately 30MB of scheduling data every day (doing a daily logrotate).

You can send this 30MB file directly to me by email.

>>> Does resource reservation only work, if the job has a fixed h_rt or s_rt value (if so which one?) provided with the -l option? Or must a corresponding complex be enforced? What exactly does one need to do to get resource reservation running?
>>
>> Either way you need to have control over job wall-clock times. Otherwise any reservation scheduling is in vain.
>> As for a reliable default for each job you add -l h_rt=... or -l s_rt=... into the site-wide sge_request(5) file:
>>
>>     $SGE_ROOT/default/common/sge_request
>>
>> this default gets picked-up by any job that is submitted.
>>
>> For overwriting the defaults the same options can be used at the qsub command line.
>>
>
> I've added the queue's soft and hard runtime limits to sge_request, and also set those limits for the already pending jobs in the queue. I'll send my observations asap.

Good.
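
A minimal sketch of putting such a default into the sge_request file, assuming a POSIX shell. The function name add_default_h_rt is made up for illustration, and 240:00:00 (10 days) is only a placeholder for your own limit:

```shell
# Append "-l h_rt=<limit>" to an sge_request file unless a default is already set.
add_default_h_rt() {
    request_file=$1
    limit=$2
    if grep -q -e '^-l h_rt=' "$request_file" 2>/dev/null; then
        echo "h_rt default already present in $request_file"
    else
        printf '%s\n' "-l h_rt=$limit" >> "$request_file"
    fi
}

# Intended use (run as the SGE admin user):
#   add_default_h_rt "$SGE_ROOT/default/common/sge_request" 240:00:00
```

A single job can still override the default at submission time, e.g.

    # qsub -l h_rt=1:00:00 job.sh

where job.sh is a stand-in for your own job script.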

> However, the settings in the sge_request file can simply be erased by the "-clear" option to the qsub, etc. commands, right?

Right.

> What is the best way to enforce the runtime limits globally?

Well, the use of -clear cannot be disabled.

If you are not happy with this solution you could additionally specify the queue's hard/soft limit as default_duration. 
That way the scheduler knows the maximum job runtime *before* it selects a queue instance and can thus consider it accordingly.
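
To look up the value to copy, something like the following could be used. It is only a sketch: queue_limit is a made-up helper, all.q stands for your queue name, and it assumes the usual two-column "attribute value" layout of the "qconf -sq" output.

```shell
# Print the value of one queue attribute (e.g. s_rt or h_rt) from
# "qconf -sq <queue>" output supplied on standard input.
queue_limit() {
    awk -v attr="$1" '$1 == attr { print $2 }'
}

# Intended use:
#   qconf -sq all.q | queue_limit h_rt
# The printed value can then be entered as default_duration via "qconf -msconf".
```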

>
> One last question: The s_rt limit doesn't show up in the output of "qstat -j jobid". Does this mean, that this is then not taken into account for scheduling?

Queue resource limits apparently cannot be used while the scheduler is looking for an assignment,
but they are applied once an assignment has been found.

Regards,
Andreas

>
> Thanks,
> Sabine
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200686
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

http://gridengine.info/

Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Munich District Court: HRB 161028
Managing Directors: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
Chairman of the Supervisory Board: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=200832
