[GE users] Resource reservation fails for large job

s_kreidl sabine.kreidl at uibk.ac.at
Wed Jun 3 17:03:06 BST 2009

Hi Andreas,

> > Where does SGE get the 70 seconds from?
> I think this is from default_duration in sched_conf(5):
>     http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/devel/rfe/resource_reservation.txt
> to verify do a
>     # qconf -ssconf | grep default_duration

Thanks for the link, I will have a look at this tonight. 
The "qconf -ssconf" delivers:
   default_duration                  0:00:10
so 10 seconds, not 70.

> > Even more irritating, the reservation times, e.g. 1244742740 (= Thu, 11 Jun 2009 17:52:20 GMT), do indeed seem to take the queue's soft runtime limit of 10 days into account.
> Hm. I could look into this, but I need a more complete 'schedule' file.

How much of a schedule file can you devour, or rather, how much should I attach to my message? I'm getting approximately 30 MB of scheduling data every day (doing a daily logrotate).
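
For reference, the raw epoch values in the schedule file can be turned into readable dates on the command line. This is only a minimal sketch: it assumes GNU date (the "-d @SECONDS" form is a GNU extension), and the function name epoch_to_utc is made up for illustration.

```shell
#!/bin/sh
# Convert a Unix epoch timestamp (seconds) to a readable UTC date.
# Assumes GNU date; LC_ALL=C keeps day and month names in English.
epoch_to_utc() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# The reservation time quoted above.
epoch_to_utc 1244742740
```

Applied to the value quoted above, this should reproduce the date given in parentheses there.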
> >
> > Does resource reservation only work if the job has a fixed h_rt or s_rt value (and if so, which one?) provided with the -l option? Or must a corresponding complex be enforced? What exactly does one need to do to get resource reservation running?
> Either way you need to have control over job wall-clock times. Otherwise any reservation scheduling is in vain.
> As for a reliable default for each job you add -l h_rt=... or -l s_rt=... into the site-wide sge_request(5) file:
>     $SGE_ROOT/default/common/sge_request
> this default gets picked up by any job that is submitted.
> To override the defaults, the same options can be used on the qsub command line.

I've added the queue's soft and hard runtime limits to sge_request, and also set those limits for the already pending jobs in the queue. I'll send my observations asap.
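
As a rough illustration of such a default, the sketch below only prints a candidate request line; it does not touch any file. The 10-day figure is the queue limit mentioned in this thread, the choice of h_rt over s_rt is an assumption, and the helper name days_to_hms is hypothetical.

```shell
#!/bin/sh
# Convert a whole number of days into the H:MM:SS form used with -l h_rt / -l s_rt.
days_to_hms() {
    printf '%d:00:00\n' "$(( $1 * 24 ))"
}

# Print (do not install) a candidate default request line.
# 10 days is the queue limit from this thread; h_rt is an assumed choice.
echo "-l h_rt=$(days_to_hms 10)"
```

The printed line is the kind of entry that would go into $SGE_ROOT/default/common/sge_request, or be passed directly on the qsub command line to override the default.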

However, the settings in the sge_request file can simply be erased by passing the "-clear" option to qsub and the related commands, right?
What is the best way to enforce the runtime limits globally?

One last question: The s_rt limit doesn't show up in the output of "qstat -j jobid". Does this mean that it is not taken into account for scheduling?


