[GE users] Prevent job execution if a certain wallclock time value is exceeded
reuti at staff.uni-marburg.de
Sun Nov 1 17:45:12 GMT 2009
On 29.10.2009 at 14:06, rafaarco wrote:
> Hello everyone,
> I would like to know the best way in SGE to implement the following
> Let's suppose every user is given 1000 hours of wallclock time in
> to run his jobs. Every time a user submits a job and finishes, this
> job's total wallclock time (slots * actual wallclock time) is deducted
> from the initial user's time. Finally, when this time falls to 0, the
> user is prevented from running (ideally, submitting) more jobs.
At submission time the job may still be eligible to be submitted.
But by the time it would start, the user can already be over the limit.
One place to define that a user can't run anything could be a global
"xuser_lists disabled" (qconf -mconf), adjusting the entries in the
"disabled" list as needed (it's just a conventional user list, whose
name you can choose).
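As a minimal sketch of adjusting the entries in that list from a script: qconf's -au and -du options add a user to, or remove a user from, an access list. The helper names acl_command and set_disabled are made up for illustration, and the list name "disabled" is only the convention from above. The default is a dry run, so nothing is changed unless you ask for it.

```python
import subprocess

# Name of the user list referenced by "xuser_lists" in the global
# configuration; "disabled" is only the convention used in this thread.
DISABLED_LIST = "disabled"


def acl_command(user, disable, acl=DISABLED_LIST):
    """Build the qconf argv that adds (-au) or removes (-du) a user."""
    flag = "-au" if disable else "-du"
    return ["qconf", flag, user, acl]


def set_disabled(user, disable, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = acl_command(user, disable)
    if not dry_run:
        # Must be run by an SGE manager, otherwise qconf refuses the change.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(" ".join(set_disabled("alice", disable=True)))
```

The script prints the command it would run, so it can be reviewed before switching dry_run off.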
For a check at submission time a JSV (job submission verifier) would
do the job: it can check whether the requested wallclock time of the
job would exceed the limit. If the user is already in the above
"disabled" list, he can't submit anything anyway.
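The decision such a JSV would have to make can be sketched as below. This is only the arithmetic, not the JSV protocol itself: how the script receives h_rt and the slot count from SGE, and where seconds_left comes from (e.g. a database), are left out and are assumptions of this sketch. The function names parse_h_rt and admit are hypothetical.

```python
def parse_h_rt(value):
    """Convert an h_rt request given as seconds or as H:M:S to seconds."""
    parts = [int(p) for p in str(value).split(":")]
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError("unsupported h_rt format: %r" % (value,))


def admit(h_rt, slots, seconds_left):
    """Return (accepted, reason) for a job requesting h_rt on `slots` slots.

    seconds_left is the user's remaining budget in slot-seconds; where it
    is stored is outside the scope of this sketch.
    """
    if h_rt is None:
        # Without a requested limit the worst-case cost is unknown.
        return False, "job must request h_rt"
    cost = parse_h_rt(h_rt) * slots
    if cost > seconds_left:
        return False, "h_rt*slots = %d s exceeds the %d s left" % (cost, seconds_left)
    return True, "ok"


if __name__ == "__main__":
    print(admit("1:00:00", 4, 20000))
    print(admit("7200", 4, 20000))
```

Rejecting jobs without h_rt is a design choice here: otherwise a user could bypass the check by simply not requesting a limit.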
> I am thinking of using prolog and epilog to do it. The initial user's
> time is stored somewhere (e.g., a database) and when a job finishes its
> wallclock time is fetched via qacct and the total time left is updated.
> When a job is started, the time is checked so that if the job's h_rt*slots
> is higher than the time left, the job fails to start.
AFAIK the accounting record is written only after the job has finished,
hence in the epilog you can use qacct only to collect the information
of the slave tasks of a parallel job. Also, for parallel jobs it might
be good to set "accounting_summary TRUE"; otherwise you have to
sum up all records for one job on your own.
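To illustrate the bookkeeping, here is a rough sketch that sums slots * ru_wallclock per owner from text in the style of "qacct -j" output. It assumes one "key value" pair per line, records separated by a line of "=" characters, and one record per job (serial jobs, or parallel jobs with accounting_summary TRUE). The sample text is made up, and the exact output layout should be checked against your SGE version.

```python
import re
from collections import defaultdict


def parse_records(text):
    """Split qacct-style text into one dict per accounting record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("====="):
            # A separator line starts a new record.
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records


def leading_number(value):
    """Read the leading number, tolerating a trailing unit suffix."""
    match = re.match(r"\d+(\.\d+)?", value)
    if match is None:
        raise ValueError("no number in %r" % (value,))
    return float(match.group(0))


def charged_seconds(text):
    """Sum slots * ru_wallclock per owner over all records."""
    totals = defaultdict(float)
    for rec in parse_records(text):
        totals[rec["owner"]] += int(rec["slots"]) * leading_number(rec["ru_wallclock"])
    return dict(totals)


# Made-up sample, trimmed to the fields used above.
SAMPLE = """\
==============================================================
owner        alice
jobnumber    101
slots        4
ru_wallclock 3600
==============================================================
owner        bob
jobnumber    102
slots        1
ru_wallclock 120
"""

if __name__ == "__main__":
    print(charged_seconds(SAMPLE))
```

Without accounting_summary there are several records per parallel job, and this simple sum would no longer be the right charge.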
The only option to cover all cases is a separate daemon which checks
the accounting against the granted times. When a user has reached his
limit, he should be added to the "disabled" list and all of his jobs killed.
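One pass of such a daemon could look roughly like this. It is a sketch under assumptions: the granted and used budgets are plain dicts in slot-seconds (their storage is not covered), plan_actions is a hypothetical name, and the commands are only built, not executed. "qconf -au" adds the user to the list and "qdel -u" deletes that user's jobs.

```python
def plan_actions(granted, used, disabled, acl="disabled"):
    """Return the commands needed for users who have used up their budget.

    granted/used map user name -> slot-seconds; disabled is the set of
    users already in the access list, so they are not handled twice.
    """
    commands = []
    for user in sorted(granted):
        if user in disabled:
            continue
        if used.get(user, 0) >= granted[user]:
            # Block new job starts first, then remove what is still there.
            commands.append(["qconf", "-au", user, acl])
            commands.append(["qdel", "-u", user])
    return commands


if __name__ == "__main__":
    budget = 1000 * 3600  # 1000 hours in seconds
    granted = {"alice": budget, "bob": budget}
    used = {"alice": budget, "bob": 10}
    for cmd in plan_actions(granted, used, disabled=set()):
        print(" ".join(cmd))
```

Adding the user to the list before deleting his jobs avoids a window in which a pending job could still be started.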
I wonder whether this is worth an RFE. There is no hook in SGE
where you can define a job epilog which is run after the job has
completed and all accounting records have been written for sure. Such a
process would then run under the SGE admin user (or root) and of
course wouldn't count against the job's wallclock time. Maybe for now
the entry "mailer" can be abused for such a thing (it's started by
the sge_execd), but this would also mean checking whether it was called
because of the begin (b) of a job or because of an abort or end (ea).