[GE users] soft wallclock limit exceeded: job not killed

Reuti reuti at staff.uni-marburg.de
Tue Jan 8 12:33:03 GMT 2008


Hi,

On 08.01.2008 at 12:04, Alois Dirnaichner wrote:

> We forced our users to submit with the s_rt flag to limit their jobs'
> runtimes.
> The maximum for their request is defined in the queue configuration.
> Notify time is one hour.
> What should happen is this (according to the man pages and the mailing list):
> After the job reaches the s_rt limit, it is sent SIGUSR1, and after one
> additional hour to clean up, it is killed.

Why one hour? If you set it as h_rt, it is not added to s_rt but is a
limit of its own. Is the SIGUSR1 perhaps ignored by the script/program? - Reuti

> Correct me if I'm wrong.
>
> Nevertheless, some jobs manage to escape the procedure:
>
> qstat:
> hard resource_list: h_vmem=4G,s_rt=1296000
> usage 34: cpu=15:15:03:33, mem=1556363.63071 GBs, io=0.00000,
> vmem=1.917G, maxvmem=1.918G
>
> 1296000 s = 15 d; hence, the job has flung off its restraints. Is it
> possible that the CPU time is greater than the wallclock time?
> I don't know how to check the wallclock time with qstat. A quick check
> with ARCo unearthed other recent s_rt violations.
> What is happening?
> Yours,
>
> Al
>
>
> -- 
>
> Alois Dirnaichner
> http://www.theorie.physik.uni-muenchen.de/~al
>
> Rechnerbetriebsgruppe
> Arnold Sommerfeld Center
> Theresienstr. 39
> 80333 Muenchen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
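
The arithmetic behind Al's question can be made explicit. Reading the cpu field as days:hours:minutes:seconds, as his message does, 15:15:03:33 is 1,350,213 s, which is 54,213 s (15 h 3 min 33 s) beyond s_rt=1296000. For a single-threaded job the CPU time cannot exceed the elapsed wallclock time, so the wallclock time would then also be past s_rt; a job running several threads or processes can accumulate CPU time faster than wallclock time, so in that case the CPU figure alone does not prove the limit was overrun. A small Python sketch of the conversion (the function name `sge_time_to_seconds` is hypothetical):

```python
def sge_time_to_seconds(value):
    """Convert a usage time such as '15:15:03:33' to seconds.

    Assumes the fields are [days:]hours:minutes:seconds, as read in the
    message above.
    """
    parts = [int(p) for p in value.split(":")]
    # Pad missing leading fields (days, hours) with zero.
    while len(parts) < 4:
        parts.insert(0, 0)
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


S_RT = 1296000  # s_rt from the hard resource_list line
cpu = sge_time_to_seconds("15:15:03:33")
print(cpu, cpu - S_RT)  # CPU seconds and the overshoot past s_rt
```

It prints `1350213 54213`.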
