[GE users] "h_rt" or "s_rt" for predicting job end times

futurity neil at futurity.co.uk
Thu Feb 12 16:38:56 GMT 2009


I finally understand it :)

Thanks Reuti.  

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 12 February 2009 15:09
To: users at gridengine.sunsource.net
Subject: Re: [GE users] "h_rt" or "s_rt" for predicting job end times

Hi,

Am 12.02.2009 um 15:54 schrieb futurity:

> Thank you Brian for your reply.
>
> I can't see my users coding to listen for a SIGUSR1, so as you say, we 
> may as well use "h_rt".

- h_rt and s_rt are explained in `man queue_conf`, section "RESOURCE LIMITS"

- a runtime prediction that is not enforced has already been requested as an RFE:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=2087

-- Reuti


> Saying that, is there any way for a job to specify how long it should
> run without being terminated if it runs for too long?
>
> Regards
>
> Neil
>
> -----Original Message-----
> From: brs [mailto:brs at usf.edu]
> Sent: 12 February 2009 14:44
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] "h_rt" or "s_rt" for predicting job end times
>
> Neil,
>
> IIRC, they should both override default_duration (in sched_conf) to
> tell the scheduler how long a job should run. The difference is that
> s_rt will send a SIGUSR1 to your process some time before it sends a
> SIGTERM, which allows you to implement a signal handler in your job to
> handle termination properly. The delay between the two signals is set
> in the 'notify' field of the queue configuration. h_rt does not
> provide a SIGUSR1 and, I believe, sends a SIGKILL to the processes,
> terminating them ungracefully.
> This can be bad if any of your jobs catch SIGTERM in order to 
> facilitate a clean-up process before exiting, BUT I have not seen very 
> many codes that catch SIGTERM or SIGUSR1 (on our system at
> least) and so most of our users use h_rt.
>
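
As an illustration of the signal handler Brian describes, here is a minimal Python sketch of a job that reacts to the warning signal. It assumes the warning arrives as SIGUSR1, as stated above. The names handle_soft_limit, run and checkpoint are hypothetical and are not part of Grid Engine.

```python
import signal

# Flag set by the handler; the main loop polls it between work units.
stop_requested = False


def handle_soft_limit(signum, frame):
    """Record that the warning signal arrived; do nothing else here."""
    global stop_requested
    stop_requested = True


def run(work_units, checkpoint):
    """Call each work unit in turn until done or until a stop is requested.

    `checkpoint` is a hypothetical callback that saves progress so that a
    resubmitted job could resume. Returns the number of units completed.
    """
    done = 0
    for unit in work_units:
        if stop_requested:
            checkpoint(done)  # save state before exiting voluntarily
            break
        unit()
        done += 1
    return done


# Install the handler before starting any work.
signal.signal(signal.SIGUSR1, handle_soft_limit)
```

A job written this way would presumably request both limits, for example `qsub -l s_rt=00:55:00,h_rt=01:00:00 job.sh` (the values and script name are only illustrative), so that the warning arrives some time before the hard limit.
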
> Best Regards,
> Brian Smith
>
> futurity wrote:
>>
>> Hi,
>>
>> We're using Grid Engine 6.1 and need some help deciding which out of 
>> "h_rt" and "s_rt" our jobs should be using in order to help the 
>> scheduler predict when jobs will finish.
>>
>> When I posted recently about our reservation problems, Reuti 
>> suggested I look into using "h_rt". Unfortunately the Admin and User 
>> PDF guides don't contain any information on either "h_rt" or "s_rt", 
>> so I had to experiment to find out what it does.
>>
>> From my experiments, it appears that "h_rt" sets a run time per job, 
>> which is used by the scheduler to predict when jobs finish.
>> Unfortunately, it causes jobs to be terminated if they run for longer 
>> than this specified time. I'm guessing that "h_" stands for a hard 
>> limit and this is why jobs are terminated when they exceed this?
>>
>> I'm guessing that "s_rt" is a soft limit? I'm hoping that this means 
>> that once the time specified by the job is reached, that it does 
>> "NOT" terminate the job? i.e. if the user specified the wrong time 
>> limit by accident, or the job ran slower for some reason, that the 
>> job would be allowed to continue running?
>>
>> Does anyone know if "s_rt" is also used by the scheduler in the same 
>> way that "h_rt" is used and if the only difference would be that one 
>> terminates and the other doesn't?
>>
>> Sorry for all these questions but I can't seem to find any 
>> documentation on these two settings. If anyone can point me at some 
>> documentation it would be really appreciated.
>>
>> Many thanks,
>>
>> Neil
>>
>
>
> --
> Brian Smith
> Sr. HPC Systems Administrator
> Research Computing, University of South Florida
> 4202 E. Fowler Ave. ENB308
> Office Phone: +1 813 974-1467
> Organization URL: http://rc.usf.edu
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=104078
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=104136

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


