[GE users] modifying CPU time on running jobs.

Reuti reuti at staff.uni-marburg.de
Wed Nov 14 17:27:38 GMT 2007


Hi,

On 14.11.2007 at 17:53, Baudilio Tejerina wrote:

> In the configuration you suggest, do you mean having something like  
> h_cpu=24:00:00 and s_cpu=23:00:00  ?
>
> If in either case (reaching h_cpu or s_cpu) the job gets killed, I  
> don't see the benefits or usefulness of using a 'lower' s_cpu or  
> actually, any s_cpu at all.

The SIGXCPU for s_cpu will be sent to the complete process group. The  
default action is to terminate the process with a core dump  
(http://linux.about.com/od/commands/l/blcmdl7_signal.htm). To gain an  
advantage, you need to:

a) ignore the signal in the jobscript, e.g. for bash:

trap '' xcpu

b) install a signal handler for SIGXCPU in the executing program for  
a proper shutdown, e.g. to write intermediate results before the job  
is finally killed when h_cpu is reached
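
As a rough sketch of b) for a job whose "program" is itself a shell loop: the handler only sets a flag, and the loop saves its state and stops at the next step. The names CHECKPOINT, on_xcpu, save_state and run_steps are made up for illustration, and the limits in the `#$` line are just the values from the question. Note that bash runs a trap only after the currently running foreground command has returned, so a long-running binary still needs its own handler (and a) in the jobscript).

```shell
#!/bin/bash
# Hypothetical job script; the limits below are the example values from this thread.
#$ -l h_cpu=24:00:00,s_cpu=23:00:00

CHECKPOINT="${CHECKPOINT:-checkpoint.txt}"   # assumed file name for intermediate results
step=0
stop_requested=0

# Handler for SIGXCPU (soft limit reached): only set a flag here;
# the main loop notices it, saves its state and stops.
on_xcpu() {
    stop_requested=1
}
trap on_xcpu XCPU

# Write the current progress so that a later run could resume from it.
save_state() {
    echo "step=$step" > "$CHECKPOINT"
}

# Run up to $1 steps, stopping early once SIGXCPU has been received.
run_steps() {
    while [ "$step" -lt "$1" ] && [ "$stop_requested" -eq 0 ]; do
        step=$((step + 1))
        # ... one unit of real work would go here ...
    done
    save_state
}
```

In a real job the script would end with something like `run_steps 1000000`; the time between s_cpu and h_cpu is then what is left for save_state to finish.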

HTH -- Reuti


> Thank you so much for your comments.
>
> Baudilio
>
>
> On Nov 8, 2007, at 1:39 PM, Daniel Templeton wrote:
>
>> If both the queue (h_cpu       24:0:0) and the job (-l  
>> h_cpu=24:0:0) specify a CPU time limit, the stricter limit wins.   
>> That means you cannot qalter a job's limit above what is imposed  
>> by the queue.  If you want a grace period for jobs with an h_cpu  
>> limit, try using a lower s_cpu limit in addition.  With s_cpu, the  
>> job gets a SIGXCPU when the limit is met.  The job can catch the  
>> SIGXCPU and try to exit gracefully before the SIGKILL from the  
>> h_cpu limit comes.
>>
>> Note that there is a race condition between Grid Engine and the OS  
>> on h_cpu limits.  The OS will try to send a SIGXCPU, while Grid  
>> Engine tries to send a SIGKILL.  In my experience, the SIGXCPU  
>> usually gets to the job first.
>>
>> Daniel
>>
>> Baudilio Tejerina wrote:
>>> Hi,
>>>
>>> Is it possible to modify the status of a job by extending the CPU  
>>> time limit imposed by the queue (h_cpu                 24:00:00)?
>>> What I'm seeking is to give a specific job a grace period of time  
>>> without altering the queue configuration.
>>>
>>>
>>> I've looked into the 'qalter' utility, but it doesn't seem to  
>>> cover this sort of situation.
>>>
>>>
>>> Thanks,
>>>
>>> Baudilio
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>



