[GE users] Killed by Limit and Transfer to another Queue

reuti reuti at staff.uni-marburg.de
Fri Dec 11 12:30:00 GMT 2009


On 11.12.2009 at 00:48, kain2log wrote:

>> There is no such facility in SGE. What can be done is to set up
>> checkpointing (when the applications support this on their own
>> already), and use qalter to change the requested queue for the
>> continuation. So, checkpointing's "migration" would not migrate to
>> another machine but to another queue (possibly on the same machine).
>>
>>
>>
>> -- Reuti
>
> Reuti
>
> Thank you.
>
> If I want to resume or continue the job, then it would
> definitely have to be on the same machine.
> (I plan to do this in the future)

Why on the same machine? One duty of the checkpointing interface is  
to copy the local data from $TMPDIR to any intermediate storage and  
then to the new node when the job continues on another machine.


> The wallclock limit will be so short (about 5 minutes) that it would be
> OK to restart the job. Would checkpointing's migration be applicable,
> or is there another workaround?

Then you don't need any checkpointing facility. Just reschedule the
job with: qmod -rj <job_id>
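
A small sketch of how this could be wrapped, assuming a numeric job id.
The helper name is made up, and it only prints the command so you can
check it before running it:

```shell
#!/bin/sh
# Hypothetical helper: build the reschedule command for one job.
# It only prints the command; pipe the output to sh to actually run it.
reschedule_cmd() {
    job_id=$1
    case $job_id in
        ''|*[!0-9]*)
            # Reject empty or non-numeric ids.
            echo "usage: reschedule_cmd <numeric job_id>" >&2
            return 1
            ;;
    esac
    echo "qmod -rj $job_id"
}
```

For example, `reschedule_cmd 4711` prints `qmod -rj 4711`. Rescheduling
typically needs operator or manager privileges, and depending on the
setup the job or its queue may have to be flagged as rerunnable.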

But why do you want to do this? If you know beforehand that the job
will run longer than 5 minutes, you could request this run time
(-l h_rt) and SGE would automatically send the job to the correct queue.
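
A sketch of such a request, assuming a job script called job.sh (a
placeholder name) and an expected run time given in minutes. The helper
function is hypothetical and only formats the value for h_rt:

```shell
#!/bin/sh
# Hypothetical helper: turn whole minutes into the h:mm:ss form
# that can be passed to "-l h_rt=...".
h_rt_from_minutes() {
    mins=$1
    printf '%d:%02d:00\n' $((mins / 60)) $((mins % 60))
}

# Print the submit command for a job expected to need about 10 minutes.
# job.sh is a placeholder for your own job script.
echo "qsub -l h_rt=$(h_rt_from_minutes 10) job.sh"
```

With a 5 minute limit on the short queue, a job requesting 10 minutes
can only be scheduled to a queue whose limit allows it.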

-- Reuti


> Best Regards.
> Gilbert
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232697
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232777