[GE users] Killed by Limit and Transfer to another Queue

kain2log gilbert at rldp.com.ph
Mon Dec 14 03:24:43 GMT 2009


Dear Reuti,
Thank you again for your reply.


> 
> Why on the same machine? One duty of the checkpointing interface is  
> to copy the local data from $TMPDIR to any intermediate storage and  
> then to the new node when the job continues on another machine.

Same machine...
Well, I'm not sure whether our software can be suspended, transferred to another machine, and restarted there, but I know it can be suspended and restarted later.
(By the way, we are using Cadence Spectre.)

I guess I need to read more about our software and about SGE checkpointing too; this is a great idea.

> 
> 
> > The wallclock limit will be so short (about 5mins) that it would be  
> > OK to restart the job. Would checkpoint migration be applicable,  
> > or is there another workaround?
> 
> Then you don't need any checkpointing facility. Just reschedule the  
> job with: qmod -rj <job_id>
> 
> But why do you want to do this? When you know beforehand that the job  
> will run longer than 5 minutes, then you could request this run time  
> (-l h_rt) and SGE would automatically send the job to the correct queue.

Sometimes it's hard to judge how long a simulation will take. Also, even when it is obvious that a job will run for more than 5 minutes, users still submit it to the queue with the limit. So I want the transfer to the other queue to happen automatically.
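
For reference, the two commands mentioned in the quoted reply could be wrapped roughly as below. This is only a sketch under assumptions: the helper names (format_h_rt, submit_cmd, reschedule_cmd) and the job script name run_spectre.sh are made up for illustration, and the helpers only print the command lines rather than running them. The "-r y" flag is added on the assumption that the job must be marked rerunnable for a reschedule to work.

```shell
#!/bin/sh
# Hypothetical helpers: they only print SGE command lines, nothing is submitted.

# Convert a runtime estimate in seconds to the h:mm:ss form usable with -l h_rt.
format_h_rt() {
    secs=$1
    printf '%d:%02d:%02d' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Print a qsub line that requests the runtime up front, so the scheduler can
# pick a queue whose limit fits. $1 = runtime in seconds, $2 = job script.
submit_cmd() {
    printf 'qsub -r y -l h_rt=%s %s\n' "$(format_h_rt "$1")" "$2"
}

# Print the reschedule command from the quoted reply. $1 = job id.
reschedule_cmd() {
    printf 'qmod -rj %s\n' "$1"
}
```

For example, "submit_cmd 1800 run_spectre.sh" prints a qsub line requesting 30 minutes of run time, and "reschedule_cmd <job_id>" prints the matching qmod line. Whether a rescheduled job actually lands in a different queue depends on the queue configuration, which this sketch does not cover.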

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233175