[GE users] Suspending a job

spow guillaume.quere at fr.thalesgroup.com
Tue Jul 6 13:06:46 BST 2010


Hi reuti,

reuti wrote:
> I would say that there is plenty of end-user documentation. You can use the older PDF docs (several hundred pages: http://docs.sun.com/app/docs/coll/1017.4 ), which are still valid but don't cover all of the new features. Other versions are covered in wikis and also in the Howtos:
>
> http://gridengine.sunsource.net/howto/howto.html
>
>   
I have already read the N1 docs, which is what we use today, but I hadn't
come across the howtos, thank you very much!
The upgrade to SGE 6.2u5 should be complete by next week.
>
> No. Your applications must already support checkpointing on their own, outside of SGE. SGE can then be set up to trigger these existing checkpointing mechanisms.
>
> The checkpointing interface in combination with subordination can be used to requeue a preempted job when a superordinated job starts, though. But as resources are only released after the subordinated job is requeued, the superordinated job must be able to start even though some resources are still blocked by the subordinated job.
>   
I looked at the sample code given in the checkpointing howto: it is too
complicated for end users to implement, since a crashed job would cost
them less time than rewriting all the code they are currently running.
I thought the checkpointing environment was much easier to use!
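
To make concrete what "the application must support checkpointing on its own" means, here is a minimal sketch in Python. It is not taken from the howto: the file name, the function names and the toy workload (summing integers) are all hypothetical, and it assumes the job is a simple loop whose state fits in a small JSON file. A real application would have to save whatever state it actually needs to resume.

```python
import json
import os
import tempfile

STATE_FILE = "state.json"  # hypothetical checkpoint file name


def load_state(path=STATE_FILE):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"next_step": 0, "total": 0}


def save_state(state, path=STATE_FILE):
    """Write the checkpoint to a temp file, then rename it into place,
    so that a kill in the middle of a write cannot corrupt the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def run(steps, path=STATE_FILE, stop_after=None):
    """Sum 0..steps-1, checkpointing after every step.

    stop_after is only there to simulate the job being killed or
    requeued after that many steps in the current run.
    """
    state = load_state(path)
    done = 0
    while state["next_step"] < steps:
        if stop_after is not None and done >= stop_after:
            return state  # interrupted; the checkpoint is already on disk
        state["total"] += state["next_step"]
        state["next_step"] += 1
        save_state(state, path)
        done += 1
    return state
```

Calling run(10, stop_after=4) and then run(10) again continues from step 4 instead of starting over, which is the behaviour a requeued job would need.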
> If you are not satisfied with the above options, you will have to use a co-scheduler, which will requeue the job in question to free up resources. It also needs to take measures to prevent the requeued job from restarting immediately.
>
> Should a parallel job always preempt a serial one in your setup?
>
> -- Reuti
>   
Could you explain further what a co-scheduler is, or give me a URL? I
have been unable to find decent answers on Google.
As for the parallel jobs, they should indeed always preempt serial ones.
Only in a few cases will the administrator decide to stop some of them,
and those cases should be resolved manually.

Thank you for your answer.
G. Q.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266310
