[GE users] SGE Rescheduling

Sreenath Nampally sreenath at tigr.ORG
Thu Aug 3 14:23:49 BST 2006


Dan,

Could you please point me to the bug you mentioned about
asynchronous rescheduling?
I tried searching the IssueTracker but could not find it.

Thanks again
Sree

Gruhn Daniel J Contractor AF/A9IT wrote:

>One additional thing: I don't think the bug with rescheduling is fixed yet.
>The bug is that rescheduling seems to be an asynchronous process. That is,
>the rescheduled job may be able to start before the original job is
>killed. In my case this makes a difference and I have to compensate for it.
>
>Dan
>
>//SIGNED//
>Daniel J.Gruhn, CTR (Group W Inc.)
>HQ USAF/A9IT
>Studies & Analyses, Assessments and Lessons Learned
>
>
>-----Original Message-----
>From: Reuti [mailto:reuti at staff.uni-marburg.de] 
>Sent: Thursday, August 03, 2006 7:33 AM
>To: users at gridengine.sunsource.net
>Subject: Re: [GE users] SGE Rescheduling
>
>Hi,
>
>On 02.08.2006 at 23:14, Sreenath Nampally wrote:
>
>  
>
>>Hello,
>>
>>Could someone explain the sequence of events that happens in SGE (both
>>on the qmaster and the exec host) when a job is rescheduled and suspended?
>>What signals are sent to the job?
>>    
>>
>
>If the job gets suspended, it will get a SIGSTOP, which you can't catch. But
>you could submit the job with -notify to get a warning beforehand, which you
>can catch. Have a look at `man qsub`; you could even redefine the
>signal: see `man sge_conf`, section execd_params. But be aware that the signal
>will be sent to the whole process group, and this might need proper handling
>in the jobscript and the compiled program.
>
>If you reschedule a job, it will be killed, and before this you could also
>get a warning via -notify. But I think you will only get the information
>about the kill, not the reason, i.e. that it will be rescheduled. Only during
>the next run can you test whether the variable RESTARTED is 1. If you
>need more sophisticated handling, you can also try to use the
>checkpointing interface.
>
>HTH - Reuti
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>  
>
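
As a rough illustration of what Reuti describes above, here is a minimal sketch of a job written in Python that reacts to the -notify warnings and checks RESTARTED on startup. It assumes the default signals documented in `man qsub` (SIGUSR1 ahead of SIGSTOP, SIGUSR2 ahead of SIGKILL) and that they have not been redefined via execd_params. The names NotifyFlags and was_restarted are made up for this example and are not part of SGE.

```python
import os
import signal


def was_restarted(environ=os.environ):
    """Return True if SGE set RESTARTED=1 for this run of the job."""
    return environ.get("RESTARTED", "0") == "1"


class NotifyFlags:
    """Records the warning signals a job receives when submitted with -notify.

    Hypothetical helper for illustration only. Assumes the default signals:
    SIGUSR1 warns of a pending SIGSTOP, SIGUSR2 of a pending SIGKILL.
    """

    def __init__(self):
        self.suspend_pending = False
        self.kill_pending = False

    def install(self):
        # Register the handlers; this must be called from the main thread.
        signal.signal(signal.SIGUSR1, self._on_usr1)
        signal.signal(signal.SIGUSR2, self._on_usr2)

    def _on_usr1(self, signum, frame):
        # Only set a flag here; the main loop decides what to do about it.
        self.suspend_pending = True

    def _on_usr2(self, signum, frame):
        # A kill is coming; it may or may not be part of a reschedule.
        self.kill_pending = True


if __name__ == "__main__":
    flags = NotifyFlags()
    flags.install()
    if was_restarted():
        print("this run is a restart of an earlier attempt")
    else:
        print("first run of this job")
```

The handlers only set flags, so the main loop of the job can poll them at a safe point and save its state. Since the signal goes to the whole process group, any child processes the job starts would need their own handling as well.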
