[GE users] SGE Rescheduling

Gruhn Daniel J Contractor AF/A9IT Daniel.Gruhn.ctr at pentagon.af.mil
Thu Aug 3 15:34:35 BST 2006


Sreenath,

Look at issue 1440:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=1440

The issue also points to the related discussion-list thread, which you can read.

Dan 

 
//SIGNED//
Daniel J.Gruhn, CTR (Group W Inc.)
HQ USAF/A9IT
Studies & Analyses, Assessments and Lessons Learned


-----Original Message-----
From: Sreenath Nampally [mailto:sreenath at tigr.ORG] 
Sent: Thursday, August 03, 2006 9:24 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] SGE Rescheduling

Dan,

Could you please point me to the bug you mentioned about asynchronous
rescheduling?
I tried searching the IssueTracker but could not find it.

Thanks again
Sree

Gruhn Daniel J Contractor AF/A9IT wrote:

>One additional thing: I don't think the bug with rescheduling is fixed yet.
>The bug is that rescheduling seems to be an asynchronous process. That
>is, the rescheduled job may be able to start before the original
>job is killed. In my case this makes a difference, and I have to compensate
>for it.
>
>Dan
>
>//SIGNED//
>Daniel J.Gruhn, CTR (Group W Inc.)
>HQ USAF/A9IT
>Studies & Analyses, Assessments and Lessons Learned
>
>
>-----Original Message-----
>From: Reuti [mailto:reuti at staff.uni-marburg.de]
>Sent: Thursday, August 03, 2006 7:33 AM
>To: users at gridengine.sunsource.net
>Subject: Re: [GE users] SGE Rescheduling
>
>Hi,
>
>On 02.08.2006 at 23:14, Sreenath Nampally wrote:
>
>  
>
>>Hello,
>>
>>Could someone explain the sequence of events that happens in SGE (both
>>on qmaster and exec host) when a job is rescheduled and suspended?
>>What signals are sent to the job?
>>    
>>
>
>If the job gets suspended, it will receive a SIGSTOP, which you can't catch.
>But you could submit the job with -notify to get a warning signal
>beforehand, which you can catch. Have a look at `man qsub`; you can even
>redefine the signal, see `man sge_conf`, section execd_params. But be aware
>that the signal will be sent to the whole process group, and this might need
>proper handling in the jobscript and the compiled program.
>
>If you reschedule a job, it will be killed, and before this you can
>also get a warning via -notify. But I think you will only be told about
>the kill, not the reason, i.e. that the job will be rescheduled. Only
>during the next run can you test whether the variable RESTARTED is 1.
>If you need more sophisticated handling, you can also try the
>checkpointing interface.
>
>HTH - Reuti
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>  
>
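
For reference, the handling Reuti describes in the quoted message could look
roughly like the Python sketch below. It assumes the job was submitted with
`qsub -notify` and that the warning signals are left at the defaults given in
`man qsub` (SIGUSR1 before the suspend, SIGUSR2 before the kill); if they were
redefined through execd_params, the signal numbers need adjusting. The names
`events`, `was_restarted`, `on_warning` and `install_handlers` are made up for
this illustration and are not part of SGE.

```python
import os
import signal

# Hypothetical in-memory log of the warning signals this job has received.
events = []


def was_restarted(env=os.environ):
    """Return True if SGE marks this run as a restart (RESTARTED is 1)."""
    return env.get("RESTARTED", "0") == "1"


def on_warning(signum, frame):
    # Keep the handler small: record the warning and let the main code react.
    events.append(signal.Signals(signum).name)


def install_handlers():
    # Assumed defaults with -notify: SIGUSR1 precedes SIGSTOP (suspend),
    # SIGUSR2 precedes SIGKILL (e.g. when the job is rescheduled).
    signal.signal(signal.SIGUSR1, on_warning)
    signal.signal(signal.SIGUSR2, on_warning)


if __name__ == "__main__":
    install_handlers()
    if was_restarted():
        print("This run is a restart; resume from saved state if there is any.")
```

As noted in the quoted message, the SIGUSR2 warning only says that a kill is
coming, not that the job will be rescheduled, so the RESTARTED check on the
next run is the only place where the restart itself becomes visible.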

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net
