[GE users] Spectre checkpoint

reuti reuti at staff.uni-marburg.de
Wed Sep 9 12:21:02 BST 2009


On 09.09.2009 at 13:05, veerendra_n wrote:

> Hi,
>
> Please let me know if there is a solution.

I already answered this two days ago; please check the archive for
my email on forcing a job to run on the same host again.

-- Reuti
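For reference, here is a minimal sketch of the application-level checkpointing this thread is circling around, written in Python purely for illustration. With only suspend_method/resume_method set, a suspended process and its memory stay on HOST B, so the work can only continue on HOST C if the job saves its own state somewhere HOST C can read and restarts from it. Assumptions, none of which come from the original setup: the job catches SIGTSTP, writes a checkpoint and exits; the checkpoint path comes from a hypothetical CKPT_FILE environment variable pointing at shared storage; and the Grid Engine version in use re-queues a job that exits with status 99. Spectre itself would need its own save/restart support for the same idea to apply to a real simulation.

```python
import json
import os
import signal
import tempfile

# Hypothetical: CKPT_FILE must point at storage every execution host can
# read (e.g. a shared NFS directory); a file on HOST B's local disk would
# be invisible to HOST C.
CKPT_PATH = os.environ.get("CKPT_FILE", "job.ckpt.json")

# Assumption: the Grid Engine version in use re-queues a job whose script
# exits with status 99. Check the documentation for your installation.
RESCHEDULE_EXIT = 99

_stop_requested = False


def _request_stop(signum, frame):
    """Signal handler: only record the request; the main loop acts on it."""
    global _stop_requested
    _stop_requested = True


def load_checkpoint(path):
    """Return the next step to run, or 0 if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)["next_step"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_step):
    """Write to a temp file, then rename, so readers never see a half file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_step": next_step}, f)
    os.replace(tmp, path)


def run(path, total_steps, should_stop):
    """Continue from the last checkpoint; return (next_step, finished)."""
    step = load_checkpoint(path)
    while step < total_steps:
        # ... one unit of real work would go here ...
        step += 1
        if step < total_steps and should_stop():
            save_checkpoint(path, step)
            return step, False
    save_checkpoint(path, step)
    return step, True


def main(total_steps=1_000_000):
    """Job entry point: checkpoint and exit on SIGTSTP instead of stopping."""
    signal.signal(signal.SIGTSTP, _request_stop)
    _, finished = run(CKPT_PATH, total_steps, lambda: _stop_requested)
    return 0 if finished else RESCHEDULE_EXIT
```

A job script would end with `raise SystemExit(main())`. Note the trade-off: once the job catches SIGTSTP, the suspend signal no longer stops the process in place; it saves state and exits instead, which is what makes a restart on a different host possible at all.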

>
> Regards
> veerendra
>
> -----Original Message-----
> From: Veerendra [mailto:veerendra at yashasvi.co.in]
> Sent: Tuesday, September 08, 2009 6:57 PM
> To: 'users'
> Subject: RE: [GE users] Spectre checkpoint
>
> Here is the test setup:
>
> Qmaster - Host A
> Execution host - Host B
> Execution host - Host C
>
> Configuration: in the queue configuration, under Execution method,
> I have configured
>
> SUSPEND METHOD - SIGTSTP
> RESUME METHOD - SIGCONT  (based on the Spectre documentation)
>
> When I submit a job using qsub, it starts executing on HOST B. When I
> reschedule the job partway through and HOST B is not free, the job
> starts on HOST C from the beginning (it does not resume).
>
> However, if HOST B is available, the job resumes from where it left
> off.
>
> My requirement is for the job to resume on HOST C as well. (I have
> not configured checkpointing yet; only the Execution method has been
> configured on the queue.)
>
> Regards
> Veeru!
>
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Tuesday, September 08, 2009 6:42 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Spectre checkpoint
>
> On 08.09.2009 at 14:36, veerendra_n wrote:
>
>> Hi Reuti
>>
>> Thanks for the response.
>>
>> My requirement is that when I reschedule a Spectre job running on
>> host x, it resumes on host y.
>
> I answered this yesterday.
>
>
>> What configuration needs to be in place to achieve this? If a
>> checkpoint configuration is the answer, how do I go about it?
>
> I still don't get it: do you have a working checkpointing facility
> right now just by setting up the suspend_method and resume_method?
> Suspended jobs stay on the same machine and will continue later on
> that same machine.
>
> -- Reuti
>
>
>> Regards
>> Veeru!
>>
>> -----Original Message-----
>> From: reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: Tuesday, September 08, 2009 5:32 PM
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Spectre checkpoint
>>
>> Hi,
>>
>> On 08.09.2009 at 12:01, veerendra_n wrote:
>>
>>> I'm trying to configure checkpointing for a Spectre job. I pass
>>> SIGTSTP and SIGCONT in the execution method, and it works very
>>> well when the job is rescheduled on the same host.
>>>
>>> However, the problem arises when the rescheduled job runs on a
>>> different host from the one where it started: it restarts from the
>>> beginning instead of resuming. Right now we have only configured
>>> the Execution method in the queue configuration (Suspend method:
>>> SIGTSTP; Resume method: SIGCONT).
>>>
>>> How should I configure checkpointing?
>>
>> Does the job quit by itself after writing the checkpoint file on
>> SIGTSTP? If you only defined the suspend and resume methods, the job
>> stays on the node and won't get rescheduled at all. Therefore I
>> don't understand your question in detail.
>>
>> -- Reuti
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216398
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216553

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


