[GE users] Suspending Parallel Jobs

Shannon V. Davidson svdavidson at charter.net
Thu Sep 25 22:10:20 BST 2008



Thanks Ron - I'll dig through the code and see if I can find it.

Shannon

Ron Chen wrote:
> I remember seeing SGE code that specifically blocks sending the suspend signal to the MPI tasks. From the list discussions, the reason is that if an MPI job is suspended, the TCP/IP network socket calls will time out, and the job will then fail.
>
> I think if we comment out a few lines of code, or only enable that code by a switch, then it will make many people on this list happy, as it is a FAQ.
>   
>  -Ron
>
>
> --- On Fri, 9/26/08, Shannon V. Davidson <svdavidson at charter.net> wrote:
>   
>> I'm trying to suspend a parallel job using a tight PE integration,
>> but the non-local MPI tasks are not being suspended.  Is the tight
>> PE integration code supposed to send the SIGSTOP signal to every MPI
>> task in the job?  Is the suspend method executed on every execution
>> host in a parallel job?
>>
>> Thanks,
>> Shannon
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>     
>   
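
For reference, the local half of the mechanism discussed above (sending
SIGSTOP to every process of a task, then SIGCONT to resume it) can be
sketched as follows. This is only an illustration and not SGE code: the
helper names start_task, suspend_group and resume_group are made up, and a
sleeping Python child stands in for an MPI task. Note that os.killpg only
reaches processes on the host where it runs, so tasks on other execution
hosts would need the signal delivered there separately, which is exactly
the behaviour being asked about.

```python
import os
import signal
import subprocess
import sys


def start_task():
    """Start a stand-in task in its own process group (hypothetical helper)."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,  # child becomes leader of a new process group
    )


def suspend_group(pgid):
    """Stop every process in the group. SIGSTOP cannot be caught or ignored."""
    os.killpg(pgid, signal.SIGSTOP)


def resume_group(pgid):
    """Let every process in the group continue."""
    os.killpg(pgid, signal.SIGCONT)


if __name__ == "__main__":
    task = start_task()
    pgid = os.getpgid(task.pid)

    suspend_group(pgid)
    # WUNTRACED makes waitpid report a stopped child, not only an exited one.
    _, status = os.waitpid(task.pid, os.WUNTRACED)
    print("task stopped:", os.WIFSTOPPED(status))

    resume_group(pgid)
    os.killpg(pgid, signal.SIGTERM)  # clean up the stand-in task
    task.wait(timeout=10)
```

While the task is stopped, any peer waiting on one of its sockets sees no
traffic, which is the timeout concern Ron describes.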


