[GE users] Using customized PE + qrsh

Andy Schwierskott andy.schwierskott at sun.com
Mon Oct 18 13:04:53 BST 2004


In the case of a tight integration, the process group of the shepherd's
child process is what gets signalled.

If the process that should actually receive the signal is in a different
process group, it's possible to configure a customized
{suspend|terminate}_method that signals the processes which should
receive it.
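
As a rough illustration of what such a customized method could do, here is a
minimal Python sketch. It is not part of Grid Engine: the function names and
the command-line layout are hypothetical, and how the PIDs reach the script
(for example via the $job_pid pseudo-variable described in queue_conf(5)) is
an assumption to check against your installation. It only shows the core
idea: look up the process group of each given PID and signal each distinct
group once.

```python
import os
import signal
import sys


def distinct_process_groups(pids):
    """Return the set of process group IDs that the given PIDs belong to."""
    groups = set()
    for pid in pids:
        try:
            groups.add(os.getpgid(pid))
        except ProcessLookupError:
            # The process exited before we could look it up; nothing to signal.
            continue
    return groups


def signal_process_groups(pids, sig):
    """Send `sig` once to every process group containing one of `pids`.

    The caller's own process group is skipped so that the script does not
    stop or kill itself. Returns the sorted list of groups signalled.
    """
    own_group = os.getpgid(0)
    signalled = []
    for pgid in sorted(distinct_process_groups(pids)):
        if pgid == own_group:
            continue
        try:
            os.killpg(pgid, sig)
        except ProcessLookupError:
            # The group vanished between lookup and signalling.
            continue
        signalled.append(pgid)
    return signalled


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical invocation: script.py SIGSTOP <pid> [<pid> ...]
    requested_signal = getattr(signal, sys.argv[1])
    target_pids = [int(arg) for arg in sys.argv[2:]]
    signal_process_groups(target_pids, requested_signal)
```

A real suspend or terminate method would also have to work out which
processes belong to the job; that part is site-specific and is left out here.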

Of course, you may still run into problems with parallel applications
that cannot tolerate being stopped during their run, as Ron mentions below.


> If all the tasks are started by the shepherd, then the
> tight integration should be fine.
> However, suspend usually doesn't work well with
> parallel programs, since suspending a parallel job
> will cause TCP timeouts, and the communication layer
> of the MPI library will fail.
> -Ron
> --- Melvin Koh <melvin at apstc.sun.com.sg> wrote:
>> My question is: will this be a
>> tight integration? In my tests, all
>> resource usage is recorded and all tasks are canceled
>> using qdel, but
>> suspend does not work. Does the application
>> have to suspend itself
>> based on some signals from SGE?
