[GE users] What happens after a qdel?

reuti reuti at staff.uni-marburg.de
Wed Oct 6 11:27:16 BST 2010



Hi,

On 06.10.2010 at 11:06, aeszter wrote:

> currently, I am trying to solve a problem with tightly integrated MPI jobs. So I was wondering if there is any documentation on what actually happens when a parallel job gets a qdel, or when it runs out of time (i.e. h_rt is exceeded).
> Apparently, the master node runs the PE's stop_proc_args.

Correct.


> However, the processes on the other nodes seem to get a SIGKILL. Is this a feature of SGE?

Yes, as long as it's a tightly integrated job, i.e. you don't start the slaves by a traditional ssh/rsh but through SGE's `qrsh -inherit ...`, so that SGE is aware of the existence of the slave processes. To check this you can do:

$ ps -e f

(f without a leading -) to get an output showing the parent/child relation of the processes.
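
On a busy node the full tree is long, so it can help to filter it. The following is only a sketch: the helper name `sge_tree` is made up for illustration, and the pattern assumes the usual daemon names (`sge_execd`, `sge_shepherd`) and that the slave tasks were started via `qrsh`. Note that the filter only shows the matching lines themselves; the children listed directly below them in the unfiltered output are the slave processes you are looking for.

```shell
# Hypothetical helper: keep only the lines of a process listing (read from
# stdin) that mention SGE daemons or qrsh-related processes.
# The pattern is an assumption about how these processes are named.
sge_tree() {
    grep -E 'sge_|qrsh'
}

# Intended use on an execution node:
#   ps -e f | sge_tree
```

If the slave processes hang below an sge_shepherd in the unfiltered output, SGE knows about them. If they hang below a plain sshd or rshd started by init, they were started outside of SGE's control.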


Nevertheless, some parallel libraries shut down automatically when the master process is gone. This would then also work when the jobs are not tightly integrated into SGE, but the accounting would be wrong.

Did you set up MPICH2 with tight integration for the mpd startup method? With the upcoming Hydra startup method in MPICH2 it will be much easier, as the tight integration is already built into MPICH2.


> Is it configurable?

There is an entry "terminate_method" in the queue definition, which can do something other than killing by process group (the default) or by additional group id (configurable).
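
To see what is currently set, you can look at the queue configuration. A small sketch: `qconf -sq` prints the queue definition, while the helper name `show_terminate_method` and the queue name `all.q` below are only assumptions for illustration.

```shell
# Hypothetical helper: read a queue definition (as printed by `qconf -sq`)
# from stdin and print only the value of the terminate_method entry.
show_terminate_method() {
    awk '$1 == "terminate_method" { $1 = ""; sub(/^ +/, ""); print }'
}

# Intended use (all.q is an assumed queue name):
#   qconf -sq all.q | show_terminate_method
```

A value of NONE means that the default behavior is used.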

-- Reuti


> Or is the signal coming from a different process (maybe the mpd)?
> 
> 
> Thanks,
> 
> A.
> -- 
> Ansgar Esztermann
> DV-Systemadministration
> Max-Planck-Institut für biophysikalische Chemie, Abteilung 105
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286103
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286137
