[GE users] Slotwise subordinate suspension ignores suspend_method

reuti reuti at staff.uni-marburg.de
Thu Apr 15 22:17:42 BST 2010


Hi,

On 15.04.2010 at 19:31, gracklewolf wrote:

> I've configured 3 queues into a slotwise subordinate configuration  
> like so:
>
> E.q: subordinate_list  slots=8(A.q:0:sr)
> A.q: subordinate_list  slots=8(P.q:1:sr)
> P.q: subordinate_list  NONE
>
> I'm running some OpenMPI jobs in P.q.  I've configured P.q's
> suspend_method to use SIGTSTP and resume_method to use SIGCONT so
> that the MPI jobs will suspend all of their children properly.
>
> Everything works perfectly if I suspend an OpenMPI job by hand with
> `qmod -sj <mpi_job_id>'.  The master mpirun receives the SIGTSTP
> signal and broadcasts the expected SIGSTOP signal to its children
> and the children all stop.  `qmod -usj <mpi_job_id>' will start the
> whole MPI job again.
>
> However, if E.q or A.q fill up with jobs and there is an OpenMPI job  
> running in P.q, the MPI job will show status (S)ubordinate in qstat,

I'm not sure whether I understand this correctly. For me, when a
slave slot is suspended by a slotwise suspend, the complete parallel
job keeps running and still shows (r), not (S).
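
For reference, the slotwise entries in the quoted configuration can be
taken apart as in the minimal Python sketch below. The function name
parse_slotwise is hypothetical, and the defaults (seq_no 0, action
"sr") as well as the comma separator between several queues are my
reading of queue_conf(5), so treat them as assumptions rather than as
the definitive grammar.

```python
import re

# Matches e.g. "slots=8(A.q:0:sr)": a slot threshold plus a queue list.
ENTRY = re.compile(r"^slots=(\d+)\((.*)\)$")


def parse_slotwise(value):
    """Parse one slotwise subordinate_list value into a dict, or None."""
    value = value.strip()
    if value.upper() == "NONE":
        return None
    match = ENTRY.match(value)
    if match is None:
        raise ValueError("not a slotwise subordinate_list: %r" % value)
    queues = []
    # Assumption: several subordinated queues are separated by commas.
    for item in match.group(2).split(","):
        item = item.strip()
        if not item:
            continue
        parts = item.split(":")
        queues.append({
            "queue": parts[0],
            # Assumed defaults when the optional fields are left out.
            "seq_no": int(parts[1]) if len(parts) > 1 else 0,
            "action": parts[2] if len(parts) > 2 else "sr",
        })
    return {"threshold": int(match.group(1)), "queues": queues}
```

Applied to the E.q line, parse_slotwise("slots=8(A.q:0:sr)") gives a
threshold of 8 and one subordinated queue, A.q, with seq_no 0 and
action "sr".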


> but the suspend_method signal is not sent to the master MPI job as  
> it would if it were suspended.  Is this expected behavior?  Why the  
> disconnect between subordinate status and suspend_method?

I think it's a bug in the implementation of slotwise suspend. With the
original implementation, which suspends a complete queue instance, the
suspension of a slave task also put the master task of this job into
suspension (although it's on another exechost), as a partially running
parallel job makes no sense.
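
To illustrate why the quoted setup uses SIGTSTP, here is a small
Python sketch of a master process that relays the signal to its
children, in the way the original poster describes for mpirun. It is
not Open MPI's actual code; start_workers and install_forwarding are
hypothetical names, and the children are plain sleeping processes
standing in for MPI ranks.

```python
import os
import signal
import subprocess
import sys


def start_workers(count):
    """Start `count` idle child processes standing in for MPI ranks."""
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def install_forwarding(workers):
    """Catch SIGTSTP/SIGCONT in the master and relay them to the workers."""

    def relay(signum, frame):
        # SIGTSTP can be caught, so the master gets a chance to act.
        # SIGSTOP cannot be caught, so a master stopped that way could
        # not relay anything to its children.
        out = signal.SIGSTOP if signum == signal.SIGTSTP else signal.SIGCONT
        for worker in workers:
            os.kill(worker.pid, out)

    signal.signal(signal.SIGTSTP, relay)
    signal.signal(signal.SIGCONT, relay)
```

If the suspend_method signal never reaches the master, as reported for
the slotwise case, a handler like relay never runs and the children
keep going.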

-- Reuti


> thanks for any help.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253554
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=253578
