[GE users] suspend/resume rsh/qrsh parallel task with SGE

fboucher Florent.Boucher at cnrs-imn.fr
Mon Mar 9 13:57:43 GMT 2009



Hi Reuti,

reuti wrote:

Hi,

On 09.03.2009 at 10:46, fboucher wrote:



I would like to be able to suspend parallel tasks that are not based on
MPI communications.
The main script, which runs on the master, starts child processes using
rsh (or ssh) on different nodes. All those tasks are independent and can
be done in parallel (no communication between them). However, all of
them need to finish before the whole job can continue.
I would like to be able to suspend the whole job (as one can do with
mpitask). At the moment, a SIGTSTP or SIGSTOP signal is sent
using qmod -sj. However, the child processes started by the master
script completely ignore this signal (it is not trapped by rsh/qrsh
or ssh).
Is there a way to send this SIGTSTP signal directly to all the child
processes created by the master script (or to trap it with the rsh/ssh
command)?
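
To make the question concrete, here is a minimal sketch of a master script that relays a trapped suspend signal to the children it started itself. The names PIDS, start_child, forward_signal and wait_all are hypothetical, not part of SGE. It only covers processes that are local children of the script; whether stopping a local rsh/qrsh/ssh client also stops the task on the remote node is exactly the open point of this thread. Note also that SIGSTOP cannot be trapped, so this can only help when the signal delivered to the script is a catchable one such as SIGTSTP.

```sh
#!/bin/sh
# Sketch only: relay suspend/resume signals from a master script to the
# background children it started. All names below are hypothetical.

PIDS=""

# Start a command in the background and remember its PID.
start_child() {
    "$@" &
    PIDS="$PIDS $!"
}

# Send signal $1 to every recorded child, ignoring children that are gone.
forward_signal() {
    sig=$1
    for pid in $PIDS; do
        kill -s "$sig" "$pid" 2>/dev/null || true
    done
}

# Wait until every recorded child has exited. "wait" returns early when a
# trapped signal arrives, so keep waiting while the child still exists.
wait_all() {
    for pid in $PIDS; do
        while kill -0 "$pid" 2>/dev/null; do
            wait "$pid"
        done
    done
}

# SIGTSTP is catchable: stop the children when it arrives.
# SIGCONT resumes them again.
trap 'forward_signal STOP' TSTP
trap 'forward_signal CONT' CONT
```

A script would then call start_child once per task and finish with wait_all, so that all tasks have to complete before the job continues.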



a patch was on the list some time ago (of course, you need a tight
integration of the parallel application then):



I will update and see if it helps (we have 6.1u3 at the moment). However, do you think this patch will solve the case where running qmod -sj $JOBID has no effect on the child processes?
Also, what do you call a tight integration? I am quite new to GE and not very familiar with it.
I use a specific parallel mpich environment to submit the job, capture the list of nodes and processors to generate my own machine files, and then use them to start the remote tasks with commands like:
/opt/sge/bin/lx24-amd64/qrsh -inherit n001 lapw1c dnlapw1_1.def
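
The workflow just described can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual job script: it assumes the node list is a file in the $PE_HOSTFILE layout (host name in the first column, slot count in the second), the function names expand_hostfile and launch_tasks are hypothetical, and QRSH is a hypothetical variable that would be set to the site-specific qrsh path shown above.

```sh
#!/bin/sh
# Sketch only: build a machine file from the PE host file and start one
# remote task per slot with "qrsh -inherit". Function names are hypothetical.

# Print one host name per granted slot, given a file in the $PE_HOSTFILE
# layout (column 1: host name, column 2: number of slots).
expand_hostfile() {
    awk '{ for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Start the given command once per line of the machine file, in parallel,
# then wait for all of them before returning.
# QRSH is an assumption: set it to the full qrsh path used on the cluster.
launch_tasks() {
    machinefile=$1
    shift
    # Read the machine file on descriptor 3 so the tasks cannot consume it.
    while read -r host <&3; do
        ${QRSH:-qrsh} -inherit "$host" "$@" </dev/null &
    done 3<"$machinefile"
    wait
}
```

Inside a job this could be used as expand_hostfile "$PE_HOSTFILE" > machines followed by launch_tasks machines with the task command. The sketch runs the same command on every slot; per-task arguments such as the numbered .def files above would need to be added.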

http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=74965

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2740

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=125423



Florent


--
 -------------------------------------------------------------------------
| Florent BOUCHER                    |                                    |
| Institut des Matériaux Jean Rouxel | Mailto:Florent.Boucher at cnrs-imn.fr |
| 2, rue de la Houssinière           | Phone: (33) 2 40 37 39 24          |
| BP 32229                           | Fax:   (33) 2 40 37 39 95          |
| 44322 NANTES CEDEX 3 (FRANCE)      | http://www.cnrs-imn.fr             |
 -------------------------------------------------------------------------




