[GE users] MPICH process groups
Jeroen M. Kleijer
jeroen.m.kleijer at philips.com
Mon Dec 12 17:21:08 GMT 2005
I've followed the instructions for tight integration with MPICH to the
letter, but somehow qrsh won't pass the variable MPICH_PROCESS_GROUP
along, so every new command started by sge_execd on the slave node gets
its own process group.
To be exact, I did the following:
- in the submit script I put the line "export MPICH_PROCESS_GROUP=no"
before the mpirun line.
- I edited the rsh wrapper to read "qrsh -V ..." (I even tried "qrsh -V -v
MPICH_PROCESS_GROUP=no" to be on the safe side, but to no avail).
- I edited the submit command to read "qsub -v MPICH_PROCESS_GROUP=no
....", also to no avail.
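Before changing anything else, it may help to check on a slave node whether the remote processes actually received the variable and which process group they ended up in. The sketch below is Linux-specific: it assumes `/proc` and the procps `ps` are available. The helper names `show_env_var` and `show_pgrp` are hypothetical.

```sh
#!/bin/sh
# Hypothetical diagnostic helpers (Linux-only: uses /proc and procps ps).

# show_env_var PID NAME
# Print NAME=value from the environment PID was started with, if present.
# No output means the variable never reached that process.
show_env_var() {
    tr '\0' '\n' < "/proc/$1/environ" | grep "^$2="
}

# show_pgrp PID
# Print PID, process group ID, session ID and command line, so the remote
# MPI processes can be compared with the PGID of their sge_shepherd parent.
show_pgrp() {
    ps -o pid=,pgid=,sid=,args= -p "$1"
}
```

Usage, with a placeholder PID for one of the remote processes: `show_env_var 12345 MPICH_PROCESS_GROUP` and `show_pgrp 12345`. If the variable is present but the PGID still differs from the shepherd's, the MPICH build in use may not honor the variable at all.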
The job runs fine when submitted, but when it is killed, the remaining
processes keep running on the slave nodes even though they are
children of sge_execd. After repeatedly starting and killing jobs, I run
out of semaphores, which is annoying as hell since I then have to clean
them up manually. Does anyone have any advice on how to
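As a stopgap for the leftover semaphores, the manual cleanup can be scripted. The sketch below assumes the Linux util-linux `ipcs -s` column layout (key, semid, owner, perms, nsems); other systems format this output differently. The function names and the `DRY_RUN` switch are hypothetical. It removes every semaphore array owned by the given user, so it should only be run on a node where that user has no job still running.

```sh
#!/bin/sh
# Hypothetical cleanup helpers for leftover SysV semaphores.

# list_user_semids USER
# Read `ipcs -s` output on stdin and print the semaphore IDs owned by USER.
# Header and banner lines are skipped because their 2nd field is not numeric.
list_user_semids() {
    awk -v u="$1" '$3 == u && $2 ~ /^[0-9]+$/ { print $2 }'
}

# cleanup_user_sems USER
# Remove every semaphore array owned by USER.
# With DRY_RUN=1 the ipcrm commands are only printed, not executed.
cleanup_user_sems() {
    user=$1
    ipcs -s | list_user_semids "$user" | while read -r id; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ipcrm -s $id"
        else
            ipcrm -s "$id"
        fi
    done
}
```

Running `DRY_RUN=1 cleanup_user_sems someuser` first shows what would be removed without touching anything.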
As this particular product is an off-the-shelf product (Msc.Marc) with
its own compiled version of MPICH, I really don't want to try to roll my
own version of MPICH and integrate that one.
Does anybody know what I'm doing wrong here? Why won't the processes on
the slave nodes die when a qdel command is issued?
Kind regards
Philips Applied Technologies