[GE users] MPICH process groups
reuti at staff.uni-marburg.de
Mon Dec 12 17:45:54 GMT 2005
On 12.12.2005 at 18:21, Jeroen M. Kleijer wrote:
> Hi everyone,
> I've followed the instructions for tight integration with MPICH to
> the letter, but somehow qrsh won't pass the variable
> MPICH_PROCESS_GROUP along, causing every new command started from
> the sge_execd on the slave node to have its own process group.
> To be exact, I did the following:
> - in the submit script I put the line "export
> MPICH_PROCESS_GROUP=no" before the mpirun line.
> - I edited the rsh wrapper to read "qrsh -V ..." (even did "qrsh -V
> -v MPICH_PROCESS_GROUP=no" to be on the safe side but to no avail)
> - I edited the submit command to read "qsub -v
> MPICH_PROCESS_GROUP=no ....", also to no avail.
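One way to see whether the variable actually reaches the remote side is to print it from the process that gets started there. The following is only a minimal sketch: the function name check_mpich_process_group is made up for illustration and is not part of SGE or MPICH.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or MPICH): report whether
# MPICH_PROCESS_GROUP is set to "no" in the current environment.
# Prints one line; returns 0 if the value is "no", 1 otherwise.
check_mpich_process_group() {
    if [ "${MPICH_PROCESS_GROUP:-}" = "no" ]; then
        echo "MPICH_PROCESS_GROUP=no on $(uname -n)"
        return 0
    fi
    # An unset and an empty variable are both reported as 'unset'.
    echo "MPICH_PROCESS_GROUP is '${MPICH_PROCESS_GROUP:-unset}' on $(uname -n)"
    return 1
}
```

Saved as a small script and started through the same rsh wrapper that mpirun uses, it would show what the environment on the slave node looks like.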
> The job runs fine when submitted but when it gets killed, the
> remaining processes still keep on running on the slave nodes even
> though they are children of sge_execd. (after doing this for a
> while, starting jobs and killing them, I run out of semaphores
> which is annoying as hell since I then have to clean them up
> manually, anyone have any advice on how to handle this?)
> As this particular product is an off-the-shelf product (MSC.Marc)
> with its own compiled version of mpich, I really don't want to
> roll my own version of mpich and try integrating that one.
> Does anybody know what I'm doing wrong here? Why won't the
> processes on the slave nodes die when a qdel command is issued?
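For the manual cleanup of leftover semaphores, something along these lines might save some typing. It is a sketch under assumptions: it expects the Linux column order of `ipcs -s` (key, semid, owner, perms, nsems), and the function names are made up for illustration.

```shell
#!/bin/sh
# Print the IDs of System V semaphore arrays owned by the given user.
# Reads `ipcs -s` style output on stdin; Linux column order assumed:
# key, semid, owner, perms, nsems. Header lines are skipped because
# only rows whose key starts with "0x" are considered.
list_user_semids() {
    awk -v user="$1" '$1 ~ /^0x/ && $3 == user { print $2 }'
}

# Remove every semaphore array owned by the given user
# (default: the user running the function).
clean_user_semaphores() {
    owner="${1:-$(id -un)}"
    ipcs -s | list_user_semids "$owner" | while read -r semid; do
        ipcrm -s "$semid"
    done
}
```

clean_user_semaphores should only be run on a node where none of your jobs is still active, since it also removes semaphores that are in use.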
When semaphores are left behind, it usually means that MPICH was
compiled to use shared memory on a node. It would also need a
machinefile of the type:
which will allocate the shared memory, but won't use it (at least
it's allocated twice on each node). Can you please post a process
tree of the master and of a slave node, as described in the Howto?
I found something about run_marc. Is this still used, or is it
now a plain mpirun?
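A process listing that includes the process group could be produced roughly as follows. This is a sketch using only POSIX ps options, not necessarily the exact command given in the Howto, and the function name show_process_groups is made up for illustration.

```shell
#!/bin/sh
# List PID, parent PID, process group ID and command line of all
# processes. An optional awk regular expression restricts the output
# to matching lines; the header line is always kept.
show_process_groups() {
    pattern="${1:-.}"
    ps -e -o pid,ppid,pgid,args | awk -v pat="$pattern" 'NR == 1 || $0 ~ pat'
}
```

Run on the master and on a slave node while the job is active, for example as show_process_groups 'sge|mpirun|marc', it shows whether the slave processes have ended up in process groups of their own.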