[GE users] Stopping MPICH child processes on qdel

Brian Smith brian at cypher.acomp.usf.edu
Tue Nov 22 23:14:19 GMT 2005


To all:

Issuing qdel on an MPICH job does not kill its child processes under SGE
6.0u6 with Myrinet tight integration.

The PE is configured as follows:

pe_name           mpich-pgi
slots             20
user_lists        NONE
xuser_lists       NONE
start_proc_args   /usr/local/sge/mpi/myrinet/startmpi.sh -catch-rsh -unique \
                  $pe_hostfile /usr/local/x86_64/pgi/mpich/bin/mpirun
stop_proc_args    /usr/local/sge/mpi/myrinet/stopmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

I have checked over 

http://gridengine.sunsource.net/howto/mpich-integration.html

and it appears quite outdated.  I have been unable to locate any recent
documentation on resolving this problem.

Calling my MPI jobs with

mpirun -np $NSLOTS -machinefile $TMPDIR/machines <binary>

or

sge_mpirun <binary>

yields identical results.
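
For reference, the $TMPDIR/machines file used above is generated from
$pe_hostfile by the PE start script. Below is a rough sketch of that
conversion, not the actual startmpi.sh code. It assumes the usual
pe_hostfile layout (host, slot count, queue, processor range per line) and
a machine file with one host line per slot; the Myrinet script may write
additional GM-specific fields. The function name and the sample hosts are
hypothetical.

```shell
# Hypothetical helper: expand an SGE pe_hostfile into one host line per slot.
# Assumes each input line looks like: "host slots queue processor-range".
pe_hostfile_to_machines() {
    while read -r host nslots _rest; do
        # Skip blank lines.
        [ -n "$host" ] || continue
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Example with made-up hosts: node01 has 2 slots, node02 has 1.
sample=$(mktemp)
printf 'node01 2 all.q@node01 UNDEFINED\nnode02 1 all.q@node02 UNDEFINED\n' > "$sample"
pe_hostfile_to_machines "$sample"
rm -f "$sample"
```

Every host listed this way is one where processes started through the
rsh wrapper may remain after a qdel, so it is also the list of nodes to
check for leftovers.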

a) Is there any recent documentation that covers this issue with
Myrinet?

b) When is this problem _finally_ going to be fixed?  I have been
dealing with it since 6.0 was initially released.  We're all the way to
update 6 and it is still causing problems on our systems.

Any and all help is appreciated.

Best Regards,

Brian Smith

