[GE users] subordinate queues and MPI jobs

Sean Dilda agrajag at dragaera.net
Wed May 5 16:03:21 BST 2004


Has anyone here experimented with subordinate queues and MPI jobs?  I
just tried the combination, and am somewhat disappointed by the
results.

I'm running SGE 5.3p4.  I have tight integration with MPICH set up
(though I'm using an unmodified sshd).

I started an MPI job running across several machines, using several
subordinate queues.  I then launched another job.  It went into a queue
that, through the subordinate queue setup, should have caused the MPI
job to suspend.  SGE properly listed all of the queues as suspended (all
jobs had a state of 'S').  However, when I checked the process table on
the compute nodes, I found a different story.

On the master node for the MPI job, the job script as well as the mpirun
processes were stopped.  However, none of the child processes of mpirun
(the ones actually running my code) were stopped.  And none of the
processes on the other nodes were stopped.
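
For anyone who wants to repeat the process-table check on a node, here
is a minimal Python sketch.  It assumes a Linux-style ps, where a STAT
value beginning with 'T' means the process is stopped.  The function
names, the sample listing, and the program name "my_code" are all
hypothetical and only for illustration.

```python
import subprocess


def parse_ps(ps_output):
    """Parse the text of `ps -eo pid,ppid,stat,args` into a list of dicts."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore lines without a command column
        pid, ppid, stat, args = parts
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "args": args})
    return rows


def still_running(rows, pattern):
    """Return rows whose command line contains `pattern` and whose
    state does not start with 'T' (i.e. the process is not stopped)."""
    return [r for r in rows
            if pattern in r["args"] and not r["stat"].startswith("T")]


def check_local(pattern):
    """Run ps on the local node and report matching, non-stopped processes."""
    out = subprocess.run(["ps", "-eo", "pid,ppid,stat,args"],
                         capture_output=True, text=True, check=True).stdout
    return still_running(parse_ps(out), pattern)


# Hypothetical listing shaped like the situation described above:
# the job script and mpirun are stopped, the compute process is not.
SAMPLE = """  PID  PPID STAT COMMAND
 1001     1 Ts   /bin/sh job_script
 1002  1001 T    mpirun -np 4 ./my_code
 1003  1002 R    ./my_code
"""
```

Calling check_local("my_code") on each compute node while the queue
shows state 'S' would list any matching processes that are not stopped.
An empty list would mean the suspend reached everything on that node.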

Is this a known problem?  Is there something I can do to fix this
behavior?  Am I trying to do something that isn't supported?

Thanks,


Sean

