[GE users] Tight integration template for IntelMPI ?

Olesen, Mark Mark.Olesen at emcontechnologies.com
Fri Nov 23 09:33:01 GMT 2007


> Some users complain about processes left running on nodes after a
> qdel.
> Our parallel apps use the IntelMPI implementation.
> Which template (or paper) should I base my work on to get tight
> integration with IntelMPI?

We *thought* we were using a tight integration:

    pe_name           mpichA
    slots             999
    user_lists        NONE
    xuser_lists       NONE
    start_proc_args   /opt/sge/mpi/startmpi.sh -catch_rsh $pe_hostfile
    stop_proc_args    /opt/sge/mpi/stopmpi.sh
    allocation_rule   $fill_up
    control_slaves    TRUE
    job_is_first_task FALSE
    urgency_slots     min

but we still had problems with defunct and orphaned processes.
The cause was that the application (STAR-CD) ships with its own rsh script
(to select rsh or ssh).
Depending on how the PATH was initialized, the rsh link created by
'-catch_rsh' might never be reached, so the normal rsh/ssh was used
instead of qrsh. Processes started that way are not under GridEngine's
control, so a qdel cannot clean them up.
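
One way to check for this inside a job script is to test which rsh the
shell would actually pick up. The following is only a sketch: the function
name check_rsh_wrapper is made up for illustration, and it assumes that
'-catch_rsh' links the wrapper as "rsh" into the job's $TMPDIR, as the
stock startmpi.sh does.

```shell
# Hypothetical helper (not part of GridEngine): report whether the first
# "rsh" found on PATH is the wrapper linked into the given directory.
# Assumption: -catch_rsh places the link at <dir>/rsh, normally $TMPDIR/rsh.
check_rsh_wrapper() {
    expected="$1/rsh"
    # command -v prints the path the shell would execute for "rsh"
    found=$(command -v rsh 2>/dev/null || true)
    if [ "$found" = "$expected" ]; then
        echo "ok: rsh resolves to $found"
        return 0
    fi
    echo "warning: rsh resolves to '${found:-nothing}', expected $expected" >&2
    return 1
}
```

Calling check_rsh_wrapper "$TMPDIR" right before the application starts
shows whether the PATH still has the link in front. It cannot detect an
application that calls rsh or ssh by absolute path.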
 
We have since altered our scripts to explicitly use qrsh:

    # use GridEngine qrsh for the mpi transport
    # hp-mpi
    MPI_REMSH=$SGE_ROOT/mpi/rsh; export MPI_REMSH
    # mpich
    P4_RSHCOMMAND=$SGE_ROOT/mpi/rsh; export P4_RSHCOMMAND
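
If the same script is also run outside of GridEngine, the settings can be
guarded so that they only apply inside a parallel job. This is a sketch
under assumptions, not what we run verbatim: the function name
use_qrsh_transport is hypothetical, and it relies on GridEngine setting
PE_HOSTFILE for parallel-environment jobs.

```shell
# Hypothetical helper: point hp-mpi and mpich at the GridEngine rsh wrapper,
# but only inside a parallel job and only if the wrapper is executable.
use_qrsh_transport() {
    wrapper="${SGE_ROOT:-}/mpi/rsh"
    # PE_HOSTFILE is only set for jobs running in a parallel environment
    if [ -n "${PE_HOSTFILE:-}" ] && [ -x "$wrapper" ]; then
        MPI_REMSH=$wrapper; export MPI_REMSH          # hp-mpi
        P4_RSHCOMMAND=$wrapper; export P4_RSHCOMMAND  # mpich
        return 0
    fi
    echo "not a GridEngine parallel job or no wrapper at $wrapper" >&2
    return 1
}
```

Outside of a job the function returns non-zero and leaves both variables
untouched, so the application falls back to its own rsh/ssh selection.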

A similar cure might help you with IntelMPI. You'll have to check the
documentation to see which variable(s) it uses to specify a replacement for
the rsh transport.

/mark