[GE issues] [Issue 3018] New - the path to PE_HOSTFILE variable may not be set for pe slave tasks

aja alena.plestilova at sun.com
Tue May 5 15:09:57 BST 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3018
                 Issue #|3018
                 Summary|the path to PE_HOSTFILE variable may not be set for pe
                        | slave tasks
               Component|gridengine
                 Version|6.2
                Platform|Sun
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|execution
             Assigned to|pollinger
             Reported by|aja






------- Additional comments from aja at sunsource.net Tue May  5 07:09:54 -0700 2009 -------
Steps to reproduce:

A 6.2u2_1 cluster with two nodes and MPI CT installed, with a normal PE and queue set up.
In my example I have two nodes named rabbit and aubing:



$ qsub -pe orte 4 -q mpi.q pe_hostfile.sh
Your job 88 ("pe_hostfile.sh") has been submitted
$
$
$ cat pe_hostfile.sh
#!/bin/ksh
/opt/SUNWhpc/bin/mpirun -np 4 /home/tv9055/hpc/sge_mpi_scripts/test.sh
exit
$
$
$ cat test.sh
#!/bin/ksh

hostname
echo $PE_HOSTFILE
exit
$
$
And the result is:


$ cat pe_hostfile.sh.o88
Warning: no access to tty; thus no job control in this shell...
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
aubing
aubing
/opt/sgeee/default/spool/aubing/active_jobs/88.1/pe_hostfile
/opt/sgeee/default/spool/aubing/active_jobs/88.1/pe_hostfile
rabbit
/opt/sgeee/default/spool/rabbit/active_jobs/88.1/1.rabbit/pe_hostfile
rabbit
/opt/sgeee/default/spool/rabbit/active_jobs/88.1/1.rabbit/pe_hostfile
logout
$ 


This path is wrong. It does not exist and hence mpi programs which use
PE_HOSTFILE fail.
/opt/sgeee/default/spool/rabbit/active_jobs/88.1/1.rabbit/pe_hostfile
                                                 ^^^^^^^^


PE_HOSTFILE does not need to be provided to PE slave tasks; only the master task needs this information.
The customer used PE_HOSTFILE on the slave task side and ran into problems because of this change. Its availability to slave tasks is an
undocumented feature. The customer has changed his scripts and jobs, and everything is working again.
However, PE_HOSTFILE should no longer be made available to slave tasks.
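
Until that is done, a script that may run as a slave task could guard against the stale path before reading the file. The following is only a minimal sketch under that assumption: check_pe_hostfile is a hypothetical helper name, not part of Grid Engine, and it checks nothing beyond whether PE_HOSTFILE is set and names a readable file on the local host.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): report whether the file
# named by PE_HOSTFILE can actually be read on this host.
# Returns 0 if it is usable, 1 otherwise.
check_pe_hostfile() {
    # The variable may be unset or empty.
    if [ -z "${PE_HOSTFILE:-}" ]; then
        echo "PE_HOSTFILE is not set"
        return 1
    fi
    # The variable may be set but point to a path that does not exist,
    # as in the slave task output above.
    if [ ! -r "$PE_HOSTFILE" ]; then
        echo "PE_HOSTFILE points to a missing or unreadable file: $PE_HOSTFILE"
        return 1
    fi
    echo "PE_HOSTFILE is usable: $PE_HOSTFILE"
    return 0
}
```

In test.sh the helper could be called as "check_pe_hostfile || exit 1" before anything reads the file, so that a slave task fails with a clear message instead of a confusing MPI error.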

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=190699
