[GE users] Why my job's accounting information always indicates "failed: 12 before pestop"

Eric Zhang maillistbox at 126.com
Wed Mar 14 05:16:29 GMT 2007


Hi, users:

Thanks, Chris. It's my mistake. :)

By the way, I have two other questions, which were listed in my last email:

=====================================================

2. I have read the article "Tight MPICH Integration in Grid Engine", and
in my job's script, I defined "-v MPICH_PROCESS_GROUP=no" to achieve the
tight integration. Is this correct?

3. In my PE's configuration, is the option "-catch_rsh" necessary? I
found that SGE's PE template, "mpi.template", does not set this
option. I think that "startmpi.sh" places a link in $TMPDIR that points
to SGE's rsh wrapper, so that the application uses the wrapper to
dispatch its processes. That means this option cannot be omitted. Is
this correct?

=======================================================
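
To make question 3 concrete, here is a minimal sketch of the check I have in
mind. It assumes that "-catch_rsh" really does create $TMPDIR/rsh as I
described above; I have not confirmed this. The function name
check_rsh_wrapper is hypothetical and not part of SGE.

```sh
#!/bin/sh
# Hypothetical helper (not part of SGE): report whether a plain "rsh" call
# would be caught by a wrapper located in the given directory.
# Assumption: "startmpi.sh -catch_rsh" links the wrapper as <dir>/rsh.
check_rsh_wrapper() {
    dir=$1
    if [ ! -x "$dir/rsh" ]; then
        echo "no wrapper in $dir"
        return 1
    fi
    # Look up rsh in a subshell with <dir> placed first in PATH,
    # the same way the job script prepends $TMPDIR.
    found=$(PATH="$dir:$PATH"; command -v rsh)
    if [ "$found" = "$dir/rsh" ]; then
        echo "rsh resolves to $found"
        return 0
    fi
    echo "rsh resolves to ${found:-nothing}, not the wrapper"
    return 1
}
```

Calling check_rsh_wrapper "$TMPDIR" in the job script just before mpirun
should then show whether the wrapper is in place for that job.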

Any suggestions would be appreciated. Thanks a lot.

Eric Zhang
2007-03-14




Eric Zhang wrote:
> Hi, GE users:
>
> I am using SGE 6.0u9 now, and my PE configuration is:
>
> ==================================================
> pe_name mpich
> slots 99
> user_lists NONE
> xuser_lists NONE
> start_proc_args /home/sge6/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args /bin/sge6/mpi/stopmpi.sh
> allocation_rule $fill_up
> control_slaves TRUE
> job_is_first_task TRUE
> urgency_slots min
> ==================================================
>
> My job submission script is:
>
> ==================================================
> #!/bin/sh
> #
> #$ -S /bin/sh
> # ---------------------------
> # our name
> #$ -N EricPi
> #$ -j y
> #
> # output path
> #$ -o /home/eric/output
> #$ -e /home/eric/output
>
> # pe request
> #$ -pe mpich 2
> #
> #$ -v P4_RSHCOMMAND=rsh
> #$ -v MPICH_PROCESS_GROUP=no
> # ---------------------------
>
> #
> # needs in
> # $NSLOTS
> # the number of tasks to be used
> # $TMPDIR/machines
> # a valid machine file to be passed to mpirun
>
> # export NSLOTS=4
>
> # enables $TMPDIR/rsh to catch rsh calls if available
> export PATH=$TMPDIR:$PATH
>
> /usr/local/mpich-ifort/bin/mpirun -np $NSLOTS -machinefile \
>     $TMPDIR/machines /home/eric/testcodes/pi3f90
> =========================================================================
>
> I have three questions here:
>
> 1. The job runs fine, but in its accounting information the "failed"
> field always shows "12: before pestop". Why?
>
> 2. I have read the article "Tight MPICH Integration in Grid Engine", and
> in my job's script, I defined "-v MPICH_PROCESS_GROUP=no" to achieve the
> tight integration. Is this correct?
>
> 3. In my PE's configuration, is the option "-catch_rsh" necessary? I
> found that SGE's PE template, "mpi.template", does not set this
> option. I think that "startmpi.sh" places a link in $TMPDIR that points
> to SGE's rsh wrapper, so that the application uses the wrapper to
> dispatch its processes. That means this option cannot be omitted. Is
> this correct?
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>
>
>   





