[GE users] SGE & mvapich2 integration

Hugo Darío Barrera hbarrera at iciq.es
Wed Jun 27 11:10:56 BST 2007



Hi Reuti (and thanks for your answer) and all,

I have tried submitting jobs without any luck. I think I'm missing something,
but I still can't figure out what.

I can run other commands (like "ls -la > ls.out" or "factor
9999999999999999999 > fac") and I do get their output, but as soon as I run
something with mpiexec from SGE, the job just hangs.
Running mpiexec from the command line (after mpdboot) works, so it shouldn't
be an mpiexec issue.


If I use the daemonless method, SGE reports NO error, but the job produces NO
output (it runs for about 1 second).
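When a job seems to produce no output at all, it can help to look at every file SGE wrote for it, including the output of the PE start/stop scripts. A minimal sketch, assuming SGE's default naming `<job_name>.o<job_id>` / `.e<job_id>` for the job itself and `.po<job_id>` / `.pe<job_id>` for the PE scripts, written to `$HOME` (or the submit directory with `-cwd`) unless `-o`/`-e` say otherwise; `show_job_files` is a hypothetical helper, not part of SGE.

```shell
#!/bin/sh
# show_job_files DIR JOB_ID
# Hypothetical helper: print every SGE output file for JOB_ID found in DIR.
# ASSUMPTION: default file naming <job_name>.{o,e,po,pe}<job_id>.
show_job_files() {
    dir=$1
    job_id=$2
    found=0
    for f in "$dir"/*.o"$job_id" "$dir"/*.e"$job_id" \
             "$dir"/*.po"$job_id" "$dir"/*.pe"$job_id"; do
        # An unmatched glob stays literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        found=1
        echo "=== $f ($(wc -c < "$f") bytes)"
        cat "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no output files for job $job_id in $dir"
        return 1
    fi
}
```

For example, `show_job_files "$HOME" 33` would list the (possibly empty) `.o`, `.e`, `.po` and `.pe` files of job 33; an empty `.o` file next to a non-empty `.pe` file usually points at the PE startup rather than the MPI program.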

If I instead use the daemon-based method, with this PE:

[root at infi qmaster]# qconf -sp n
pe_name           n
slots             24
user_lists        NONE
xuser_lists       NONE
start_proc_args   /sge/mpich2_smpd/startmpich2.sh -catch_rsh $pe_hostfile \
                  /sge/mpich2_smpd/bin/lx24-amd64/start_mpich2
stop_proc_args    /sge/mpich2_smpd/stopmpich2.sh -catch_rsh \
                  /sge/mpich2_smpd/bin/lx24-amd64/start_mpich2
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min


(I'm using /sge/mpich2_smpd/bin/lx24-amd64/start_mpich2 in place of
/home/reuti/local/mpich2_smpd from the Howto.)
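One thing that may be worth double-checking: `/home/reuti/local/mpich2_smpd` in the Howto looks like an MPICH2 installation directory, whereas `start_mpich2` is a single file. A minimal sketch for checking the path on each execution host, ASSUMING (not confirmed here; verify against `startmpich2.sh` itself) that the last argument of `startmpich2.sh`/`stopmpich2.sh` must be an installation prefix containing an executable `bin/smpd`; `check_mpich2_root` is a hypothetical helper.

```shell
#!/bin/sh
# check_mpich2_root PATH
# Hypothetical helper. ASSUMPTION: the PE scripts expect PATH to be an MPICH2
# installation prefix that contains an executable bin/smpd.
check_mpich2_root() {
    root=$1
    # A plain file (such as a compiled helper binary) cannot be a prefix.
    if [ ! -d "$root" ]; then
        echo "NOT A DIRECTORY: $root"
        return 1
    fi
    # The smpd daemon is expected under <prefix>/bin.
    if [ ! -x "$root/bin/smpd" ]; then
        echo "MISSING bin/smpd: $root"
        return 1
    fi
    echo "OK: $root"
}
```

Since the path has to exist on every node the PE may start on, running e.g. `check_mpich2_root /sge/mpich2_smpd/bin/lx24-amd64/start_mpich2` on infi7 and the other hosts would show whether that argument can be what the scripts expect.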

I get this error in the qmaster messages file:

tail /sge/default/spool/qmaster/messages

06/27/2007 11:59:28|qmaster|infi|W|job 33.1 failed on host infi7 general in pestart because: 06/27/2007 11:52:02 [505:1467]: exit_status of pe_start = 1
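To see at a glance which hosts rejected a job during PE startup, the messages file can be filtered for entries like the one above. A minimal sketch, assuming each entry sits on a single line in the actual file (it is only wrapped in this mail) with the `|`-separated layout and the `job ... failed on host ... in pestart because` wording shown; `pestart_failures` is a hypothetical helper.

```shell
#!/bin/sh
# pestart_failures [MESSAGES_FILE]
# Hypothetical helper: print "job <id> host <name>" for every PE start failure.
# Reads standard input when no file is given.
# ASSUMPTION: one log entry per line, formatted like the qmaster line above.
pestart_failures() {
    cat "${1:--}" |
        grep 'in pestart because' |
        sed -n 's/.*|job \([0-9.]*\) failed on host \([^ ]*\) .*/job \1 host \2/p'
}
```

Usage: `pestart_failures /sge/default/spool/qmaster/messages` would print `job 33.1 host infi7` for the entry quoted above, plus one line per further host the job was retried on.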


The job then tries to run on other nodes, leaving the failed ones in the "E" (error) state.


Thanks again and kind regards


On Friday 22 June 2007 14:08, Reuti wrote:
> Hi,
>
> On 22.06.2007 at 13:01, Hugo Darío Barrera wrote:
> > I'm trying to get SGE & mvapich2 to work, but although I have read
> > http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
> > I still can't figure out how the nodes are supposed to be booted with
> > mpdboot.
> > Do I need to run mpdboot in the script I submit to SGE?
> > Something like:
> > ---------
> > test at infi$ cat script
> >
> > #!/bin/bash
> > # pe request
> > #$ -pe testqueue 2
> >
> > mpdboot -n 2 --ncpus=2 -r ssh -f machineworld
> > mpdtrace
> >
> > /opt/intel/mpi/bin/mpiexec -n 4 -env I_MPI_DEBUG 2 -env I_MPI_DEVICE rdma \
> >     /home/test/dlpoly_test/execute/DLPOLY.X
> >
> > mpdallexit
>
> Infiniband support isn't covered in this Howto at all; at the time of
> writing I had no access to an Infiniband cluster. I could look into
> it, but in any case, as outlined in the Howto, the mpd startup method
> isn't supported with SGE at all. Please use the smpd startup method
> instead, either daemon-based or daemonless (rsh only).
>
> -- Reuti
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





More information about the gridengine-users mailing list