[GE users] Problems with MPI

Shaila Parashar shaila at engr.colostate.edu
Wed May 19 21:33:16 BST 2004


Hi

I followed the directions to set up qrsh over ssh, and everything works
perfectly.
Thanks a lot

Shaila

Ron Chen wrote:

> How about serial jobs? Can you start a hello world
> type serial job on that node?
>
> Also, you need to use MPICH with rsh, so that mpirun
> will call SGE's qrsh (and then you can use qrsh over
> ssh -- just follow the doc from the HOWTO).
>
>  -Ron
>
> --- Shaila Parashar <shaila at engr.colostate.edu> wrote:
> > Hi
> >
> > I have installed SGEEE 5.3p3 on a cluster of 12 Sun workstations. I am
> > trying to integrate it with MPI. I have installed MPICH-1.2.5 with ssh
> > integration.
> > I followed the instructions in the /sge/mpi directory and installed the
> > parallel environment mpich, using mpich.template as a reference. Now when
> > I run an MPI job using the parallel environment, I get the following
> > error messages on all my workstations, and the queues go into an error
> > state.
> > Thu May 13 14:15:49 2004|execd|cae25|E|can't start job "66304": can't write script file "job_scripts/66304" wrote only -1 of 475480 bytes: Bad address
> > Thu May 13 14:16:11 2004|execd|cae25|E|acknowledge for unknown job 66304.1/master
> > Thu May 13 14:16:11 2004|execd|cae25|E|can't find active jobs directory "active_jobs/66304.1" for reaping job 66304
> > Thu May 13 14:16:11 2004|execd|cae25|E|ERROR: unlinking "jobs/00/0006/6304.1": No such file or directory
> > Thu May 13 14:16:11 2004|execd|cae25|E|can not remove job spool file: jobs/00/0006/6304.1
> > Thu May 13 14:16:11 2004|execd|cae25|E|can't remove directory "active_jobs/66304.1": opendir(active_jobs/66304.1) failed: No such file or directory
> >
> > Looking at the error messages, I thought it might be a write permission
> > issue, but that is not the case: a file does get created in the
> > job_scripts directory, although it is a binary file. I tried looking
> > through the archives for this error message but did not have any luck.
> > Any suggestions, ideas or help would be greatly appreciated. I have no
> > problems running any other kinds of jobs.
> >
> > Thanks
> >
> > --
> > *****************************************************************
> > Shaila Parashar                 e-mail:shaila at engr.colostate.edu
> > UNIX System Administrator       tel:- (970)-491-6555
> > Engineering Network Services
> > Colorado State University
> > Fort Collins, CO 80523-1301
> > ******************************************************************
> > " Smile is a curve that sets things straight. "
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net

More information about the gridengine-users mailing list