[GE users] Problems with MPI

Shaila Parashar shaila at engr.colostate.edu
Thu May 13 21:56:27 BST 2004


I have installed SGEEE 5.3p3 on a cluster of 12 SUN workstations and am
trying to integrate it with MPI. I have installed MPICH-1.2.5 with ssh.
I followed the instructions in the /sge/mpi directory and set up the
parallel environment mpich, using mpich.template as a reference. Now, when
I run an MPI job using the parallel environment, I get the following
error messages on all my workstations, and the queues go into an error
state.

Thu May 13 14:15:49 2004|execd|cae25|E|can't start job "66304": can't
write script file "job_scripts/66304" wrote only -1 of 475480 bytes: Bad
Thu May 13 14:16:11 2004|execd|cae25|E|acknowledge for unknown job
Thu May 13 14:16:11 2004|execd|cae25|E|can't find active jobs directory
"active_jobs/66304.1" for reaping job 66304
Thu May 13 14:16:11 2004|execd|cae25|E|ERROR: unlinking
"jobs/00/0006/6304.1": No such file or directory
Thu May 13 14:16:11 2004|execd|cae25|E|can not remove job spool file:
Thu May 13 14:16:11 2004|execd|cae25|E|can't remove directory
"active_jobs/66304.1": opendir(active_jobs/66304.1) failed: No such file
or directory

Looking at the error messages, I first thought this might be a write
permission issue, but that is not the case: it does create a file in the
job_scripts directory, although that file is a binary file. I searched
the archives for this error message but did not have any luck.
Any suggestions, ideas, or help would be greatly appreciated. I have no
problems running any other kind of job.
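
For anyone comparing the messages across several hosts: the log lines quoted
above look pipe-delimited (date|daemon|host|level|text), with long entries
wrapped onto a second line by the mailer. The following is only a rough
sketch under that assumption, not an SGE tool. The names SAMPLE, HEADER,
parse_messages and errors_by_host are made up for illustration, and SAMPLE
is copied from the excerpt above, truncation included.

```python
import re
from collections import defaultdict

# Excerpt copied from the post; the trailing "Bad" is truncated in the source.
SAMPLE = """\
Thu May 13 14:15:49 2004|execd|cae25|E|can't start job "66304": can't
write script file "job_scripts/66304" wrote only -1 of 475480 bytes: Bad
Thu May 13 14:16:11 2004|execd|cae25|E|acknowledge for unknown job
Thu May 13 14:16:11 2004|execd|cae25|E|can't find active jobs directory
"active_jobs/66304.1" for reaping job 66304
"""

# Assumed layout of the first line of an entry: date|daemon|host|level|text
HEADER = re.compile(
    r"^(?P<when>[^|]+)\|(?P<daemon>[^|]+)\|(?P<host>[^|]+)"
    r"\|(?P<level>[A-Z])\|(?P<text>.*)$"
)


def parse_messages(raw):
    """Return one dict per entry, joining wrapped continuation lines."""
    entries = []
    for line in raw.splitlines():
        match = HEADER.match(line)
        if match:
            entries.append(match.groupdict())
        elif entries and line.strip():
            # No header fields: treat as the wrapped tail of the previous entry.
            entries[-1]["text"] += " " + line.strip()
    return entries


def errors_by_host(entries):
    """Group the text of level-E entries by the host that reported them."""
    grouped = defaultdict(list)
    for entry in entries:
        if entry["level"] == "E":
            grouped[entry["host"]].append(entry["text"])
    return dict(grouped)


if __name__ == "__main__":
    for host, texts in errors_by_host(parse_messages(SAMPLE)).items():
        print(host)
        for text in texts:
            print("   ", text)
```

Feeding the messages from each workstation through parse_messages makes it
easier to check whether every host reports the same job number and the same
byte count for the script file.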


Shaila Parashar                 e-mail:shaila at engr.colostate.edu
UNIX System Administrator       tel:- (970)-491-6555
Engineering Network Services
Colorado State University
Fort Collins, CO 80523-1301
" Smile is a curve that sets things straight. "
