[GE users] erratic mpich2-mx errors

amato ate9c at virginia.edu
Fri Dec 4 20:02:43 GMT 2009

Hello everyone,

I run an Apple Xserve cluster that uses Myrinet for MPI communication. I am trying to set up a loose integration of an MPICH2-MX parallel computational fluid dynamics (CFD) program with SGE. I can launch this CFD program on my cluster directly via mpiexec with no problem. If I compile the code to run on one processor, I can submit it via qsub with no problem. I can also run some basic MPI benchmarking programs (IMB) via qsub, using a parallel environment (PE) I created, with no problem.

The problem is submitting my CFD program via qsub. Usually the program fails with errors about not being able to find an output file to write to (a file that the program itself should have created but didn't). Other times it does create those output files, but they stay empty and the code just runs forever on my cluster.

However, if I create the files that the code should be writing myself (as empty files), the program executes normally and exits when finished.

To complicate matters further, the files that the CFD program writes go into a directory where the code also creates new folders, so I suspect permissions are the issue here.
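One way to test the permissions hunch would be to run a small probe both from an interactive shell and as a job submitted through qsub, then compare the two reports. The sketch below is only an illustration: the name probe_directory and the layout of the report are hypothetical, and it assumes Python is available on the compute nodes. It records the working directory, user ID and umask the process sees, and tries to create and remove a scratch file and a scratch folder in the target directory.

```python
import os
import sys
import tempfile


def probe_directory(path):
    """Report whether the current process can create files and folders in `path`.

    Hypothetical helper for illustration; not part of SGE or MPICH2-MX.
    """
    # os.umask both sets and returns the mask, so read it and restore it at once.
    mask = os.umask(0)
    os.umask(mask)

    report = {
        "cwd": os.getcwd(),
        "uid": os.getuid(),
        "umask": oct(mask),
        "exists": os.path.isdir(path),
        "can_create_file": False,
        "can_create_dir": False,
    }
    if not report["exists"]:
        return report

    try:
        # Create and remove a scratch file, as the solver would for an output file.
        fd, name = tempfile.mkstemp(prefix="probe_", dir=path)
        os.close(fd)
        os.remove(name)
        report["can_create_file"] = True
    except OSError:
        pass

    try:
        # Create and remove a scratch folder, as the solver would for a new folder.
        sub = tempfile.mkdtemp(prefix="probe_", dir=path)
        os.rmdir(sub)
        report["can_create_dir"] = True
    except OSError:
        pass

    return report


if __name__ == "__main__":
    # Probe the directory given on the command line, or the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else os.getcwd()
    for key, value in probe_directory(target).items():
        print(f"{key}: {value}")
```

If the interactive run and the qsub run report a different cwd, uid or umask, or if only the qsub run fails to create the scratch file, that would narrow down where the batch environment differs from the mpiexec case.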

Has anyone ever encountered anything similar?  Does anyone have a hunch as to where I can look to fix this situation?  Thanks!

