[GE users] determining when a mpirun has succeeded and failed

Adam Bruss adam.bruss at staarinc.com
Fri Jul 6 16:27:03 BST 2007


The choice of "-pe mpi 1" is arbitrary. I could use "-pe mpi 16" to have
it run on all 16 of our processors. It doesn't matter for what I'm trying to
do.

 

If I don't have the -D option I get this error message: "Error:Proc 0 Err:
RUNNING UNLICENSED VERSION!"

 

The -c option is about how many copies of the executable to run, not
anything to do with parallel processing.

 

I was wrong when I said mpirun failed. What fails is the solver (dfem) that
we wrote; MPI itself didn't fail. I want to be able to tell when our solver
(dfem) fails through an error code. I was hoping this could be handled by the
exit_status field in the accounting data (qacct) rather than by checking the
output from our solver.

 

Here again is the command:

 

qsub -N dfem -b y -V -pe mpi 1 "mpirun -D -c 1 /Analyst/v10dev/dfem -type rf3p /Analyst/v10dev/wgdblbnd.sup"

 

-Adam

 

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Friday, July 06, 2007 5:55 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] determining when a mpirun has succeeded and failed

 

On 05.07.2007 at 19:02, Adam Bruss wrote:

 

> -D is needed for the LAM licensing.

> -c 1 tells mpirun to run one copy of the executable.

> dfem is the executable and the stuff after it are arguments to dfem.

 

I'm getting confused: what is the purpose of running a parallel job
with only one CPU? After investigating, I even found these options in
the mpirun documentation, but with a different explanation:

 

-D Change current working directory of new processes to the directory
where the executable resides

 

I don't know whether this option really makes sense. Just run with C as
the only option and it should work, as the Tight Integration has already
set up the right things for you.

 

If you return an MPI error code from main(), e.g.:

 

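/* in main(): exit with an error code before MPI_Finalize() is reached */
return(MPI_ERR_OTHER);

MPI_Finalize();   /* not executed after the return above */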

 

(i.e. before the MPI environment is shut down), you can check it on the
command line with:

 

echo $?

 

after the mpirun. This return code should also appear in the SGE
accounting file. What exactly do you mean by "mpirun failed"?

 

-- Reuti

 

 

> It works this way as far as running the job goes. In its current state,
> the exit_status from qacct is zero if the MPI run succeeded and also zero
> if it failed. I want the exit_status from qacct to tell me whether the
> MPI job failed or succeeded.

> 

> According to a colleague of mine, SGE should be able to capture the
> exit status of the mpirun.

> 

> Adam

> 

> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thursday, July 05, 2007 11:03 AM
> To: Adam Bruss
> Subject: Re: [GE users] determining when a mpirun has succeeded and failed

> 

> On 05.07.2007 at 15:58, Adam Bruss wrote:

>> I'm running the LAM implementation of MPI with tight integration
>> into SGE.

> Okay, what are the options in
>
>   "mpirun -D -c 1 dfem -type rf3p wgdblbnd.sup"
>
> i.e. what are -D, -c 1 and -type rf3p good for?
>
> Just specifying "mpirun C wgdblbnd.sup" (if this is your program) should
> work, as the Tight Integration will create a LAM universe for each job
> on its own.

> 

> -- Reuti

> 

> ---------------------------------------------------------------------

> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net

> For additional commands, e-mail: users-help at gridengine.sunsource.net

 
