[GE users] determining when a mpirun has succeeded and failed

Reuti reuti at staff.uni-marburg.de
Fri Jul 6 11:54:41 BST 2007


On 05.07.2007 at 19:02, Adam Bruss wrote:

> -D is needed for the LAM licensing.
> -c 1 tells mpirun to run one copy of the executable.
> dfem is the executable and the stuff after it are arguments to dfem.

I'm confused: what is the purpose of running a parallel job on only
one CPU? I also looked up these options for mpirun and found a
different explanation for -D:

-D Change current working directory of new processes to the directory  
where the executable resides

I don't know whether this option really makes sense here. Just run
mpirun with C as the only option and it should work, as the Tight
Integration has already set up the right things for you.

If you provoke an MPI error, e.g. with:

return(MPI_ERR_OTHER);
MPI_Finalize();

(i.e. returning before the MPI environment is closed), you can check
the return code on the command line with:

echo $?

right after the mpirun. This return code should also appear in the SGE
accounting file. What exactly do you mean by "mpirun failed"?
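
To make the check above concrete, here is a minimal sketch of how a job script could pass the exit status of its last command on as its own exit status. The helper name run_step is hypothetical, and `true` and `sh -c 'exit 3'` are only stand-ins so the sketch runs without LAM/MPI installed. It assumes the program started by mpirun really exits with a non-zero status on failure.

```shell
#!/bin/sh
# Hypothetical helper: run a command, report its exit status on stderr,
# and return that same status to the caller.
run_step() {
    "$@"
    rc=$?
    echo "command '$*' exited with status $rc" >&2
    return "$rc"
}

# Stand-in commands (assumptions, not the real job): one succeeds, one fails.
ok_status=0
run_step true || ok_status=$?

fail_status=0
run_step sh -c 'exit 3' || fail_status=$?

echo "ok=$ok_status fail=$fail_status"

# In a real job script the last lines might instead look like:
#   run_step mpirun C dfem -type rf3p wgdblbnd.sup
#   exit $?
```

The point of the explicit `exit $?` is that the job script then ends with mpirun's status rather than the status of some later command such as an echo or a cleanup step.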

-- Reuti


> It works this way as far as running the job goes. In its current state,
> the exit_status of qacct is zero if the MPI run was a success and also
> zero if the MPI run failed. I want the exit_status from qacct to tell me
> whether the MPI job failed or succeeded.
>
> According to a colleague of mine, SGE should be able to capture the
> exit status of the mpirun.
>
> Adam
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Thursday, July 05, 2007 11:03 AM
> To: Adam Bruss
> Subject: Re: [GE users] determining when a mpirun has succeeded and failed
>
> On 05.07.2007 at 15:58, Adam Bruss wrote:
>> I'm running the LAM implementation of MPI with tight integration
>> into SGE.
> Okay, what are the options in
>
>   "mpirun -D -c 1 dfem -type rf3p wgdblbnd.sup"
>
> i.e. -D, -c 1 and -type rf3p, good for?
>
> Just specifying "mpirun C wgdblbnd.sup" (if this is your program) should
> work, as the Tight Integration will create a universe for each job
> on its own.
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
