[GE users] determining when a mpirun has succeeded and failed

Reuti reuti at staff.uni-marburg.de
Fri Jul 6 17:07:42 BST 2007



On 06.07.2007 at 17:27, Adam Bruss wrote:
> The choice of "-pe mpi 1" is arbitrary. I could use "-pe mpi 16"
> to have it run on all 16 of our processors. It doesn't matter for
> what I'm trying to do.
>
> If I don't have the -D option I get this error message: "Error:Proc  
> 0 Err: RUNNING UNLICENSED VERSION!"
That's not a LAM license - LAM is open source. Maybe your program
looks for a special "license" file in the current working directory?
But you state below that you wrote the software yourself?
> The -c option is concerned with how many copies of the executable to
> run, not with parallel processing.
Yes, but as there is a LAM universe unique to each job, just use the
uppercase C to use all allocated slots/nodes for this job (see the
example below). This way you will never have to change it when you
decide to change the number of slots to be used.
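
For example, with your command from below (the slot count of 16 here
is just for illustration):

   qsub -N dfem -b y -V -pe mpi 16 "mpirun C /Analyst/v10dev/dfem -type rf3p /Analyst/v10dev/wgdblbnd.sup"

With C instead of "-c 1", LAM runs one copy per allocated slot, so
only the -pe request has to change when you scale the job.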
> I was wrong when I said mpirun failed. What fails is the solver 
> (dfem) that we wrote.
So, how do you currently return from this subroutine when it fails?

As you have the source code, you can easily put a return statement
there (as I outlined). Then it can be checked via $? on the command
line after running "mpirun ..." interactively, or it will show up in
the accounting file.
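
To sketch what I mean (run_solver() here is only a placeholder
standing in for your dfem solve step; substitute your own routine):

   #include <mpi.h>
   #include <stdlib.h>

   /* Placeholder for the real solver: returns 0 on success,
      non-zero on failure. */
   static int run_solver(void)
   {
       return 1;  /* simulate a failure for testing */
   }

   int main(int argc, char **argv)
   {
       MPI_Init(&argc, &argv);

       int failed = run_solver();

       /* Shut down the MPI environment before exiting. */
       MPI_Finalize();

       /* A non-zero exit status is passed on by mpirun, so it shows
          up in $? interactively and should end up in the exit_status
          field of the SGE accounting file. */
       return failed ? EXIT_FAILURE : EXIT_SUCCESS;
   }

After "mpirun ..." returns, "echo $?" will print 1 in this case; once
the job has finished, "qacct -j <jobid>" should then show the same
non-zero exit_status.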

-- Reuti

> MPI didn't fail. I want to be able to tell when our solver (dfem)
> fails through an error code. I was hoping this could be handled by
> the exit_status variable in accounting rather than the output from
> our solver.
>
>  Here again is the command:
>
>  qsub -N dfem -b y -V -pe mpi 1 "mpirun -D -c 1 /Analyst/v10dev/ 
> dfem -type rf3p /Analyst/v10dev/wgdblbnd.sup"
>
> -Adam
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: Friday, July 06, 2007 5:55 AM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] determining when a mpirun has succeeded and  
> failed
>
> On 05.07.2007 at 19:02, Adam Bruss wrote:
>
> > -D is needed for the LAM licensing.
> > -c 1 tells mpirun to run one copy of the executable.
> > dfem is the executable and the stuff after it are arguments to dfem.
>
> I'm getting confused: what is the purpose of running a parallel job
> with only one CPU? After investigating, I even found these options in
> the mpirun man page, but with another explanation:
>
>   -D   Change the current working directory of new processes to the
>        directory where the executable resides.
>
> I don't know whether this option really makes sense here. So just run
> with C as the only option and it should work, as the Tight Integration
> has already set up the right things for you.
>
> If you create an MPI error yourself, e.g.:
>
>   MPI_Finalize();
>   return(MPI_ERR_OTHER);
>
> (i.e. close down the MPI environment first, then return the error
> code), you can test it on the command line with:
>
>   echo $?
>
> after the mpirun. This return code should also appear in the SGE
> accounting file. What do you mean in detail by "mpirun failed"?
>
> -- Reuti
>
> > It works this way as far as running the job goes. In its current
> > state the exit_status of qacct is zero if the mpi run was a success
> > and zero if the mpi run failed. I want to have the exit_status from
> > qacct tell me whether the mpi job failed or succeeded.
> >
> > According to a colleague of mine, SGE should be able to capture the
> > exit status of the mpirun.
> >
> > Adam
> >
> > -----Original Message-----
> > From: Reuti [mailto:reuti at staff.uni-marburg.de]
> > Sent: Thursday, July 05, 2007 11:03 AM
> > To: Adam Bruss
> > Subject: Re: [GE users] determining when a mpirun has succeeded and
> > failed
> >
> > On 05.07.2007 at 15:58, Adam Bruss wrote:
> >> I'm running the LAM implementation of MPI with tight integration
> >> into SGE.
> > Okay, what are the options:
> >
> >   "mpirun -D -c 1 dfem -type rf3p wgdblbnd.sup"
> >
> > hence, what are -D, -c 1 and -type rf3p good for?
> >
> > Just specifying "mpirun C wgdblbnd.sup" (if this is your program)
> > should work, as the Tight Integration will create a universe for
> > each job on its own.
> >
> > -- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list