[GE users] error when using qsub

Reuti reuti at staff.uni-marburg.de
Wed Feb 23 16:44:25 GMT 2005


lukacm at pdx.edu wrote:
> Hello,
> 
> we installed the PGI package with compilers. During the compilation, the
> person doing it disabled the head node as an exec host, so to run an MPI job
> using mpirun directly the '-nolocal' switch must be used.

- To run a parallel (e.g. MPI) program, you have to define a PE (parallel 
environment) in SGE and request it with qsub. See "man sge_pe".

- SGE will then grant some slots on the machines, according to your 
request. Don't use -nolocal: it would exclude the head node of the job, 
i.e. the node where mpirun is started (this is not related to the master 
node of the cluster).
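
A minimal sketch of a job script that requests a PE could look like the 
following. The PE name "mpich", the slot count 4 and the program name 
./my_mpi_program are made up for illustration. It also assumes that the 
PE's start procedure writes the granted hosts to $TMPDIR/machines, as the 
MPICH template scripts shipped with SGE do; adjust this to your own setup.

```shell
#!/bin/sh
#$ -N pgi_mpi_test
#$ -cwd
#$ -pe mpich 4          # hypothetical PE name and slot count

# Assemble the mpirun command line from what SGE granted to the job.
# NSLOTS is set by SGE for parallel jobs; the machines file location
# is an assumption about how the PE start procedure is configured.
build_mpirun_cmd() {
    mpirun_bin=$1
    program=$2
    printf '%s -np %s -machinefile %s %s\n' \
        "$mpirun_bin" "${NSLOTS:-1}" "${TMPDIR:-/tmp}/machines" "$program"
}

# Only start the program when running as an SGE job (JOB_ID is set by SGE).
if [ -n "${JOB_ID:-}" ]; then
    cmd=$(build_mpirun_cmd /opt/mpich/gnu/bin/mpirun ./my_mpi_program)
    echo "running: $cmd"
    # Relies on word splitting, so the paths must not contain spaces.
    $cmd
fi
```

Note that no -nolocal appears anywhere: the node running the job script 
is one of the granted hosts and should take part in the job.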

> Now when I try to submit a job using qsub and point to the mpirun binary
> from the PGI package, I get this error:
> /opt/gridengine/default/spool/compute-0-0/job_scripts/269: line 7:
> /usr/pgi/linux86-64/5.2/bin/mpirun: No such file or directory.

- Copy /usr/pgi to a shared directory. At least the mpirun from PGI, and 
maybe some dynamic libraries as well, have to be available on the nodes.

- Once you have this working, you can look into a tight integration with 
correct accounting. Can you use the mpirun from MPICH to start PGI 
programs? If so, the same integration may apply; you can find it on the 
Howto page.
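
To see whether the PGI mpirun is really visible on an execution host, a 
small script like the following can be submitted as an ordinary serial 
job. This is only a sketch: the function name check_executable is made 
up, and the path is the one from the error message above.

```shell
#!/bin/sh
# Report whether a path is an executable file on the host running this script.
check_executable() {
    if [ -x "$1" ] && [ ! -d "$1" ]; then
        echo "$(uname -n): found $1"
    else
        echo "$(uname -n): missing $1" >&2
        return 1
    fi
}

check_executable /usr/pgi/linux86-64/5.2/bin/mpirun ||
    echo "PGI mpirun is not available on this node"
```

If the job's output file reports the path as missing on a compute node, 
the PGI tree still has to be copied or exported to that node.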

> Does it mean that SGE is not correctly integrated with the mpirun, or something
> else? If I refer to the original mpirun, which is in /opt/mpich/gnu/bin/, the
> process starts in parallel and everything is fine. However, the PGI
> compiler/mpirun is not even visible from the compute nodes.
> 
> Has anyone using PGI had similar trouble, or does anyone have some information
> about this issue?

Cheers - Reuti

