[GE users] How to find out why the SGE job is not terminating.

lukacm at pdx.edu lukacm at pdx.edu
Thu Aug 10 20:31:10 BST 2006



Hello,

Is there anyone who could help with integrating GAMESS into SGE? I saw a
previous posting from January 2005
(http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=9018),
but I could not get the attached scripts to run GAMESS under SGE. Where can I
get these attachments?

Thank you,

Martin

Quoting Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> On 10.08.2006 at 16:04, Amit H Kumar wrote:
>
> >
> > Hi SGE,
> >
> > I submitted a HelloWorld program through SGE a couple of times.
> > 99.99% of the time it runs successfully, but this one time I don't see
> > it terminating, even though the .oJOBID and .poJOBID files were created
> > successfully.
> >
> > The .oJOBID file has the "correct result" from the MPICH2 job, so this
> > is not caused by the MPICH2 job. The only problem here is that the
> > script I run after the MPICH2 job looks for a file that is missing.
> > The .oJOBID file for this unfinished job seems to be stuck at that
> > point, because it does not report that the file is missing, although
> > it did report the missing file when I ran the job the 2nd, 3rd ... nth
> > time.
> >
> > My script looks like this:
> >
> > <snip> ======================
> >
> > #!/bin/tcsh
> >
> > #$ -N helloworld.exe
> > #$ -m ae
> > #$ -M me at odu.edu
> > #$ -cwd
> > #$ -j y
> > #$ -S /bin/tcsh
> >
> > set NPROCPM=2
> > @ NPROCS=$NSLOTS * $NPROCPM
> >
> > /usr/local/bin/mpiexec -machinefile $HPC_HOSTFILE -np $NPROCS ./helloworld.exe
> >
> > # <====== This script has a bug: it tries to read a file that is missing.
> > /usr/local/bin/HPC_unsetmpi.csh $MPI_TYPE
> >
> > </snip> =======================
> >
>
> What is HPC_unsetmpi.csh doing? I found a similar procedure to set
> these during startup:
>
> http://www.engres.odu.edu/Clusters/Options/mpi_tutorial.html
>
> On what platform are you running your script? I'm not aware of these
> set/unset scripts. Did you request a PE in your qsub command, and is the
> MPICH2 integration set up properly?
>
> Can you please post your PE and queue definition, and if they are not
> too long also the set/unset scripts.
>
> -- Reuti
>
>
> >
> > I have not changed any SGE settings between runs. I still see the
> > job's "common" files in $SGE_ROOT/default/spool/qmaster/jobs/......
> >
> > My question is: how do I find out why and where it is stuck? Should I
> > look at the spool directories on the head node or on the compute nodes?
> >
> >
> > Thank you for any feedback,
> > -AK
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
>
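A minimal sketch of one way to keep a broken cleanup step from hanging a job, written in Python rather than tcsh and assuming the post-MPI step can be launched as an external command: check for the input file first and give the step a timeout, so the failure shows up in the .oJOBID file instead of as a job that never ends. The function name run_cleanup, its required_file argument and the 60 s default are hypothetical; the thread does not say which file HPC_unsetmpi.csh reads.

```python
import os
import subprocess


def run_cleanup(cmd, required_file, timeout_s=60.0):
    """Run a post-MPI cleanup command, failing fast instead of hanging.

    cmd: argument list for the cleanup step (hypothetical here,
        e.g. the site's HPC_unsetmpi.csh plus its argument).
    required_file: file the cleanup step needs to read (hypothetical;
        the thread does not say which file is missing).
    timeout_s: seconds to wait before giving up on the step.
    Returns a (status, message) tuple.
    """
    # Check for the input file up front so a missing file is reported
    # in the job output rather than surfacing as a silent hang.
    if not os.path.isfile(required_file):
        return ("missing", f"required file not found: {required_file}")
    try:
        # timeout= makes subprocess.run kill the child if it runs too long.
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timeout", f"cleanup did not finish within {timeout_s} s")
    status = "ok" if result.returncode == 0 else "failed"
    return (status, result.stdout + result.stderr)
```

For example, run_cleanup(["/usr/local/bin/HPC_unsetmpi.csh", os.environ["MPI_TYPE"]], "/path/to/expected/file"), where the file path is a placeholder. A timeout only stops the hang; it does not explain why the step blocks when the file is missing.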


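On the question of where a job is stuck: `qstat -j <job_id>` on the head node shows SGE's view of the job, while the execution host keeps per-task state in its execd spool directory. Below is a minimal Python sketch that collects the tail of each task's trace file. It assumes (please verify on your installation) that sge_execd keeps running tasks under <execd_spool_dir>/active_jobs/<job_id>.<task_id>/ with a shepherd "trace" file inside; the function name job_trace_tails is hypothetical.

```python
from pathlib import Path


def job_trace_tails(execd_spool: str, job_id: int, n: int = 20) -> dict:
    """Return the last n lines of each 'trace' file for job_id.

    Assumes (verify on your installation) that sge_execd keeps one
    directory per running task under <execd_spool>/active_jobs/ named
    "<job_id>.<task_id>", each holding a 'trace' file written by the
    shepherd. Returns {"<job_id>.<task_id>": [last n lines], ...}.
    """
    tails = {}
    active = Path(execd_spool) / "active_jobs"
    # The pattern requires a '.' right after the job ID, so job 12
    # does not pick up the directories of job 123.
    for task_dir in sorted(active.glob(f"{job_id}.*")):
        trace = task_dir / "trace"
        if trace.is_file():
            lines = trace.read_text(errors="replace").splitlines()
            tails[task_dir.name] = lines[-n:]
    return tails
```

On a compute node, something like job_trace_tails("/path/to/execd_spool/node01", 4711) (hypothetical path and job ID) returns a dict keyed by task directory name; an empty dict means no matching task directory with a trace file was found on that host.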



