[GE users] relationship between qsub and qstat and queue allocations
Margaret Doll
Margaret_Doll at brown.edu
Thu Nov 13 14:24:45 GMT 2008
Thanks.
On Nov 13, 2008, at 9:21 AM, reuti wrote:
> On 13.11.2008 at 15:02, Margaret Doll wrote:
>
>> On Nov 13, 2008, at 3:52 AM, reuti wrote:
>>
>>> Hi Margaret,
>>>
>>> On 12.11.2008 at 23:36, Margaret Doll wrote:
>>>
>>>> qsub is not working the way I thought it should. Each qsub may
>>>> start several instances of a job, but it
>>>>
>>>> creates only one instance of a running job showing in qmon,
>>>> shows only one instance of a queued job using "qstat -f", and
>>>> seems to count as only one job on one of the compute nodes.
>>>>
>>>> For instance, I am using
>>>>
>>>> qsub -q mem16.q shll
>>>>
>>>> where shll includes:
>>>>
>>>> #!/bin/bash
>>>> #$ -o $HOME/works-1/Out
>>>> #$ -j y
>>>> /opt/openmpi/bin/mpiexec -v -n 17 -machinefile $Home/works-1/machinefile $Home/works-1/mad
>>>
>>> Although Open MPI has tight integration with SGE built in, you will
>>> need to define and request a PE (parallel environment) instead of
>>> supplying your own list of machines.
>>>
>>> http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge
>>>
>>> -- Reuti
>>>
>>
>> I found the settings.sh in /opt/gridengine/default/common/settings.sh
>>
>> I believe, however, before I run
>>
>> qsh -pe orte 4
>> mpirun -np 4 a.out
>>
>> I have to set up a parallel environment named orte. How do I set up
>> numerous parallel environments that correspond to the queues that I
>> set up in qmon?
>
> You'll find the information in the "N1 Grid Engine 6 Administration
> Guide" on page 155 ff.
>
> http://gridengine.sunsource.net/documentation.html
>
> -- Reuti
>
>
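For reference, a minimal sketch of what defining a PE named orte might
look like. The field values follow the pattern shown in the Open MPI FAQ
linked further down; the slot count of 16 (two nodes with eight slots
each) and the file name orte.pe are assumptions, and the exact field list
can differ between Grid Engine versions, so compare it with the output of
"qconf -sp" for a PE that already exists ("qconf -spl" lists them). The
script only writes the definition to a file and prints the commands one
would then run by hand as an SGE manager.

```shell
#!/bin/bash
# Sketch only: writes a PE definition to a file; it does not touch the cluster.
pe_file=orte.pe   # hypothetical file name

# Quoted heredoc delimiter so that $fill_up is written literally.
cat > "$pe_file" <<'EOF'
pe_name            orte
slots              16
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
EOF

# Follow-up steps, printed rather than executed:
echo "qconf -Ap $pe_file"                        # add the PE from the file
echo "qconf -aattr queue pe_list orte mem16.q"   # attach the PE to the queue
echo "qsub -q mem16.q -pe orte 16 shll"          # request the PE at submit time
```

Inside the job script, mpiexec would then be started with "-n $NSLOTS"
and without a -machinefile option, assuming Open MPI was built with Grid
Engine support, so that the host list comes from the scheduler.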
>>
>>>
>>>>
>>>> machinefile includes:
>>>>
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>> compute-0-10
>>>> compute-0-11
>>>>
>>>> I have each of the compute nodes set to run only eight queued jobs
>>>> at a time.
>>>>
>>>> A queued job will show up on compute-0-10 when I run "qstat -f"
>>>>
>>>> compute-0-10 will be running 9 instances of the program;
>>>> compute-0-11 will be running 8.
>>>>
>>>> What am I doing incorrectly?
>>>>
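The 9 and 8 split reported above can be checked against the machinefile
itself. The sketch below rebuilds the quoted list in a temporary file (an
assumption made only so the example is self-contained) and counts the
entries per host.

```shell
#!/bin/bash
# Rebuild the quoted machinefile: 9 alternating pairs, 18 entries in total.
machinefile=$(mktemp)
for i in 1 2 3 4 5 6 7 8 9; do
    printf 'compute-0-10\ncompute-0-11\n'
done > "$machinefile"

# Count how many times each host is listed.
counts=$(sort "$machinefile" | uniq -c | awk '{print $2 "=" $1}')
echo "$counts"
# compute-0-10=9
# compute-0-11=9

# Total number of entries in the file.
total=$(wc -l < "$machinefile" | tr -d ' ')
echo "total=$total"
# total=18

rm -f "$machinefile"
```

Each host is listed nine times, so a run with "-n 17" can place up to
nine processes on one node, one more than the eight allowed there. Since
mpiexec starts those processes from its own machinefile rather than from
a granted parallel environment, Grid Engine presumably accounts only for
the single job script, which would match the one job seen in "qstat -f".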
