[GE users] cannot run in PE "smp" because it only offers -2147483648 slots

kdoman kdoman07 at gmail.com
Wed Feb 25 00:36:32 GMT 2009


Serial jobs run great. If I take the '#$ -pe smp 2' line out of the script,
it runs right away. If I add that flag back and resubmit, the job stays
in 'qw' even though my cluster only has the previously submitted serial
jobs running.

And qstat -f looks normal: nothing disabled, no 'au' state or anything.
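
For reference, a few ways to ask the scheduler why a job sits in 'qw' (a
sketch; <job-id> is a placeholder, and the first command only reports
reasons if schedd_job_info is enabled in the scheduler configuration):

qstat -j <job-id>                      # per-job scheduling messages
qconf -ssconf | grep schedd_job_info   # confirm the scheduler records those messages
qalter -w v <job-id>                   # verify against an empty cluster: could the job ever run?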



On Tue, Feb 24, 2009 at 5:39 PM, reuti <reuti at staff.uni-marburg.de> wrote:
> On 24.02.2009 at 23:31, kdoman wrote:
>
>> qhost responded normally:
>> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
>> -------------------------------------------------------------------------------
>> global                  -               -     -       -       -       -       -
>> node01                  lx26-x86        2  0.00    4.0G  108.2M    7.8G     0.0
>> node02                  lx26-x86        2  0.01    4.0G  107.1M    7.8G     0.0
>> node03                  lx26-x86        2  0.00    4.0G  116.6M    7.8G     0.0
>> node04                  lx26-x86        2  0.00    4.0G   80.8M    7.8G     0.0
>> node05                  lx26-x86        2  0.00    4.0G  106.0M    9.8G     0.0
>> node06                  lx26-x86        2  0.00    4.0G   79.3M    7.8G     0.0
>> node07                  lx26-x86        2  0.00    4.0G  105.7M    7.8G     0.0
>> ... etc. ...
>>
>> qquota output is empty; I've been thinking about RQS but never had time to set it up.
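
A quick way to double-check that no resource quota sets exist at all (a
sketch, assuming the RQS-related qconf options that came with 6.1):

qconf -srqsl    # list all resource quota sets; no output means no RQS is configured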
>>
>> Could a 'yum update' command have messed something up? I remember
>> setting up a 'thread' PE a while back and thought it was working at
>> the time. I don't remember doing any update to SGE, only the
>> occasional yum install and yum update on simple apps such as
>> Firefox, nothing major.
>
> You didn't mention the output of "qstat -f", so let's assume it's fine.
>
> Are there any default requests in place, e.g. for a queue or another
> PE, which can't be satisfied? Are serial jobs running fine? Are other
> PEs also running fine? Below you request long.q in your job script,
> but the output showed all.q. Is smp also attached to long.q? (A quick
> way to check is sketched below.)
>
> -- Reuti
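
A sketch of how those points can be checked from the submit host (the
cell name "default" is the usual one and may differ on this install):

qconf -sq long.q | grep pe_list              # is smp attached to long.q as well?
qconf -sq all.q  | grep pe_list
cat $SGE_ROOT/default/common/sge_request     # cluster-wide default submit options, if the file exists
cat ~/.sge_request 2>/dev/null               # per-user default requests, if any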
>
>
>> Still confused and baffled!
>>
>> K.
>>
>>
>> On Tue, Feb 24, 2009 at 3:49 PM, reuti <reuti at staff.uni-marburg.de>
>> wrote:
>>> On 24.02.2009 at 22:06, kdoman wrote:
>>>
>>>> Sorry, I ran the test case on a development cluster where each node
>>>> has only two cores; that's why all.q has a slot count of 2. This
>>>> cluster has 64 nodes and is mostly idle, so I can do whatever I
>>>> want. The slot count of 2 is correct.
>>>
>>> Aha, "qhost" and "qstat -f" show all nodes online and no queue
>>> disabled? Any RQS in place, i.e. "qquota" empty?
>>>
>>> -- Reuti
>>>
>>>> I'm running GE 6.1u4 on CentOS 5.2.
>>>>
>>>> # qconf -sp smp
>>>> pe_name           smp
>>>> slots             128
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   /bin/true
>>>> stop_proc_args    /bin/true
>>>> allocation_rule   $pe_slots
>>>> control_slaves    FALSE
>>>> job_is_first_task TRUE
>>>> urgency_slots     min
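
A side note on the PE definition above: with allocation_rule $pe_slots,
all requested slots must come from a single host, so "-pe smp 2" needs
one node with two free slots in a queue whose pe_list contains smp. One
way to see which queue instances can offer the PE (assuming qstat's -pe
filter is available here):

qstat -f -pe smp    # show only queue instances attached to the smp PE, with their slot usage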
>>>>
>>>> Simple sleep job (supposedly):
>>>> =====================
>>>> # cat sleep.sh
>>>> #!/bin/bash
>>>>
>>>> #$ -pe smp 2
>>>> #$ -cwd
>>>> #$ -q long.q
>>>> #$ -R y
>>>> sleep 60
>>>>
>>>> I can run a one-liner qsub and still get the same error:
>>>> qsub -cwd -b y -pe smp 2 sleep 60
>>>>
>>>> Thanks!
>>>> K.
>>>>
>>>>
>>>> On Tue, Feb 24, 2009 at 2:49 PM, reuti <reuti at staff.uni-marburg.de>
>>>> wrote:
>>>>> Hiho,
>>>>>
>>>>> On 24.02.2009 at 21:10, kdoman wrote:
>>>>>
>>>>>> hello list -
>>>>>> I need to submit only one job to a machine even though the machine
>>>>>> has four cores. So I ran "qconf -ap smp", edited slots to 1000,
>>>>>> saved, and added smp to the queue (via qconf -mq):
>>>>>>
>>>>>> qconf -sp smp:
>>>>>> ==============
>>>>>> pe_name           smp
>>>>>> slots             1000
>>>>>
>>>>> 1000 is of course safe, although number of nodes x 4 would also do.
>>>>>
>>>>>> user_lists        NONE
>>>>>> xuser_lists       NONE
>>>>>> start_proc_args   /bin/true
>>>>>> stop_proc_args    /bin/true
>>>>>> allocation_rule   $pe_slots
>>>>>> control_slaves    FALSE
>>>>>> job_is_first_task TRUE
>>>>>> urgency_slots     min
>>>>>>
>>>>>> qconf -sq all.q
>>>>>> ==============
>>>>>> qname                 all.q
>>>>>> hostlist              @allhosts
>>>>>> seq_no                0
>>>>>> load_thresholds       np_load_avg=1.75
>>>>>> suspend_thresholds    NONE
>>>>>> nsuspend              1
>>>>>> suspend_interval      00:05:00
>>>>>> priority              0
>>>>>> min_cpu_interval      00:05:00
>>>>>> processors            UNDEFINED
>>>>>> qtype                 BATCH INTERACTIVE
>>>>>> ckpt_list             NONE
>>>>>> pe_list               make mpich mpi orte smp
>>>>>> rerun                 FALSE
>>>>>> slots                 2
>>>>>
>>>>> If all machines have 4 cores, you can just put 4 here. Otherwise,
>>>>> in a heterogeneous cluster, you would need to specify this per
>>>>> node or hostgroup (see the sketch right after this comment).
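
A sketch of such a per-hostgroup override (the @quadcore hostgroup name
is made up for illustration, and the -aattr syntax is assumed from the
qconf man page):

qconf -aattr queue slots "[@quadcore=4]" all.q    # add a hostgroup-specific slot count to all.q
qconf -sq all.q | grep slots                      # verify: should now show something like 2,[@quadcore=4]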
>>>>>
>>>>>> .
>>>>>> .
>>>>>> etc...
>>>>>>
>>>>>> After I submitted the jobs, they all stayed in the 'qw' state.
>>>>>> qstat -j <job-id> gave me this:
>>>>>> cannot run in PE "smp" because it only offers -2147483648 slots
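
Worth noting: -2147483648 is -2^31, the smallest value a signed 32-bit
integer can hold, which hints that the scheduler is reporting an
uninitialized or overflowed internal slot count rather than anything
taken from the PE definition. A quick arithmetic check:

echo $(( -(1 << 31) ))    # prints -2147483648; bash evaluates this in 64 bits, so no overflow here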
>>>>>
>>>>> On which platform / OS / SGE version do you observe this?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>>
>>>>>> Thanks all.