[GE users] cannot run in PE "smp" because it only offers -2147483648 slots

reuti reuti at staff.uni-marburg.de
Tue Feb 24 23:39:01 GMT 2009


Am 24.02.2009 um 23:31 schrieb kdoman:

> qhost responded normally:
> HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
> -------------------------------------------------------------------------------
> global                  -               -     -       -       -       -       -
> node01                  lx26-x86        2  0.00    4.0G  108.2M    7.8G     0.0
> node02                  lx26-x86        2  0.01    4.0G  107.1M    7.8G     0.0
> node03                  lx26-x86        2  0.00    4.0G  116.6M    7.8G     0.0
> node04                  lx26-x86        2  0.00    4.0G   80.8M    7.8G     0.0
> node05                  lx26-x86        2  0.00    4.0G  106.0M    9.8G     0.0
> node06                  lx26-x86        2  0.00    4.0G   79.3M    7.8G     0.0
> node07                  lx26-x86        2  0.00    4.0G  105.7M    7.8G     0.0
> ... etc. ...
>
> qquota output is empty; I've been thinking about RQS but never had
> time to set it up.
>
> Could a 'yum update' have messed up something? I remember setting up
> a 'thread' PE a while back and thought it was working at the time. I
> don't remember doing any update to SGE, only the occasional yum
> install or yum update of simple apps such as Firefox, nothing major.

You didn't mention the output of "qstat -f", so let's assume it's fine.

Are there any default requests in place, e.g. for a queue or another
PE, which can't be satisfied? Are serial jobs running fine? Are other
PEs also working? Below you request long.q in your job script, but the
output showed all.q. Is smp also attached to long.q?
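
A quick way to check (assuming the usual cell name "default" and that
$SGE_ROOT is set in your environment):

# is smp attached to long.q (and to all.q)?
qconf -sq long.q | grep pe_list
qconf -sq all.q | grep pe_list

# any cluster-wide default requests, e.g. a default -q or -pe?
cat $SGE_ROOT/default/common/sge_request

# the "scheduling info" section of a pending job shows why it can't start
qstat -j <job-id>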

-- Reuti


> Still confused and baffled!
>
> K.
>
>
> On Tue, Feb 24, 2009 at 3:49 PM, reuti <reuti at staff.uni-marburg.de>  
> wrote:
>> Am 24.02.2009 um 22:06 schrieb kdoman:
>>
>>> Sorry, I ran a test case on a development cluster where each node
>>> has only two cores. That's why all.q has slots set to 2. This
>>> cluster has 64 nodes and is mostly idle, so I can do whatever I
>>> want. The slot count of 2 is correct.
>>
>> Aha, do "qhost" and "qstat -f" show all nodes online and no queue
>> disabled? Is any RQS in place, i.e. is "qquota" empty?
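>> For example (just a sketch from memory):
>>
>> qstat -f       # any queue instance in state "d", "E" or "au"?
>> qconf -srqsl   # lists resource quota sets, if any are defined
>> qquota         # shows the limits that currently apply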
>>
>> -- Reuti
>>
>>> I'm running GE 6.1u4 on CentOS 5.2.
>>>
>>> # qconf -sp smp
>>> pe_name           smp
>>> slots             128
>>> user_lists        NONE
>>> xuser_lists       NONE
>>> start_proc_args   /bin/true
>>> stop_proc_args    /bin/true
>>> allocation_rule   $pe_slots
>>> control_slaves    FALSE
>>> job_is_first_task TRUE
>>> urgency_slots     min
>>>
>>> Simple sleep job (supposedly):
>>> =====================
>>> # cat sleep.sh
>>> #!/bin/bash
>>>
>>> #$ -pe smp 2
>>> #$ -cwd
>>> #$ -q long.q
>>> #$ -R y
>>> sleep 60
>>>
>>> I can run a one-liner qsub and still get the same error:
>>> qsub -cwd -b y -pe smp 2 sleep 60
>>>
>>> Thanks!
>>> K.
>>>
>>>
>>> On Tue, Feb 24, 2009 at 2:49 PM, reuti <reuti at staff.uni-marburg.de>
>>> wrote:
>>>> Hiho,
>>>>
>>>> Am 24.02.2009 um 21:10 schrieb kdoman:
>>>>
>>>>> hello list -
>>>>> I need to submit only one job to one machine even though the
>>>>> machine has four cores. So I ran "qconf -ap smp", edited the
>>>>> slots to 1000, saved, and added smp to the queue (via qconf -mq):
>>>>>
>>>>> qconf -sp smp:
>>>>> ==============
>>>>> pe_name           smp
>>>>> slots             1000
>>>>
>>>> 1000 is of course safe, although the number of nodes x 4 would do.
>>>>
>>>>> user_lists        NONE
>>>>> xuser_lists       NONE
>>>>> start_proc_args   /bin/true
>>>>> stop_proc_args    /bin/true
>>>>> allocation_rule   $pe_slots
>>>>> control_slaves    FALSE
>>>>> job_is_first_task TRUE
>>>>> urgency_slots     min
>>>>>
>>>>> qconf -sq all.q
>>>>> ==============
>>>>> qname                 all.q
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               make mpich mpi orte smp
>>>>> rerun                 FALSE
>>>>> slots                 2
>>>>
>>>> If all machines have 4 cores, you can just put 4 here. Otherwise
>>>> you would need to specify this per node or per hostgroup in a
>>>> heterogeneous cluster.
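>>>> For example (just a sketch; "@quadcore" and "node42" are made-up
>>>> names for a hostgroup and a host):
>>>>
>>>> slots                 2,[@quadcore=4],[node42=8]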
>>>>
>>>>> .
>>>>> .
>>>>> etc...
>>>>>
>>>>> After I submitted the jobs, all jobs stayed in the 'qw' state.
>>>>> "qstat -j <job-id>" gave me this:
>>>>> cannot run in PE "smp" because it only offers -2147483648 slots
>>>>
>>>> On which platform / OS / SGE version do you observe this?
>>>>
>>>> -- Reuti
>>>>
>>>>>
>>>>> Thanks all.
>>>>>