[GE users] Qalter -w v bug?

Reuti reuti at staff.uni-marburg.de
Tue Apr 29 20:21:48 BST 2008


On 29.04.2008 at 20:11, Heywood, Todd wrote:

> On 4/29/08 1:39 PM, "Reuti" <reuti at staff.uni-marburg.de> wrote:
>
>> On 29.04.2008 at 16:32, Heywood, Todd wrote:
>>
>>> Here's something strange. If schedd_job_info is set to "true" in sched_conf,
>>> my load sensors and resource requests work just fine. For example, for a job
>>> asking for a non-available resource, "qstat -j 5581491" shows:
>>>
>>> (-l home2load=1) cannot run globally because it offers only gl:home2load=2.530000
>>
>> The relation is <= ?
>
> The relation in the complex is >=:
>
> home2load           home2load     DOUBLE      >=    YES         NO         0        0
>
> Job is not supposed to run until home2load is <= 1 (in this example). This
> has been working fine for a while.
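
The message quoted above fits the usual reading of the relop column, where the scheduler evaluates "request relop offered value". Below is a minimal sketch of that comparison, assuming this reading is right. The function name request_ok is made up for illustration and is not an SGE command; only three operators are handled.

```shell
# request_ok REQUEST RELOP OFFERED
# Hypothetical helper, not part of SGE: exits 0 when "REQUEST RELOP OFFERED"
# holds numerically. awk performs the floating-point comparison.
request_ok() {
  awk -v req="$1" -v op="$2" -v off="$3" 'BEGIN {
    req += 0; off += 0              # force numeric comparison
    if (op == ">=")      ok = (req >= off)
    else if (op == "<=") ok = (req <= off)
    else if (op == "==") ok = (req == off)
    else                 ok = 0     # other operators are not handled in this sketch
    exit (ok ? 0 : 1)
  }'
}

# Values from this thread: -l home2load=1 against a reported load of 2.53
if request_ok 1 ">=" 2.530000; then echo "can run"; else echo "cannot run"; fi

# The same request against a load of 0.02
if request_ok 1 ">=" 0.02; then echo "can run"; else echo "cannot run"; fi
```

With a relop of >=, the request of 1 only passes once the reported load has dropped to 1 or below, which matches the behaviour described for this complex.
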
>
>>
>>> But if I change schedd_job_info to "false", and use "qalter -w v 5581491", I
>>> get complaints that the resource is unknown:
>>>
>>> Job 5581491 (-l home2load=1) cannot run in queue "public.q@blade49" because
>>> job requests unknown resource (home2load)
>>>
>>> (message occurs for all hosts, not just this one).
>>
>> Mmh, qalter -w v will assume an empty cluster. Is there any initial
>> value in the "qconf -se global" for home2load?

Load values are ignored with qalter -w v, as the cluster is assumed  
to be empty anyway (might change in future SGE versions).

> No initial value. I thought that was only for consumables.

I thought the same for a long time and never hit any problem using
http://gridengine.sunsource.net/howto/loadsensor.html while
forgetting to make tmpfree consumable. But we use tmpfree only as a
load_threshold, and that was working fine. If we were to request
"-l tmpfree=1G" (still not consumable), an initial value would be needed
in "qconf -me global", as recent posts on the list made me aware. Your
issue seems to be similar.

Nothing in global:

reuti@theochem:~> qalter -w v 66368
Job 66368 (-l tmpfree=200G) cannot run in queue instance "short@node41" because job requests unknown resource (tmpfree)
...(all hosts)...

Defined in global:

reuti@theochem:~> qalter -w v 66368
Job 66368 (-l tmpfree=200G) cannot run globally because it offers only gf:tmpfree=40.000G
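
To see up front whether a resource has such an entry, something like the following could be used. It is only a sketch: the helper name has_complex_value is made up, and it assumes that an entry in the complex_values line of "qconf -se global" is what makes the difference between the two outputs above. It reads the host configuration text on stdin and follows backslash line continuations.

```shell
# has_complex_value RESOURCE
# Hypothetical helper, not part of SGE: reads "qconf -se global"-style text
# on stdin and exits 0 if RESOURCE has an entry in complex_values.
has_complex_value() {
  awk -v res="$1" '
    /^complex_values/ { in_cv = 1; sub(/^complex_values[ \t]+/, "") }
    in_cv {
      line = $0
      cont = sub(/\\[ \t]*$/, "", line)     # trailing backslash: list continues
      n = split(line, parts, /[ \t,]+/)
      for (i = 1; i <= n; i++) {
        split(parts[i], kv, "=")
        if (kv[1] == res) found = 1
      }
      if (!cont) in_cv = 0
    }
    END { exit (found ? 0 : 1) }
  '
}

# Sample in the shape of a global host configuration without initial values
if has_complex_value home2load <<'EOF'
hostname              global
load_scaling          NONE
complex_values        NONE
load_values           home1load=0.00,home2load=1.33
EOF
then
  echo "home2load has an initial value in global"
else
  echo "home2load has no initial value in global"
fi
```

On a live system the input would come from the real command, i.e. "qconf -se global | has_complex_value home2load".
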

-- Reuti


> [root@bhmnode2 n1ge6]# qconf -se global
> hostname              global
> load_scaling          NONE
> complex_values        NONE
> load_values           home1load=0.00,home2load=1.33,home3load=20.83, \
>                       home4load=0.02,home5load=0.00
> processors            0
> user_lists            NONE
> xuser_lists           NONE
> projects              NONE
> xprojects             NONE
> usage_scaling         NONE
> report_variables      cpu,h_vmem,mem_free,np_load_avg,s_vmem,virtual_free, \
>                       tmp_free
> [root@bhmnode2 n1ge6]
>
>
>>
>> -- Reuti
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>
>



