[GE users] qrsh consumes consumables, qsub does not

Ondrej Bojar bojar at ufal.mff.cuni.cz
Fri Nov 21 23:10:02 GMT 2008


Dear Reuti,

thanks for your response.

reuti wrote:
>>we use GE 6.1u3 and we have set mem_free to consumable:
>>
>>qconf -sc says:
>>...
>>mem_free   mf   MEMORY    <=   YES    YES        100      0
>>...
>>
>>All our execution hosts have mem_free set to their available physical
>>memory in qmon->Host Configuration->Consumables/Fixed Attributes.
>>
>>I can list hosts satisfying some minimum free memory limit:
>>
>>    qhost -l mem_free=15G
>>
>>
>>I can schedule interactive jobs requiring (reserving) some amount of
>>this mem_free consumable resource:
>>
>>    qrsh -l mem_free=15G "hostname; sleep 60"
>>
>>Checking the list of hosts (qhost -l above) confirms that the resource
>>has been partially consumed, e.g. the used execution host disappears
>>from the list.
> 
> 
> this I don't see.

I meant this: I keep printing the output of "qhost -l mem_free=15G" in one xterm
window and run 'qrsh -l mem_free=15G "hostname; sleep 60"' in another. As soon
as the interactive job is scheduled, the qhost listing becomes shorter: the
execution host used by qrsh is no longer listed. This is what I would expect,
because 15 GB of mem_free are now consumed on that execution host.
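
The two-window check can also be reduced to comparing two snapshots of the host
list. The following is only a sketch: it assumes that the first three lines of
the qhost output are the column header, a separator line and the "global"
pseudo-host, and the function names host_names and hosts_gone are made up for
illustration (they are not part of Grid Engine).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Grid Engine.

# Read qhost output on stdin and print one host name per line, sorted.
# Assumption: the first three lines are header, separator and "global".
host_names() {
    awk 'NR > 3 { print $1 }' | sort
}

# Print hosts listed in snapshot file $1 but missing from snapshot file $2.
# Both files must contain sorted host names, as produced by host_names.
hosts_gone() {
    comm -23 "$1" "$2"
}

# Usage on a live cluster (not run here):
#   qhost -l mem_free=15G | host_names > before.txt
#   ... start the qrsh job in another window ...
#   qhost -l mem_free=15G | host_names > after.txt
#   hosts_gone before.txt after.txt
```

If the consumable is booked as expected, hosts_gone should print exactly the
host that the qrsh job landed on.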

>>Submitting a job with 'qsub -l mem_free=15G ...' however submits the job
>>on any free execution host, regardless of mem_free. Moreover, the 'qhost
>>-l' list remains unchanged.
>>
>>Could you think of any explanation?
>>
>>
>>A side issue is that even the 'qrsh -l mem_free=15G' is not reliable. I
>>often get the error 'Your "qrsh" request could not be scheduled, try
>>again later.', even in case there are enough hosts available in the
>>'qhost -l ...' listing.
>>
>>
>>(Our motivation is clear, we want to trust users: if someone submits a
>>job claiming he'll need 15 GB, we don't want another 15 GB job scheduled
>>on a 16 GB machine, even if the job in question is not consuming its
>>15 GB yet.)
> 
> 
> a) Do you also observe, that the "mem_free" output is always prefixed  
> by a "hl:" in `qhost -F`?
> b) Can you try the same with virtual free? - You should see a "hc:"  
> in front of it then.

No. 'mem_free' is usually prefixed by hl:, but there are hosts that list it as
hc:. Here are the exact counts:

3          hl:virtual_free
14         hc:mem_free
29         hl:mem_free
40         hc:virtual_free
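
Counts like these can be collected by filtering the output of qhost -F. A
sketch, assuming each resource appears in the form "hl:mem_free=..." or
"hc:mem_free=..."; the function name count_prefixes is made up. According to the
qstat man page, the second letter tells the source of the value: "l" is a
reported load value, "c" is the availability computed by the consumable
accounting.

```shell
#!/bin/sh
# Hypothetical helper: count "hl:"/"hc:" prefixes per resource.
# Reads the output of `qhost -F mem_free,virtual_free` on stdin.
count_prefixes() {
    grep -oE 'h[lc]:(mem_free|virtual_free)' | sort | uniq -c | sort -n
}

# Usage on a live cluster (not run here):
#   qhost -F mem_free,virtual_free | count_prefixes
```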

Our definitions of mem_free and virtual_free are nearly identical:

virtual_free   vf         MEMORY      <=    YES         YES        0       0
mem_free       mf         MEMORY      <=    YES         YES        100     0


> As it's nowhere mentioned to be a feature of virtual_free only, maybe  
> it's a side effect of mem_tot/mem_used being displayed as columns in  
> the usual qhost output also. When you define it also as a consumable:  
> shall these columns also change their output?

I'm sorry but I don't understand your question.

I'm afraid I may be missing some really basic information about the relation
between resources and 'sensors'. I'm worried that by reusing some of the
default/built-in resources, we might make our resources dependent on built-in
sensors.

Would it be safer to define a new resource for our purpose from scratch?
And if so, are these all the steps we have to take?
   - introduce "our_mem" with relation <=, requestable, consumable, and some
     default, e.g. 100M
   - set our_mem of each host to the physical memory available (in
     qmon->Hosts->Consumables)
   - use -l our_mem=15G
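
On the command line, the three steps might look roughly like the output of the
script below. This is a sketch only: the shortcut "om", the host names node1 and
node2, the memory sizes and job.sh are placeholders, and the script merely
prints the commands so they can be reviewed before anything is changed. It
assumes qconf -Mc (load the complex configuration from a file) and qconf -aattr
(add an entry to a list attribute) as command-line counterparts of the qmon
dialogs.

```shell
#!/bin/sh
# Dry run: print the commands instead of executing them.
# "om", node1/node2, the sizes and job.sh are placeholders for illustration.

# Step 1: complex definition line, in the column order used by `qconf -sc`
# (name shortcut type relop requestable consumable default urgency).
complex_line() {
    printf '%s\n' 'our_mem om MEMORY <= YES YES 100M 0'
}

# Step 2: one command per host; arguments are given as "host=size".
host_commands() {
    for pair in "$@"; do
        host=${pair%%=*}
        size=${pair#*=}
        printf 'qconf -aattr exechost complex_values our_mem=%s %s\n' "$size" "$host"
    done
}

echo "# append this line to a copy of the 'qconf -sc' output, then: qconf -Mc <file>"
complex_line
host_commands node1=16G node2=32G
# Step 3: request the resource at submission time.
echo "qsub -l our_mem=15G job.sh"
```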

By defining the resource from scratch I want to make sure it's only the 
information from our users that affects available our_mem values.

What I also do not exactly understand is the difference between the MEMORY and 
INT type for a consumable resource. Does the MEMORY type imply anything special? 
Why not just use INT?

I cannot reproduce the experiments now because the cluster is very busy and my 
qrsh requests "cannot be scheduled" (that's the only message I get).

Thanks, Ondrej.

> 
> -- Reuti
> 
> 
> 
>>Looking forward to any suggestions,
>>   Ondrej Bojar.
>>
>>-- 
>>Ondrej Bojar (mailto:obo at cuni.cz / bojar at ufal.mff.cuni.cz)
>>http://www.cuni.cz/~obo
>>

-- 
Ondrej Bojar (mailto:obo at cuni.cz / bojar at ufal.mff.cuni.cz)
http://www.cuni.cz/~obo

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=89450

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


