[GE users] qrsh consumes consumables, qsub does not

reuti reuti at staff.uni-marburg.de
Sat Nov 22 11:06:04 GMT 2008


On 22.11.2008 at 00:10, Ondrej Bojar wrote:

> Dear Reuti,
>
> thanks for your response.
>
> reuti wrote:
>>> we use GE 6.1u3 and we have set mem_free to consumable:
>>>
>>> qconf -sc says:
>>> ...
>>> mem_free   mf   MEMORY    <=   YES    YES        100      0
>>> ...
>>>
>>> All our execution hosts have mem_free set to their available physical
>>> memory in qmon->Host Configuration->Consumables/Fixed Attributes.
>>>
>>> I can list hosts satisfying some minimum free memory limit:
>>>
>>>    qhost -l mem_free=15G
>>>
>>>
>>> I can schedule interactive jobs requiring (reserving) some amount of
>>> this mem_free consumable resource:
>>>
>>>    qrsh -l mem_free=15G "hostname; sleep 60"
>>>
>>> Checking the list of hosts (qhost -l above) confirms that the resource
>>> has been partially consumed, e.g. the used execution host disappears
>>> from the list.
>>
>>
>> this I don't see.
>
> I meant this: I keep printing the output of "qhost -l mem_free=15G"
> in one xterm window and I run 'qrsh -l mem_free=15G "hostname; sleep 60"'
> in another one. As soon as the interactive job is scheduled, the listing
> by qhost becomes shorter: the one execution host used by qrsh is no
> longer listed. This is what I would expect: the 15 GB of mem_free are
> consumed on that execution host.

Ok, fine.

>
>>> Submitting a job with 'qsub -l mem_free=15G ...' however submits the
>>> job on any free execution host, regardless of mem_free. Moreover, the
>>> 'qhost -l' list remains unchanged.
>>>
>>> Could you think of any explanation?
>>>
>>>
>>> A side issue is that even the 'qrsh -l mem_free=15G' is not reliable.
>>> I often get the error 'Your "qrsh" request could not be scheduled, try
>>> again later.', even when there are enough hosts available in the
>>> 'qhost -l ...' listing.
>>>
>>>
>>> (Our motivation is clear, we want to trust users: if someone submits
>>> a job claiming he'll need 15 GB, we don't want to schedule another
>>> 15 GB job on a 16 GB machine, even if the job in question is not
>>> consuming its 15 GB yet.)
>>
>>
>> a) Do you also observe that the "mem_free" output is always prefixed
>> by "hl:" in `qhost -F`?
>> b) Can you try the same with virtual_free? You should see "hc:"
>> in front of it then.
>
> No. 'mem_free' is usually prefixed by hl:, but there are hosts that
> have it listed as hc:. Here are the exact counts:
>
> 3          hl:virtual_free
> 14         hc:mem_free
> 29         hl:mem_free
> 40         hc:virtual_free

So for some machines the reported value comes from the load sensor, for
others from the consumable complex.
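
For reference, the two letters in front of each value in `qhost -F` tell
you where it comes from: the first letter is the level (g = cluster
global, h = host, q = queue), the second the source (l = current load
value, c = value derived from the consumable bookkeeping, f = fixed
definition). So a fragment of the output might look roughly like this
(the numbers are just made up for illustration):

$ qhost -F mem_free,virtual_free -h node01
node01 ...
    hl:mem_free=14.800G        <- reported by the load sensor
    hc:virtual_free=12.000G    <- remaining consumable capacity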


> Our definitions of mem_free and virtual_free are nearly identical:
>
> virtual_free   vf   MEMORY   <=   YES   YES   0     0
> mem_free       mf   MEMORY   <=   YES   YES   100   0
>
>
>> As it's nowhere mentioned to be a feature of virtual_free only, maybe
>> it's a side effect of mem_tot/mem_used also being displayed as columns
>> in the usual qhost output. When you also define it as a consumable:
>> should these columns then change their output as well?
>
> I'm sorry but I don't understand your question.

$ qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS

Should the column here (MEMUSE) also reflect this - should MEMUSE be
derived from mem_free? Maybe something like "qhost -o
virtual_free,np_load_avg" to list individual columns (that could be an
RFE).
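
Until something like that exists, the closest thing I can offer is to
restrict `qhost -F` to the attributes of interest (if I remember the
option correctly):

$ qhost -F mem_free,virtual_free

which at least spares you the full per-host resource list.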


> I'm afraid I might be missing some really basic information about the
> relation between resources and 'sensors'. I'm worried that by reusing
> some of the default/built-in resources, we might make our resources
> dependent on built-in sensors.

`man complex` will explain the idea of using it as a load sensor and a
consumable at the same time. It should work.
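
The part that matters for your worry, as far as I understand it: when a
resource is both reported as a load value and managed as a consumable,
the scheduler honours the more restrictive of the two, i.e. roughly

    usable mem_free on a host
        = min( mem_free as measured by the load sensor,
               complex_values mem_free - amount booked by running jobs )

so the user-supplied bookkeeping can only make a host less eligible,
never more.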


> Would it be safer to define a new resource for our purpose from scratch?
> And if so, are these all the steps we have to take?
>    - introduce "our_mem" as a consumable (<=, requestable, consumable),
>      with some default, e.g. 100M
>    - set our_mem of each host to the physical memory available (in
>      qmon->Hosts->Consumables)
>    - use -l our_mem=15G
>
> By defining the resource from scratch I want to make sure that only the
> information from our users affects the available our_mem values.

For me virtual_free was working all the time in the past. I have now also
tried it with mem_free, and both qrsh and qsub behave fine. I tried 6.1u3
as well as 6.2.
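
If you still prefer a dedicated consumable, the three steps you list
should be all that is needed. As a rough sketch from memory (the shortcut
"omem", the host name and the sizes are of course placeholders):

# add the complex; qconf -mc opens an editor, the new line would look like
#   our_mem    omem    MEMORY    <=    YES    YES    100M    0
qconf -mc

# give every execution host its capacity
qconf -aattr exechost complex_values our_mem=16G node01

# request it at submission time
qsub -l our_mem=15G job.sh
qrsh -l our_mem=15G "hostname; sleep 60"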

> What I also do not exactly understand is the difference between the
> MEMORY and INT types for a consumable resource. Does the MEMORY type
> imply anything special? Why not just use INT?

You can use INT and even attach 15G to it, but the output will then look
like 15000000000.000000.
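
With MEMORY the values are parsed and displayed as memory quantities, so
requests and listings stay readable; purely as an illustration:

    MEMORY:  -l our_mem=15G  ->  shown as something like hc:our_mem=15.000G
    INT:     -l our_mem=15G  ->  shown as something like hc:our_mem=15000000000.000000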

-- Reuti

> I cannot reproduce the experiments now because the cluster is very  
> busy and my
> qrsh requests "cannot be scheduled" (that's the only message I get).
>
> Thanks, Ondrej.
>
>>> Looking forward to any suggestions,
>>>   Ondrej Bojar.
>
> -- 
> Ondrej Bojar (mailto:obo at cuni.cz / bojar at ufal.mff.cuni.cz)
> http://www.cuni.cz/~obo



