[GE users] Problem with Complexes and disabling queues

Reuti reuti at staff.uni-marburg.de
Thu Dec 15 13:42:30 GMT 2005


Hi,

according to the "glinux" output you are still using 5.3. But anyway,
just add the slots consumable to the host definition (qconf -me
<nodename>):

complex_values             slots=4

The "hv:slots=..." will then change to "hc:slots=...", and this should
limit the maximum number of slots used on this machine across all of
the queues defined on it. - Reuti
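
P.S. A rough sketch of what this would look like for stg-tts1
(assuming nothing else in your host definition needs to change):

    # edit the exec host definition
    qconf -me stg-tts1

    # in the editor, add the consumable to the host:
    complex_values             slots=4

    # then check it again:
    qhost -h stg-tts1 -F

The "hv:slots=0.000000" line should then show up as
"hc:slots=4.000000", counting down by one for every slot a running
job occupies on that host, no matter which of the queues the job
ended up in.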


On 15.12.2005, at 13:30, Richard Hobbs wrote:

> Hello,
>
> Output as requested:
>
> ============================================================
> [root@stg2 root]# qhost -h stg-tts1 -F
> HOSTNAME             ARCH       NPROC  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
> -------------------------------------------------------------------------------
> global               -              -     -        -        -        -        -
>    hv:arch=none
>    hv:num_proc=1.000000
>    hv:load_avg=99.990000
>    hv:load_short=99.990000
>    hv:load_medium=99.990000
>    hv:load_long=99.990000
>    hv:np_load_avg=99.990000
>    hv:np_load_short=99.990000
>    hv:np_load_medium=99.990000
>    hv:np_load_long=99.990000
>    hv:mem_free=0.000000
>    hv:mem_total=0.000000
>    hv:swap_free=0.000000
>    hv:swap_total=0.000000
>    hv:virtual_free=0.000000
>    hv:virtual_total=0.000000
>    hv:mem_used=infinity
>    hv:swap_used=infinity
>    hv:virtual_used=infinity
>    hv:swap_rsvd=0.000000
>    hv:swap_rate=0.000000
>    hv:slots=0.000000
>    hv:s_vmem=0.000000
>    hv:h_vmem=0.000000
>    hv:s_fsize=0.000000
>    hv:h_fsize=0.000000
>    hv:cpu=0.000000
> stg-tts1             glinux         4  0.00  1005.8M   203.2M     2.0G   756.0K
>    hl:arch=glinux
>    hl:num_proc=4.000000
>    hl:load_avg=0.000000
>    hl:load_short=0.000000
>    hl:load_medium=0.000000
>    hl:load_long=0.000000
>    hl:np_load_avg=0.000000
>    hl:np_load_short=0.000000
>    hl:np_load_medium=0.000000
>    hl:np_load_long=0.000000
>    hl:mem_free=802.59M
>    hl:mem_total=1005.83M
>    hl:swap_free=2.00G
>    hl:swap_total=2.00G
>    hl:virtual_free=2.78G
>    hl:virtual_total=2.98G
>    hl:mem_used=203.23M
>    hl:swap_used=756.00K
>    hl:virtual_used=203.97M
>    hv:swap_rsvd=0.000000
>    hv:swap_rate=0.000000
>    hv:slots=0.000000
>    hv:s_vmem=0.000000
>    hv:h_vmem=0.000000
>    hv:s_fsize=0.000000
>    hv:h_fsize=0.000000
>    hl:cpu=0.100000
>    hc:mem_slot=4.000000
> [root@stg2 root]#
> ============================================================
>
> Am I to understand that the default "slots" complex is designed to do
> exactly what we are trying to do with "mem_slot"? Is it a definitive
> maximum number of slots per machine, which will *never* be exceeded
> by GridEngine?
>
> Also, given that our value for "slots" is currently set to zero, how
> would I start to use this feature if I set it to 4?
>
> Thanks again,
> Richard.
>
> -- 
> Richard Hobbs (Systems Administrator)
> Toshiba Research Europe Ltd. - Speech Technology Group
> Web: http://www.toshiba-europe.com/research/
> Normal Email: richard.hobbs at crl.toshiba.co.uk
> Mobile Email: mobile at mongeese.co.uk
> Tel: +44 1223 376964        Mobile: +44 7811 803377
>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 14 December 2005 21:07
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] Problem with Complexes and disabling queues
>>
>> Hi,
>>
>> On 14.12.2005, at 17:22, Richard Hobbs wrote:
>>
>>> Hello,
>>>
>>> We have various queues configured on various hosts. Each host has a
>>> complex set up as a consumable resource, named "mem_slot". The value
>>> of "mem_slot" is 4. Basically, we have many queues on each machine
>>> but only 4 CPUs, and this consumable is therefore designed to stop
>>> too many jobs running on one host.
>>>
>>> Each queue (using 'qconf -mq queuename') then has a value for
>>> "mem_slot", which is 1.
>>>
>>> Also, each submitted job uses "-l mem_slot=1" to request one
>>> mem_slot.
>>>
>>> This works fine.
>>>
>>> However, if I disable a queue with a running job in order to stop
>>> more jobs being submitted to this queue, it releases the mem_slot,
>>> and a 5th job will enter the machine even if the previous jobs are
>>> all still running.
>>>
>>> It's almost as if disabling a queue releases the resources even
>>> though the job is still active and running.
>>>
>>> This seems like a bug...
>>>
>>> Can anyone confirm having seen this? Is there a fix? Is there a
>>> workaround?
>>
>> we are also using complexes, but I don't see this behavior in u6
>> (which version are you using?). Can you check this by issuing:
>>
>> qhost -h <nodename> -F
>>
>> But anyway, I don't think you need this mem_slot at all. If I
>> understand you correctly, you could just attach the default complex
>> "slots" to your exec nodes with a value of 4.
>>
>> Cheers - Reuti
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



