[GE users] Problem with Complexes and disabling queues

Richard Hobbs richard.hobbs at crl.toshiba.co.uk
Thu Dec 15 16:08:35 GMT 2005


Hello,

Thank you again for your response. I have migrated over to using the default
"slots" complex instead, and it appears to be working.

I seem to remember that this may have been attempted previously (with 5.3p1)
and it failed for some reason, but it's a distant memory, and it appears to
be working with 5.3p6 anyway.

Thanks!!

Richard.

-- 
Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Speech Technology Group
Web: http://www.toshiba-europe.com/research/
Normal Email: richard.hobbs at crl.toshiba.co.uk
Mobile Email: mobile at mongeese.co.uk
Tel: +44 1223 376964        Mobile: +44 7811 803377 

> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: 15 December 2005 13:43
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Problem with Complexes and disabling queues
> 
> Hi,
> 
> according to the "glinux" output you are still using 5.3. But
> anyway, just attach it to:
> 
> complex_values             slots=4
> 
> in the host definition (qconf -me <nodename>). The "hv:slots=..."
> will then change to "hc:slots=...", and should limit the maximum
> number of slots used on this machine by all of its defined queues
> combined. - Reuti
> 
> 
> On 15.12.2005 at 13:30, Richard Hobbs wrote:
> 
> > Hello,
> >
> > Output as requested:
> >
> > ============================================================
> > [root at stg2 root]# qhost -h stg-tts1 -F
> > HOSTNAME             ARCH       NPROC  LOAD   MEMTOT   MEMUSE   SWAPTO   SWAPUS
> > -------------------------------------------------------------------------------
> > global               -              -     -        -        -        -        -
> >    hv:arch=none
> >    hv:num_proc=1.000000
> >    hv:load_avg=99.990000
> >    hv:load_short=99.990000
> >    hv:load_medium=99.990000
> >    hv:load_long=99.990000
> >    hv:np_load_avg=99.990000
> >    hv:np_load_short=99.990000
> >    hv:np_load_medium=99.990000
> >    hv:np_load_long=99.990000
> >    hv:mem_free=0.000000
> >    hv:mem_total=0.000000
> >    hv:swap_free=0.000000
> >    hv:swap_total=0.000000
> >    hv:virtual_free=0.000000
> >    hv:virtual_total=0.000000
> >    hv:mem_used=infinity
> >    hv:swap_used=infinity
> >    hv:virtual_used=infinity
> >    hv:swap_rsvd=0.000000
> >    hv:swap_rate=0.000000
> >    hv:slots=0.000000
> >    hv:s_vmem=0.000000
> >    hv:h_vmem=0.000000
> >    hv:s_fsize=0.000000
> >    hv:h_fsize=0.000000
> >    hv:cpu=0.000000
> > stg-tts1             glinux         4  0.00  1005.8M   203.2M     2.0G   756.0K
> >    hl:arch=glinux
> >    hl:num_proc=4.000000
> >    hl:load_avg=0.000000
> >    hl:load_short=0.000000
> >    hl:load_medium=0.000000
> >    hl:load_long=0.000000
> >    hl:np_load_avg=0.000000
> >    hl:np_load_short=0.000000
> >    hl:np_load_medium=0.000000
> >    hl:np_load_long=0.000000
> >    hl:mem_free=802.59M
> >    hl:mem_total=1005.83M
> >    hl:swap_free=2.00G
> >    hl:swap_total=2.00G
> >    hl:virtual_free=2.78G
> >    hl:virtual_total=2.98G
> >    hl:mem_used=203.23M
> >    hl:swap_used=756.00K
> >    hl:virtual_used=203.97M
> >    hv:swap_rsvd=0.000000
> >    hv:swap_rate=0.000000
> >    hv:slots=0.000000
> >    hv:s_vmem=0.000000
> >    hv:h_vmem=0.000000
> >    hv:s_fsize=0.000000
> >    hv:h_fsize=0.000000
> >    hl:cpu=0.100000
> >    hc:mem_slot=4.000000
> > [root at stg2 root]#
> > ============================================================
> >
> > Am I to understand that the default "slots" complex is designed to
> > do exactly what we are trying to do with "mem_slot"? Is it a
> > definitive maximum number of slots per machine, which will *never*
> > be exceeded by GridEngine?
> >
> > Also, given that our value for "slots" is currently set to zero, how
> > would I start to use this feature if I set it to 4?
> >
> > Thanks again,
> > Richard.
> >
> > -- 
> > Richard Hobbs (Systems Administrator)
> > Toshiba Research Europe Ltd. - Speech Technology Group
> > Web: http://www.toshiba-europe.com/research/
> > Normal Email: richard.hobbs at crl.toshiba.co.uk
> > Mobile Email: mobile at mongeese.co.uk
> > Tel: +44 1223 376964        Mobile: +44 7811 803377
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: 14 December 2005 21:07
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] Problem with Complexes and disabling queues
> >>
> >> Hi,
> >>
> >> On 14.12.2005 at 17:22, Richard Hobbs wrote:
> >>
> >>> Hello,
> >>>
> >>> We have various queues configured on various hosts. Each host has
> >>> a complex set up as a consumable resource, named "mem_slot". The
> >>> value of "mem_slot" is 4. Basically, we have many queues on each
> >>> machine, but only 4 CPUs, and this consumable is therefore designed
> >>> to stop too many jobs running on one host.
> >>>
> >>> Each queue (using 'qconf -mq queuename') then has a value for
> >>> "mem_slot", which is 1.
> >>>
> >>> Also, each submitted job uses "-l mem_slot=1" to request one
> >>> mem_slot.
> >>>
> >>> This works fine.
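> >>>
> >>> Put concretely, the setup is roughly this (node, queue and job
> >>> script names are only placeholders, and mem_slot itself is defined
> >>> elsewhere as a consumable complex):
> >>>
> >>>    # per host (qconf -me <nodename>):
> >>>    complex_values             mem_slot=4
> >>>
> >>>    # per queue (qconf -mq <queuename>):
> >>>    complex_values             mem_slot=1
> >>>
> >>>    # per job:
> >>>    qsub -l mem_slot=1 job.sh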
> >>>
> >>> However, if I disable a queue with a running job in order to stop
> >>> more jobs being submitted to this queue, it releases the mem_slot,
> >>> and a 5th job will enter the machine even if the previous jobs are
> >>> all still running.
> >>>
> >>> It's almost as if disabling a queue releases the resources even
> >>> though the job is still active and running.
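> >>>
> >>> Roughly, the sequence that shows it (names are placeholders again):
> >>>
> >>>    qsub -l mem_slot=1 job.sh      # submitted four times: the host
> >>>                                   # is now full, hc:mem_slot=0
> >>>    qmod -d <queuename>            # disable a queue that still has
> >>>                                   # one of those jobs running
> >>>    qhost -h <nodename> -F         # hc:mem_slot has gone back up,
> >>>                                   # and a 5th job gets dispatched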
> >>>
> >>> This seems like a bug...
> >>>
> >>> Can anyone confirm having seen this? Is there a fix? Is there a
> >>> workaround?
> >>
> >> we are also using complexes, but I don't see this behavior in u6
> >> (which version are you using?). Can you check this by issuing:
> >>
> >> qhost -h <nodename> -F
> >>
> >> But anyway, I don't think you need this mem_slot at all. If I
> >> understand you correctly, you could just attach the default complex
> >> "slots" to your exec nodes with a value set to 4.
> >>
> >> Cheers - Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



