[GE users] resource management (resending to the list)

Charu Chaubal Charu.Chaubal at Sun.COM
Mon Apr 26 02:23:48 BST 2004


On Apr 24, 2004, at 7:08 PM, Ron Chen wrote:

> Sorry, I overlooked that load sensors are not
> attached to queues.
>
> So, for SGE 5.3, you can use the load sensor on each
> host to close/open the queues, depending on the
> availability of disk space (but the value is
> hard-coded).
>
> For SGE 6.0, you will get the "or operator", and thus
> you can use "disk_a > x || disk_b > y". As for finding
> out which disk to use, I think running "qstat -r"
> inside the job should show which resource SGE
> allocated for that job.
>

One small note: the ability for a job to determine which resources it 
had requested is the subject of Issue #409:
http://gridengine.sunsource.net/issues/show_bug.cgi?id=409

Regards,
	Charu


> (So, if you use SGE 6.0, you don't need to have 1
> queue per disk. And in fact, you can have only 1 queue
> per cluster using "cluster queue"!)
>
>  -Ron
>
> --- "Ara.T.Howard" <Ara.T.Howard at noaa.gov> wrote:
>> ok.  i've setup two queues for each machine (one for
>> each machine:disk_{1,2} combo):
>>
>>   eg.
>>
>>     foreach machine in $machines
>>       setup queue ${machine}_disk_1
>>       setup queue ${machine}_disk_2
>>     end
>>
>> then i created two complexes, disk_1 and disk_2,
>> both complexes have the
>> following attributes
>>
>>   ------------ -------- ------- ------ ----- ------ ---------- -------
>>   NAME         SHORTCUT TYPE    VALUE  RELOP REQ    CONSUMABLE DEFAULT
>>   ------------ -------- ------- ------ ----- ------ ---------- -------
>>   njob         nj       INT     1      ==    FORCED YES        1
>>   disk_free    df       MEMORY  0      <=    YES    YES        0
>>   disk_tot     dt       MEMORY  0      <=    YES    YES        0
>>   disk_used    du       MEMORY  0      >=    YES    YES        0
>>
>>
>> for each of created queues, complex 'disk_1' is
>> attached to ${machine}_disk_1
>> and complex 'disk_2' is attached to
>> ${machine}_disk_2
>>
>>
>> my aim here, is obviously to be able to say
>>
>>   qsub -l "disk_free=${required_space}" script
>>
>> and have a job be scheduled, one at a time, in a
>> queue with enough free disk.  remember, there are
>> two such possible disks per machine and either one
>> will suffice.
>>
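
The "either disk will suffice" requirement can also be checked from inside the job once it is running. The following is only a sketch of that idea using plain POSIX df; it is not an SGE feature. The function name pick_disk and the mount points /disk_1 and /disk_2 are hypothetical, and the size is assumed to be given in kilobytes.

```shell
#!/bin/sh
# pick_disk REQUIRED_KB MOUNT...
# Print the first mount point whose available space (as reported by
# "df -P -k") is at least REQUIRED_KB kilobytes. Returns 1 if none qualifies.
# The name and argument layout are illustrative assumptions.
pick_disk() {
    required_kb=$1
    shift
    for mount in "$@"; do
        # With -P, df prints one header line and one data line per file system;
        # column 4 of the data line is the available space in 1024-byte blocks.
        avail_kb=$(df -P -k "$mount" | awk 'NR == 2 { print $4 }')
        if [ "${avail_kb:-0}" -ge "$required_kb" ]; then
            printf '%s\n' "$mount"
            return 0
        fi
    done
    return 1
}
```

A job script could call, for example, `workdir=$(pick_disk 1000000 /disk_1 /disk_2) || exit 1`. Note that this only inspects the disks at that moment; it does not reserve space, so it does not replace a consumable resource in the scheduler.
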
>> here is where i'm running into problems, i have a
>> load monitor which reports
>> on the various disk quantities (modified from doc's
>> example 'tmpspace.sh').
>> the trouble is, it seems you can only associate a
>> load_monitor with a host,
>> not a host and a queue or only a queue.  so i seem
>> to have two alternatives,
>> neither of which would work:
>>
>>   0) there are two load_monitors per machine, one
>> which reports on disk_1 and
>>   one which reports on disk_2, each could output
>> key=val pairs like
>>
>>     ...
>>       disk_1_free=123435
>>       disk_1_used=54321
>>     ...
>>
>>   or
>>
>>     ...
>>       disk_2_free=123435
>>       disk_2_used=54321
>>     ...
>>
>>   depending on which disk.  note, however that they
>> both cannot output key=val
>>   pairs like
>>
>>     ...
>>       disk_free=123435
>>       disk_used=54321
>>     ...
>>
>>   since they would be clobbering each other's
>> output!  additionally it seems
>>   you cannot actually run two load monitors on a
>> machine...
>>
>>
>>   1) one load monitor per machine, it reports on
>> both disk_1 AND disk_2 with
>>   key=val pairs like
>>
>>     ...
>>       disk_1_free=123435
>>       disk_1_used=54321
>>       disk_2_free=123435
>>       disk_2_used=54321
>>     ...
>>
>>   using this method there is no way to de-multiplex
>> these values into the
>>   'disk_free', 'disk_used', etc. attributes
>> associated with each complex that
>>   sge will be looking for in the output of the load
>> monitors...
>>
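
For reference, alternative 1) above (one load monitor per host, reporting on both disks) could look roughly like the sketch below. It follows the usual load sensor convention: each line read on stdin is answered with a report framed by "begin" and "end", with one host:name:value line per attribute, and the script exits on "quit". The mount points, the choice of bytes as the unit, the use of `uname -n` for the host name, and the script name disk_sensor.sh are all assumptions made for illustration; the attribute names are the ones from the message.

```shell
#!/bin/sh
# Sketch of a single per-host load sensor reporting on two disks.
# Assumptions (not from the thread): the mount points below, byte units,
# and the script name disk_sensor.sh used in the guard at the bottom.
DISK_1_MOUNT=${DISK_1_MOUNT:-/disk_1}
DISK_2_MOUNT=${DISK_2_MOUNT:-/disk_2}

# disk_bytes MOUNT COLUMN: print one column of "df -P -k" converted to bytes.
# In POSIX df output, column 3 is used space and column 4 is available space.
disk_bytes() {
    df -P -k "$1" | awk -v col="$2" 'NR == 2 { printf "%.0f\n", $col * 1024 }'
}

# report HOST: emit one load report in the begin / host:name:value / end format.
report() {
    echo "begin"
    echo "$1:disk_1_free:$(disk_bytes "$DISK_1_MOUNT" 4)"
    echo "$1:disk_1_used:$(disk_bytes "$DISK_1_MOUNT" 3)"
    echo "$1:disk_2_free:$(disk_bytes "$DISK_2_MOUNT" 4)"
    echo "$1:disk_2_used:$(disk_bytes "$DISK_2_MOUNT" 3)"
    echo "end"
}

# sensor_loop: answer every line read on stdin with a report; stop on "quit".
sensor_loop() {
    host=$(uname -n)
    while read -r line; do
        [ "$line" = "quit" ] && break
        report "$host"
    done
}

# Run the loop only when executed under the assumed script name.
if [ "${0##*/}" = "disk_sensor.sh" ]; then
    sensor_loop
fi
```

This only produces the per-disk values; it does not solve the problem described in the message, since the values still arrive under distinct names (disk_1_free, disk_2_free) rather than a single disk_free attribute. The host name printed must also match the name the execution host is known by, which `uname -n` may or may not give on a particular site.
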
>>
>> so, perhaps i am being dense (note that i have very
>> little experience with
>> sge), but i've re-read your original post and most
>> of the docs and it seems
>> like this again boils down to the inability to do
>> OR'ing of resources:
>>
>> this is my current understanding :
>>
>>   - exactly ONE load monitor may be configured per
>> host
>>
>>   - the load monitor must output unique key=val
>> pairs
>>
>>     eg
>>
>>       ...
>>         disk_1_free=12345
>>         disk_2_free=12345
>>       ...
>>
>>   - there is no way to merge (OR) key=val pairs.
>> either via setting up
>>     multiple queues, multiple complexes, or from the
>> qsub command line
>>
>> ie.  i think i'm stuck.  hopefully you can enlighten
>> me! ;-)
>>
>> -a
>>
>>>
>>> --- "Ara.T.Howard" <Ara.T.Howard at noaa.gov> wrote:
>>>> if i understand you correctly, this is problematic
>>>> for two reasons:
>>>>
>>>>   - i cannot know a priori which host to run on
>>>>   - i cannot know a priori which disk of a particular
>>>>     host to run on
>>>>
>>>>
>>>> in other words, given the above, how would i say
>>>>
>>>>   qsub -l 'any host' -l 'any of two disks' job
>>>>
>>>> it seems i would need to know both the host and disk
>>>> to submit to, wouldn't i?
>>>>
>>>> i would like to associate each of the disks with a
>>
> === message truncated ===
########################################################
# Charu V. Chaubal				# Phone: (650) 786-7672 (x87672)
# Grid Computing Technologist	# Fax:   (650) 786-4591
# Sun Microsystems, Inc.			# Email: charu.chaubal at sun.com
########################################################



