[GE users] resource management (resending to the list)

Ara.T.Howard Ara.T.Howard at noaa.gov
Fri Apr 23 15:56:30 BST 2004



On Wed, 21 Apr 2004, Ron Chen wrote:

> Maybe you need to understand the way that SGE schedules jobs to queues.
> After that you can go back to my previous email and try to understand it.
> 
> In SGE, the "queues" are just job containers, so you *don't* submit to a
> queue, but SGE picks a queue for you.
> 
> If you have never played with adding queues before, you can use "qconf -aq"
> to create a queue for your host. Then submit jobs and make sure there are
> more jobs than the number of slots in the cluster, monitor how SGE schedules
> jobs to queues closely with qstat.
> 
> I think you will be able to understand the "queue" concept after you have
> played with it rather than always using the default one created by SGE during
> installation.
> 
> After you understand the somewhat confusing queue concept, the rest will be
> easy. You attach the free space for each disk to the queue, and you don't
> need to worry about the disk any more.
> 
> If you submit a job like:
> 
> qsub -l "disk=${required_space}" script
> 
> SGE will be able to find a queue (i.e. a disk) for that job.
> 
>  -Ron

ok.  i've set up two queues for each machine (one for each machine:disk_{1,2} combo):

  eg.

    foreach machine in $machines
      setup queue ${machine}_disk_1
      setup queue ${machine}_disk_2
    end
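
the loop above, spelled out as a small sh function.  it only prints the queue
names; actually creating each queue (for instance with the "qconf -aq"
mentioned in the quoted mail) is left out, and queue_names is just an
illustrative helper name, not anything sge provides.

```shell
# queue_names MACHINE...: print the two per-disk queue names for each machine.
# Illustrative helper only; it does not talk to SGE.
queue_names() {
  for machine in "$@"; do
    echo "${machine}_disk_1"
    echo "${machine}_disk_2"
  done
}
```

called as 'queue_names node1 node2' (made-up host names) it prints four queue
names, one per line.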

then i created two complexes, disk_1 and disk_2.  both complexes have the
following attributes:
  
  ------------ -------- ------- ------ ----- ------ ---------- -------
  NAME         SHORTCUT TYPE    VALUE  RELOP REQ    CONSUMABLE DEFAULT
  ------------ -------- ------- ------ ----- ------ ---------- -------
  njob         nj       INT     1      ==    FORCED YES        1                   
  disk_free    df       MEMORY  0      <=    YES    YES        0                   
  disk_tot     dt       MEMORY  0      <=    YES    YES        0                   
  disk_used    du       MEMORY  0      >=    YES    YES        0                   


for each of the created queues, complex 'disk_1' is attached to
${machine}_disk_1 and complex 'disk_2' is attached to ${machine}_disk_2


my aim here is, obviously, to be able to say

  qsub -l "disk_free=${required_space}" script

and have jobs scheduled, one at a time, in a queue with enough free disk.
remember, there are two such possible disks per machine and either one will
suffice.

here is where i'm running into problems.  i have a load monitor which reports
on the various disk quantities (modified from the docs' example 'tmpspace.sh').
the trouble is, it seems you can only associate a load_monitor with a host,
not with a host and a queue, or with a queue alone.  so i seem to have two
alternatives, neither of which would work:

  0) there are two load_monitors per machine, one which reports on disk_1 and
  one which reports on disk_2, each could output key=val pairs like

    ...
      disk_1_free=123435
      disk_1_used=54321
    ...

  or

    ...
      disk_2_free=123435
      disk_2_used=54321
    ...

  depending on which disk.  note, however, that they cannot both output
  key=val pairs like

    ...
      disk_free=123435
      disk_used=54321
    ...

  since they would be clobbering each other's output!  additionally it seems
  you cannot actually run two load monitors on a machine...


  1) one load monitor per machine, it reports on both disk_1 AND disk_2 with
  key=val pairs like

    ...
      disk_1_free=123435
      disk_1_used=54321
      disk_2_free=123435
      disk_2_used=54321
    ...

  using this method there is no way to de-multiplex these values into the
  'disk_free', 'disk_used', etc. attributes associated with each complex that
  sge will be looking for in the output of the load monitors...
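
for reference, a minimal sketch of what alternative 1 could look like as a
single load sensor script (what this mail calls a load monitor).  it assumes
the usual sensor protocol, as in the docs' tmpspace.sh example: execd writes a
line to the script's stdin, the script answers with a begin/end block of
host:name:value lines, and "quit" ends it.  the mount points in DISK_1 and
DISK_2 are placeholders, and the values are the raw KB counts df prints;
whether a unit suffix is needed for a MEMORY-typed attribute should be checked
against the docs.

```shell
#!/bin/sh
# Sketch of one load sensor reporting on two disks.
# DISK_1 and DISK_2 are hypothetical mount points; override them as needed.
DISK_1=${DISK_1:-/}
DISK_2=${DISK_2:-/tmp}

# free_kb PATH: available KB on the filesystem holding PATH
# (4th column of the data line of POSIX `df -Pk`).
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# used_kb PATH: used KB on the filesystem holding PATH (3rd column).
used_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $3 }'
}

# report HOST: print one load report block for HOST.
report() {
  echo "begin"
  echo "$1:disk_1_free:$(free_kb "$DISK_1")"
  echo "$1:disk_1_used:$(used_kb "$DISK_1")"
  echo "$1:disk_2_free:$(free_kb "$DISK_2")"
  echo "$1:disk_2_used:$(used_kb "$DISK_2")"
  echo "end"
}

# sensor_loop: answer every request line on stdin with a report,
# and stop on "quit" or end of input.
sensor_loop() {
  host=$(uname -n)
  while read -r request; do
    if [ "$request" = "quit" ]; then
      break
    fi
    report "$host"
  done
}
```

to run it as a sensor, the script would end with a call to sensor_loop.  the
de-multiplexing problem described above is untouched by this: the names are
still disk_1_free and disk_2_free, not disk_free.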


so, perhaps i am being dense (note that i have very little experience with
sge), but i've re-read your original post and most of the docs and it seems
like this again boils down to the inability to do OR'ing of resources:

this is my current understanding:

  - exactly ONE load monitor may be configured per host

  - the load monitor must output unique key=val pairs

    eg

      ...
        disk_1_free=12345
        disk_2_free=12345
      ...

  - there is no way to merge (OR) key=val pairs, whether via setting up
    multiple queues, multiple complexes, or from the qsub command line
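
to make the missing OR concrete: "disk_1 has room OR disk_2 has room" is the
same as comparing the request against the larger of the two free values.  a
sketch of that check in sh, with plain KB integers as input; max_kb and
fits_either are illustrative names, and whether publishing such a merged value
from a load monitor would work with these complexes is an untested assumption.

```shell
# max_kb A B: print the larger of two non-negative integers.
max_kb() {
  if [ "$1" -ge "$2" ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# fits_either REQUIRED FREE_1 FREE_2: succeed (exit 0) if at least one of the
# two disks has REQUIRED KB free, i.e. if max(FREE_1, FREE_2) >= REQUIRED.
fits_either() {
  [ "$(max_kb "$2" "$3")" -ge "$1" ]
}
```

for example, 'fits_either 150 100 200' succeeds because the second disk is
large enough, while 'fits_either 300 100 200' fails.  note that a merged value
says nothing about which of the two disks has the room.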

i.e.  i think i'm stuck.  hopefully you can enlighten me! ;-)

-a

> 
> --- "Ara.T.Howard" <Ara.T.Howard at noaa.gov> wrote:
> > if i understand you correctly, this is problematic for two reasons:
> > 
> >   - i cannot know a priori which host to run on
> >   - i cannot know a priori which disk of a particular host to run on
> > 
> > 
> > in other words, given the above, how would i say
> > 
> >   qsub -l 'any host' -l 'any of two disks' job
> > 
> > it seems i would need to know both the host and disk to submit to,
> > wouldn't i?
> > 
> > i would like to associate each of the disks with a host as a resource and
> > be able to say:
> > 
> >   qsub -l "disk_a=${required_space} or disk_b=${required_space}" ... job
> > 
> > but it seems that logical && is all i get...
> > 
> > -a

-- 
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL     :: http://www.ngdc.noaa.gov/stp/
| TRY     :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done 
===============================================================================



