[GE users] resource management (resending to the list)

Ara.T.Howard Ara.T.Howard at noaa.gov
Mon Apr 26 16:17:09 BST 2004


On Sun, 25 Apr 2004, Charu Chaubal wrote:

> 
> On Apr 24, 2004, at 7:08 PM, Ron Chen wrote:
> 
> > Sorry, I didn't consider that load sensors are not
> > attached to queues.
> >
> > So, for SGE 5.3, you can use the load sensor on each
> > host to close/open the queues, depending on the
> > availability of disk space. (but the value is
> > hard-coded)
> >
> > For SGE 6.0, you will get the "or operator", and thus
> > you can use "disk_a > x || disk_b > y". As for finding
> > out which disk to use, I think doing a "qstat -r"
> > inside the job would be able to find out which
> > resource SGE allocates for that job.
> >
> 
> One small note: the ability for a job to determine which resources it 
> had requested is the subject of Issue #409:
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=409
> 
> Regards,
> 	Charu


i ended up solving this in the following way:

  - a small object store is located locally on each node; it contains info on
    - the list of disks being monitored
    - which one is currently being monitored

  - the load monitor uses this small db to report load on an abstract disk,
    which is the one currently being monitored.  in other words, disk_a and
    disk_b are both reported as the abstract 'disk'.  this gives me boolean OR...

  - the actual processes which are spawned update this db and toggle the disk
    being reported on.  this coordinates access to the disks and gives me the
    notion of 'which resource of the two was i granted'
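
the scheme above can be sketched roughly as follows.  this is only a minimal
illustration, not the code actually in use: the store path, the json layout
and the names load_store, save_store, toggle, report and sensor_loop are all
hypothetical.  it assumes the usual sge load sensor convention of answering
each line on stdin with a begin / host:name:value / end block and exiting on
'quit'.

```python
import json
import os
import shutil
import socket
import sys

STORE = "/var/tmp/disk_store.json"  # hypothetical location of the per-node store


def load_store(path=STORE):
    """Read the store: {"disks": [...], "current": "<one of disks>"}."""
    with open(path) as fh:
        return json.load(fh)


def save_store(store, path=STORE):
    """Write the store via a temp file and rename, so readers never see half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump(store, fh)
    os.replace(tmp, path)


def toggle(store):
    """Advance 'current' to the next monitored disk and return it."""
    disks = store["disks"]
    i = disks.index(store["current"])
    store["current"] = disks[(i + 1) % len(disks)]
    return store["current"]


def report(store, host, free_bytes=lambda p: shutil.disk_usage(p).free):
    """Format one load report for the abstract 'disk' value, in MB."""
    free_mb = free_bytes(store["current"]) // (1024 * 1024)
    return "begin\n%s:disk:%d\nend\n" % (host, free_mb)


def sensor_loop(path=STORE):
    """Answer each request line on stdin with a report; stop on 'quit'."""
    host = socket.gethostname()
    for line in sys.stdin:
        if line.strip() == "quit":
            break
        sys.stdout.write(report(load_store(path), host))
        sys.stdout.flush()
```

the load sensor script would call sensor_loop(), while a spawned job would do
something like load_store(), toggle() and save_store() to flip the disk being
reported on.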

so in this way 

  - the load monitor reports that EITHER disk_a or disk_b has enough space to
    run the process

  - the process itself can easily determine which disk to run on


i think the only problem with this approach is that there is a small race
condition:

  1) a process has been dispatched and starts running but has not yet updated
  the local db

  2) load monitor reads db and incorrectly reports free space that the new
  process may actually be __about__ to consume ('about' is relative here -
  each process takes about 1hr to complete)

  3) second process could be dispatched now and disk will blow up - but we
  won't know this for an hour or so...
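
one way the window might be narrowed (not closed) is for each job to record
the space it expects to need as its very first action, under a file lock, and
for the load monitor to subtract those reservations from the measured free
space.  the sketch below is an assumption, not part of the setup described
here: the reservation file, its json layout and the names update, reserve,
release and effective_free_mb are hypothetical.  the gap between dispatch and
the job's first action still remains.

```python
import fcntl
import json
import os


def update(path, mutate):
    """Apply mutate(db) to the json db at path while holding an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # released when the file is closed
        raw = fh.read()
        db = json.loads(raw) if raw else {}
        mutate(db)
        fh.seek(0)
        fh.truncate()
        json.dump(db, fh)
    return db


def reserve(path, disk, job_id, need_mb):
    """Record that job_id expects to use need_mb on disk."""
    def mutate(db):
        db.setdefault(disk, {})[job_id] = need_mb
    return update(path, mutate)


def release(path, disk, job_id):
    """Drop the reservation for job_id once it has finished."""
    def mutate(db):
        db.get(disk, {}).pop(job_id, None)
    return update(path, mutate)


def effective_free_mb(db, disk, free_mb):
    """Measured free space minus outstanding reservations, never below zero."""
    reserved = sum(db.get(disk, {}).values())
    return max(0, free_mb - reserved)
```

a reservation keeps being subtracted while the job is already writing, so the
reported value errs on the low side; that is the conservative direction when
the failure mode is a full disk an hour later.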

in our case this isn't too bad since it would only occur iff we were
nearing disk capacity AND the race condition happened...  also, i'm guessing
there is a reasonable (seconds) interval between the act of submitting a job
and re-acquiring resource loads since, in general, there would be a lag between
a job starting and consuming resources... can anyone confirm this?

i suppose the fundamental problem with this approach is that job
submission/starting is not atomic... at least this is my understanding

my dilemma is that if i simply added 50 or so more lines of code to the above
logic i could simply spawn the commands via ssh and not use sge at all...
which is what i am currently doing.  my hope was to leverage the volume of
code sge contains, but it seems that the lever is a bit short with the current
version of sge...


-a
-- 
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL     :: http://www.ngdc.noaa.gov/stp/
| TRY     :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done 
===============================================================================

