Re: Re: [GE users] resource allocation and race condition

Olesen, Mark Mark.Olesen at arvinmeritor.com
Mon Oct 18 11:05:45 BST 2004


Thanks to all of you for the various answers.
Unfortunately, I don't seem to have found the magic formula yet.

As an example of the problem, I track 2 independent licenses:
  'shpc'  (8 licenses, exclusively for parallel calculations)
  'stars' (4 licenses, for serial/parallel/interactive use)

Since the 'stars' licenses can also be used for parallel runs when the
'shpc' licenses are exhausted, I build a composite resource 'shpc+' that
can be requested from the SGE.

The 'stars' licenses can be used interactively (pre/post-processing), or for
dry runs.  As such, they are often called directly from the user's
workstation and thus bypass the SGE. The 'shpc' licenses are mostly, but not
exclusively, used via the SGE. Thus, a load sensor to account for non-SGE
usage would appear to be unavoidable.
 
The exclusive reliance on the load sensor, however, gives rise to a possible
race condition not only between SGE and non-SGE jobs, but also among
SGE-queued jobs themselves.
While a race condition between SGE and non-SGE jobs can be tolerated
somewhat, the race condition between SGE-queued jobs needs to be eliminated.
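
To make the second race concrete, here is a minimal Python sketch. It is my
own toy model, not SGE code: it assumes the scheduler only sees the last load
report, which stays unchanged until the sensor runs again. The function name
`dispatch` and the job tuples are invented for illustration.

```python
def dispatch(pending, reported_free, use_consumable):
    """Return the names of the jobs started in one scheduling pass.

    pending        -- list of (job_name, licenses_requested)
    reported_free  -- free licenses according to the last load report
    use_consumable -- if True, debit an internal counter per started job
    """
    started = []
    free = reported_free
    for name, wanted in pending:
        if wanted <= free:
            started.append(name)
            if use_consumable:
                # Internal bookkeeping: the next job sees the reduced count.
                free -= wanted
            # Without the debit, 'free' keeps the stale reported value
            # until the load sensor reports again.
    return started


pending = [("job2", 1), ("job3", 1)]
print(dispatch(pending, reported_free=1, use_consumable=False))  # ['job2', 'job3']
print(dispatch(pending, reported_free=1, use_consumable=True))   # ['job2']
```

With only the load report, both waiting jobs are started against a single
free license; with the internal debit, the second one stays queued.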

As many of you have pointed out, combining SGE consumables with a load
report will prevent oversubscription of a resource.  It seems to me,
however, that UNDERsubscription will be the problem with this approach.

Given that the load sensor currently reports the number of licenses
available, e.g.:
    Users of hpcdomains: \ 
        (Total of 8 licenses issued; Total of 6 licenses in use)
yields,
    begin
    global:shpc:2
    end
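
For reference, the parsing step could look roughly like the Python sketch
below. It is only an illustration: the regular expression is derived from
the single status line quoted above, and the helper names `free_licenses`
and `load_report` are invented. A real sensor would also need the usual
loop that waits on stdin before each report, which is omitted here.

```python
import re

# Pattern for the summary line shown above; the wording of the license
# manager output is taken from that one example only.
SUMMARY = re.compile(
    r"Total of (\d+) licenses? issued;\s+Total of (\d+) licenses? in use"
)


def free_licenses(status_text):
    """Return issued minus in-use, or None if no summary line is found."""
    match = SUMMARY.search(status_text)
    if match is None:
        return None
    issued, in_use = (int(group) for group in match.groups())
    return issued - in_use


def load_report(resource, value, host="global"):
    """Format one report block in the begin / host:name:value / end layout."""
    return "begin\n{}:{}:{}\nend\n".format(host, resource, value)


sample = ("Users of hpcdomains: "
          "(Total of 8 licenses issued; Total of 6 licenses in use)")
print(load_report("shpc", free_licenses(sample)), end="")
```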

Then, AFAICS, the license would be tracked as follows (here I've invented a
new 'count' attribute for 'qconf -se global' to show the internal
consumption):

0) Start - no licenses used:
    complex_values shpc=8
    load_values    shpc=8
    count          shpc=0

1) Start 6 license job via SGE:
    complex_values shpc=8
    load_values    shpc=2
    count          shpc=6

2) Start 2 license job via SGE ?

Following the logic of host_conf(5), the quota definition (shpc=8) will be
replaced by the current load value (shpc=2).  From the 2 licenses that are
now registered as being available, *none* can be granted, since the internal
count is already at 6!
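
The arithmetic behind steps 0) to 2) can be written out as a short Python
sketch. Both functions are my own reading, not SGE source code:
`grantable_replace` follows the interpretation above (the load value
replaces the quota, then the internal count is subtracted), while
`grantable_minimum` follows the "lower of the two values" rule from the
reply quoted below. The function names are invented for illustration.

```python
def grantable_replace(quota, load_value, count):
    """Load value replaces the quota; the internal count is then subtracted."""
    capacity = load_value if load_value is not None else quota
    return max(capacity - count, 0)


def grantable_minimum(quota, load_value, count):
    """Lower of (quota - internal count) and the reported load value."""
    remaining = quota - count
    if load_value is None:
        return max(remaining, 0)
    return max(min(remaining, load_value), 0)


# Step 1 above: one 6-license job running via SGE, sensor reports 2 free.
state = dict(quota=8, load_value=2, count=6)
print(grantable_replace(**state))  # 0
print(grantable_minimum(**state))  # 2
```

Under the first reading the 2-license job can never start; under the second
it can, and a non-SGE user taking one more license (load value 1) would
lower the result to 1.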

There must either be something severely wrong with my reasoning, or I'm
taking the wrong approach to mixing internal consumables and load reports.

/mark


Dr. Mark Olesen
Thermofluid Dynamics Analyst
ArvinMeritor Light Vehicle Systems
ArvinMeritor Emissions Technologies GmbH
Biberbachstr. 9
D-86154 Augsburg, GERMANY
tel: +49 (821) 4103 - 862
fax: +49 (821) 4103 - 7862
Mark.Olesen at ArvinMeritor.com

> Please see this HOWTO on tracking licenses with GE:
> http://bioteam.net/dag/sge-flexlm-integration/
> 
> The HOWTO has all the details, but basically, you track a license with
> *both* a load sensor *and* a consumable resource simultaneously.  The
> GE master will then use whichever is the lower of the two values in
> order to avoid oversubscribing a license.  The HOWTO talks about how
> there's still the possibility of a race condition, and ways to deal
> with it.
> 
> Regards,
> 	Charu
> 
> On Oct 15, 2004, at 8:19 AM, Olesen, Mark wrote:
> 
> >>> Assuming that I only have a single float license 'foo', I can
> >>> 'qsub -l foo=1' a job.  After a while I submit two (2) new jobs with
> >>> the same resource requirement(s). Both these jobs wait politely in
> >>> the queue, since the resource 'foo' is unavailable.  After the first
> >>> job finishes, and the load reports get correctly updated, *both* of
> >>> the jobs in the queue try to grab the 'foo' resource (almost)
> >>> simultaneously.
> >>> How can I circumvent such a race condition?
> >>
> >> Could you use a SGE consumable in addition to your load sensor? -
> >> Reuti
> >
> >
> > Based on what I can read from host_conf(5) about 'complex_values', I'd
> > have to alter the load sensor so that it only tracks non-SGE license
> > use rather than reporting the number of licenses currently available
> > for use.
> >
> > This means that the load sensor needs to distinguish between
> > applications that were started with/without SGE. If accomplished, this
> > would make the load sensor anything other than lightweight.
> >
> > Is there a direct way, or a backdoor, to determine how many resources
> > SGE believes are still free and/or have been allocated?  Perhaps this
> > could be a means of adjusting the load sensor values.
> >
> > /mark
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: Friday, 15 October 2004 11:00
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] resource allocation and race condition
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> ###############################################################
> # Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)
> # Grid Computing Technologist   # Fax:   (650) 786-4591
> # Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com
> ###############################################################
