[GE users] qalter-ing consumable resource requests

jerry37 jerry37 at seznam.cz
Tue Dec 1 10:24:00 GMT 2009


Hi Mark,

> ------------ Original message ------------
> From: olesen <Mark.Olesen at emconTechnologies.com>
> Subject: Re: [GE users] qalter-ing consumable resource requests
> Date: 01.12.2009 11:07:31
> ----------------------------------------
> > > that someone can check the behavior of the job. Otherwise you might
> > > get confused when you look at "qhost -F" and discover some
> > > inconsistency.
> >
> > I don't really understand. Being able to change any other
> > (changeable) option of a running job already introduces
> > inconsistency into the qstat output anyway, so what's the big deal?
>
> If the allocated resources are things like licenses, you may very well
> run into very interesting problems with 'lying' about how many resources
> are actually needed/used by a job.

I know, but the problem would actually lie in SGE not distinguishing between three things: what was requested for the job at scheduling time, what the current requests are (which would take effect when the job is migrated or rescheduled), and what has actually been granted. Isn't this also the reason why there is no support for soft consumable resource requests?
I am fine with SGE keeping just an internal counter for the complex, without knowing what actually consumes it, as long as it keeps track of it.

> > > (ssh to the headnode with hostbased authentication can help to avoid
> > > that every machine is also a submit-host, you could also submit locally
> > > on the node of course.)
> >
> > I know what you mean, but we chose not to allow job submission from
> > within the job. Right now every computation, no matter how many
> > interdependent jobs it consists of, is a static set of jobs bound to
> > one DRMAA session that share certain configurations. Such a set is not
> > altered very easily, which is why we made that choice.
>
> Since you are using exit 99 to resubmit the job anyhow, you could try
> something like this approach:
>
> - use the job context to 'remember' information between various stages.
> - submit with a context = estimate
> - estimate the true resource requirements
> - place these requirements in context string, flag with context = alter
> - use qalter to place a hold on the job
> - exit 99
>
> The job is now in a hold state.
> An external script that runs regularly (e.g., bound into a load sensor or
> as a separate daemon) checks for jobs in the hold state with context
> 'alter'. It adjusts the resource requirements with qalter, marks the
> change in the job context and releases the hold.
>
> I haven't tested if this actually works -- it's just an idea.
>
> /mark

I was considering this idea as well - using an external mechanism to do the real change while the job waits in the hold state - and I will play with it today. I hadn't thought of job context variables though, thanks for that idea.
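
For reference, here is a rough, untested sketch of how the two halves of that idea might fit together. It only builds the command lines; nothing is sent to SGE unless run_all() is called with dry_run=False. The context keys ("stage", and a "req_" prefix for the estimated requests) and all function names are made up for illustration. I am assuming that "qalter -ac" overwrites an existing context variable of the same name and that the context can be read back as a "key=value,key=value" string (for example from the "context:" line of "qstat -j").

```python
import subprocess


def parse_context(text):
    """Turn a 'key=value,key=value' context string into a dict."""
    context = {}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        if key:
            context[key] = value if sep else ""
    return context


def job_side_commands(job_id, requests):
    """Commands the job itself would run just before 'exit 99'.

    'stage' and the 'req_' prefix are hypothetical context names.
    """
    pairs = ["stage=alter"]
    pairs += ["req_%s=%s" % (name, value) for name, value in sorted(requests.items())]
    return [
        ["qalter", "-ac", ",".join(pairs), str(job_id)],  # remember the estimate
        ["qalter", "-h", "u", str(job_id)],               # place a user hold
    ]


def watcher_commands(job_id, context):
    """Commands the external script would run for one held job.

    Returns an empty list if the job is not flagged with stage=alter
    or carries no req_* entries.
    """
    if context.get("stage") != "alter":
        return []
    requests = sorted(
        (key[len("req_"):], value)
        for key, value in context.items()
        if key.startswith("req_")
    )
    if not requests:
        return []
    resource_list = ",".join("%s=%s" % pair for pair in requests)
    return [
        ["qalter", "-l", resource_list, str(job_id)],     # set the new requests
        ["qalter", "-ac", "stage=altered", str(job_id)],  # mark the change
        ["qrls", str(job_id)],                            # release the hold
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.check_call(argv)
```

The job would call something like run_all(job_side_commands(job_id, {"h_vmem": "4G"}), dry_run=False) and then exit 99; the external script would feed each held job's context through parse_context() and watcher_commands(). Keeping the command construction separate from the execution makes it possible to check the logic without a running qmaster.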

Jerry

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=230662
