No subject


Wed Jan 12 20:38:46 GMT 2011


1) A 'scout' job containing the identical job requirements, *except* the
licenses. Some care might be needed with the -pe.
2) An 'acquire' job containing *only* the license requirements (scaled by
the -pe).
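
A rough sketch of the split described in 1) and 2), in Python. It assumes the licenses are per-slot consumables, so a serial 'acquire' job has to ask for the license count multiplied by the number of slots. The resource names ('h_rt', 'lic_solver') and both function names are made up for illustration.

```python
def split_requests(requests, license_names, slots=1):
    """Split one job's resource requests into 'scout' and 'acquire' parts.

    requests      -- dict of resource name -> requested value
    license_names -- set of resource names that are licenses (assumed known)
    slots         -- slot count of the -pe request (1 for serial jobs)
    """
    # The scout keeps everything except the licenses.
    scout = {k: v for k, v in requests.items() if k not in license_names}
    # The acquire job keeps only the licenses, scaled by the slot count
    # (assumption: licenses are consumed per slot).
    acquire = {k: int(v) * slots
               for k, v in requests.items() if k in license_names}
    return scout, acquire


def as_l_option(requests):
    """Format a request dict as the value of a 'qsub -l' option."""
    return ",".join(f"{k}={v}" for k, v in sorted(requests.items()))
```

For example, a job asking for 'h_rt=3600' and one 'lic_solver' per slot on 8 slots would give a scout with only 'h_rt=3600' and an acquire job with 'lic_solver=8'.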


A hold_jid is placed on 'pending' so that it must now wait for 'scout' to
complete (operator holds used as required in this phase).
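
As a sketch, the dependency could be added to the already-submitted job with qalter. The helper below only builds the argument list; its name and the job ids are hypothetical, and the operator holds mentioned above are left out.

```python
def hold_on_scout_cmd(pending_id, scout_id):
    """Build 'qalter -hold_jid <scout> <pending>' as an argument list.

    After this, 'pending' stays held until the 'scout' job has completed.
    """
    return ["qalter", "-hold_jid", str(scout_id), str(pending_id)]


if __name__ == "__main__":
    # Job ids are placeholders; this prints the command instead of running it.
    print(" ".join(hold_on_scout_cmd(pending_id=1001, scout_id=1002)))
```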
 
When the 'scout' job can run, it implies that all the job requirements have
been satisfied except possibly the licenses. Within the 'scout' job, the
'acquire' job is submitted synchronously to the 'Provider' (via transfer
queues or ssh?).
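
A minimal sketch of what the 'scout' could run, assuming the synchronous submit is done with 'qsub -sync y' and, for the ssh variant, that the 'Provider' is reachable as a submit host. The host name, script path and function names are placeholders; the transfer-queue variant is not covered.

```python
import subprocess


def acquire_cmd(licenses, script, provider_host=None):
    """Build the synchronous 'acquire' submit as an argument list.

    licenses      -- dict of license resource -> count (already scaled)
    script        -- path of a trivial job script (placeholder)
    provider_host -- if given, run qsub on that host via ssh (assumption)
    """
    l_opt = ",".join(f"{k}={v}" for k, v in sorted(licenses.items()))
    cmd = ["qsub", "-sync", "y", "-N", "acquire", "-l", l_opt, script]
    if provider_host:
        cmd = ["ssh", provider_host] + cmd
    return cmd


def wait_for_licenses(licenses, script, provider_host=None):
    """Block until the 'acquire' job has finished; True on exit status 0."""
    return subprocess.call(acquire_cmd(licenses, script, provider_host)) == 0
```

With '-sync y' the qsub call itself does not return until the 'acquire' job has ended, so the 'scout' needs no polling loop of its own.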

Since the 'Provider' shadows the available licenses from 'Site1' and
'Site2', the 'acquire' job should only be able to run when enough global
licenses are available. Only after the 'Site1' and 'Site2' limits have been
adjusted and confirmed is the 'acquire' job allowed to complete.

With completion of the synchronous 'acquire' job, the 'scout' job also
finishes and leaves the licenses and other resources free for the original
'pending' job.

Open issues:
* qdel of a 'pending' job. The 'scout' must carry "pending=JOBID" in its job
context (or somewhere else) so that we can remove the 'scout' if the user has
deleted the original 'pending' job.

* Does the 'pending' job really run immediately after the 'scout'?
Since the 'scout' is running as queue manager, it can boost the priority of
'pending' or add a complex (e.g., 'borrowed=TRUE') with a very high priority.
This would be done by the 'scout' just before it exits.

* After licenses have been acquired, should there be a mechanism to restore
them to normal levels? Or does the 'Provider' do this itself?

* If we've borrowed licenses from another site, they really should be
checked back in with the 'Provider'. Otherwise a heavily loaded 'Site1' with
many pending jobs would never release them back to 'Site2'. A simple
solution might be the 'borrowed=TRUE' complex to tag licenses that should be
considered as external by qlicserver. By virtue of the 'acquire' job, enough
resources should be available at the job start, but would be removed from
the internal license management at the subsequent qlicserver interval.

* Some mechanism for suspending jobs using borrowed licenses would be
useful.

* I don't know what happens when the desired resource suddenly becomes
available locally while the 'scout' and 'acquire' jobs are active.
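
For the first open issue (qdel of a 'pending' job), a possible sketch: tag the 'scout' at submit time with 'qsub -ac pending=JOBID' and periodically remove any scout whose 'pending' job no longer exists. How the scout contexts and the set of live job ids are collected (e.g. from qstat output) is left out here, and all function names are hypothetical.

```python
def scout_context_args(pending_id):
    """Extra qsub arguments that record the 'pending' job in the job context."""
    return ["-ac", f"pending={pending_id}"]


def orphaned_scouts(scout_contexts, live_job_ids):
    """Return ids of scouts whose 'pending' job is gone.

    scout_contexts -- dict of scout job id -> context dict, e.g. {"pending": "1001"}
    live_job_ids   -- set of job ids (as strings) currently known to the qmaster
    """
    live = {str(j) for j in live_job_ids}
    return sorted(
        str(scout_id)
        for scout_id, ctx in scout_contexts.items()
        if "pending" in ctx and str(ctx["pending"]) not in live
    )


def qdel_cmds(job_ids):
    """One 'qdel <id>' argument list per orphaned scout."""
    return [["qdel", str(j)] for j in job_ids]
```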

On the positive side:
With the change to qstat in v6.1, at least the user won't have to see all
these scout, acquire, whatever jobs ;)


/mark
