[GE users] Prioritised VIP jobs and master queue setup

erilon78se erik.lonroth at scania.com
Tue Jul 27 07:46:38 BST 2010



Hello Reuti!

We basically have the approach that you describe. We have found it very successful to abstract the functionality of our applications into three different profiles:

1. serial
2. parallel without master node
3. parallel with master node

With the setup described earlier in this thread, we can apply the same node-allocation principle to any application. The only real drawbacks are, first, "node over-subscription" for jobs allocating only part of a node and, second, "under-utilization" of the exclusive master nodes (as we discussed earlier).
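
As a rough illustration of the three profiles, a submit helper could map each one to qsub resource arguments. This is only a sketch under assumptions: the PE names "mpi" and "mpi_master" and the function name are made up for the example, while "the.q" and "master.q" are the queue names from the setup described earlier. The options -pe, -q and -masterq are standard qsub options, but how -q and -masterq interact at a given site should be checked against the qsub man page.

```python
from enum import Enum


class Profile(Enum):
    SERIAL = "serial"
    PARALLEL = "parallel without master node"
    PARALLEL_MASTER = "parallel with master node"


def qsub_args(profile, slots=1, queue="the.q", master_queue="master.q"):
    """Return qsub resource arguments for one application profile.

    The PE names "mpi" and "mpi_master" are placeholders (assumptions)
    for whatever parallel environments the site has defined.
    """
    if profile is Profile.SERIAL:
        return ["-q", queue]
    if slots < 2:
        raise ValueError("a parallel job needs at least 2 slots")
    if profile is Profile.PARALLEL:
        return ["-pe", "mpi", str(slots), "-q", queue]
    # Master task goes to the dedicated queue, slave tasks to the normal one.
    return ["-pe", "mpi_master", str(slots),
            "-masterq", master_queue, "-q", queue]
```

The point of the abstraction is that the application name never appears here: any application only has to declare which of the three profiles it follows.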

Our end users, however, badly need to be able to submit VIP jobs, and I think this must be a recurring problem.

Our strategy is now to implement the logic needed to single out jobs that need to be "suspended" somehow in order to force the start of a newly submitted job with "VIP" status. How that logic should be applied is not yet clear to me, but I would hope for some good advice on this matter from you...
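
To make the idea concrete, here is a minimal sketch of how such selection logic might look. It is an assumption-laden illustration, not a worked-out design: the RunningJob record and the function name are hypothetical, the list of running jobs would have to come from somewhere such as parsed qstat output, and the sketch counts slots only, ignoring on which hosts they are freed. It picks the largest normal jobs first, so it may free more than "just enough".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunningJob:
    job_id: int
    slots: int
    vip: bool = False


def pick_jobs_to_suspend(running, slots_needed, slots_free=0):
    """Greedily choose non-VIP jobs whose slots cover the shortfall.

    Returns a list of job ids, or None if suspending every normal job
    would still not free enough slots. Largest jobs are taken first so
    that few jobs are disturbed; this is a heuristic, not an optimum.
    """
    shortfall = slots_needed - slots_free
    if shortfall <= 0:
        return []
    candidates = sorted(
        (job for job in running if not job.vip),  # running VIP jobs are protected
        key=lambda job: job.slots,
        reverse=True,
    )
    chosen = []
    for job in candidates:
        if shortfall <= 0:
            break
        chosen.append(job.job_id)
        shortfall -= job.slots
    return chosen if shortfall <= 0 else None
```

The returned job ids could then be handed to "qmod -sj" to suspend them before the VIP job is submitted.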

Regards
/Erik



-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de]
Sent: 26 July 2010 19:46
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Prioritised VIP jobs and master queue setup


Hi,

On 26.07.2010 at 10:04, erilon78se wrote:

> First, I would say that the functionality of "run this job now, and
> suspend whatever is necessary to get it working" would be a very good
> enhancement.
>
> We are currently trying to build around this, by having external logic
> (aka a "qsub-wrapper") that addresses this specific issue. Would you
> say that the "JSV" (job submission verifier) would be a suitable
> target for this purpose?

In principle it can be put into a JSV, but then you would need to parse the job script and interpret its logic. So supplying a qsub-wrapper might be the better approach, if by qsub-wrapper you mean submit scripts that create a job script on the fly, submit it and remove it again. We do this here for most of our applications, and the end user doesn't have to know anything about writing a valid job script. This also has the advantage that the logic of copying most of the files to the execution node and the results back again can be applied automatically.
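
A minimal sketch of such a submit script, assuming Python is available on the submit host. The function names and the "job_" file prefix are invented for the example; the embedded "#$" options -N, -cwd and -S are standard. The "run" parameter exists only so that the wrapper can be tried without a cluster.

```python
import os
import subprocess
import tempfile


def build_job_script(command, job_name):
    """Return the text of a minimal job script for one command."""
    return "\n".join([
        "#!/bin/sh",
        f"#$ -N {job_name}",
        "#$ -cwd",
        "#$ -S /bin/sh",
        command,
        "",
    ])


def submit(command, job_name, extra_args=(), run=subprocess.run):
    """Write the job script to a temporary file, submit it, then delete it.

    `run` is injectable so the wrapper can be exercised without qsub.
    """
    fd, path = tempfile.mkstemp(prefix="job_", suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(build_job_script(command, job_name))
        return run(["qsub", *extra_args, path], check=True)
    finally:
        # The temporary script is removed whether or not qsub succeeded.
        os.remove(path)
```

The end user would only call something like submit("./solver input.dat", "solver_run") and never see the generated script.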

I have the impression that you need either a fast (parallel) home directory in which the jobs compute directly *), or some mechanism inside the job script (as different applications need different files to be copied) to copy files to and from the nodes. While the former will also handle "awkward" job scripts, the latter saves some money as long as the users conform to such a workflow.

*) Exceptions apply: some parallel apps need a shared directory between all involved nodes. But with more and more cores being put into a single enclosure, IMO the need for runs across several nodes will decline.
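
The copy mechanism inside the job script could be generated per application along these lines. This is a sketch, not what any particular site runs: the function name is hypothetical and the lists of input and output files are assumed to be known per application. It relies on $TMPDIR (the per-job scratch directory) and $SGE_O_WORKDIR (the directory the job was submitted from), both of which Grid Engine sets in the job environment.

```python
import shlex


def staged_body(command, inputs, outputs):
    """Return shell lines that stage files in, run the command, stage results out.

    Because of "set -e", results are not copied back if the command fails.
    """
    lines = ["set -e", 'cd "$TMPDIR"']
    for name in inputs:
        # Copy each input file from the submit directory to node-local scratch.
        lines.append(f'cp "$SGE_O_WORKDIR"/{shlex.quote(name)} .')
    lines.append(command)
    for name in outputs:
        # Copy each result file back to the submit directory.
        lines.append(f'cp {shlex.quote(name)} "$SGE_O_WORKDIR"/')
    return "\n".join(lines) + "\n"
```

The returned text would take the place of the plain command in a generated job script, so users who conform to the workflow get the staging for free.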


> Secondly. Regarding bug: 2603
> (http://gridengine.sunsource.net/issues/show_bug.cgi?id=2603) (Status:
> REOPENED) - Do you think that the issue will be addressed by
> developers soon, or, would you suggest we patch the SGE-software? In
> that case, who do you think we should be talking to to get the right
> functionality in place?

I don't know; maybe Daniel can make a statement on this. At least if someone with a paid maintenance contract needs this functionality, it needs to be fixed.


> Last, I have to thank you, Reuti, for your excellent help with the
> various questions we have had before. It has been of outstanding
> value to us, and I'm sure many others share my opinion on this.

Thx - Reuti


> Regards
> /Erik Lönroth
>
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 23 July 2010 16:49
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Prioritised VIP jobs and master queue setup
>
>
> Hi,
>
> On 23.07.2010 at 12:42, erilon78se wrote:
>
>> We have run SGE (5.X) for a good four years, with very good
>> results. We've invested in a new cluster and are now trying to improve
>> our queueing setup.
>>
>> Let me describe our environment briefly:
>>
>> * SGE v6.2u5
>>
>> * We have a number of parallel (running on multiple hosts)
>> applications, let's say 10 different ones.
>>
>> * We have a number of serial (running on a single node) applications,
>> let's say 5 different ones.
>>
>> * Our hardware is homogeneous with 8 cores/host.
>>
>> * Some of the applications require a "master process" that does I/O
>> and control, and needs to be exclusively allocated to a single node. A
>> "master process" cannot be run in conjunction with any other
>> application (i.e. no "over-subscription").
>>
>> We have previously solved this problem with the use of a "master.q"
>> containing dedicated hosts with slots=1 and a PE with
>> job_is_first_task set. This results in a correct slot
>> allocation as shown below.
>>
>>
>> Fig:A - "master.q solution"
>>
>> job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID task-ID state cpu        mem     io      stat failed
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>    1031 0.95734 test_check konrns       r     07/19/2010 10:53:22 mastr.q@ts201-c-1-1.sss.se.sca MASTER
>>    1031 0.95734 test_check konrns       r     07/19/2010 10:53:22 the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>                                                                   the.q@ts201-c-1-0.sss.se.scani SLAVE
>>
>>
>>
>> Now, I have two core problems to solve on which I need your help:
>>
>> Problem A - Using any host for any purpose.
>> -------------------------------------------
>> The problem with the "master.q" strategy is that hosts inside the
>> "mastr.q" (above) are not available for other jobs NOT requiring a
>> "master process". They idle until such a job enters the system, which
>> is not what we want.
>>
>> We want to be able to use all hosts as a potential "master host" and
>> lock it down once a "master process" enters it, AND, never let in a
>> "master process" on a node that runs anything at all.
>>
>> If we can solve this, we can use all hosts in a "parallel queue" for
>> any application, but won't risk failure from over-subscribing a host
>> running a master process.
>
> There is a possible setup, but as long as issue
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2603
>
> remains reopened, it won't work. (The trick is to use a host
> consumable to trigger an alarm in parallel.q (i.e. getting it
> disabled) when at least one slot in serial.q is used. serial.q OTOH
> is subordinated to parallel.q to make it exclusive the other way
> round as well.)
>
>
>> Problem B - Make way for VIP-jobs.
>> ----------------------------------
>> Once "the.q" becomes full enough, we have not found a good way to
>> "suspend/checkpoint" one - or a few - jobs, in order to free up "just
>> enough" resources for a "VIP-job" entering the system.
>>
>> A VIP-job can be any application, submitted at any time, with any
>> resource requests and with any time limit or whatever.
>>
>> We need a way to let OGE automatically "choose" enough "normal jobs",
>> selected based on the VIP-job requirements, suspend those, and make
>> sure the VIP-job will be started before any other job in a waiting
>> state.
>>
>> If more than one VIP-job is submitted,
>
> Unfortunately there is no feature "run this job now, and suspend
> whatever is necessary to get it working":
>
> http://gridengine.sunsource.net/ds/viewMessage.do?dsMessageId=228184&dsForumId=38
>
>
>> VIP-jobs currently running should be protected from suspension.
>
> Would need:
>
> http://gridengine.sunsource.net/issues/show_bug.cgi?id=3162
>
> This would need a complete rewrite of the scheduler, which could then
> also include more real-time and cron-like features. Several such
> requests have come up on the list.
>
> -- Reuti
>
>
>>
>>
>> I have tried to formulate my problems as well as I can, and I
>> greatly appreciate any ideas for a setup.
>>
>> Kind regards
>> /Erik Lönroth, Technical Responsible on High Performance Computing,
>> Scania Infomate AB - Sweden.
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=269892
>>
>> To unsubscribe from this discussion, e-mail:
>> [users-unsubscribe at gridengine.sunsource.net].
>>
>

