[GE users] release a resource in a middle of a job by the job itself

reuti reuti at staff.uni-marburg.de
Wed Jul 14 18:33:37 BST 2010



Hi Lily,

On 14.07.2010 at 17:58, lily wrote:

> Hi, Reuti,
> 
> We have a similar need to release resources (actually the slots) in the middle of running jobs. But the job is an MPI job, so we can't split it into multiple jobs. 
> 
> For a job that has a load-balancing problem (depending on the input data), some MPI ranks finish early but have to wait for the whole job to finish, thus wasting a lot of node hours. Is there a way to release the slots occupied by the early-finished MPI ranks back to the qmaster/scheduler?
> 
> We have been advised to increase the number of slots configured for the node when an MPI rank finishes its work and enters MPI_Finalize(). Is it safe to do so?

Well, this is not as straightforward as it looks at first glance. If I understand you correctly, one or more of the many ranks of the job will reach MPI_Finalize() earlier than the others and will then wait there for the remaining ranks to finish. This would mean issuing a configuration change just before that call.
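
A rough way to quantify the waste: since every rank keeps its slot until the slowest rank finishes, the idle slot-time is the sum over all ranks of (slowest runtime minus own runtime). The Python sketch below assumes one slot per rank; the function names and the runtimes are hypothetical illustration values, not measurements from this thread.

```python
def idle_slot_hours(rank_runtimes_h):
    """Slot-hours spent by ranks waiting in MPI_Finalize() for the slowest rank.

    Assumes one slot per rank and that every slot stays allocated
    until the whole job ends.
    """
    job_end = max(rank_runtimes_h)
    return sum(job_end - t for t in rank_runtimes_h)


def efficiency(rank_runtimes_h):
    """Fraction of the allocated slot-hours that did useful work."""
    allocated = max(rank_runtimes_h) * len(rank_runtimes_h)
    return sum(rank_runtimes_h) / allocated


# Hypothetical runtimes (hours) of an 8-rank job with uneven input data.
runtimes = [2.0, 2.5, 3.0, 3.0, 4.0, 5.0, 7.5, 10.0]
print(idle_slot_hours(runtimes))        # → 43.0 (80 allocated, 37 used)
print(round(efficiency(runtimes), 2))   # → 0.46
```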

Even if you did this, the configuration would have to be changed back to the intended value after the job. When many Open MPI jobs are running at the same time, it might be hard to determine which other tasks are in the same state. Besides this, letting a parallel job change the configuration could lead to a so-called race condition, where two or more slave instances try to change the value at nearly the same time and mess up the configuration completely.
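
The race condition is the classic lost update: each rank reads the current slot count, adds one and writes it back. In this minimal Python sketch a plain dict stands in for the queue configuration (hypothetical; nothing here calls qconf), and a threading.Barrier forces the bad interleaving so the result is reproducible.

```python
import threading

# Hypothetical stand-in for the queue's slot count; nothing here calls qconf.
config = {"slots": 8}
lock = threading.Lock()


def release_slot_unsafe(barrier):
    """Read-modify-write without any coordination between ranks."""
    current = config["slots"]        # read
    barrier.wait()                   # force every rank to read before any rank writes
    config["slots"] = current + 1    # write: the ranks overwrite each other's update


def release_slot_locked(barrier):
    """Same update, serialized by a lock (only valid inside one process)."""
    with lock:
        config["slots"] += 1


def run(worker, n_ranks=2, start=8):
    """Let n_ranks threads each 'release' one slot; return the final count."""
    config["slots"] = start
    barrier = threading.Barrier(n_ranks)
    threads = [threading.Thread(target=worker, args=(barrier,)) for _ in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return config["slots"]


print(run(release_slot_unsafe))  # → 9, although two slots were "released"
print(run(release_slot_locked))  # → 10
```

The lock only serializes threads inside a single process; ranks on different hosts share no such lock, and either way the value still has to be set back after the job.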

I'm not aware of any job scheduler that allows the resource request to vary over the runtime of a job. Losing part of the allocated resources to poorly balanced parallel jobs is a known problem with some applications.

==

How is your application organized? I mean, the idea of MPI is to split the data across several nodes, ideally in an even distribution, and later collect the results from all nodes. Do you know, at some point near the start of the job, that not all tasks will run for the same time?
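
An even split of the data by item count does not guarantee an even split of the runtime. A minimal Python sketch of a plain block decomposition, assuming hypothetical per-item costs, shows how ranks that received cheap items end up waiting:

```python
def block_range(rank, n_items, n_ranks):
    """Half-open range [lo, hi) of item indices assigned to `rank`.

    Block decomposition: each rank gets n_items // n_ranks items or one
    more, so the item *counts* are as even as possible.
    """
    lo = rank * n_items // n_ranks
    hi = (rank + 1) * n_items // n_ranks
    return lo, hi


def work_per_rank(costs, n_ranks):
    """Total cost per rank when the items are split evenly by count."""
    n = len(costs)
    ranges = (block_range(r, n, n_ranks) for r in range(n_ranks))
    return [sum(costs[lo:hi]) for lo, hi in ranges]


# Hypothetical per-item costs: the last third of the input is 5x as expensive.
costs = [1] * 8 + [5] * 4
work = work_per_rank(costs, n_ranks=4)
print(work)  # → [3, 3, 7, 15]: equal item counts, very unequal runtimes
```

Here ranks 0 and 1 finish after 3 time units and would wait in MPI_Finalize() for another 12 until rank 3 is done.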

==

Anyway, a safe solution would be to oversubscribe nodes intentionally and use load_thresholds to avoid larger-scale performance impacts, where one job is slowed down by another.

- define e.g. 12 slots for an 8-core machine
- define a job_load_adjustments of np_load_avg=1 in the scheduler configuration; the load_adjustment_decay_time can be 10 minutes or so
- define load_thresholds of np_load_avg=1.1 in the queue definition. This will put the queue into alarm state if the average load is higher than 1.1, which simply means that it won't accept any further jobs even though slots are free. Jobs already running on this node will continue unaffected.
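
To make the interplay of the three settings concrete, here is a toy Python model of a single node. It is a simplified sketch, not Grid Engine's scheduler code, and rests on stated assumptions: np_load_avg is the load average divided by the number of cores; the queue is in alarm state when that value exceeds the load_thresholds value; the load adjustment is added once per started job and decays linearly to zero over 10 minutes; and ranks waiting in MPI_Finalize() do not keep a CPU busy. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one exec host; NOT Grid Engine's actual scheduler logic."""
    cores: int = 8
    slots: int = 12             # intentionally more slots than cores
    used_slots: int = 0
    threshold: float = 1.1      # load_thresholds np_load_avg=1.1
    adjustment: float = 1.0     # np_load_avg adjustment per started job (assumed)
    decay_min: float = 10.0     # adjustment decays linearly to 0 over this time (assumed)
    recent_starts: list = field(default_factory=list)  # job start times, minutes

    def np_load(self, load_avg, now):
        """Normalized load plus the decaying adjustments of recently started jobs."""
        value = load_avg / self.cores
        for started in self.recent_starts:
            age = now - started
            if 0 <= age < self.decay_min:
                value += self.adjustment * (1 - age / self.decay_min)
        return value

    def in_alarm(self, load_avg, now):
        return self.np_load(load_avg, now) > self.threshold

    def accepts_job(self, load_avg, now, slots=1):
        free = self.slots - self.used_slots
        return free >= slots and not self.in_alarm(load_avg, now)

    def start_job(self, now, slots=1):
        self.used_slots += slots
        self.recent_starts.append(now)


# One 8-rank MPI job; 6 ranks already wait in MPI_Finalize() (assumed idle).
node = Node(used_slots=8)
print(node.accepts_job(load_avg=2.0, now=0.0))   # → True: 4 free slots, np_load 0.25
node.start_job(now=0.0)
print(node.accepts_job(load_avg=2.0, now=0.0))   # → False: 0.25 + 1.0 > 1.1 (alarm)
print(node.accepts_job(load_avg=3.0, now=10.0))  # → True: adjustment has decayed
```

The idea of the adjustment is to bridge the time until a newly started job shows up in the measured load average, so that a node with free slots does not receive a burst of jobs at once.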

Experiment with this to see whether it's a possible solution for your situation. (In fact, my impression has always been that load_thresholds was designed for exactly this purpose: intentionally oversubscribing big SMP machines (e.g. the former SPARC ones). As long as a typical PC cluster was made of COTS nodes with one or two cores, this feature was not of much use.)

-- Reuti


> Regards,
> Lily
> 
> -----Original Message-----
> From: reuti [mailto:reuti at staff.uni-marburg.de] 
> Sent: Tuesday, June 15, 2010 12:47 PM
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] release a resource in a middle of a job by the job itself
> 
> Hi,
> 
> On 15.06.2010 at 15:39, introx wrote:
> 
>> I could do so; the problem is that I don't know in advance how many parts this chain will have...
> 
> maybe you can split this into two array jobs:
> 
> -- For job_a.sh, set -tc 1 (only one task running at a time), a -t range that will most likely cover all steps even in the worst case, and a license request.
> 
> -- For job_b.sh, set -hold_jid_ad with the same -t range (each task waits for the corresponding step of job_a.sh to finish). As this step doesn't need a license, even more than one task could run at a time (they are limited by job_a.sh anyway).
> 
> -- Reuti
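
The two-array-job scheme can be sketched in Python: one function builds the qsub command lines (nothing is submitted), another computes the resulting timeline. The consumable name license, the task range and the runtimes are hypothetical; the model assumes the job_a tasks run in index order and that job_b tasks always find a free slot.

```python
def qsub_commands(max_steps, license_complex="license"):
    """argv lists for the two array jobs; nothing is submitted here.

    `license_complex` is a hypothetical consumable name: use the one your
    cluster actually defines.
    """
    task_range = f"1-{max_steps}"   # generous upper bound on the number of steps
    job_a = ["qsub", "-N", "job_a", "-t", task_range, "-tc", "1",
             "-l", f"{license_complex}=1", "job_a.sh"]
    job_b = ["qsub", "-N", "job_b", "-t", task_range,
             "-hold_jid_ad", "job_a", "job_b.sh"]
    return job_a, job_b


def simulate(licensed_h, unlicensed_h):
    """(a_start, a_end, b_start, b_end) per step, in hours.

    Assumes job_a tasks run in index order and job_b always finds a free slot.
    """
    schedule = []
    a_end = 0.0
    for la, lb in zip(licensed_h, unlicensed_h):
        a_start = a_end          # -tc 1: one job_a task (one license) at a time
        a_end = a_start + la
        b_start = a_end          # -hold_jid_ad: task i of job_b waits for task i of job_a
        schedule.append((a_start, a_end, b_start, b_start + lb))
    return schedule


job_a, job_b = qsub_commands(max_steps=50)
print(" ".join(job_a))
print(" ".join(job_b))

# Hypothetical run: 3 real steps, 1 h with the license and 2 h without each.
sched = simulate([1.0, 1.0, 1.0], [2.0, 2.0, 2.0])
print("license held:", sum(a_end - a_start for a_start, a_end, _, _ in sched), "h")
print("makespan:", max(b_end for *_, b_end in sched), "h")
```

In this example the license is held for 3 h instead of the 9 h a single job holding it throughout would need. Tasks beyond the real number of steps must detect that there is nothing left to do and exit at once; how they detect this depends on the application. Note that in this scheme task i+1 of job_a does not wait for task i of job_b.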
> 
> 
>> On Tue, Jun 15, 2010 at 4:26 PM, dr_st <stephane.teletchea at univ-nantes.fr> wrote:
>> introx wrote:
>>> Hi,
>>> 
>>> I have a job which uses a license, but it doesn't need the license all
>>> the time, so it could release it when it doesn't use it.
>>> 
>>> Can a job release a resource (such as a license resource...) in the middle
>>> of its operation?
>>> And if so, can it acquire this resource later on?
>>> 
>>> 
>>> Thanks
>>> Erez
>>> 
>> 
>> You could try to split your main job:
>> - job_A.sh launches the program with the license and, when complete, launches
>> - job_B.sh which performs the rest of the analysis.
>> 
>> Chaining jobs is probably a good option there, no?
>> 
>> Cheers,
>> Stéphane
>> 
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=262127
>> 
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>> 
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=262199
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=268000
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=268022
