[GE users] (another) slotwise preemption question

cjf001 john.foley at motorola.com
Fri Aug 27 17:23:32 BST 2010


Reuti -

Thanks for the additional info. I was also thinking of something along
those lines, and the checkpointing solution is also interesting. I'll
admit that I'm also considering upgrading to u6 (we have a scheduled
downtime in a few weeks) and using it for 90 days, during which time
I'd cross my fingers and hope that something "positive" comes out
of all the licensing talk we've seen in the past few weeks. If it
doesn't, I could always roll back and use one of these "workarounds",
I guess.

Anyway, I do have one more question. Without getting into too
much detail, the way this whole thing (the slotwise suspension problem)
came to my attention was that one of our user groups submits a
lot of array jobs, and the "co-scheduler" (I like that term! --
thanks, GQ) that I wrote to handle the job kills and resubmits wasn't
handling them properly. Do you have any thoughts on whether
the checkpointing solution will handle array jobs correctly?
That is, when just one of an array job's tasks gets suspended, would it
submit *just* that one task again?
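
For illustration, here's a rough sketch of the per-task handling I'm
hoping for, from either a fixed co-scheduler or the checkpointing
route - the script name and the way the job id and task id reach the
hook are placeholders, not something I actually have working:

   #!/bin/sh
   # kill_and_resubmit_task.sh <job_id> <task_id>   (hypothetical names)
   # Delete only the suspended task of an array job, then resubmit that
   # single task so it comes back with the same $SGE_TASK_ID.
   JOB_ID=$1
   TASK_ID=$2

   # grab the job script path while the job is still known to qmaster
   SCRIPT=`qstat -j "$JOB_ID" | awk '/^script_file:/ {print $2}'`

   # remove just this one task, not the whole array job
   qdel "$JOB_ID" -t "$TASK_ID"

   # resubmit it as a one-task array job; in real life the original
   # submit options (queue, resources, etc.) would have to be re-applied
   qsub -t "$TASK_ID" "$SCRIPT"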
    
     Thanks,

       John


reuti wrote:
> On 27.08.2010 at 17:16, cjf001 wrote:
>
>> Thanks Daniel - I saw your post at
>>
>> http://blogs.sun.com/templedf/entry/better_preemption
>>
>> where I think you talk about this issue at the bottom.
>> So, it all comes back to getting the u6 binaries!
>>
>> I did notice in the u6 release notes several "known
>> issues" with the slotwise suspension, but at first
>> glance they don't look like show-stoppers.
>>
>> So, quick question - is the "P" state a new state that
>> you see in something like the qstat output? I don't
>> see it on my "cheatsheet" of states.
>
> One additional thought on this: it's possible to use a consumable complex to send the secondary queue into the alarm state and avoid the black hole. For a 4-slot queue:
>
> $ qconf -sc
> #name               shortcut   type        relop   requestable consumable default  urgency
> ...
> free_slots          fs         INT         <=      YES         YES        1        0
>
> $ qconf -sq secondary.q
> ...
> load_thresholds       free_slots=1
>
> $ qconf -se node01
> ...
> complex_values free_slots=5
>
>
> Whether you use your custom suspend method or the checkpointing environment is a matter of personal taste. After the value of free_slots has gone down to 0 (due to the one primary job), it will increase to 1 again once the suspended job has left the system. Nevertheless, 4 are still consumed, and the secondary queue stays in the alarm state.
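>
> Just to make the effect visible, the remaining capacity and the alarm state can be checked with something like the following (illustrative commands, not part of the setup itself):
>
> $ qstat -F free_slots -q secondary.q
>    (shows the current value of the consumable per queue instance)
>
> $ qstat -f -explain a
>    (shows which threshold put the queue instance into the alarm state)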
>
> -- Reuti
>
>
>>     Thanks,
>>
>>        John
>>
>>
>> dagru wrote:
>>> With 6.2 update 6, an enhancement (suspension
>>> prevention) was introduced which fixes your issue.
>>> When a queue instance is "full" (in the sense that
>>> "the next job would be suspended"), it goes into the
>>> preempted state (P). This means the qinstance is no
>>> longer considered by the scheduler for dispatching
>>> further jobs into it. Instead, the scheduler searches
>>> for a qinstance where it can let the job run
>>> immediately. If none is found, or other resource
>>> requests do not match, the job stays in qw.
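>>>
>>> To see it in practice, something like the following should do
>>> (the state letter is what's new in u6; the command is plain qstat):
>>>
>>> $ qstat -f -q 'secondary.q@*'
>>>    (the qinstance that could only suspend further jobs shows "P"
>>>     in the states column while suspension prevention is active)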
>>>
>>> Daniel
>>>
>>>
>>> On Thursday, 26.08.2010, at 16:22 -0500, cjf001 wrote:
>>>> Hi guys - here's a non-licensing question for you for a change :)
>>>>
>>>> I'm back into the depths of slotwise preemption, running
>>>> SGE v6.2u5 here on RHEL 5.2. I have one four-CPU (four-slot)
>>>> machine I'm using for testing. I have two cluster queues -
>>>> "primary" and "secondary"; "secondary" is subordinate to
>>>> "primary". My test job just sleeps for 4 minutes and then
>>>> dumps its environment.
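>>>>
>>>> (For reference, the subordination is configured on the primary
>>>> queue, roughly like this -- quoting from memory, so the exact u5
>>>> syntax may differ:
>>>>
>>>> $ qconf -sq primary.q
>>>> ...
>>>> subordinate_list      slots=4(secondary.q)
>>>> )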
>>>>
>>>> When I load up the machine with, say, 8 jobs in the secondary
>>>> queue, all is well - 4 jobs running, and 4 jobs waiting. Then
>>>> when I add *one* job into the primary queue, it suspends one
>>>> of the secondary jobs, as expected with slotwise preemption.
>>>> Now we have 4 jobs running, one suspended, and 4 waiting.
>>>>
>>>> If I use the "standard" suspension operation (no custom script),
>>>> the state of the jobs sits just like this until the primary
>>>> job completes - then the suspended job resumes - again, as
>>>> expected.
>>>>
>>>> However, we use a custom suspension script here that actually
>>>> qdel's the suspended job, because we don't like them lying around
>>>> on the execute hosts using up memory (we'll resubmit them
>>>> later). When I use this suspension method, it gets a little
>>>> weird.....
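>>>>
>>>> (The custom method is hooked into the subordinate queue roughly
>>>> like this -- the script path here is just a placeholder:
>>>>
>>>> $ qconf -sq secondary.q
>>>> ...
>>>> suspend_method        /path/to/kill_and_resubmit.sh $job_id
>>>>
>>>> where $job_id is, as far as I recall, one of the pseudo-variables
>>>> SGE expands when it runs the method.)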
>>>>
>>>> What happens is that the suspended job disappears (from the qstat
>>>> output), as expected, since we killed it. So now we have 4 jobs
>>>> running (3 secondary and 1 primary), and 4 jobs waiting (all
>>>> secondary). But, for some reason, SGE isn't happy with that - it
>>>> tries to run one of the waiting jobs, even though all 4 slots are
>>>> full, and it's immediately suspended - so now we're back to 4 jobs
>>>> running and one suspended, with just 3 waiting now. We kill the
>>>> suspended job, and the same thing happens. Not what we were expecting....
>>>>
>>>> So, question is, why is SGE trying to push a 5th job onto
>>>> a machine that has only 4 slots, all of which are "busy"? And is
>>>> there a way around this?
>>>>
>>>>      Thanks,
>>>>
>>>>       John
>>>>
>>>>
>>>
>



-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# LV Simulation Cluster Support       #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
               (this email sent using SeaMonkey on Windows)
