[GE users] (another) slotwise preemption question

cjf001 john.foley at motorola.com
Thu Aug 26 22:22:43 BST 2010


Hi guys - here's a non-licensing question for you for a change :)

I'm back into the depths of slotwise preemption, running
SGE 6.2u5 here on RHEL 5.2. I have one four-CPU (four-slot)
machine I'm using for testing, and two cluster queues,
"primary" and "secondary", with "secondary" subordinate to
"primary". My test job just sleeps for 4 minutes and then
dumps its environment.
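
For anyone who wants to reproduce this: the actual job script isn't shown in this post, so the following is only a minimal Python stand-in for a job that sleeps and then dumps its environment. The function name `run_test_job` and its parameters are made up for illustration.

```python
import os
import time


def run_test_job(sleep_seconds=240, out=print):
    """Sleep, then dump the environment as sorted NAME=value lines.

    sleep_seconds defaults to 240 (the 4 minutes mentioned above);
    `out` is a hypothetical hook so the output can be captured.
    """
    time.sleep(sleep_seconds)
    lines = [f"{name}={value}" for name, value in sorted(os.environ.items())]
    for line in lines:
        out(line)
    return lines
```

To use it as a job script, end the file with a call to `run_test_job()` and submit it to the queue under test.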

When I load up the machine with, say, 8 jobs in the secondary
queue, all is well - 4 jobs running, and 4 jobs waiting. Then
when I add *one* job into the primary queue, it suspends one
of the secondary jobs, as expected with slotwise preemption.
Now we have 4 jobs running (3 secondary plus the primary one),
one suspended, and 4 waiting.

If I use the "standard" suspension operation (no custom script),
the state of the jobs sits just like this until the primary
job completes - then the suspended job resumes - again, as
expected.

However, we use a custom suspension script here that actually
qdel's the suspended job, because we don't like them lying around
on the execute hosts using up memory (we'll resubmit them
later). When I use this suspension method, it gets a little
weird.

What happens is that the suspended job disappears (from the qstat
output), as expected, since we killed it. So now we have 4 jobs
running (3 secondary and 1 primary), and 4 jobs waiting (all
secondary). But, for some reason, SGE isn't happy with that - it
tries to run one of the waiting jobs, even though all 4 slots are
full, and it's immediately suspended - so now we're back to 4 jobs
running and one suspended, with just 3 waiting now. We kill the
suspended job, and the same thing happens again. Not what we
were expecting.
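
To make the sequence concrete, here is a toy Python model that reproduces the job counts described above. It is not SGE's scheduler, just a sketch under two assumptions that are NOT confirmed anywhere in this post: (1) slots are counted per queue instance, so a suspended secondary job keeps its secondary slot until it is deleted, and (2) slotwise preemption suspends the most recently started secondary job whenever the host has more than 4 running jobs. Jobs never finish in this model, and all names (`simulate`, `SLOTS`, and so on) are hypothetical.

```python
from collections import deque

SLOTS = 4  # assumed: slots per queue instance on the single test host


def simulate(n_secondary=8, n_primary=1, delete_on_suspend=True, max_rounds=20):
    """Return a list of (running, suspended, waiting) snapshots, one per round."""
    waiting = deque(f"sec{i}" for i in range(1, n_secondary + 1))
    running = []    # running secondary jobs
    suspended = []  # suspended secondary jobs (assumed to still hold a slot)
    primary = [f"pri{i}" for i in range(1, n_primary + 1)]
    history = []
    for _ in range(max_rounds):
        # Scheduler step: fill any free slot in the secondary queue instance.
        while waiting and len(running) + len(suspended) < SLOTS:
            running.append(waiting.popleft())
        # Preemption step: suspend the newest secondary job while the host
        # has more than SLOTS running jobs in total.
        while running and len(running) + len(primary) > SLOTS:
            suspended.append(running.pop())
        history.append((len(running) + len(primary), len(suspended), len(waiting)))
        if not (delete_on_suspend and suspended):
            break  # standard suspension, or nothing left to delete
        suspended.clear()  # custom suspend method: the job is qdel'ed
    return history
```

With `delete_on_suspend=False` the model stops after the first snapshot (4 running, 1 suspended, 4 waiting), like the standard suspension case. With `delete_on_suspend=True` each deletion frees a secondary slot, a waiting job is started and suspended right away, and the waiting count drops by one per round.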

So, the question is: why is SGE trying to push a 5th job onto
a machine that has only 4 slots, all 4 of which are "busy"? And
is there a way around this?

    Thanks,

     John


-- 
###########################################################################
# John Foley                          # Location:  IL93-E1-21S            #
# IT & Systems Administration         # Maildrop:  IL93-E1-35O            #
# LV Simulation Cluster Support       #    Email: john.foley at motorola.com #
# Motorola, Inc. -  Mobile Devices    #    Phone: (847) 523-8719          #
# 600 North US Highway 45             #      Fax: (847) 523-5767          #
# Libertyville, IL. 60048  (USA)      #     Cell: (847) 460-8719          #
###########################################################################
               (this email sent using SeaMonkey on Windows)

