[GE users] Parallel job being allocated slots in different queues

reuti reuti at staff.uni-marburg.de
Mon Jan 18 12:55:10 GMT 2010


Hi,

On 18.01.2010 at 13:16, robhorton wrote:

> Hi,
>
> We've got two queues for parallel jobs, parallel.q and longparallel.q.
> They have basically the same configuration except that longparallel.q
> has a longer h_rt and has a limited userlist. Each queue has the other
> as a subordinate queue.
>
> This was working fine, but I've just seen a job which appears to have
> been allocated slots in both queues which have then both been
> suspended, meaning that the job doesn't run, i.e.
>
> andromeda:~>qstat -g t | grep 183323
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp002. SLAVE
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp003. MASTER
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp004. SLAVE
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp012. SLAVE
> ...
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 longparallel.q@comp001. SLAVE
>  183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 longparallel.q@comp002. SLAVE
> ...
>
> When the job was deleted and resubmitted it was scheduled as I would
> expect. I've not seen anything similar happen before (the setup hasn't
> changed for around six months). I'm running 6.1u6.
>
> Has anyone seen this before?

Yes, this is the normal behavior. SGE just collects slots from all
eligible queues; maybe in the past your h_rt request was always in a
range which did not fit into the normal queue. If you don't like this
behavior, you will have to define two PEs and bind only one to each
queue. Once SGE has selected a PE for a job, the job will stay in that
PE for sure. It is best to use names like "mype" and "mype_long", as
you can then just specify: qsub -pe "mype*" ...
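
As a rough sketch of that setup (not a tested recipe for 6.1u6): the script below only prints the commands unless DRY_RUN=0 is set. It assumes the two PEs "mype" and "mype_long" have already been created (for example with qconf -ap), and the slot count 16 and the script name job.sh are made up for illustration.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set DRY_RUN=0 to really run them (needs SGE manager rights).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # Note: the echo drops the quotes around "mype*"; keep them when
        # typing the command so the shell does not expand the wildcard.
        echo "+ $*"
    else
        "$@"
    fi
}

# Bind exactly one PE to each queue. -mattr replaces the queue's
# pe_list with the given value, so each queue offers only its own PE.
# Assumption: PEs "mype" and "mype_long" already exist.
run qconf -mattr queue pe_list mype parallel.q
run qconf -mattr queue pe_list mype_long longparallel.q

# Submit with a wildcard PE request: SGE picks one matching PE, and all
# slots then come from the queue that PE is attached to.
# "16" and "job.sh" are hypothetical placeholders.
run qsub -pe "mype*" 16 job.sh
```

Because each PE is attached to only one queue, a job granted "mype" can only collect slots in parallel.q, and one granted "mype_long" only in longparallel.q.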

There is already an RFE requesting that a parallel job should stay in
a single queue and/or hostgroup.

-- Reuti


> Thanks,
> Rob
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239495
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239516
