[GE users] Parallel job being allocated slots in different queues

robhorton r.horton at qmul.ac.uk
Mon Jan 18 12:16:13 GMT 2010


Hi,

We've got two queues for parallel jobs, parallel.q and longparallel.q.
They have basically the same configuration except that longparallel.q
has a longer h_rt and has a limited userlist. Each queue has the other
as a subordinate queue.

This was working fine, but I've just seen a job that appears to have
been allocated slots in both queues. Both sets of slots were then
suspended, so the job doesn't run, i.e.

andromeda:~>qstat -g t | grep 183323
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp002. SLAVE
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp003. MASTER
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp004. SLAVE
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 parallel.q@comp012. SLAVE
...
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 longparallel.q@comp001. SLAVE
 183323 1.60000 Parsek_dam user           S     01/16/2010 23:27:22 longparallel.q@comp002. SLAVE
...
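
For anyone wanting to check whether other jobs are in the same state, a
check along these lines might help. It is only a minimal sketch: it
assumes the raw `qstat -g t` layout shown above, where each job row starts
with a numeric job ID and the queue instance is printed as `queue@host`.
The function names are made up for illustration and are not part of Grid
Engine.

```python
from collections import defaultdict


def queues_per_job(qstat_lines):
    """Map each job ID to the set of cluster queues it holds slots in."""
    queues = defaultdict(set)
    for line in qstat_lines:
        fields = line.split()
        # Job rows start with a numeric job ID; skip headers, rules and "...".
        if not fields or not fields[0].isdigit():
            continue
        for field in fields:
            # Assumed queue-instance form: "parallel.q@comp002."
            if "@" in field:
                queues[fields[0]].add(field.split("@", 1)[0])
                break
    return dict(queues)


def jobs_spanning_queues(qstat_lines):
    """Return only the jobs whose slots sit in more than one queue."""
    return {
        job: names
        for job, names in queues_per_job(qstat_lines).items()
        if len(names) > 1
    }
```

Feeding it the lines of `qstat -g t` (for example, captured with
`subprocess.run(["qstat", "-g", "t"], capture_output=True, text=True)`)
should list job 183323 with both parallel.q and longparallel.q if the
split allocation is still present.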

When the job was deleted and resubmitted it was scheduled as I would
expect. I've not seen anything similar happen before (the setup hasn't
changed for around six months). I'm running 6.1u6.

Has anyone seen this before?

Thanks,
Rob

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=239495
