[GE users] Problem using a hostgroup in -masterq and not in -q

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Fri Jun 30 10:38:36 BST 2006

Hi Pascal,

On Fri, 30 Jun 2006, Pascal GILGENKRANTZ wrote:

> Hello,
> We are facing an issue with the PE -masterq usage. The goal is to make
> sure that our PE job will run its "master" job only on our "@masters"
> hostgroup, and all other sub-jobs of the PE on our "@slaves" hostgroup.
> But with Grid Engine 6.0u8, it's *impossible* to run a PE job like this:
> % qsub -pe my_pe 10 -q q1@@slaves -masterq q1@@masters <command>
> The result is a job pending forever, with qstat -r always complaining:
> "cannot run in PE "my_pe" because it only offers 100 slots" whereas I
> requested only 10 slots...

Hm. That diagnostic output is inadequate, but fixing it wouldn't
help you either.

> One way to make it work is to add the masterq host group (@masters)
> to the -q option. This is not what we want to do, because there is a risk
> of allocating another "master" host as a slave, and we need to keep the
> masters free for other PE jobs. One solution is to use a soft resource
> (-soft -l) to specify that slaves must be used preferably, but there is
> still a chance of consuming a master when there is a lack of slave
> resources.


> One solution would be to be able to specify different hard resources for
> the -q and the -masterq options (grid engine issue 75), but it's not
> implemented. Please feel free to share your experience if you have a
> similar problem!

It's interesting to get your view on it. Actually, your expectation about
the behaviour of

    -q q1@@slaves -masterq q1@@masters

is quite fair!

To me this raises the question of whether it would be reasonable to slightly
redefine the relation between -q and -masterq to make the above work. The new
behaviour would be:

    * -q request applies to all tasks, if no -masterq was specified
    * -q request applies to slave tasks only, if -masterq was specified

I claim that when you read the sge_queue_match_static() function in
libs/sched/sge_select_queue.c you quickly see how to change
Grid Engine behaviour accordingly. I would describe the required
change as follows:

(1) switch the order of the JB_hard_queue_list check (-q) and the
     JB_master_hard_queue_list check (-masterq)
(2) skip the JB_hard_queue_list check if the JB_master_hard_queue_list
     check went through successfully

In fact this is just a minor change, if you're not afraid of the C language.
Perhaps you compile Grid Engine yourself anyway?
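
For anyone who wants to try it, here is a rough sketch of the two steps
above. It is not the code from sge_select_queue.c: job_request_t,
list_contains() and queue_matches() are names I made up for illustration,
and the is_master_task flag is an assumption about how the caller would tell
the two cases apart. The real function works on Grid Engine's own list types
and resolves hostgroups and patterns, which is left out here; the sketch
compares already resolved queue instance names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the queue requests of a job. In Grid Engine
 * these live in JB_hard_queue_list (-q) and JB_master_hard_queue_list
 * (-masterq); a NULL pointer here means the option was not given. */
typedef struct {
    const char **hard_queue_list;
    size_t n_hard;
    const char **master_hard_queue_list;
    size_t n_master;
} job_request_t;

/* Exact-name lookup; real matching would also expand hostgroups. */
static bool list_contains(const char **list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Returns true if the queue instance may host the task under the
 * proposed semantics:
 *   - -q applies to all tasks if no -masterq was specified
 *   - -q applies to slave tasks only if -masterq was specified */
bool queue_matches(const job_request_t *job, const char *qinstance,
                   bool is_master_task)
{
    assert(job != NULL && qinstance != NULL);

    /* (1) the -masterq check now comes first */
    if (is_master_task && job->master_hard_queue_list != NULL) {
        /* (2) -masterq alone decides for the master task,
         *     the -q check is skipped */
        return list_contains(job->master_hard_queue_list,
                             job->n_master, qinstance);
    }

    /* slave tasks, or master task without -masterq: -q applies */
    if (job->hard_queue_list != NULL) {
        return list_contains(job->hard_queue_list, job->n_hard, qinstance);
    }

    /* no queue request at all: any queue instance is acceptable */
    return true;
}
```

With -q q1@@slaves -masterq q1@@masters this would accept a master queue
instance only for the master task and a slave queue instance only for slave
tasks, which is what Pascal expected.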

What do others think about this?

