[GE users] Problem using a hostgroup in -masterq and not in -q
Andreas.Haas at Sun.COM
Fri Jun 30 10:38:36 BST 2006
On Fri, 30 Jun 2006, Pascal GILGENKRANTZ wrote:
> We are facing an issue with the PE -masterq usage. The goal is to make
> sure that our PE job will run its "master" job only on our "@masters"
> hostgroup, and all others sub-jobs of the PE in our "@slaves" hostgroup.
> But, with Grid Engine 6.0u8, it's *impossible* to run a PE job like this:
> % qsub -pe my_pe 10 -q q1@@slaves -masterq q1@@masters <command>
> the result is a job pending forever, with qstat -r always complaining:
> "cannot run in PE "my_pe" because it only offers 100 slots" whereas I
> requested only 10 slots...
Hm. This is inadequate diagnostic output, but fixing it wouldn't
help you either.
> One way to make it work is to add the masterq host group (@masters)
> to the -q option. This is not what we want to do, because there is a risk of
> allocating another "master" host as a slave, and we need to keep the masters
> free for other PE jobs. One solution is to use a soft resource (-soft -l) to
> request that slaves be used preferably, but there is still a chance of consuming
> a master when there is a lack of slave resources.
> One solution would be to be able to specify different hard resources for the
> -q and the -masterq options (grid engine issue 75), but it's not implemented.
> Please feel free to share your experience if you have a similar problem!
It's interesting to get your view on it. Actually your expectation about
-q q1@@slaves -masterq q1@@masters
is quite fair!
To me this raises the question of whether it would be reasonable to slightly
redefine the relation between -q and -masterq to make the above work. The new
behaviour would be:
* -q request applies to all tasks, if no -masterq was specified
* -q request applies to slave tasks only, if -masterq was specified
I claim that when you read the sge_queue_match_static() function in
libs/sched/sge_select_queue.c you quickly see how to change
Grid Engine's behaviour accordingly. I would describe the required
change as follows:
(1) switch order of sequence for JB_hard_queue_list checking (-q) and
JB_master_hard_queue_list checking (-masterq)
(2) skip JB_hard_queue_list checking if JB_master_hard_queue_list checking
went through successfully
In fact, this is just a minor change if you're not afraid of the C language.
Perhaps you compile Grid Engine yourselves anyway?
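
To make the proposed semantics concrete, here is a minimal standalone sketch in C. It is not the actual sge_queue_match_static() code: the job_request_t struct, the list_contains() helper and the queue_matches() function are hypothetical stand-ins, and the matching is a plain string comparison, whereas the real scheduler works on the JB_hard_queue_list and JB_master_hard_queue_list job attributes and resolves hostgroups and wildcards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a job's queue requests. In Grid Engine these
 * live in the JB_hard_queue_list (-q) and JB_master_hard_queue_list
 * (-masterq) job attributes. */
typedef struct {
    const char *const *hard_queues;        /* -q entries, NULL if none */
    size_t n_hard;
    const char *const *master_hard_queues; /* -masterq entries, NULL if none */
    size_t n_master;
} job_request_t;

/* Exact string match only; hostgroup and wildcard resolution is left out. */
static bool list_contains(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/* Proposed behaviour: may this queue instance host the given task? */
bool queue_matches(const job_request_t *job, const char *qinstance,
                   bool is_master_task)
{
    /* (1) The -masterq check comes first and only concerns the master task. */
    if (is_master_task && job->n_master > 0) {
        /* (2) Its result is final: on success the -q check is skipped. */
        return list_contains(job->master_hard_queues, job->n_master, qinstance);
    }

    /* Slave tasks, or any task when no -masterq was given, must satisfy -q. */
    if (job->n_hard > 0) {
        return list_contains(job->hard_queues, job->n_hard, qinstance);
    }

    /* No queue request at all: every queue instance is acceptable. */
    return true;
}
```

With -q naming only slave queue instances and -masterq naming only master ones, this sketch accepts a master instance for the master task and rejects it for slave tasks, which is the outcome the original request was after.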
What do others think about this?