[GE issues] [Issue 3281] New - Complex EXCL with consumable JOB needs completely free nodes during scheduling
reuti at staff.uni-marburg.de
Sun Aug 29 14:37:55 BST 2010
Summary: Complex EXCL with consumable JOB needs completely free nodes during scheduling
------- Additional comments from reuti at sunsource.net Sun Aug 29 06:37:51 -0700 2010 -------
Having a complex:
#name shortcut type relop requestable consumable default urgency
master mst BOOL EXCL YES JOB 0 1000
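For reference, a complex definition like the one above would be added to the complex configuration and then attached to each
execution host as a consumable - a sketch, assuming one exclusive "master" slot per host (the hostname is a placeholder):

```
# qconf -mc  (add this line to the complex list)
master  mst  BOOL  EXCL  YES  JOB  0  1000

# qconf -me <hostname>  (set the consumable in the host's complex_values)
complex_values  master=true
```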
will "subtract" the consumable only once. When it is a host-level consumable rather than a global one, it is honored only on the
master node of a parallel job. Submitting such a request:
$ qsub -pe mpich 7 -l master test.sh
in an empty cluster works fine, and the "master" complex grants exclusive access to the elected master node. Of course, the number
of remaining slots on the master node must be adjusted to honor this cut-off, i.e. slots=(needed)-1+(slots per host) for a PE with
$fill_up. Once the job is running, serial jobs can be submitted and will fill the gaps on the slave nodes (this matches the output
of `qhost -F master`, which shows the consumable changed only on the master node of the parallel job).
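The slot adjustment mentioned above can be illustrated with a quick calculation - a sketch using hypothetical figures (a 7-slot
PE job as in the qsub above, and hosts offering 4 slots each):

```shell
# Hypothetical figures: the job requests 7 slots in total, each host offers 4 slots.
needed=7
slots_per_host=4
# slots = (needed) - 1 + (slots per host), the cut-off adjustment described above
echo $(( needed - 1 + slots_per_host ))
# prints 10
```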
But when some serial jobs are already running in the cluster, the above job is much less likely to start, as the EXCL complex is
apparently checked for all slave nodes too during scheduling. The output of `qstat -j <jobid>` shows an error like:
scheduling info: cannot run in PE "mpich" because it only offers 4 slots
But this count reflects only the one completely free node, which would suffice for the master node; there are more free slots scattered around the cluster.
In addition, `qalter -w v/p <jobid>` outputs "no suitable queues" for such a waiting job. For "-w v" (which assumes an empty cluster)
this is wrong - the job will start once the earlier serial jobs are gone. For "-w p" it matches the output of `qstat -j <jobid>`,
but it is also wrong, as the job could run even with other jobs in place.