[GE users] Specifying maximum number of jobs per node
reuti at staff.uni-marburg.de
Mon Sep 22 17:17:08 BST 2008
Am 22.09.2008 um 11:00 schrieb Bradford, Matthew:
> We have a similar problem to Craig, and I don't think the suggested
> solution quite fits our requirements.
> We have a cluster containing both 4-core and 8-core nodes, with
> all nodes allowed to run any job if they are available. We
> don't want to partition the cluster up in terms of types of batch/
> PE jobs so that any job could run on any node, however, we also
> have a requirement that if an MPI parallel job, which spans more
> than 1 node, is running, then no other jobs may run on those nodes.
> If a serial, single core job is running on a node, then other
> single core jobs can also run on that node, but no parallel jobs
> can be started on that node.
> We currently use mutual subordination between queues: a parallel
> queue with a single slot and various PEs, and a serial queue with
> 1 slot per core.
> Due to the issues with queue subordination preventing resource
> reservation functioning correctly, we are looking at having a
> configuration with a single, or as few queues as possible, with 1
> slot per core and no queue subordination. When users only want to
> request the number of cores for a specific job, then this is fine,
> as we can have parallel environments with allocation rules locked
> down to either 4 cores or 8 cores.
> If a user submits a request such as:
> qsub -pe mpi_* 32 mpi_application
> then SGE will fit the job onto either eight 4-core machines or four
> 8-core machines, which is fine, and the usage accounting is
> accurate. (We are using ACCT_RESERVED_USAGE and
> SHARETREE_RESERVED_USAGE, so jobs are accounted for as NSLOTS x time.)
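For reference, those two flags live in execd_params of the global cluster configuration; the relevant excerpt looks roughly like this (a sketch, edited with `qconf -mconf`):

```shell
# Global cluster configuration excerpt (qconf -mconf):
#
#   execd_params   ACCT_RESERVED_USAGE=true,SHARETREE_RESERVED_USAGE=true
```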
> The problem is that a user may sometimes want to specify the number
> of nodes over which to run the job, using only 2 cores per node,
> such as:
> qsub_wrapper -pe mpi_* 8x2 mpi_application
> but they don't want any other jobs to be able to start on those
> nodes. If we multiply the requested node count by 4 in the
> qsub_wrapper, the job could run on eight 4-core nodes, as the
> requested 32 slots would use up all the slots on those nodes, and
> the start-up script for the selected parallel environment would
> modify the PE machine file so that each node appears only twice. In
> this way, SGE thinks the node is full and accounts correctly for
> the usage, but the integrated PE only tries to start 2 processes
> per node.
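The hostfile rewrite you describe could be sketched as a PE start_proc_args script like the following (an illustration, not your actual script; the cap of 2 slots per host and the helper name `cap_hostfile` are assumptions for the example):

```shell
#!/bin/sh
# PE start_proc_args sketch: cap each host at 2 slots in the PE hostfile.
# Each line of $PE_HOSTFILE has the form:
#   <hostname> <slots> <queue> <processor-range>
cap_hostfile() {
    hf="$1"
    # Reduce any slot count above 2 down to 2, keep smaller counts as-is.
    awk '{ $2 = ($2 > 2) ? 2 : $2; print }' "$hf" > "$hf.capped"
    mv "$hf.capped" "$hf"
}

# In a real PE start script, SGE exports PE_HOSTFILE for us:
[ -z "$PE_HOSTFILE" ] || cap_hostfile "$PE_HOSTFILE"
```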
> This is fine in a homogeneous cluster where all nodes have the same
> number of cores, as it allows us to multiply each slot request by a
> constant. With a cluster that contains both 4- and 8-core machines,
> we don't know what constant to multiply the slot request by in the
> qsub_wrapper at submission time, and therefore, in the above
> example, the job may run on four 8-core machines rather than eight
> 4-core machines.
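For the homogeneous case, the multiply-by-a-constant step in the wrapper might look like this (a sketch; the "NxM" request syntax, the `slots_for` helper, and the CORES_PER_NODE constant are assumptions, not stock SGE):

```shell
#!/bin/sh
# qsub_wrapper sketch for a homogeneous cluster: turn a request like
# "8x2" (8 nodes, 2 ranks per node) into a full-node slot count.
# CORES_PER_NODE only works as a constant when every node matches.
CORES_PER_NODE=4

slots_for() {
    nodes=${1%x*}                      # "8x2" -> "8"
    echo $((nodes * CORES_PER_NODE))   # e.g. 8 nodes * 4 cores = 32 slots
}

# Hypothetical usage: qsub -pe "mpi_*" "$(slots_for 8x2)" mpi_application
```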
> We need to be able:
> 1. to let users specify the number of nodes,
> 2. to give a job exclusive access to those nodes,
> 3. to account correctly using the RESERVED_USAGE parameters
> (1 slot per core, with all slots on a node consumed by a running
> job), and
> 4. to avoid subordination, as it breaks resource reservation.
I see what you want to do, but this is not directly supported
out-of-the-box right now. There is already an RFE to implement
exclusive node usage for more advanced setups.
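Until such support exists, a common workaround is a forced host-level consumable that whole-node jobs must request. A sketch, where the complex name `exclusive`, the host `node01`, and the PE name are placeholders:

```shell
# Add a line to the complex list (qconf -mc), roughly:
#
#   exclusive   excl   BOOL   EXCL   YES   YES   0   1000
#
# (the EXCL consumable type only exists in later Grid Engine releases;
# older installations have to emulate it, e.g. with an INT consumable
# sized to the node's slot count). Attach it to every execution host:
qconf -mattr exechost complex_values exclusive=true node01
# Jobs that must not share their nodes then request it explicitly:
qsub -l exclusive=true -pe "mpi_*" 32 mpi_application
```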
> If this doesn't make any sense then I'll have another go at
> explaining it.
> Any help would be much appreciated.
> Thanks very much,