[GE users] Specifying maximum number of jobs per node

Reuti reuti at staff.uni-marburg.de
Mon Sep 22 17:17:08 BST 2008


On 22.09.2008 at 11:00, Bradford, Matthew wrote:

> We have a similar problem to Craig, and I don't think the suggested  
> solution quite fits our requirements.
>
> We have a cluster containing both 4-core and 8-core nodes, and any
> node may run any job when it is available. We don't want to
> partition the cluster by job type (batch or PE), so any job can run
> on any node. However, we also require that while an MPI parallel job
> spanning more than one node is running, no other jobs may run on
> those nodes. If a serial, single-core job is running on a node, then
> other single-core jobs can also run on that node, but no parallel
> job can be started on that node.
>
> We currently use mutual subordination between queues, with a  
> parallel queue with a single slot and various PEs and also a serial  
> queue, with 1 slot per core.
>
> Because queue subordination prevents resource reservation from
> functioning correctly, we are looking at a configuration with a
> single queue, or as few queues as possible, with 1 slot per core and
> no queue subordination. When users only want to request the number
> of cores for a specific job, this is fine, as we can have parallel
> environments with allocation rules locked down to either 4 cores or
> 8 cores.
>
> If a user submits a request such as:
>
> qsub -pe mpi_* 32 mpi_application
>
> then SGE will fit the job on either eight 4-core machines or four
> 8-core machines, which is fine, and the usage accounting is
> accurate. (We are using ACCT_RESERVED_USAGE and
> SHARETREE_RESERVED_USAGE, so jobs are accounted for as
> NSlots x Time.)
>
> The problem is that a user sometimes wants to specify the number of
> nodes over which the job executes and to use only 2 cores per node,
> for example:
>
> qsub_wrapper -pe mpi_* 8x2 mpi_application
>
> but they don't want any other jobs to be able to start on those
> nodes. If we multiply the requested number of nodes by 4 in the
> qsub_wrapper, then the job can run on eight 4-core nodes, as the
> requested 32 slots use up all the slots on those nodes, and the
> start-up script for the selected parallel environment modifies the
> PE machine file so that each node is listed only twice. In this way,
> SGE thinks the nodes are full and accounts correctly for the usage,
> but the integrated PE only tries to start 2 processes per node.
>
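
The wrapper-plus-start-script approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual scripts: the function names are hypothetical, the hostfile is assumed to follow the usual $PE_HOSTFILE layout of "hostname slots queue processor_range" per line, and the machine file is assumed to list one hostname per process.

```python
def parse_request(spec):
    """Split a wrapper-style 'NODESxPPN' request such as '8x2' into (8, 2)."""
    nodes, per_node = spec.lower().split("x")
    return int(nodes), int(per_node)


def padded_slots(nodes, cores_per_node):
    """Slots to request so that every core of each node is taken by the job."""
    return nodes * cores_per_node


def build_machinefile(pe_hostfile_text, per_node):
    """Return machine-file lines with each granted host listed at most per_node times.

    Assumes each non-empty hostfile line starts with 'hostname slots'.
    """
    lines = []
    for row in pe_hostfile_text.splitlines():
        fields = row.split()
        if not fields:
            continue  # skip blank lines
        host, granted = fields[0], int(fields[1])
        lines.extend([host] * min(per_node, granted))
    return lines


if __name__ == "__main__":
    nodes, per_node = parse_request("8x2")
    # On a homogeneous 4-core cluster the wrapper would request this many slots.
    print(padded_slots(nodes, 4))
    # Hypothetical hostfile contents for two fully allocated 4-core nodes.
    hostfile = (
        "node01 4 all.q@node01 UNDEFINED\n"
        "node02 4 all.q@node02 UNDEFINED\n"
    )
    print("\n".join(build_machinefile(hostfile, per_node)))
```

The scheduler still sees every slot on each node as used, so accounting with the RESERVED_USAGE parameters covers the whole node, while the MPI launcher only receives two entries per host.
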
> This is fine in a homogeneous cluster where all nodes have the same
> number of cores, as it allows us to multiply each slot request by a
> constant. When the cluster contains both 4-core and 8-core machines,
> we don't know at submission time what constant to multiply the slot
> request by in the qsub_wrapper, and therefore, in the above example,
> the job may run on four 8-core machines rather than eight 4-core
> machines.
>
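
The arithmetic behind the mixed-cluster problem can be made concrete with a small sketch. The function names are hypothetical, and the sketch assumes the PE's allocation rule fills whole nodes of a single core count, as in the 4-core and 8-core PEs mentioned earlier.

```python
def nodes_used(slots, cores_per_node):
    """Number of nodes occupied when whole nodes of one size are filled."""
    if slots % cores_per_node:
        raise ValueError("slot request does not fill whole nodes")
    return slots // cores_per_node


def outcome(requested_nodes, per_node, multiplier, cores_per_node):
    """What the job gets if the wrapper multiplies the node count by a
    fixed constant and the job then lands on nodes of the given size."""
    slots = requested_nodes * multiplier
    nodes = nodes_used(slots, cores_per_node)
    return {"slots": slots, "nodes": nodes, "processes": nodes * per_node}


if __name__ == "__main__":
    # Request "8x2" with the multiplier fixed at 4.
    print(outcome(8, 2, 4, 4))  # lands on 4-core nodes
    print(outcome(8, 2, 4, 8))  # lands on 8-core nodes
```

With the multiplier fixed at 4, the 32 padded slots cover eight 4-core nodes (16 processes, as requested) but only four 8-core nodes (8 processes). Fixing the multiplier at 8 would give the right node count on 8-core machines but over-request on 4-core ones, which is why no single constant works at submission time.
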
> We need to be able:
>
> 1. to allow users to specify the number of nodes,
> 2. to allow exclusive access to those nodes,
> 3. to account correctly using the RESERVED_USAGE parameters (1
> slot per core and all slots used up for a running job),
> 4. to avoid subordination, as it breaks resource reservation.

I see what you want to do, but this is not directly supported
out of the box right now. There is already an RFE to implement
exclusive node usage for more advanced setups:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2629

-- Reuti


> If this doesn't make any sense then I'll have another go at  
> explaining it.
>
> Any help would be much appreciated.
>
> Thanks very much,
>
> Mat

