[GE users] Accounting of Parallel Jobs

Reuti reuti at staff.uni-marburg.de
Tue Jan 29 22:12:48 GMT 2008



Hi,

On 29.01.2008 at 22:24, Bradford, Matthew wrote:

> integration with the SCore parallel environment, and SGE is unable  
> to record accurate usage of a job's CPU time. We are looking at the  
> ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE flags in the execd
> params, which provide an improvement to the reporting as they give
> us (wallclock time x slots), but the problem is, all the SCore
> parallel jobs only use 1 slot per node, even though they are using  
> all 4 cores on a node. This would be OK if every job was a parallel  
> SCore job, but some of the jobs are simple serial jobs, which run  
> within a serial queue, and use 1 slot per core. The accounting  
> problem is then that a serial job using 1 slot is reported to use  
> the same amount of CPU as a parallel job, using 1 slot but 4 cores.
>
Just submit these jobs as parallel ones as well and request 4 slots. To
get them all on one node you need a PE with allocation_rule
$pe_slots and 4 slots on this machine, as there are 4 cores. If, on the
other hand, you need 4/8/12/... slots for this job in total, you could
alternatively set the allocation_rule to the fixed value 4.
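
As a rough sketch of the fixed-value variant, a PE definition could look
like the fragment below. The name "score4", the slot total and the
start/stop procedures are only placeholders I made up for illustration;
the field names are the ones from sge_pe(5) in 6.x, so please check them
against your version before loading it with `qconf -Ap <file>`.

```
pe_name            score4
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    4
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
```

A job submitted with `qsub -pe score4 8 job.sh` should then get 4 slots on
each of 2 nodes, and with ACCT_RESERVED_USAGE set the reported usage would
be wallclock time x 8 instead of wallclock time x 2.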

In the extreme: make this queue a parallel-only queue (qtype NONE)
and attach only one PE with the fixed allocation rule 4.
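
The relevant entries of such a queue, as shown by `qconf -mq`, might look
roughly like this. Only these lines matter here; "score.q" and "score4"
are hypothetical names for the queue and for a PE with allocation_rule 4,
and the slots value assumes the 4 cores per node you mentioned.

```
qname      score.q
qtype      NONE
pe_list    score4
slots      4
```

With qtype NONE the queue accepts no serial batch or interactive jobs, so
only jobs requesting the attached PE can run there.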

-- Reuti

> This will cause problems when looking at a sharetree setup, as a
> group which tends to run serial jobs will be penalised compared to
> a group that tends to run parallel jobs.
>
> Is there any way of scaling the usage of the slots on a cluster
> queue basis, so that a single slot within a parallel queue is
> equivalent to 4 slots within a serial queue?
>
> Alternatively, and in the longer term, is there any intention of
> providing the functionality where a user can request a number of
> nodes, and then a number of cores per node, rather than the single
> "slots" parameter? This would mean that the current configuration
> that we are using, where the parallel queues only offer 1 slot,
> could be changed so that SGE understands that a user is requesting
> multiple cores, and would reduce the reporting anomaly.
>
> Any advice would be much appreciated.
>
> Thanks,
>
> Mat
>
>
> Matthew Bradford
> Information Analyst
> Applications Services Field Operations EMEA
> UKIMEA RABU
> EDS c/o Rolls-Royce Plc, Moor Lane
> PO Box 31
> Derby
> DE24 8BJ
>
> email:  matthew.bradford at eds.com
> Office: +44 01332 2 22059
>

