[GE users] Accounting of Parallel Jobs
reuti at staff.uni-marburg.de
Tue Jan 29 22:12:48 GMT 2008
On 29.01.2008 at 22:24, Bradford, Matthew wrote:
> integration with the SCore parallel environment, and SGE is unable
> to record accurate usage of a job's CPU time. We are looking at the
> ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE flags in the execd
> params, which improve the reporting because they give
> us (wallclock time x slots), but the problem is that all the SCore
> parallel jobs only use 1 slot per node, even though they are using
> all 4 cores on a node. This would be OK if every job was a parallel
> SCore job, but some of the jobs are simple serial jobs, which run
> within a serial queue, and use 1 slot per core. The accounting
> problem is then that a serial job using 1 slot is reported to use
> the same amount of CPU as a parallel job, using 1 slot but 4 cores.
Just submit these jobs as parallel ones too, and request 4 slots. To
get them all on one node you need a PE with allocation_rule
$pe_slots and 4 slots on this machine, as there are 4 cores. If, on
the other hand, you need 4/8/12/... slots in total for this job, you
could alternatively set the allocation_rule to the fixed value 4.
In the extreme case: make this queue a parallel-only queue (qtype NONE)
and attach only one PE with the fixed allocation rule 4.
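As a rough illustration of why the slot request matters (a sketch only, not SGE code): with ACCT_RESERVED_USAGE the reported usage is wallclock time x slots, as described in the question. The function names and the one-hour runtime below are made up for the example, and the "multiple of 4" check reflects my understanding of how a fixed allocation rule behaves.

```python
def reserved_usage(wallclock_seconds, slots):
    """Reserved usage as described in the thread: wallclock time x slots."""
    return wallclock_seconds * slots


def nodes_used(total_slots, slots_per_node=4):
    """Nodes occupied when the PE hands out a fixed number of slots per node.

    Assumption: with a fixed allocation rule, the total slot request has to
    be a multiple of that value (4/8/12/... for rule 4).
    """
    if total_slots % slots_per_node != 0:
        raise ValueError("total slots must be a multiple of slots per node")
    return total_slots // slots_per_node


wallclock = 3600  # hypothetical one-hour run

serial_job = reserved_usage(wallclock, slots=1)     # serial job on 1 core
score_1_slot = reserved_usage(wallclock, slots=1)   # SCore job: 1 slot, 4 cores
score_4_slots = reserved_usage(wallclock, slots=4)  # same job requesting 4 slots

print(serial_job, score_1_slot, score_4_slots)   # 3600 3600 14400
print([nodes_used(n) for n in (4, 8, 12)])       # [1, 2, 3]
```

With the 1-slot request the SCore job is accounted exactly like the serial job; with the 4-slot request it is accounted four times as much for the same wallclock time, which matches the 4 cores it actually occupies.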
> This will cause problems when looking at a sharetree set up, as a
> group which tends to run serial jobs will be penalised compared to
> a group that tends to run parallel jobs.
> Is there any way of scaling the usage of the slots on a cluster
> queue basis, so that a single slot within a parallel queue is
> equivalent to 4 slots within a serial queue?
> Alternatively, and in the longer term, is there any intention of
> providing functionality where a user can request a number of
> nodes, and then a number of cores per node, rather than the single
> "slots" parameter? This would mean that the current configuration
> that we are using, where the parallel queues only offer 1 slot,
> could be changed so that SGE understands that a user is requesting
> multiple cores, and would reduce the reporting anomaly.
> Any advice would be much appreciated.
> Matthew Bradford