[GE users] Accounting of Parallel Jobs

Reuti reuti at staff.uni-marburg.de
Wed Jan 30 21:47:41 GMT 2008


On 30.01.2008 at 11:37, Bradford, Matthew wrote:

> I'm not sure whether I explained everything very well.
> Currently, when a user wants to submit an SCore job, they use something
> like the following within their submitted script:
> #$ masterq=masterq at headnode

Aha, now I see. I thought the discussion was that the slave tasks
might also end up on the head node?

> 	scrun 8x4 <application>
> Where 8 represents the number of execution nodes they require, and 4
> represents the number of cores per node.
> They then request 9 nodes via SGE with
> 	qsub -pe score 9 <application_script>

But this is a different/bigger issue than just accounting, more:


and all referenced issues.

> Where 9 equates to the 8 execution nodes plus 1 extra for the parallel
> jobs master node (which is the head node of the cluster). This is
> stripped out using the PE start up script, which then populates the
> SCore machine file and launches the SCore job.
> The thought is that there should never be more than 1 parallel job
> running on an execution node, for performance reasons, which is why the
> parallel queue has only 1 slot.
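
For illustration, the stripping step described above could look roughly
like the sketch below. It assumes the usual $PE_HOSTFILE layout (host
name, slot count, queue instance, processor range, one host per line).
The host and queue names in the sample are hypothetical, and the SCore
machine file format is not shown in this thread, so the function only
returns the remaining host names.

```python
def execution_hosts(pe_hostfile_text, master_host):
    """Return the hosts granted by SGE, minus the master (head) node."""
    hosts = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]  # first column is the host name
        if host != master_host:
            hosts.append(host)
    return hosts


# Hypothetical hostfile content for a small job (names are made up).
sample = """headnode 1 masterq@headnode UNDEFINED
node01 1 parallel.q@node01 UNDEFINED
node02 1 parallel.q@node02 UNDEFINED
"""

print(execution_hosts(sample, "headnode"))
```

A real PE start script would read the file named by $PE_HOSTFILE and
write the result in whatever format scrun expects.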

And if you also request 4 slots in the masterq? Then it might of  
course compute too much usage...
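
To put numbers on it, here is a small sketch of the (wallclock x slots)
arithmetic for the 8x4 example. The one-hour runtime is an assumed
value, and the calculation only mirrors the wallclock-times-slots
figure mentioned in this thread, not SGE's internal bookkeeping.

```python
NODES = 8            # execution nodes, as in "scrun 8x4"
CORES_PER_NODE = 4   # cores used on each execution node
WALLCLOCK_S = 3600   # assumed runtime of one hour


def slot_seconds(wallclock_s, slots):
    """Usage as wallclock time multiplied by the number of slots."""
    return wallclock_s * slots


# Current setup: 1 slot per execution node plus 1 slot on the head node.
current = slot_seconds(WALLCLOCK_S, NODES + 1)

# What the job really occupies: every core on the 8 execution nodes.
cores_used = slot_seconds(WALLCLOCK_S, NODES * CORES_PER_NODE)

# 4 slots on every node, including the master on the head node.
four_everywhere = slot_seconds(WALLCLOCK_S, (NODES + 1) * CORES_PER_NODE)

print(current, cores_used, four_everywhere)  # 32400 115200 129600
```

So the current setup under-reports the job by more than a factor of
three, while 4 slots on the master as well would over-report it by 4
slots' worth of wallclock time.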

Read on...

> The parallel queue only accepts parallel jobs, and there is a separate
> queue for serial jobs, which has the same number of slots as there are
> cores on the node. To prevent serial jobs and parallel jobs running on
> the same node, the queues are subordinates of each other.
> It could be possible using the different allocation rules as you
> suggest, and modify the PE startup scripts to provide the machine file
> in the correct format for SCore, but it would also cause the master node
> to use 4 slots as well, which is undesirable. Also, this would be a
> static configuration, and if the user wanted to request (scrun 8x2
> <application>) then we'd need another parallel environment, which is a
> possibility. I'll need to investigate this further.
> Thanks,
> Mat
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 29 January 2008 22:13
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Accounting of Parallel Jobs
> Hi,
> On 29.01.2008 at 22:24, Bradford, Matthew wrote:
>> integration with the SCore parallel environment, and SGE is unable to
>> record accurate usage of a job's CPU time. We are looking at the

Why is cputime not working? It's not tightly integrated?

-- Reuti

>> params, which provides an improvement to the reporting as it gives us
>> (Wallclock time X slots), but the problem is, all the SCore parallel
>> jobs only use 1 slot per node, even though they are using all 4 cores
>> on a node. This would be OK if every job was a parallel SCore job, but
>> some of the jobs are simple serial jobs, which run within a serial
>> queue, and use 1 slot per core. The accounting problem is then that a
>> serial job using 1 slot is reported to use the same amount of CPU as a
>> parallel job, using 1 slot but 4 cores.
> Just submit these jobs as parallel ones as well and request 4 slots. To get
> them all on one node you need one PE with allocation_rule $pe_slots and
> 4 slots on this machine, as there are 4 cores. If, on the other hand, you
> need 4/8/12/... slots for this job in total, you could alternatively set
> the allocation_rule to the fixed value 4.
> In the extreme: make this queue a parallel only queue (qtype NONE) and
> attach only one PE with fixed allocation rule 4.
> -- Reuti
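
As a rough illustration of the fixed allocation rule mentioned above: a
rule of 4 places exactly 4 slots on each host, so the total request has
to be one of 4/8/12/... slots. The helper below is hypothetical and only
does the arithmetic. Raising an error for other totals is an assumption
made for the sketch; the scheduler would presumably just leave such a
job pending.

```python
def hosts_needed(total_slots, slots_per_host=4):
    """Number of hosts a fixed allocation rule spreads a request over."""
    if total_slots <= 0 or total_slots % slots_per_host != 0:
        # Assumption for this sketch: treat non-multiples as an error.
        raise ValueError(
            f"{total_slots} slots is not a positive multiple of {slots_per_host}"
        )
    return total_slots // slots_per_host


for total in (4, 8, 12, 32):
    print(total, "slots ->", hosts_needed(total), "host(s)")
```

With this scheme the 8x4 example becomes a request for 32 slots on 8
hosts, which matches the cores actually in use.
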
>> This will cause problems when looking at a sharetree set up, as a
>> group which tends to run serial jobs will be penalised compared to a
>> group that tends to run parallel jobs.
>> Is there any way of scaling the usage of the slots on a cluster queue
>> basis, so that a single slot within a parallel queue is equivalent to
>> 4 slots within a serial queue?
>> Alternatively, and in the longer term, is there any intention of
>> providing the functionality where a user can request number of nodes,
>> and then number of cores per node, rather than the single "slots"
>> parameter? This would mean that the current configuration that we are
>> using, where the parallel queues only offer 1 slot, could be changed
>> so that SGE understands that a user is requesting multiple cores, and
>> would reduce the reporting anomaly.
>> Any advice would be much appreciated.
>> Thanks,
>> Mat
>> Matthew Bradford
>> Information Analyst
>> Applications Services Field Operations EMEA UKIMEA RABU EDS c/o
>> Rolls-Royce Plc, Moor Lane PO Box 31 Derby
>> DE24 8BJ
>> email:  matthew.bradford at eds.com
>> Office: +44 01332 2 22059
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
