[GE users] Accounting of Parallel Jobs

Reuti reuti at staff.uni-marburg.de
Wed Jan 30 21:47:41 GMT 2008


Hi,

On 30.01.2008 at 11:37, Bradford, Matthew wrote:

> I'm not sure whether I explained everything very well.
>
> Currently, when a user wants to submit an SCore job, they use something
> like the following within their submitted script:
>
> #$ masterq=masterq@headnode

Aha, now I see. I thought the discussion was that the slave tasks
might also end up on the head node?

> 	
> 	scrun 8x4 <application>
>
> Where 8 represents the number of execution nodes they require, and 4
> represents the number of cores per node.
>
> They then request 9 nodes via SGE with
>
> 	qsub -pe score 9 <application_script>

But this is a different and bigger issue than just accounting; see:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=75

and all referenced issues.


> Where 9 equates to the 8 execution nodes plus 1 extra for the parallel
> jobs master node (which is the head node of the cluster). This is
> stripped out using the PE start up script, which then populates the
> SCore machine file and launches the SCore job.
>
> The thought is that there should never be more than 1 parallel job
> running on an execution node, for performance reasons, which is why the
> parallel queue has only 1 slot.
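
The start-up step described above (dropping the head node entry and
handing the remaining hosts to SCore) might look roughly like the
sketch below. It assumes the usual four-column $PE_HOSTFILE layout
(host, slots, queue instance, processor range). The function name and
the queue names in the sample are made up for illustration, and the
real SCore machine file format is not shown here, only the host list.

```python
def execution_hosts(pe_hostfile_text, head_node):
    """Return the hosts from a PE hostfile, skipping the head node entry."""
    hosts = []
    for line in pe_hostfile_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        host = fields[0]  # first column is the host name
        if host == head_node:
            continue  # the extra slot requested for the master
        hosts.append(host)
    return hosts


# Hypothetical hostfile for a 3-slot request (1 master + 2 execution nodes).
sample = """headnode 1 masterq@headnode UNDEFINED
node01 1 parallel.q@node01 UNDEFINED
node02 1 parallel.q@node02 UNDEFINED
"""

print(execution_hosts(sample, "headnode"))
```

The start-up script would then write these names out in whatever form
SCore expects.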

And what if you also request 4 slots in the masterq? Then it would of
course account for too much usage...
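
To put numbers on it, here is a small sketch of the (wallclock x slots)
arithmetic that the reserved usage reporting gives, as described in the
quoted message further down. The one-hour runtime and the function name
are assumptions for illustration only.

```python
def reserved_cpu(wallclock_s, slots):
    """Reserved usage as described in this thread: wallclock time x granted slots."""
    return wallclock_s * slots


WALLCLOCK = 3600      # assumed one-hour run, in seconds
NODES, CORES = 8, 4   # scrun 8x4

# Current setup: 8 one-slot execution nodes plus 1 slot on the head node.
current = reserved_cpu(WALLCLOCK, NODES + 1)

# What the execution nodes really occupy: 8 nodes x 4 cores.
cores_busy = reserved_cpu(WALLCLOCK, NODES * CORES)

# 4 slots per host everywhere, including the head node.
four_everywhere = reserved_cpu(WALLCLOCK, (NODES + 1) * CORES)

print(current, cores_busy, four_everywhere)
```

So the current setup under-reports the job, while 4 slots on the head
node as well would over-report it by one node's worth of cores.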

Read on...

> The parallel queue only accepts parallel jobs, and there is a separate
> queue for serial jobs, which has the same number of slots as there are
> cores on the node. To prevent serial jobs and parallel jobs running on
> the same node, the queues are subordinates of each other.
>
> It could be possible using the different allocation rules as you
> suggest, and modify the PE startup scripts to provide the machine file
> in the correct format for SCore, but it would also cause the master node
> to use 4 slots as well, which is undesirable. Also, this would be a
> static configuration, and if the user wanted to request (scrun 8x2
> <application>) then we'd need another parallel environment, which is a
> possibility. I'll need to investigate this further.
>
> Thanks,
>
> Mat
>
>
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 29 January 2008 22:13
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] Accounting of Parallel Jobs
>
> Hi,
>
> On 29.01.2008 at 22:24, Bradford, Matthew wrote:
>
>> integration with the SCore parallel environment, and SGE is unable to
>> record accurate usage of a job's CPU time. We are looking at the

Why is cputime not working? Is it not tightly integrated?

-- Reuti


>> ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE flags in the execd
>> params, which provides an improvement to the reporting as it gives us
>> (Wallclock time X slots), but the problem is, all the SCore parallel
>> jobs only use 1 slot per node, even though they are using all 4 cores
>> on a node. This would be OK if every job was a parallel SCore job, but
>> some of the jobs are simple serial jobs, which run within a serial
>> queue, and use 1 slot per core. The accounting problem is then that a
>> serial job using 1 slot is reported to use the same amount of CPU as a
>> parallel job, using 1 slot but 4 cores.
>>
> Just submit these jobs as parallel ones too and request 4 slots. To get
> them all on one node you need one PE with allocation_rule $pe_slots and
> 4 slots on this machine, as there are 4 cores. If, on the other hand, you
> need 4/8/12/... slots for this job in total, you could alternatively set
> the allocation_rule to the fixed value 4.
>
> In the extreme: make this queue a parallel only queue (qtype NONE) and
> attach only one PE with fixed allocation rule 4.
>
> -- Reuti
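
A rough sketch of how the two allocation rules mentioned above split up
a slot request. This only models the rule itself, not whether the hosts
actually have free slots, and the function name is made up for the
example.

```python
def slots_per_host(total_slots, allocation_rule):
    """Per-host slot counts for a request, or None if the rule cannot satisfy it."""
    if allocation_rule == "$pe_slots":
        # All requested slots must come from a single host.
        return [total_slots]
    per_host = int(allocation_rule)  # fixed integer rule, e.g. "4"
    if total_slots % per_host != 0:
        # A request that does not divide evenly is not scheduled.
        return None
    return [per_host] * (total_slots // per_host)


print(slots_per_host(4, "$pe_slots"))   # one node, 4 slots
print(slots_per_host(32, "4"))          # 8 nodes, 4 slots each
print(slots_per_host(10, "4"))          # not a multiple of 4
```

With the fixed rule, a request of 32 slots lands as 4 slots on each of
8 nodes, so (wallclock x slots) matches the cores in use.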
>
>> This will cause problems when looking at a sharetree set up, as a
>> group which tends to run serial jobs will be penalised compared to a
>> group that tends to run parallel jobs.
>>
>> Is there any way of scaling the usage of the slots on a cluster queue
>> basis, so that a single slot within a parallel queue is equivalent to
>> 4 slots within a serial queue?
>>
>> Alternatively, and in the longer term, is there any intention of
>> providing the functionality where a user can request number of nodes,
>> and then number of cores per node, rather than the single "slots"
>> parameter? This would mean that the current configuration that we are
>> using, where the parallel queues only offer 1 slot, could be changed
>> so that SGE understands that a user is requesting multiple cores, and
>> would reduce the reporting anomaly.
>>
>> Any advice would be much appreciated.
>>
>> Thanks,
>>
>> Mat
>>
>>
>> Matthew Bradford
>> Information Analyst
>> Applications Services Field Operations EMEA UKIMEA RABU EDS c/o
>> Rolls-Royce Plc, Moor Lane PO Box 31 Derby
>> DE24 8BJ
>>
>> email:  matthew.bradford at eds.com
>> Office: +44 01332 2 22059
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net





