[GE users] Difference in number of allocated and execution slots with PE: Wall clock time in loose and tight integration

Reuti reuti at staff.uni-marburg.de
Fri Apr 25 14:24:12 BST 2008


Hi,

Am 25.04.2008 um 11:01 schrieb Azhar Ali Shah:

> Many thanks Reuti.
> There is also another related issue:
>
> The daemonless smpd method using SSH (loosely integrated) gives the  
> following wall clock times for 3 jobs that each require 4 processors:
>
> J1:  00:30:34
> J2:  01:17:53
> J3:  04:02:18

with a loose integration the accounting records for the slave tasks  
will be missing in the accounting file. Hence you get only the  
wallclock time of the master task.

(By loose you mean that you didn't change rsh_daemon and so on in  
SGE's configuration?)
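
For illustration, a quick way to see which accounting entries exist  
for a job is to query qacct directly (4711 below is only a placeholder  
job id; with a loose integration you will typically find just the  
single entry of the master task):

   # one record per accounting entry, records are separated by "=" lines
   qacct -j 4711 | egrep '^=|hostname|ru_wallclock'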

> When these jobs are run with tight integration (daemon based smpd),  
> again requiring 4 processors, the wall clock times change to:
>
> J1: 00:34:44
> J2: 02:58:51
> J3: 10:31:44
>
> Though the wall clock time for J1 is approximately the same, what  
> makes J2 and J3 differ so much?

Depending on the allocation chosen by SGE, the reported wallclock time  
must be divided by the number of daemons. To check the absolute  
wallclock time, use only the last entry of qacct -j for this job.

J2: could be one daemon in addition to the idling master task.
J3: hard to say without the slot allocation in both cases.
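
As a rough sketch (assuming the usual qacct field layout, where  
ru_wallclock is given in seconds), you could compare the sum over all  
records with the last record alone, e.g.:

   # total over all daemon/master records vs. the final (master) record
   qacct -j 4711 | awk '/ru_wallclock/ {s+=$2; l=$2} END {print "sum:", s, "last:", l}'

Again, 4711 is only a placeholder job id.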

(Are there more cores than jobs in each machine?)
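
For completeness, the PE for the daemon based smpd startup discussed  
in the quoted mail below usually looks roughly like the following. The  
PE name, the slot count and the script paths are only placeholders;  
the settings that matter for the accounting discussion are  
control_slaves and job_is_first_task:

   pe_name            mpich2_smpd
   slots              9
   user_lists         NONE
   xuser_lists        NONE
   start_proc_args    /path/to/startmpich2.sh $pe_hostfile
   stop_proc_args     /path/to/stopmpich2.sh
   allocation_rule    $round_robin
   control_slaves     TRUE
   job_is_first_task  FALSE
   urgency_slots      min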

-- Reuti



> Thanks again for your time and help
>
> Azhar
>
>
>
>
> --- On Thu, 4/24/08, Reuti <reuti at staff.uni-marburg.de> wrote:
> From: Reuti <reuti at staff.uni-marburg.de>
> Subject: Re: [GE users] Difference in number of allocated and  
> execution slots with PE
> To: users at gridengine.sunsource.net
> Date: Thursday, April 24, 2008, 1:01 PM
>
> Hi,
>
> Am 24.04.2008 um 13:41 schrieb Azhar Ali Shah:
>
> > Using SGE with MPICH2-1.0.7rc2 (smpd daemon based method) on my 6
> > node cluster with 9 processors in total, when I submit a job
> > requesting 9 processors, it runs on only 7 (including one master),
> > as I can verify with the qacct command.
> >
> > I can't understand why the remaining two processors don't take part
> > in the computation?
>
> with the daemon based method you get only one entry for the daemon
> per node (hence 6) plus one entry for the master task (i.e. the job
> script) of this parallel job (so 7 in total, as I would expect). This
> is the reason for having the setting "job_is_first_task FALSE" in the
> daemon based method. In qacct you should see nearly no computation
> (i.e. CPU) time reported for this "master" entry (usually the last
> one).
>
> > Though when 4 processors are requested, the job runs on 4 slaves
> > plus one master, i.e. a total of 5 processors in use!
>
> Same as above, correct output. Although depending on the actual load
> of the cluster you might get fewer entries, as only one daemon is
> necessary even when two processes of one job are running on the same
> slave node.
>
> > Any ideas on how to get around this?
>
> Nothing to worry about, all seems to be in best order :-)
>
> -- Reuti
>
>
> > Thanks
> > Azhar
> >
> >
> >

