[GE users] Correct accounting with mpich-mx tight integration

Reuti reuti at staff.uni-marburg.de
Thu May 10 20:38:37 BST 2007


On 10.05.2007 at 18:49, Chris Rudge wrote:

> Yes, I do mean the time reported by qacct, but I think the problem
> is actually in the way SGE does accounting and not in the parallel
> efficiency of the code.
>
> For example, if I run a 4-CPU MPI job for 1 hour (walltime) and then do
> 	qacct -j <jobid>
>
> what I'd expect to see is walltime = 1 hour and cputime <= 4 hours
> (depending on parallel efficiency).
>
> However, with the tight integration of mpich-mx, what I actually see
> is 5 individual sections in the qacct report for this job. All have a
> walltime of 1 hour; four of them have a cputime of up to 1 hour and
> the other has a cputime of 0. As far as I can tell, these sections
> are accumulated into a total accounting record of
>   walltime = 5 hours and cputime <= 4 hours
>
> I fully understand why it reports these times, but I can't see any
> reason why reporting this as the walltime could possibly be considered
> the right thing to do. I use PBSPro with mpich-gm on another cluster,
> and the walltime reported on that system for the same job would be the
> expected 1 hour.
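The arithmetic behind the complaint above can be sketched in a few lines. This is only an illustration, assuming one master record plus one record per slave task, with all records covering the same elapsed hour; the `TaskRecord` class, the `summarize` helper and the numbers are hypothetical and are not taken from qacct itself.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    # Hypothetical stand-in for one section of a `qacct -j` report under
    # tight integration: the master (job script) or one slave task.
    # Times are in hours.
    name: str
    walltime_h: float
    cputime_h: float


def summarize(records):
    """Contrast a naive sum over all records with the job's elapsed time."""
    summed_wall = sum(r.walltime_h for r in records)
    summed_cpu = sum(r.cputime_h for r in records)
    # Assumption: every record spans (roughly) the same interval, so the
    # job's real walltime is the longest single record, not the sum.
    elapsed_wall = max(r.walltime_h for r in records)
    return summed_wall, summed_cpu, elapsed_wall


# Hypothetical 4-CPU job running 1 hour: the master uses ~0 CPU,
# each of the four tasks uses slightly less than 1 hour of CPU.
records = [TaskRecord("master", 1.0, 0.0)] + [
    TaskRecord(f"task{i}", 1.0, 0.95) for i in range(1, 5)
]

summed_wall, summed_cpu, elapsed_wall = summarize(records)
print(f"summed walltime:  {summed_wall:.2f} h")
print(f"summed cputime:   {summed_cpu:.2f} h")
print(f"elapsed walltime: {elapsed_wall:.2f} h")
```

Summing the cputime across records is meaningful (it stays at or below 4 hours), whereas summing the walltime multiplies the elapsed hour by the number of records, which yields the 5 hours described above.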

Are you using the plain mpirun on the PBSPro cluster (in which case
accounting is done only for the master process, as there is still no
qrsh equivalent AFAIK), or the mpiexec replacement from
http://www.osc.edu/~pw/mpiexec/index.php, which uses the TM interface
to start the tasks on the slave nodes? (This is based on my Torque
knowledge.)

-- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net
