[GE users] PE Accounting

templedf dan.templeton at sun.com
Fri Nov 13 15:07:33 GMT 2009


As a sanity check, I just ran a PE job, using the make PE, that runs the
worker.sh example job on each slave node.  Right before it ended, qstat -j
showed:

usage    1:               cpu=00:05:55, mem=1.81908 GBs, io=0.00000, vmem=3.180M, maxvmem=26.500M

but qacct -j shows me:

ru_wallclock 180         
ru_utime     205.866     
ru_stime     149.709     
cpu          0.144       
mem          0.000            
io           0.000            
iow          0.000            
maxvmem      26.500M

Again, the qacct cpu value appears to cover only the master task.  At
least this time, though, the qstat usage cpu value equals qacct's
ru_utime + ru_stime.  I'll admit that I'm running the 6.2u5 alpha2 build,
so if no one can explain these numbers or has seen this behavior before,
I'll assume it's a bug in u5.
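
To make that comparison concrete, here is a small Python sketch using the
figures from the worker.sh run. The hms_to_seconds helper is hypothetical
and not part of Grid Engine. It only converts the HH:MM:SS cpu field from
qstat -j into seconds so that it can be compared with the qacct rusage
fields.

```python
def hms_to_seconds(hms: str) -> int:
    """Hypothetical helper: convert an HH:MM:SS string (as printed by qstat -j) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Figures copied from the worker.sh run above.
qstat_cpu = hms_to_seconds("00:05:55")   # qstat -j usage: cpu
ru_utime, ru_stime = 205.866, 149.709    # qacct -j ru_utime / ru_stime
qacct_cpu = 0.144                        # qacct -j cpu

rusage_total = ru_utime + ru_stime
# qstat prints whole seconds, so allow up to 1 s of slack in the comparison.
matches = abs(rusage_total - qstat_cpu) < 1.0

print(f"ru_utime + ru_stime = {rusage_total:.3f} s")
print(f"qstat usage cpu     = {qstat_cpu} s")
print(f"qacct cpu           = {qacct_cpu} s")
print("rusage sum matches qstat cpu:", matches)
```

The rusage sum agrees with the qstat figure to within a second. The qacct
cpu field is more than three orders of magnitude smaller.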

Daniel


templedf wrote:
> Can someone point out what I'm missing here?  I have a tightly 
> integrated parallel job.  It runs for about 11 minutes.  Just before it 
> ends, I run a qstat -j, and I get the following usage line:
>
> usage    1:               cpu=00:40:53, mem=652.32980 GBs, io=0.00000, vmem=23.023M, maxvmem=50.125M
>
> As far as I can tell, it's working as intended.  I have 40 CPU-minutes 
> in 11 minutes of execution, which corresponds to the 4 slots on which 
> the job is running.  BUT, after the job ends, I run qacct -j, and it 
> shows me:
>
> cpu          9.040       
> mem          0.369            
> io           0.000            
> iow          0.000            
> maxvmem      50.125M
>
> It got the maxvmem right, but the CPU time is clearly only for the 
> master task.  Even better, a little further up in the qacct output we see:
>
> ru_wallclock 726         
> ru_utime     1356.589    
> ru_stime     104.770     
>
> which says that the job consumed about twice as much CPU time
> (ru_utime + ru_stime) as wallclock time.  Huh?
>
> Why do I have three different CPU time values that don't agree with each 
> other?  Am I just misunderstanding the numbers?
>
> Daniel
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226685
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>
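
A short Python sketch can put the three CPU figures from the quoted
message on the same scale, both in seconds and as a multiple of
ru_wallclock. The hms_to_seconds helper is hypothetical and not a Grid
Engine tool. The ratio to wallclock is only a rough indication of how many
CPUs were busy on average.

```python
def hms_to_seconds(hms: str) -> int:
    """Hypothetical helper: convert an HH:MM:SS string (as printed by qstat -j) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Figures copied from the quoted 4-slot job.
wallclock = 726                          # qacct ru_wallclock, seconds
qstat_cpu = hms_to_seconds("00:40:53")   # qstat -j usage: cpu
rusage_cpu = 1356.589 + 104.770          # qacct ru_utime + ru_stime
qacct_cpu = 9.040                        # qacct cpu

for label, cpu in [
    ("qstat usage cpu", qstat_cpu),
    ("ru_utime + ru_stime", rusage_cpu),
    ("qacct cpu", qacct_cpu),
]:
    # Ratio to wallclock: roughly the average number of CPUs kept busy.
    print(f"{label:<20} {cpu:9.3f} s  ({cpu / wallclock:.2f} x wallclock)")
```

The ratios work out to about 3.38, 2.01 and 0.01. These three figures
describe the same job but disagree.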

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=226689
