[GE issues] [Issue 3179] New - CPU time in accounting file is wrong for PE jobs with accounting_summary set to TRUE

templedf dan.templeton at sun.com
Fri Nov 13 15:44:31 GMT 2009

                 Issue #|3179
                 Summary|CPU time in accounting file is wrong for PE jobs with 
                        |accounting_summary set to TRUE
       Status whiteboard|
              Issue type|DEFECT
             Assigned to|roland
             Reported by|templedf

------- Additional comments from templedf at sunsource.net Fri Nov 13 07:44:30 -0800 2009 -------
For parallel jobs submitted with a PE that has accounting_summary set to TRUE, the cpu field in the qacct -j output shows only the
CPU time of the master task. The CPU time of the slave tasks is lost.

To reproduce, write a job script (petest.sh) that starts worker tasks with qrsh -inherit:

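#$ -S /bin/sh

# Load the Grid Engine environment so that qrsh can be found
. $SGE_ROOT/$SGE_CELL/common/settings.sh

# Start a worker as a slave task on each host granted to the job
# (host1 ... hostN are placeholders for the granted hosts)
qrsh -inherit host1 $SGE_ROOT/examples/jobs/worker.sh &
qrsh -inherit host2 $SGE_ROOT/examples/jobs/worker.sh &
qrsh -inherit hostN $SGE_ROOT/examples/jobs/worker.sh &

# Keep the master task running while the slave tasks work
sleep 150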

Submit that job with the make PE:

% qsub -pe make N petest.sh

After about 2 minutes, check the qstat -j output:

% qstat -j 129
job_number:                 129
exec_file:                  job_scripts/129
submission_time:            Fri Nov 13 06:58:43 2009
owner:                      dant
uid:                        40240
group:                      staff
gid:                        10
sge_o_home:                 /home/dant
sge_o_log_name:             dant
sge_o_shell:                /bin/tcsh
sge_o_tz:                   US/Pacific
sge_o_workdir:              /home/dant
sge_o_host:                 gridengine6
account:                    sge
mail_list:                  dant at gridengine6
notify:                     FALSE
job_name:                   petest.sh
jobshare:                   0
shell_list:                 NONE:/bin/sh
script_file:                /tmp/petest.sh
parallel environment:  make range: 6
usage    1:                 cpu=00:11:57, mem=3.14164 GBs, io=0.00000, vmem=27.914M, maxvmem=40.078M
scheduling info:            (Collecting of scheduler job information is turned off)

Note the CPU time that qstat reports in the usage line (cpu=00:11:57). After the job completes, run qacct -j:

% qacct -j 129
qname        all.q               
hostname     grid1               
group        staff               
owner        dant                
project      NONE                
department   defaultdepartment   
jobname      petest.sh             
jobnumber    129                 
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Fri Nov 13 06:58:43 2009
start_time   Fri Nov 13 06:35:03 2009
end_time     Fri Nov 13 06:37:34 2009
granted_pe   make                
slots        6                   
failed       0    
exit_status  0                   
ru_wallclock 151          
ru_utime     380.760      
ru_stime     336.310      
ru_maxrss    0                   
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    0                   
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     0                   
ru_nivcsw    0                   
cpu          0.304        
mem          0.001             
io           0.000             
iow          0.000             
maxvmem      40.078M
arid         undefined

Notice that the cpu value reported by qacct (0.304) matches neither the CPU time reported by qstat -j (00:11:57) nor reality.
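As a consistency check on the numbers above: in the qacct record, ru_utime + ru_stime = 380.760 + 336.310 = 717.07 seconds, which matches the 00:11:57 (717 seconds) reported by qstat -j to within rounding, while the cpu field shows only 0.304. The shell sketch below makes the same comparison from the command line. It is not part of Grid Engine; the helper names qacct_cpu_check and hms_to_seconds are hypothetical, and it uses only awk. It assumes qacct -j prints a single record for the job, as it does here with accounting_summary TRUE. With accounting_summary FALSE, separate records are printed for the master and slave tasks, and this sketch would keep only the values from the last record.

```shell
#!/bin/sh
# Hypothetical helper: read "qacct -j <jobid>" output on stdin and print the
# cpu field next to ru_utime + ru_stime. Assumes a single accounting record;
# if several records are present, only the last one's values are kept.
qacct_cpu_check() {
    awk '
        $1 == "ru_utime" { u = $2 }
        $1 == "ru_stime" { s = $2 }
        $1 == "cpu"      { c = $2 }
        END { printf "cpu=%.3f ru_utime+ru_stime=%.3f\n", c, u + s }
    '
}

# Hypothetical helper: convert an HH:MM:SS value, such as the cpu= entry
# in the qstat -j usage line, into seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}
```

For the job above, `qacct -j 129 | qacct_cpu_check` would print `cpu=0.304 ru_utime+ru_stime=717.070`, and `hms_to_seconds 00:11:57` prints `717`.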

This PE accounting works correctly in u3.