[GE issues] [Issue 3179] New - CPU time in accounting file is wrong for PE jobs with accounting_summary set to TRUE

templedf dan.templeton at sun.com
Fri Nov 13 15:44:31 GMT 2009


http://gridengine.sunsource.net/issues/show_bug.cgi?id=3179
                 Issue #|3179
                 Summary|CPU time in accounting file is wrong for PE jobs with 
                        |accounting_summary set to TRUE
               Component|gridengine
                 Version|6.2u4
                Platform|All
                     URL|
              OS/Version|All
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|clients
             Assigned to|roland
             Reported by|templedf






------- Additional comments from templedf at sunsource.net Fri Nov 13 07:44:30 -0800 2009 -------
For parallel jobs that are submitted with a PE that has accounting_summary set to TRUE, the cpu field in the qacct -j output shows only the CPU time for the master task.  The slave task CPU time is lost.
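
Before reproducing, it can help to confirm how the PE is configured. The following is a minimal sketch, not part of the original report: pe_attr is a hypothetical helper name, and it assumes the PE configuration is printed as one "attribute value" pair per line, as qconf -sp does.

```sh
#!/bin/sh
# pe_attr (hypothetical helper): print the value of one PE attribute.
# The PE configuration is read from stdin, one "name value" pair per line.
pe_attr() {
    awk -v key="$1" '$1 == key { print $2 }'
}

# Usage on a host with Grid Engine installed (left commented out because
# it needs a running cluster):
#   qconf -sp make | pe_attr accounting_summary
#   qconf -sp make | pe_attr control_slaves
```

The problem described here should only be expected when accounting_summary is reported as TRUE for the PE used to submit the job.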

To reproduce, write a job script (petest.sh in this example) that starts a worker task on each host granted to the job:

---
#!/bin/sh
#$ -S /bin/sh

. $SGE_ROOT/$SGE_CELL/common/settings.sh

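# Start one worker task on each host granted to the job, in the background.
qrsh -inherit host1 $SGE_ROOT/examples/jobs/worker.sh &
qrsh -inherit host2 $SGE_ROOT/examples/jobs/worker.sh &
qrsh -inherit hostN $SGE_ROOT/examples/jobs/worker.sh &

# Keep the master task alive while the workers run.
sleep 150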
---
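
The script above hard-codes the host names. As a sketch of one way to avoid that (an assumption, not part of the original report), the hosts can be read from the file named by $PE_HOSTFILE, which Grid Engine sets for parallel jobs and which lists the host name as the first field of each line. pe_hosts is a hypothetical helper name.

```sh
#!/bin/sh
# pe_hosts (hypothetical helper): print the host name (first field) of
# every non-empty line in a PE hostfile.
pe_hosts() {
    awk 'NF > 0 { print $1 }' "$1"
}

# Inside the job script the helper could replace the fixed qrsh lines
# (left commented out because it needs a running Grid Engine job):
#   for host in `pe_hosts "$PE_HOSTFILE"`; do
#       qrsh -inherit "$host" $SGE_ROOT/examples/jobs/worker.sh &
#   done
#   sleep 150
```

This starts one worker per hostfile line, regardless of how many slots were granted on that host.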

Submit that job with the make PE:

% qsub -pe make N petest.sh

After about 2 minutes, check the qstat -j output:

% qstat -j 129
==============================================================
job_number:                 129
exec_file:                  job_scripts/129
submission_time:            Fri Nov 13 06:58:43 2009
owner:                      dant
uid:                        40240
group:                      staff
gid:                        10
sge_o_home:                 /home/dant
sge_o_log_name:             dant
sge_o_path:                 /sge/bin/sol-amd64:/usr/dt/bin:/usr/openwin/bin:/usr/ccs/bin:/opt/SUNWspro/bin:/usr/bin:/usr/sbin:/opt/sfw/bin:/usr/sfw/bin:/usr/local/bin:/usr/java/bin:/usr/dist/exe:/sbin:/usr/ucb:/usr/dist/local/exe:/usr/lib/lp/postscript:/opt/SUNWut/sbin:/opt/SUNWut/bin:.
sge_o_shell:                /bin/tcsh
sge_o_tz:                   US/Pacific
sge_o_workdir:              /home/dant
sge_o_host:                 gridengine6
account:                    sge
mail_list:                  dant@gridengine6
notify:                     FALSE
job_name:                   petest.sh
jobshare:                   0
shell_list:                 NONE:/bin/sh
env_list:                   
script_file:                /tmp/petest.sh
parallel environment:  make range: 6
usage    1:                 cpu=00:11:57, mem=3.14164 GBs, io=0.00000, vmem=27.914M, maxvmem=40.078M
scheduling info:            (Collecting of scheduler job information is turned off)

Notice the reported CPU time from qstat.  After the job completes, run qacct -j:

% qacct -j 129
==============================================================
qname        all.q               
hostname     grid1               
group        staff               
owner        dant                
project      NONE                
department   defaultdepartment   
jobname      petest.sh             
jobnumber    129                 
taskid       undefined
account      sge                 
priority     0                   
qsub_time    Fri Nov 13 06:58:43 2009
start_time   Fri Nov 13 06:35:03 2009
end_time     Fri Nov 13 06:37:34 2009
granted_pe   make                
slots        6                   
failed       0    
exit_status  0                   
ru_wallclock 151          
ru_utime     380.760      
ru_stime     336.310      
ru_maxrss    0                   
ru_ixrss     0                   
ru_ismrss    0                   
ru_idrss     0                   
ru_isrss     0                   
ru_minflt    0                   
ru_majflt    0                   
ru_nswap     0                   
ru_inblock   0                   
ru_oublock   0                   
ru_msgsnd    0                   
ru_msgrcv    0                   
ru_nsignals  0                   
ru_nvcsw     0                   
ru_nivcsw    0                   
cpu          0.304        
mem          0.001             
io           0.000             
iow          0.000             
maxvmem      40.078M
arid         undefined

Notice that the cpu value reported by qacct (0.304) does not match the CPU time reported by qstat -j (00:11:57), nor does it match reality.

This PE accounting works correctly in 6.2u3.
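
To make the mismatch concrete, the figures from the two outputs above can be put on the same scale. This is a minimal sketch using only the numbers shown in this report; hms_to_seconds is a hypothetical helper and assumes the qstat value has the form hh:mm:ss.

```sh
#!/bin/sh
# hms_to_seconds (hypothetical helper): convert hh:mm:ss to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# cpu value from the "usage 1:" line of qstat -j 129
qstat_cpu=`hms_to_seconds 00:11:57`

# ru_utime + ru_stime from qacct -j 129
qacct_rusage=`awk 'BEGIN { printf "%.2f", 380.760 + 336.310 }'`

echo "qstat -j cpu:              $qstat_cpu s"
echo "qacct ru_utime + ru_stime: $qacct_rusage s"
echo "qacct cpu:                 0.304 s"
```

In this record the qstat value is 717 seconds and ru_utime plus ru_stime comes to 717.07 seconds, so those two fields agree with each other while the cpu field alone is far too small.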

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=36&dsMessageId=226704
