[GE users] CPU time = Wallclock time?

Orlando Richards orlando.richards at ed.ac.uk
Wed Oct 22 12:12:39 BST 2008


I found the following post on the gridengine.info blog:

"Feedback needed: Obsolete options and parameters considered for removal

Posted by chris on 24/06/2008

Grid Engine developers posted a list today of SGE configuration 
parameters and client arguments that are being considered for removal 
from the product because they are either obsolete or they duplicate 
settings found elsewhere.

<snip>

- qmaster_params merge ACCT_RESERVED_USAGE and SHARETREE_RESERVED_USAGE
    We can't imagine a use case to have these values separated"

Our reason for needing them separated is that our CERN grid users use 
the ratio CPU/Wallclock to determine the efficiency of their jobs on any 
given system. In general, this is a useful metric to have.
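
As a rough illustration of that metric, here is a minimal Python sketch. 
The function name cpu_efficiency and the choice to divide by 
wallclock * slots for parallel jobs are assumptions made for this example, 
not something SGE or the grid middleware defines.

```python
def cpu_efficiency(cpu_seconds, wallclock_seconds, slots=1):
    """Return CPU time as a fraction of the slot-time the job occupied.

    Hypothetical helper: the inputs correspond to the 'cpu',
    'ru_wallclock' and 'slots' fields shown by qacct -j.
    """
    available = wallclock_seconds * slots
    if available <= 0:
        raise ValueError("wallclock_seconds and slots must be positive")
    return cpu_seconds / available


# A job that only sleeps for 20 s uses no CPU, so it should score 0.0.
print(cpu_efficiency(0, 20))    # 0.0
# If cpu is accounted as equal to wallclock, the same job scores 1.0.
print(cpu_efficiency(20, 20))   # 1.0
```

The second call shows why the metric breaks down when the accounted CPU 
time is derived from wallclock: an idle job looks fully efficient.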

Indeed, the accounted CPU time should, in my mind, always represent the 
amount of CPU time consumed and not the wallclock value (or a derivation 
from it).

I understand that the ACCT_RESERVED_USAGE option is there to give a 
value for wallclock*slots in the case where you wish to account for time 
* slots used, but the CPU field seems the wrong place to put it. Also, it 
doesn't seem to do that for me - at least when using our OpenMP 
environment:

[root@eddie01 ~]# qconf -sp OpenMP
pe_name           OpenMP
slots             1400
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $pe_slots
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min

and with ACCT_RESERVED_USAGE=true and SHARETREE_RESERVED_USAGE=false 
(trimmed output):

[orichard@frontend02 scripts]$ qacct -j 1445999
==============================================================
jobnumber    1445999
qsub_time    Wed Oct 22 12:05:14 2008
start_time   Wed Oct 22 12:05:48 2008
end_time     Wed Oct 22 12:06:09 2008
granted_pe   OpenMP
slots        2
failed       0
exit_status  0
ru_wallclock 21
ru_utime     0
ru_stime     0
cpu          0
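
To make the mismatch concrete, the sketch below parses the trimmed 
output above and compares the reported cpu value with wallclock * slots. 
It assumes my reading of ACCT_RESERVED_USAGE is right (that cpu should 
then be wallclock * slots); parse_qacct is a hypothetical helper written 
for this example, not part of SGE.

```python
def parse_qacct(text):
    """Parse 'key value' lines of qacct -j output into a dict of strings."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip separator lines and any line without a value.
        if len(parts) == 2 and not parts[0].startswith("="):
            record[parts[0]] = parts[1].strip()
    return record


SAMPLE = """\
==============================================================
jobnumber    1445999
granted_pe   OpenMP
slots        2
ru_wallclock 21
ru_utime     0
ru_stime     0
cpu          0
"""

rec = parse_qacct(SAMPLE)
expected_reserved = int(rec["ru_wallclock"]) * int(rec["slots"])
reported_cpu = float(rec["cpu"])
print(expected_reserved, reported_cpu)  # 42 0.0
```

Under that assumption the job should have been charged 42 s, but the 
accounting record shows 0.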



As for SHARETREE_RESERVED_USAGE - we balance usage on our system based 
on the amount of time a job occupies a slot, regardless of whether it is 
using the CPU or not (we have one slot per CPU), and for that we require 
SHARETREE_RESERVED_USAGE=true.


--
Orlando.



Orlando Richards wrote:
> Hi folks,
> 
> We seem to have a problem with CPU time always being accounted as equal 
> to Wallclock time (or sometimes 1s higher) - even if the job is just a 
> "sleep 20s" job. The UTIME and STIME report correctly though.
> 
> We're running SGE 6.1u4.
> 
> We have
> execd_params                 SHARETREE_RESERVED_USAGE=TRUE \
>                              ACCT_RESERVED_USAGE=FALSE
> 
> so would expect the CPU time to be recorded as roughly UTIME + STIME - 
> but this is not the case.
> 
> I tried setting SHARETREE_RESERVED_USAGE to FALSE as well, to see if it 
> made any difference, and suddenly we get the expected behaviour (CPU 
> time = 0, wallclock = 20).
> 
> Does anyone know if this is expected behaviour? Is there anything we can 
> do to correct it?
> 
> 
> Sample qacct -j JOBID output for a 20s sleep job:
> 
> 
> ==============================================================
> qname        ecdf
> hostname     node005.beowulf.cluster
> group        is_iti_ug
> owner        orichard
> project      ecdf_baseline
> department   defaultdepartment
> jobname      simple.sh
> jobnumber    1445888
> taskid       undefined
> account      sge
> priority     5
> qsub_time    Wed Oct 22 11:51:42 2008
> start_time   Wed Oct 22 11:52:18 2008
> end_time     Wed Oct 22 11:52:38 2008
> granted_pe   NONE
> slots        1
> failed       0
> exit_status  0
> ru_wallclock 20
> ru_utime     0
> ru_stime     0
> ru_maxrss    0
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    1622
> ru_majflt    0
> ru_nswap     0
> ru_inblock   0
> ru_oublock   0
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     30
> ru_nivcsw    4
> cpu          20
> mem          40.020
> io           0.000
> iow          0.000
> maxvmem      103.973M
> 
> 
> 
> 
> 


-- 
             --
    Dr Orlando Richards
   Information Services
IT Infrastructure Division
        Unix Section
     Tel: 0131 650 4994

The University of Edinburgh is a charitable body, registered in 
Scotland, with registration number SC005336.
