[GE users] CPU time = Wallclock time?

Wolfgang Friebel Wolfgang.Friebel at desy.de
Wed Nov 5 14:57:04 GMT 2008


On Wed, 5 Nov 2008, orlandorichards wrote:

I have noticed that behaviour as well. However, I believe that this need 
not be a bug, but could be a "feature" ;-)

To do the cpu accounting properly, I have to sum up utime+stime. 
Unfortunately there really does seem to be a bug in 6.0u9, which we are 
currently using: these parameters are not set if the job was killed by SGE 
for exceeding some limit.
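
As a rough illustration of that sum, here is a minimal Python sketch. It 
assumes the plain "key value" layout of the qacct -j record quoted further 
down; the function names parse_qacct and measured_cpu are made up for this 
example and are not part of SGE.

```python
# Sketch: recompute CPU time from a "qacct -j" record as ru_utime + ru_stime.
# Assumes one "key value" pair per line, as in the sample record in this thread.

def parse_qacct(text):
    """Return a dict of the key/value lines of one qacct -j record."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Blank lines and the "=====" separator do not split into two fields.
        if len(parts) != 2:
            continue
        record[parts[0]] = parts[1].strip()
    return record


def measured_cpu(record):
    """CPU seconds actually consumed: user time plus system time."""
    return float(record["ru_utime"]) + float(record["ru_stime"])


# Four fields copied from the 20s sleep job quoted in this thread.
SAMPLE = """\
ru_wallclock 20
ru_utime     0
ru_stime     0
cpu          20
"""

rec = parse_qacct(SAMPLE)
print("measured:", measured_cpu(rec), "reported cpu:", float(rec["cpu"]))
```

For the sleep job the measured value is zero while the reported cpu field 
matches ru_wallclock, which is the discrepancy discussed here.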

The recorded cpu value however seems to be the parameter that is used for 
job scheduling. You can give relative weights with the parameter 
usage_weight_list: cpu=wcpu,mem=wmem,io=wio (see man sched_conf) when 
using the sharetree algorithm.
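
To make the weighting concrete, here is a small Python sketch of how a 
single usage value could be formed from the three weights, following the 
description in sched_conf (weighted values added up). The helper names and 
the weights in the example are illustrative assumptions, not taken from any 
real configuration.

```python
# Sketch: combine cpu, mem and io usage into one value with the weights
# from a usage_weight_list string such as "cpu=wcpu,mem=wmem,io=wio".

def parse_usage_weight_list(spec):
    """Turn 'cpu=1.0,mem=0.0,io=0.0' into {'cpu': 1.0, 'mem': 0.0, 'io': 0.0}."""
    weights = {}
    for item in spec.split(","):
        name, value = item.split("=")
        weights[name.strip()] = float(value)
    return weights


def combined_usage(cpu, mem, io, weights):
    """Weighted sum of the three usage values."""
    return weights["cpu"] * cpu + weights["mem"] * mem + weights["io"] * io


# Hypothetical weights; cpu, mem and io are taken from the sample job below.
weights = parse_usage_weight_list("cpu=1.0,mem=0.0,io=0.0")
print(combined_usage(cpu=20, mem=40.020, io=0.0, weights=weights))
```

With all the weight on cpu, whatever ends up in the cpu field (measured or 
wallclock-based) is what the sharetree sees.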

As SGE cannot know the cpu/wallclock ratio of a given job, it has to 
calculate the tickets (before job start) based on the time requested (cpu 
or wallclock; it should not make a difference, as the ratio is unknown).

That makes me believe that the parameter cpu is correctly reporting the 
used wallclock time when SHARETREE_RESERVED_USAGE=TRUE is set, as 
effectively the wallclock time was used in the share calculation.
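
The two readings of the cpu field can be put side by side in a short Python 
sketch. This only models the behaviour described in this thread; the 
function name is made up, and the multiplication by slots in the reserved 
case is my assumption (the sample job uses a single slot).

```python
# Sketch: which value the cpu field would hold under each interpretation.

def expected_cpu(wallclock, utime, stime, slots=1, reserved=False):
    """Return the cpu value expected for a job.

    reserved=True  -> reserved usage: the job "blocks" its slots for the
                      whole wallclock time (assumed to be wallclock * slots).
    reserved=False -> measured usage: user time plus system time.
    """
    if reserved:
        return wallclock * slots
    return utime + stime


# Values of the 20s sleep job quoted in this thread.
print(expected_cpu(20, 0, 0, reserved=True))   # reserved usage
print(expected_cpu(20, 0, 0, reserved=False))  # measured usage
```

The reserved case matches the cpu value seen with 
SHARETREE_RESERVED_USAGE=TRUE, the measured case matches what was reported 
after switching it to FALSE.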

This cpu parameter is even multiplied by a load scaling factor.
This could be another bug in the code: we observed that the factor is 
not related to the host where the job was running, but is just one of the
configured load scaling factors (the first one?).

Whether the observed behaviour is really a bug or not can probably only be 
answered by the author of the code. The documentation at least leaves 
room for interpretation:
    cpu    The cpu time usage in seconds.
Could that be read as "The cpu" "time usage", i.e. how long a cpu was 
blocked rather than how long the cpu was busy?

-- 
Wolfgang Friebel                   Deutsches Elektronen-Synchrotron DESY
Phone/Fax:  +49 33762 77372/216    Platanenallee 6
Mail: Wolfgang.Friebel AT desy.de  D-15738 Zeuthen  Germany

> Reuti wrote:
>> Hi,
>>
>> Am 22.10.2008 um 12:53 schrieb Orlando Richards:
>>
>>> We seem to have a problem with CPU time always being accounted as
>>> equal to Wallclock time (or sometimes 1s higher) - even if the job is
>>> just a "sleep 20s" job. The UTIME and STIME report correctly though.
>>>
>>> We're running SGE 6.1u4.
>>>
>>> We have
>>> execd_params                 SHARETREE_RESERVED_USAGE=TRUE \
>>>                              ACCT_RESERVED_USAGE=FALSE
>>>
>>> so would expect the CPU time to be recorded as roughly UTIME + STIME -
>>> but this is not the case.
>>>
>>> I tried setting SHARETREE_RESERVED_USAGE to FALSE as well, to see if
>>> it made any difference, and suddenly we get the expected behaviour
>>> (CPU time = 0, wallclock = 20).
>>>
>>> Does anyone know if this is expected behaviour?
>>
>> something is really broken (I checked in 6.2). They seem to operate in
>> such a way that SHARETREE_RESERVED_USAGE refers to the accounting file.
>> Whether ACCT_RESERVED_USAGE operates the same way for the sharetree I
>> didn't check.
>>
>> Changing SHARETREE_RESERVED_USAGE between TRUE and FALSE consistently
>> changes the behaviour of the accounting record. This even works for
>> parallel jobs then as expected.
>>
>>> Is there anything we can do to correct it?
>>
>> Fixing the source ;-) So an issue should be filed for it.
>>
>> -- Reuti
>>
>>
>>> Sample qacct -j JOBID output for a 20s sleep job:
>>>
>>>
>>> ==============================================================
>>> qname        ecdf
>>> hostname     node005.beowulf.cluster
>>> group        is_iti_ug
>>> owner        orichard
>>> project      ecdf_baseline
>>> department   defaultdepartment
>>> jobname      simple.sh
>>> jobnumber    1445888
>>> taskid       undefined
>>> account      sge
>>> priority     5
>>> qsub_time    Wed Oct 22 11:51:42 2008
>>> start_time   Wed Oct 22 11:52:18 2008
>>> end_time     Wed Oct 22 11:52:38 2008
>>> granted_pe   NONE
>>> slots        1
>>> failed       0
>>> exit_status  0
>>> ru_wallclock 20
>>> ru_utime     0
>>> ru_stime     0
>>> ru_maxrss    0
>>> ru_ixrss     0
>>> ru_ismrss    0
>>> ru_idrss     0
>>> ru_isrss     0
>>> ru_minflt    1622
>>> ru_majflt    0
>>> ru_nswap     0
>>> ru_inblock   0
>>> ru_oublock   0
>>> ru_msgsnd    0
>>> ru_msgrcv    0
>>> ru_nsignals  0
>>> ru_nvcsw     30
>>> ru_nivcsw    4
>>> cpu          20
>>> mem          40.020
>>> io           0.000
>>> iow          0.000
>>> maxvmem      103.973M
>>>
>>> --
>>>             --
>>>    Dr Orlando Richards
>>>   Information Services
>>> IT Infrastructure Division
>>>        Unix Section
>>>     Tel: 0131 650 4994
>>>
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=88106