[GE users] Theoretical question about wallclock and qacct

mhanby mhanby at uab.edu
Tue Mar 3 20:22:10 GMT 2009


This test was spawned by one of our grant writers asking "Is there a way
for me to query grid engine to figure out how utilized the cluster is
from day to day, week to week or month to month?"

Based on what's in the accounting file, I don't see how that's possible.
I was thinking along the lines of working out the theoretical maximum
compute time from the number of slots, and comparing that with actual
usage to determine what percentage of the maximum was used.
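
As a rough sketch of that idea (not something qacct does for you): treat
the capacity of the cluster as total slots times the length of the period,
and the usage as elapsed time times slots for each job. The function name,
the (elapsed_seconds, slots) input format and the 32-slot cluster size are
assumptions made for illustration; pulling those values out of the
accounting file is left aside here.

```python
def utilization(jobs, total_slots, period_seconds):
    """Return the fraction of available slot-seconds used in a period.

    jobs is an iterable of (elapsed_seconds, slots) pairs, one per job.
    This input format is hypothetical, not the accounting file layout.
    """
    capacity = total_slots * period_seconds
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    used = sum(elapsed * slots for elapsed, slots in jobs)
    return used / capacity


# Assumed example: a 32-slot cluster observed for one day, during which
# a single 32-slot job ran for one hour.
one_day = 24 * 3600
share = utilization([(3600, 32)], total_slots=32, period_seconds=one_day)
print(f"{share:.1%}")
```

Note that this counts each job once, using its elapsed time, so it only
works if the per-host records of a parallel job are not added on top.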

By the way, my accounting file has 5 entries: 2 for the master host and
3 for the remaining compute nodes that were used. I must have miscounted
before; the job actually ran on 4 hosts, not 5, which makes sense since
each host has 8 cores / slots.
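
To make the arithmetic concrete, here is a small sketch comparing the three
ways the wallclock of this job could be counted. It assumes that each of
the 5 accounting entries carries the same elapsed time as the job itself
(1:00:04), which is consistent with the 18020 seconds qacct reports but is
not something I have verified record by record.

```python
def hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


elapsed = 1 * 3600 + 4   # 1:00:04, as reported in the completion email
records = 5              # entries in the accounting file for this job
slots = 32               # slots requested by the job

summed = elapsed * records        # one wallclock per accounting entry
slot_weighted = elapsed * slots   # one wallclock per slot

print("elapsed:      ", elapsed, hms(elapsed))
print("per record:   ", summed, hms(summed))
print("per slot:     ", slot_weighted, hms(slot_weighted))
```

The per-record sum matches what qacct shows, and the per-slot figure is the
one needed for a utilization estimate.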

-----Original Message-----
From: reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: Tuesday, March 03, 2009 12:49 PM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] Theoretical question about wallclock and qacct

Am 03.03.2009 um 16:14 schrieb mhanby:

> I have a theoretical question regarding the number presented by qacct
> for WALLCLOCK. I'm running GE 6.1u5 with OpenMPI 1.2.8 compiled on the
> head node (so OpenMPI should be GE aware).
>
> As a test, I ran a 32 slot OpenMPI job that had a total runtime of 60
> minutes. The WALLCLOCK reported in the email delivered after job
> completion was 1:00:04 hours.
>
> The qacct command for that same job reports 18020, which translates to
> ~5:00:20 hours.

Is there only one record in the accounting file for this job? With a  
Tight Integration you should get 6 - one for the jobscript and one  
for each qrsh made.

> The job ran on 5 hosts, so it appears that the WALLCLOCK is only
> recording the seconds on each host and not each CPU / slot?
>
> Is this the way it's supposed to work, or is this a tight vs loose
> integration thing?
>
> I would have expected the WALLCLOCK for the 32 slot job to be
> ~32:00:00 hours

In 6.2 there is an open issue about this for the new ability to
summarize the accounting records automatically:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2787

Although it's still open to discussion whether the walltime should be
summed up at all. You could even argue that it should be just the elapsed
time, i.e. 1 hr in your case. So upgrading wouldn't help in your case anyway.

-- Reuti


> Thanks,
>
> Mike
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=119640
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>



