[GE users] Accounting

Mark Westwood mark.westwood at ohmsurveys.com
Tue Apr 19 10:12:40 BST 2005

Hi Shaila

We're currently on GE v5.3, planning to move to v6.x real soon, so my 
experience may be only indirectly useful.

We run a 70 processor Opteron cluster, the o/s is Linux. When I took 
over as cluster manager I (after giving myself the grand title 'cluster 
manager') took on the task of producing useful (to 'management') reports 
from the GE accounting file.  I did what I guess you are doing - load 
the accounting file into a home-brewed MySQL database for further 
processing and report extraction.

Currently the process is a bit less automated than I would like, but 
since we're upgrading soon I don't propose to change until the new 
version is installed.  I report monthly, so the couple of hours effort I 
have to put in is not too onerous and it helps me to keep an eye on the 
data.  Here's what I do:

- Load the accounting file into a Tasks table; this table has one row 
for each completed task.  During this operation I massage the data a 
bit, doing things like translating Unix timestamps into MySQL date-time 
values.

- From the Tasks table create a Jobs table which has one row for each 
job - so for parallel jobs the Jobs table holds one row which aggregates 
the data for the individual tasks.  We have a policy in operation that 
all jobs submitted to the cluster must have a project number attached 
(ie grid engine rejects jobs without project numbers), so from the Jobs 
table I can produce a number of useful summary reports - use by user and 
by project, for whatever time period I'm interested in.

- Like you I'm interested in time series of cluster usage.  Each month I 
produce a time series of hour-by-hour CPU usage, that is a series of 
numbers which shows, for each hour in a month, how many (out of 70) 
processors were in use. First thing is to create (or delete the old 
version of and re-create) a temporary table, called Hours, which will 
eventually have one row for each hour in the month. In principle my 
script will produce time series of usage for whatever period and 
whatever interval I want, eg minute-by-minute for the last 3 weeks.
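
For illustration, the first two steps above (loading Tasks rows and 
collapsing them into Jobs rows) might look roughly like the Python 
sketch below.  This is not the script described in this message.  The 
field order in FIELDS is an assumption about the leading colon-separated 
columns of the accounting file and should be checked against 
accounting(5) for your Grid Engine version; project and slot counts are 
left out here and their positions should be taken from the same man 
page.  The names parse_task and tasks_to_jobs are hypothetical, and 
times are rendered in UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Assumed positions of the leading colon-separated fields in an accounting
# record; verify against accounting(5) for your Grid Engine version.
FIELDS = ["qname", "hostname", "group", "owner", "job_name", "job_number",
          "account", "priority", "submission_time", "start_time", "end_time"]
TIME_FIELDS = ("submission_time", "start_time", "end_time")


def to_mysql_datetime(unix_seconds):
    """Format a Unix timestamp as 'YYYY-MM-DD HH:MM:SS' (UTC)."""
    moment = datetime.fromtimestamp(int(unix_seconds), tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def parse_task(line):
    """Turn one accounting line into a dict, i.e. one Tasks row."""
    values = line.rstrip("\n").split(":")
    task = dict(zip(FIELDS, values))  # fields beyond FIELDS are ignored
    for name in TIME_FIELDS:
        task[name] = to_mysql_datetime(task[name])
    return task


def tasks_to_jobs(tasks):
    """Collapse Tasks rows into one Jobs row per job number."""
    grouped = defaultdict(list)
    for task in tasks:
        grouped[task["job_number"]].append(task)
    jobs = []
    for job_number, rows in grouped.items():
        jobs.append({
            "job_number": job_number,
            "owner": rows[0]["owner"],
            "n_tasks": len(rows),
            # 'YYYY-MM-DD HH:MM:SS' strings sort chronologically,
            # so min/max give the earliest start and latest end.
            "start_time": min(r["start_time"] for r in rows),
            "end_time": max(r["end_time"] for r in rows),
        })
    return jobs
```

The resulting dicts could then be inserted into the Tasks and Jobs 
tables with whatever MySQL client you prefer.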

Now things are going to get very kludgy, so hang on tight:

-- I have a shell script which executes a loop once for each hour in the 
month;

-- for each hour, the script executes an SQL command to select all Jobs 
which ran for at least part of the hour;

-- for each Job selected, calculate how many CPU-seconds (product of 
number of CPUs and number of seconds) the Job consumed in the hour. 
This requires checking whether the job execution time overlapped the 
start and end of the hour in question to calculate the time the job ran 
during that hour.

-- for each hour, sum the CPU-seconds consumed by all jobs which ran for 
some time in that hour. Divide this sum by the number of seconds in the 
hour (ie 3600) and take the nearest integer to that result; this is the 
same as taking the fraction of the available CPU-seconds (70 * 3600) 
that was used and multiplying by 70.  This number, the number of CPUs 
used in the hour in question, is an integer between 0 and 70.  It is 
written into the temporary Hours table.

-- and so on for each hour in the month.
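
The loop above can be sketched in a few lines of Python rather than 
shell plus SQL.  This is only an illustration of the overlap 
arithmetic, not the script itself.  It assumes each job is a (start, 
end, slots) tuple with times in Unix seconds, and the function name 
cpus_in_use is hypothetical.  The CPU-seconds for each interval are 
divided by the interval length, so the result is a CPU count (0 to 70 
on a 70 processor cluster).

```python
def cpus_in_use(jobs, period_start, n_intervals, interval=3600):
    """Return one rounded CPU count per interval.

    jobs: iterable of (start, end, slots), times in Unix seconds.
    period_start: Unix time at which the first interval begins.
    """
    series = []
    for i in range(n_intervals):
        lo = period_start + i * interval
        hi = lo + interval
        cpu_seconds = 0
        for start, end, slots in jobs:
            # Seconds the job ran inside [lo, hi); zero or negative
            # means the job did not touch this interval at all.
            overlap = min(end, hi) - max(start, lo)
            if overlap > 0:
                cpu_seconds += overlap * slots
        # Nearest integer, rounding halves up.
        series.append(int(cpu_seconds / interval + 0.5))
    return series
```

For example, a 4-CPU job running from 0 to 5400 seconds and a 2-CPU job 
running from 1800 to 3600 seconds give [5, 2, 0] for the first three 
hours.  Passing interval=60 gives the minute-by-minute series instead.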

It ain't pretty but it works, and only takes a few minutes on my PC.

The result is the time series I wanted, which I can then import into, 
say, Excel (although I also use Matlab here) for graphing.  I then have 
a nice graph of cluster usage for the month which I can include in my 
reports. I also use Excel / Matlab for graphing the other data I extract 
from the database.
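
Getting the series into Excel or Matlab only needs a CSV file.  A 
minimal Python sketch, assuming the Hours table has been read into a 
list of (hour, cpus) pairs; the column names and the function name 
write_series are hypothetical:

```python
import csv


def write_series(rows, fileobj):
    """Write (hour, cpus) pairs as CSV with a header row."""
    writer = csv.writer(fileobj)
    writer.writerow(["hour", "cpus_in_use"])
    writer.writerows(rows)
```

When writing to a real file, open it with newline="" as the csv module 
documentation recommends.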

Management then have the ammunition to wonder out loud why they bought 
so many processors to be under-used, but that's another story.

If you want a copy of the shell script, email me directly.


Shaila Parashar wrote:
> Hi
> We have a cluster consisting of 4 nodes  running SGE 6.0u3 .  The number 
> of CPUs on these nodes are 24, 12, 8 and 4 . We have been running this 
> cluster since Aug 2004 and are now in need of some statistics. I read 
> the mailing lists and did get some ideas about the statistics. I also 
> imported the accounting file into MySQL . I did manage to get statistics 
> on the jobs - as average CPU time used, average waiting time, etc. But 
> we need statistics similar to the following :-
> Number of CPUs used versus time. Also number of CPUs used on each of 
> the hosts vs time.
> Basically we need plots vs time.
> I wanted to know if this is possible from the SGE accounting files and 
> if so how ?
> I would appreciate it if you can suggest any other values that we 
> can plot against time.
> Any ideas/suggestions on how to get these values ( if possible ) will be 
> really appreciated.
> Thanks
> Shaila

Mark Westwood
Parallel Programmer
The Technology Centre
Offshore Technology Park
Claymore Drive
AB23 8GD
United Kingdom

+44 (0)870 429 6586
