[GE users] qacct ambiguities

Reuti reuti at staff.uni-marburg.de
Wed Nov 28 21:48:49 GMT 2007


Am 28.11.2007 um 20:27 schrieb Ross Dickson:

> I have two questions about qacct output.
>
> (1) Why do I sometimes see multiple records for a single job when  
> using "qacct -j <jobid>"?   And why do many of these have a bogus  
> qsub_time (Wed Dec 31 20:00:00 1969)?

Each qrsh call in a tightly integrated parallel job creates a record.
If there are many qrsh calls (like in parallel Gaussian with Linda),
you can get several hundred entries there. In some older versions of
SGE only the main task of a parallel job gets a correct qsub_time;
the slave records show the Unix epoch instead, which is where your
1969 timestamp comes from.
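
For example (12345 is just a placeholder job id) you can count how
many records a job produced, since each record in the "qacct -j"
output normally carries its own jobnumber line:

   $ qacct -j 12345 | grep -c "^jobnumber"   # 12345 = placeholder job id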

There are issues:

http://gridengine.sunsource.net/issues/show_bug.cgi?id=2092

and

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1689

They should be fixed in 6.0u10 - or was it only in 6.1u3?

BTW: can someone please mark one of these as a duplicate of the other.


> (2)  How should I interpret the WALLCLOCK and CPU times  returned  
> by qacct?  Consider:
>
> > qacct -d 30 -pe
> PE       WALLCLOCK       UTIME     STIME        CPU         MEMORY
> ==================================================================
> NONE        896093      125519        40     131683      70485.870
> cre        6362816     5785288   1790762   14617130    9039932.662
> mpich     25070095    21679280     11122   21690514    4943072.822
> openmp     2586960     5213138      4152    9114528   15434070.616
>
> Looking at the CRE parallel environment I see the ratio of CPU to  
> wall clock time is about 2.3, which suggests to me that the  
> wallclock time is just end time minus start time, with no slot  
> count factored in.

Correct.
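
(From your table: 14617130 / 6362816 ~ 2.3 for CRE, yes.)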

>   The OpenMP figures show about 3.5 times as much CPU as wall time,
> leading to the same conclusion.
> However, for MPICH I see *more* wall time than CPU, which suggests  
> either
>  (a) we have a lot of MPICH jobs sitting around idling, or

All slave tasks have their own entry in the accounting file, so the
WALLCLOCK column is accumulated over all of those entries. And yes:
not all parallel programs are really 100% parallel all the time, but
sometimes only during certain phases (or e.g. while waiting for other
parallel tasks to deliver their results). We handle this on one of
our clusters by allowing a small oversubscription of the nodes to
soak up the otherwise idle time.
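
(From your table MPICH indeed shows 21690514 / 25070095 ~ 0.87, i.e.
less CPU than accumulated walltime.) As a rough check - assuming the
usual "qacct -j" field names and with 12345 again a placeholder job
id - you can sum the per-entry walltime and CPU of one MPICH job:

   # 12345 is a placeholder; field names assume the usual qacct -j output
   $ qacct -j 12345 | awk '/^ru_wallclock/{w+=$2} /^cpu /{c+=$2} END{print w, c}'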

>  (b) the wall time reported for this parallel environment is  
> multiplied by the slot count (or summed over slots), contradicting  
> the conclusions above, or

No, "or summed over slots = entries" is correct. Why is this a  
contradiction? Or do you mean your statement about OpenMP?

>  (c) the MPICH CPU total does *not* include all the slots.

If it's tightly integrated, it will include all slots, i.e. all
entries in the accounting file.

OTOH: for an OpenMP job there is only one entry, as there are no
qrsh calls, just threads inside the program.
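
A minimal illustration (the PE names are from your table, the script
names are just placeholders):

   $ qsub -pe mpich 4 my_mpi_job.sh    # slaves started via qrsh:
                                       # one accounting record per task
   $ qsub -pe openmp 4 my_omp_job.sh   # threads in one process:
                                       # a single accounting record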

-- Reuti


> If (c) is the case, then we're making a mistake using CPU time in  
> our usage accounting aren't we?  Are we seriously undercounting the  
> MPI CPU usage?
>
> This example is from SGE 6.0u7 running on Solaris, although I've  
> seen similar mysteries on our Linux clusters as well, and on a  
> machine which was recently upgraded from 6.0u7 to 6.1u2.
>
>
> -- 
> Ross Dickson         HPC Consultant
> ACEnet               http://www.ace-net.ca
> +1 902 494 6710      Skype: ross.m.dickson
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



