[GE users] CPU time

Andy Schwierskott andy.schwierskott at sun.com
Tue Nov 16 08:42:36 GMT 2004


Hi,

  - Does the configured "gid_range" comprise otherwise unused group IDs? If
    the gid range uses group IDs that belong to users (or system users), SGE
    will measure their usage as well as the job's usage.

  - Was this any type of parallel or multithreaded application?
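
For the first point, a rough way to check for overlaps is sketched below. It
is only an illustration: the range string is a placeholder for whatever
`qconf -sconf` reports as gid_range on your cluster, the helper names are made
up for this example, and it only sees the groups and users that the local
host can enumerate, so it would have to be run on each execution host.

```python
import grp
import pwd


def parse_gid_ranges(spec):
    """Parse a value such as "20000-20100" (optionally comma separated)."""
    ranges = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        ranges.append((int(low), int(high) if high else int(low)))
    return ranges


def gids_in_use(ranges, group_gids, primary_gids):
    """Return the sorted GIDs inside any range that are already taken."""
    used = set(group_gids) | set(primary_gids)
    return sorted(
        gid for gid in used
        if any(low <= gid <= high for low, high in ranges)
    )


def check_local_host(spec):
    """Compare the given gid_range against local groups and primary GIDs."""
    return gids_in_use(
        parse_gid_ranges(spec),
        (g.gr_gid for g in grp.getgrall()),
        (u.pw_gid for u in pwd.getpwall()),
    )


if __name__ == "__main__":
    # Placeholder value: substitute the gid_range of your own configuration.
    conflicts = check_local_host("20000-20100")
    print("GIDs already in use:", conflicts or "none")
```

An empty result suggests the range is free on that host; any GID listed
there would have its processes counted against jobs.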

If you are sure that neither a gid_range overlap nor a parallel or
multithreaded application is involved, it would be interesting if you could
monitor your jobs while they are running with

    qstat -j <jobid>

The measured CPU time will be updated approximately every load report
interval - it would be interesting to know whether there's a sudden
significant increase of the CPU time which otherwise cannot be explained.
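
A minimal sketch of such monitoring follows. It assumes the `qstat -j` output
contains a usage line with a `cpu=[[days:]hours:]minutes:seconds` field; that
format is an assumption and may differ between versions. The function names
and the 10% slack factor are invented for this example.

```python
import re
import subprocess
import time

_CPU_RE = re.compile(r"cpu=([0-9:]+)")


def cpu_seconds(qstat_output):
    """Extract the cumulative CPU time in seconds, or None if not reported."""
    match = _CPU_RE.search(qstat_output)
    if match is None:
        return None
    # Rightmost field is seconds, then minutes, hours and (optionally) days.
    weights = (1, 60, 3600, 86400)
    parts = [int(p) for p in reversed(match.group(1).split(":"))]
    return sum(w * p for w, p in zip(weights, parts))


def find_jumps(samples, slots=1, slack=1.1):
    """Flag intervals where CPU time grew faster than wallclock time allows.

    samples is a list of (wallclock_seconds, cpu_seconds) pairs.
    """
    jumps = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if c1 - c0 > slots * (t1 - t0) * slack:
            jumps.append((t0, t1, c1 - c0))
    return jumps


def sample(job_id):
    """Take one (timestamp, cpu_seconds) sample for a running job."""
    out = subprocess.run(
        ["qstat", "-j", str(job_id)], capture_output=True, text=True
    ).stdout
    return time.time(), cpu_seconds(out)
```

Calling `sample()` once per load report interval, dropping samples whose CPU
value is None, and passing the rest to `find_jumps()` would show when the
reported CPU time grows faster than a single CPU could account for.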

From the wallclock time of the job one would assume not more than 3:52 CPU
hours for a non-parallel, non-multithreaded job (always assuming only one
CPU-bound process is running in the job script).
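
The arithmetic behind that bound, using only the figures from the two job
reports quoted below, can be sketched as follows. The variable and helper
names are just illustrative.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%Y %H:%M:%S"


def hms(text):
    """Convert an H:M:S string from the job report into a timedelta."""
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Job 171: wallclock time is an upper bound for a single CPU-bound process.
start = datetime.strptime("11/15/2004 13:35:07", FMT)
end = datetime.strptime("11/15/2004 17:27:17", FMT)
wallclock_171 = end - start                 # 3:52:10
cpu_171 = hms("10:01:48")
ratio_171 = cpu_171 / wallclock_171         # about 2.59 CPUs' worth of time

# Job 139: reported CPU versus user time plus system time.
accounted_139 = hms("03:48:21") + hms("00:02:30")   # 3:50:51
unexplained_139 = hms("08:19:29") - accounted_139   # 4:28:38

print("job 171 wallclock:", wallclock_171)
print("job 171 CPU/wallclock:", round(ratio_171, 2))
print("job 139 unexplained CPU:", unexplained_139)
```

In both cases the reported CPU time is well over what one process could have
consumed in the elapsed time.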

Andy

> Dear All,
>
> We are running SGE 6.01 on a Linux cluster with 8 nodes. We set the
> maximum CPU time to 10 hours. We noticed that several jobs were halted on
> the master host because their CPU time exceeded 10 hours, even though the
> wallclock time was less than 10 hours. An example message (job 171) is
> below. We also checked completed jobs on this host (the master host) and
> found that the reported CPU time is greater than the sum of user time and
> system time. An example message (job 139) is below. Could you please tell
> us how SGE or Linux calculates CPU time, and what is wrong with my master
> host?
>
> Best wishes,
> Jason
> ------------------------------------------------------------------------
>
> Job 171 (Jason-run) Aborted
> Exit Status = 137
> Signal = KILL
> User = Jason
> Queue = all.q at XX.XX.XX
> Host = XXX.XX.XX
> Start Time = 11/15/2004 13:35:07
> End Time = 11/15/2004 17:27:17
> CPU = 10:01:48
> Max vmem = 997.484M
> failed assumedly after job because:
> job 171.1 died through signal KILL (9)
>
>
> ------------------------------------------------------------------------
> Job 139 (Jason-run) Complete
> User = Jason
> Queue = all.q at XXX.XX.XX
> Host = XXX.XX.XX
> Start Time = 11/14/2004 16:59:52
> End Time = 11/14/2004 20:53:42
> User Time = 03:48:21
> System Time = 00:02:30
> Wallclock Time = 03:53:50
> CPU = 08:19:29
> Max vmem = 997.500M
> Exit Status = 0
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>



--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Andy Schwierskott           Tel: +49 (0)941 3075-200 (x60200)
N1 Grid Engine Engineering  Fax: +49 (0)941 3075-222 (x60222)
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7       mailto:andy.schwierskott at sun.com
D-93049 Regensburg          http://www.sun.com/gridware
