[GE users] Jobs killed because of bogus h_cpu values

Andy Schwierskott andy.schwierskott at sun.com
Fri Jan 13 10:37:15 GMT 2006


Göran,

could it be that an old process using the same additional group ID was still
running? One indication would be that jobs show a large online CPU usage
very soon after they start, because the CPU usage of every running process
that carries that group ID is added to the current job's usage.
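
To illustrate the accounting effect described above, here is a minimal
sketch in Python. It is not Grid Engine code: the Proc class, the function
names, the pids and the group ID 20005 are all hypothetical, and the split
of 7766 s + 1 s is only one assumed way to arrive at the 7767 s reported in
the log. It only shows how summing CPU time over every process that carries
a given additional group ID can push a one-second job over a 660 s limit.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical model of a process tagged with an additional group ID."""
    pid: int
    addgrp_id: int
    cpu_seconds: float


def job_cpu_usage(procs, addgrp_id):
    """Sum CPU time over all processes that carry the job's additional group ID."""
    return sum(p.cpu_seconds for p in procs if p.addgrp_id == addgrp_id)


def exceeds_h_cpu(procs, addgrp_id, h_cpu_limit):
    """True if the summed usage is above the hard CPU limit."""
    return job_cpu_usage(procs, addgrp_id) > h_cpu_limit


H_CPU_LIMIT = 660  # limit quoted in the original report

# Assumed scenario: a leftover process from an earlier job still holds the
# group ID that has now been reused for a new, short job.
stale = Proc(pid=1001, addgrp_id=20005, cpu_seconds=7766.0)
fresh = Proc(pid=2002, addgrp_id=20005, cpu_seconds=1.0)

if __name__ == "__main__":
    print(job_cpu_usage([stale, fresh], 20005))
    print(exceeds_h_cpu([stale, fresh], 20005, H_CPU_LIMIT))
    print(exceeds_h_cpu([fresh], 20005, H_CPU_LIMIT))
```

With the stale process present the new job appears to be over the limit at
once; without it the same job stays well below the limit.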

With 6.0u7 you can set "ENABLE_ADDGRP_KILL=true" in the "execd_params" of
the cluster configuration to ensure that Grid Engine kills all processes
which still have the additional group ID after the job ends.
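
As a sketch of what this could look like, assuming the parameter is set in
the global cluster configuration (the list is named execd_params in
sge_conf(5)) and that no other execd_params are already set: the
configuration can be opened for editing with "qconf -mconf" and checked
afterwards with "qconf -sconf". The relevant line would then read roughly:

```
execd_params                 ENABLE_ADDGRP_KILL=true
```

If other parameters are already listed on that line, the new one would be
appended to them rather than replacing them.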

Andy

> Recently SGE has started to kill jobs incorrectly claiming they have
> exceeded their h_cpu limit.
>
> As an example, job 2280213 was submitted earlier today.  It ran for
> one second, between 13:17:07 and 13:17:08 (see the attached qacct
> output).  The log file of the execution machine, tiptonville, says
> that the job was killed because it exceeded the h_cpu limit, having
> used 7767 seconds while the limit is 660.  The limit is correct, but
> the usage is obviously wrong.  I attach tiptonville's messages file
> and the configuration of the short queue.
>
> If you look further in the log, there are several jobs that have used
> almost, but not exactly, the same amount of time.  There are even more
> in the messages files from previous days.  Checking a few samples of
> them, they have also executed for just a second or so.  Essentially,
> they are killed immediately.
>
> As I mentioned, this started recently.  More exactly, it seems to have
> started after we upgraded to U7 on 21 December.  While we are not
> sure this is related to the upgrade, we strongly suspect it.
>
> Has anybody seen anything like this?  Does anybody have a clue what
> the reason for this could be?