[GE users] Jobs killed because of bogus h_cpu values
andy.schwierskott at sun.com
Fri Jan 13 10:37:15 GMT 2006
Could it be that an old process which used the same additional group ID was
still running? An indication would be that jobs which have only just started
immediately show a large online CPU usage, because the CPU usage of any
process still running with that group ID is added to the current job's usage.
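
One way to check for such a leftover process on a Linux execution host is to
look for processes that still carry a given additional group ID. The following
is only a rough sketch: it assumes a Linux /proc filesystem, the function names
are made up for illustration, and 20003 is a hypothetical ID (the real IDs come
from the gid_range configured for the cluster).

```python
import os


def supplementary_gids(status_text):
    """Return the supplementary group IDs listed in /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            # The line looks like "Groups:\t100 20003 "; skip the label.
            return {int(tok) for tok in line.split()[1:]}
    return set()


def pids_with_gid(gid, proc_root="/proc"):
    """List PIDs whose supplementary groups include gid (Linux only)."""
    if not os.path.isdir(proc_root):
        return []
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                if gid in supplementary_gids(fh.read()):
                    matches.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return sorted(matches)


if __name__ == "__main__":
    # 20003 is a hypothetical additional group ID, not taken from the report.
    print(pids_with_gid(20003))
```

If this lists PIDs for an ID while no job using that ID should be running any
more, those processes are candidates for the stale usage described above.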
With 6.0u7 you can set "ENABLE_ADDGRP_KILL=true" in "execd_params" to ensure
that Grid Engine kills all processes which have the additional group ID after
> Recently SGE has started to kill jobs incorrectly claiming they have
> exceeded their h_cpu limit.
> As an example, job 2280213 was submitted earlier today. It ran for
> one second, between 13:17:07 and 13:17:08 (see the attached
> qacct output). In the log file of the execution machine, tiptonville,
> it says that the job was killed because it exceeded the h_cpu limit,
> having used 7767 seconds while the limit is 660. The limit is
> correct, but the usage is obviously wrong. I attach tiptonville's
> messages file and the configuration of the short queue.
> If you look further in the log, there are several jobs that have used
> almost, but not exactly, the same amount of time. There are even more
> in the messages files from previous days. Checking a few samples of
> them, they have also executed for just a second or so. Essentially,
> they are killed immediately.
> As I mentioned, this started recently. More precisely, it seems to have
> started after we upgraded to U7 on 21 December. While we are not
> sure this is related to the upgrade, we strongly suspect it is.
> Has anybody seen anything like this? Does anybody have a clue what
> the reason for this could be?