[GE users] wildly inaccurate cpu usage, SGE 6.0u4

Andy Schwierskott andy.schwierskott at sun.com
Thu Jan 24 13:48:25 GMT 2008


Hi Aaron,

AFAIR there was a bug in this area in the execd, which was fixed by a patch
after 6.0u4. Unfortunately I can't find the IZ ID right now. I think there
was also some discussion of this topic on the mailing list.

A second possible root cause is leftover processes from previous jobs that
were never killed and that carry the same additional group ID. When the
additional group ID range wraps around, the execd would monitor those
lingering processes as well, so their CPU time would be counted toward the
new job.
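
To check an exec host for such leftovers, something like the following might
help. It is only a rough sketch, assuming a Linux host where
/proc/<pid>/status has a "Groups:" line; the function names and the example
group ID 20005 are made up for illustration, so use an ID from your own
gid_range.

```python
import os


def supplementary_groups(status_text):
    """Return the group IDs listed on the 'Groups:' line of a status file."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(tok) for tok in line.split()[1:]}
    return set()


def pids_with_group(gid, proc_root="/proc"):
    """List the PIDs whose supplementary groups include gid (hypothetical helper)."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'cpuinfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                groups = supplementary_groups(fh.read())
        except OSError:
            continue  # process exited during the scan, or is not readable
        if gid in groups:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    # 20005 is an example value only, not taken from any real configuration.
    print(pids_with_group(20005))
```

If this reports PIDs for a group ID that no running job currently owns, those
processes are candidates for the lingering ones described above.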

Could this be the cause in your case? Or do you have some monitoring in
place for runaway processes?
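
To narrow this down, it may help to see whether the affected jobs cluster on
particular hosts. Below is a minimal sketch, assuming the accounting records
have already been parsed into dicts with the keys cpu, ru_wallclock, slots
and hostname; the function names and the factor of 10 are my own choices for
illustration, not anything SGE provides.

```python
def suspicious_jobs(records, factor=10.0):
    """Return the records whose cpu exceeds factor * ru_wallclock * slots."""
    flagged = []
    for rec in records:
        budget = rec["ru_wallclock"] * rec["slots"]
        if rec["cpu"] > factor * budget:
            flagged.append(rec)
    return flagged


def count_by_host(records):
    """Count records per exec host to show where the outliers cluster."""
    counts = {}
    for rec in records:
        host = rec["hostname"]
        counts[host] = counts.get(host, 0) + 1
    return counts


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"hostname": "node01", "ru_wallclock": 100, "slots": 1, "cpu": 95.0},
        {"hostname": "node02", "ru_wallclock": 10, "slots": 2, "cpu": 5000.0},
    ]
    print(count_by_host(suspicious_jobs(sample)))
```

If most of the flagged jobs ran on a few hosts, leftover processes on those
hosts would be a plausible explanation.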

The ENABLE_ADDGRP_KILL execd_param would ensure that all processes of a job
are killed via their additional group ID when the job ends. This setting and
its pros and cons were also discussed on the mailing list.

Andy



On Thu, 24 Jan 2008, Reuti wrote:

> Hi,
>
> Am 24.01.2008 um 13:07 schrieb aaron at cs.york.ac.uk:
>
>> I was doing some detailed analysis of the job mix on our system from the
>> past year to find out if the resources offered match those we provide so
>> as to inform future purchasing decisions. At first it looked that our
>> resources did not match from analyses run on the accounting file.
>>
>> On closer analysis, however, it seems that in a very few instances the cpu
>> time used exceeded by order(s) of magnitude the ru_wallclock*slots time.
>> Has anyone else seen this, and in what circumstances? The jobs affected
>> seem to have failed.
>
> are these jobs forking any other process or thread - i.e. they are running in 
> some way parallel on one and the same node?
>
> If so, you could observe a load greater than the number of cores inside these 
> systems.
>
> -- Reuti
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>

