[GE users] Housekeeping

Reuti reuti at staff.uni-marburg.de
Tue Jul 24 14:06:02 BST 2007



Erik:

On 24.07.2007 at 09:08, Lönroth Erik wrote:

> I'm searching for a good strategy to keep my cluster clean
> of "runaway" processes that SGE fails to kill in case of
> abnormal application exits or similar, e.g. processes that are left
> behind without a job ID.
>
> I'm certain many of you have invented ways to discover those  
> processes and to report them or automatically dispose of them.
>
> I'd be grateful if you could share your experiences on this before I try
> to invent another wheel.

Please don't reuse a reply to start a new thread. To me your posting
appears to be a reply to "[GE users] N1GE6.1 ResourceAllocation".

Anyway: I also notice this from time to time. Sometimes SGE seems not  
to kill the complete process group, but only one task, although the  
process group was fine, and I can kill all of its processes with  
"kill -9 -- -processgroup_id" by hand without any problems afterwards.
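
For reference, the manual cleanup can also be scripted. The following is only a minimal Python sketch of the same operation as "kill -9 -- -processgroup_id", assuming a POSIX system and that the process group ID is already known. The names kill_process_group and demo are made up for illustration and are not part of SGE; the demo starts its own throwaway process group instead of touching any real job.

```python
import os
import signal
import subprocess


def kill_process_group(pgid: int, sig: int = signal.SIGKILL) -> bool:
    """Send `sig` to every process in process group `pgid`.

    Hypothetical helper, equivalent to `kill -9 -- -<pgid>` by default.
    Returns True if the signal was sent, False if the group no longer exists.
    """
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        return False
    return True


def demo() -> int:
    """Start a small process group, kill it as a whole, return the leader's exit status."""
    # start_new_session=True puts the shell into a new session and therefore
    # a new process group; the background `sleep` inherits that group.
    leader = subprocess.Popen(
        ["sh", "-c", "sleep 60 & sleep 60"],
        start_new_session=True,
    )
    pgid = os.getpgid(leader.pid)
    kill_process_group(pgid)
    # A negative return code means the leader was terminated by that signal.
    return leader.wait()


if __name__ == "__main__":
    print("leader exit status:", demo())
```

This only covers the case where the leftover processes still share one process group; it does not help to find processes that have left the group.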

But I haven't found the real cause of this behavior yet. IMO, fixing it
in SGE would be preferable to running a cron job for this.

-- Reuti