[GE users] sge orphans

Beadles, Jeff jeff_beadles at mentor.com
Tue Jul 31 18:58:16 BST 2007

Have you looked at the execd_params ENABLE_ADDGRP_KILL=true  (from qconf

It's been doing a pretty good job of keeping these cleaned up,

Jeff Beadles

-----Original Message-----
From: Paul MacInnis [mailto:macinnis at dal.ca] 
Sent: Tuesday, July 31, 2007 10:43 AM
To: users at gridengine.sunsource.net
Subject: [GE users] sge orphans

It would be nice if SGE always did what was intended but, in the
real world, unexpected things happen ...

I've attached a shell script that we've been using for over
2 years to clear out processes started by SGE but now detached from
any current SGE job (watch out, some lines may wrap).

Here's how it works.
On each slave node it peeks at each process's environment looking
for a "JOB_ID=" entry, extracts the job number, asks qstat if the
number is valid, and kills the process if it's not.
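
For readers without the attachment, the steps above can be sketched
roughly as follows.  This is not the attached script, only a minimal
illustration.  The function names, the QSTAT override variable and the
"sge_orphans" logger tag are hypothetical, and it assumes that
"qstat -j <id>" exits non-zero for a job it does not know about, which
should be checked on your SGE version.

```shell
#!/bin/sh
# Sketch only; names below are illustrative, not from the attached script.

# Print the JOB_ID value from a NUL-separated environment file
# (e.g. /proc/<pid>/environ); print nothing if absent or unreadable.
job_id_from_environ() {
    [ -r "$1" ] || return 0
    tr '\000' '\n' < "$1" 2>/dev/null | sed -n 's/^JOB_ID=//p' | head -n 1
}

# Succeed if qstat still knows the job.  ASSUMPTION: "qstat -j <id>"
# returns a non-zero exit status for an unknown job.
# QSTAT is a hypothetical override, handy for testing.
job_is_known() {
    ${QSTAT:-qstat} -j "$1" >/dev/null 2>&1
}

# List orphaned processes; with "-kill" as $1, also kill them and
# note each kill in syslog via logger.
scan_orphans() {
    mode=${1:-}
    for env_file in /proc/[0-9]*/environ; do
        pid=${env_file#/proc/}
        pid=${pid%/environ}
        job_id=$(job_id_from_environ "$env_file")
        # Skip processes with no JOB_ID or a non-numeric one.
        case "$job_id" in
            ''|*[!0-9]*) continue ;;
        esac
        job_is_known "$job_id" && continue
        echo "orphan: pid $pid from job $job_id"
        if [ "$mode" = "-kill" ]; then
            kill "$pid" && logger -t sge_orphans "killed pid $pid (job $job_id)"
        fi
    done
}

# Usage (as root, on a slave node):
#   scan_orphans          # list only
#   scan_orphans -kill    # list, kill and log
```

One caveat with this sketch: if qstat fails for some other reason (for
example the qmaster is unreachable), every job looks unknown, so a real
script should probably check that qstat works at all before killing.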

We've used it on SGE 5.2 and 6.1, both using tight integration of

It is a Linux script in that it uses /proc/$pid/environ to see a
process's environment.  For Solaris this must be replaced by
    pargs  -e  $pid
Perhaps a conditional based on "uname -s" is a way to generalize
the script but I don't have a Solaris system to test on.
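
A conditional of that kind might look like the sketch below.  It is
untested on Solaris.  The function name and the OS_NAME override are
hypothetical, and the sed pattern assumes pargs -e prints lines of the
form "envp[0]: VAR=value".

```shell
#!/bin/sh
# Print a process's environment, one VAR=value per line.
# OS_NAME is a hypothetical override for testing; normally uname -s is used.
print_environ() {
    pid=$1
    case "${OS_NAME:-$(uname -s)}" in
        Linux)
            [ -r "/proc/$pid/environ" ] || return 1
            tr '\000' '\n' < "/proc/$pid/environ"
            ;;
        SunOS)
            # ASSUMPTION (untested): output lines look like "envp[N]: VAR=value".
            pargs -e "$pid" 2>/dev/null | sed -n 's/^envp\[[0-9]*\]: //p'
            ;;
        *)
            echo "print_environ: unsupported OS" >&2
            return 1
            ;;
    esac
}

# Usage: print_environ 1234 | sed -n 's/^JOB_ID=//p'
```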

Running the script with no parameters on a slave node will list
processes that should be killed, but no action is taken.  You must
use the -kill option to have the script actually kill anything.

To safely try it out, simply log onto a slave node and run the
script with no parameters.  You should be root because otherwise
Linux won't let you read /proc entries that don't belong to you.
And $SGE_ROOT must be set.

Once an hour our master node logs into each slave node and runs
something like this:

kill_sge_orphans.sh -kill
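
As an illustration only, the hourly pass could be driven by a loop like
the one below.  It assumes key-based root ssh from the master to each
slave and a plain-text host list.  The function name, the REMOTE_SHELL
override and the file paths in the comments are hypothetical.

```shell
#!/bin/sh
# Run the cleanup script on every host named in a file (one per line).
# REMOTE_SHELL is a hypothetical override; the default assumes ssh.
run_on_slaves() {
    host_file=$1
    while read -r host; do
        # Skip blank lines and comments in the host list.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        # ssh -n stops ssh from reading the rest of the host list on stdin.
        ${REMOTE_SHELL:-ssh -n} "$host" kill_sge_orphans.sh -kill \
            || echo "cleanup failed on $host" >&2
    done < "$host_file"
}

# Usage, with a hypothetical host list:
#   run_on_slaves /etc/sge_slaves.txt
# A hypothetical crontab entry on the master for an hourly run:
#   0 * * * * /usr/local/sbin/run_orphan_cleanup.sh
```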

All processes killed are noted in the syslog file using logger.
We've set up logwatch to display these messages in its daily

The number of processes killed this way varies a lot, due mainly
to what our users are up to - we sometimes go weeks with none but
if a user is trying something new, this saves us a lot of tedious
cleanup work.  We haven't had to do a manual process cleanup in

