[GE users] Tracking PIDs

John Leidel john.leidel at gmail.com
Wed Aug 8 18:59:45 BST 2007


I just implemented that setting (ENABLE_ADDGRP_KILL) a few weeks ago.  I
agree, it catches 95% of what would usually slip through the cracks.
Occasionally we still have some orphaned processes hanging around.  Our
users have a very bad habit of getting impatient and killing jobs before
they complete [instead of waiting an extra 20 minutes and allowing them
to finish].

I did find Paul McInnis' script from an earlier post.  I'm going to try
it out and see how it works.  

Script:
http://gridengine.sunsource.net/servlets/ReadMsg?list=users&msgNo=21013

BTW... thanks Paul.  
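
For anyone reading this in the archive, here is a minimal sketch of the
idea discussed in this thread: finding every process that still carries
a job's additional group ID, which is the mechanism ENABLE_ADDGRP_KILL
relies on (see gid_range in the cluster configuration).  It is written
in Python and assumes a Linux-style /proc where each status file has a
"Groups:" line.  The function names are hypothetical and not part of
Grid Engine, and an epilog script would have to supply the job's group
ID itself.

```python
import os


def groups_from_status(status_text):
    """Return the supplementary group IDs listed in a /proc/<pid>/status body."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(tok) for tok in line.split()[1:]}
    return set()


def pids_with_group(gid, proc_root="/proc"):
    """List PIDs whose supplementary groups include gid.

    Assumption: proc_root follows the Linux /proc layout, with one
    numeric directory per process containing a 'status' file.
    """
    matches = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'cpuinfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                if gid in groups_from_status(fh.read()):
                    matches.append(int(entry))
        except OSError:
            continue  # process exited (or is unreadable) since the listing
    return sorted(matches)
```

Calling pids_with_group(20005) (20005 is only an example value) would
return the PIDs still tagged with that group; an epilog could then
signal them with os.kill.  That step is left out here, since killing by
group ID needs care on shared nodes.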

On Wed, 2007-08-08 at 10:52 -0700, Beadles, Jeff wrote:
> Have you looked at adding "ENABLE_ADDGRP_KILL=true" to the execd_params
> in the grid configuration (qconf -mconf)?
> 
> It's handled most of these situations for us very well.
> 
> 	-Jeff
> 
> -----Original Message-----
> From: John Leidel [mailto:john.leidel at gmail.com] 
> Sent: Wednesday, August 08, 2007 8:21 AM
> To: users at gridengine.sunsource.net
> Subject: [GE users] Tracking PIDs
> 
> All, is there currently a way [possibly within the run/prologue scripts]
> to record all the currently running PIDs of a specific job?  The
> situation is the following...
> 
> - I have set up an h_rt of 3:00:00 for a specific set of intensive jobs
> [MPI]
> - When they hit their h_rt, SGE obviously kills everything it knows
> about.  As we've all experienced, it sometimes leaves dangling processes
> out there.  These things eat quite a bit of memory, so it's something we
> *really* need to recover.
> 
> So... I want to record the PIDs of the executables running within the
> job... then, with an epilogue script, make sure I kill those PIDs when
> the jobs are complete.  Is there a better way to accomplish this?
> 
> thanks
> john
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
