[GE users] Yet another qdel mpich problem (SGE 6.0u1)

Charu Chaubal Charu.Chaubal at Sun.COM
Wed Sep 8 13:53:29 BST 2004


Hello,

On Sep 7, 2004, at 6:22 PM, Vladimir Florinski wrote:

> On Tue, 2004-09-07 at 17:05, Charu Chaubal wrote:
>> Hi,
>>
>> One suggestion is to write a custom delete method which looks for all
>> processes with the special GE additional group ID set and kills them
>> forcefully, unless you find that the escaped processes somehow "shed"
>> their additional group ID along the way.
>>
>> Regards,
>> 	Charu
>>
>
> I wouldn't know how to do this. Are there instructions and/or examples
> somewhere?

It's fairly straightforward.  In your terminate method, get the additional
group ID like this:
	ADDGRPID=`cat "$SGE_JOB_SPOOL_DIR/addgrpid"`

Find all processes which have this additional group ID, e.g., on Solaris:
	pcred /proc/* > $TMPFILE 2> /dev/null
	grep groups $TMPFILE | grep $ADDGRPID

Then run kill -9 on each of the processes found.
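
For Linux hosts, the steps above could be sketched roughly as follows. This
is only a sketch under assumptions: it reads the "Groups:" line of
/proc/<pid>/status instead of using pcred, and pids_with_group is a
hypothetical helper name, not something that ships with Grid Engine. It
compares the group ID as a whole field, since a plain grep for the number
would also match longer IDs that merely contain it.

```shell
#!/bin/sh
# Hypothetical helper: print the PID of every process whose supplementary
# group list contains the given GID.
# Assumption: a Linux-style /proc where /proc/<pid>/status has a "Groups:" line.
# The proc root is the optional second argument (default: /proc).
pids_with_group() {
    gid=$1
    proc_root=${2:-/proc}
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Look only at the Groups: line and compare each field exactly.
        # A process that exits mid-scan just makes awk fail; it is skipped.
        if awk -v gid="$gid" '
            $1 == "Groups:" { for (i = 2; i <= NF; i++) if ($i == gid) found = 1 }
            END { exit !found }
        ' "$status" 2>/dev/null; then
            pid=${status%/status}
            echo "${pid##*/}"
        fi
    done
}

# Possible use inside a terminate method (left commented out on purpose):
#   ADDGRPID=`cat "$SGE_JOB_SPOOL_DIR/addgrpid"`
#   for pid in `pids_with_group "$ADDGRPID"`; do
#       kill -9 "$pid"
#   done
```

It is probably worth running the helper by hand against a live job first and
checking the list of PIDs before letting it send any signals.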

Note that on certain OSes (Linux, I think) this can sometimes inadvertently
kill an NFS client process.  I am not sure of the details, though, and I
believe this is a rare situation.

Regards,
	Charu



>
> -- 
> Vladimir Florinski
> Assistant Research Physicist
> Institute of Geophysics and Planetary Physics
> University of California
> Riverside, CA 92521
> phone: 1-909-787-3943
> fax: 1-909-787-4509
>
###############################################################
# Charu V. Chaubal				# Phone: (650) 786-7672 (x87672)
# Grid Computing Technologist	# Fax:   (650) 786-4591
# Sun Microsystems, Inc.			# Email: charu.chaubal at sun.com
###############################################################



