[GE users] jobs never die on nodes with mpich

Bogdan Costescu bogdan.costescu at iwr.uni-heidelberg.de
Fri Aug 13 18:06:19 BST 2004


On Thu, 12 Aug 2004, Michel Cuendet wrote:

> It's a bit of everything...

I was just looking at your list of processes and remembered one thing
that might make a difference. By default, SGE on Linux does not enable
the code that sends the signal to the whole process group; see:

http://gridengine.sunsource.net/servlets/ReadMsg?msgId=6340&listName=users

This has been discussed several times on the SGE lists; please search
the archives for details.
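The effect of signalling only the top process versus the whole process
group can be reproduced outside SGE. Below is a minimal Python sketch
(Linux/POSIX only, since it relies on os.fork and os.killpg) under
hypothetical assumptions: a "job" process leads its own process group and
forks one "worker", loosely standing in for a job's top-level process
(e.g. mpirun) and a local MPI process. The function name worker_survives
and this layout are illustrative only. This is not SGE's code, and it
says nothing about which build option turns group signalling on.

```python
import os
import select
import signal
import time


def worker_survives(kill_whole_group: bool, timeout: float = 1.0) -> bool:
    """Return True if the worker outlives a SIGKILL aimed at its parent job.

    Hypothetical setup: the 'job' leads a new process group and forks one
    'worker' that holds the write end of a pipe, so the caller sees EOF on
    the pipe exactly when the worker has exited.
    """
    r, w = os.pipe()
    job = os.fork()
    if job == 0:
        # Hypothetical "job": become leader of a new process group.
        os.setpgid(0, 0)
        if os.fork() == 0:
            # Worker: inherits the job's process group and keeps `w` open.
            os.close(r)
            os.write(w, str(os.getpid()).encode())
            time.sleep(60)                 # pretend to compute
            os._exit(0)
        os.close(w)                        # from now on only the worker holds it
        time.sleep(60)
        os._exit(0)

    os.close(w)
    worker = int(os.read(r, 32))           # worker is up, so the group exists
    if kill_whole_group:
        os.killpg(job, signal.SIGKILL)     # the job's pid doubles as its pgid
    else:
        os.kill(job, signal.SIGKILL)       # signal only the top-level process
    os.waitpid(job, 0)

    # EOF on the pipe means the worker has exited: its descriptors are
    # closed even while it is still an unreaped zombie.
    ready, _, _ = select.select([r], [], [], timeout)
    survived = not (ready and os.read(r, 1) == b"")
    os.close(r)
    if survived:
        os.kill(worker, signal.SIGKILL)    # tidy up the orphaned worker
    return survived
```

On a typical Linux box, worker_survives(False) should return True (the
orphaned worker keeps running, like the leftover MPI processes in this
thread) and worker_survives(True) should return False. The pipe is used
as the liveness check instead of os.kill(pid, 0) because a killed but not
yet reaped (zombie) process still accepts signal 0.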

With it disabled, I also had problems killing parallel jobs. I enabled
it (which means recompiling...) and never saw these problems again;
I now remember that I even wrote about this on the lists... it seems
my memory is overloaded ;-)

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De

