[GE users] rsh zombies when using mpich2 -- johnny layne

Johnny Layne laynejg at vcu.edu
Mon Jul 30 16:03:29 BST 2007


Hi again,
    I'm very sorry, I was lazy with my emailing and forgot to edit the
subject.  Here is my email:

Hi everyone,
   I'm experimenting with MPICH2, running some VASP jobs.  I'm noticing
that occasionally some rsh processes become zombies.  Is anybody else
seeing this?  Right now I suspect it may be because I'm not using a
job-specific .smpd file, so I'm going to see whether creating a separate
one for each job helps.  My guess is that launching a bunch of these
jobs in quick succession causes problems when the jobs finish and the
.smpd file has changed in the meantime.

   I've got everything set up following Reuti's tight integration howto
for MPICH2
(http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html)
and in general it works great.  I've just noticed this happening a couple
of times, and so far I couldn't find any similar postings in the mailing
list archive.

   I suppose I could add the solution from this post to my stopmpich2.sh
to kill any zombies:
https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2004-January/004113.html
or at least do something along those lines in the kill code.
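
   A minimal sketch of that kind of check, written in Python rather than
in the shell of stopmpich2.sh.  It is not the solution from the linked
post.  It assumes a Linux procps-style ps, where a zombie shows a state
beginning with "Z" and a command like "rsh <defunct>".  The names
find_zombies and current_rsh_zombies are made up for illustration.  A
zombie has already exited, so it cannot be killed directly; it goes away
once its parent reaps it or exits, which is why the sketch reports the
parent PID as well.

```python
import subprocess


def find_zombies(ps_output, command="rsh"):
    """Return (pid, ppid) pairs for zombie processes named `command`.

    Expects text in the format printed by `ps -eo pid,ppid,stat,comm`.
    """
    zombies = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, ppid, stat, comm = fields
        # A state starting with "Z" marks a defunct (zombie) process;
        # the command column may carry a trailing "<defunct>" tag.
        if stat.startswith("Z") and comm.split()[0] == command:
            zombies.append((int(pid), int(ppid)))
    return zombies


def current_rsh_zombies():
    """Run ps on this host and return the zombie rsh processes found."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,stat,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_zombies(out)
```

   Calling current_rsh_zombies() on a node returns a list of (pid, ppid)
pairs; what to do with the parent processes is left to the stop script.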

   It's not a big problem for me, as I'll hunt down zombie processes and
kill them, but I hardly trust our users to do that once we turn this
stuff loose on them!  Thanks in advance for any advice and info.  I'll
continue experimenting and post if something seems to work.
    johnny
