[GE users] Problem with commd communications

Craig Tierney ctierney at hpti.com
Mon Jun 14 22:52:59 BST 2004


On Mon, 2004-06-14 at 15:47, Yogesh Chaudhary wrote:
> Hi,
> 
> We have been having a similar problem with commd.
> 
> Here are a few things we do to solve it:
> 
> Stop the execution hosts, then stop the master host daemons. Wait until
> commd communication stops completely; netstat -a | grep commd | wc -l
> shows how many connections remain.
> 
> Then start the master host daemons, followed by the execution hosts.
> 
> Once, we found that a user was running qstat every second, and this
> hosed commd.
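
The wait step in the quoted procedure could be scripted roughly as
follows. This is only a sketch: the function names, the attempt limit,
and the sleep interval are made up for illustration, and it assumes
(as the quoted netstat command does) that commd connections show up
under a name containing "commd" in netstat output.

```shell
#!/bin/sh
# Count netstat lines that mention commd (same idea as the quoted
# "netstat -a | grep commd | wc -l", using grep -c for the count).
count_commd_connections() {
    netstat -a | grep -c commd
}

# wait_for_commd_drain [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Polls until no commd connections are listed.
# Returns 0 once the count is 0, or 1 if the attempts run out first.
# Both defaults are arbitrary illustrative values.
wait_for_commd_drain() {
    max_attempts=${1:-60}
    interval=${2:-5}
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        remaining=$(count_commd_connections)
        if [ "${remaining:-0}" -eq 0 ]; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}
```

After stopping the daemons, something like "wait_for_commd_drain 60 5"
could then gate the restart of the master host daemons.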

We have users who poll qstat like that, but it is intentional.  We have
a script called qsub_wait that lets a user submit a job and then
wait for it to complete before continuing.
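
The qsub_wait script itself is not shown here. Purely as an
illustration of the wait half of such a script, the sketch below polls
plain qstat output for a job ID and doubles the sleep between polls up
to a cap, so that a long-running job generates fewer qstat calls. The
function names and interval values are hypothetical, and whether
frequent qstat calls are really what upsets commd is not established.

```shell
#!/bin/sh
# job_is_listed JOB_ID
# Succeeds while plain qstat output still has JOB_ID in its first column.
# Caveat: if qstat itself fails and prints nothing, the job looks finished.
job_is_listed() {
    qstat | awk -v id="$1" '$1 == id { found = 1 } END { exit !found }'
}

# wait_for_job JOB_ID [INITIAL_INTERVAL] [MAX_INTERVAL]
# Sleeps between polls, doubling the interval up to MAX_INTERVAL seconds.
# The defaults (15 and 120) are arbitrary illustrative values.
wait_for_job() {
    job_id=$1
    interval=${2:-15}
    max_interval=${3:-120}
    while job_is_listed "$job_id"; do
        sleep "$interval"
        interval=$((interval * 2))
        if [ "$interval" -gt "$max_interval" ]; then
            interval=$max_interval
        fi
    done
}
```

A caller would take the job ID from the qsub output and then run, for
example, "wait_for_job 4242".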

I have suspected this polling pattern, but I couldn't prove it was the cause.

All of our hosts are submit hosts, so the polling can come from
anywhere.

Craig



