[GE users] SGE - Failed To Execute openmp parallel job - job stuck in 't' state

brettlee brettlee at yahoo.com
Tue Sep 29 18:54:36 BST 2009


It sounds like execd can't bind to that port because something else (another sge_execd?) is already bound to it.

Does `netstat -an | grep 6445` show anything?
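
If the netstat output is ambiguous, a direct bind test should show the same thing. This is only a minimal sketch in Python, assuming 6445 is the execd port as in your log; `port_is_bound` is a hypothetical helper name, not part of SGE:

```python
import errno
import socket


def port_is_bound(port, host=""):
    """Return True if a TCP bind to (host, port) fails with EADDRINUSE.

    Hypothetical helper for illustration only. An empty host means
    "all local IPv4 addresses". Because SO_REUSEADDR is not set here,
    lingering connections on the port may also make this report True.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as e:
        if e.errno == errno.EADDRINUSE:
            return True
        raise  # some other problem, e.g. no permission for the port
    finally:
        s.close()
    return False


if __name__ == "__main__":
    # 6445 is the port named in the execd error message.
    print("port 6445 already bound:", port_is_bound(6445))
```

If this reports True while sgeexecd is stopped, some other process is still holding the port.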

 -Brett


----- Original Message ----
From: marbarfa <marbarfa at gmail.com>
To: users at gridengine.sunsource.net
Sent: Tuesday, September 29, 2009 9:30:25 AM
Subject: RE: [GE users] SGE - Failed To Execute openmp parallel job - job stuck in 't' state

Anyone? I really don't know what to do. The example jobs work fine, and I also ran some of my serial jobs with no problem at all. I just reinstalled everything and got the same problem.

After sgeexecd goes down, I start it again and get these messages in the /tmp folder:

09/28/2009 14:31:02|  main|node10|E|communication error for "node10/execd/1" running on port 6445: "can't bind socket"

09/28/2009 14:31:03|  main|node10|E|commlib error: can't bind socket (no additional information available)

09/28/2009 14:31:31|  main|node10|C|abort qmaster registration due to communication errors


Any ideas?

Thanks.
Marcos

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219656

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=219671



