[GE users] |E|commlib error: got select error (Broken pipe)

reuti reuti at staff.uni-marburg.de
Thu Dec 2 09:26:41 GMT 2010


On 30.11.2010 at 05:57, adarsh wrote:

> I want to know whether it is necessary to have the default cell directory mounted over NFS.
> I simply copied the whole SGE package to the execution hosts with scp after installing the qmaster.
> After this I ran ./install_execd on the slaves.

In principle this is possible. You could even have saved some work if you had also installed the execd on the qmaster (just to generate the correct $SGE_ROOT/default/common/sgeexecd) and then removed the qmaster again from the list of execution hosts.

Then, after transferring the complete $SGE_ROOT directory, you only need to copy (or link) $SGE_ROOT/default/common/sgeexecd into one or more of the /etc/init.d/rcX.d directories on each exechost. Before that, add each one as an administrative host with `qconf -ah <exechost>`. The hosts will be added as exechosts automatically once their execd contacts the qmaster.
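As a rough illustration of the order of these steps, here is a small Python sketch that only builds and prints the commands for one host; it does not run anything. The path /opt/sge is an assumed $SGE_ROOT, the function names are made up for this example, and the last command starts the execd directly through the sgeexecd script instead of installing it into the init directories, which is a simplification on my part.

```python
import posixpath
import shlex


def exechost_plan(host, sge_root="/opt/sge", cell="default"):
    """Return the commands (as argument lists) to bring up one exechost.

    sge_root is an assumed install path; adjust it to your $SGE_ROOT.
    Nothing is executed here, the caller decides what to do with the plan.
    """
    startup = posixpath.join(sge_root, cell, "common", "sgeexecd")
    # Copy into the parent directory so that the tree ends up at sge_root.
    parent = posixpath.join(posixpath.dirname(sge_root.rstrip("/")), "")
    return [
        # 1. On the qmaster: register the host as administrative host.
        ["qconf", "-ah", host],
        # 2. Transfer the complete $SGE_ROOT directory.
        ["scp", "-r", sge_root, f"{host}:{parent}"],
        # 3. Start the execd via the generated startup script (simplified).
        ["ssh", host, startup, "start"],
    ]


def render(plan):
    """Turn a plan into shell-quoted command lines for review."""
    return [shlex.join(cmd) for cmd in plan]
```

Printing `render(exechost_plan("ws34-rak-lin"))` line by line gives you something to review and paste by hand, which is probably safer than automating it on the first attempt.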

Some details you may find here:


> My qmon shows all hosts with their loads.
> Yet my job never finishes successfully. The execution host log (messages) shows:
> 11/30/2010 09:30:36|  main|ws34-rak-lin|I|starting up GE 6.2u5 (lx24-amd64)
> But my job remains in the qw state after submission.
> The qmaster log shows:
> 11/30/2010 09:26:08|listen|ws37-mah-lin|E|commlib error: got select error (Broken pipe)
> 11/30/2010 09:26:15|listen|ws37-mah-lin|E|commlib error: got read error (closing "ws34-rak-lin/execd/1")

Is there a firewall on any of the machines blocking ports 6444 and 6445 (these are the defaults, unless you changed them)?
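If you want to test this without special tools, a plain TCP connect is enough. The following Python sketch is only an illustration (the function names are invented here). It assumes the usual default assignment, where sge_qmaster listens on 6444 and sge_execd on 6445, so 6444 should be tested against the qmaster from the exechosts and 6445 against the exechosts from the qmaster.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    A refused connection, a timeout (often a firewall silently dropping
    packets) or a name resolution failure all count as "not reachable".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(hosts, ports=(6444, 6445), timeout=3.0):
    """Check every host/port pair and return {(host, port): bool}."""
    return {
        (host, port): port_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, `port_reachable("ws37-mah-lin", 6444)` run on ws34-rak-lin tests the direction execd to qmaster, and `port_reachable("ws34-rak-lin", 6445)` run on ws37-mah-lin tests the other direction. A False result only tells you that the connection failed, not why, so check that the daemon is actually running before blaming the firewall.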

`qacct` will only work on the headnode, as it needs access to $SGE_ROOT/default/common/accounting, which is not shared in your cluster. And even on the headnode it won't display anything about the job right now, as the accounting record is only written after the job has left the system. For running or pending jobs you can try `qstat -j 6` or `qalter -w p 6`.

-- Reuti

> When I run ./qstat, it shows the result in the attached file.
> Please be so kind as to help.
> Thanks & Regards
> Adarsh Sharma
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=300487
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

