[GE users] Problem of SGE

reuti reuti at staff.uni-marburg.de
Mon Jul 5 10:43:56 BST 2010


On 04.07.2010 at 17:08, gqc606 wrote:

> I installed SGE and MPICH2 on my computers and integrated them following this page: <http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html>
> At first it worked well and everything was all right. But thirty hours later, I found some error messages in the following file on one of my compute nodes:
> [root@compute-0-0 ~]# cat /opt/gridengine/default/spool/compute-0-0/messages
> 06/25/2010 22:26:28| main|compute-0-0|E|can't send asynchronous message to commproc (qmaster:1) on host "cluster.local": can't resolve host name
> 06/25/2010 22:26:52| main|compute-0-0|E|commlib error: got select error (Connection reset by peer)

This doesn't look related to the MPICH2 setup; it looks more like a NIS (name resolution) problem. Can all hostnames be resolved on all machines? Is the spool directory on a shared filesystem, or is it local to each machine?
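
One way to answer the hostname question is to run a small check on every node. The following is only a minimal sketch, assuming Python is available on the machines; the helper names `check_hosts` and `unresolved` are made up for illustration and are not part of SGE, and the hostnames are simply the ones that appear in the log above.

```python
import socket


def check_hosts(hostnames, resolve=socket.gethostbyname):
    """Map each hostname to its IP address, or to None if it cannot be resolved.

    `resolve` is a hypothetical hook (not an SGE feature) so that the lookup
    can be replaced, e.g. for testing without a real name service.
    """
    results = {}
    for name in hostnames:
        try:
            results[name] = resolve(name)
        except OSError:
            # socket.gaierror and socket.herror are both subclasses of OSError.
            results[name] = None
    return results


def unresolved(hostnames, resolve=socket.gethostbyname):
    """Return the sorted list of hostnames that could not be resolved."""
    results = check_hosts(hostnames, resolve)
    return sorted(name for name, ip in results.items() if ip is None)


# Example call, to be run on each node (qmaster and all execution hosts):
#   print(unresolved(["cluster.local", "compute-0-0"]))
```

If the returned list is non-empty on any machine, or if the same name resolves to different addresses on different machines, the name service setup (NIS, /etc/hosts, DNS) would be the first thing to look at.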

Are only MPICH2 jobs affected?

-- Reuti

> I am confused and don't know how to solve this problem. Can anyone give me some advice? Thanks!
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266025
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

