[GE users] Problem of SGE

kdoman kdoman07 at gmail.com
Fri Jul 9 16:25:59 BST 2010

Hi Reuti -
This is so odd! I don't recall my queues ever going into the error state
until recently, and the only thing I implemented recently was the
MPICH2 integration following your method.

The error is very random. One of my clusters has around 2000 serial
jobs right now, and last night the queues on almost 20% of the nodes
ended up in the error state. I ran "qmod -c" to clear the error, and
this morning some of the nodes were in the error state again.
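
Since the quoted reply below asks whether all hostnames resolve on all machines, here is a minimal sketch of how that could be checked on a node. It is only an illustration, not part of SGE: the function name `unresolved_hosts` is made up for this example, and the two hostnames are taken from the log excerpt in this thread and should be replaced with your own.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return {hostname: error text} for every name that fails to resolve.

    `resolve` is injectable so the check can be exercised without a network;
    by default it uses the system resolver via socket.gethostbyname.
    """
    failures = {}
    for name in hostnames:
        try:
            resolve(name)
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            failures[name] = str(exc)
    return failures


if __name__ == "__main__":
    # Hostnames from the log excerpt in this thread (assumed; use your own list).
    hosts = ["cluster.local", "compute-0-0"]
    for host, err in unresolved_hosts(hosts).items():
        print(f"{host}: {err}")
```

Running this on each execution host and on the qmaster host should show whether any machine fails to resolve a name the others can. Because the errors appear intermittently, a single clean run does not rule out a resolver problem.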


On Mon, Jul 5, 2010 at 4:43 AM, reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
> Am 04.07.2010 um 17:08 schrieb gqc606:
>> I installed SGE and MPICH2 on my computers and integrated them following this page: <http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html>
>> At first it worked well and everything was all right. But thirty hours later, I got some error messages in the following file on one of my compute nodes:
>> [root at compute-0-0 ~]# cat /opt/gridengine/default/spool/compute-0-0/messages
>> 06/25/2010 22:26:28| main|compute-0-0|E|can't send asynchronous message to commproc (qmaster:1) on host "cluster.local": can't resolve host name
>> 06/25/2010 22:26:52| main|compute-0-0|E|commlib error: got select error (Connection reset by peer)
> This doesn't look connected to the MPICH2 setup; it looks more like a NIS problem. Can all hostnames be resolved on all machines? Is the spool directory on a shared directory, or is it local on each machine?
> Are only MPICH2 jobs affected?
> -- Reuti
>> I am confused and don't know how to solve this problem. Can anyone give me some advice? Thanks!
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266025
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=266127
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
