[GE users] shepherd problem

Philippe Caussignac philippe.caussignac at epfl.ch
Fri Mar 16 17:23:15 GMT 2007


I have a cluster of 6 dual-socket, dual-core Woodcrest nodes with Myrinet MX, 
SUSE 10.2, and SGE 6.09.
I use the tight integration of Myrinet mpich-mx (compiled with the rsh 
command option) through the $SGE_ROOT/mpi/startmpi.sh and 

When I submit jobs with 8 processors, everything is OK. Jobs with 
9-11 processors sometimes work and sometimes fail. Jobs with 12 or 
more processors never work.

The error message in the SGE error log is:

cannot get connection to "shepherd" at host "node06"
cannot get connection to "shepherd" at host "node02"
cannot get connection to "shepherd" at host "node03"
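
For anyone triaging a log like this, here is a minimal sketch that tallies
which hosts show up in the shepherd connection errors. The function name
count_failed_hosts and the sample text are illustrative assumptions, not part
of SGE; the pattern is simply the wording of the three lines quoted above.

```python
import re
from collections import Counter

# Matches the error wording quoted in this post; other SGE versions
# may phrase the message differently (assumption).
SHEPHERD_ERROR = re.compile(
    r'cannot get connection to "shepherd" at host "([^"]+)"'
)


def count_failed_hosts(log_text):
    """Return a Counter mapping host name -> number of shepherd errors."""
    return Counter(SHEPHERD_ERROR.findall(log_text))


# Sample input: the three lines from the error log above.
sample = '''cannot get connection to "shepherd" at host "node06"
cannot get connection to "shepherd" at host "node02"
cannot get connection to "shepherd" at host "node03"
'''

# Print one line per host, sorted by host name.
for host, count in sorted(count_failed_hosts(sample).items()):
    print(host, count)
```

Run against a full error log, a tally like this would show whether the
failures cluster on particular nodes or are spread across the whole machine.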

I have no idea what to do, other than installing SGE 5.3, which works 
perfectly on another Myrinet cluster.


                            Philippe Caussignac
                            EPFL FSB IMB LCVMM (Station 8)
                            CH-1015 LAUSANNE (Switzerland)
                            email: Philippe.Caussignac at epfl.ch
                            Phone: (41) 21 693 25 78
                            Fax:   (41) 21 693 55 30
