[GE users] Re: LAM SGE Integration issues with rocks 4.1

Srividya Valivarthi srividya.v at gmail.com
Wed Jan 11 18:34:46 GMT 2006



Hi,

   Thanks for your prompt response. I am sorry if I was not clear in
my earlier mail. I did not start the lamd daemons by hand before
submitting the job. What I was trying to convey is that lamd daemons
are running on the compute nodes, possibly started by SGE itself, but
they somehow do not seem to be registered with LAM/MPI.

    Also, the hostfile used during lamboot
#lamboot -v -ssi boot rsh hostfile
already has the .local suffix for the compute nodes. It reads as follows:
medusa.lab.ac.uab.edu cpu=4
compute-0-0.local cpu=4
compute-0-1.local cpu=4
compute-0-2.local cpu=4
compute-0-3.local cpu=4
compute-0-4.local cpu=4
compute-0-5.local cpu=4
compute-0-6.local cpu=4
compute-0-7.local cpu=4
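
For reference, here is a minimal sketch of the kind of start-script
adjustment suggested in the quoted reply below. It is not taken from the
howto. The function name pe_to_lam_schema is hypothetical, and it assumes
that each line of the SGE PE hostfile has the form
"<host> <slots> <queue> <processor-range>" and that only unqualified host
names need the .local suffix.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or LAM): turn an SGE PE hostfile
# into a LAM boot schema of the form "<host> cpu=<slots>".
# Assumption: input lines look like "<host> <slots> <queue> <processor-range>".
pe_to_lam_schema() {
    awk '{
        host = $1
        # Append .local only to short host names that contain no dot.
        if (index(host, ".") == 0) host = host ".local"
        print host " cpu=" $2
    }' "$1"
}
```

Inside a start script this might be called as
pe_to_lam_schema "$PE_HOSTFILE" > "$TMPDIR/machines", assuming the script
already writes its machines file to that location.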

Any further ideas on how to solve this issue would be very helpful.

Thanks,
Srividya
On 1/11/06, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> On 11.01.2006 at 18:55, Srividya Valivarthi wrote:
>
> > Hi,
> >
> >     I am working with a Pentium III Rocks cluster which has LAM/MPI
> > version 7.1.1 and SGE version 6.0. I am trying to get the loose
> > integration mechanism with rsh working with SGE and LAM, as described
> > in the following howto:
> > http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html
> >
> > However, on submitting the jobs to the queue, I get the following
> > error message:
> > -----------------------------------------------------------------------------
> > It seems that there is no lamd running on the host compute-0-5.local.
> >
> > This indicates that the LAM/MPI runtime environment is not operating.
> > The LAM/MPI runtime environment is necessary for the "mpirun" command.
> >
> > Please run the "lamboot" command the start the LAM/MPI runtime
> > environment.  See the LAM/MPI documentation for how to invoke
> > "lamboot" across multiple machines.
> > -----------------------------------------------------------------------------
> > However, the lamnodes command shows all the nodes on the system, and I
> > can also see the lamd daemon running on the compute nodes. Any ideas
> > on what the issue could be are greatly appreciated.
>
> There is no need to start any daemon by hand beforehand. In fact, it
> will not work. SGE takes care of starting a private daemon for each
> job on all the nodes selected for that particular job.
>
> One issue with ROCKS might be similar to this (change the start script
> to include .local for the nodes in the "machines" file):
>
> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=14170
>
> Just let me know whether it worked after adjusting the start script.
>
> -- Reuti
>
>
> >
> > Thanks,
> > Srividya
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
>
