[GE users] Re: LAM SGE Integration issues with rocks 4.1

Srividya Valivarthi srividya.v at gmail.com
Wed Jan 11 18:50:22 GMT 2006


Hi,

   I did define the PE for loose integration with rsh using qmon, and
also added this PE to the queue's PE list using the queue configuration
in qmon.
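
For reference, a loose-integration PE definition along these lines, as
displayed by qconf -sp (the PE name and script paths below are only
illustrative placeholders, not necessarily what is configured here),
would look roughly like:

  pe_name           lam_loose
  slots             999
  user_lists        NONE
  xuser_lists       NONE
  start_proc_args   /opt/sge/lam/startlam.sh $pe_hostfile
  stop_proc_args    /opt/sge/lam/stoplam.sh
  allocation_rule   $fill_up
  control_slaves    FALSE
  job_is_first_task TRUE
  urgency_slots     min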

Thanks,
Srividya

On 1/11/06, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi again.
>
> On 11.01.2006 at 19:34, Srividya Valivarthi wrote:
>
> > Hi,
> >
> >    Thanks for your prompt response. I am sorry if I was not clear in
> > the earlier mail. I did not start the lamd daemons by hand prior to
> > submitting the job. What I was trying to convey was that lamd
> > daemons are running on the compute nodes, possibly started by SGE
> > itself, but somehow they are not registered with LAM/MPI?!
> >
> >     Also, the hostfile that is used during lamboot
> > #lamboot -v -ssi boot rsh hostfile
>
> lamboot will start the daemons, which isn't necessary here. Even with
> a loose integration, SGE will start the daemons on its own (just via
> rsh, in contrast to qrsh with a tight integration).
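>
> As a sketch of where that difference shows up in the PE definition
> (only the control_slaves field is shown; the comments are mine):
>
>   control_slaves  FALSE   # loose: slave daemons reached via plain rsh
>   control_slaves  TRUE    # tight: slaves started under SGE control
>                           # via qrsh -inherit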
>
> LAM/MPI is to some extent SGE-aware, and will look for some special
> information in the SGE-created directories on all the slave nodes.
>
> But anyway: how did you define the PE - loose with rsh or qrsh? - Reuti
>
>
> > is as follows (it already has the .local suffix):
> > medusa.lab.ac.uab.edu cpu=4
> > compute-0-0.local cpu=4
> > compute-0-1.local cpu=4
> > compute-0-2.local cpu=4
> > compute-0-3.local cpu=4
> > compute-0-4.local cpu=4
> > compute-0-5.local cpu=4
> > compute-0-6.local cpu=4
> > compute-0-7.local cpu=4
> >
> > Any further ideas on how to solve this issue would be very helpful.
> >
> > Thanks,
> > Srividya
> > On 1/11/06, Reuti <reuti at staff.uni-marburg.de> wrote:
> >> Hi,
> >>
> >> On 11.01.2006 at 18:55, Srividya Valivarthi wrote:
> >>
> >>> Hi,
> >>>
> >>>     I am working with a Pentium III ROCKS cluster which has LAM/MPI
> >>> version 7.1.1 and SGE version 6.0. I am trying to get the loose
> >>> integration mechanism with rsh working with SGE and LAM, as
> >>> suggested by the following howto from this mailing list:
> >>> http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html
> >>>
> >>> However, on submitting the jobs to the queue, I get the following
> >>> error message:
> >>> ---------------------------------------------------------------------
> >>> It seems that there is no lamd running on the host
> >>> compute-0-5.local.
> >>>
> >>> This indicates that the LAM/MPI runtime environment is not
> >>> operating.
> >>> The LAM/MPI runtime environment is necessary for the "mpirun"
> >>> command.
> >>>
> >>> Please run the "lamboot" command to start the LAM/MPI runtime
> >>> environment.  See the LAM/MPI documentation for how to invoke
> >>> "lamboot" across multiple machines.
> >>> ---------------------------------------------------------------------
> >>> But the lamnodes command shows all the nodes in the system, and I
> >>> can also see the lamd daemon running on the compute nodes. Any
> >>> ideas on what the issue could be are greatly appreciated.
> >>
> >> there is no need to start up any daemon by hand beforehand. In
> >> fact, it will not work. SGE takes care of starting a private daemon
> >> on all the selected nodes for each particular job.
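> >>
> >> As an illustration, the job script itself then needs no lamboot at
> >> all (the PE name, slot count and binary below are placeholders):
> >>
> >>   #!/bin/sh
> >>   #$ -pe lam_loose 8
> >>   #$ -cwd
> >>   # no lamboot here: SGE's start_proc_args already booted the
> >>   # LAM universe for this job
> >>   mpirun C ./my_mpi_program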
> >>
> >> One issue with ROCKS might be similar to this one (change the start
> >> script to include .local for the nodes in the "machines" file):
> >>
> >> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=14170
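> >>
> >> As a sketch of the kind of adjustment meant there (a hypothetical
> >> excerpt from such a start script; the variable names depend on the
> >> actual script, and an already fully qualified host like the
> >> frontend would need to be skipped):
> >>
> >>   # take the hostnames from SGE's $pe_hostfile and append ".local"
> >>   # so they match the compute nodes' names under ROCKS
> >>   cut -d" " -f1 $pe_hostfile | sed -e 's/$/.local/' > $TMPDIR/machines
> >>   lamboot -ssi boot rsh $TMPDIR/machines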
> >>
> >> Just let me know whether it worked after adjusting the start script.
> >>
> >> -- Reuti
> >>
> >>
> >>>
> >>> Thanks,
> >>> Srividya

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



