[GE users] Re: LAM SGE Integration issues with rocks 4.1

Reuti reuti at staff.uni-marburg.de
Wed Jan 11 18:45:04 GMT 2006


Hi again.

On 11.01.2006 at 19:34, Srividya Valivarthi wrote:

> Hi,
>
>    Thanks for your prompt response. I am sorry if I was not clear in
> the earlier mail. I did not start the lamd daemons by hand prior to
> submitting the job. What I was trying to convey was that the
> lamd daemons are running on the compute nodes, possibly started by SGE
> itself, but somehow are not registered with LAM/MPI?!
>
>     And also the hostfile that is used during lamboot
> #lamboot -v -ssi boot rsh hostfile

lamboot will start the daemons, which isn't necessary. Even with a
loose integration, SGE will start the daemons on its own (just by rsh,
in contrast to qrsh with a tight integration).

LAM/MPI is to some extent SGE-aware and will look for some special
information in the SGE-created directories on all the slave nodes.

But anyway: how did you define the PE - loose with rsh or qrsh? - Reuti
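
One way to check this is to print the PE definition with "qconf -sp" and look at the control_slaves entry, which has to be TRUE for a tight integration (qrsh -inherit) and is normally FALSE for a loose one. The following is only a minimal sketch: the function name pe_integration_kind and the PE name lam_loose_rsh are assumptions for illustration, not something taken from your setup.

```shell
# Hypothetical helper: classify a PE definition read from stdin
# (the text printed by `qconf -sp <pe_name>`) by its control_slaves entry.
pe_integration_kind() {
    awk '
        $1 == "control_slaves" {
            # TRUE means SGE controls the slave tasks (tight integration)
            print ((toupper($2) == "TRUE") ? "tight" : "loose")
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}

# Usage on the cluster (PE name is an assumption):
#   qconf -sp lam_loose_rsh | pe_integration_kind
```

The start_proc_args line of the same output shows which start script is called, and that script is where rsh or qrsh comes into play.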


> is as follows; it already has the .local suffix:
> medusa.lab.ac.uab.edu cpu=4
> compute-0-0.local cpu=4
> compute-0-1.local cpu=4
> compute-0-2.local cpu=4
> compute-0-3.local cpu=4
> compute-0-4.local cpu=4
> compute-0-5.local cpu=4
> compute-0-6.local cpu=4
> compute-0-7.local cpu=4
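
For comparison, inside an SGE job the start script does not use a hand-written hostfile like this one; it builds the LAM boot schema from the $pe_hostfile that SGE passes to it. A rough sketch of such a conversion is below, assuming the usual $pe_hostfile columns (host, slots, queue, processor range) and assuming that only short host names need the .local suffix; the function name pe_hostfile_to_lam is made up for this example.

```shell
# Hypothetical helper: turn an SGE pe_hostfile into LAM boot schema lines.
# Input columns: host slots queue processor-range
pe_hostfile_to_lam() {
    awk '{
        host = $1
        # Assumption: names without a dot are short names that need .local
        if (host !~ /\./) host = host ".local"
        print host " cpu=" $2
    }' "$1"
}

# Usage inside a PE start script (output file name is an assumption):
#   pe_hostfile_to_lam "$pe_hostfile" > "$TMPDIR/machines"
```

Names that already contain a dot, such as medusa.lab.ac.uab.edu or compute-0-0.local, are passed through unchanged.
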
>
> Any further ideas to solve this issue will be very helpful.
>
> Thanks,
> Srividya
> On 1/11/06, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Hi,
>>
>> On 11.01.2006 at 18:55, Srividya Valivarthi wrote:
>>
>>> Hi,
>>>
>>>     I am working with a Pentium III Rocks cluster which has LAM/MPI
>>> version 7.1.1 and SGE version 6.0. I am trying to get the loose
>>> integration mechanism with rsh working with SGE and LAM, as suggested
>>> by the following howto posted on this mailing list:
>>> http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html
>>>
>>> However, on submitting the jobs to the queue, I get the following
>>> error message:
>>> -----------------------------------------------------------------------------
>>> It seems that there is no lamd running on the host compute-0-5.local.
>>>
>>> This indicates that the LAM/MPI runtime environment is not operating.
>>> The LAM/MPI runtime environment is necessary for the "mpirun" command.
>>>
>>> Please run the "lamboot" command the start the LAM/MPI runtime
>>> environment.  See the LAM/MPI documentation for how to invoke
>>> "lamboot" across multiple machines.
>>> -----------------------------------------------------------------------------
>>> But the lamnodes command shows all the nodes on the system, and I can
>>> also see the lamd daemon running on the local compute nodes. Any
>>> ideas on what the issue could be are greatly appreciated.
>>
>> There is no need to start any daemon by hand beforehand. In
>> fact, it will not work: SGE takes care of starting a private daemon
>> for each job on all the nodes selected for that particular job.
>>
>> One issue with ROCKS might be similar to this one (change the start
>> script to include .local for the nodes in the "machines" file):
>>
>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=14170
>>
>> Just let me know whether it worked after adjusting the start script.
>>
>> -- Reuti
>>
>>
>>>
>>> Thanks,
>>> Srividya
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
