[GE users] OpenMPI job on stay on one node [Solved]

sgexav xaviercouvelard at gmail.com
Mon Sep 7 15:11:15 BST 2009


So, to summarize:

Compile Open MPI with the --with-sge configure option.
Then enable qrsh via ssh:

http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
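
A minimal sketch of the first step, under stated assumptions: the --with-sge option is the one named in the summary, while the install prefix /opt/openmpi and the helper name has_sge_support are made up for illustration. A build with SGE support normally lists a gridengine component in the ompi_info output, and the helper only looks for that string.

```shell
# Build step, run inside the Open MPI source tree (the prefix is an assumption):
#   ./configure --with-sge --prefix=/opt/openmpi
#   make && make install

# Hypothetical helper: reads ompi_info output on stdin and prints "yes"
# if a gridengine component is listed, "no" otherwise.
has_sge_support() {
    if grep -q 'gridengine'; then
        echo "yes"
    else
        echo "no"
    fi
}

# Usage on a host with Open MPI installed:
#   ompi_info | has_sge_support
```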

It works!!!!
Thanks
Xavier

reuti wrote:
> On 07.09.2009 at 15:18, sgexav wrote:
>
>
>> reuti wrote:
>>
>>> On 07.09.2009 at 13:32, sgexav wrote:
>>>
>>>
>>>       
>>>>> <snip>
>>>>> as Lydia wrote: you don't need this argument, just leave the
>>>>> -machinefile ... option out. Open MPI will detect the granted nodes on
>>>>> its own from the original pe_hostfile. The $TMPDIR/machines file would
>>>>> be created by the start_proc_args for other MPI libraries, but it can
>>>>> be left out here, hence the file won't be created.
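
To illustrate this advice, a job script might look like the commented lines in the sketch below. The PE name orte comes from this thread; the slot count, the program name ./my_app and the helper granted_slots are assumptions. The helper only sums the slot column of the pe_hostfile (assumed layout per line: host, slots, queue, processor range) to show what SGE granted. Open MPI itself does not need it.

```shell
# Sketch of a job script (submitted with qsub; names are assumptions):
#   #$ -pe orte 8
#   #$ -cwd
#   mpirun -np $NSLOTS ./my_app      # note: no -machinefile option

# Hypothetical helper: print the total number of slots listed in a
# pe_hostfile (second column of each line).
granted_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Usage inside a running job:
#   granted_slots "$PE_HOSTFILE"
```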
>>>>>
>>>>>
>>>>>           
>>>> OK, doing it that way with "pe orte" and without the machinefile in the
>>>> mpirun command, I see my run starting on the nodes, but I get this error:
>>>> error: error: ending connection before all data received
>>>> error:
>>>> <snip>
>>>> What does it mean?
>>>>
>>>>         
>>> Did you redefine any of the following settings? (Shown here is the 6.2u3
>>> setup with the builtin method; in former versions it was different.)
>>>
>>> $ qconf -sconf
>>> #global:
>>> ...
>>> qlogin_command               builtin
>>> qlogin_daemon                builtin
>>> rlogin_command               builtin
>>> rlogin_daemon                builtin
>>> rsh_command                  builtin
>>> rsh_daemon                   builtin
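
One way to check these six entries at a glance is sketched here. The helper name non_builtin_entries is an assumption; it reads the output of qconf -sconf on stdin and prints each of the six entries whose value is not builtin.

```shell
# Hypothetical helper: filter "qconf -sconf" output (on stdin) down to the
# remote-startup entries that are NOT set to "builtin".
non_builtin_entries() {
    awk '$1 ~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ && $2 != "builtin" {
             $1 = $1      # rebuild the line with single spaces
             print
         }'
}

# Usage on a submit or admin host:
#   qconf -sconf | non_builtin_entries
```

No output would mean that all six entries use the builtin method.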
>>>
>>> -- Reuti
>>>
>>>
>>>       
>> I am using 6.2u2 as delivered with Rocks 5.2.
>> qconf -sconf gave:
>>
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>> but also:
>> qrsh_command                 /usr/bin/ssh
>>     
>
> AFAIK there are no "qrsh_..." entries at all.
>
>   
>> rsh_command                  /usr/bin/ssh
>> rlogin_command               /usr/bin/ssh
>>     
>
> Having only the last three set is not sufficient for an SSH
> integration. And unless SGE is compiled with a special flag, it's not
> a Tight Integration anyway. I don't know why ROCKS includes these
> settings. If you want to go for SSH, you would need:
>
> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>
> You used "qconf -mconf" and the last lines are always added again? Is
> there any local configuration for each node, i.e. does "qconf -sconfl"
> list any entries? When you have a uniform cluster, you can delete them all.
>
> ===
>
> To your second email: "builtin" is a new mechanism, which doesn't need
> any rsh or ssh.
>
> ===
>
> You can have a cluster w/o active rsh and ssh, but still run
> parallel apps by SGE with either "builtin" or the former "rsh-replacement".
> Even for (interactive) qlogin and rlogin, the telnetd and rshd must
> be installed, but they don't need to be activated in /etc/xinetd.d/rsh
> or .../telnet. It is still a Tight Integration w/o the option to be
> bypassed by the user, as for each command a dedicated login daemon
> will be launched.
>
> -- Reuti
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216256
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216257
