[GE users] OpenMPI job stays on one node [Solved]

sgexav xaviercouvelard at gmail.com
Mon Sep 7 15:19:35 BST 2009



reuti wrote:
> On 07.09.2009 at 16:11, sgexav wrote:
>
>   
>> So, to summarize:
>>
>> Compile Open MPI with the --with-sge option.
>> Then enable qrsh via ssh:
>>
>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>     
>
> Great!
>
> But be aware that with the default SSH you will get wrong
> accounting. If you want (or must) use SSH instead of the builtin or rsh
> method, you would also need to recompile SGE with the option "-tight-ssh"
> to build a custom SSH version, which will honor the accounting.
>
> -- Reuti
>   
Ah,
I am not sure I want to recompile SGE, since it comes with Rocks, and
Rocks is far from very stable.
Would you advise me to enable rsh in Rocks?
Xavier
>> It works!!!!
>> Thanks
>> Xavier
>>
>> reuti wrote:
>>     
>>> On 07.09.2009 at 15:18, sgexav wrote:
>>>
>>>
>>>       
>>>> reuti wrote:
>>>>
>>>>         
>>>>> On 07.09.2009 at 13:32, sgexav wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>>> <snip>
>>>>>>> As Lydia wrote: you don't need this argument, just leave the
>>>>>>> option -machinefile ... out. Open MPI will detect the granted
>>>>>>> nodes on its own from the original pe_hostfile. The
>>>>>>> $TMPDIR/machines file would be created by the start_proc_args
>>>>>>> for other MPI libraries, but that can be left out here, so the
>>>>>>> file won't be created.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> OK, doing it that way with "pe orte" and without the machinefile
>>>>>> in the mpirun command, I see my run starting on the nodes, but I
>>>>>> get this error:
>>>>>> error: error: ending connection before all data received
>>>>>> error:
>>>>>> <snip>
>>>>>> What does it mean?
>>>>>>
>>>>>>
>>>>>>             
>>>>> Did you redefine the following settings? (Shown here is the 6.2u3
>>>>> setup with the builtin method; in former versions it was different.)
>>>>>
>>>>> $ qconf -sconf
>>>>> #global:
>>>>> ...
>>>>> qlogin_command               builtin
>>>>> qlogin_daemon                builtin
>>>>> rlogin_command               builtin
>>>>> rlogin_daemon                builtin
>>>>> rsh_command                  builtin
>>>>> rsh_daemon                   builtin
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>
>>>>>           
>>>> I am using 6.2u2 as delivered with Rocks 5.2.
>>>> qconf -sconf gave:
>>>>
>>>> qlogin_command               builtin
>>>> qlogin_daemon                builtin
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>> but also:
>>>> qrsh_command                 /usr/bin/ssh
>>>>
>>>>         
>>> AFAIK there are no "qrsh_..." entries at all.
>>>
>>>
>>>       
>>>> rsh_command                  /usr/bin/ssh
>>>> rlogin_command               /usr/bin/ssh
>>>>
>>>>         
>>> Having only the last three set is not sufficient for an SSH
>>> integration. And unless SGE is compiled with a special flag, it's not
>>> a Tight Integration anyway. I don't know why ROCKS includes these
>>> settings. If you want to go for SSH, you would need:
>>>
>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>
>>> Did you use "qconf -mconf", and the last lines are always added again?
>>> Is there any local configuration for each node, i.e. does
>>> "qconf -sconfl" have entries? If you have a uniform cluster, you can
>>> delete them all.
>>>
>>> ===
>>>
>>> To your second e-mail: "builtin" is a new mechanism, which doesn't
>>> need any rsh or ssh.
>>>
>>> ===
>>>
>>> You can have a cluster w/o active rsh and ssh, but still run
>>> parallel apps via SGE's "builtin" or the former "rsh-replacement".
>>> Even for (interactive) qlogin and rlogin, telnetd and rshd must
>>> be installed, but they don't need to be activated in
>>> /etc/xinetd.d/rsh or .../telnet. It is still a Tight Integration w/o
>>> the option of being bypassed by the user, as for each command a
>>> dedicated login daemon will be launched.
>>>
>>> -- Reuti
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216256
>>>
>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>
>>>       
>>     
>
>
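
A rough way to check the settings discussed in this thread is to scan the output of "qconf -sconf" for the six remote-startup entries and report any that are not "builtin". The following is only a minimal sketch in Python, not part of SGE or Rocks: the names parse_sge_conf, non_builtin_entries and SAMPLE are hypothetical, the sample text is illustrative rather than output from a real cluster, and the sketch assumes each configuration line is a key followed by whitespace and a value, as in the listings quoted above.

```python
# Sketch only: these helper names are hypothetical, not part of SGE.
REMOTE_KEYS = (
    "qlogin_command", "qlogin_daemon",
    "rlogin_command", "rlogin_daemon",
    "rsh_command", "rsh_daemon",
)


def parse_sge_conf(text):
    """Parse 'key   value' lines (as printed by qconf -sconf) into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the '#global:' header and elided '...' lines.
        if not line or line.startswith("#") or line == "...":
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            # In this sketch a later duplicate key simply overwrites an earlier one.
            conf[parts[0]] = parts[1].strip()
    return conf


def non_builtin_entries(conf):
    """Return the remote-startup entries that are set to something other than 'builtin'."""
    return {k: conf[k] for k in REMOTE_KEYS if k in conf and conf[k] != "builtin"}


# Illustrative sample text (an assumption, not real cluster output).
SAMPLE = """#global:
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               /usr/bin/ssh
rlogin_daemon                builtin
rsh_command                  /usr/bin/ssh
rsh_daemon                   builtin
"""

for key, value in non_builtin_entries(parse_sge_conf(SAMPLE)).items():
    print(f"{key} is set to {value}, not builtin")
```

On a live cluster the text would come from running "qconf -sconf" on a host where that command is available. Per-host local configurations (the ones listed by "qconf -sconfl") are not covered by this sketch and would need to be checked separately.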

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216261

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


