[GE users] OpenMPI job stays on one node [Solved]

reuti reuti at staff.uni-marburg.de
Mon Sep 7 15:15:42 BST 2009



On 07.09.2009 at 16:11, sgexav wrote:

> So, to summarize:
>
> Compile Open MPI with the --with-sge option.
> Then enable qrsh via ssh:
>
> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html

Great!

But be aware that with the default SSH the accounting will be wrong.
If you want (or must) use SSH instead of the builtin or rsh method,
you would also need to recompile SGE with the option "-tight-ssh",
which builds a custom SSH version that honors the accounting.

-- Reuti


> It works!!!!
> Thanks
> Xavier
>
> reuti wrote:
>> On 07.09.2009 at 15:18, sgexav wrote:
>>
>>> reuti wrote:
>>>
>>>> On 07.09.2009 at 13:32, sgexav wrote:
>>>>
>>>>>> <snip>
>>>>>> As Lydia wrote: you don't need this argument, just leave the
>>>>>> option -machinefile ... out. Open MPI will detect the granted
>>>>>> nodes on its own from the original pe_hostfile. The
>>>>>> $TMPDIR/machines file would be created by the start_proc_args
>>>>>> for other MPI libraries, but that step can be left out here, so
>>>>>> the file won't be created.
>>>>>>
>>>>>>
>>>>>>
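A minimal sketch of what "detecting the granted nodes from the pe_hostfile" amounts to, assuming the usual pe_hostfile layout of one line per host, with the host name in the first column and the granted slot count in the second. The function name granted_slots and the sample host names are made up for illustration; inside a real job, the file is the one SGE names in $PE_HOSTFILE.

```shell
#!/bin/sh
# Hypothetical helper: list each granted host with its slots, then the total.
# Assumes each pe_hostfile line looks like:
#   <host> <slots> <queue> <processor range>
granted_slots() {
    awk '{ print $1 ": " $2 " slots"; total += $2 }
         END { print "total: " (total + 0) }' "$1"
}

# Sample file standing in for $PE_HOSTFILE (host names are invented).
sample=$(mktemp)
printf '%s\n' \
    'node01 4 all.q@node01 UNDEFINED' \
    'node02 4 all.q@node02 UNDEFINED' > "$sample"

granted_slots "$sample"
rm -f "$sample"
```

With Open MPI built with SGE support, mpirun does this lookup itself, so the job script only needs something like "mpirun -np $NSLOTS ./your_app" (the program name is a placeholder).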
>>>>> OK, doing it that way with "pe orte" and without the machinefile
>>>>> in the mpirun command, I see my run starting on the nodes, but I
>>>>> get this error:
>>>>> error: error: ending connection before all data received
>>>>> error:
>>>>> <snip>
>>>>> What does it mean?
>>>>>
>>>>>
>>>> Did you redefine the following settings? (Shown here is the 6.2u3
>>>> setup with the builtin method; in former versions it was different.)
>>>>
>>>> $ qconf -sconf
>>>> #global:
>>>> ...
>>>> qlogin_command               builtin
>>>> qlogin_daemon                builtin
>>>> rlogin_command               builtin
>>>> rlogin_daemon                builtin
>>>> rsh_command                  builtin
>>>> rsh_daemon                   builtin
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>
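One way to check these six settings quickly is to filter the qconf output, sketched here under the assumption that every entry is printed as a name followed by its value on one line. The function name check_builtin is hypothetical; it reads the configuration text on standard input, so it can be fed from "qconf -sconf" or from a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if all six remote-startup entries exist
# and every occurrence of them is set to "builtin".
check_builtin() {
    awk '
        $1 ~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ {
            seen[$1] = 1
            if ($2 != "builtin") {
                print "not builtin: " $1 " " $2
                bad = 1
            }
        }
        END {
            n = 0
            for (k in seen) n++
            if (bad || n < 6) exit 1
        }
    '
}

# On a submit or admin host (requires a working SGE installation):
#   qconf -sconf | check_builtin && echo "all builtin"
```

A second rsh_command or rlogin_command line pointing at /usr/bin/ssh, as in the Rocks output quoted below, makes the check fail and is reported on standard output.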
>>> I am using 6.2u2 as delivered with Rocks 5.2.
>>> qconf -sconf gave:
>>>
>>> qlogin_command               builtin
>>> qlogin_daemon                builtin
>>> rlogin_command               builtin
>>> rlogin_daemon                builtin
>>> rsh_command                  builtin
>>> rsh_daemon                   builtin
>>> but also:
>>> qrsh_command                 /usr/bin/ssh
>>>
>>
>> AFAIK there are no "qrsh_..." entries at all.
>>
>>
>>> rsh_command                  /usr/bin/ssh
>>> rlogin_command               /usr/bin/ssh
>>>
>>
>> Having only the last three set is not sufficient for an SSH
>> integration. And unless SGE is compiled with a special flag, it's not
>> a Tight Integration anyway. I don't know why ROCKS includes these
>> settings. If you want to go for SSH, you would need:
>>
>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>
>> Did you use "qconf -mconf", and the last lines are always added
>> again? Is there a local configuration for each node, i.e. does
>> "qconf -sconfl" have entries? If you have a uniform cluster, you can
>> delete them all.
>>
>> ===
>>
>> Regarding your second email: "builtin" is a new mechanism, which
>> doesn't need any rsh or ssh.
>>
>> ===
>>
>> You can have a cluster without active rsh and ssh and still run
>> parallel apps through SGE, using either "builtin" or the former
>> "rsh-replacement". Even for (interactive) qlogin and rlogin, telnetd
>> and rshd must be installed, but they don't need to be activated in
>> /etc/xinetd.d/rsh or .../telnet. It is still a Tight Integration that
>> the user cannot bypass, as a dedicated daemon is launched for the
>> login of each command.
>>
>> -- Reuti
>>
>> ------------------------------------------------------
>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216256
>>
>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216257
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216260
