[GE users] OpenMPI job only stays on one node

reuti reuti at staff.uni-marburg.de
Mon Sep 7 14:42:17 BST 2009



On 07.09.2009 at 15:18, sgexav wrote:

> reuti wrote:
>> On 07.09.2009 at 13:32, sgexav wrote:
>>
>>
>>>> <snip>
>>>> As Lydia wrote: you don't need this argument; just leave the option
>>>> -machinefile ... out. Open MPI will detect the granted nodes on its
>>>> own from the original pe_hostfile. The $TMPDIR/machines file would be
>>>> created by the start_proc_args for other MPI libraries, but that step
>>>> can be left out here, so the file won't be created.
>>>>
>>>>
>>> OK, doing it that way with "pe orte" and without the machinefile in
>>> the mpirun command, I see my run starting on the nodes, but I get
>>> this error:
>>> error: error: ending connection before all data received
>>> error:
>>> <snip>
>>> What does it mean?
>>>
>>
>> Did you redefine these settings? (Shown here is the 6.2u3 setup with
>> the builtin method; in former versions it was different.)
>>
>> $ qconf -sconf
>> #global:
>> ...
>> qlogin_command               builtin
>> qlogin_daemon                builtin
>> rlogin_command               builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                   builtin
>>
>> -- Reuti
>>
>>
> I am using 6.2u2, delivered with Rocks 5.2.
> qconf -sconf gave:
>
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
> but also:
> qrsh_command                 /usr/bin/ssh

AFAIK there are no "qrsh_..." entries at all.

> rsh_command                  /usr/bin/ssh
> rlogin_command               /usr/bin/ssh

Having only the last three set is not sufficient for an SSH
integration. And unless SGE is compiled with a special flag, it's not
a Tight Integration anyway. I don't know why ROCKS includes these
settings. If you want to go for SSH, you would need:

http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html

Did you use "qconf -mconf", and are the last lines always added again?
Is there any local configuration for each node, i.e. does "qconf -sconfl"
show entries? When you have a uniform cluster, you can delete them all.
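A minimal Python sketch of checking such a configuration, assuming plain "key value" lines as printed by "qconf -sconf" (continuation lines ending in a backslash are not handled). The function names are hypothetical. It flags keys that are set more than once (as rsh_command and rlogin_command are in the output above), entries named qrsh_*, and command/daemon pairs where only one side is "builtin". That last rule is an assumption drawn from this thread, not quoted from the SGE documentation.

```python
from typing import Dict, List

# Remote-startup settings come in command/daemon pairs in the SGE config.
PAIRS = ("qlogin", "rlogin", "rsh")


def parse_sconf(text: str) -> Dict[str, List[str]]:
    """Collect every value seen per key in 'qconf -sconf'-style text.

    Assumes one 'key  value' entry per line; comment lines ('#...') and
    lines without a value are skipped. A key may appear more than once,
    so all of its values are kept in order.
    """
    entries: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        entries.setdefault(parts[0], []).append(parts[1].strip())
    return entries


def check_remote_startup(text: str) -> List[str]:
    """Return human-readable warnings about the remote-startup entries."""
    entries = parse_sconf(text)
    problems: List[str] = []
    for key, values in entries.items():
        # Per the thread, there are no "qrsh_..." entries in the config.
        if key.startswith("qrsh_"):
            problems.append(f"{key}: not a known configuration entry")
        if len(set(values)) > 1:
            problems.append(f"{key} is set more than once: {values}")
    for name in PAIRS:
        cmd = entries.get(f"{name}_command")
        daemon = entries.get(f"{name}_daemon")
        if not cmd or not daemon:
            problems.append(f"{name}: command and daemon should both be set")
            continue
        # Assumption: 'builtin' only works when both sides of a pair use it.
        # The last value seen for each key is the one compared.
        if (cmd[-1] == "builtin") != (daemon[-1] == "builtin"):
            problems.append(
                f"{name}: command '{cmd[-1]}' does not match daemon '{daemon[-1]}'"
            )
    return problems


if __name__ == "__main__":
    sample = """\
qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
qrsh_command                 /usr/bin/ssh
rsh_command                  /usr/bin/ssh
rlogin_command               /usr/bin/ssh
"""
    for warning in check_remote_startup(sample):
        print(warning)
```

Run against the output quoted above, it reports the qrsh_command entry, the duplicated rsh_command and rlogin_command keys, and the rsh and rlogin pairs whose command is /usr/bin/ssh while the daemon is still builtin.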

===

To your second e-mail: "builtin" is a new mechanism, which needs
neither rsh nor ssh.

===

You can have a cluster without active rsh and ssh, but still run
parallel apps under SGE using either "builtin" or the former
"rsh-replacement". Even for (interactive) qlogin and rlogin, where
telnetd and rshd must be installed, they don't need to be activated in
/etc/xinetd.d/rsh or .../telnet. It's still a Tight Integration that
the user can't bypass, as a dedicated login daemon is launched for
each command.

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=216256
