[GE users] What's the consequence if I removed these lines from sge_conf

igardais igardais at yahoo.fr
Wed Jan 6 19:36:00 GMT 2010



OK.
For point 3: the *_command entries are all set to the default builtin setting.
For point 2: we are using Intel MPI (mostly 3.1 and 3.2). Even if it is based on Open MPI, I don't know whether the SGE integration has been ported.
For point 1: does --rsh need to be set? catch_rsh seems to prepend the 'rsh' link to the PATH variable, but given point 2, I don't know whether the default --rsh command is just 'rsh' or its full path.
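
One way to answer the point 1 question empirically is to check, from inside the job script, where the bare name 'rsh' resolves. The following is only a sketch: the helper name resolves_inside is made up for illustration, and it assumes the intercepting link lives directly in $TMPDIR.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the bare command name "$1" resolves
# to a file located directly inside directory "$2".
resolves_inside() {
    found=$(command -v "$1") || return 1
    [ "$(dirname "$found")" = "$2" ]
}

# Inside a job script, $TMPDIR is the per-job directory set by SGE.
# The /tmp fallback is only here so the sketch also runs outside a job.
if resolves_inside rsh "${TMPDIR:-/tmp}"; then
    echo "rsh is intercepted"
else
    echo "rsh is NOT intercepted (a system rsh or an absolute path would be used)"
fi
```

If the MPI launcher is given an absolute path such as /usr/bin/ssh, the PATH lookup never happens, so this check says nothing about that case.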

Ionel


On 6 Jan 2010 at 18:52, reuti wrote:

> Hi,
> 
> On 06.01.2010 at 07:53, igardais wrote:
> 
>> What about rsh interception when using "builtin" commands ?
>> All my mpi scripts specify "--rsh=/usr/bin/ssh" to use the classic  
>> key-based password-less login but with little control over the job.
> 
> Three things.
> 
> First: correct. This absolute path will bypass any job control
> imposed by SGE. The idea behind -catch_rsh in the PE definition is:
> 
> - SGE will create a link called "rsh" in $TMPDIR on the master node
> of the parallel job, which will point to SGE's rsh-wrapper. It's
> important to realize that at this point "rsh" is just a
> name and is not related to any startup mechanism at all. You can even
> tell your application "--rsh=fubar" and create a link called "fubar"
> in $TMPDIR. This is usually done in the start_proc_args defined in
> the PE.
> 
> - Then SGE's rsh-wrapper will be called, which will use
> "qrsh -inherit ..." to get to the other slave tasks.
> 
> - The "qrsh -inherit ..." will use one of the 3 startup
> mechanisms mentioned below. For rsh and ssh a dedicated daemon rshd/sshd will
> be started by SGE on a dedicated port just for this one call. It's  
> not necessary to have sshd/rshd running all the time. This way you  
> can have a cluster where no user can login to a node but can still  
> use this way to start tasks between the nodes.
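
The link mechanism described above can be tried outside of SGE with a small sketch. Everything here is a stand-in: the temporary directory emulates the job's $TMPDIR, wrapper.sh is a hypothetical stub that only prints the "qrsh -inherit" call instead of running it, and node01 is a made-up host name.

```shell
#!/bin/sh
# Emulate the per-job $TMPDIR (assumption: SGE would normally provide it).
JOB_TMPDIR=$(mktemp -d)

# Hypothetical stand-in for SGE's rsh-wrapper: it only prints the qrsh
# call that a real wrapper would execute for the given host and command.
cat > "$JOB_TMPDIR/wrapper.sh" <<'EOF'
#!/bin/sh
host=$1; shift
echo "qrsh -inherit $host $*"
EOF
chmod +x "$JOB_TMPDIR/wrapper.sh"

# The link name is arbitrary: "rsh", "ssh" or even "fubar".
ln -s "$JOB_TMPDIR/wrapper.sh" "$JOB_TMPDIR/fubar"

# With the job directory first in PATH, the bare name resolves to the link.
PATH="$JOB_TMPDIR:$PATH"
result=$(fubar node01 hostname)
echo "$result"

rm -rf "$JOB_TMPDIR"
```

The point of the sketch is only that the application must be given the bare name (here "fubar"), not an absolute path, for the lookup to land on the link.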
> 
> Second: Did you compile Open MPI with --with-sge? Then --rsh  
> shouldn't have any effect at all, as Open MPI will detect  
> automatically that it's running under SGE.
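
Whether an Open MPI build includes SGE support can usually be checked by looking for a gridengine component in the output of ompi_info. A minimal sketch, with a hypothetical helper name that reads the listing on stdin:

```shell
#!/bin/sh
# Hypothetical helper: reads a component listing on stdin and succeeds
# if any line mentions a "gridengine" component.
has_gridengine_component() {
    grep -q 'gridengine'
}

# Usage on a machine where Open MPI is installed:
#   ompi_info | has_gridengine_component && echo "SGE support compiled in"
```

This applies to Open MPI only; it makes no statement about other MPI implementations.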
> 
> Third: As said, the entries for rsh_command and rsh_daemon must
> match. When only the *_commands are defined, the *_daemons will have
> a default. When there is a mismatch, an rsh might try to contact an
> sshd, or the -builtin- mechanism an rshd. None of this will work. Best
> is to include both entries of the pair.
> 
> -- Reuti
> 
> 
>> I'm considering rsh-interception but my first attempts (a few years
>> back now) were unsuccessful.
>> 
>> Any hints ?
>> 
>> Thanks,
>> Regards,
>> Ionel
>> 
>> 
>> From: reuti <reuti at staff.uni-marburg.de>
>> To: users at gridengine.sunsource.net
>> Sent: Wed 6 January 2010, 01:56:40
>> Subject: Re: [GE users] What's the consequence if I removed these
>> lines from sge_conf
>> 
>> On 06.01.2010 at 01:40, kdoman wrote:
>> 
>>> What's the consequence of removing the lines below from sge conf?
>>> If I don't, we cannot submit any parallel jobs that request
>>> "-pe orte" greater than 4.
>>> 
>>> qrsh_command                /usr/bin/ssh
>>> rsh_command                  /usr/bin/ssh
>>> rlogin_command              /usr/bin/ssh
>> 
>> The definition of the *_command must match the one of the
>> *_daemon. It defines what mechanism will be used to start interactive
>> jobs or slave tasks. You can have:
>> 
>> Classic rsh startup (e.g. for x86):
>> 
>> qlogin_command              /usr/bin/telnet
>> qlogin_daemon                /usr/sbin/in.telnetd
>> rlogin_command              /usr/sge/utilbin/lx24-x86/rlogin
>> rlogin_daemon                /usr/sbin/in.rlogind
>> rsh_command                  /usr/sge/utilbin/lx24-x86/rsh
>> rsh_daemon                  /usr/sge/utilbin/lx24-x86/rshd -l
>> 
>> All builtin:
>> 
>> qlogin_command              builtin
>> qlogin_daemon                builtin
>> rlogin_command              builtin
>> rlogin_daemon                builtin
>> rsh_command                  builtin
>> rsh_daemon                  builtin
>> 
>> or ssh according to:
>> 
>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>> 
>> Each of the three pairs qlogin_*, rlogin_* and rsh_* must be consistent
>> within itself, but the pairs can of course differ from each other.
>> 
>> Also note that these entries can be overridden at the exechost
>> level, i.e. in its local configuration: qconf -mconf <exechost>
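
The pairing rule can be checked mechanically. The sketch below is an illustration only: check_pairs is a made-up helper that reads configuration text on stdin (for example the output of qconf -sconf) and merely compares whether both halves of each pair are "builtin", external, or unset. It does not verify that an external command actually matches its daemon.

```shell
#!/bin/sh
# Hypothetical helper: reads "name value" configuration lines on stdin and
# reports every qlogin/rlogin/rsh pair whose command and daemon differ in
# kind (builtin vs. external vs. unset). Exit status 1 means a mismatch.
check_pairs() {
    awk '
        $1 ~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ {
            seen[$1] = ($2 == "builtin") ? "builtin" : "external"
        }
        END {
            n = split("qlogin rlogin rsh", names, " ")
            bad = 0
            for (i = 1; i <= n; i++) {
                c = seen[names[i] "_command"]
                d = seen[names[i] "_daemon"]
                if (c != d) {
                    printf "%s: command is %s, daemon is %s\n", names[i], (c ? c : "unset"), (d ? d : "unset")
                    bad = 1
                }
            }
            exit bad
        }'
}

# Usage on a host with SGE installed (global, then one exechost):
#   qconf -sconf | check_pairs
#   qconf -sconf <exechost> | check_pairs
```

A local exechost configuration usually lists only the entries it overrides, so a pair reported as half "unset" there may still be completed by the global configuration.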
>> 
>> -- Reuti
>> 
>> 
>>> Without the above modification, any job submission with -pe orte
>>> greater than 4 would receive this error:
>>> 
>>> error: error: ending connection before all data received
>>> error:
>>> error reading job context from "qlogin_starter"
>>> 
>>> --------------------------------------------------------------------------
>>> A daemon (pid 2160) died unexpectedly with status 1 while attempting
>>> to launch so we are aborting.
>>> 
>>> There may be more information reported by the environment (see above).
>>> 
>>> This may be because the daemon was unable to find all the needed shared
>>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>>> location of the shared libraries on the remote nodes and this will
>>> automatically be forwarded to the remote nodes.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> mpirun: clean termination accomplished
>>> 
>>> 
>>> Thanks.
>>> K.
>>> 
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?
>>> dsForumId=38&dsMessageId=236695
>>> 
>>> To unsubscribe from this discussion, e-mail: [users-
>>> unsubscribe at gridengine.sunsource.net].
>> 
>> 
>> 
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236895

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


