[GE users] What's the consequence if I removed these lines from sge_conf

reuti reuti at staff.uni-marburg.de
Wed Jan 6 00:56:40 GMT 2010


On 06.01.2010 at 01:40, kdoman wrote:

> What's the consequence of removing the lines below from sge conf? If I
> don't, we cannot submit any parallel jobs that request "-pe orte"
> greater than 4.
>
> qrsh_command                 /usr/bin/ssh
> rsh_command                  /usr/bin/ssh
> rlogin_command               /usr/bin/ssh

The definition of each *_command must match that of the corresponding
*_daemon. Together they define which mechanism is used to start
interactive jobs or slave tasks. You can have:

Classic rsh startup (e.g. for x86):

qlogin_command               /usr/bin/telnet
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_command               /usr/sge/utilbin/lx24-x86/rlogin
rlogin_daemon                /usr/sbin/in.rlogind
rsh_command                  /usr/sge/utilbin/lx24-x86/rsh
rsh_daemon                   /usr/sge/utilbin/lx24-x86/rshd -l

All builtin:

qlogin_command               builtin
qlogin_daemon                builtin
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin

or ssh according to:

http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
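
As a rough sketch of what an ssh-based setup looks like with matching pairs (the binary paths are assumptions for a typical Linux install, and the exact values should be taken from the howto above, not from this fragment):

```
rlogin_command               /usr/bin/ssh
rlogin_daemon                /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
rsh_daemon                   /usr/sbin/sshd -i
```

The point is only that each ssh *_command is paired with an sshd *_daemon; setting the *_command alone leaves the pair mismatched.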

Within each of the three pairs qlogin_*, rlogin_* and rsh_*, the command
and the daemon must be consistent, but of course each pair can use a
different mechanism from the other pairs.

Also note that these entries can be overridden at the exechost level,
i.e. in its local configuration: qconf -mconf <exechost>
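
To spot a mismatched pair quickly, something like the following could be used. It is only a minimal sketch: check_pairs is a hypothetical helper name, it reads `qconf -sconf` style output on stdin, and it only catches the case where one side of a pair is "builtin" and the other is not. It cannot tell whether an external command and daemon really belong together.

```shell
# check_pairs (hypothetical helper): reads "name value" configuration lines
# on stdin and reports any qlogin/rlogin/rsh pair where only one of
# *_command / *_daemon is set to "builtin". Exit status is 1 on a mismatch.
check_pairs() {
    awk '
        # Remember the first word of the value for each relevant parameter.
        $1 ~ /^(qlogin|rlogin|rsh)_(command|daemon)$/ { val[$1] = $2 }
        END {
            n = split("qlogin rlogin rsh", p, " ")
            bad = 0
            for (i = 1; i <= n; i++) {
                c = val[p[i] "_command"]
                d = val[p[i] "_daemon"]
                if ((c == "builtin") != (d == "builtin")) {
                    printf "MISMATCH %s: command=%s daemon=%s\n", p[i], c, d
                    bad = 1
                }
            }
            exit bad
        }'
}

# Usage on a cluster (node01 is a placeholder host name):
#   qconf -sconf        | check_pairs    # global configuration
#   qconf -sconf node01 | check_pairs    # local configuration of one exechost
```

Running it against both the global and the local configuration matters, because a local entry on one exechost can silently differ from the global one.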

-- Reuti


> Without the above modification, any job submission with -pe orte
> greater than 4 would receive this error:
>
> error: error: ending connection before all data received
> error:
> error reading job context from "qlogin_starter"
> --------------------------------------------------------------------------
> A daemon (pid 2160) died unexpectedly with status 1 while attempting
> to launch so we are aborting.
>
> There may be more information reported by the environment (see above).
>
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
>
>
> Thanks.
> K.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236695
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=236698
