[GE users] prevent users from executing jobs on nodes except via sungrid

Jerry Mersel jerry.mersel at weizmann.ac.il
Thu Mar 30 15:34:21 BST 2006

Thanks Reuti:

  You can probably tell I'm getting kind of desperate with this,
  and I am.

  I stopped running the system rshd.
  I tried it with a conventional Ethernet switch, and with an MPICH that doesn't
  come from Voltaire, built with <SGEROOT>/utilbin/lx24-amd64/rsh as its rsh command.

  If I run on one node it works (it doesn't run on the master); on two or
  more nodes I get "Connection refused".

  I have looked into <sge>/mpi.

  The PE that I am using:

pe_name           mlmersel
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh
stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min
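For tight integration with MPICH, the PE relies on two settings that the definition above already has: `control_slaves TRUE`, so that slave tasks are started by SGE itself (via `qrsh -inherit`), and `startmpi.sh -catch_rsh`, which sets up the `$TMPDIR/rsh` wrapper that intercepts mpirun's rsh calls. A minimal sketch of a check, assuming the PE text comes from `qconf -sp mlmersel` or a saved copy of it; `check_pe` is a hypothetical helper, not part of SGE:

```shell
#!/bin/sh
# check_pe: hypothetical helper (not part of SGE).
# Reads a PE definition, as printed by `qconf -sp <pe>`, on stdin and
# reports whether the two settings tight MPICH integration relies on are set.
# Returns 0 if both are present, 1 otherwise.
check_pe() {
    pe=$(cat)
    rc=0
    # control_slaves TRUE: SGE starts the slave tasks itself (qrsh -inherit)
    if printf '%s\n' "$pe" | grep -Eq '^control_slaves[[:space:]]+TRUE'; then
        echo "ok: control_slaves TRUE"
    else
        echo "missing: control_slaves TRUE"
        rc=1
    fi
    # -catch_rsh: startmpi.sh links $TMPDIR/rsh to SGE's rsh wrapper
    if printf '%s\n' "$pe" | grep -Eq '^start_proc_args.*startmpi\.sh.*-catch_rsh'; then
        echo "ok: startmpi.sh -catch_rsh"
    else
        echo "missing: startmpi.sh -catch_rsh"
        rc=1
    fi
    return $rc
}

# Usage, on a host with the SGE tools in PATH:
#   qconf -sp mlmersel | check_pe
```

This only inspects the PE text; it says nothing about whether mpirun actually goes through the wrapper at run time.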

The queue config:

qname                 all.q
hostlist              @allhosts
seq_no                0
load_thresholds       np_load_avg=1.75
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:05:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             NONE
pe_list               jerry mlmersel mpi mymake
rerun                 FALSE
slots                 2,[wiccopt-2.weizmann.ac.il=2], \
                      [wiccopt-3.weizmann.ac.il=2], \
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            NONE
xuser_lists           NONE
subordinate_list      NONE
complex_values        NONE
projects              NONE
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 INFINITY
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY

The cluster config:

execd_spool_dir              /shared/SGE/default/spool
mailer                       /bin/mail
xterm                        /usr/bin/X11/xterm
load_sensor                  none
prolog                       none
epilog                       none
shell_start_mode             posix_compliant
login_shells                 sh,ksh,csh,tcsh
min_uid                      0
min_gid                      0
user_lists                   none
xuser_lists                  none
projects                     none
xprojects                    none
enforce_project              false
enforce_user                 auto
load_report_time             00:00:40
max_unheard                  00:05:00
reschedule_unknown           00:00:00
loglevel                     log_warning
administrator_mail           mlmersel at wicc.weizmann.ac.il
set_token_cmd                none
pag_cmd                      none
token_extend_time            none
shepherd_cmd                 none
qmaster_params               none
execd_params                 none
reporting_params             accounting=true reporting=false \
                             flush_time=00:00:15 joblog=false
finished_jobs                100
gid_range                    20000-20200
qlogin_command               telnet
qlogin_daemon                /usr/sbin/in.telnetd
rlogin_daemon                /usr/sbin/in.rlogind
max_aj_instances             2000
max_aj_tasks                 75000
max_u_jobs                   0
max_jobs                     0
auto_user_oticket            0
auto_user_fshare             0
auto_user_default_project    none
auto_user_delete_time        86400
delegated_file_staging       false
reprioritize                 0
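The configuration above sets only qlogin_command, qlogin_daemon and rlogin_daemon, and leaves rsh_command and rsh_daemon unset; as far as I understand the sge_conf(5) defaults in 6.0, that means qrsh falls back to SGE's own rsh and rshd from utilbin. A small sketch for pulling just these entries out of the full configuration, assuming the input is the output of `qconf -sconf`; `show_remote_entries` is a hypothetical helper name:

```shell
#!/bin/sh
# show_remote_entries: hypothetical helper (not part of SGE).
# Filters `qconf -sconf` output, read on stdin, down to the
# qlogin_*, rlogin_* and rsh_* command/daemon settings.
# Exits non-zero (like grep) if none of them are set.
show_remote_entries() {
    grep -E '^(qlogin|rlogin|rsh)_(command|daemon)[[:space:]]'
}

# Usage:
#   qconf -sconf | show_remote_entries
```

If rsh_command or rsh_daemon show up in the output, they override SGE's built-in rsh for qrsh.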

The script:

# ---------------------------
# our name
#$ -N MPI_Job
# pe request
#$ -pe mlmersel 2-8
# needs in
#   $NSLOTS
#       the number of tasks to be used
#   $TMPDIR/machines
#       a valid machine file to be passed to mpirun

echo "Got $NSLOTS slots."

# enables $TMPDIR/rsh to catch rsh calls if available
set path=($TMPDIR /shared/mpich/bin $path)

/shared/mpich/bin/mpirun -np 2 -machinefile $TMPDIR/machines
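Since the job only fails once two or more nodes are involved, it may help to print from inside the job what the PE start procedure actually left in `$TMPDIR`. The `$TMPDIR/rsh` wrapper is only found if mpirun calls plain `rsh` through PATH, so the sketch also prints which rsh the job resolves. It is written in sh syntax; because the queue's shell is /bin/csh, it would need `#$ -S /bin/sh` in the job script or a csh translation. `show_pe_env` is a hypothetical helper name:

```shell
#!/bin/sh
# show_pe_env: hypothetical diagnostic helper (not part of SGE).
# Prints what the PE start procedure left in $TMPDIR; call it
# from the job script right before mpirun.
show_pe_env() {
    echo "NSLOTS=${NSLOTS:-unset}"
    # startmpi.sh -catch_rsh links $TMPDIR/rsh to SGE's rsh wrapper
    if [ -x "$TMPDIR/rsh" ]; then
        echo "rsh wrapper: $TMPDIR/rsh"
    else
        echo "rsh wrapper: missing"
    fi
    # the first rsh found in PATH is the one a plain `rsh` call would use
    echo "rsh in PATH: $(command -v rsh || echo none)"
    # distinct hosts listed in the machine file (one line per slot)
    if [ -r "$TMPDIR/machines" ]; then
        echo "hosts:"
        sort -u "$TMPDIR/machines"
    else
        echo "machines file: missing"
    fi
}
```

If "rsh in PATH" does not point at `$TMPDIR/rsh`, or mpirun uses a hard-coded rsh path, the calls bypass the wrapper and go straight to a remote rshd.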

Thanks again,


> Hi,
> On 29.03.2006 at 22:23, Jerry Mersel wrote:
>> Thank you for your quick reply.
>> I think I need someone to walk me through this problem of
>> mine.
>> If anyone can do that I would really appreciate it.
>> Here's the problem:
>>   I want to run parallel applications using MPICH (we have a Voltaire
>> InfiniBand switch), and I want to avoid using <home>/.ssh/authorized_keys2
>> or <home>/.rhosts. To do that I was told that I would have to use
>> "tight integration" between MPICH and SGE. I was also told that SGE
>> does not support tight integration with ssh, so I would have to use rsh.
>> That's fine by me.
> In the tight integration, rsh is still handled by SGE's own private daemon.
> You don't need any rsh daemon running on the system.
>> I tried the examples, set up the PE accordingly, but nothing seems
>> to help.
> What is your setup of the defined queues and PEs? Have you looked into the
> $SGE_ROOT/mpi directory? Could you also first try it with a
> conventional Ethernet connection instead of InfiniBand?
> -- Reuti
>> Please, oh please walk me through this.
>>                                       Regards,
>>                                          Jerry
>>> Hi,
>>> On 29.03.2006 at 14:03, Jerry Mersel wrote:
>>>> Hi:
>>>>   Since I need to use tight integration with my PE, and
>>>> therefore rsh, what should I set the cluster
>>>> configuration to, in particular:
>>>>  qlogin daemon
>>>>  qlogin command,
>>>>  rsh daemon,
>>>>  rsh command,
>>>>  rlogin daemon,
>>>>  rlogin command
>>>> Just guessing, but are they:
>>>>    <sge_Root>/bin/<ARC>/rshd - rsh daemon
>>>>    <sge_Root>/bin/<ARC>/rshd - rlogin daemon
>>>>    <sge_Root>/bin/<ARC>/rsh - rsh command
>>> By default, the values supplied during the installation of SGE are
>>> fine, so you could remove some of them and keep only the default
>>> entries:
>>> qlogin_command               telnet
>>> qlogin_daemon                /usr/sbin/in.telnetd
>>> rlogin_daemon                /usr/sbin/in.rlogind
>>> in qconf -sconf
>>> HTH - Reuti
>>>> Do the qlogin daemon and command need to be set? What about the
>>>> shepherd command, or anything else?
>>>>                        Thanks,
>>>>                         Jerry
>>>>> That means sshd is reading the default config file!
>>>>> So if you disable login for the system sshd, the one launched by
>>>>> SGE will also get the same configuration.
>>>>> The way to get this to work is to copy the configuration to
>>>>> another location, and then disable login in the default one.
>>>>> Then add "-f <path to the backup config file>" to rsh_daemon and
>>>>> rlogin_daemon in your cluster configuration:
>>>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>>> Note that when you use SSH, the PE is no longer tightly integrated.
>>>>> Tight SSH integration is enabled in the main trunk, and may be
>>>>> back-ported to V6.0:
>>>>> http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=9051
>>>>>  -Ron
>>>>> --- Jerry Mersel <jerry.mersel at weizmann.ac.il> wrote:
>>>>>> Thank you for your quick response.
