[GE users] prevent users from executing jobs on nodes except via sungrid

Reuti reuti at staff.uni-marburg.de
Fri Mar 31 08:52:26 BST 2006


Hi,

On 30.03.2006, at 16:34, Jerry Mersel wrote:

> Thanks Reuti:
>
>   You could probably tell I'm getting kinda desperate with this,
>   and so I am.
>
>   I stopped running the system rshd.
>   I tried it with a conventional switch, and with an MPICH that doesn't
>   come from Voltaire, and built it with <SGEROOT>/utilbin/lx24-amd64/rsh
>   as the RSHCOMMAND.
>

no, this way the wrapper won't work: with the full path compiled in,
MPICH bypasses the $TMPDIR/rsh wrapper. Please recompile it with a
simple switch:

-rsh=rsh

in the MPICH ./configure - although this option is deprecated.
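
For illustration, a rebuild along those lines could look like this (the
source tree name is an assumption; the prefix matches the install path
used elsewhere in this thread):

   # Rebuild MPICH1 so RSHCOMMAND is the plain word "rsh"; the
   # $TMPDIR/rsh wrapper created by startmpi.sh is then found via PATH.
   cd mpich-1.2.7                     # assumed source directory
   ./configure -rsh=rsh -prefix=/shared/mpich
   make
   make install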

Cheers - Reuti


>   If I run on one node it works (it doesn't run on the master); on two
>   or more nodes I get "Connection refused".
>
>   I have looked into <sge>/mpi.
>
>   The PE that I am using:
>
> pe_name           mlmersel
> slots             999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh \
>                   -catch_rsh $pe_hostfile
> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
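
A quick way to double-check a PE like this (a sketch; the PE name is
taken from the quoted config):

   qconf -sp mlmersel     # show the PE definition
   qconf -mp mlmersel     # edit it; control_slaves TRUE is what lets
                          # the rsh wrapper's qrsh -inherit start slaves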
>
> The queue config:
>
> qname                 all.q
> hostlist              @allhosts
> seq_no                0
> load_thresholds       np_load_avg=1.75
> suspend_thresholds    NONE
> nsuspend              1
> suspend_interval      00:05:00
> priority              0
> min_cpu_interval      00:05:00
> processors            UNDEFINED
> qtype                 BATCH INTERACTIVE
> ckpt_list             NONE
> pe_list               jerry mlmersel mpi mymake
> rerun                 FALSE
> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>                       [wiccopt-3.weizmann.ac.il=2], \
>                       [wiccopt-4.weizmann.ac.il=2], \
>                       [wiccopt-1.weizmann.ac.il=2]
> tmpdir                /tmp
> shell                 /bin/csh
> prolog                NONE
> epilog                NONE
> shell_start_mode      posix_compliant
> starter_method        NONE
> suspend_method        NONE
> resume_method         NONE
> terminate_method      NONE
> notify                00:00:60
> owner_list            NONE
> user_lists            NONE
> xuser_lists           NONE
> subordinate_list      NONE
> complex_values        NONE
> projects              NONE
> xprojects             NONE
> calendar              NONE
> initial_state         default
> s_rt                  INFINITY
> h_rt                  INFINITY
> s_cpu                 INFINITY
> h_cpu                 INFINITY
> s_fsize               INFINITY
> h_fsize               INFINITY
> s_data                INFINITY
> h_data                INFINITY
> s_stack               INFINITY
> h_stack               INFINITY
> s_core                INFINITY
> h_core                INFINITY
> s_rss                 INFINITY
> h_rss                 INFINITY
> s_vmem                INFINITY
> h_vmem                INFINITY
>
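
The queue can be inspected and adjusted the same way (illustrative
commands, nothing specific to this site):

   qconf -sq all.q        # show the queue configuration
   qconf -mq all.q        # edit it, e.g. to confirm the PE appears in
                          # pe_list on every host that should run MPI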
> the cluster config:
>
> execd_spool_dir              /shared/SGE/default/spool
> mailer                       /bin/mail
> xterm                        /usr/bin/X11/xterm
> load_sensor                  none
> prolog                       none
> epilog                       none
> shell_start_mode             posix_compliant
> login_shells                 sh,ksh,csh,tcsh
> min_uid                      0
> min_gid                      0
> user_lists                   none
> xuser_lists                  none
> projects                     none
> xprojects                    none
> enforce_project              false
> enforce_user                 auto
> load_report_time             00:00:40
> max_unheard                  00:05:00
> reschedule_unknown           00:00:00
> loglevel                     log_warning
> administrator_mail           mlmersel at wicc.weizmann.ac.il
> set_token_cmd                none
> pag_cmd                      none
> token_extend_time            none
> shepherd_cmd                 none
> qmaster_params               none
> execd_params                 none
> reporting_params             accounting=true reporting=false \
>                              flush_time=00:00:15 joblog=false \
>                              sharelog=00:00:00
> finished_jobs                100
> gid_range                    20000-20200
> qlogin_command               telnet
> qlogin_daemon                /usr/sbin/in.telnetd
> rlogin_daemon                /usr/sbin/in.rlogind
> max_aj_instances             2000
> max_aj_tasks                 75000
> max_u_jobs                   0
> max_jobs                     0
> auto_user_oticket            0
> auto_user_fshare             0
> auto_user_default_project    none
> auto_user_delete_time        86400
> delegated_file_staging       false
> reprioritize                 0
>
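
For reference, these global settings are what qconf manages (stock SGE
commands only):

   qconf -sconf           # show the global cluster configuration
   qconf -mconf           # edit it; note that rsh_command/rsh_daemon
                          # are absent above, so SGE's private rsh
                          # mechanism is used for qrsh -inherit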
>
>
> the script:
>
> # ---------------------------
> # our name
> #$ -N MPI_Job
> #
> # pe request
> #$ -pe mlmersel 2-8
> #
> #$ -v MPICH_PROCESS_GROUP=no
> #
> # needs in
> #   $NSLOTS
> #       the number of tasks to be used
> #   $TMPDIR/machines
> #       a valid machine file to be passed to mpirun
>
> echo "Got $NSLOTS slots."
>
> # enables $TMPDIR/rsh to catch rsh calls if available
> set path=($TMPDIR /shared/mpich/bin $path)
>
> /shared/mpich/bin/mpirun -np 2 -machinefile $TMPDIR/machines \
>   /shared/mpich/examples/cpi
>
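
One detail worth flagging in the script above: the PE request is 2-8
slots, but mpirun is pinned to -np 2. The usual pattern is to pass the
granted slot count instead (a sketch; the file name is illustrative):

   qsub mpi_job.sh        # the embedded #$ lines carry the PE request

   # inside the script, let SGE's grant drive the task count:
   /shared/mpich/bin/mpirun -np $NSLOTS \
       -machinefile $TMPDIR/machines /shared/mpich/examples/cpi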
> Thanks again,
>
>                               Regards,
>                                  Jerry
>
>
>
>
>
>
>
>> Hi,
>>
>> On 29.03.2006, at 22:23, Jerry Mersel wrote:
>>
>>> Thank you for your quick reply.
>>>
>>> I think I need someone to help me to walk through this problem of
>>> mine.
>>> If anyone can do that I would really appreciate it.
>>>
>>> Here's the problem:
>>>
>>>   I want to run parallel applications using MPICH (we have a
>>> Voltaire Infiniband switch), and I want to avoid using
>>> <home>/.ssh/authorized_keys2 or <home>/.rhosts. To do that I was
>>> told that I would have to use "tight integration" between MPICH and
>>> SGE. I was also told that SGE does not support tight integration
>>> with ssh, so I would have to use rsh. That's fine by me.
>>
>> the rsh is still a private daemon of SGE in the tight integration.
>> You don't need any running daemon for it in the system.
>>
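
Concretely, the -catch_rsh argument to startmpi.sh makes the stock
$SGE_ROOT/mpi scripts put a fake rsh first in the job's PATH; a
simplified sketch of what they do:

   # startmpi.sh -catch_rsh, roughly:
   ln -s $SGE_ROOT/mpi/rsh $TMPDIR/rsh
   # the wrapper turns "rsh <host> <cmd>" into
   #   qrsh -inherit <host> <cmd>
   # so slave tasks start under sge_execd/sge_shepherd control and
   # no system rshd is needed on the nodes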
>>> I tried the examples, set up the PE accordingly, but nothing seems
>>> to help.
>>
>> What is your setup of the defined queues and PEs? Did you look into
>> the $SGE_ROOT/mpi directory? Could you also first try it with a
>> conventional ethernet connection instead of Infiniband?
>>
>> -- Reuti
>>
>>
>>> Please, oh please walk me through this.
>>>
>>>                                       Regards,
>>>                                          Jerry
>>>
>>>
>>>
>>>
>>>> Hi,
>>>>
>>>> On 29.03.2006, at 14:03, Jerry Mersel wrote:
>>>>
>>>>> Hi:
>>>>>
>>>>>   Since I need to use tight integration with my PE, and
>>>>> therefore rsh, what should I set the cluster configuration
>>>>> to? In particular:
>>>>>
>>>>>  qlogin_daemon
>>>>>  qlogin_command
>>>>>  rsh_daemon
>>>>>  rsh_command
>>>>>  rlogin_daemon
>>>>>  rlogin_command
>>>>>
>>>>> just guessing but are they:
>>>>>
>>>>>    <SGE_ROOT>/bin/<ARC>/rshd - rsh daemon
>>>>>    <SGE_ROOT>/bin/<ARC>/rshd - rlogin daemon
>>>>>    <SGE_ROOT>/bin/<ARC>/rsh  - rsh command
>>>>
>>>> the values supplied during the installation of SGE are okay by
>>>> default, so you could remove your custom entries and leave only
>>>> the default ones:
>>>>
>>>> qlogin_command               telnet
>>>> qlogin_daemon                /usr/sbin/in.telnetd
>>>> rlogin_daemon                /usr/sbin/in.rlogind
>>>>
>>>> in qconf -sconf
>>>>
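
A quick check that only those defaults remain (a sketch):

   qconf -sconf | egrep 'qlogin|rlogin|rsh'
   # rsh_command and rsh_daemon should not appear at all; SGE then
   # falls back to its private rshd for qrsh -inherit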
>>>> HTH - Reuti
>>>>
>>>>
>>>>> do the qlogin daemon and command need to be set? What about the
>>>>> shepherd command? Anything else?
>>>>>
>>>>>
>>>>>                        Thanks,
>>>>>                         Jerry
>>>>>
>>>>>
>>>>>> That means sshd is reading the default config file!
>>>>>>
>>>>>> So if you disable login for the system sshd, the one launched by
>>>>>> SGE will also get the same configuration.
>>>>>>
>>>>>> The way to get this to work is to copy the configuration to
>>>>>> another location, and then disable login in the default one.
>>>>>> Then add "-f <path to the backup config file>" to rsh_daemon and
>>>>>> rlogin_daemon in your cluster configuration:
>>>>>>
>>>>>> http://gridengine.sunsource.net/howto/qrsh_qlogin_ssh.html
>>>>>>
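
A sketch of that scheme (paths follow the linked howto, not this
cluster):

   # keep a copy that still permits logins, for SGE's own sshd:
   cp /etc/ssh/sshd_config /etc/ssh/sshd_config.sge
   # disable logins in the default /etc/ssh/sshd_config, then via
   # qconf -mconf set (sshd in inetd mode, as in the howto):
   #   rsh_daemon     /usr/sbin/sshd -i -f /etc/ssh/sshd_config.sge
   #   rlogin_daemon  /usr/sbin/sshd -i -f /etc/ssh/sshd_config.sge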
>>>>>> Note that when you use SSH, the PE is not tight anymore. Tight
>>>>>> SSH integration is enabled in maintrunk, and may be back-ported
>>>>>> to V6.0:
>>>>>>
>>>>>> http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=9051
>>>>>>
>>>>>>  -Ron
>>>>>>
>>>>>>
>>>>>>
>>>>>> --- Jerry Mersel <jerry.mersel at weizmann.ac.il> wrote:
>>>>>>> Thank you for your quick response.
>>>
>>>
>>
>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



