[GE users] prevent users from executing jobs on nodes except via sungrid

Reuti reuti at staff.uni-marburg.de
Mon Apr 3 20:46:02 BST 2006


On 03.04.2006 at 15:09, Jerry Mersel wrote:

> Hi :
>
>
> I really appreciate all the time and effort Reuti, and the other users
> gave in order to solve my problem.
>
> My boss (not my real boss, who is my wife) wants to get users working on
> this grid, so we're going to use ssh and I'll probably write a script
> to make sure that the processes are children of SGE.
>
> But another try: the only difference is in the script; the PE is the same.
> The one using ssh uses mpirun_ssh ...
> and the one using SGE's rsh uses mpirun_rsh.
> With mpirun_rsh I get "permission denied" on the nodes.

There is no need to have different mpiruns. The rsh/ssh command to be
used can be overridden at runtime with e.g.:

export P4_RSHCOMMAND=rsh

Please also have a look at:
http://gridengine.sunsource.net/howto/mpich-integration.html
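
As a rough sketch of how this can look inside a job script (not copied from the howto): the snippet below sets P4_RSHCOMMAND and prints the mpirun line it would use. It assumes the PE start script wrote the machinefile to $TMPDIR/machines; ./my_mpi_program is a made-up name, and the command is only echoed so the snippet can be tried outside a running job.

```shell
#!/bin/sh
# Sketch only: choose the remote-shell command for MPICH (ch_p4) at runtime.
export P4_RSHCOMMAND=rsh

# Assumption: the PE start script created $TMPDIR/machines.
# NSLOTS and TMPDIR are set by SGE inside a job; the fallbacks are only
# there so the snippet also runs outside of one.
build_mpirun_cmd() {
    echo "mpirun -np ${NSLOTS:-1} -machinefile ${TMPDIR:-/tmp}/machines ./my_mpi_program"
}

# Print the command; in a real job script you would run it instead.
build_mpirun_cmd
```

With this, the same mpirun can be used for both setups; only the value of P4_RSHCOMMAND changes.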

- Is there any hint in any of the logfiles of the nodes?
- Any /etc/hosts.allow or /etc/hosts.deny on the nodes?
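
To check the second point quickly on each node, something like the following could be used. It is only a sketch: show_active_rules is a helper name invented here, and it simply prints the non-comment lines of whichever of the given files exist.

```shell
#!/bin/sh
# Sketch only: list the active (non-comment, non-empty) lines of the
# TCP wrapper files, skipping files that do not exist or are unreadable.
show_active_rules() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        echo "== $f"
        # grep exits non-zero when nothing matches; that is not an error here.
        grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$f" || true
    done
    return 0
}

show_active_rules /etc/hosts.allow /etc/hosts.deny
```

No output at all means neither file is present or readable on that node.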

-- Reuti


>                              Thank you very much,
>                                   Jerry
>
>
>
>> On 02.04.2006 at 19:36, Jerry Mersel wrote:
>>
>>> No, the job isn't running as root.
>>>
>>> I have a strong suspicion that SGE's rshd daemon isn't running.
>>>
>>> Do I have to enable it in some way? I just assumed when I stopped
>>> running the system's rshd daemon that SGE would take care of running
>>> its own rshd daemon. But maybe I was mistaken.
>>
>> SGE's rshd does not run all the time; instead, one instance is
>> started for every qrsh call you make, on a randomly chosen port. Are
>> these three programs in utilbin owned by root, with the suid bit set?
>>
>> -r-s--x--x  1 root root  26K 2005-12-09 13:41 rlogin
>> -r-s--x--x  1 root root  20K 2005-12-09 13:41 rsh
>> ...
>> -r-s--x--x  1 root root  22K 2005-12-09 13:41 testsuidroot
>>
>> -- Reuti
>>
>>
>>> When I wasn't using SGE and was running the system's rshd daemon, it was
>>> necessary to set up <HOME>/.rhosts so the users could run on parallel
>>> machines without using a password.
>>>
>>>                             Best Regards,
>>>                                Jerry
>>>
>>>
>>> P.S. It was working with rshd (system) and .rhosts and/or
>>> sshd/authorized_keys2. But I don't think that's tight integration.
>>>
>>> P.P.S. Perhaps it would help if I ran it without mpirun?
>>>
>>> P.P.P.S. Just babbling at the moment.
>>>
>>>> On 02.04.2006 at 14:57, Jerry Mersel wrote:
>>>>
>>>>> I looked into <tmp/some_dir/>; the links and the machinefile there
>>>>> have been set up correctly.
>>>>>
>>>>> What is causing the trouble is "Connection refused".
>>>>>
>>>>> When I was using the system rsh (along with .rhosts) I was able to
>>>>> make the connection.
>>>>>
>>>>> Now, using SGE rshd, with or without .rhosts, I get "Connection refused".
>>>>
>>>> Are you running the jobs as root? I never had to put anything
>>>> into .rhosts for each of my individual users.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>                          Regards,
>>>>>                            Jerry
>>>>>
>>>>>
>>>>>
>>>>>> On 31.03.2006 at 14:04, Jerry Mersel wrote:
>>>>>>
>>>>>>> I rebuilt it, but it didn't help:
>>>>>>>
>>>>>>> here are the results:
>>>>>>>
>>>>>>> error file:
>>>>>>>
>>>>>>> connect to address 192.168.1.3: Connection refused
>>>>>>> connect to address 192.168.1.3: Connection refused
>>>>>>> trying normal rsh (/usr/bin/rsh)
>>>>>>
>>>>>> The question is whether the directory with the rsh-wrapper was
>>>>>> correctly set up on the slave node. Just submit a parallel job,
>>>>>> put a sleep 600 or so in the job script (instead of any mpirun
>>>>>> command), and check whether
>>>>>> /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh created the correct
>>>>>> machinefile and the correct link to the rsh-wrapper at
>>>>>> /wiccusers/mlmersel/mlmersel/mpi/rsh on the master node of the
>>>>>> parallel job.
>>>>>>
>>>>>> BTW: Any firewall on the slave nodes?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> wiccopt-3.weizmann.ac.il: Connection refused
>>>>>>>
>>>>>>> standard output file:
>>>>>>>
>>>>>>> p0_11663:  p4_error: Child process exited while making connection to remote process on wiccopt-3: 0
>>>>>>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>>>>>>
>>>>>>>                           Regards,
>>>>>>>                              Jerry
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 30.03.2006 at 16:34, Jerry Mersel wrote:
>>>>>>>>
>>>>>>>>> Thanks Reuti:
>>>>>>>>>
>>>>>>>>>   You could probably tell I'm getting kinda desperate with this,
>>>>>>>>>   and so I am.
>>>>>>>>>
>>>>>>>>>   I stopped running the system rshd.
>>>>>>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>>>>>>   doesn't come from Voltaire, and built it with
>>>>>>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>>>>>>
>>>>>>>>
>>>>>>>> No, this way the wrapper won't work. Please recompile it with a
>>>>>>>> simple switch:
>>>>>>>>
>>>>>>>> -rsh=rsh
>>>>>>>>
>>>>>>>> in the MPICH ./configure - although it's deprecated.
>>>>>>>>
>>>>>>>> Cheers - Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>>   If I run on one node it works (it doesn't run on the master); on 2
>>>>>>>>>   or more I get "Connection refused".
>>>>>>>>>
>>>>>>>>>   I have looked into <sge>/mpi.
>>>>>>>>>
>>>>>>>>>   The PE that I am using:
>>>>>>>>>
>>>>>>>>> pe_name           mlmersel
>>>>>>>>> slots             999
>>>>>>>>> user_lists        NONE
>>>>>>>>> xuser_lists       NONE
>>>>>>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>>>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>>>>>>> allocation_rule   $round_robin
>>>>>>>>> control_slaves    TRUE
>>>>>>>>> job_is_first_task FALSE
>>>>>>>>> urgency_slots     min
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The queue config:
>>>>>>>>>
>>>>>>>>> qname                 all.q
>>>>>>>>> hostlist              @allhosts
>>>>>>>>> seq_no                0
>>>>>>>>> load_thresholds       np_load_avg=1.75
>>>>>>>>> suspend_thresholds    NONE
>>>>>>>>> nsuspend              1
>>>>>>>>> suspend_interval      00:05:00
>>>>>>>>> priority              0
>>>>>>>>> min_cpu_interval      00:05:00
>>>>>>>>> processors            UNDEFINED
>>>>>>>>> qtype                 BATCH INTERACTIVE
>>>>>>>>> ckpt_list             NONE
>>>>>>>>> pe_list               jerry mlmersel mpi mymake
>>>>>>>>> rerun                 FALSE
>>>>>>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>>>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>>>>>>       %2
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net



