[GE users] prevent users from executing jobs on nodes except via sungrid

Reuti reuti at staff.uni-marburg.de
Sun Apr 2 19:00:42 BST 2006


On 02.04.2006 at 19:36, Jerry Mersel wrote:

> No, the job isn't running as root.
>
> I have a strong suspicion that SGE's rshd daemon isn't running.
>
> Do I have to enable it in some way? I just assumed that when I stopped
> running the system's rshd, SGE would take care of running its own
> rshd. But maybe I was mistaken.

SGE's rshd does not run all the time; instead, a separate instance is
started on a randomly chosen port for every qrsh call you make. Are
these three programs in utilbin owned by root, and do they have the suid bit set:

-r-s--x--x  1 root root  26K 2005-12-09 13:41 rlogin
-r-s--x--x  1 root root  20K 2005-12-09 13:41 rsh
...
-r-s--x--x  1 root root  22K 2005-12-09 13:41 testsuidroot
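To check the utilbin programs on each execution host without reading `ls -l` output by eye, something like the following sketch could be used. It only tests what the listing above shows (owner uid 0 and the setuid bit). The helper names, the `/opt/sge` fallback for `SGE_ROOT`, and `lx24-amd64` as the default architecture directory are assumptions for illustration.

```python
import os
import stat
import sys
from pathlib import Path

# Programs named in the utilbin listing above; the "..." there means
# other programs may exist, so this tuple is only a starting point.
PROGRAMS = ("rlogin", "rsh", "testsuidroot")


def is_suid_root(path: str) -> bool:
    """True if the file is owned by root (uid 0) and has the setuid bit set."""
    st = os.stat(path)
    return st.st_uid == 0 and bool(st.st_mode & stat.S_ISUID)


def check_utilbin(utilbin_dir: str, names=PROGRAMS) -> dict:
    """Map each program name to True/False, or None if the file is missing."""
    result = {}
    for name in names:
        p = Path(utilbin_dir) / name
        result[name] = is_suid_root(str(p)) if p.exists() else None
    return result


if __name__ == "__main__":
    # Assumed layout $SGE_ROOT/utilbin/<arch>; /opt/sge is only a placeholder.
    sge_root = os.environ.get("SGE_ROOT", "/opt/sge")
    arch = sys.argv[1] if len(sys.argv) > 1 else "lx24-amd64"
    for name, ok in check_utilbin(os.path.join(sge_root, "utilbin", arch)).items():
        status = "missing" if ok is None else ("ok" if ok else "NOT suid root")
        print(f"{name}: {status}")
```

Any program reported as "NOT suid root" would need its ownership and mode corrected (as root) to match the listing.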

-- Reuti


> When I was not using SGE and was running the system's rshd, it was
> necessary to set up <HOME>/.rhosts so that users could run on parallel
> machines without a password.
>
>                             Best Regards,
>                                Jerry
>
>
> P.S. It was working with the system rshd and .rhosts and/or
> sshd/authorized_keys2. But I don't think that counts as tight integration.
>
> P.P.S. Perhaps it would help if I ran it without mpirun?
>
> P.P.P.S. Just babbling at the moment.
>
>> On 02.04.2006 at 14:57, Jerry Mersel wrote:
>>
>>> I looked into <tmp/some_dir/>; the links and the machinefile there
>>> have been set up correctly.
>>>
>>> What is causing the trouble is "Connection refused".
>>>
>>> When I was using the system rsh (along with .rhosts), I was able to
>>> make the connection.
>>>
>>> Now, using SGE's rshd, with or without .rhosts, I get "Connection refused".
>>
>> Are you running the jobs as root? I never had to put anything
>> into .rhosts for any of my individual users.
>>
>> -- Reuti
>>
>>
>>>                          Regards,
>>>                            Jerry
>>>
>>>
>>>
>>>> On 31.03.2006 at 14:04, Jerry Mersel wrote:
>>>>
>>>>> I rebuilt it, but it didn't help:
>>>>>
>>>>> here are the results:
>>>>>
>>>>> error file:
>>>>>
>>>>> connect to address 192.168.1.3: Connection refused
>>>>> connect to address 192.168.1.3: Connection refused
>>>>> trying normal rsh (/usr/bin/rsh)
>>>>
>>>> The question is whether the directory with the rsh-wrapper was
>>>> set up correctly on the slave node. Just submit a parallel job, put a
>>>> sleep 600 or so in the job script (instead of any mpirun command),
>>>> and check whether /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh
>>>> created the correct machinefile and the correct link to the rsh-wrapper
>>>> /wiccusers/mlmersel/mlmersel/mpi/rsh on the master node of the
>>>> parallel job.
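While such a test job sleeps, the check described above could be scripted roughly as follows. This is a sketch under assumptions the thread does not state: that startmpi.sh writes the machinefile as `machines` in the job's TMPDIR, and that `-catch_rsh` creates the wrapper link there under the name `rsh`. `inspect_job_tmpdir` is a hypothetical helper.

```python
import os
from pathlib import Path

# Assumptions, not confirmed by the thread: the machinefile is written to
# $TMPDIR/machines and -catch_rsh creates the symlink $TMPDIR/rsh pointing
# at the rsh-wrapper copy used by this PE.
EXPECTED_WRAPPER = "/wiccusers/mlmersel/mlmersel/mpi/rsh"


def inspect_job_tmpdir(tmpdir: str, expected_wrapper: str = EXPECTED_WRAPPER) -> list:
    """Return a list of problems found in a job's TMPDIR; empty means it looks OK."""
    problems = []
    tmp = Path(tmpdir)

    machines = tmp / "machines"
    if not machines.is_file():
        problems.append("machinefile missing")
    elif not machines.read_text().split():
        problems.append("machinefile empty")

    link = tmp / "rsh"
    if not link.is_symlink():
        problems.append("rsh link missing")
    else:
        target = os.readlink(link)
        if target != expected_wrapper:
            problems.append(f"rsh link points to {target}")
    return problems


if __name__ == "__main__":
    # Run from inside the sleeping job, where SGE sets TMPDIR for the job.
    print(inspect_job_tmpdir(os.environ.get("TMPDIR", "/tmp")) or "looks OK")
```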
>>>>
>>>> BTW: Any firewall on the slave nodes?
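One way to approach the firewall question is to attempt a plain TCP connect from the master node to a slave node and look at how it fails. In this hypothetical helper, "refused" means the host answered but nothing accepted the connection (or a firewall rule rejected it), while a timeout usually points to packets being silently dropped. Note that SGE's own rshd listens on a random port per qrsh call, so it cannot be probed in advance this way; a fixed port such as 514 (the standard rsh "shell" port) only tells you about the path to the host in general.

```python
import socket


def tcp_port_status(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt from this machine to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered, but nothing is listening (or a firewall REJECTs).
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently DROPping packets.
        return "timeout/filtered"
    except OSError as exc:
        # e.g. "No route to host" or a name-resolution failure.
        return f"error: {exc}"
```

For example, `tcp_port_status("wiccopt-3.weizmann.ac.il", 514)` run on the master node would return "refused" once the system rshd is stopped and no firewall is in the way.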
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> wiccopt-3.weizmann.ac.il: Connection refused
>>>>>
>>>>> standard output file:
>>>>>
>>>>> p0_11663:  p4_error: Child process exited while making connection to remote process on wiccopt-3: 0
>>>>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>>>>
>>>>>                           Regards,
>>>>>                              Jerry
>>>>>
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 30.03.2006 at 16:34, Jerry Mersel wrote:
>>>>>>
>>>>>>> Thanks, Reuti.
>>>>>>>
>>>>>>>   You could probably tell I'm getting kinda desperate with this,
>>>>>>>   and so I am.
>>>>>>>
>>>>>>>   I stopped running the system rshd.
>>>>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>>>>   doesn't come from Voltaire, built with
>>>>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>>>>
>>>>>>
>>>>>> No, this way the wrapper won't work. Please recompile it with the
>>>>>> plain switch:
>>>>>>
>>>>>> -rsh=rsh
>>>>>>
>>>>>> in the MPICH ./configure, even though it's deprecated.
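Why a plain `-rsh=rsh` matters can be illustrated with an ordinary PATH lookup. Assuming the job's TMPDIR, where `-catch_rsh` places the wrapper link, comes first in PATH, a bare `rsh` resolves to the wrapper, whereas an absolute path such as <SGEROOT>/utilbin/lx24-amd64/rsh is used as-is and never consults PATH. `which_rsh` is a hypothetical helper built on Python's `shutil.which`.

```python
import os
import shutil


def which_rsh(rsh_command: str, job_tmpdir: str, base_path: str):
    """Resolve rsh_command the way a PATH lookup would inside the job.

    Assumption: the job's TMPDIR (holding the rsh link from -catch_rsh)
    comes first in PATH, ahead of the normal system directories.
    """
    search_path = os.pathsep.join([job_tmpdir, base_path])
    # A bare name is searched along search_path; a name containing a
    # directory component is only checked as given, ignoring the path.
    return shutil.which(rsh_command, path=search_path)


if __name__ == "__main__":
    tmpdir = os.environ.get("TMPDIR", "/tmp")
    print(which_rsh("rsh", tmpdir, "/usr/bin:/bin"))
```

With MPICH built using `-rsh=rsh`, the first case applies and the wrapper (which in turn uses qrsh -inherit) handles the remote start; with RSHCOMMAND set to the absolute utilbin path, the wrapper is bypassed.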
>>>>>>
>>>>>> Cheers - Reuti
>>>>>>
>>>>>>
>>>>>>>   If I run on one node it works (it doesn't run on the master); on
>>>>>>>   2 or more nodes I get "Connection refused".
>>>>>>>
>>>>>>>   I have looked into <sge>/mpi.
>>>>>>>
>>>>>>>   The PE that I am using:
>>>>>>>
>>>>>>> pe_name           mlmersel
>>>>>>> slots             999
>>>>>>> user_lists        NONE
>>>>>>> xuser_lists       NONE
>>>>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>>>>> allocation_rule   $round_robin
>>>>>>> control_slaves    TRUE
>>>>>>> job_is_first_task FALSE
>>>>>>> urgency_slots     min
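The two PE settings the rsh-wrapper approach depends on are `control_slaves TRUE` (so slave tasks may be started via qrsh -inherit) and the `-catch_rsh` argument to startmpi.sh. A minimal sketch that parses the key/value layout shown above and flags either one missing; the helper names are hypothetical, and joining backslash-continued lines assumes long values are wrapped that way.

```python
def parse_sge_config(text: str) -> dict:
    """Parse 'key  value' lines (as printed by qconf -sp) into a dict.

    Lines ending in a backslash are joined with the following line, on the
    assumption that long values are wrapped that way.
    """
    joined = text.replace("\\\n", " ")
    conf = {}
    for line in joined.splitlines():
        parts = line.split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue
        # Collapse runs of whitespace inside the value.
        conf[parts[0]] = " ".join(parts[1].split()) if len(parts) > 1 else ""
    return conf


def tight_integration_problems(pe: dict) -> list:
    """Flag missing PE settings that the rsh-wrapper approach relies on."""
    problems = []
    if pe.get("control_slaves", "").upper() != "TRUE":
        problems.append("control_slaves is not TRUE, so qrsh -inherit is not allowed")
    if "-catch_rsh" not in pe.get("start_proc_args", "").split():
        problems.append("start_proc_args lacks -catch_rsh, so no rsh link is created")
    return problems
```

Fed the `mlmersel` PE above, this reports no problems, which fits Reuti's focus on the MPICH build rather than the PE.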
>>>>>>>
>>>>>>>
>>>>>>> The queue config:
>>>>>>>
>>>>>>> qname                 all.q
>>>>>>> hostlist              @allhosts
>>>>>>> seq_no                0
>>>>>>> load_thresholds       np_load_avg=1.75
>>>>>>> suspend_thresholds    NONE
>>>>>>> nsuspend              1
>>>>>>> suspend_interval      00:05:00
>>>>>>> priority              0
>>>>>>> min_cpu_interval      00:05:00
>>>>>>> processors            UNDEFINED
>>>>>>> qtype                 BATCH INTERACTIVE
>>>>>>> ckpt_list             NONE
>>>>>>> pe_list               jerry mlmersel mpi mymake
>>>>>>> rerun                 FALSE
>>>>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>>>>       %2
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
More information about the gridengine-users mailing list