[GE users] prevent users from executing jobs on nodes except via sungrid

Jerry Mersel jerry.mersel at weizmann.ac.il
Sun Apr 2 21:06:29 BST 2006



Yes, they are: all three are owned by root and have the suid bit set.
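
A check like the following could be used to confirm the ownership and
suid bit of the programs asked about below. It is only a sketch: the
function name check_suid_root is hypothetical, and the paths in the
example comment assume that $SGE_ROOT points at the SGE installation and
that the architecture directory is lx24-amd64, as mentioned further down
in this thread.

```shell
#!/bin/sh
# Report whether each given file is a regular file owned by root with
# the setuid bit set. Returns 0 only if every file passes.
# (check_suid_root is a hypothetical helper, not part of SGE.)
check_suid_root() {
    failed=0
    for f in "$@"; do
        # find prints the path only if all of the conditions hold
        if [ -n "$(find "$f" -prune -type f -user root -perm -4000 2>/dev/null)" ]; then
            echo "ok:   $f"
        else
            echo "FAIL: $f"
            failed=1
        fi
    done
    return "$failed"
}

# Example call (paths are assumptions, adjust to the local installation):
# check_suid_root "$SGE_ROOT/utilbin/lx24-amd64/rlogin" \
#                 "$SGE_ROOT/utilbin/lx24-amd64/rsh" \
#                 "$SGE_ROOT/utilbin/lx24-amd64/testsuidroot"
```

A non-zero return value means that at least one file is missing, is not
owned by root, or lacks the suid bit.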

> On 02.04.2006 at 19:36, Jerry Mersel wrote:
>
>> No, the job isn't running as root.
>>
>> I have a strong suspicion that SGE's rshd daemon isn't running.
>>
>> Do I have to enable it in some way? I just assumed when I stopped
>> running the system's rshd daemon that SGE would take care of running
>> its own rshd daemon. But maybe I was mistaken.
>
> SGE's rshd will not run all the time. Instead, one instance is started
> for every qrsh call you make, on a randomly chosen port. Are these
> three programs in utilbin owned by root, with the suid bit set?
>
> -r-s--x--x  1 root root  26K 2005-12-09 13:41 rlogin
> -r-s--x--x  1 root root  20K 2005-12-09 13:41 rsh
> ...
> -r-s--x--x  1 root root  22K 2005-12-09 13:41 testsuidroot
>
> -- Reuti
>
>
>> When I wasn't using SGE and was running the system's rshd daemon, it
>> was necessary to set up <HOME>/.rhosts so that users could run on
>> parallel machines without a password.
>>
>>                             Best Regards,
>>                                Jerry
>>
>>
>> P.S. It was working with rshd (system) and .rhosts and/or
>> sshd/authorized_keys2. But I don't think that counts as tight
>> integration.
>>
>> P.P.S. Perhaps it would help if I ran it without mpirun?
>>
>> P.P.P.S. Just babbling at the moment.
>>
>>> On 02.04.2006 at 14:57, Jerry Mersel wrote:
>>>
>>>> I looked into <tmp/some_dir/>; the links and the machinefile there
>>>> have been set up correctly.
>>>>
>>>> What is causing the trouble is "Connection refused".
>>>>
>>>> When I was using the system rsh (along with .rhosts) I was able to
>>>> make the connection.
>>>>
>>>> Now, using SGE's rshd, with or without .rhosts, I get "Connection
>>>> refused".
>>>
>>> Are you running the jobs as root? I never had to put anything
>>> into .rhosts for any of my individual users.
>>>
>>> -- Reuti
>>>
>>>
>>>>                          Regards,
>>>>                            Jerry
>>>>
>>>>
>>>>
>>>>> On 31.03.2006 at 14:04, Jerry Mersel wrote:
>>>>>
>>>>>> I rebuilt it, but it didn't help:
>>>>>>
>>>>>> here are the results:
>>>>>>
>>>>>> error file:
>>>>>>
>>>>>> connect to address 192.168.1.3: Connection refused
>>>>>> connect to address 192.168.1.3: Connection refused
>>>>>> trying normal rsh (/usr/bin/rsh)
>>>>>
>>>>> The question is whether the directory with the rsh-wrapper was
>>>>> set up correctly on the slave node. Just submit a parallel job,
>>>>> put a sleep 600 or so in the job script (instead of any mpirun
>>>>> command), and check on the master node of the parallel job
>>>>> whether /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh created the
>>>>> correct machinefile and the correct link from the rsh-wrapper to
>>>>> /wiccusers/mlmersel/mlmersel/mpi/rsh.
>>>>>
>>>>> BTW: Any firewall on the slave nodes?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> wiccopt-3.weizmann.ac.il: Connection refused
>>>>>>
>>>>>> standard output file:
>>>>>>
>>>>>> p0_11663:  p4_error: Child process exited while making connection to remote process on wiccopt-3: 0
>>>>>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>>>>>
>>>>>>                           Regards,
>>>>>>                              Jerry
>>>>>>
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 30.03.2006 at 16:34, Jerry Mersel wrote:
>>>>>>>
>>>>>>>> Thanks Reuti:
>>>>>>>>
>>>>>>>>   You could probably tell I'm getting kinda desperate with this,
>>>>>>>>   and so I am.
>>>>>>>>
>>>>>>>>   I stopped running the system rshd.
>>>>>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>>>>>   doesn't come from Voltaire, and built it with
>>>>>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>>>>>
>>>>>>>
>>>>>>> No, the wrapper won't work this way. Please recompile it with
>>>>>>> the simple switch:
>>>>>>>
>>>>>>> -rsh=rsh
>>>>>>>
>>>>>>> in the MPICH ./configure - although it's deprecated.
>>>>>>>
>>>>>>> Cheers - Reuti
>>>>>>>
>>>>>>>
>>>>>>>>   If I run on one node it works (it doesn't run on the master);
>>>>>>>>   on 2 or more I get "Connection refused".
>>>>>>>>
>>>>>>>>   I have looked into <sge>/mpi.
>>>>>>>>
>>>>>>>>   The PE that I am using:
>>>>>>>>
>>>>>>>> pe_name           mlmersel
>>>>>>>> slots             999
>>>>>>>> user_lists        NONE
>>>>>>>> xuser_lists       NONE
>>>>>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>>>>>> allocation_rule   $round_robin
>>>>>>>> control_slaves    TRUE
>>>>>>>> job_is_first_task FALSE
>>>>>>>> urgency_slots     min
>>>>>>>>
>>>>>>>>
>>>>>>>> The queue config:
>>>>>>>>
>>>>>>>> qname                 all.q
>>>>>>>> hostlist              @allhosts
>>>>>>>> seq_no                0
>>>>>>>> load_thresholds       np_load_avg=1.75
>>>>>>>> suspend_thresholds    NONE
>>>>>>>> nsuspend              1
>>>>>>>> suspend_interval      00:05:00
>>>>>>>> priority              0
>>>>>>>> min_cpu_interval      00:05:00
>>>>>>>> processors            UNDEFINED
>>>>>>>> qtype                 BATCH INTERACTIVE
>>>>>>>> ckpt_list             NONE
>>>>>>>> pe_list               jerry mlmersel mpi mymake
>>>>>>>> rerun                 FALSE
>>>>>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>>>>>       %2
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net