[GE users] prevent users from executing jobs on nodes except via sungrid

Jerry Mersel jerry.mersel at weizmann.ac.il
Sun Apr 2 18:36:48 BST 2006



No, the job isn't running as root.

I have a strong suspicion that SGE's rshd daemon isn't running.

Do I have to enable it in some way? I just assumed that when I stopped
running the system's rshd daemon, SGE would take care of running its
own. But maybe I was mistaken.
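As I understand it (a sketch, assuming a classic SGE 6.x tight-integration setup), SGE does not run a persistent rshd at all: sge_execd spawns one per qrsh connection, using whatever binaries the cluster configuration names. The configured commands can be checked like this:

```shell
# Show which rsh client/daemon SGE is configured to use.
# For tight integration these are usually pointed at SGE's own binaries,
# e.g.  rsh_daemon  $SGE_ROOT/utilbin/<arch>/rshd -l
qconf -sconf | egrep 'rsh_command|rsh_daemon'
```

If `rsh_daemon` still points at the system's `in.rshd`, stopping the system daemon would explain the refused connections.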

When I was not using SGE and was running the system's rshd daemon, it
was necessary to set up <HOME>/.rhosts so that users could run on the
parallel machines without a password.

                            Best Regards,
                               Jerry


P.S. It was working with the system rshd and .rhosts, and/or with sshd
and authorized_keys2. But I don't think that counts as tight integration.

P.P.S. Perhaps it would help if I ran it without mpirun?

P.P.P.S. Just babbling at the moment.

> Am 02.04.2006 um 14:57 schrieb Jerry Mersel:
>
>> I looked into <tmp/some_dir/>; the links and the machinefile have
>> been set up correctly.
>>
>> What is causing the trouble is "Connection refused".
>>
>> When I was using the system rsh (along with .rhosts) I was able to
>> make the connection.
>>
>> Now, using SGE's rshd, with or without .rhosts, I get Connection refused.
>
> Are you running the jobs as root? I never had to put anything
> into .rhosts for any of my individual users.
>
> -- Reuti
>
>
>>                          Regards,
>>                            Jerry
>>
>>
>>
>>> Am 31.03.2006 um 14:04 schrieb Jerry Mersel:
>>>
>>>> I rebuilt it, but it didn't help:
>>>>
>>>> here are the results:
>>>>
>>>> error file:
>>>>
>>>> connect to address 192.168.1.3: Connection refused
>>>> connect to address 192.168.1.3: Connection refused
>>>> trying normal rsh (/usr/bin/rsh)
>>>
>>> The question is whether the directory with the rsh wrapper was
>>> correctly set up on the slave node. Just submit a parallel job with a
>>> sleep 600 or so in the job script (instead of any mpirun command),
>>> and check whether /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh
>>> created the correct machinefile and the correct link to the rsh
>>> wrapper /wiccusers/mlmersel/mlmersel/mpi/rsh on the master node
>>> of the parallel job.
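That check can be sketched as a throwaway job script (PE name and the `machines`/`rsh` file names follow the standard SGE mpi example scripts; treat the slot count as illustrative):

```shell
#!/bin/sh
# Throwaway job: request the PE but sleep instead of running mpirun,
# so the PE start script's output can be inspected while the job is alive.
#$ -pe mlmersel 4
#$ -cwd
echo "master node: $(hostname), TMPDIR=$TMPDIR"
ls -l "$TMPDIR"            # should contain the rsh-wrapper link made by startmpi.sh
cat "$TMPDIR/machines"     # the machinefile mpirun would be given
sleep 600
```

While it sleeps, `qrsh` or `ssh` to the master node of the job and look at `$TMPDIR` there.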
>>>
>>> BTW: Any firewall on the slave nodes?
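A quick reachability probe can narrow the firewall question down (a sketch using bash's /dev/tcp redirection; the host name is taken from the error output above, and note that an SGE-spawned rshd uses a per-job port, so this only tests the classic rsh port / general reachability):

```shell
#!/bin/bash
# Probe whether the slave accepts connections on the rsh port (514/tcp).
host=wiccopt-3
if (exec 3<>"/dev/tcp/$host/514") 2>/dev/null; then
    echo "port 514 on $host is reachable"
else
    echo "port 514 on $host: refused or filtered - check rshd and any firewall"
fi
```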
>>>
>>> -- Reuti
>>>
>>>
>>>> wiccopt-3.weizmann.ac.il: Connection refused
>>>>
>>>> standard output file:
>>>>
>>>> p0_11663:  p4_error: Child process exited while making connection to
>>>> remote process on wiccopt-3: 0
>>>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>>>
>>>>                           Regards,
>>>>                              Jerry
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Am 30.03.2006 um 16:34 schrieb Jerry Mersel:
>>>>>
>>>>>> Thanks Reuti:
>>>>>>
>>>>>>   You could probably tell I'm getting kind of desperate about this,
>>>>>>   and so I am.
>>>>>>
>>>>>>   I stopped running the system rshd.
>>>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>>>   doesn't come from Voltaire, and built it with
>>>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>>>
>>>>>
>>>>> No, this way the wrapper won't work. Please recompile it with a
>>>>> simple switch:
>>>>>
>>>>> -rsh=rsh
>>>>>
>>>>> in the MPICH ./configure - although it's deprecated.
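The rebuild Reuti describes might look like this (a sketch for an MPICH-1 style build; the version number and prefix are illustrative, only `-rsh=rsh` comes from the thread):

```shell
# Rebuild MPICH-1 so it invokes a plain "rsh" with no absolute path:
# the PE's -catch_rsh wrapper can then intercept it via PATH at run time.
cd mpich-1.2.7
./configure -rsh=rsh --prefix=/opt/mpich-sge
make && make install
```

The point of the bare `rsh` is that hard-coding `<SGEROOT>/utilbin/<arch>/rsh` bypasses the wrapper link that `startmpi.sh -catch_rsh` puts first in `PATH`.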
>>>>>
>>>>> Cheers - Reuti
>>>>>
>>>>>
>>>>>>   If I run on one node it works (it doesn't run on the master);
>>>>>>   on 2 or more I get Connection refused.
>>>>>>
>>>>>>   I have looked into <sge>/mpi.
>>>>>>
>>>>>>   The PE that I am using:
>>>>>>
>>>>>> pe_name           mlmersel
>>>>>> slots             999
>>>>>> user_lists        NONE
>>>>>> xuser_lists       NONE
>>>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -
>>>>>> catch_rsh
>>>>>> $pe_hostfile
>>>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>>>> allocation_rule   $round_robin
>>>>>> control_slaves    TRUE
>>>>>> job_is_first_task FALSE
>>>>>> urgency_slots     min
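For reference, a PE definition like the one above can be inspected and edited with qconf (a sketch; the PE name is the one from this thread):

```shell
qconf -sp mlmersel     # show the parallel environment definition
qconf -mp mlmersel     # edit it in $EDITOR
# The settings that matter for tight integration:
#   control_slaves  TRUE            (slave tasks run under sge_execd control)
#   start_proc_args ... -catch_rsh  (sets up the per-job rsh wrapper)
```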
>>>>>>
>>>>>>
>>>>>> The queue config:
>>>>>>
>>>>>> qname                 all.q
>>>>>> hostlist              @allhosts
>>>>>> seq_no                0
>>>>>> load_thresholds       np_load_avg=1.75
>>>>>> suspend_thresholds    NONE
>>>>>> nsuspend              1
>>>>>> suspend_interval      00:05:00
>>>>>> priority              0
>>>>>> min_cpu_interval      00:05:00
>>>>>> processors            UNDEFINED
>>>>>> qtype                 BATCH INTERACTIVE
>>>>>> ckpt_list             NONE
>>>>>> pe_list               jerry mlmersel mpi mymake
>>>>>> rerun                 FALSE
>>>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>>>       %2
>>>>
>>>> --------------------------------------------------------------------
>>>> -
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>>
>>
>>
>
>
>






More information about the gridengine-users mailing list