[GE users] prevent users from executing jobs on nodes except via sungrid

Reuti reuti at staff.uni-marburg.de
Sun Apr 2 14:31:49 BST 2006


On 02.04.2006 at 14:57, Jerry Mersel wrote:

> I looked into <tmp/some_dir/>; the links and the machinefile there
> have been set up correctly.
>
> What is causing the trouble is "Connection refused".
>
> When I was using the system rsh (along with .rhosts) I was able to
> make the connection.
>
> Now, using the SGE rshd, with or without .rhosts, I get "Connection
> refused".

Are you running the jobs as root? I never had to put anything
into .rhosts for any of my individual users.

-- Reuti


>                          Regards,
>                            Jerry
>
>
>
>> On 31.03.2006 at 14:04, Jerry Mersel wrote:
>>
>>> I rebuilt it, but it didn't help:
>>>
>>> here are the results:
>>>
>>> error file:
>>>
>>> connect to address 192.168.1.3: Connection refused
>>> connect to address 192.168.1.3: Connection refused
>>> trying normal rsh (/usr/bin/rsh)
>>
>> The question is whether the directory with the rsh-wrapper was set up
>> correctly on the slave node. Just submit a parallel job, put a
>> sleep 600 or so in the job script (instead of any mpirun command),
>> and check on the master node of the parallel job whether
>> /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh created the correct
>> machinefile and the correct link to the rsh-wrapper
>> /wiccusers/mlmersel/mlmersel/mpi/rsh.
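
The check described above can be sketched as a job script. This is only a sketch under assumptions: the PE name `mlmersel` is the one from this thread, the slot count 4 is arbitrary, the helper `report_pe_setup` is a hypothetical name, and it assumes the start script writes a file called `machines` and a link called `rsh` into the job's $TMPDIR, which may differ in a customized startmpi.sh.

```shell
#!/bin/sh
#$ -N pe_check
#$ -pe mlmersel 4
#$ -cwd

# Hypothetical helper: print what the PE start script left in a directory.
# Assumes the file names "machines" and "rsh" (an assumption, see above).
report_pe_setup() {
    dir=$1
    if [ -f "$dir/machines" ]; then
        echo "machinefile: $dir/machines"
        cat "$dir/machines"
    else
        echo "machinefile: missing"
    fi
    if [ -L "$dir/rsh" ]; then
        echo "rsh link: $(readlink "$dir/rsh")"
    else
        echo "rsh link: missing"
    fi
}

# Only run the check and the long sleep inside a real SGE job,
# where the execution daemon sets JOB_ID and TMPDIR.
if [ -n "${JOB_ID:-}" ]; then
    report_pe_setup "$TMPDIR"
    sleep 600   # keep the job alive so the node can be inspected by hand
fi
```

While the job sleeps, the same directory can also be inspected by hand on the master node of the job.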
>>
>> BTW: Any firewall on the slave nodes?
>>
>> -- Reuti
>>
>>
>>> wiccopt-3.weizmann.ac.il: Connection refused
>>>
>>> standard output file:
>>>
>>> p0_11663:  p4_error: Child process exited while making connection to remote process on wiccopt-3: 0
>>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>>
>>>                           Regards,
>>>                              Jerry
>>>
>>>
>>>> Hi,
>>>>
>>>> On 30.03.2006 at 16:34, Jerry Mersel wrote:
>>>>
>>>>> Thanks Reuti:
>>>>>
>>>>>   You could probably tell I'm getting kinda desperate with this,
>>>>>   and so I am.
>>>>>
>>>>>   I stopped running the system rshd.
>>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>>   doesn't come from Voltaire, built with
>>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>>
>>>>
>>>> No, this way the wrapper won't work. Please recompile it with a
>>>> simple switch:
>>>>
>>>> -rsh=rsh
>>>>
>>>> in the MPICH ./configure - although it's deprecated.
>>>>
>>>> Cheers - Reuti
>>>>
>>>>
>>>>>   If I run on one node it works (it doesn't run on the master); on 2
>>>>>   or more I get "Connection refused".
>>>>>
>>>>>   I have looked into <sge>/mpi.
>>>>>
>>>>>   The PE that I am using:
>>>>>
>>>>> pe_name           mlmersel
>>>>> slots             999
>>>>> user_lists        NONE
>>>>> xuser_lists       NONE
>>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>>> allocation_rule   $round_robin
>>>>> control_slaves    TRUE
>>>>> job_is_first_task FALSE
>>>>> urgency_slots     min
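
For context, the $pe_hostfile handed to the start script lists one host per line followed by its slot count, and the start script turns that into an MPICH machinefile. A minimal sketch of that conversion, assuming the usual four-column hostfile layout (host, slots, queue, processor range); the function name `pe_hostfile_to_machines` is hypothetical and this is not the actual code in startmpi.sh:

```shell
#!/bin/sh

# Hypothetical helper: print each host once per granted slot.
# Assumed input line format: "<host> <slots> <queue> <processor range>".
pe_hostfile_to_machines() {
    while read -r host nslots _rest; do
        [ -n "$host" ] || continue      # skip empty lines
        i=0
        while [ "$i" -lt "$nslots" ]; do
            echo "$host"
            i=$((i + 1))
        done
    done < "$1"
}

# Inside a parallel job SGE sets PE_HOSTFILE; outside a job this is skipped.
if [ -n "${PE_HOSTFILE:-}" ]; then
    pe_hostfile_to_machines "$PE_HOSTFILE"
fi
```

Comparing this output with the machinefile actually created for the job is one way to see whether the start script ran as expected.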
>>>>>
>>>>>
>>>>> The queue config:
>>>>>
>>>>> qname                 all.q
>>>>> hostlist              @allhosts
>>>>> seq_no                0
>>>>> load_thresholds       np_load_avg=1.75
>>>>> suspend_thresholds    NONE
>>>>> nsuspend              1
>>>>> suspend_interval      00:05:00
>>>>> priority              0
>>>>> min_cpu_interval      00:05:00
>>>>> processors            UNDEFINED
>>>>> qtype                 BATCH INTERACTIVE
>>>>> ckpt_list             NONE
>>>>> pe_list               jerry mlmersel mpi mymake
>>>>> rerun                 FALSE
>>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>>       %2
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net




