[GE users] prevent users from executing jobs on nodes except via sungrid

Jerry Mersel jerry.mersel at weizmann.ac.il
Sun Apr 2 13:57:55 BST 2006



I looked into <tmp/some_dir/>; the links and the machinefile there
have been set up correctly.
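
A check like this can also be scripted from inside the job. The following is only a minimal sketch: the function name check_pe_tmpdir is made up for illustration, and the file names "rsh" (the wrapper link) and "machines" (the machinefile) are assumptions about what startmpi.sh creates, so adjust them to whatever your copy of the script writes.

```shell
#!/bin/sh
# check_pe_tmpdir DIR (hypothetical helper)
# Reports whether DIR contains a symbolic link named "rsh" and a
# non-empty file named "machines". Both names are assumptions.
# Returns 0 if both are present, 1 otherwise.
check_pe_tmpdir() {
    dir=$1
    status=0
    if [ -L "$dir/rsh" ]; then
        echo "rsh link: ok"
    else
        echo "rsh link: missing"
        status=1
    fi
    if [ -s "$dir/machines" ]; then
        echo "machinefile: ok"
    else
        echo "machinefile: missing or empty"
        status=1
    fi
    return $status
}
```

Inside a job script this would be called as check_pe_tmpdir "$TMPDIR", assuming the job's temporary directory is the one the start procedure populated.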

What is causing the trouble is "Connection refused".

When I was using the system rsh (along with .rhosts) I was able to make
the connection.

Now, using the SGE rshd, I get "Connection refused" with or without .rhosts.
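
One thing that may be worth ruling out is which rsh the job actually picks up, since the error file quoted below ends with "trying normal rsh (/usr/bin/rsh)". The sketch below prints the first executable named rsh in a colon-separated search path. The function name which_rsh is hypothetical, and the idea that the wrapper link should come first in the job's PATH is an assumption about how the wrapper is meant to be found, not something confirmed in this thread.

```shell
#!/bin/sh
# which_rsh PATHLIST (hypothetical helper)
# Prints the first regular, executable file named "rsh" found in the
# colon-separated PATHLIST and returns 0; returns 1 if there is none.
which_rsh() {
    old_ifs=$IFS
    IFS=:
    for d in $1; do
        if [ -f "$d/rsh" ] && [ -x "$d/rsh" ]; then
            IFS=$old_ifs
            echo "$d/rsh"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Calling which_rsh "$PATH" from the job script, next to the sleep, would show whether the link in the job's temporary directory or the system binary is found first.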

                         Regards,
                           Jerry



> On 31.03.2006 at 14:04, Jerry Mersel wrote:
>
>> I rebuilt it, but it didn't help:
>>
>> here are the results:
>>
>> error file:
>>
>> connect to address 192.168.1.3: Connection refused
>> connect to address 192.168.1.3: Connection refused
>> trying normal rsh (/usr/bin/rsh)
>
> The question is whether the directory with the rsh-wrapper was
> set up correctly on the slave node. Just submit a parallel job, put a
> sleep 600 or so in the job script (instead of any mpirun command),
> and check whether /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh
> created the correct machinefile and the correct link to the
> rsh-wrapper /wiccusers/mlmersel/mlmersel/mpi/rsh on the
> master node of the parallel job.
>
> BTW: Any firewall on the slave nodes?
>
> -- Reuti
>
>
>> wiccopt-3.weizmann.ac.il: Connection refused
>>
>> standard output file:
>>
>> p0_11663:  p4_error: Child process exited while making connection to remote process on wiccopt-3: 0
>> p0_11663: (33.023438) net_send: could not write to fd=4, errno = 32
>>
>>                           Regards,
>>                              Jerry
>>
>>
>>> Hi,
>>>
>>> On 30.03.2006 at 16:34, Jerry Mersel wrote:
>>>
>>>> Thanks Reuti:
>>>>
>>>>   You could probably tell I'm getting kinda desperate with this,
>>>>   and so I am.
>>>>
>>>>   I stopped running the system rshd.
>>>>   I tried it with a conventional switch, and with an MPICH that
>>>>   doesn't come from Voltaire, and built it with
>>>>   <SGEROOT>/utilbin/lx24-amd64/rsh as the RSHCOMMAND.
>>>>
>>>
>>> No, this way the wrapper won't work. Please recompile it with a
>>> simple switch:
>>>
>>> -rsh=rsh
>>>
>>> in the MPICH ./configure - although it's deprecated.
>>>
>>> Cheers - Reuti
>>>
>>>
>>>>   If I run on one node it works (it doesn't run on the master);
>>>>   on 2 or more I get Connection refused.
>>>>
>>>>   I have looked into <sge>/mpi.
>>>>
>>>>   The PE that I am using:
>>>>
>>>> pe_name           mlmersel
>>>> slots             999
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   /wiccusers/mlmersel/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
>>>> stop_proc_args    /wiccusers/mlmersel/mlmersel/mpi/stopmpi.sh
>>>> allocation_rule   $round_robin
>>>> control_slaves    TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots     min
>>>>
>>>>
>>>> The queue config:
>>>>
>>>> qname                 all.q
>>>> hostlist              @allhosts
>>>> seq_no                0
>>>> load_thresholds       np_load_avg=1.75
>>>> suspend_thresholds    NONE
>>>> nsuspend              1
>>>> suspend_interval      00:05:00
>>>> priority              0
>>>> min_cpu_interval      00:05:00
>>>> processors            UNDEFINED
>>>> qtype                 BATCH INTERACTIVE
>>>> ckpt_list             NONE
>>>> pe_list               jerry mlmersel mpi mymake
>>>> rerun                 FALSE
>>>> slots                 2,[wiccopt-2.weizmann.ac.il=2], \
>>>>                       [wiccopt-3.weizmann.ac.il=2], \
>>>>       %2
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>





