[GE users] prevent users from executing jobs on nodes except via sungrid

Rayson Ho rayrayson at gmail.com
Mon Mar 27 15:20:55 BST 2006


Please re-enable normal user login (so that parallel jobs do not fail),
and then, while it is enabled, find out whether the PE uses sshd or
rshd. If it uses rshd, also find out whether that is the system rshd or
SGE's rshd.

You can get that info by looking at the parent/child relationship of
the slave MPI tasks.
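
One way to inspect that relationship is sketched below. It assumes a Linux
execution host with /proc mounted; the function name `ancestry` is made up
for illustration and is not part of SGE. It prints a process and each of
its ancestors, child first, so you can see which daemon started a task.

```sh
#!/bin/sh
# Print "<pid> <name>" for a process and each of its ancestors, child first.
# Assumption: Linux with /proc mounted; only /proc/<pid>/status is read.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name= ppid=
        # Pick the Name: and PPid: fields out of the status file.
        while read -r key value; do
            case $key in
                Name:) name=$value ;;
                PPid:) ppid=$value ;;
            esac
        done < "/proc/$pid/status"
        printf '%s %s\n' "$pid" "$name"
        pid=$ppid
    done
}
```

For example, while the job is running on a slave node, `ancestry "$(pgrep -n cpi)"`
walks up from the newest cpi process. A chain that passes through
sge_shepherd and sge_execd suggests the task was started under SGE's
control; a chain that passes through sshd or a system-wide rshd suggests it
was not.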

If you are using SGE's rshd, and you disabled login by creating
/etc/nologin, then follow this:
http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=5023

Otherwise, if the jobs fail because you are not using SGE's rshd, then
tight integration is not configured correctly.
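
As a rough check from inside the job script, the sketch below reports
whether the first rsh on PATH is the wrapper. It assumes that
startmpi.sh -catch_rsh installs the wrapper as $TMPDIR/rsh and that
$TMPDIR is at the front of PATH in the job environment; the function name
`check_rsh_wrapper` is made up for illustration.

```sh
#!/bin/sh
# Report whether the first "rsh" on PATH is the wrapper expected in $TMPDIR.
# Assumption: startmpi.sh -catch_rsh installs the wrapper as $TMPDIR/rsh.
check_rsh_wrapper() {
    found=$(command -v rsh 2>/dev/null) || found=
    if [ -n "${TMPDIR:-}" ] && [ "$found" = "$TMPDIR/rsh" ]; then
        echo "rsh resolves to the wrapper: $found"
    else
        echo "rsh does NOT resolve to \$TMPDIR/rsh (found: ${found:-none})"
        return 1
    fi
}
```

Calling `check_rsh_wrapper` just before the mpirun line puts the result in
the job's output file. Note that a launcher that starts its remote tasks
over ssh would not go through this wrapper at all.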

Rayson



On 3/27/06, Jerry Mersel <jerry.mersel at weizmann.ac.il> wrote:
> I thought after setting up tight integration everything was working,
> but I was mistaken.
>
>
> When I run a parallel job with MPICH I still get errors in the stderr
> output file such as:
>
> Child xxx exited without finalize.
>
> If I allow the user to log in without a password on the other nodes,
> it works. But I only want root to be able to log in to the other nodes.
>
>
> Here is the PE setup:
>
> pe_name           mpi
> slots             999
> user_lists        NONE
> xuser_lists       NONE
> start_proc_args   /home/mlmersel/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args    /home/mlmersel/mpi/stopmpi.sh
> allocation_rule   $round_robin
> control_slaves    TRUE
> job_is_first_task FALSE
> urgency_slots     min
>
>
> # ---------------------------
> # our name
> #$ -N MPI_Job
> #
> # pe request
> #$ -pe mpi 2-8
> #
> # MPIR_HOME from submitting environment
> #$ -v MPIR_HOME
> # ---------------------------
>
> Here is the script:
>
>
> #
> # needs in
> #   $NSLOTS
> #       the number of tasks to be used
> #   $TMPDIR/machines
> #       a valid machine file to be passed to mpirun
>
> echo "Got $NSLOTS slots."
>
> /usr/voltaire/mpi/bin/mpirun_ssh -np 2 -hostfile $TMPDIR/machines \
>     /usr/voltaire/mpi/bin/cpi
>
>
> I usually load it using qmon with pe mpi 2-8.
>
>
> I'm not sure how to solve this, so any help would be appreciated.
>
>
>                              Thanks,
>                                 Jerry
