[GE users] rcmd: socket: Permission denied

Tod Hagan tod at gust.sr.unh.edu
Wed Jul 25 16:50:36 BST 2007


Hello,

I'm a long-time Grid Engine 5.3 user having some trouble getting 6.1
configured. I'm getting an error from qrsh when trying to run a parallel
job using MPICH tight integration on RHEL 5.

Version: sge-6.1-bin-lx24-x86.tar.gz
O/S: Red Hat Enterprise Linux Server release 5 (Tikanga)
MPICH: mpich-1.2.7p1

The error from mpirun:
        
        running /net/data/raid5/data/tod/MM5-sge-test/Run/mm5.mpp on 6 LINUX ch_p4 processors
        Created /net/data/raid5/data/tod/MM5-sge-test/Run/PI5718
        /opt/ge6.1_lx24-x86/bin/lx24-x86/qrsh -inherit -nostdin node114 /net/data/raid5/data/tod/MM5-sge-test/Run/mm5.mpp node104 43889 \-p4amslave \-p4yourname node114 \-p4rmrank 1
        rcmd: socket: Permission denied
        can't open file /tmp/20.1.all.q/pid.1.node114: No such file or directory
        p0_5886:  p4_error: Child process exited while making connection to remote process on node114: 0
        p0_5886: (70.252667) net_send: could not write to fd=4, errno = 32
        
The PE:
        
        qconf -sp mpich
        pe_name           mpich
        slots             40
        user_lists        NONE
        xuser_lists       NONE
        start_proc_args   /opt/ge6.1_lx24-x86/mpi/startmpi.sh -catch_rsh $pe_hostfile
        stop_proc_args    /opt/ge6.1_lx24-x86/mpi/stopmpi.sh
        allocation_rule   $round_robin
        control_slaves    TRUE
        job_is_first_task TRUE
        urgency_slots     min
        
qrsh by itself also fails:

        > qrsh -verbose
        Your job 27 ("QRLOGIN") has been submitted
        waiting for interactive job to be scheduled ...error: 1: rlogin_daemon "/usr/sbin/in.rlogind" can't be read: No such file or directory (2)
        
        > ls -l /usr/sbin/in.rlogind
        ls: /usr/sbin/in.rlogind: No such file or directory
        > qconf -sconf | grep login
        login_shells                 sh,ksh,csh,tcsh
        > qconf -sconf | grep rsh
        >

I don't understand why qrsh is trying to use /usr/sbin/in.rlogind when
I've cleared it from the rlogin_daemon parameter in the global
configuration. The rlogin_daemon section of the sge_conf man page says

        If no value is given, a specialized Grid Engine component is
        used.

but that doesn't seem to be working.
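
The grep checks above can be sketched as a small script. This is only an
illustration, not part of Grid Engine: the function names parse_conf and
remote_startup_settings are hypothetical, the six parameter names are the
ones documented in the sge_conf man page, and the sample text is just the
single login_shells line that grep returned above.

```python
# Remote-startup parameters as named in the sge_conf(5) man page.
REMOTE_PARAMS = (
    "qlogin_command", "qlogin_daemon",
    "rlogin_command", "rlogin_daemon",
    "rsh_command", "rsh_daemon",
)


def parse_conf(text):
    """Parse 'name   value' lines, as printed by `qconf -sconf`, into a dict.

    Hypothetical helper: blank lines, comment lines and a leading
    'global:' style header are skipped.
    """
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts or parts[0].startswith("#") or parts[0].endswith(":"):
            continue
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def remote_startup_settings(text):
    """Return each remote-startup parameter with its value, or None if absent."""
    conf = parse_conf(text)
    return {name: conf.get(name) for name in REMOTE_PARAMS}


# Sample input: the only matching line shown in the grep output above.
SAMPLE = """\
login_shells                 sh,ksh,csh,tcsh
"""

if __name__ == "__main__":
    for name, value in remote_startup_settings(SAMPLE).items():
        shown = value if value is not None else "(not set)"
        print(f"{name:16} {shown}")
```

The same function could be fed the saved output of `qconf -sconf` for the
global configuration, or of `qconf -sconf node114` for a host-local
configuration, assuming one is defined for that host.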

Any insights or suggestions would be most appreciated.

Thanks in advance.

Tod

-- 
Tod Hagan
Information Technologist
AIRMAP/Climate Change Research Center
Institute for the Study of Earth, Oceans, and Space
University of New Hampshire
Durham, NH 03824
Phone: 603-862-3116

