[GE users] gamess info

lukacm at pdx.edu lukacm at pdx.edu
Thu Aug 24 19:23:45 BST 2006


My fault,

I guess I will just try to make it run using ssh.

Thank you,

martin

Quoting Reuti <reuti at staff.uni-marburg.de>:

> Hi,
>
> On 24.08.2006 at 19:10, lukacm at pdx.edu wrote:
>
> > Yes,
> >
> > all the SGE tools are working. The script works with ssh (only
> > partially, because the exports are not working for some reason); i.e.,
> > the jobs are correctly started on all remote nodes and do check in.
> >
> > With the correct parameters, the SGE tools work fine and cause no
> > problems.
> >
> > The fallback to /usr/bin/rsh is a feature of ddikick: if it does not
> > find rsh, it tries all the other system default rsh scripts.
>
> I grepped the GAMESS source for the word "trying" from the error
> message and didn't find any hint. - Reuti
>
>
> >
> > martin
> >
> >
> > Quoting Reuti <reuti at staff.uni-marburg.de>:
> >
> >> Hi,
> >>
> >> but it looks much better now at least.
> >>
> >> On 23.08.2006 at 21:17, lukacm at pdx.edu wrote:
> >>
> >>> It is missing the hostname,
> >>>
> >>> qrsh is deployed. I can now (finally) capture the log of the
> >>> ddikick process:
> >>>
> >>>  ddikick.x: finished with -ddi argument.
> >>>  ddikick.x: finished with -dditree argument
> >>>  ddikick.x: finished with -ppn argument
> >>>  ddikick.x: finished with -scr argument.
> >>>
> >>>  Distributed Data Interface kickoff program.
> >>>  Initiating 4 compute processes on 4 nodes to run the following command:
> >>>  /home/visible/apps/gamess/gamess.01.x exam20
> >>>
> >>>  ddikick.x: kickoff host = compute-0-5.local
> >>>  Master Kickoff Host compute-0-5.local is accepting connections on port 33170.
> >>>  Awaiting connections from 8 GDDI processes.
> >>>  ddikick.x : Thread created on compute-0-5.local:33170 to accept connections.
> >>>  ddikick.x: execvp command line: rsh compute-0-12.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 2 4 rsh -scr /tmp/3840.1.gamess.q
> >>>  ddikick.x: execvp command line: rsh compute-0-4.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 1 2 rsh -scr /tmp/3840.1.gamess.q
> >>> Attemping to create DDI process 0 on local node 0.
> >>> DDI Process 0 Command Line: /home/visible/apps/gamess/gamess.01.x exam20 -ddi compute-0-5.local 33170 0 0 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1
> >>> Attemping to create DDI process 4 on local node 0.
> >>> DDI Process 4 Command Line: /home/visible/apps/gamess/gamess.01.x exam20 -ddi compute-0-5.local 33170 0 4 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1
> >>> /opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-12.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 2 4 rsh -scr /tmp/3840.1.gamess.q
> >>> /opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-4.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 1 2 rsh -scr /tmp/3840.1.gamess.q
> >>>  ddikick.x: 4 bytes received; $lu remaining.
> >>>  ddikick.x: 4 bytes received; $lu remaining.
> >>>  ddikick.x : 0 checked in; receiving via port 33177 (Remaining=7).
> >>>  ddikick.x: 4 bytes received; $lu remaining.
> >>>  ddikick.x: 4 bytes received; $lu remaining.
> >>>  ddikick.x : 4 checked in; receiving via port 33179 (Remaining=6).
> >>>  ddikick.x: Sending kill signal to DDI processes.
> >>>  ddikick.x: Sending kill signal to DDI process 0.
> >>>  ddikick.x: Sending kill signal to DDI process 4.
> >>>  DDI Process 0: terminated upon request.
> >>>  DDI Process 4: terminated upon request.
> >>>  ddikick.x: Execution terminated due to error(s).
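The following is a minimal sketch of the arithmetic behind this log. It assumes, although this thread does not confirm it, that DDI pairs every compute process with a data server. That would explain why 4 compute processes make ddikick wait for 8 GDDI connections. Only the host:cpus=N node-list format comes from the log. The helper names parse_node and expected_connections are hypothetical.

```python
"""Count the DDI processes implied by a ddikick -ddi node list.

Assumption (not confirmed in this thread): each compute process is paired
with one data server, so the kickoff program expects twice as many
connections as there are compute processes.
"""
from typing import List, Tuple


def parse_node(spec: str) -> Tuple[str, int]:
    """Split a node spec such as 'compute-0-5.local:cpus=1' into (host, cpus)."""
    host, _, opts = spec.partition(":")
    cpus = 1  # assumed default when no cpus= option is present
    for opt in opts.split(":"):
        if opt.startswith("cpus="):
            cpus = int(opt[len("cpus="):])
    return host, cpus


def expected_connections(specs: List[str]) -> Tuple[int, int]:
    """Return (compute processes, connections ddikick would wait for)."""
    compute = sum(cpus for _, cpus in map(parse_node, specs))
    return compute, 2 * compute


# Node list copied from the -ddi arguments in the log above.
nodes = [
    "compute-0-5.local:cpus=1",
    "compute-0-4.local:cpus=1",
    "compute-0-12.local:cpus=1",
    "compute-0-9.local:cpus=1",
]
print(expected_connections(nodes))  # (4, 8)
```

In the log, only processes 0 and 4 check in, and both run on the kickoff host compute-0-5.local. That leaves ddikick waiting for the remaining 6 connections.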
> >>>
> >>> and in the error log I have the same as before:
> >>>
> >>>
> >>> error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
> >>
> >> Okay, now we have to investigate this. Are all the hostnames known on
> >> all machines, via /etc/hosts or e.g. NIS? Are the SGE tools
> >> gethostbyname and gethostbyaddr working as expected and giving
> >> reasonable results on all nodes, and for all nodes on each one?
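One rough way to answer these questions on each node is sketched below in Python. It looks up every host name, reverse-resolves the address, and flags hosts where the reverse lookup fails or returns a different name. Python's socket.gethostbyname and socket.gethostbyaddr use the system resolver (/etc/hosts, NIS, DNS). The sketch therefore only approximates the gethostbyname and gethostbyaddr utilities that ship with SGE, and it ignores any SGE host_aliases file. The helpers check_host, names_match and report are hypothetical.

```python
"""Flag cluster hosts whose forward and reverse lookups fail or disagree.

This uses the system resolver via Python's socket module, not SGE's own
utilities, so SGE host aliasing is not taken into account.
"""
import socket
from typing import Iterable, List, Optional, Tuple

Result = Tuple[str, Optional[str], Optional[str]]


def check_host(name: str) -> Result:
    """Return (name, ip, reverse_name); None marks a lookup that failed."""
    try:
        ip = socket.gethostbyname(name)  # forward lookup
    except OSError:  # socket.gaierror is a subclass of OSError
        return name, None, None
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]  # reverse lookup
    except OSError:  # socket.herror is a subclass of OSError
        return name, ip, None
    return name, ip, reverse_name


def names_match(name: str, reverse_name: Optional[str],
                ignore_domain: bool = False) -> bool:
    """Compare names case-insensitively, optionally ignoring the domain."""
    if not reverse_name:  # empty or missing reverse name never matches
        return False
    a, b = name.lower(), reverse_name.lower()
    if ignore_domain:
        a, b = a.split(".")[0], b.split(".")[0]
    return a == b


def report(names: Iterable[str], ignore_domain: bool = False) -> List[Result]:
    """Return the hosts whose lookups fail or do not map back to the name."""
    return [r for r in map(check_host, names)
            if not names_match(r[0], r[2], ignore_domain)]
```

Call report(["compute-0-5.local", "compute-0-4.local", "compute-0-12.local", "compute-0-9.local"]) on every node in turn. It lists the hosts that need attention. A reverse lookup that returns nothing is one plausible source of the empty host name in the commlib error above.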
> >>
> >>> error: executing task of job 3840 failed: failed sending task to execd at compute-0-12.local: can't find connection
> >>> error: commlib error: access denied (client IP resolved to host name "". This is not identical to clients host name "")
> >>> error: executing task of job 3840 failed: failed sending task to execd at compute-0-4.local: can't find connection
> >>>  ddikick.x: Timed out while waiting for DDI processes to check in.
> >>>  ddikick.x: Fatal error detected.
> >>>  The error is most likely to be in the application, so check for
> >>>  input errors, disk space, memory needs, application bugs, etc.
> >>>  ddikick.x will now clean up all processes, and exit...
> >>> connect to address 10.5.255.249: Connection refused
> >>> connect to address 10.5.255.249: Connection refused
> >>> trying normal rsh (/usr/bin/rsh)
> >>> compute-0-5.local: Connection refused
> >>> connect to address 10.5.255.250: Connection refused
> >>> connect to address 10.5.255.250: Connection refused
> >>> trying normal rsh (/usr/bin/rsh)
> >>> compute-0-4: Connection refused
> >>> connect to address 10.5.255.242: Connection refused
> >>> connect to address 10.5.255.242: Connection refused
> >>> trying normal rsh (/usr/bin/rsh)
> >>> compute-0-12: Connection refused
> >>> connect to address 10.5.255.245: Connection refused
> >>> connect to address 10.5.255.245: Connection refused
> >>> trying normal rsh (/usr/bin/rsh)
> >>> compute-0-9: Connection refused
> >>
> >> I wonder why the hostname here again has no .local, and why the full
> >> path /usr/bin/rsh is used. I agree that this will not work.
> >>
> >>>
> >>> However, there is a mix of rsh and qrsh. The ddikick log contains
> >>> both:
> >>>
> >>>  ddikick.x: execvp command line: rsh compute-0-12.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 2 4 rsh -scr /tmp/3840.1.gamess.q
> >>>
> >>> This one is not working 100%.
> >>
> >> This rsh will be caught by the rsh wrapper. This is as it should be.
> >>
> >>> and later there is
> >>>
> >>> /opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-12.local /home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local 33170 2 4 rsh -scr /tmp/3840.1.gamess.q
> >>
> >> This is the message from the wrapper, it's fine.
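The sketch below illustrates the translation Reuti describes. It turns an "rsh host command ..." argument vector into the "qrsh -V -inherit host command ..." form seen in the log. It is a hypothetical helper, not the wrapper that SGE actually ships. The real wrapper is a shell script and handles more rsh options. The qrsh path is copied from the log, and the final exec step a real wrapper would perform is left out.

```python
"""Hypothetical illustration of an rsh -> qrsh -inherit rewrite.

This is not SGE's own rsh wrapper; it only reproduces the argument
translation visible in the ddikick log.
"""
from typing import List

# Path copied from the log above; other installations will differ.
QRSH = "/opt/gridengine/bin/lx26-amd64/qrsh"


def rsh_to_qrsh(argv: List[str]) -> List[str]:
    """Rewrite ['rsh', host, cmd, ...] into a qrsh -inherit command line.

    Only the plain 'rsh host command...' form used by ddikick is handled;
    rsh options such as -l or -n are rejected in this sketch.
    """
    if len(argv) < 3:
        raise ValueError("expected: rsh host command [args...]")
    host, command = argv[1], argv[2:]
    if host.startswith("-"):
        raise ValueError(f"rsh option {host!r} not supported in this sketch")
    # -inherit starts the task inside the already granted parallel job;
    # -V passes the caller's environment along to the remote task.
    return [QRSH, "-V", "-inherit", host, *command]


cmd = rsh_to_qrsh(["rsh", "compute-0-12.local",
                   "/home/visible/apps/gamess/ddikick.x", "exam20"])
print(" ".join(cmd))
# prints: /opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-12.local /home/visible/apps/gamess/ddikick.x exam20
```

The rewrite only changes how the remote task is started. If the execd on the target node refuses the connection, as in the commlib errors above, the task still fails.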
> >>
> >> -- Reuti
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
> >>
> >>
> >>

More information about the gridengine-users mailing list