[GE users] gamess info

lukacm at pdx.edu lukacm at pdx.edu
Wed Aug 23 20:17:56 BST 2006


It is "missing hostname",

so qrsh is deployed. I can now (finally) capture the log of the ddikick process:

 ddikick.x: finished with -ddi argument.
 ddikick.x: finished with -dditree argument
 ddikick.x: finished with -ppn argument
 ddikick.x: finished with -scr argument.

 Distributed Data Interface kickoff program.
 Initiating 4 compute processes on 4 nodes to run the following command:
 /home/visible/apps/gamess/gamess.01.x exam20

 ddikick.x: kickoff host = compute-0-5.local
 Master Kickoff Host compute-0-5.local is accepting connections on port 33170.
 Awaiting connections from 8 GDDI processes.
 ddikick.x : Thread created on compute-0-5.local:33170 to accept connections.
 ddikick.x: execvp command line: rsh compute-0-12.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 2 4 rsh -scr /tmp/3840.1.gamess.q
 ddikick.x: execvp command line: rsh compute-0-4.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 1 2 rsh -scr /tmp/3840.1.gamess.q
Attemping to create DDI process 0 on local node 0.
DDI Process 0 Command Line: /home/visible/apps/gamess/gamess.01.x exam20 -ddi
compute-0-5.local 33170 0 0 4 4 compute-0-5.local:cpus=1
compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1
Attemping to create DDI process 4 on local node 0.
DDI Process 4 Command Line: /home/visible/apps/gamess/gamess.01.x exam20 -ddi
compute-0-5.local 33170 0 4 4 4 compute-0-5.local:cpus=1
compute-0-4.local:cpus=1 compute-0-12.local:cpus=1 compute-0-9.local:cpus=1
/opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-12.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 2 4 rsh -scr /tmp/3840.1.gamess.q
/opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-4.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 1 2 rsh -scr /tmp/3840.1.gamess.q
 ddikick.x: 4 bytes received; $lu remaining.
 ddikick.x: 4 bytes received; $lu remaining.
 ddikick.x : 0 checked in; receiving via port 33177 (Remaining=7).
 ddikick.x: 4 bytes received; $lu remaining.
 ddikick.x: 4 bytes received; $lu remaining.
 ddikick.x : 4 checked in; receiving via port 33179 (Remaining=6).
 ddikick.x: Sending kill signal to DDI processes.
 ddikick.x: Sending kill signal to DDI process 0.
 ddikick.x: Sending kill signal to DDI process 4.
 DDI Process 0: terminated upon request.
 DDI Process 4: terminated upon request.
 ddikick.x: Execution terminated due to error(s).
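
Counting the check-ins makes the failure in this log easier to see. The following is only a minimal sketch: it copies four lines of the log above into a here-document and compares the number of processes ddikick was waiting for with the number that reported in. The variable names (log, expected, checked_in, missing) are made up for the illustration.

```shell
#!/bin/sh
# Excerpt of the ddikick log above, copied verbatim.
log=$(cat <<'EOF'
 Awaiting connections from 8 GDDI processes.
 ddikick.x : 0 checked in; receiving via port 33177 (Remaining=7).
 ddikick.x : 4 checked in; receiving via port 33179 (Remaining=6).
 ddikick.x: Sending kill signal to DDI processes.
EOF
)

# Number of processes the master kickoff host was waiting for.
expected=$(printf '%s\n' "$log" \
  | sed -n 's/.*Awaiting connections from \([0-9][0-9]*\) GDDI processes.*/\1/p')

# Number of processes that actually checked in.
checked_in=$(printf '%s\n' "$log" | grep -c 'checked in;')

# Processes that never connected before the timeout.
missing=$((expected - checked_in))
echo "expected=$expected checked_in=$checked_in missing=$missing"
```

Only processes 0 and 4 check in, and both were created on local node 0, so nothing started on the other three nodes reported back.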

and in the error log I have the same as before:


error: commlib error: access denied (client IP resolved to host name "". This is
not identical to clients host name "")
error: executing task of job 3840 failed: failed sending task to
execd at compute-0-12.local: can't find connection
error: commlib error: access denied (client IP resolved to host name "". This is
not identical to clients host name "")
error: executing task of job 3840 failed: failed sending task to
execd at compute-0-4.local: can't find connection
 ddikick.x: Timed out while waiting for DDI processes to check in.
 ddikick.x: Fatal error detected.
 The error is most likely to be in the application, so check for
 input errors, disk space, memory needs, application bugs, etc.
 ddikick.x will now clean up all processes, and exit...
connect to address 10.5.255.249: Connection refused
connect to address 10.5.255.249: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-5.local: Connection refused
connect to address 10.5.255.250: Connection refused
connect to address 10.5.255.250: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-4: Connection refused
connect to address 10.5.255.242: Connection refused
connect to address 10.5.255.242: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-12: Connection refused
connect to address 10.5.255.245: Connection refused
connect to address 10.5.255.245: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-9: Connection refused
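
To summarize which hosts refused the plain rsh connection, the tail of the error log can be filtered. This is a rough sketch over a copy of the lines above (with the duplicated "connect to address" lines dropped); the variable names are hypothetical, and stripping the .local suffix is only done here so that both spellings of a host name compare equal.

```shell
#!/bin/sh
# Tail of the error log above, one "connect to address" line per host.
errlog=$(cat <<'EOF'
connect to address 10.5.255.249: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-5.local: Connection refused
connect to address 10.5.255.250: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-4: Connection refused
connect to address 10.5.255.242: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-12: Connection refused
connect to address 10.5.255.245: Connection refused
trying normal rsh (/usr/bin/rsh)
compute-0-9: Connection refused
EOF
)

# Keep only the "<host>: Connection refused" lines, drop the message and
# the .local suffix, and remove duplicates.
refused_hosts=$(printf '%s\n' "$errlog" \
  | grep -v '^connect to address' \
  | sed -n 's/^\([^:]*\): Connection refused$/\1/p' \
  | sed 's/\.local$//' \
  | sort -u)

# Count the distinct hosts (one non-empty line per host).
refused_count=$(printf '%s\n' "$refused_hosts" | grep -c .)
echo "$refused_hosts"
echo "hosts refusing rsh: $refused_count"
```

All four nodes of the job show up, including the kickoff host compute-0-5.local itself.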

However, rsh and qrsh are mixed. The ddikick log contains both:

 ddikick.x: execvp command line: rsh compute-0-12.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 2 4 rsh -scr /tmp/3840.1.gamess.q

This one definitely does not work,

and later there is:

/opt/gridengine/bin/lx26-amd64/qrsh -V -inherit compute-0-12.local
/home/visible/apps/gamess/ddikick.x /home/visible/apps/gamess/gamess.01.x
exam20 -ddi 4 4 compute-0-5.local:cpus=1 compute-0-4.local:cpus=1
compute-0-12.local:cpus=1 compute-0-9.local:cpus=1 -dditree compute-0-5.local
33170 2 4 rsh -scr /tmp/3840.1.gamess.q


For some reason it first calls rsh, which fails, and then it tries to run qrsh.
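
Reuti's point about DDI_RSH (quoted below) comes down to PATH lookup: a bare command name is searched along PATH, so a wrapper named rsh in an earlier directory is picked up, while a full path such as /usr/bin/rsh is executed directly. The sketch below shows only that mechanism. The temporary directory and the fake wrapper are stand-ins invented for the example, not the real SGE wrapper, and the assumption is that the real one sits in a directory listed ahead of /usr/bin in the job's PATH.

```shell
#!/bin/sh
# Hypothetical stand-in for the directory that holds the rsh wrapper.
wrapper_dir=$(mktemp -d)

# A fake wrapper named "rsh" that only reports how it was called.
cat > "$wrapper_dir/rsh" <<'EOF'
#!/bin/sh
echo "wrapper called with: $*"
EOF
chmod +x "$wrapper_dir/rsh"

# Put the wrapper directory first in PATH (assumption: the job environment
# does the equivalent for the real wrapper).
PATH="$wrapper_dir:$PATH"
export PATH

# A bare name is looked up along PATH, so the wrapper is found first.
resolved=$(command -v rsh)
echo "bare name resolves to: $resolved"

# DDI_RSH=rsh: the launcher ends up running the wrapper.
DDI_RSH=rsh
bare_output=$("$DDI_RSH" compute-0-12.local hostname)
echo "$bare_output"

# DDI_RSH=/usr/bin/rsh: a name containing a slash is never searched in PATH,
# so the wrapper would be skipped. The command is not executed here.
DDI_RSH=/usr/bin/rsh
case $DDI_RSH in
  */*) full_path_note="PATH not searched, wrapper skipped" ;;
  *)   full_path_note="PATH searched" ;;
esac
echo "$DDI_RSH: $full_path_note"

rm -rf "$wrapper_dir"
```

Putting `which rsh` and `echo $PATH` into the job script, as suggested in the quoted mail, checks the same thing on the real nodes.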

martin



Quoting Reuti <reuti at staff.uni-marburg.de>:

> Am 23.08.2006 um 19:48 schrieb lukacm at pdx.edu:
>
> > I tried both with the same results.
> > After adding more commands to the script for SGE, I can also see
> > errors such as:
> >
> > rsh: missing host.
>
> Do you mean "missing hostname" or just "missing host."?
>
> The former one is from the rsh-wrapper, so we would be on the right
> track.
>
> -- Reuti
>
> >
> > martin
> >
> >
> > Quoting Reuti <reuti at staff.uni-marburg.de>:
> >
> >> Am 23.08.2006 um 18:45 schrieb lukacm at pdx.edu:
> >>
> >>> Hello,
> >>>
> >>> I am using the normal rsh from /usr/bin/rsh.
> >>
> >> The question was: Did you set DDI_RSH to "rsh" or "/usr/bin/rsh"? The
> >> latter won't work, as you specify the full path, so the wrapper will
> >> be skipped.
> >>
> >> -- Reuti
> >>
> >>
> >>> For some reason, it just cannot start the gamess process remotely.
> >>> As I mentioned, I can start a gamess process if the main process is
> >>> on the local node.
> >>>
> >>> If starting on the remote nodes, it just fails to initiate the main
> >>> node.
> >>>
> >>>
> >>>
> >>> martin
> >>>
> >>> Quoting Reuti <reuti at staff.uni-marburg.de>:
> >>>
> >>>> Am 22.08.2006 um 20:53 schrieb lukacm at pdx.edu:
> >>>>
> >>>>> Well,
> >>>>>
> >>>>> as I mentioned (at least I hope), I can run gamess using the
> >>>>> rungms script and ssh. It does not work using rsh because the
> >>>>> cluster is not rsh enabled. This is different for SGE, which has
> >>>>> its own subsystem.
> >>>>>
> >>>>>
> >>>>> When I run gamess in SGE, with either ssh or rsh, I get similar
> >>>>> errors:
> >>>>>
> >>>>> In one case (ssh) I get only error messages as shown below (in the
> >>>>> previous mail). When run with rsh, I get errors directly from the
> >>>>> gamess program, such as:
> >>>>>
> >>>>> connect to address 10.5.255.240: connect to address 10.5.255.248:
> >>>>> Connection
> >>>>> refused
> >>>>> Connection refused
> >>>>> connect to address 10.5.255.240: Connection refused
> >>>>> trying normal rsh (/usr/bin/rsh)
> >>>>
> >>>> Did you set DDI_RSH to rsh or /usr/bin/rsh? It seems that the wrapper
> >>>> is not used. Otherwise, please put a `which rsh` and `echo $PATH` in
> >>>> your jobscript to get a closer look.
> >>>>
> >>>>> connect to address 10.5.255.248: Connection refused
> >>>>> trying normal rsh (/usr/bin/rsh)
> >>>>> compute-0-14.local: Connection refused
> >>>>> compute-0-6.local: Connection refused
> >>>>
> >>>> The addresses are okay here (including .local).
> >>>>
> >>>>>  ddikick.x: Timed out while waiting for DDI processes to check in.
> >>>>>  ddikick.x: Fatal error detected.
> >>>>>  The error is most likely to be in the application, so check for
> >>>>>  input errors, disk space, memory needs, application bugs, etc.
> >>>>>  ddikick.x will now clean up all processes, and exit...
> >>>>> compute-0-12.local: Connection refused
> >>>>> compute-0-6: Connection refused
> >>>>> compute-0-14: Connection refused
> >>>>> compute-0-13: Connection refused
> >>>>
> >>>> And here not. But anyway: it is only necessary that the scratch
> >>>> directory has the same name on all nodes. So in 5.3 it was necessary
> >>>> to create and delete them, as they all had unique names on each node.
> >>>> In 6.0, you can delete the for-loop for the creation and deletion of
> >>>> the directories in {start,stop}gamess.sh.
> >>>>
> >>>> -- Reuti
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>> For additional commands, e-mail: users-help at gridengine.sunsource.net

