[GE users] Problems with LAM tight integration

slaton slaton at berkeley.edu
Fri Aug 4 18:43:22 BST 2006


> Fine, but I was a little bit confused - sorry: can you please put it in 
> the startlam.sh, and also echo the $PATH there (just before the 
> lamboot)? As the daemons startup there, the error must also occur there.
> 
> You could even just put only an echo something in your real job script 
> without any mpirun command at all, or use the "lamnodes" command there, 
> to display the started daemons of LAM.

OK, here are the results:

 startlam.sh: using rsh_wrapper /opt/sge/lam_tight_qrsh/rsh
 startlam.sh: using PATH /tmp/105.1.testing:/usr/local/bin:/bin:/usr/bin
 startlam.sh: using rsh /tmp/105.1.testing/rsh

Not sure why PATH is being clobbered here.
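
For reference, a minimal sketch of the kind of debug lines that could produce output like the above, placed just before the lamboot call in startlam.sh. The function name debug_env is hypothetical and not part of the stock script; it only assumes a POSIX shell.

```shell
# Hypothetical debug helper (not part of the stock startlam.sh).
# Prints the PATH seen by the script and the rsh that would be picked up.
debug_env() {
    me=${1:-startlam.sh}
    echo "$me: using PATH $PATH"
    # command -v is POSIX; it reports the first rsh found on PATH
    echo "$me: using rsh $(command -v rsh || echo '<not found>')"
}
```

Calling `debug_env startlam.sh` immediately before lamboot shows whether the wrapper directory comes first in PATH at the moment the daemons are started.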

Then I get, in the (pe) file:

 error: ERROR! invalid option argument "-n"
 The lamboot agent timed out while waiting for the newly-booted process
 to call back and indicated that it had successfully booted.

 [...lots more verbose errors from lam...]

 error: error reading returncode of remote command

In the error (e) file:
 
 It seems that there is no lamd running on the host qcn12.

 This indicates that the LAM/MPI runtime environment is not operating.
 The LAM/MPI runtime environment is necessary for the "lamnodes" command.

This is after changing the job script to run only the 'lamnodes' command. So
lamnodes fails because lamboot never gets the daemons started, just like
before with mpihello. Presumably this goes back to the same problem of
additional arguments being added to 'rsh'...
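
One way to check that suspicion would be to log exactly what arguments the wrapper receives. This is only a sketch: log_args is a hypothetical helper, and where to call it inside the rsh wrapper (and where to send its output) is an assumption left to the reader.

```shell
# Hypothetical helper: print each argument on its own numbered line,
# bracketed so that empty or space-containing arguments are visible.
log_args() {
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        printf 'arg %d: [%s]\n' "$i" "$arg"
    done
}
```

For example, `log_args "$@" >> /tmp/rsh_wrapper.log` near the top of the wrapper (the log path is just an example) would show whether an option such as "-n" is being passed through.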

thanks,
slaton
