[GE users] Weird issue with loose LAM/rsh integration

Joshua Baker-LePain jlb at salilab.org
Mon Sep 22 05:23:59 BST 2008


I'm running SGE 6.1u5 on top of CentOS 4.6 and the included lam-7.1.2. 
I'm attempting to set up loose integration using RSH as detailed here 
<http://gridengine.sunsource.net/howto/lam-integration/lam-integration.html>. 
Everything seems to work *except* actually running the MPI program.  The 
lamboot works (I can see lamd on the node, and 'lamnodes' within the job 
script returns the proper output), but when I attempt to 'mpirun' my 
sample program I get the output below.  Note that this program runs just 
fine outside of SGE.  In fact, I can even log in to the node with the 
SGE-started lamd and successfully run this program.  Any ideas?  Thanks!

MPI program output:
*** Oops -- I cannot open the LAM help file.
*** I tried looking for it in the following places:
***
*** Oops -- I cannot open the LAM help file.
*** I tried looking for it in the following places:
***
***   $HOME/lam-helpfile
***   $HOME/lam-helpfile
***   $HOME/lam-7.0.3-helpfile
***   $HOME/lam-7.0.3-helpfile
***   $HOME/etc/lam-helpfile
***   $HOME/etc/lam-helpfile
***   $HOME/etc/lam-7.0.3-helpfile
***   $LAMHELPDIR/lam-helpfile
***   $HOME/etc/lam-7.0.3-helpfile
***   $LAMHELPDIR/lam-7.0.3-helpfile
***   $LAMHOME/etc/lam-helpfile
***   $LAMHELPDIR/lam-helpfile
***   $LAMHOME/etc/lam-7.0.3-helpfile
***   $SYSCONFDIR/lam-helpfile
***   $LAMHELPDIR/lam-7.0.3-helpfile
***   $SYSCONFDIR/lam-7.0.3-helpfile
***
*** You were supposed to get help on the program "MPI"
***   $LAMHOME/etc/lam-helpfile
*** about the topic "no-lamd"
***
*** Sorry!
-----------------------------------------------------------------------------
***   $LAMHOME/etc/lam-7.0.3-helpfile
***   $SYSCONFDIR/lam-helpfile
***   $SYSCONFDIR/lam-7.0.3-helpfile
***
*** You were supposed to get help on the program "MPI"
*** about the topic "no-lamd"
***
*** Sorry!
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE).  You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
