[GE users] Problems with LAM tight integration

slaton slaton at berkeley.edu
Wed Aug 2 20:35:50 BST 2006

Hi Reuti, thanks for the reply.

> > E|invalid job object in job submission from user "slaton", commproc
> > "qsub" on host "qln01"
> is the same SGE version running on all machines?

Yes. It's a Warewulf cluster, so /opt/sge is an NFS mount from the master
node to the compute nodes. Hence all nodes are actually running from the
same physical SGE installation.

> Is the qmaster also running on "qln01"?

No. qln01 is just a submit host that also serves as a login host for
users. The qmaster runs on qmn01, the master node (which is also where
the physical SGE installation resides).

> > -catch_rsh /opt/sge/default/spool/qcn13/active_jobs/50.1/pe_hostfile
> > qcn13
> > qcn14
> > qcn15
> > qcn16
> > /opt/sge/bin/lx24-amd64/qrsh -V -inherit -n -p 32795 qcn13 exec
> There is the -n, and -p (priority) is also not the intended option for 
> qrsh I think. This is the echo from your rsh-wrapper? Can you please 
> check, whether you are using the correct rsh-wrapper, i.e. create the 
> correct link in $TMPDIR to point to the rsh-wrapper for lam_tight_qrsh?

OK. While a job was running, I rsh'd into the node and verified that
/tmp/[jobid] contained a symlink rsh -> /opt/sge/lam_tight_qrsh/rsh,
which is indeed the path specified in startlam.sh. This is where I
installed the rsh wrapper included in the package with startlam.sh and
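For anyone repeating this check, here is a minimal shell sketch of it. It assumes the job's temporary directory is known (inside an SGE job it is $TMPDIR; from outside, something like /tmp/[jobid]) and that the wrapper lives at /opt/sge/lam_tight_qrsh/rsh as above. The function name check_rsh_link is hypothetical, not part of SGE or the startlam.sh package.

```shell
#!/bin/sh
# Hypothetical helper: verify that the "rsh" entry in a job's temporary
# directory is a symlink pointing at the expected rsh-wrapper.
check_rsh_link() {
    dir=$1        # job temp dir, e.g. "$TMPDIR" inside the job
    expected=$2   # wrapper path, assumed to match startlam.sh
    if [ ! -L "$dir/rsh" ]; then
        echo "no rsh symlink in $dir"
        return 1
    fi
    target=$(readlink "$dir/rsh")   # read the link target, not the file
    if [ "$target" = "$expected" ]; then
        echo "ok: $dir/rsh -> $target"
        return 0
    fi
    echo "wrong target: $dir/rsh -> $target"
    return 1
}
```

Usage on a compute node, e.g.: check_rsh_link /tmp/[jobid] /opt/sge/lam_tight_qrsh/rsh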

> Is the "/opt/sge/bin/lx24-amd64/qrsh -V -inherit" compiled into LAM as 
> to be used rsh-program by accident, and so bypassing the rsh-wrapper 
> (the -n would be filtered out by the rsh-wrapper otherwise)?

I don't think so, as I did not compile LAM with the --with-rsh option. We
just use vanilla rsh, not ssh or anything else.
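Reuti's reasoning is that a correctly invoked wrapper removes rsh's -n before calling qrsh -inherit, so a -n showing up on the qrsh line suggests the wrapper was bypassed. A minimal sketch of that filtering step follows, assuming (as Reuti describes) that the wrapper simply drops -n. The real wrapper shipped with startlam.sh may handle more options, and filter_rsh_args is a hypothetical name. The sketch only prints the command it would run and does not preserve quoting of arguments.

```shell
#!/bin/sh
# Hypothetical sketch of an rsh-wrapper's argument filtering: drop the
# rsh-only "-n" option, then build a "qrsh -inherit" command line.
# Prints the resulting command; a real wrapper would exec it instead.
filter_rsh_args() {
    out=""
    for arg in "$@"; do
        if [ "$arg" = "-n" ]; then
            continue    # rsh's -n is not passed on to qrsh
        fi
        out="$out $arg"
    done
    echo "qrsh -inherit$out"
}
```

For example, filter_rsh_args -n qcn13 exec prints "qrsh -inherit qcn13 exec", with no -n left over.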