[GE users] Problems with LAM tight integration

Reuti reuti at staff.uni-marburg.de
Wed Aug 2 21:05:40 BST 2006


On 02.08.2006 at 21:35, slaton wrote:

> Hi Reuti, thanks for the reply.
>
>>> E|invalid job object in job submission from user "slaton", commproc
>>> "qsub" on host "qln01"
>>
>> is the same SGE version running on all machines?
>
> Yes. it's a warewulf cluster, so /opt/sge is an NFS mount from the master
> node to the compute nodes. hence all nodes are actually running from the
> same physical SGE installation.
>
>> Is the qmaster also running on "qln01"?
>
> No. qln01 is just a submit host which also serves as a login host for
> users. the qmaster (qmn01) is the master node (which is also where the
> physical SGE installation resides).

Okay.

>>> -catch_rsh /opt/sge/default/spool/qcn13/active_jobs/50.1/pe_hostfile
>>> qcn13
>>> qcn14
>>> qcn15
>>> qcn16
>>> /opt/sge/bin/lx24-amd64/qrsh -V -inherit -n -p 32795 qcn13 exec
>>

I still wonder where this line is printed. The option -n is wrong
there. In your original post there were two additional lines: are
these from your script?

>> There is the -n, and -p (priority) is also not the intended option for
>> qrsh I think. This is the echo from your rsh-wrapper? Can you please
>> check, whether you are using the correct rsh-wrapper, i.e. create the
>> correct link in $TMPDIR to point to the rsh-wrapper for lam_tight_qrsh?
>
> OK -- while a job was running i rsh'd into the node and verified that in
> /tmp/[jobid], there was a symlink rsh -> /opt/sge/lam_tight_qrsh/rsh,
> which is indeed what is specified in startlam.sh. This is where i
> installed the rsh wrapper included in the pkg w/ startlam.sh and
> stoplam.sh.

Okay.
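
For the archive, the symlink check described above could be scripted roughly like this. It is only a sketch: the function name check_rsh_link is made up for illustration, the wrapper path is the one from your mail, and it assumes readlink is available on the nodes.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE or the LAM package):
# succeed only if <dir>/rsh is a symlink pointing exactly at <wrapper>.
check_rsh_link() {
    dir=$1
    wrapper=$2
    link="$dir/rsh"

    # -L is true for any symlink, even a dangling one
    if [ ! -L "$link" ]; then
        echo "no rsh symlink in $dir"
        return 1
    fi

    # readlink prints the stored link target without resolving it further
    target=$(readlink "$link")
    if [ "$target" != "$wrapper" ]; then
        echo "rsh points to $target, expected $wrapper"
        return 1
    fi

    echo "ok: $link -> $target"
}
```

Called from inside the job script (where SGE sets $TMPDIR to the per-job directory) it would look like: check_rsh_link "$TMPDIR" /opt/sge/lam_tight_qrsh/rsh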

-- Reuti
