[GE users] LAM & SGE

Bogdan Costescu bogdan.costescu at iwr.uni-heidelberg.de
Fri Aug 27 19:08:37 BST 2004


On Fri, 27 Aug 2004, Orion Poplawski wrote:

> I end up with a runaway qrsh process using 100% cpu and the job
> fails and puts the queue in an error state.

Where does this qrsh process run? I assume on the master node of the
job, but just to be sure...

> Both SGE and lam are configured to use ssh.

In my setup (SGE 5.3p5 and LAM-7.0.3) both use rsh, so the problem
might be ssh-related...

> n-1<11478> ssi:boot:rsh:   n0 cynosure.cora.nwra.com --> 65.171.192.72
> ...
> SGE-LAM DEBUG: QRSH LOCAL CONFIG: -inherit -nostdin -V
> cynosure.colorado-research.com /usr/bin/lamd -H 65.171.192.72 -P 58387 -n
> 0 -o 0 -d -sessionsuffix sge-38136-undefined

There seem to be two names for the same host in the output above
(cynosure.cora.nwra.com and cynosure.colorado-research.com). Are you
sure that this doesn't break things? LAM accepts any name/IP associated
with a local interface and will use that interface for connecting; SGE
is not that forgiving, so you have to set the host aliases correctly.
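
To check quickly whether the two names really point at the same
machine, something like the sketch below might help. It only asks the
system resolver, so it says nothing about how SGE itself maps the
names; the helper names (resolve_all, same_host) are made up for this
example, and the resolver argument exists only so the logic can be
tried without real DNS.

```python
import socket


def resolve_all(name, resolver=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses that `name` resolves to.

    An empty set is returned when the lookup fails.
    """
    try:
        _canonical, _aliases, addresses = resolver(name)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError
        return set()
    return set(addresses)


def same_host(name_a, name_b, resolver=socket.gethostbyname_ex):
    """True if the two names share at least one resolved address."""
    return bool(resolve_all(name_a, resolver) & resolve_all(name_b, resolver))
```

Running same_host("cynosure.cora.nwra.com",
"cynosure.colorado-research.com") on the master node and on an
execution host would show whether both machines see the names the same
way. If they do resolve to the same address but SGE still treats them
as different hosts, the alias setup is the next thing to look at.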

-- 
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De

