[GE users] SGE issues with LAM and LSDyna

Bernard Li bli at bcgsc.ca
Wed Jan 26 17:55:38 GMT 2005


Hi Joe:

Which SGE-LAM integration are you using?  I use Chris Duncan's LAM
integration, which works with SGE 5.3p6 (I am running LAM 7.0.6).
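
For comparison, here is a minimal sketch of the hostfile step that, as I
understand it, an SGE-LAM integration has to perform before lamboot. It
assumes the job runs under a parallel environment that sets $PE_HOSTFILE.
The function name pe_to_lam_schema and the lamhosts file name are made up
for illustration; this is not Chris Duncan's script. Comparing the host
list and TMPDIR it prints inside the job with what you see by hand may
show where the two environments differ.

```shell
#!/bin/sh
# Sketch only: turn SGE's $PE_HOSTFILE into a LAM boot schema.
# Each PE_HOSTFILE line is:      <host> <slots> <queue> <processor range>
# Each LAM boot schema line is:  <host> cpu=<n>

# pe_to_lam_schema is a hypothetical helper name, not part of SGE or LAM.
pe_to_lam_schema() {
    awk 'NF >= 2 { print $1 " cpu=" $2 }' "$1"
}

# Only act when run inside an SGE parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    schema="${TMPDIR:-/tmp}/lamhosts.$$"    # hypothetical file name
    pe_to_lam_schema "$PE_HOSTFILE" > "$schema"

    # Print what the job actually sees, for comparison with a manual run.
    echo "host=$(hostname) TMPDIR=${TMPDIR:-unset}"
    echo "boot schema:"
    cat "$schema"
fi
```

Inside the job you would then run lamboot -v "$schema", tping -c1 N and
the same mpirun line as before, with lamhalt at the end.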

Cheers,

Bernard 

> -----Original Message-----
> From: Joe Landman [mailto:landman at scalableinformatics.com] 
> Sent: Wednesday, January 26, 2005 7:22
> To: users at gridengine.sunsource.net
> Subject: [GE users] SGE issues with LAM and LSDyna
> 
> Hi Folks:
> 
>   We are running SGE 5.3p6 for lx24_amd64 with LSDyna.  We are
> attempting to use the LAM build of LSDyna because of issues with
> MPICH.  The problem seems to show up only under SGE; when we run
> the batch script by hand, it works just fine.
> 
>   In our batch job, LAM boots up fine, and I can tping it.  
> 
> /opt/lam/gnu/bin/tping -c1 N
>   1 byte from 1 remote node and 1 local node: 0.000 secs
> 
>   However, mpirun then complains that it cannot see the other
> lamds (which tping found).
> 
> /opt/lam/gnu/bin/tping -c1 N
>   1 byte from 1 remote node and 1 local node: 0.000 secs
> 
> 1 message, 1 byte (0.001K), 0.000 secs ( infK/sec)
> roundtrip min/avg/max: 0.000/0.000/0.000
> /opt/lam/gnu/bin/mpirun -np 4
> /apps/lsdyna/mpp970_s_5434a_amd64_linux_lam703 --
> i=four_cpu.key memory=250000000
> -----------------------------------------------------------------------------
> 
> It seems that there is no lamd running on the host compute-1-6.local.
> 
> This is odd, as it works by hand:
> 
> [landman at compute-1-6 ~/test51]$ lamboot -v machines
> 
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
> 
> n-1<22349> ssi:boot:base:linear: booting n0 (compute-1-6) 
> n-1<22349> ssi:boot:base:linear: booting n1 (compute-1-22) 
> n-1<22349> ssi:boot:base:linear: finished
> [landman at compute-1-6 ~/test51]$ tping -c2 N
>   1 byte from 1 remote node and 1 local node: 0.000 secs
>   1 byte from 1 remote node and 1 local node: 0.000 secs
> 
> 2 messages, 2 bytes (0.002K), 0.000 secs ( infK/sec) 
> roundtrip min/avg/max: 0.000/0.000/0.000
> [landman at compute-1-6 ~/test51]$ /opt/lam/gnu/bin/mpirun -np 4
> /apps/lsdyna/mpp970_s_5434a_amd64_linux_lam703 --  
> i=four_cpu.key  memory=250000000
>       Date: 01/26/2005      Time: 10:13:01
>  Executing with local workstation license
> 
>      ___________________________________________________
>      |                                                 |
>      |  Livermore  Software  Technology  Corporation   |
>      |                                                 |
>      |  7374 Las Positas Road                          |
>      |  Livermore, CA 94551                            |
> ...
> 
> Any thoughts?  If we have to go back to MPICH, then we need a
> reliable way to kill hung MPICH processes (we set up tight
> integration, but MPICH issues appear to be preventing
> processes from being killed cleanly).
> 
> Joe
> 
> --
> Joseph Landman, Ph.D
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
> phone: +1 734 612 4615
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
