[GE users] Startup times and other issues with 6.0u3

Reuti reuti at staff.uni-marburg.de
Sat Mar 19 15:40:57 GMT 2005


Hi,

Quoting Brian R Smith <brian at cypher.acomp.usf.edu>:

> Reuti:
> 
> shell                 /bin/csh
> shell_start_mode      posix_compliant
> 
> That's how we run it on our other machines and it seems to work just fine. 
> Figured I didn't have to change anything.  

Thanks for all the info. If all of your jobs use csh, it's perfectly okay to 
leave this setting as it is.
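
Just a side note, in case you ever get jobs written for another shell: with 
"shell_start_mode unix_behavior" the "#!" line of the job script would be 
honored instead of the configured queue shell. A minimal sketch for checking 
and changing this, assuming a queue named "all.q" (the name is only an 
example):

    # show the current shell settings of the queue
    qconf -sq all.q | egrep 'shell'

    # open the queue configuration in an editor to change it
    qconf -mq all.q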
 
> The PE mpich-vasp has all traces of tight integration removed.  I was using
> it for testing VASP.  However, I got VASP to work with tight integration, so
> that queue is deprecated.  

Okay, I only saw it in your configuration and wondered about the reason for 
having a special MPICH setup for VASP.
 
> As for MM5, it's compiled with the PGI compilers.  That said, it runs
> beautifully outside of SGE on the same system (mpirun).  It is only under SGE
> that the processes slow to a crawl.  If you want more info on it, let me
> know.  I just don't see how that would help.
> 
> my mpirun command for that is simply
> 
> $MPIR_PATH/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mm5.mpp
> 
> mm5.mpp does some small writes to the NFS mount it runs from.  During
> the span of about 3 minutes, it will dump approximately 10MB of data into its
> current working directory.  I have load tested this outside of SGE and found
> that the NFS writes are not the cause of any slowdown.

That's not much data. Does MM5 require a shared scratch space, or could it 
also run using the node-local $TMPDIR which SGE creates on the nodes?
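
To illustrate what I mean, a sketch of a job script staging through $TMPDIR 
(file names, PE name and slot count are made up; this only avoids the NFS 
writes if the master task alone does the file I/O):

    #!/bin/csh
    #$ -cwd
    #$ -pe mpich 8
    # stage the input into the node-local scratch directory,
    # which SGE creates and removes for each job
    cp mm5.in* $TMPDIR/
    cd $TMPDIR
    $MPIR_PATH/mpirun -np $NSLOTS -machinefile $TMPDIR/machines mm5.mpp
    # copy the results back to the shared submit directory
    cp mm5.out* $SGE_O_WORKDIR/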
 
> MPICH runs over a dedicated GigE connection that's devoted to 1) SGE 2)
> Message Passing 3) PVFS2.  The other network (100 Mbit) handles NFS, NIS,
> etc.  The GigE network is a Cisco Catalyst series GigE switch with Intel GigE
> controllers on the nodes.  We are not using Jumbo frames on that network
> either, as we haven't yet tested the benefits of doing so.

So you set up a host_aliases file to separate the traffic - does `hostname` 
give the "external" or the "internal" name of the machine? And did you 
replace "MPI_HOST=`hostname`" in your mpirun script with a mapping to the 
GigE name, where necessary?
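
For reference, a sketch of what I mean (host names are made up, and I'm 
assuming the GigE interfaces are reachable as "nodeXX-gige"):

    # $SGE_ROOT/default/common/host_aliases
    # first column: the name SGE should use, then the aliases
    node01-gige node01
    node02-gige node02

and in the mpirun script:

    # instead of the default
    MPI_HOST=`hostname`
    # map to the GigE interface of the node
    MPI_HOST=`hostname`-gige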

Just out of interest: how many nodes in your cluster are you using as PVFS2 
servers?
 
Cheers - Reuti
