[GE users] Startup times and other issues with 6.0u3

Brian R Smith brian at cypher.acomp.usf.edu
Sat Mar 19 00:37:04 GMT 2005



Hey guys,

I just finished migrating my beowulf from PBSpro to SGE 6.0u3 and I am 
running into some issues.  I've run SGE on other beowulfs before so I 
know this behavior is not typical.

1) The time it takes to actually start a parallel process seems to 
increase almost exponentially with the number of nodes requested.  
I've noticed that SGE does a lot of communication prior to execution 
(checking loads, getting stats, etc.), but that still doesn't explain 
why it takes nearly 10 minutes to begin executing a 42-processor job 
on a completely unutilized cluster.  Any pointers on this would be 
great.

2) Under tight integration with MPICH, I am getting some strange 
inconsistencies with some of my codes, specifically MM5.  Running the 
mpirun command directly from the shell, with no 'rsh' call 
interception, works perfectly.  However, once this code is executed 
from the SGE environment, there are serious message-passing 
communication issues that slow these codes down horribly.  This is an 
intermittent problem that affects about 30% of MM5 jobs.  I was 
considering creating another Parallel Environment that is not tightly 
integrated, and was wondering whether anyone has experienced this 
before I charge ahead, possibly in the wrong direction.
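
A loosely integrated PE of the kind described above might look roughly 
like the sketch below, in the format printed by `qconf -sp`.  The PE 
name "mpich_loose", the slot count, and the /opt/sge install path are 
placeholders rather than values from this cluster, and the "#" lines 
are annotations to strip before loading the file.  The settings that 
matter are control_slaves FALSE and the absence of -catch_rsh in 
start_proc_args, so that mpirun's rsh calls are not intercepted by SGE.

```
# Hypothetical loosely integrated MPICH PE (name, slots, paths assumed)
pe_name           mpich_loose
slots             42
user_lists        NONE
xuser_lists       NONE
# No -catch_rsh here: rsh is left alone instead of being wrapped by SGE
start_proc_args   /opt/sge/mpi/startmpi.sh $pe_hostfile
stop_proc_args    /opt/sge/mpi/stopmpi.sh
allocation_rule   $fill_up
# FALSE means slave tasks are not started under SGE control
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
```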

I am running CentOS 4 (a RHEL rebuild) with SELinux disabled and the 
Kerberos-enabled rsh removed from the path (we use plain rsh on the 
nodes).  If anyone knows of any issues with this configuration, let me 
know.

Thanks

Brian Smith
