[GE users] Startup times and other issues with 6.0u3

Brian R Smith brian at cypher.acomp.usf.edu
Sat Mar 19 01:06:16 GMT 2005


Ron,

You are absolutely the man.  Setting "control slaves" to false fixed all 
of my problems.  Thank you so much.
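
For anyone who finds this thread later, the change looks roughly like the fragment below. It is only a sketch: the PE name "mpich" is an assumed placeholder, not necessarily the name used on this cluster, and only the one relevant line of the PE definition is shown.

```
# qconf -sp mpich    show the parallel environment ("mpich" is an assumed name)
# qconf -mp mpich    open it in an editor and change the line below
control_slaves     FALSE
```

Note that with control_slaves set to FALSE, SGE no longer controls the slave tasks, so tasks started through "qrsh -inherit" are not available under that PE.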

Brian Smith

Ron Chen wrote:

>--- Brian R Smith wrote:
>
>>1) The time it takes to actually start a parallel process seems to
>>increase almost exponentially with the number of nodes requested.
>>
>
>A problem related to the slow startup of parallel jobs was fixed
>recently. See whether turning off "control slaves" in your PE improves
>things. If it does, wait for SGE 6 update 4, which includes the fix.
>
>However, 10 minutes is really long. I believe something else is going
>on with your machine, your setup, or your OS!
>
>>I've noticed that SGE does a lot of communication prior to execution
>>(checking loads, getting stats, etc.), but that still doesn't explain
>>why it takes nearly 10 minutes to begin executing a 42-processor job
>>on a completely unutilized cluster. Any pointers on this would be
>>great.
>>
>>2) It seems that under tight integration with MPICH, I am getting
>>some strange inconsistencies with some of my codes, specifically MM5.
>>Running the mpirun command directly from the shell, with no 'rsh' call
>>interception, seems to work perfectly. However, once this code is
>>executed from the SGE environment, there are serious communication
>>issues (message-passing-wise) that slow these codes down horribly.
>>This is an intermittent problem that affects about 30% of MM5 jobs. I
>>was considering creating another Parallel Environment that is not
>>tightly integrated, and was wondering whether anyone has experienced
>>this before I charge ahead, possibly in the wrong direction.
>>
>
>Is reprioritization on?
>
>Also, more info about the hardware/software used would be helpful.
>
> -Ron
>
>>I happen to be running CentOS 4 (a RHEL rebuild) with SELinux disabled
>>and Kerberos-enabled rsh removed from the path (we use plain rsh on
>>the nodes). If anyone knows of any issues with this configuration, let
>>me know.
>>
>>Thanks
>>
>>Brian Smith
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net





