[GE users] Desperate need for CPU clock cycles!

Richard Ems Richard.Ems at cape-horn-eng.com
Mon Jan 21 17:52:45 GMT 2008



Are you sure that it's a CPU bottleneck and not a memory one?
And which version of SGE are you talking about?

We also had a problem with the load getting very high (on Linux, 
openSUSE 10.3), but this was because SGE was eating all our memory and 
the system started swapping.
There was a memory leak, which Andreas found just a few days 
ago, but it probably only triggers in a configuration with several 
queues and PEs when jobs are submitted using wildcards ... Andreas?
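One quick way to check the memory-versus-CPU question on a Linux host is to look at how much swap is in use. The Python sketch below is a minimal illustration, assuming the Linux /proc/meminfo format ("Key:  value kB", with SwapTotal and SwapFree fields). Solaris has no /proc/meminfo, so on the qmaster box you would need Solaris's own tools instead. The function names and the threshold_kb heuristic are hypothetical, and the sample figures are made up for illustration.

```python
"""Minimal sketch: estimate swap usage from /proc/meminfo-formatted text.

Assumes the Linux /proc/meminfo layout ("Key:   value kB"). The heuristic
below (any swap in use above a threshold) is a hypothetical rule of thumb.
"""


def parse_meminfo(text):
    """Return a dict mapping each field name to its numeric value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info):
    """Swap currently in use, in kB: SwapTotal minus SwapFree."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def looks_like_memory_pressure(info, threshold_kb=0):
    """Hypothetical heuristic: swap use above threshold_kb suggests that
    memory, not CPU, may be the real bottleneck."""
    return swap_used_kb(info) > threshold_kb


if __name__ == "__main__":
    # Illustrative sample only; on a real Linux host use open("/proc/meminfo").read().
    sample = """MemTotal:        516096 kB
MemFree:           4120 kB
SwapTotal:      1048568 kB
SwapFree:        612344 kB
"""
    info = parse_meminfo(sample)
    print(swap_used_kb(info))                # 436224
    print(looks_like_memory_pressure(info))  # True
```

Swap in use is only a coarse signal: pages can stay in swap long after the pressure has passed, so a steadily rising figure, or non-zero si/so columns in vmstat, is a better sign that the box is actively swapping.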

just my $0.01 ...

Richard


Neil Baker wrote:
> I'm Neil, Richard's (the original poster's) colleague.
> 
> Has anyone else had a similar experience when using Solaris 10 for the 
> qmaster? 
> 
> We actually migrated the qmaster from a Red Hat Linux box to a Solaris 10 
> box to try to gain extra stability, as the Red Hat box kept crashing (due 
> to hardware, not Grid Engine).  I assumed that, as Grid Engine was 
> initially written by Sun, it would be more compatible and more 
> stable on Sun kit running Solaris.  I've also heard from people who say 
> that other scheduling software runs quite happily on similarly specified 
> hardware.
> 
> Our execution hosts currently run openSUSE 10 (these haven't changed), 
> and we have approx. 28 machines, each running up to 4 jobs at a time (so a 
> maximum of 112 jobs running at once).  We do use the grid a lot, and 
> the number of queued jobs can reach 500 to 1000 
> during peak usage.  We are also likely to double the number of execution 
> hosts in the near future.
> 
> The Sun Grid Engine binaries are also being shared via NFS from the same slow 
> Solaris Grid Engine machine.  The Solaris box is configured with software 
> RAID mirroring; could disk performance be causing a 
> bottleneck, since the mirroring uses the CPU?  Is there an easy way for us 
> to tell whether the disk is the bottleneck?  We do have a separate, very 
> fast NetApp NAS device, and I'm wondering how much benefit there would 
> be in moving the shared binaries / SGE directory over to that NAS device.
> 
> Previously this system was a 1.8 GHz box, also with 512 MB of 
> RAM.  Although that is approx. 5 times faster than the 350 MHz Sun Netra 
> T1 105 we are experiencing these problems on, I didn't expect the 
> qmaster to be so demanding of CPU resources.
> 
> Any suggestions would be gratefully received.


-- 
Richard Ems       mail: Richard.Ems at Cape-Horn-Eng.com

Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
46011 Valencia
Tel : +34 96 3242923 / Fax 924
