[GE users] Desperate need for CPU clock cycles!
Richard.Ems at cape-horn-eng.com
Mon Jan 21 17:52:45 GMT 2008
Are you sure that it's a CPU bottleneck and not a memory one?
And, which version of SGE are you talking about?
We also had a problem with the load getting very high (on Linux,
openSUSE 10.3), but that was because SGE was eating all our memory and
the system started swapping.
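One rough way to tell a memory problem from a CPU one on a Linux host is to look at the swap-in/swap-out columns of vmstat. The sketch below is only an illustration: the function names parse_vmstat and diagnose and the thresholds are made up for this example and are not part of SGE. It assumes the Linux vmstat column names (si, so, wa, id); Solaris vmstat uses a different layout and would need its own column mapping.

```python
def parse_vmstat(text):
    """Parse Linux `vmstat` output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "swpd" in fields:
            # This is the column-name line (r b swpd free ...).
            header = fields
            continue
        if header is None or len(fields) != len(header):
            # Skip the "procs ---memory---" banner and any malformed line.
            continue
        try:
            rows.append(dict(zip(header, [int(f) for f in fields])))
        except ValueError:
            continue
    return rows


def diagnose(rows, iowait_threshold=20, idle_threshold=10):
    """Return 'memory', 'disk', 'cpu', 'none' or 'no data' for the samples.

    The thresholds are arbitrary illustrative values, not tuned numbers.
    """
    if not rows:
        return "no data"
    n = len(rows)

    def avg(col):
        return sum(r[col] for r in rows) / n

    if avg("si") + avg("so") > 0:
        return "memory"   # pages are moving to/from swap
    if avg("wa") > iowait_threshold:
        return "disk"     # CPU is mostly waiting on I/O
    if avg("id") < idle_threshold:
        return "cpu"      # little idle time and no swapping or I/O wait
    return "none"


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0      0 500000  10000 200000    0    0     5     3  100  200  5  1 93  1
 3  2 400000   8000   1000  20000  900 1200  1000  1500  900 3000 10 15  5 70
 4  3 450000   7000   1000  18000 1100 1500  1200  1700  950 3100  8 17  3 72
"""

if __name__ == "__main__":
    samples = parse_vmstat(SAMPLE)
    # The first vmstat row is an average since boot, so it is dropped here.
    print(diagnose(samples[1:]))
```

To use it on a live machine, capture something like the output of "vmstat 5 6" into a string and pass it to parse_vmstat in place of SAMPLE.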
There was a memory leak, which Andreas found just a few days ago, but it
probably only triggers on a configuration with several queues and PEs
when jobs are submitted using wildcards ... Andreas?
just my $0.01 ...
Neil Baker wrote:
> I'm Neil, Richard's (the original poster's) colleague.
> Has anyone else had a similar experience when using Solaris 10 for the
> We actually migrated the qmaster from a RedHat Linux box to Solaris 10
> box to try and gain extra stability as the RedHat box kept crashing (due
> to hardware, not Grid Engine). I assumed that, as Grid Engine was
> initially written by Sun, it should be more compatible and more
> stable on Sun kit running Solaris. I've also heard from people who say
> that other scheduling software runs quite happily on similarly specified
> Our execution hosts currently run openSUSE 10 (these haven't changed)
> and we have approx 28 machines each running up to 4 jobs at a time (so a
> max of 112 jobs running at a time). We do use the grid a lot and there
> is the possibility that the queued jobs can be as high as 500 to 1000
> during peak usage. We are also likely to double the number of execution
> hosts in the near future.
> The Sun Grid Engine binaries are also being shared via NFS from the same
> slow Solaris Grid Engine machine. The Solaris box is configured with
> software RAID mirroring; could the disk performance be causing a
> bottleneck, as the mirroring uses the CPU? Is there an easy way for us
> to tell if the disk is the bottleneck? We do have a separate super
> fast NetApp NAS device and I'm wondering how much of a benefit it would
> be if we moved the shared binaries / SGE directory over to that NAS device?
> In the past this system used to be a 1.8GHz box, again with 512MB of
> RAM. Although this is approx 5 times faster than the 350MHz Sun Netra
> T1 105 we are experiencing these problems on, I didn't expect the
> qmaster to be so demanding on CPU resources.
> Any suggestions would be gratefully received.
Richard Ems mail: Richard.Ems at Cape-Horn-Eng.com
Cape Horn Engineering S.L.
C/ Dr. J.J. Dómine 1, 5º piso
Tel : +34 96 3242923 / Fax 924