[GE users] Desperate need for CPU clock cycles!

Andreas.Haas at Sun.COM Andreas.Haas at Sun.COM
Mon Jan 21 13:36:07 GMT 2008


Hi Richard,

Increasing the load report interval may help, but without knowing the 
cause of the load it's just guesswork.
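
To give a rough feel for what a longer interval buys you, here is a 
small back-of-the-envelope sketch. It assumes that every exec host sends 
one load report per interval and that the current interval is the 
default of 40 seconds (as far as I recall, the parameter is 
load_report_time in the cluster configuration, editable with 
"qconf -mconf"). The function name is made up for illustration, and the 
numbers say nothing about how expensive each report is on your qmaster.

```python
# Rough estimate of how often the qmaster receives load reports.
# Assumptions (not from the original post): every exec host sends one
# report per interval, and 40 s is the currently configured interval.


def reports_per_minute(hosts: int, interval_seconds: float) -> float:
    """Average number of load reports arriving at the qmaster per minute."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return hosts * 60.0 / interval_seconds


if __name__ == "__main__":
    hosts = 38  # "around 38" exec hosts, per the original post
    for interval in (40, 120, 300):
        rate = reports_per_minute(hosts, interval)
        print(f"{interval:>4} s interval: {rate:.1f} reports/min")
```

If the load reports turn out not to be the bottleneck, changing the 
interval will not help much, which is why measuring first is the better 
approach.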

You should try running the DTrace master monitor

    http://wiki.gridengine.info/wiki/index.php/Dtrace

to gain some understanding of the root cause of this situation.

I assume a memory shortage can be ruled out, since that would be easy to 
diagnose?

Regards,
Andreas


On Mon, 21 Jan 2008, Richard Hobbs wrote:

> Hello,
>
> We have just moved our qmaster to a 330MHz/512MB RAM Sun Netra t1 105
> running Solaris 10 11/06.
>
> We also have around 38 Linux-based dual-CPU exec hosts, each with 4
> running queues.
>
> Last week, we noticed that periodically (while people were submitting
> jobs [in the order of 60-200 in the queue at any one time]) the load
> average on the qmaster machine was through the roof (we noticed it at
> 164 at one point) and as a result, the qmaster grinds to a halt and
> becomes completely unresponsive for 20-40 minutes, during which time job
> submissions basically fail! It does recover afterwards, however.
>
> Obviously this is unacceptable, so we need a solution! :-)
>
> I realise that a 330MHz SPARC with 512MB RAM isn't the best spec, but
> this is only a job scheduler after all. Surely that should be plenty to
> run a qmaster on a grid of this size, right?
>
> Anyway, regardless of how this spec fits (or doesn't fit) the
> requirements of the qmaster, is there any way we can claw back some
> clock cycles to use during this process? We want our qmaster to be as
> efficient as possible, and ideally to continue running on this box!
>
> Are there any options we can turn on to make it quicker? Perhaps reduce
> the polling rate to the exec hosts (if such an event occurs)?
>
> Any advice is appreciated...
>
> Thanks in advance,
> Richard.
>
> -- 
> Richard Hobbs (Systems Administrator)
> Toshiba Research Europe Ltd. - Cambridge Research Laboratory
> Email: richard.hobbs at crl.toshiba.co.uk
> Web: http://www.toshiba-europe.com/research/
> Tel: +44 1223 436999        Mobile: +44 7811 803377
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
>

http://gridengine.info/

