[GE users] Desperate need for CPU clock cycles!

Richard Hobbs richard.hobbs at crl.toshiba.co.uk
Mon Jan 21 11:01:10 GMT 2008


We have just moved our qmaster to a 330MHz/512MB RAM Sun Netra t1 105
running Solaris 10 11/06.

We also have around 38 Linux-based dual-CPU exec hosts, each with 4
running queues.

Last week, we noticed that periodically, while people are submitting
jobs (on the order of 60-200 in the queue at any one time), the load
average on the qmaster machine goes through the roof (we saw it at 164
at one point). As a result, the qmaster grinds to a halt and becomes
completely unresponsive for 20-40 minutes, during which time job
submissions basically fail! It does recover afterwards, however.

Obviously this is unacceptable, so we need a solution! :-)

I realise that a 330MHz SPARC with 512MB RAM isn't the best spec, but
this is only a job scheduler after all. Surely that should be plenty to
run a qmaster on a grid of this size, right?

Anyway, regardless of how this spec fits (or doesn't fit) the
requirements of the qmaster, is there any way we can claw back some
clock cycles to use during this process? We want our qmaster to be as
efficient as possible, and ideally to continue running on this box!

Are there any options we can turn on to make it quicker? Perhaps we
could reduce the rate at which the qmaster polls the exec hosts (if it
polls them at all)?

Any advice is appreciated...

Thanks in advance,

Richard Hobbs (Systems Administrator)
Toshiba Research Europe Ltd. - Cambridge Research Laboratory
Email: richard.hobbs at crl.toshiba.co.uk
Web: http://www.toshiba-europe.com/research/
Tel: +44 1223 436999        Mobile: +44 7811 803377
