[GE users] Parallel job starvation

Daire Byrne Daire.Byrne at framestore-cfc.com
Fri Dec 21 16:07:56 GMT 2007



We have a workload that includes equal numbers of single-slot (single-CPU) jobs and multi-threaded (>1 slot) jobs. I'm aware that "reservation" is designed to ensure that multi-CPU jobs can still get enough slots to run, but it seems to me that this only works efficiently if you know the approximate run times of jobs. In our environment it is almost impossible to predict run times accurately, so I fear that reservation will either leave slots idle for longer than necessary (how can you backfill if you don't know which job can "fit" in?) or leave the multi-slot job waiting longer than expected for the "reserved" slots on a host to free up.
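To make the backfill concern concrete, here is a minimal sketch of a simplified backfill test on a single host. It is a toy model, not Grid Engine's scheduler: `RunningJob`, `reservation_start` and `can_backfill` are hypothetical names, runtimes are in hours, and the rule shown is only "a waiting job may use idle slots if it is expected to finish before the reserved job starts". As far as I know, Grid Engine's own reservation (requested per job with `qsub -R y`) estimates this from each job's run-time limit or a configured default duration; the sketch just shows why an unknown runtime blocks the decision.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunningJob:
    slots: int
    est_remaining: Optional[float]  # hours left; None means "unknown"


def reservation_start(host_slots: int, running: List[RunningJob],
                      needed: int) -> Optional[float]:
    """Estimated hours until `needed` slots are free on one host.

    Returns None when the answer depends on a job whose runtime is unknown.
    """
    free = host_slots - sum(j.slots for j in running)
    if free >= needed:
        return 0.0
    # Release slots in order of estimated completion; unknown runtimes sort last.
    order = sorted(running, key=lambda j: float("inf") if j.est_remaining is None
                   else j.est_remaining)
    for job in order:
        if job.est_remaining is None:
            return None  # cannot say when the reservation can start
        free += job.slots
        if free >= needed:
            return job.est_remaining
    return None  # the host can never hold the job


def can_backfill(slots: int, est: Optional[float], free_now: int,
                 res_start: Optional[float]) -> bool:
    """Simplified backfill test: the candidate must fit in the idle slots now
    and be expected to finish before the reserved job is due to start."""
    if slots > free_now:
        return False
    if est is None or res_start is None:
        return False  # cannot prove the reserved job would not be delayed
    return est <= res_start


if __name__ == "__main__":
    known = [RunningJob(1, 2.0), RunningJob(1, 5.0)]
    start = reservation_start(4, known, 4)
    print(start, can_backfill(1, 3.0, 2, start), can_backfill(1, 6.0, 2, start))
    unknown = [RunningJob(1, None), RunningJob(1, 5.0)]
    print(reservation_start(4, unknown, 4))
```

With known runtimes the 4-slot reservation starts in 5 hours and a 3-hour single-slot job can be backfilled; once any relevant runtime is unknown, neither the reservation start nor the backfill decision can be computed, which is exactly the situation described above.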

In such an environment I'm thinking it is better to split the cluster into distinct groups by slot count (using queues?). I would then dispatch my 4-thread jobs only to machines that run 4-thread jobs. This essentially turns a 4-thread job into a single-slot job on those machines: when a job finishes, it frees exactly the 4 slots required by the next 4-thread/CPU job. Does anyone have experience with this kind of setup? Dynamically altering the number of hosts accepting single-thread or 4-thread jobs seems like it would be very difficult to manage, but might this be more efficient in an environment where you can't predict run times? We currently only run jobs with 1, 2 and 4 threads, so it would potentially mean splitting the cluster into 3 distinct queues.
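The partitioned layout can be modelled in a few lines. This is a hypothetical sketch, not a Grid Engine configuration: `PartitionedCluster`, `dispatch` and `finish` are illustrative names, and each host is assumed to be dedicated to exactly one job width with a slot count that is a multiple of that width. It shows the property described above (a finishing job frees exactly the slots the next job of the same width needs, with no runtime estimate involved) and also the cost: an idle host in one partition is never lent to another.

```python
from typing import Dict, Optional, Tuple


class PartitionedCluster:
    """Hypothetical model: every host is dedicated to one job width
    (1, 2 or 4 slots) and only ever runs jobs of exactly that width."""

    def __init__(self, hosts: Dict[str, Tuple[int, int]]):
        # hosts maps host name -> (slot count, dedicated job width)
        for name, (slots, width) in hosts.items():
            if slots % width:
                raise ValueError(f"{name}: {slots} slots not a multiple of {width}")
        self.width = {h: w for h, (_, w) in hosts.items()}
        self.free = {h: s for h, (s, _) in hosts.items()}
        self.running: Dict[str, str] = {}  # job id -> host

    def dispatch(self, job_id: str, width: int) -> Optional[str]:
        """Start the job on a host dedicated to its width; return None if
        that partition is full (other partitions are never borrowed)."""
        for host, w in self.width.items():
            if w == width and self.free[host] >= width:
                self.free[host] -= width
                self.running[job_id] = host
                return host
        return None

    def finish(self, job_id: str) -> str:
        host = self.running.pop(job_id)
        # One job's worth of slots comes back, so the next job of the same
        # width always fits here: no reservation or runtime estimate needed.
        self.free[host] += self.width[host]
        return host


if __name__ == "__main__":
    c = PartitionedCluster({"a": (4, 1), "b": (4, 4), "c": (8, 2)})
    print(c.dispatch("j1", 4), c.dispatch("j2", 4))   # b None
    c.finish("j1")
    print(c.dispatch("j2", 4))                         # b
    print([c.dispatch(f"s{i}", 1) for i in range(5)])  # ['a', 'a', 'a', 'a', None]
```

The last line illustrates the rigidity: the fifth single-slot job waits even though host `c` has 8 idle slots, because those slots belong to the 2-thread partition.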

It would be nice to have a hybrid solution: prefer the 4-thread machines/queue, but if none are available, make a reservation elsewhere.
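The hybrid policy can be sketched as a placement decision. This is an illustrative model only, under stated assumptions: `Host`, `hours_until_fit` and `place` are hypothetical names, hosts are either dedicated to one job width or part of a shared pool, and the fallback branch uses estimated remaining runtimes (in hours) to pick the shared host that frees up soonest. Note that the fallback still depends on runtime estimates, so it inherits the weakness discussed above; only the preferred path avoids them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Host:
    name: str
    slots: int
    dedicated_width: Optional[int]  # None means the shared pool
    free: int
    # (slots, estimated hours remaining) for each job running on the host
    running: List[Tuple[int, float]] = field(default_factory=list)


def hours_until_fit(host: Host, width: int) -> float:
    """Estimated hours until `width` slots are free on `host`."""
    free = host.free
    if free >= width:
        return 0.0
    for slots, remaining in sorted(host.running, key=lambda r: r[1]):
        free += slots
        if free >= width:
            return remaining
    return float("inf")


def place(hosts: List[Host], width: int) -> Tuple[str, Optional[str], float]:
    """Hybrid policy: start at once on a host dedicated to this width if one
    has room; otherwise use the shared host that can fit the job soonest,
    reserving it if the job cannot start immediately."""
    for h in hosts:
        if h.dedicated_width == width and h.free >= width:
            return ("start", h.name, 0.0)
    shared = [h for h in hosts if h.dedicated_width is None and h.slots >= width]
    if not shared:
        return ("wait", None, float("inf"))
    best = min(shared, key=lambda h: hours_until_fit(h, width))
    eta = hours_until_fit(best, width)
    return ("start" if eta == 0.0 else "reserve", best.name, eta)


if __name__ == "__main__":
    hosts = [Host("d4", 4, 4, 0, [(4, 3.0)]),
             Host("s1", 4, None, 1, [(1, 2.0), (2, 6.0)]),
             Host("s2", 4, None, 0, [(4, 1.5)])]
    print(place(hosts, 4))  # ('reserve', 's2', 1.5)
    print(place(hosts, 1))  # ('start', 's1', 0.0)
```

Here the dedicated 4-slot host is busy, so the 4-thread job reserves shared host `s2`, which is expected to free all 4 slots in 1.5 hours; once `d4` has room, the same call returns `('start', 'd4', 0.0)` instead.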

