[GE users] predict times

reuti reuti at staff.uni-marburg.de
Fri Dec 11 22:21:05 GMT 2009


On 11.12.2009 at 12:16, cgull wrote:

> Thanks again for your reply.
>
> I'm afraid you may have lost me with the resource reservation on.
> I'm quite new to SGE, so sorry if I appear to be a little slow.
>
> The problem we have is that we have three 96-node clusters. The
> clusters run at different speeds.
> We normally put 48-node (one cluster) jobs onto these machines.
> Jobs usually take approx. 9 hours, depending on which machine they
> run on.
>
> Currently with SGE people can submit jobs that go onto the 48 nodes
> as we want. The problem comes when they want to prioritise the
> jobs, as the priority really comes from when people need their jobs
> finished.
> We can manually estimate when a job should end, and therefore
> predict which machine each pending job will start on and estimate
> how long it will take, and then prioritise the pending queue
> accordingly.
>
> It is also nice to know when a cluster or half a cluster (as we run
> on 48 nodes) becomes available with no more pending jobs, so that
> enough jobs to fill a weekend can be put onto the queue. People
> can fairly easily find extra jobs to put on if they know the
> resource is available.
>
> This is currently difficult to work out each time, and it would be
> good to have it automated, even if it is only an estimate that will
> change as jobs get added, reordered, etc.
>
> What do you think is the best way to configure SGE to deal with
> this sort of scenario?

This is outside the scope of SGE. In the end it could be scaled
down to 6 machines with 2 of each speed. It's a one-dimensional
cutting stock problem, I think
(http://en.wikipedia.org/wiki/Cutting_stock_problem), with the time
to be cut. The general problem would be to have a set of jobs with
different normalized runtimes, and hence also a different runtime on
each of the machines, which should be scheduled so that the last job
finishes as early as possible. An additional constraint is the sudden
appearance of a job which must run right now (or as soon as any
currently running job finishes), which will invalidate all previously
computed optimizations.
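
To make the general problem concrete, here is a minimal sketch of a greedy "longest job first" heuristic. It is not an exact solution of the optimization problem, only a common approximation. The function name `lpt_assign`, the slot names and the relative speed factors are all hypothetical; the runtimes are normalized hours on a reference machine.

```python
def lpt_assign(jobs, speeds):
    """Greedy heuristic (not optimal): longest job first, each job
    placed on the slot where it would finish earliest.

    jobs:   dict job name -> normalized runtime in hours (assumed input)
    speeds: dict slot name -> relative speed, 1.0 = reference (assumed)
    Returns (assignment, makespan).
    """
    finish = {slot: 0.0 for slot in speeds}  # hours until each slot is free
    assignment = {}
    # Sort by normalized runtime, longest first.
    for job, norm in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        # Actual runtime on a slot is the normalized runtime / speed.
        best = min(speeds, key=lambda s: finish[s] + norm / speeds[s])
        finish[best] += norm / speeds[best]
        assignment[job] = best
    return assignment, max(finish.values())


if __name__ == "__main__":
    # Hypothetical numbers: two slots, one twice as fast as the other.
    plan, makespan = lpt_assign(
        {"a": 8.0, "b": 4.0, "c": 4.0},
        {"fast": 2.0, "slow": 1.0},
    )
    print(plan, makespan)
```

Any urgent job arriving later means the plan has to be recomputed from the current state.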

You could write a program outside of SGE which checks the estimated
runtime of the waiting jobs and the already elapsed times of the
running ones. With this you can output the end time of each job and/or
route the jobs to different machines.
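
Such a program could look roughly like the following sketch. It assumes you collect the job data yourself (e.g. from your own records); nothing here talks to SGE. The names `Slot` and `predict`, the slot names and the speed factors are hypothetical. All times are in hours relative to now, and pending jobs are dispatched in queue order to whichever slot becomes free first.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    speed: float  # relative speed, 1.0 = reference machine (assumed value)


def predict(slots, running, pending):
    """Estimate start/end times, in hours from now.

    running: dict slot name -> (job, normalized_hours, elapsed_hours)
    pending: list of (job, normalized_hours) in queue order
    Returns (schedule, free_at): schedule is a list of
    (job, slot, start, end); free_at maps slot -> time it becomes idle.
    """
    by_name = {s.name: s for s in slots}
    schedule, heap = [], []
    for s in slots:
        free = 0.0
        if s.name in running:
            job, norm, elapsed = running[s.name]
            # Remaining time = scaled runtime minus what has already elapsed.
            free = max(norm / s.speed - elapsed, 0.0)
            schedule.append((job, s.name, -elapsed, free))
        heap.append((free, s.name))
    heapq.heapify(heap)
    for job, norm in pending:
        start, name = heapq.heappop(heap)  # slot that is free earliest
        end = start + norm / by_name[name].speed
        schedule.append((job, name, start, end))
        heapq.heappush(heap, (end, name))
    return schedule, {name: t for t, name in heap}


if __name__ == "__main__":
    # Hypothetical example: two 48-node slots of different speed.
    slots = [Slot("a", 1.0), Slot("b", 0.5)]
    schedule, free_at = predict(
        slots,
        running={"a": ("j1", 9.0, 3.0)},
        pending=[("j2", 4.0), ("j3", 9.0)],
    )
    for row in schedule:
        print(row)
    print(free_at)
```

The `free_at` result also answers the question of when a half cluster becomes idle with nothing left pending, so people know when to add weekend jobs.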


> Is it to use the resource reservation? If so can you explain a  
> little more or point me to more information?

No, this is only applicable for jobs with varying resource requests.
In your case there is nothing to reserve: your jobs always use 48
nodes each.

-- Reuti


> Thanks again for your time on this discussion.
>
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232765
>
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=232858

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
