[GE users] Some jobs wait long time in the queue

david zanella zanella at mayo.edu
Mon Aug 27 21:48:51 BST 2007


Look at the policy configuration, then click on share tree policy and have a 
look at each user's usage. 

The scenario you describe is possible if user Y's usage is greater than
user X's usage. That is, if user Y just got done running 1500 jobs,
they'll be pushed down in the wait queue until the usage equalizes.
There's also a "halftime" (qconf -msconf): past usage decays with a
half-life of that many hours (168 is the default), so your old usage
stats gradually stop counting against you.
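
To put rough numbers on the halftime, here is a small sketch. It assumes
past usage decays exponentially, with the halftime acting as a half-life,
which is how I read the sched_conf description. The function name
decayed_usage is made up for this example and is not part of SGE, and
"1500" is just the job count from above used as a stand-in for usage.

```python
def decayed_usage(usage: float, hours_elapsed: float,
                  halftime_hours: float = 168.0) -> float:
    """Usage still counted after `hours_elapsed`, assuming exponential
    decay with a half-life of `halftime_hours` (illustrative model only)."""
    return usage * 0.5 ** (hours_elapsed / halftime_hours)


if __name__ == "__main__":
    # How much of a burst of 1500 usage units is left after 0, 1, 2 and 4 weeks.
    for hours in (0, 168, 336, 672):
        print(f"{hours:4d} h -> {decayed_usage(1500.0, hours):8.2f}")
```

Under that assumption a burst of usage is halved after one week with the
default halftime, and a shorter halftime makes the scheduler forget it
sooner.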

You need to lose the "maximum number of jobs per user" limit and let 
the share tree do its thing. It may take a few days to equalize while 
it builds statistics...
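
A toy model of the idea, for what it's worth. This is NOT the actual SGE
share tree algorithm (which works with tickets and shares); it only
assumes that each free slot goes to the user with the least accumulated
usage who still has a job waiting. The function dispatch_order and the
usage numbers are invented for the illustration.

```python
from __future__ import annotations


def dispatch_order(pending: dict[str, int], usage: dict[str, float],
                   slots: int) -> list[str]:
    """Return the owner of each of the next `slots` dispatched jobs.

    Toy rule (an assumption, not SGE's real policy): every free slot goes
    to the user with the lowest usage so far who still has pending jobs,
    and dispatching a job adds 1.0 to that user's usage.
    """
    pending = dict(pending)  # copy so the caller's data is not modified
    usage = dict(usage)
    order: list[str] = []
    for _ in range(slots):
        candidates = [user for user, count in pending.items() if count > 0]
        if not candidates:
            break
        # Lowest usage wins; ties are broken by user name for determinism.
        user = min(candidates, key=lambda u: (usage.get(u, 0.0), u))
        order.append(user)
        pending[user] -= 1
        usage[user] = usage.get(user, 0.0) + 1.0
    return order


if __name__ == "__main__":
    # User X already has 12 jobs running and 988 waiting; user Y has 1 waiting.
    print(dispatch_order({"X": 988, "Y": 1}, {"X": 12.0, "Y": 0.0}, slots=3))
```

In this simplified picture user Y's single job gets the first slot that
frees up, because X has already built up usage, and no fixed per-user job
limit is needed.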

This is my understanding of how things work....


> > On 27.08.2007 at 15:55, Massimo Canonico wrote:
> >
> >> Hi,
> >> I have implemented the user fairshare policy
> >>
> >> *********Easy setup of equal user fairshare policy******
> >>  1. Make 2 changes in the main SGE configuration ('qconf -mconf'):
> >>         * enforce_user auto
> >>         * auto_user_fshare 100
> >>
> >>  2. Make 1 change in the SGE scheduler configuration ('qconf -msconf'):
> >>         * weight_tickets_functional 10000
> >>
> >> *******************sub array of tasks**************
> >>
> >> but I do not like the current behavior of the scheduler.
> >>
> >> In particular there is one user who has submitted 1000 jobs with a 
> >> rapid response time and another user who has been waiting for a 
> >> resource for hours. I would like the user waiting in the queue to 
> >> get higher priority so that they do not wait so long.
> >>
> >> How can I speed up the priority level of the users waiting in the queue?
> >
> > Are the 1000 jobs already executing? If there are still some waiting, 
> > then the one job should be executed before them. Are there any special 
> > resources required for this job, which need to be reserved beforehand?
> >
> No, no special requirements. In our cluster there are 12 machines, so 
> the scenario is the following:
> user X has submitted 1000 jobs, then user Y has submitted 1 job
> 
> the first 12 jobs of user X are running in the cluster, the other 988 
> jobs are waiting for idle machines and finally the job of user Y is at 
> the end of the queue.
> 
> I have observed the queue all day, but the scheduler always selects 
> user X's jobs instead of user Y's job. User Y's job keeps a priority 
> of 0.5, while user X's jobs keep a value of about 0.535.
> 
> Now I have limited the number of jobs per user, but this is a static 
> solution and I would like a dynamic one.
> 
> Any ideas?
> 
> Thanks in advance,
> M
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net




