[GE users] scheduling weirdness in 6.0u3

Peter P Cebull Peter.Cebull at inl.gov
Wed Apr 6 15:33:58 BST 2005


Sean,

I think we encountered exactly the same issues a few months ago. We also 
have dual-processor nodes, and our workload is primarily parallel jobs with 
10-30 slots per job. I noticed that in many cases new jobs would be 
assigned to a node that already had a slave task running on it (I don't 
recall now whether master nodes were also getting doubled up or only nodes 
running slaves). We expected to see one slot assigned to each node before 
any node had two slots assigned.

After some email exchanges with someone at Sun, we were told that there 
were some known issues with how parallel jobs are distributed when hosts 
are sorted by the load formula. We were told to do the following: 1) set 
complex_values to "slots=2" for the execution hosts (one slot per CPU on 
our dual-processor nodes), and 2) set load_formula to "-slots" in the 
scheduler configuration. Since then we have had none of the scheduling 
weirdness we saw before: jobs are assigned to nodes with no slots occupied 
first, and to nodes with one slot in use only when no completely free 
nodes remain.
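The fill order described above can be illustrated with a small Python model. It is a sketch under stated assumptions, not Grid Engine's actual scheduler: it assumes that, once slots is defined in complex_values, "-slots" is evaluated with each host's remaining free slots and that hosts with lower values are tried first. Host, sort_value and place_slots are hypothetical names, and the model ignores load values, queues and parallel environment settings.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    capacity: int = 2  # assumption: complex_values slots=2 on each dual-CPU host
    used: int = 0

    @property
    def free_slots(self) -> int:
        return self.capacity - self.used


def sort_value(host: Host) -> int:
    # Hypothetical model of load_formula "-slots": "slots" is taken to be the
    # host's remaining free slots, and lower values sort first, so emptier
    # hosts win.
    return -host.free_slots


def place_slots(hosts: list[Host], n: int) -> list[str]:
    """Assign n slots one at a time, each to the best-sorted host with a free slot."""
    placements = []
    for _ in range(n):
        candidates = [h for h in hosts if h.free_slots > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        best = min(candidates, key=sort_value)  # min() keeps the first host on ties
        best.used += 1
        placements.append(best.name)
    return placements


hosts = [Host(f"node{i}") for i in range(1, 4)]
print(place_slots(hosts, 5))
```

Running it prints ['node1', 'node2', 'node3', 'node1', 'node2']: in this model every host receives one slot before any host receives a second.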

We are also running 6.0u3. Try the solution above and see if that helps.

Good Luck,
Peter
____________________________________________
Peter Cebull
Idaho National Laboratory
P.O. Box 1625, MS3605
Idaho Falls, ID 83415
Phone: 208-526-1909
Email: Peter.Cebull at inl.gov

Sean Dilda <agrajag at dragaera.net> wrote on 04/06/2005 08:03:27 AM:

> I'm running SGE 6.0u3 on my cluster and lately have been noticing that 
> SGE doesn't always pick the best node to run a job on.  All of my nodes 
> are dual CPU.  And sometimes when I submit a job, SGE will put it on a 
> node that already has a load average of 1.0, even though there are other 
> nodes in the cluster with a load of 0.0.  I've double-checked qstat -j 
> on the job and the nodes with a load average of 0.0 don't show up as 
> being disqualified for any reason.  It seems that SGE just isn't sorting 
> the list of available queues.  Has anyone else seen anything like this?
> 
> My queue_sort_method is set to 'load', and load_formula is set to 
> 'np_load_avg'
> 
> If anyone has any thoughts on how I might fix this, I'd appreciate 
> hearing them.
> 
> 
> Thanks,
> 
> 
> Sean
> 