[GE users] Scaling up GE for huge number of jobs

Ron Chen ron_chen_123 at yahoo.com
Wed Jan 2 20:54:31 GMT 2008



Is using array jobs an option?
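
A minimal sketch of what that could look like, assuming the new jobs
differ only by an index or input file. An array job is submitted once
with `qsub -t`, and each task finds its own work through the
SGE_TASK_ID environment variable, so the queue holds one job entry
rather than tens of thousands. The script name `run_one.sh`, the job
name, and both helper functions below are hypothetical.

```python
import os
import shlex


def array_qsub_command(script, n_tasks, job_name="batch"):
    """Build (but do not run) a qsub command for ONE array job with n_tasks tasks.

    -t 1-N is the SGE array-job option; -N sets the job name.
    The script and job name are placeholders for illustration.
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    return ["qsub", "-N", job_name, "-t", f"1-{n_tasks}", script]


def input_for_this_task(inputs, environ=os.environ):
    """Inside a running task: pick this task's input via SGE_TASK_ID (1-based)."""
    task_id = int(environ["SGE_TASK_ID"])
    return inputs[task_id - 1]


if __name__ == "__main__":
    # One submission covering 10000 tasks instead of 10000 separate qsub calls.
    print(shlex.join(array_qsub_command("run_one.sh", 10000)))
```

Whether this helps depends on whether the users' jobs can be expressed
as numbered tasks of the same script.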

 -Ron


--- Gary L Fox <garylfox at hotmail.com> wrote:
> I have a Linux cluster that is running RH4 Update 4 across all
> nodes (about 70 nodes total).
> 
> We have SGE 6.0u10 running and have had very few problems
> for quite a while.
> However, our users have recently added a new type of job they
> run and they run these new jobs by the tens of thousands at a
> time.  
> Currently, the queue contains 160K jobs.  
> Needless to say, things seem to be running in slow motion
> now.  The scheduler is running at around 100% CPU constantly.
> We were not getting any meaningful response from qmon or from
> the qsub and qstat commands, so I restarted SGE.  I increased the
> schedule_interval from 15 seconds to 2 minutes.  Between the restart
> and the increased interval, things seem to be working better,
> as we can now get a response from qmon and qstat and we can
> submit jobs too.  But things are still very much like slow
> motion.  
> 
> The cluster does not seem to remain full with jobs.  Some
> nodes have only one job running and a few even have no jobs.
> (Each node has 2 CPUs and would normally have 2 jobs running.)
> We also have noticed that jobs from different users do not
> balance out (through fair share) as they have in the past. 
> Newly submitted jobs remain at the bottom of the queue with a
> priority of 0.0000.  Earlier queued jobs from another user
> have a priority around 0.55 to 0.56.  
> 
> I have always had reservations turned off with
> max_reservation=0.  I have the default value for
> max_functional_jobs_to_schedule set to 200.  I also just
> changed maxujobs to 136 from a value of 0.  
> 
> What can I do to optimize the settings for this scenario and
> get better utilization?  
> 
> Thank you,
> Gary
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail:
> users-help at gridengine.sunsource.net
> 
> 







