[GE users] Scaling up GE for huge number of jobs

Rayson Ho rayrayson at gmail.com
Fri Jan 4 20:19:04 GMT 2008



What does qstat -j say?

That should include information from the scheduler on why new jobs are
not being started on the nodes...
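
Note that per-job scheduling info is only recorded when schedd_job_info is
enabled in the scheduler configuration.  A minimal check, with job id 12345
as a placeholder:

    # Turn on per-job scheduling info (opens the scheduler
    # configuration in an editor; set "schedd_job_info true"):
    qconf -msconf

    # Then ask why a specific pending job is not being dispatched:
    qstat -j 12345

Be aware that schedd_job_info true adds scheduler overhead when the
pending list is very large.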

Rayson



On Jan 4, 2008 3:06 PM, Gary L Fox <garylfox at hotmail.com> wrote:
> However, our 2-core nodes remain half empty, in spite of the fairly low
> load values.  In the past, the ideal has always been 2 jobs per node (1
> job per CPU/core).  We are using classic spooling with version 6.0u10.  Is
> there something that may have been corrupted by this really large number
> of queued jobs?  Is there an easy way to clear things out and reset SGE?
> Thank you,
> Gary
>
>
> To: users at gridengine.sunsource.net
> From: Brett_W_Grant at raytheon.com
> Date: Wed, 2 Jan 2008 14:03:36 -0700
> Subject: Re: [GE users] Scaling up GE for huge number of jobs
>
> I think our situation corresponds to yours, but maybe not.  I have a
> number of what I call semi-large clusters that I use to run simulations.
> Our IT dept counts cores, not nodes: one cluster has 332 cores, one 268,
> one 256, and one 192.  The 268-core cluster is Macs; everything else is
> RH4 Linux.
>
> Basically, I have a simulation that takes a number of inputs, two of which
> are x & y positions, and calculates a result at that x,y location.
> Depending upon the other inputs, there are between 500 and 20,000 x,y
> positions for each set of inputs.  Each x,y point takes between 5 seconds
> and 5 minutes to simulate.  The important thing to know here is that all
> of the inputs except for x & y remain the same.
>
> The first thing that IT noticed was that, due to the fast finish times of
> some of the sims, cores would sit idle.  They hypothesized that the jobs
> were finishing before the scheduler could get back to that node, so what
> they did was greatly increase the number of slots per queue instance, to
> something like 3X the number of cores per node.  This keeps the machines
> from ever sitting idle.  That isn't really the approach that I would take,
> but that is what they did.  I don't know what the proper term for this is,
> but I call it "way overloading the processor".
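>
> For reference, that kind of slot change can be made per queue instance
> with qconf; a minimal sketch, assuming a queue named all.q and a host
> named node01 (both placeholders):
>
>     # Give one queue instance 6 slots (the value is just an example):
>     qconf -mattr queue slots 6 all.q@node01
>
>     # Or edit the queue definition itself and set per-host overrides
>     # (opens the all.q configuration in an editor):
>     qconf -mq all.q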
>
> I wrote a script that uses the SGE_TASK_ID environment variable so that I
> could submit array jobs.  From the command line, this makes a 500k-task
> job look like one line, more of a user convenience than anything else,
> although it does make job submission easier and faster.
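>
> A minimal sketch of that pattern (the input file, the simulate binary,
> and the script names are placeholders):
>
>     #!/bin/sh
>     #$ -S /bin/sh
>     # Each array task picks out its own line of the input file;
>     # SGE sets SGE_TASK_ID to the task's index within the array.
>     INPUT=inputs.txt
>     LINE=`sed -n "${SGE_TASK_ID}p" $INPUT`
>     ./simulate $LINE
>
> submitted with a single qsub line, e.g. for 500 input lines:
>
>     qsub -t 1-500 runscript.sh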
>
> We have had issues with file locking, slow NFS response, running out of
> inodes, and a myriad of other problems that cropped up when we scaled up,
> so watch out for those, too.
>
> In the end, we rewrote the simulation, so that rather than each instance
> simulating one x,y pair, each instance simulates all of the x,y pairs for
> that input condition.  This works fairly well, as long as you make sure
> not to send the jobs to queues that overload the processors.  It does
> change the amount of time a grid job takes, though.  We went from short
> jobs to jobs that take hours or days to complete.  The total CPU time is
> the same, but if one user submits 70 48-hour jobs, user B has to wait 48
> hours before his jobs start.  I know that people have set up priority
> queues for this, but our IT dept has not.
>
> Rewriting the simulation was rather drastic, so I later developed a submit
> script that looks at the input file, which for us has one set of input
> conditions per line, and submits every 20th line.  The runscript was
> changed to run the line given by SGE_TASK_ID plus the next 19 lines, as
> sketched below.  I picked 20 because that gave a job run time of about 15
> minutes, which seems to work well in our situation.
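>
> A minimal sketch of the chunked version (same placeholders as above):
>
>     # One task per 20-line chunk; SGE's step size makes
>     # SGE_TASK_ID take the values 1, 21, 41, ...
>     qsub -t 1-10000:20 runscript.sh
>
> with the runscript changed to:
>
>     #!/bin/sh
>     #$ -S /bin/sh
>     # Run the line at SGE_TASK_ID and the 19 lines after it.
>     INPUT=inputs.txt
>     FIRST=$SGE_TASK_ID
>     LAST=`expr $FIRST + 19`
>     sed -n "${FIRST},${LAST}p" $INPUT | while read LINE; do
>         ./simulate $LINE
>     done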
>
> I am sure that there are a lot of other ways to do this, but this is the
> path that we have taken.
>
> Good Luck,
> Brett Grant
>
> From: Gary L Fox <garylfox at hotmail.com>
> Date: 01/02/2008 01:24 PM
> Reply-To: users at gridengine.sunsource.net
> To: users at gridengine.sunsource.net
> Subject: [GE users] Scaling up GE for huge number of jobs
>
> I have a Linux cluster that is running RH4 update 4 across all nodes
> (about 70 nodes total).
>
> We have SGE 6.0u10 running and have had very few problems for quite a
> while.  However, our users have recently added a new type of job, and they
> run these new jobs by the tens of thousands at a time.  Currently, the
> queue contains 160K jobs.  Needless to say, things seem to be running in
> slow motion now.  The scheduler is running at around 100% CPU constantly.
> We were not getting any meaningful response from qmon or from qsub and
> qstat commands, so I restarted SGE.  I also increased schedule_interval
> from 15 seconds to 2 minutes.  Between the restart and the longer
> interval, things are working better, as we can now get a response from
> qmon and qstat, and we can submit jobs too.  But things are still very
> much in slow motion.
>
> The cluster does not seem to stay full of jobs.  Some nodes have only one
> job running, and a few even have none (each node has 2 CPUs and normally
> would have 2 jobs running).
> We have also noticed that jobs from different users no longer balance out
> (through fair share) as they did in the past.  Newly submitted jobs
> remain at the bottom of the queue with a priority of 0.0000, while earlier
> queued jobs from another user have priorities around 0.55 to 0.56.
>
> I have always had reservations turned off with max_reservation=0.  I have
> max_functional_jobs_to_schedule at its default value of 200.  I also just
> changed maxujobs to 136 from its previous value of 0.
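>
> For reference, all of those knobs live in the scheduler configuration,
> edited with qconf -msconf; a minimal sketch of the relevant entries
> (values are the ones described above, not recommendations):
>
>     schedule_interval                 0:2:0
>     maxujobs                          136
>     max_reservation                   0
>     max_functional_jobs_to_schedule   200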
>
>  What can I do to optimize the settings for this scenario and get better
> utilization?
>
>  Thank you,
>  Gary

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



