[GE users] Scaling up GE for huge number of jobs

Gary L Fox garylfox at hotmail.com
Fri Jan 4 20:06:04 GMT 2008




Thanks Brett and Ron,
I don't know what an array job is, so I will have to check that out.  It sounds like it might help this situation a lot.  


We have an additional problem now.  After clearing out all 160K jobs, I expected our cluster to return to normal operation.
However, our 2-core nodes remain half empty, in spite of the fairly low load values.  In the past, the nodes have always filled to the ideal of 2 jobs per node (1 job per CPU/core).  We are using classic spooling with version 6.0u10.  Could something have gotten corrupted by this really large number of queued jobs?  Is there any easy way to clear things out and reset SGE?
Thank you,
Gary

To: users at gridengine.sunsource.net
From: Brett_W_Grant at raytheon.com
Date: Wed, 2 Jan 2008 14:03:36 -0700
Subject: Re: [GE users] Scaling up GE for huge number of jobs



I think this is a similar situation, but maybe not.  I have a number of what I call semi-large clusters that I use to run simulations.  Our IT dept counts cores, not nodes; one cluster has 332 cores, one 268, one 256, and one 192.  The 268-core cluster is Macs; everything else is RH4 Linux.



Basically I have a simulation that takes a number of inputs, two of which are x & y positions, and calculates a result at that x,y location.  Depending on the other inputs, there are between 500 and 20,000 x,y positions for each set of inputs.  Each x,y point takes between 5 seconds and 5 minutes to simulate.  The important thing to know here is that all of the inputs except x & y remain the same.



The first thing IT noticed was that, due to the fast finish times of some of the sims, cores would sit idle.  They hypothesized that jobs were finishing before the scheduler could get back to that node, so they greatly increased the number of slots per queue instance, to something like 3x the number of cores per node.  This keeps the machines from ever sitting idle.  That isn't really the approach I would take, but it is what they did.  I don't know the proper term for it, but I call it "way overloading the processor".
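In case it helps, a rough sketch of that change (the queue name here is hypothetical; qconf -mq opens the queue configuration in an editor):

    # Show the current slot count for the queue
    qconf -sq all.q | grep slots

    # Edit the queue; raise "slots" to 3x the cores per node,
    # e.g. 6 slots on a 2-core node:
    qconf -mq all.q
    #   slots    6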



I wrote a script that uses the SGE_TASK_ID parameter so that I could submit array jobs.  From the command line, this makes a 500K-task run look like one job, which is more of a user convenience than anything else, although it does make job submission easier and faster.
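For anyone unfamiliar with array jobs, a minimal sketch (script, input file, and simulation names are hypothetical): qsub -t 1-N submits one job with N tasks, and SGE sets SGE_TASK_ID to the task number inside each one.

    #!/bin/sh
    # run_point.sh -- executed once per array task
    #$ -S /bin/sh
    #$ -cwd
    # Pull out the input line matching this task's number.
    LINE=`sed -n "${SGE_TASK_ID}p" inputs.txt`
    ./simulate $LINE

Submitted with, e.g., "qsub -t 1-20000 run_point.sh", that is a single qmaster entry covering 20,000 tasks.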



We have had issues with file locking, slow NFS response, running out of inodes, and a myriad of other problems that crept in as we scaled up, so watch out for those, too.



In the end, we rewrote the simulation, so rather than each instance simulating one x,y pair, each instance simulates all of the x,y pairs for that input condition.  This works fairly well, as long as you make sure not to send the jobs to queues that overload the processors.  It does change how long a grid job takes, though.  We went from short jobs to jobs that take hours or days to complete.  The total CPU time is the same, but if one user submits seventy 48-hour jobs, user B has to wait 48 hours before his jobs start.  I know that people have set up priority queues, but our IT dept has not.



Rewriting the simulation was rather drastic, so I later developed a submit script that looks at the input file, which for us is one set of input conditions per line, and submits every 20th line.  The run script was changed to run the line given by SGE_TASK_ID plus the next 19 after it.  I picked 20 because it gives a job run time of about 15 minutes, which seems to work well in our situation.
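A sketch of that chunking approach, assuming one input condition per line in the input file (again, all names are hypothetical).  The stride syntax "-t 1-N:20" makes the task IDs land on every 20th line:

    #!/bin/sh
    # run_chunk.sh -- each task handles its own line plus the next 19
    #$ -S /bin/sh
    #$ -cwd
    START=$SGE_TASK_ID
    END=`expr $START + 19`
    # Run each input condition in the chunk, one after another.
    sed -n "${START},${END}p" inputs.txt | while read ARGS; do
        ./simulate $ARGS
    done

Submitted with "qsub -t 1-10000:20 run_chunk.sh", a 10,000-line input file becomes 500 tasks of 20 sims each.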



I am sure that there are a lot of other ways to do this, but this is the path we have taken.



Good Luck,

Brett Grant










Gary L Fox <garylfox at hotmail.com>
01/02/2008 01:24 PM

Please respond to: users at gridengine.sunsource.net
To: <users at gridengine.sunsource.net>
Subject: [GE users] Scaling up GE for huge number of jobs

I have a Linux cluster running RH4 update 4 across all nodes (about 70 nodes total).



We have SGE 6.0u10 running and have had very few problems for quite a while.

However, our users have recently added a new type of job, and they run these new jobs by the tens of thousands at a time.

Currently, the queue contains 160K jobs.

Needless to say, things now seem to be running in slow motion.  The scheduler runs at around 100% CPU constantly.

We were not getting any meaningful response from qmon or from the qsub and qstat commands, so I restarted SGE.  I also increased schedule_interval from 15 seconds to 2 minutes.  Between the restart and the longer interval, things are working better: we can now get a response from qmon and qstat, and we can submit jobs again.  But everything is still very much in slow motion.
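For anyone following along, the scheduler interval lives in the scheduler configuration; a sketch of how that change is made (qconf -msconf opens the configuration in an editor; values are the ones described above):

    # Show the current scheduler configuration
    qconf -ssconf

    # Edit it; change the schedule_interval line (h:m:s format)
    qconf -msconf
    #   schedule_interval    0:2:0      (was 0:0:15)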



The cluster does not seem to stay full of jobs.  Some nodes have only one job running, and a few even have none.  (Each node has 2 CPUs and would normally have 2 jobs running.)

We have also noticed that jobs from different users do not balance out (through fair share) as they have in the past.  Newly submitted jobs remain at the bottom of the queue with a priority of 0.0000, while earlier-queued jobs from another user have priorities around 0.55 to 0.56.



I have always had reservations turned off with max_reservation=0.  max_functional_jobs_to_schedule is at its default value of 200.  I also just changed maxujobs from 0 to 136.
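All three of those settings live in the same scheduler configuration; for anyone checking their own setup, the relevant lines (with the values described above) look like this:

    # qconf -ssconf  (excerpt)
    maxujobs                          136
    max_reservation                   0
    max_functional_jobs_to_schedule   200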



What can I do to optimize the settings for this scenario and get better
utilization?  



Thank you,

Gary

---------------------------------------------------------------------

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net

For additional commands, e-mail: users-help at gridengine.sunsource.net







