[GE users] Scaling up GE for huge number of jobs

Brett W Grant Brett_W_Grant at raytheon.com
Wed Jan 2 21:03:36 GMT 2008

I think this is related, but maybe not.  I have a number of what I 
call semi-large clusters that I use to run simulations.  Our IT dept counts 
cores, not nodes: one cluster has 332 cores, one 268, one 256, 
and one 192.  The 268-core cluster is Macs; everything else runs RH4 
Linux.

Basically I have a simulation that takes a number of inputs, two of which 
are x & y positions and calculates a result at that x,y location. 
Depending upon the other inputs, there are between 500 and 20,000 x,y 
positions for each set of inputs.  Each x,y point takes between 5 seconds 
and 5 minutes to simulate.  The important thing to know here is that all 
of the inputs except for x & y remain the same.

The first thing IT noticed was that, because some of the sims finish so 
quickly, cores would sit idle.  They hypothesized that the jobs were 
finishing before the scheduler could get back to that node, so what they 
did was greatly increase the number of slots per queue instance, to something 
like 3x the number of cores per node.  This keeps the machines busy. 
 That isn't really the approach I would take, but that is what they 
did.  I don't know what the proper term for this is, but I call it "way 
overloading the processor".
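For illustration, that "overloading" amounts to raising the slots count in the queue configuration; a hedged sketch (the queue name and core count below are assumed, not from the post):

```
# In the cluster queue configuration (qconf -mq all.q; "all.q" assumed),
# raise slots to roughly 3x the physical cores of an assumed 64-core node:
slots                 192
```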

I wrote a script that uses the SGE_TASK_ID parameter so that I could 
submit array jobs.  From the command line, this makes a 500k-task job look 
like one line; it's more of a user convenience than anything else, although 
it does make job submission easier and faster.
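A minimal sketch of what such a runscript can look like (script and file names here are mine, not the poster's): each array task uses its $SGE_TASK_ID to pick out one input line, i.e. one x,y pair.

```shell
#!/bin/sh
# Hypothetical array-job runscript: each task simulates the single x,y
# pair on the line of the input file matching its $SGE_TASK_ID.
# Submit one task per input line with, e.g.:
#   qsub -t 1-500 runscript.sh inputs.txt
INPUT=${1:-inputs.txt}
: "${SGE_TASK_ID:=1}"             # set by SGE; default for local testing
[ -f "$INPUT" ] || printf '1 1\n1 2\n2 1\n' > "$INPUT"   # demo input
# Pick out this task's line (one "x y" pair per line).
PAIR=$(sed -n "${SGE_TASK_ID}p" "$INPUT")
echo "task $SGE_TASK_ID simulating: $PAIR"
# ./simulate $PAIR ...fixed-inputs...   # the real simulation goes here
```

The `-t 1-500` range is what makes a 500-task job a single qsub line.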

We have had issues with file locking, slow NFS response, running out of 
inodes, and a myriad of other problems that cropped up when we scaled up, so 
watch out for those, too.

In the end, we rewrote the simulation, so rather than each instance 
simulating one x,y pair, each instance simulates all of the 
x,y pairs for that input condition.  This works fairly well, as long as you 
make sure not to send the jobs to queues that overload the processors.  It does 
change how long a grid job takes, though.  We went from short 
jobs to jobs that take hours or days to complete.  The total CPU time 
is the same, but if one user submits 70 48-hour jobs, user B has to wait 
48 hours before his jobs start.  I know that people have made priority 
queues, but our IT dept has not.

Rewriting the simulation was rather drastic, so I later developed a submit 
script that looks at the input file, which for us holds one set of input 
conditions per line, and submits every 20th line.  The runscript was 
changed to run the line given by SGE_TASK_ID and the next 19 past it.  I 
picked 20 because that gave a job run time of about 15 minutes, which 
seems to work well in our situation.
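The batching above maps naturally onto SGE's array-task step size; a hedged sketch of how it might look (names are illustrative, not the poster's actual script):

```shell
#!/bin/sh
# Sketch of the batched runscript: each task runs its own line plus the
# next 19, so the array is submitted with a step of 20, e.g.:
#   qsub -t 1-10000:20 runscript.sh inputs.txt
INPUT=${1:-inputs.txt}
: "${SGE_TASK_ID:=1}"             # set by SGE; default for local testing
[ -f "$INPUT" ] || seq 1 50 > "$INPUT"                   # demo input
BATCH=20
START=$SGE_TASK_ID
END=$((START + BATCH - 1))
# Feed this task's 20-line slice of the input file to the simulator.
sed -n "${START},${END}p" "$INPUT" | while read -r line; do
    echo "simulating: $line"      # replace echo with the real simulator
done
```

With `-t 1-10000:20`, SGE only creates tasks with ids 1, 21, 41, ..., so each task's slice is disjoint.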

I am sure there are a lot of other ways to do this, but this is the 
path that we have taken. 

Good Luck,
Brett Grant

Gary L Fox <garylfox at hotmail.com> 
01/02/2008 01:24 PM
Please respond to users at gridengine.sunsource.net

[GE users] Scaling up GE for huge number of jobs

I have a Linux cluster that is running RH4 update 4 across all nodes (about 
70 nodes total).

We have SGE 6.0u10 running and have had very few problems for quite a 
while.  However, our users have recently added a new type of job, and they 
run these new jobs by the tens of thousands at a time. 
Currently, the queue contains 160K jobs. 
Well, needless to say, things seem to be running in slow motion now.  The 
scheduler is constantly running at around 100% CPU.
We were not getting any meaningful response from qmon or from qsub and qstat 
commands, so I restarted SGE.  I also increased schedule_interval from 
15 seconds to 2 minutes.  Between the restart and the increased interval, things 
seem to be working better: we can now get a response from qmon and 
qstat, and we can submit jobs too.  But things are still very much in 
slow motion. 

The cluster does not seem to stay full of jobs.  Some nodes have only 
one job running, and a few even have none (each node has 2 CPUs and 
would normally run 2 jobs). 
We have also noticed that jobs from different users no longer balance out 
(through fair share) as they did in the past.  Newly submitted jobs 
remain at the bottom of the queue with a priority of 0.0000, while earlier-
queued jobs from another user have a priority around 0.55 to 0.56. 

I have always had reservations turned off with max_reservation=0.  I have 
the default value for max_functional_jobs_to_schedule set to 200.  I also 
just changed maxujobs to 136 from a value of 0. 
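For reference, those knobs all live in the scheduler configuration (edited with qconf -msconf); a sketch of the relevant fragment with the values described above (shown for orientation, not as a tuning recommendation):

```
schedule_interval                 0:2:0
maxujobs                          136
max_functional_jobs_to_schedule   200
max_reservation                   0
```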

What can I do to optimize the settings for this scenario and get better 
performance?
Thank you,