[GE users] intensive job

Reuti reuti at staff.uni-marburg.de
Sat Oct 25 12:14:55 BST 2008


Hi Mag,

On 25.10.2008 at 02:40, Mag Gam wrote:

> Hello All.
>
> We have a professor who is notorious for bringing down our engineering
> GRID (64 servers) with his direct numerical simulations. He
> basically runs a Java program with -Xmx 40000m (40 gigs). This
> preallocates 40 gigs of memory and then crashes the box because there

This looks more like a case where you have to set up SGE to manage the
memory, request the necessary amount of memory for the job, and submit it
with "qsub -l virtual_free=40g ...". See:

http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
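
For reference, making virtual_free a consumable usually involves something
like the following. Treat it as a rough sketch for SGE 6.x rather than a
recipe: the host name node01, the job script name my_job and the 128G value
are placeholders, and the details should be checked against the linked
message and "man complex" before applying anything.

```
# 1. Edit the complex configuration with "qconf -mc" (opens an editor)
#    and mark virtual_free as consumable, for example:
#name          shortcut  type    relop  requestable  consumable  default  urgency
virtual_free   vf        MEMORY  <=     YES          YES         0        0

# 2. Tell SGE how much memory each host offers, via "qconf -me node01":
complex_values        virtual_free=128G

# 3. Submit the job with a matching request:
qsub -l virtual_free=40g my_job
```

With this in place the scheduler only starts the job on a host that still
has 40G of the consumable left, instead of letting it collide with other
processes.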

> are other processes running on the box. Each box has 128G of Physical
> memory. He runs the application like this:
> cat series | java -Xmx 40000m fluid0 > out.dat
>
> the "series" file has over 10 million records.
>
> I was thinking of something like this: split the 10 million records
> into 10 files (each file has 1 million records), submit 10 array jobs,
> and then output to out.dat. But the order for 'out.dat' matters! I
> would like to run these 10 jobs independently, but how can I maintain
> order?  Or is there a better way to do this?
>
> Having him submit his current job as it is would not be wise...

You mean: one array job with 10 tasks - right? So "qsub -t 1-10 my_job".

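In each job script you can use the following (the "+ 1" on the start line
avoids the usual off-by-one overlap between neighbouring tasks; the end of
the last range may still need adjusting, since the file has over 10 million
records; also note that java wants -Xmx40000m written without a space):

sed -n -e "$(( (SGE_TASK_ID-1)*1000000 + 1 )),$(( SGE_TASK_ID*1000000 ))p" series | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat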
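
One way to handle the range boundaries, including any records beyond the
10 millionth, is sketched below as a complete job script. CHUNK, NTASKS and
task_range are illustrative names of my own, not SGE features, and the
script assumes the input file "series" sits in the job's working directory.

```shell
#!/bin/sh
# Sketch of an array-job script; submit with: qsub -t 1-10 my_job
# CHUNK, NTASKS and task_range are illustrative names, not SGE features.
CHUNK=1000000
NTASKS=10

# Print the sed address range "first,last" for a given task number.
# The last task runs to end of file ($) so records past NTASKS*CHUNK are kept.
task_range() {
    id=$1
    first=$(( (id - 1) * CHUNK + 1 ))
    if [ "$id" -eq "$NTASKS" ]; then
        printf '%s,$\n' "$first"
    else
        printf '%s,%s\n' "$first" "$(( id * CHUNK ))"
    fi
}

# Only do the real work inside an array job; SGE sets SGE_TASK_ID to
# "undefined" for jobs submitted without -t.
if [ -n "${SGE_TASK_ID:-}" ] && [ "$SGE_TASK_ID" != "undefined" ]; then
    sed -n -e "$(task_range "$SGE_TASK_ID")p" series \
        | java -Xmx40000m fluid0 > "out${SGE_TASK_ID}.dat"
fi
```

Task 1 then reads lines 1 to 1000000, task 2 reads lines 1000001 to 2000000,
and task 10 reads everything from line 9000001 to the end of the file.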

This outputs only the necessary lines of the input file and creates a
unique output file for each task of the array job. It may not even be
necessary to concatenate the output files into one, as you can sometimes
use a construct like:

cat out*.dat | my_pgm

for further processing. With more than 9 tasks this would give the wrong
order (1, 10, 2, 3, ...), so you need a variant of the above command:

sed -n -e "$(( (SGE_TASK_ID-1)*1000000 + 1 )),$(( SGE_TASK_ID*1000000 ))p" series | java -Xmx40000m fluid0 > "out$(printf '%02d' "$SGE_TASK_ID").dat"

to get leading zeros in the index in the name of the output file.
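
The ordering issue is easy to check in a scratch directory. The snippet
below is only a demonstration: the file names plainN.dat and paddedNN.dat
are made up for it, and it pins the locale to C so that the glob order is
reproducible.

```shell
# Scratch demo: how the shell orders output files with and without padding.
# The file names plainN.dat / paddedNN.dat are made up for this demo.
LC_ALL=C; export LC_ALL            # fix the collation so the order is reproducible
demo=$(mktemp -d)
for i in 1 2 10; do
    : > "$demo/plain$i.dat"
    : > "$demo/padded$(printf '%02d' "$i").dat"
done
plain_order=$(cd "$demo" && echo plain*.dat)
padded_order=$(cd "$demo" && echo padded*.dat)
echo "$plain_order"     # plain1.dat plain10.dat plain2.dat
echo "$padded_order"    # padded01.dat padded02.dat padded10.dat
rm -r "$demo"
```

Without padding, task 10 sorts between tasks 1 and 2; with two-digit
padding the glob returns the files in task order, so "cat" sees them in
the order the records were split.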

-- Reuti
