[GE users] intensive job

Reuti reuti at staff.uni-marburg.de
Sun Oct 26 16:37:51 GMT 2008


On 26.10.2008 at 16:16, Mag Gam wrote:

> Thanks Reuti as usual!
>
> I have run into a problem now. My Java application is giving me
> this error:
>
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
>
> All of the servers have plenty of free memory, so there is no memory
> contention.
>
> I am submitting the job as qsub script.sh (without any -l options)
>
> However, if I run it via ssh I get the correct results. I am not sure
> why I am getting this error.
>
> I tried to look into this, and it seems you gave some replies in the
> thread below, but they did not help me yet :-(
>
> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/

Mag,

this can indeed be related. Can you please post your queue
configuration - did you define any limits there?

Another hint: submit a job that lists the limits in effect inside a
job, e.g.:

#!/bin/sh
ulimit -aH
echo
ulimit -aS
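If you only want the values most relevant to the JVM's heap
reservation, a shorter standalone variant (no SGE needed to try it)
would be:

```shell
#!/bin/sh
# Print only the limits a JVM heap reservation typically runs into:
# virtual memory and data segment size, soft and hard.
echo "soft vmem: $(ulimit -Sv)"
echo "hard vmem: $(ulimit -Hv)"
echo "soft data: $(ulimit -Sd)"
```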

-- Reuti

>
> Any ideas?
>
>
> On Sun, Oct 26, 2008 at 9:57 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Hi,
>>
>> On 26.10.2008 at 14:10, Mag Gam wrote:
>>
>>> Hello Reuti:
>>>
>>> Would it help if I started at 10 instead of 1?
>>
>> sure, in this case you would just need the files *.10 to *.19 if
>> you want to avoid computing zero-padded names for *.01 to *.10.
>>
>> qsub -t 10-19 ...
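Just to illustrate (standalone, faking the task IDs that SGE would
hand out for -t 10-19):

```shell
#!/bin/sh
# Fake the per-task environment of "qsub -t 10-19": SGE starts one
# task per ID, each seeing its own SGE_TASK_ID.
for SGE_TASK_ID in 10 11 12; do
  echo "task $SGE_TASK_ID reads 10000.$SGE_TASK_ID"
done
```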
>>
>> -- Reuti
>>
>>
>>> #!/bin/sh
>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce
>>> out.$SGE_TASK_ID"
>>> sleep 60
>>> exit 0
>>>
>>> and start it with:
>>> qsub -t 10 script.sh
>>>
>>> Works.
>>>
>>>
>>>
>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>>
>>>>> Reuti:
>>>>>
>>>>> As usual, thank you! This is very helpful, but perhaps I should
>>>>> back up a little.
>>>>>
>>>>> "qsub -l virtual_free=40g" - does that reserve the space, or does
>>>>> it wait until that space is available?
>>>>
>>>> As long as there are only SGE's jobs: both.
>>>>
>>>>> Also, what if a (non-grid) user is using the servers? I assume
>>>>> SGE will not account for that - or will it?
>>>>
>>>> This is always unpredictable. Can you force your interactive users
>>>> to go through SGE by requesting an interactive job? Then you would
>>>> need h_vmem instead of virtual_free to enforce the limits for both
>>>> types of jobs.
>>>>
>>>>> My intention is this:
>>>>> I have a file with 1000000 records.
>>>>>
>>>>> I split it into 10 blocks:
>>>>> 100000.a
>>>>> 100000.b
>>>>> 100000.c
>>>>> ....
>>>>> 100000.j
>>>>
>>>> Since you have split them already, you will need to rename them to
>>>> 100000.1
>>>> ... 100000.10
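A dry-run sketch of that renaming (echo instead of mv; the letter
order a-j is an assumption):

```shell
#!/bin/sh
# Map the letter suffixes a..j to the numeric SGE_TASK_ID values 1..10.
# echo instead of mv for a dry run; drop the echo once it looks right.
i=1
for s in a b c d e f g h i j; do
  echo "mv 100000.$s 100000.$i"
  i=$((i+1))
done
```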
>>>>
>>>>> I also have a wrapper script like this.
>>>>>
>>>>> #!/bin/ksh
>>>>> #wrapper script -- wrapper.sh <filename>
>>>>> #$ -cwd
>>>>> #$ -V
>>>>> #$ -N fluid
>>>>> #$ -S /bin/ksh
>>>>>
>>>>> file=$1
>>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>
>>>>> I invoke the script like this:
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>> ...
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>
>>>> Please try first a simple job, to see how array jobs are handled:
>>>>
>>>> #!/bin/sh
>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce
>>>> out.$SGE_TASK_ID"
>>>> sleep 60
>>>> exit 0
>>>>
>>>> and start it with:
>>>>
>>>> qsub -t 10 script.sh
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>
>>>>> I have tried to use the -t option for an array job, but it was not
>>>>> working for some reason.
>>>>>
>>>>> Any thoughts about this method?
>>>>>
>>>>> TIA
>>>>>
>>>>>
>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>>>
>>>>>> Hi Mag,
>>>>>>
>>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>>
>>>>>>> Hello All.
>>>>>>>
>>>>>>> We have a professor who is notorious for bringing down our
>>>>>>> engineering grid (64 servers) with his direct numerical
>>>>>>> simulations. He basically runs a Java program with -Xmx40000m
>>>>>>> (40 GB). This preallocates 40 GB of memory and then crashes the
>>>>>>> box because there
>>>>>>
>>>>>> this looks more like you have to set up SGE to manage the
>>>>>> memory: request the necessary amount of memory for the job and
>>>>>> submit it with "qsub -l virtual_free=40g ..."
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>
>>>>>>> are other processes running on the box. Each box has 128 GB of
>>>>>>> physical memory. He runs the application like this:
>>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>>
>>>>>>> the "series" file has over 10 million records.
>>>>>>>
>>>>>>> I was thinking of something like this: split the 10 million
>>>>>>> records into 10 files (each file with 1 million records),
>>>>>>> submit 10 array jobs, and then output to out.dat. But the order
>>>>>>> of 'out.dat' matters! I would like to run these 10 jobs
>>>>>>> independently, but how can I maintain order? Or is there a
>>>>>>> better way to do this?
>>>>>>>
>>>>>>> Letting him submit his current job as it is would not be wise...
>>>>>>
>>>>>> You mean: one array job with 10 tasks - right? So "qsub -t 1-10
>>>>>> my_job".
>>>>>>
>>>>>> In each jobscript you can use something like (note the +1, so
>>>>>> that task 1 starts at line 1 rather than line 0):
>>>>>>
>>>>>> sed -n -e "$(( (SGE_TASK_ID-1)*1000000+1 )),$(( SGE_TASK_ID*1000000 ))p" \
>>>>>>     series | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>
>>>>>> hence output only the necessary lines of the input file and
>>>>>> create a unique output file for each task of the array job.
>>>>>> Also, maybe it's not necessary to concatenate the output files
>>>>>> into one, as you can sometimes use a construct like:
>>>>>>
>>>>>> cat out*.dat | my_pgm
>>>>>>
>>>>>> for further processing. With more than 9 tasks this would lead
>>>>>> to the wrong order 1, 10, 2, 3, ..., so you need a variant of
>>>>>> the above command:
>>>>>>
>>>>>> sed -n -e "$(( (SGE_TASK_ID-1)*1000000+1 )),$(( SGE_TASK_ID*1000000 ))p" \
>>>>>>     series | java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>
>>>>>> to get leading zeros for the index in the name of the output
>>>>>> file.
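A standalone sketch of both ideas above, with a block size of 3 lines
instead of 1000000 and SGE_TASK_ID faked, so it can be tried without
SGE:

```shell
#!/bin/sh
# Task 2 with a block size of 3 should see lines 4-6 of the input.
SGE_TASK_ID=2
BLOCK=3
seq 1 9 > series.tmp
sed -n -e "$(( (SGE_TASK_ID-1)*BLOCK + 1 )),$(( SGE_TASK_ID*BLOCK ))p" series.tmp
# Zero-padded name so that a glob like out*.dat sorts 01, 02, ..., 10:
printf 'out%02d.dat\n' "$SGE_TASK_ID"
rm -f series.tmp
```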
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>






More information about the gridengine-users mailing list