[GE users] intensive job

Mag Gam magawake at gmail.com
Sun Oct 26 15:16:42 GMT 2008



Thanks Reuti as usual!

I have now run into this problem. My Java application is giving me this error:

Error occurred during initialization of VM
Could not reserve enough space for object heap

All of the servers have plenty of free memory, so there is no memory contention.

I am submitting the job as qsub script.sh (without any -l options)

However, if I run it via ssh I get the correct results. I am not sure
why I am getting this error.

I looked at the thread below, where you gave some replies, but it did
not help me :-(

http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/

Any ideas?
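
One thing that can be compared between the ssh session and the batch job is the virtual memory limit of the shell, since a JVM cannot reserve its heap when that limit is lower than the -Xmx value. The following is only a sketch: whether such a limit is the cause here is an assumption, and the function name heap_fits and the 40000 MB figure are illustrative.

```shell
#!/bin/sh
# Sketch: run this once via ssh and once via qsub, then compare the output.

# Succeeds if a heap of $2 megabytes fits under a `ulimit -v` value $1
# (kilobytes, or the word "unlimited"). Hypothetical helper.
heap_fits() {
    limit_kb=$1
    heap_mb=$2
    if [ "$limit_kb" = "unlimited" ]; then
        return 0
    fi
    [ "$limit_kb" -ge $(( heap_mb * 1024 )) ]
}

echo "host: $(uname -n)"
echo "virtual memory limit (ulimit -v): $(ulimit -v)"

if heap_fits "$(ulimit -v)" 40000; then
    echo "a 40000 MB heap fits under the limit"
else
    echo "a 40000 MB heap exceeds the limit"
fi
```

If the two environments print different limits, the queue or job limits in SGE would be the next place to look.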


On Sun, Oct 26, 2008 at 9:57 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi,
>
> On 26.10.2008 at 14:10, Mag Gam wrote:
>
>> Hello Reuti:
>>
>> Would it help if I started at 10 instead of 1?
>
> Sure. In this case you would just need the files *.10 to *.19, which
> avoids having to compute canonical names for *.01 to *.10.
>
> qsub -t 10-19 ...
>
> -- Reuti
>
>
>> #!/bin/sh
>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>> sleep 60
>> exit 0
>>
>> and start it with:
>> qsub -t 10 script.sh
>>
>> Works.
>>
>>
>>
>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>
>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>
>>>> Reuti:
>>>>
>>>> As usual, thank you! This is very helpful, but perhaps I should back up a
>>>> little.
>>>>
>>>> "qsub -l virtual_free=40g" does that reserve space or does it wait for
>>>> that space?
>>>
>>> As long as there are only SGE's jobs: both.
>>>
>>>> Also, what if a non-grid user is using the servers? I
>>>> assume SGE will not account for that, or will it?
>>>
>>> This is always unpredictable. Can you force your interactive users to go
>>> through SGE by requesting an interactive job? Then you would need h_vmem
>>> instead of virtual_free to enforce the limits for both types of jobs.
>>>
>>>> My intention is this:
>>>> I have a file of 1000000 records
>>>>
>>>> I split it into 10 blocks
>>>> 100000.a
>>>> 100000.b
>>>> 100000.c
>>>> ....
>>>> 100000.j
>>>
>>> If you have split them already, you will need to rename them to
>>> 100000.1 ... 100000.10
>>>
>>>> I also have a wrapper script like this.
>>>>
>>>> #!/bin/ksh
>>>> #wrapper script -- wrapper.sh <filename>
>>>> #$ -cwd
>>>> #$ -V
>>>> #$ -N fluid
>>>> #$ -S /bin/ksh
>>>>
>>>> file=$1
>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>
>>>> I invoke the script like this:
>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>> ...
>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>
>>> Please try first a simple job, to see how array jobs are handled:
>>>
>>> #!/bin/sh
>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>> sleep 60
>>> exit 0
>>>
>>> and start it with:
>>>
>>> qsub -t 10 script.sh
>>>
>>> -- Reuti
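
The renaming from letter suffixes to numeric suffixes suggested above could be done with a small loop. This is a minimal sketch, assuming the ten chunks are named 100000.a through 100000.j in the current directory; the function name rename_chunks is made up for illustration.

```shell
#!/bin/sh
# Sketch: rename PREFIX.a ... PREFIX.j to PREFIX.1 ... PREFIX.10 so that
# each array task can pick its input by $SGE_TASK_ID.

# rename_chunks PREFIX  (hypothetical helper; works in the current directory)
rename_chunks() {
    prefix=$1
    n=1
    for letter in a b c d e f g h i j; do
        # Skip suffixes that are not present instead of failing.
        if [ -e "$prefix.$letter" ]; then
            mv "$prefix.$letter" "$prefix.$n"
        fi
        n=$(( n + 1 ))
    done
}
```

After running "rename_chunks 100000", the wrapper can read 100000.$SGE_TASK_ID instead of taking a file name argument, and a single "qsub -t 1-10 -l virtual_free=40g ./wrapper.sh" replaces the ten separate submissions.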
>>>
>>>
>>>>
>>>> I have tried to use the -t option for an array job, but it was not
>>>> working for some reason.
>>>>
>>>> Any thoughts about this method?
>>>>
>>>> TIA
>>>>
>>>>
>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti <reuti at staff.uni-marburg.de>
>>>> wrote:
>>>>>
>>>>> Hi Mag,
>>>>>
>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>
>>>>>> Hello All.
>>>>>>
>>>>>> We have a professor who is notorious for bringing down our engineering
>>>>>> grid (64 servers) with his direct numerical simulations. He
>>>>>> basically runs a Java program with -Xmx40000m (40 gigs). This
>>>>>> preallocates 40 gigs of memory and then crashes the box because there
>>>>>
>>>>> This looks more like you have to set up SGE to manage the memory,
>>>>> request the necessary amount of memory for the job, and submit it with
>>>>> "qsub -l virtual_free=40g ..."
>>>>>
>>>>>
>>>>>
>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>
>>>>>> are other processes running on the box. Each box has 128 GB of physical
>>>>>> memory. He runs the application like this:
>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>
>>>>>> the "series" file has over 10 million records.
>>>>>>
>>>>>> I was thinking of something like this: split the 10 million records
>>>>>> into 10 files (each file has 1 million records), submit 10 array jobs,
>>>>>> and then output to out.dat. But the order for 'out.dat' matters! I
>>>>>> would like to run these 10 jobs independently, but how can I maintain
>>>>>> order?  Or is there a better way to do this?
>>>>>>
>>>>>> Having him submit his current job as it is would not be wise...
>>>>>
>>>>> You mean: one array job with 10 tasks - right? So "qsub -t 1-10 my_job".
>>>>>
>>>>> In each jobscript you can use (adjust for the usual +/- 1 problem at the
>>>>> beginning and end):
>>>>>
>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>
>>>>> hence output only the necessary lines of the input file and create a unique
>>>>> output file for each task of an array job. Also for the output file, maybe
>>>>> it's not necessary to concat them into one file, as you can sometimes use a
>>>>> construct like:
>>>>>
>>>>> cat out*.dat | my_pgm
>>>>>
>>>>> for further processing. With more than 9 tasks this would lead to the wrong
>>>>> order 1, 10, 2, 3, ... and you need a variant of the above command:
>>>>>
>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>
>>>>> for having leading zeros for the index in the name of the output file.
>>>>>
>>>>> -- Reuti
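
The two ideas above (one line range per task, zero-padded output names) can be put together as follows. This is only a sketch with the off-by-one at the start of each range already adjusted; the input file name "series" is taken from the original command, while the helper names line_range and out_name and the CHUNK variable are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of an array-job script: each task processes its own block of lines.

CHUNK=1000000   # records per task, as discussed in the thread

# Print "first,last" (1-based, inclusive) line numbers for task ID $1.
line_range() {
    echo "$(( ($1 - 1) * $CHUNK + 1 )),$(( $1 * $CHUNK ))"
}

# Print a zero-padded output name so that out*.dat expands in numeric order.
out_name() {
    printf 'out%02d.dat\n' "$1"
}

# Only do the real work when SGE has set a numeric task ID,
# i.e. when the script runs as a task of an array job.
case "${SGE_TASK_ID:-}" in
    ''|*[!0-9]*)
        ;;  # not an array task: nothing to do
    *)
        sed -n -e "$(line_range "$SGE_TASK_ID")p" series \
            | java -Xmx40000m fluid0 > "$(out_name "$SGE_TASK_ID")"
        ;;
esac
```

Submitted with "qsub -t 1-10", task 1 would read lines 1 to 1000000 and write out01.dat, and task 10 would read lines 9000001 to 10000000 and write out10.dat, so "cat out*.dat" keeps the original order.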
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
>>>>>




More information about the gridengine-users mailing list