[GE users] intensive job

Mag Gam magawake at gmail.com
Sun Oct 26 15:46:52 GMT 2008


More information

$ ulimit -aH
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 530431
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 530431
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

With these settings I am able to run my application without the VM
errors. However, when I submit it via qsub, it fails with those errors.
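
One way to narrow this down, as a minimal sketch rather than a confirmed fix:
submit a tiny job that prints the limits the job itself sees, and compare its
output with the ulimit listing above. The script name limits.sh and the helper
report_limits are made up for illustration; the assumption being tested is
that a batch job runs under different limits than an ssh login does.

```shell
#!/bin/sh
# limits.sh (hypothetical name): report the limits seen by this shell.
# Run it once directly over ssh and once through the scheduler, e.g.
#   qsub -cwd -j y -o limits.out limits.sh
# then compare the two outputs.

report_limits() {
    # uname -n prints the node name, so the two runs can be matched up.
    echo "host: $(uname -n)"
    # Soft and hard virtual-memory limits, in kbytes (or "unlimited").
    echo "soft -v: $(ulimit -Sv)"
    echo "hard -v: $(ulimit -Hv)"
}

report_limits
```

If the virtual-memory lines differ between the two runs, the queue's own
limits (for example h_vmem, as shown by "qconf -sq <queue>") would be the next
place to look.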

TIA



On Sun, Oct 26, 2008 at 11:16 AM, Mag Gam <magawake at gmail.com> wrote:
> Thanks Reuti as usual!
>
> I have now run into this problem. My Java application is giving me this error:
>
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
>
> All of the servers are free of memory, so there is no memory contention.
>
> I am submitting the job as qsub script.sh (without any -l options)
>
> However, if I run it via ssh I get the correct results. I am not sure
> why I am getting this error.
>
> I looked at the thread below, where you gave some replies, but it did not
> help in my case :-(
>
> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/
>
> Any ideas?
>
>
> On Sun, Oct 26, 2008 at 9:57 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> Hi,
>>
>> On 26.10.2008 at 14:10, Mag Gam wrote:
>>
>>> Hello Reuti:
>>>
>>> Would it help if I started at 10 instead of 1?
>>
>> Sure, in this case you would just need the files *.10 to *.19 if you want
>> to avoid computing canonical names for *.01 to *.10.
>>
>> qsub -t 10-19 ...
>>
>> -- Reuti
>>
>>
>>> #!/bin/sh
>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>> sleep 60
>>> exit 0
>>>
>>> and start it with:
>>> qsub -t 10 script.sh
>>>
>>> Works.
>>>
>>>
>>>
>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>>
>>>>> Reuti:
>>>>>
>>>>> As usual, thank you! This is very helpful, but perhaps I should back up a
>>>>> little.
>>>>>
>>>>> "qsub -l virtual_free=40g" does that reserve space or does it wait for
>>>>> that space?
>>>>
>>>> As long as there are only SGE's jobs: both.
>>>>
>>>>> Also, what if a user (non GRID) is using the servers. I
>>>>> assume SGE will not account for that, or will it?
>>>>
>>>> This is always unpredictable. Can you force your interactive users to go
>>>> through SGE by requesting an interactive job? Then you would need h_vmem
>>>> instead of virtual_free to enforce the limits for both types of jobs.
>>>>
>>>>> My intention is this:
>>>>> I have 1000000 file
>>>>>
>>>>> I split it into 10 blocks
>>>>> 100000.a
>>>>> 100000.b
>>>>> 100000.c
>>>>> ....
>>>>> 100000.j
>>>>
>>>> If you have split them already, you will need to rename them to
>>>> 100000.1 ... 100000.10
>>>>
>>>>> I also have a wrapper script like this.
>>>>>
>>>>> #!/bin/ksh
>>>>> #wrapper script -- wrapper.sh <filename>
>>>>> #$ -cwd
>>>>> #$ -V
>>>>> #$ -N fluid
>>>>> #$ -S /bin/ksh
>>>>>
>>>>> file=$1
>>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>
>>>>> I invoke the script like this:
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>> ...
>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>
>>>> Please try first a simple job, to see how array jobs are handled:
>>>>
>>>> #!/bin/sh
>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>> sleep 60
>>>> exit 0
>>>>
>>>> and start it with:
>>>>
>>>> qsub -t 10 script.sh
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>>
>>>>> I have tried to use the -t option for an array job, but it was not
>>>>> working for some reason.
>>>>>
>>>>> Any thoughts about this method?
>>>>>
>>>>> TIA
>>>>>
>>>>>
>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>>>
>>>>>> Hi Mag,
>>>>>>
>>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>>
>>>>>>> Hello All.
>>>>>>>
>>>>>>> We have a professor who is notorious for bringing down our engineering
>>>>>>> GRID (64 servers) with his direct numerical simulations. He
>>>>>>> basically runs a Java program with -Xmx40000m (40 gigs). This
>>>>>>> preallocates 40 gigs of memory and then crashes the box because there
>>>>>>
>>>>>> This looks more like you have to set up SGE to manage the memory,
>>>>>> request the necessary amount of memory for the job, and submit it with
>>>>>> "qsub -l virtual_free=40g ..."
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>
>>>>>>> are other processes running on the box. Each box has 128G of Physical
>>>>>>> memory. He runs the application like this:
>>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>>
>>>>>>> the "series" file has over 10 million records.
>>>>>>>
>>>>>>> I was thinking of something like this: split the 10 million records
>>>>>>> into 10 files (each file has 1 million records), submit 10 array jobs,
>>>>>>> and then output to out.dat. But the order for 'out.dat' matters! I
>>>>>>> would like to run these 10 jobs independently, but how can I maintain
>>>>>>> order?  Or is there a better way to do this?
>>>>>>>
>>>>>>> Having him submit his current job as it is would not be wise...
>>>>>>
>>>>>> You mean: one array job with 10 tasks - right? So "qsub -t 1-10 my_job".
>>>>>>
>>>>>> In each jobscript you can use (adjust for the usual +/- 1 problem at the
>>>>>> beginning and end):
>>>>>>
>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>
>>>>>> hence output only the necessary lines of the input file and create a unique
>>>>>> output file for each task of an array job. Also for the output file, maybe
>>>>>> it's not necessary to concat them into one file, as you can sometimes use a
>>>>>> construct like:
>>>>>>
>>>>>> cat out*.dat | my_pgm
>>>>>>
>>>>>> for further processing. With more than 9 tasks this would lead to the wrong
>>>>>> order 1, 10, 2, 3, ... and you need a variant of the above command:
>>>>>>
>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>
>>>>>> for having leading zeros for the index in the name of the output file.
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
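
The array-job recipe Reuti gives in the oldest message of the thread can be
sketched end to end as follows. This is an illustration under assumptions, not
a tested production script: the helper names slice_range and padded_name are
hypothetical, the input file is assumed to be called "series" as in the
original post, and the line ranges are written 1-based and inclusive to settle
the +/- 1 question mentioned there.

```shell
#!/bin/sh
# Sketch of one array-job task: pick this task's slice of the input and
# write it to a zero-padded output file so that out*.dat sorts correctly.

# slice_range TASK_ID CHUNK_SIZE
# Prints "first,last": the 1-based, inclusive line range for the task.
slice_range() {
    first=$(( ($1 - 1) * $2 + 1 ))
    last=$(( $1 * $2 ))
    echo "${first},${last}"
}

# padded_name TASK_ID
# Prints the output file name with a two-digit, zero-padded index.
padded_name() {
    printf 'out%02d.dat\n' "$1"
}

# Outside SGE, SGE_TASK_ID is unset; default to task 1 for a dry run.
task=${SGE_TASK_ID:-1}
echo "task $task reads lines $(slice_range "$task" 1000000) and writes $(padded_name "$task")"

# Real pipeline (commented out; it needs the input file and the Java program):
# sed -n -e "$(slice_range "$task" 1000000)p" series \
#     | java -Xmx40000m fluid0 > "$(padded_name "$task")"
```

Submitted with "qsub -t 1-10 script.sh", each task would then write its own
file, and "cat out*.dat | my_pgm" would see the pieces in the order 01 to 10.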



