[GE users] intensive job

Mag Gam magawake at gmail.com
Sun Oct 26 17:08:15 GMT 2008


I am certain I don't have any quotas regarding this.


qconf -srqs
{
   name         cpu_limit
   description  NONE
   enabled      TRUE
   limit        users mathprof to slots=8
}


Here is the output for the job:

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 530431
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 530431
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) 15625000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 530431
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 15625000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 530431
virtual memory          (kbytes, -v) 15625000
file locks                      (-x) unlimited


See anything else?
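
One thing that does stand out: in the second listing (the soft limits from
ulimit -aS), data seg size, stack size and virtual memory are all 15625000
kbytes, about 14.9 GiB, while the hard limits are unlimited. That is far less
than the 40000m heap the JVM is asked to reserve. Below is a rough sketch of
that comparison as a check that could go at the top of the job script.
heap_fits is a made-up helper name, and since the JVM needs address space
beyond the heap itself, passing the check does not guarantee that java will
start:

```shell
#!/bin/bash
# heap_fits LIMIT_KB HEAP_MB   (hypothetical helper, not part of SGE)
# Succeeds (exit status 0) when a heap of HEAP_MB megabytes fits below a
# limit of LIMIT_KB kilobytes, or when the limit is "unlimited".
heap_fits() {
    local limit_kb=$1 heap_mb=$2
    [ "$limit_kb" = "unlimited" ] && return 0
    [ $((heap_mb * 1024)) -le "$limit_kb" ]
}

# Compare the soft virtual memory limit of this shell with the heap size
# that would be passed to java as -Xmx40000m.
soft_vmem=$(ulimit -S -v)
if heap_fits "$soft_vmem" 40000; then
    echo "soft vmem limit ($soft_vmem kB) leaves room for a 40000m heap"
else
    echo "soft vmem limit ($soft_vmem kB) is smaller than a 40000m heap"
fi
```

With the values from the listing, heap_fits 15625000 40000 fails, because
40000 MB is 40960000 kB.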


On Sun, Oct 26, 2008 at 12:37 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
> On 26.10.2008 at 16:16, Mag Gam wrote:
>
>> Thanks Reuti as usual!
>>
>> I have now run into this problem. My Java application is giving me this
>> error:
>>
>> Error occurred during initialization of VM
>> Could not reserve enough space for object heap
>>
>> All of the servers have free memory, so there is no memory contention.
>>
>> I am submitting the job as qsub script.sh (without any -l options)
>>
>> However, if I run it via ssh I get the correct results. I am not sure
>> why I am getting this error.
>>
>> I tried to look into this and it seems you gave some replies in the thread
>> below, but they did not help me :-(
>>
>>
>> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/
>
> Mag,
>
> this can really be related. Can you please post your queue configuration -
> did you define any limits there?
>
> Another hint would be to submit a job listing the limits inside a job, i.e.:
>
> #!/bin/sh
> ulimit -aH
> echo
> ulimit -aS
>
> -- Reuti
>
>>
>> Any ideas?
>>
>>
>> On Sun, Oct 26, 2008 at 9:57 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>
>>> Hi,
>>>
>>> On 26.10.2008 at 14:10, Mag Gam wrote:
>>>
>>>> Hello Reuti:
>>>>
>>>> Would it help if I started at 10 instead of 1?
>>>
>>> sure, in this case you would just need the files *.10 to *.19 when you want
>>> to avoid the computation of canonical names for *.01 to *.10.
>>>
>>> qsub -t 10-19 ...
>>>
>>> -- Reuti
>>>
>>>
>>>> #!/bin/sh
>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>> sleep 60
>>>> exit 0
>>>>
>>>> and start it with:
>>>> qsub -t 10 script.sh
>>>>
>>>> Works.
>>>>
>>>>
>>>>
>>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti <reuti at staff.uni-marburg.de>
>>>> wrote:
>>>>>
>>>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>>>
>>>>>> Reuti:
>>>>>>
>>>>>> As usual, thank you! This is very helpful, but perhaps I should back up a
>>>>>> little.
>>>>>>
>>>>>> "qsub -l virtual_free=40g" does that reserve space or does it wait for
>>>>>> that space?
>>>>>
>>>>> As long as there are only SGE's jobs: both.
>>>>>
>>>>>> Also, what if a user (non GRID) is using the servers. I
>>>>>> assume SGE will not account for that, or will it?
>>>>>
>>>>> This is always unpredictable. Can you force your interactive users to go
>>>>> through SGE by requesting an interactive job? Then you would need h_vmem
>>>>> instead of virtual_free to enforce the limits for both types of jobs.
>>>>>
>>>>>> My intention is this:
>>>>>> I have 1000000 file
>>>>>>
>>>>>> I split it into 10 blocks
>>>>>> 100000.a
>>>>>> 100000.b
>>>>>> 100000.c
>>>>>> ....
>>>>>> 100000.j
>>>>>
>>>>> when you have split them already, you will need to rename them to
>>>>> 100000.1 ... 100000.10
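
The renaming could be done with a short loop. This is only a sketch, assuming
bash and that the ten pieces sit in the current directory; the function name
rename_blocks and its prefix argument are made up for illustration:

```shell
#!/bin/bash
# rename_blocks PREFIX   (hypothetical helper)
# Renames PREFIX.a ... PREFIX.j to PREFIX.1 ... PREFIX.10 so that each
# piece can be selected with $SGE_TASK_ID in an array job.
rename_blocks() {
    local prefix=$1 n=1 letter
    for letter in a b c d e f g h i j; do
        # Skip pieces that do not exist instead of failing.
        if [ -e "$prefix.$letter" ]; then
            mv -- "$prefix.$letter" "$prefix.$n"
        fi
        n=$((n + 1))
    done
}
```

For the files listed above this would be called as "rename_blocks 100000".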
>>>>>
>>>>>> I also have a wrapper script like this.
>>>>>>
>>>>>> #!/bin/ksh
>>>>>> #wrapper script -- wrapper.sh <filename>
>>>>>> #$ -cwd
>>>>>> #$ -V
>>>>>> #$ -N fluid
>>>>>> #$ -S /bin/ksh
>>>>>>
>>>>>> file=$1
>>>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>>
>>>>>> I invoke the script like this:
>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>>> ...
>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>>
>>>>> Please try first a simple job, to see how array jobs are handled:
>>>>>
>>>>> #!/bin/sh
>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>>> sleep 60
>>>>> exit 0
>>>>>
>>>>> and start it with:
>>>>>
>>>>> qsub -t 10 script.sh
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>>
>>>>>> I have tried to use the -t option for an array job, but it was not
>>>>>> working for some reason.
>>>>>>
>>>>>> Any thoughts about this method?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>>
>>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti <reuti at staff.uni-marburg.de>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Mag,
>>>>>>>
>>>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>>>
>>>>>>>> Hello All.
>>>>>>>>
>>>>>>>> We have a professor who is notorious for bringing down our engineering
>>>>>>>> GRID (64 servers) due to his direct numerical simulations. He
>>>>>>>> basically runs a Java program with -Xmx40000m (40 gigs). This
>>>>>>>> preallocates 40 gigs of memory and then crashes the box because there
>>>>>>>
>>>>>>> this looks more like you have to set up SGE to manage the memory,
>>>>>>> request the necessary amount of memory for the job, and submit it with
>>>>>>> "qsub -l virtual_free=40g ..."
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>>
>>>>>>>> are other processes running on the box. Each box has 128G of physical
>>>>>>>> memory. He runs the application like this:
>>>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>>>
>>>>>>>> the "series" file has over 10 million records.
>>>>>>>>
>>>>>>>> I was thinking of something like this: split the 10 million records
>>>>>>>> into 10 files (each file has 1 million records), submit 10 array jobs,
>>>>>>>> and then output to out.dat. But the order for 'out.dat' matters! I
>>>>>>>> would like to run these 10 jobs independently, but how can I maintain
>>>>>>>> order?  Or is there a better way to do this?
>>>>>>>>
>>>>>>>> Having him submit his current job as it is would not be wise...
>>>>>>>
>>>>>>> You mean: one array job with 10 tasks - right? So "qsub -t 1-10 my_job".
>>>>>>>
>>>>>>> In each jobscript you can use (adjust for the usual +/- 1 problem at the
>>>>>>> beginning and end):
>>>>>>>
>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>>
>>>>>>> hence output only the necessary lines of the input file and create a
>>>>>>> unique output file for each task of an array job. Also for the output
>>>>>>> file, maybe it's not necessary to concat them into one file, as you can
>>>>>>> sometimes use a construct like:
>>>>>>>
>>>>>>> cat out*.dat | my_pgm
>>>>>>>
>>>>>>> for further processing. With more than 9 tasks this would lead to the
>>>>>>> wrong order 1, 10, 2, 3, ... and you need a variant of the above command:
>>>>>>>
>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>>
>>>>>>> to get leading zeros for the index in the name of the output file.
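
To make the "+/- 1" adjustment concrete: with 1000000 lines per task, task 1
should read lines 1 to 1000000, task 2 lines 1000001 to 2000000, and so on.
A sketch assuming bash; task_range and task_outfile are hypothetical helper
names, and the input file name "series" is taken from the original command
line earlier in the thread:

```shell
#!/bin/bash
# task_range TASK_ID LINES_PER_TASK   (hypothetical helper)
# Prints "first,last" as a sed address range for the given array task.
task_range() {
    local id=$1 n=$2
    echo "$(( (id - 1) * n + 1 )),$(( id * n ))"
}

# task_outfile TASK_ID   (hypothetical helper)
# Prints an output name with a two-digit, zero-padded index so that
# "cat out*.dat" concatenates the pieces in task order.
task_outfile() {
    printf 'out%02d.dat\n' "$1"
}

# In the job script the two helpers could be combined like this:
#   sed -n -e "$(task_range "$SGE_TASK_ID" 1000000)p" series |
#       java -Xmx40000m fluid0 > "$(task_outfile "$SGE_TASK_ID")"
```

Starting each range at (id - 1) * n + 1 rather than (id - 1) * n keeps the
last line of one task from being read again by the next one.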
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>>
>>>>>>>



