[GE users] intensive job

Reuti reuti at staff.uni-marburg.de
Sun Oct 26 17:44:56 GMT 2008


On 26.10.2008 at 18:08, Mag Gam wrote:

> I am certain I don't have any quotas regarding this.
>
>
> qconf -srqs
> {
>    name         cpu_limit
>    description  NONE
>    enabled      TRUE
>    limit        users mathprof to slots=8
> }

Not the resource quotas, but the queue configuration (qconf -sq myq).
It seems, though, that some limits are defined there, as the stack
size and virtual memory are set to 15G.

Only the soft limits are in effect. In addition: what does an
interactive "ulimit -aS" show?

The user is only allowed to change the limit in effect (i.e. the
soft limit) between zero and the hard limit. He can also lower the
hard limit, but once it's lowered, it can't be raised again (unless
root is executing these commands).
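
For example, in an interactive shell (a minimal sketch; the kbyte
values are arbitrary and assume a hard stack limit above 32768 to
start with):

ulimit -Hs        # show the hard limit for the stack size (kbytes)
ulimit -Ss        # show the soft limit
ulimit -Ss 8192   # lowering the soft limit always works
ulimit -Ss 16384  # raising it works too, up to the hard limit
ulimit -Hs 16384  # lowering the hard limit works once
ulimit -Hs 32768  # raising it again fails for a non-root user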

-- Reuti


>
>
> Here is the output for the job (hard limits first, then soft limits):
>
> core file size          (blocks, -c) unlimited
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 530431
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) unlimited
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 530431
> virtual memory          (kbytes, -v) unlimited
> file locks                      (-x) unlimited
>
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) 15625000
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 530431
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 15625000
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 530431
> virtual memory          (kbytes, -v) 15625000
> file locks                      (-x) unlimited
>
>
> See anything else?
>
>
> On Sun, Oct 26, 2008 at 12:37 PM, Reuti
> <reuti at staff.uni-marburg.de> wrote:
>> On 26.10.2008 at 16:16, Mag Gam wrote:
>>
>>> Thanks Reuti as usual!
>>>
>>> I have come to this problem now. My Java application is giving me
>>> this error:
>>>
>>> Error occurred during initialization of VM
>>> Could not reserve enough space for object heap
>>>
>>> All of the servers have plenty of free memory, so there is no
>>> memory contention.
>>>
>>> I am submitting the job as qsub script.sh (without any -l options)
>>>
>>> However, if I run it via ssh I get the correct results. I am not  
>>> sure
>>> why I am getting this error.
>>>
>>> I tried to look at this, and it seems you gave some replies here,
>>> but it's still not helpful :-(
>>>
>>>
>>> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/
>>
>> Mag,
>>
>> this can really be related. Can you please post your queue  
>> configuration -
>> did you define any limits there?
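>>
>> E.g. to see just the limit-related lines (a sketch; myq stands for
>> your queue name):
>>
>> qconf -sq myq | egrep '^[sh]_'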
>>
>> Another hint would be to submit a job listing the limits inside a  
>> job, i.e.:
>>
>> #!/bin/sh
>> ulimit -aH
>> echo
>> ulimit -aS
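>>
>> Submitted with plain qsub, both listings end up in the job's output
>> file, which follows SGE's usual <jobname>.o<jobid> naming, e.g.
>> (limits.sh being whatever you name the script above):
>>
>> qsub limits.sh
>> cat limits.sh.o<jobid>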
>>
>> -- Reuti
>>
>>>
>>> Any ideas?
>>>
>>>
>>> On Sun, Oct 26, 2008 at 9:57 AM, Reuti
>>> <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 26.10.2008 at 14:10, Mag Gam wrote:
>>>>
>>>>> Hello Reuti:
>>>>>
>>>>> Would it help if I started at 10 instead of 1?
>>>>
>>>> sure, in this case you would just need the files *.10 to *.19 if
>>>> you want to avoid computing zero-padded names for *.01 to *.10.
>>>>
>>>> qsub -t 10-19 ...
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> #!/bin/sh
>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>>> sleep 60
>>>>> exit 0
>>>>>
>>>>> and start it with:
>>>>> qsub -t 10 script.sh
>>>>>
>>>>> Works.
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti
>>>>> <reuti at staff.uni-marburg.de> wrote:
>>>>>>
>>>>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>>>>
>>>>>>> Reuti:
>>>>>>>
>>>>>>> As usual, thank you! This is very helpful, but perhaps I should
>>>>>>> back up a little.
>>>>>>>
>>>>>>> "qsub -l virtual_free=40g" does that reserve space or does it  
>>>>>>> wait for
>>>>>>> that space?
>>>>>>
>>>>>> As long as there are only SGE's jobs: both.
>>>>>>
>>>>>>> Also, what if a user (non-GRID) is using the servers? I assume
>>>>>>> SGE will not account for that, or will it?
>>>>>>
>>>>>> This is always unpredictable. Can you force your interactive
>>>>>> users to go through SGE by requesting an interactive job? Then
>>>>>> you would need h_vmem instead of virtual_free to enforce the
>>>>>> limits for both types of jobs.
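>>>>>>
>>>>>> E.g. (a sketch; qrsh is SGE's interactive counterpart of qsub,
>>>>>> and job.sh stands for any jobscript):
>>>>>>
>>>>>> qrsh -l h_vmem=40g          # interactive session under the limit
>>>>>> qsub -l h_vmem=40g job.sh   # batch job, limited the same way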
>>>>>>
>>>>>>> My intention is this:
>>>>>>> I have a 1000000-record file
>>>>>>>
>>>>>>> I split it into 10 blocks
>>>>>>> 100000.a
>>>>>>> 100000.b
>>>>>>> 100000.c
>>>>>>> ....
>>>>>>> 100000.j
>>>>>>
>>>>>> when you have split them already, you will need to rename them to
>>>>>> 100000.1
>>>>>> ... 100000.10
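>>>>>>
>>>>>> A minimal sketch for that rename (assuming the suffixes really
>>>>>> are a..j):
>>>>>>
>>>>>> i=1
>>>>>> for s in a b c d e f g h i j; do
>>>>>>   mv "100000.$s" "100000.$i"
>>>>>>   i=$((i+1))
>>>>>> done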
>>>>>>
>>>>>>> I also have a wrapper script like this.
>>>>>>>
>>>>>>> #!/bin/ksh
>>>>>>> #wrapper script -- wrapper.sh <filename>
>>>>>>> #$ -cwd
>>>>>>> #$ -V
>>>>>>> #$ -N fluid
>>>>>>> #$ -S /bin/ksh
>>>>>>>
>>>>>>> file=$1
>>>>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>>>
>>>>>>> I invoke the script like this:
>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>>>> ...
>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>>>
>>>>>> Please try first a simple job, to see how array jobs are handled:
>>>>>>
>>>>>> #!/bin/sh
>>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>>>> sleep 60
>>>>>> exit 0
>>>>>>
>>>>>> and start it with:
>>>>>>
>>>>>> qsub -t 10 script.sh
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> I have tried to use the -t option for an array job, but it  
>>>>>>> was not
>>>>>>> working for some reason.
>>>>>>>
>>>>>>> Any thoughts about this method?
>>>>>>>
>>>>>>> TIA
>>>>>>>
>>>>>>>
>>>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti
>>>>>>> <reuti at staff.uni-marburg.de> wrote:
>>>>>>>>
>>>>>>>> Hi Mag,
>>>>>>>>
>>>>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>>>>
>>>>>>>>> Hello All.
>>>>>>>>>
>>>>>>>>> We have a professor who is notorious for bringing down our
>>>>>>>>> engineering GRID (64 servers) due to his direct numerical
>>>>>>>>> simulations. He basically runs a Java program with -Xmx40000m
>>>>>>>>> (40 gigs). This preallocates 40 gigs of memory and then
>>>>>>>>> crashes the box because there
>>>>>>>>
>>>>>>>> this looks more like you have to set up SGE to manage the
>>>>>>>> memory, request the necessary amount of memory for the job,
>>>>>>>> and submit it with "qsub -l virtual_free=40g ..."
>>>>>>>>
>>>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>>>
>>>>>>>>> are other processes running on the box. Each box has 128G of
>>>>>>>>> physical memory. He runs the application like this:
>>>>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>>>>
>>>>>>>>> the "series" file has over 10 million records.
>>>>>>>>>
>>>>>>>>> I was thinking of something like this: split the 10 million
>>>>>>>>> records into 10 files (each file has 1 million records),
>>>>>>>>> submit 10 array jobs, and then output to out.dat. But the
>>>>>>>>> order for 'out.dat' matters! I would like to run these 10
>>>>>>>>> jobs independently, but how can I maintain order? Or is there
>>>>>>>>> a better way to do this?
>>>>>>>>>
>>>>>>>>> Letting him submit his current job as-is would not be wise...
>>>>>>>>
>>>>>>>> You mean: one array job with 10 tasks - right? So "qsub -t 1-10
>>>>>>>> my_job".
>>>>>>>>
>>>>>>>> In each jobscript you can use (the +1 takes care of the usual
>>>>>>>> off-by-one at the start of each range):
>>>>>>>>
>>>>>>>> sed -n -e "$(( (SGE_TASK_ID-1)*1000000+1 )),$(( SGE_TASK_ID*1000000 ))p" series |
>>>>>>>>   java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>>>
>>>>>>>> hence output only the necessary lines of the input file and
>>>>>>>> create a unique output file for each task of the array job.
>>>>>>>> Also, it may not be necessary to concatenate the output files
>>>>>>>> into one, as you can sometimes use a construct like:
>>>>>>>>
>>>>>>>> cat out*.dat | my_pgm
>>>>>>>>
>>>>>>>> for further processing. With more than 9 tasks, though, this
>>>>>>>> would lead to the wrong order 1, 10, 2, 3, ..., so you need a
>>>>>>>> variant of the above command:
>>>>>>>>
>>>>>>>> sed -n -e "$(( (SGE_TASK_ID-1)*1000000+1 )),$(( SGE_TASK_ID*1000000 ))p" series |
>>>>>>>>   java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>>>
>>>>>>>> to get leading zeros for the task index in the name of the
>>>>>>>> output file.
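>>>>>>>>
>>>>>>>> Put together, a complete jobscript could look like this (only a
>>>>>>>> sketch, reusing the file and class names from above; adjust the
>>>>>>>> memory request to your setup):
>>>>>>>>
>>>>>>>> #!/bin/sh
>>>>>>>> #$ -cwd
>>>>>>>> #$ -N fluid
>>>>>>>> #$ -l virtual_free=40g
>>>>>>>> start=$(( (SGE_TASK_ID-1)*1000000+1 ))
>>>>>>>> end=$(( SGE_TASK_ID*1000000 ))
>>>>>>>> sed -n -e "${start},${end}p" series |
>>>>>>>>   java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>>>
>>>>>>>> submitted once as an array job with "qsub -t 1-10 jobscript.sh".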
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



