[GE users] intensive job

Reuti reuti at staff.uni-marburg.de
Wed Oct 29 11:33:08 GMT 2008


Hi Mag,

On 29.10.2008 at 04:17, Mag Gam wrote:

> Reuti.
>
> Thank you again for all of your help and persistence. Without you, I
> would have had a lot of trouble!
>
> Clearly, this is a memory problem we are going through. I was wondering
> if there is a particular documented case study for this. I would like
> to read it so I can implement it for our lab.

they are briefly mentioned in the Administration Guide on page 76 ff.:
http://dlc.sun.com/pdf/817-5677/817-5677.pdf

http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=10553
http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079

Use h_vmem or virtual_free, at your choice. Nowadays I would define the
amount of h_vmem or virtual_free as equal to the physically installed
memory.
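
For illustration, a minimal sketch of such a setup (the host name
"node01" and the 1g default are only examples): make the complex
consumable via "qconf -mc" and attach the physically installed memory
to each exec host:

#name    shortcut  type    relop  requestable  consumable  default  urgency
h_vmem   h_vmem    MEMORY  <=     YES          YES         1g       0

$ qconf -me node01
complex_values        h_vmem=128G

Each running job is then debited against the host's 128G, and no
further job is scheduled there once the memory is used up.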


> From my research, I was thinking of something like this -- since the
> problem is memory-related:
> 64 Servers/CPUs (64 Slots)

You mean 64 cores per machine? The slots entry is the number of queue
instances per node, not in total. If you want to limit it in total
across multiple machines, you will need an RQS (resource quota set).

> 1) Create 3 queues: large, medium, small
>     large queue will have a memory limit of 32G
>     medium queue will have a memory limit of 16G
>     small queue will have a memory limit of 4G
> 2) Slot allocation: allocate 4 slots for the large queue, 10 slots for
> the medium queue, and 50 slots for the small queue.

This you can only use in addition, as the memory setup mentioned
above must be implemented anyway; it's just to control the mix of
jobs in the cluster. If you like to do so, it's fine. Be aware that
a job requesting 4G can run in all queues. The idea in SGE is that
SGE will select an appropriate queue according to the resource
requests. It's different from PBS, where jobs are often submitted
"into a queue".

By having more queues per machine, you have to limit the total number
of active instances via slots. You can define the slots (equal to the
number of cores) either in the exechost definition by setting
"complex_values slots=64" or by creating an RQS with "limit hosts {*}
to slots=64" (if each machine has 64 cores), as sketched below.
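
Such an RQS could look like this (the rule set name is arbitrary;
create it with "qconf -arqs"):

{
   name         max_slots_per_host
   description  Limit slots per host to the number of installed cores
   enabled      TRUE
   limit        hosts {*} to slots=64
}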


> 3) Have equal share
> (http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy)

Fine.


> 4) If a professor is running a large job on a server, that server
> should not have more slots open. This is because if smaller/medium
> jobs get executed, the server could crash.

Then he should also request the total amount of memory in the node.
Unfortunately there is only an RFE to get exclusive access to a node;
it's not implemented for now.

5) To avoid starvation of jobs requesting much memory, it's advisable
to also enable resource reservation with a sensible setting for
"max_reservation 25" and to submit such jobs with "-R y".

-- Reuti


> Does this sound reasonable or am I missing something?
>
>
>
>
>
>
>
> When a user submits a job, she should know to which queue to submit
> the job. For example, if she knows it's going to be a memory-intensive
> job, she will run it in queue large. If she runs it in queue medium or
> small, it would fail.
>
>
>
>
> On Tue, Oct 28, 2008 at 8:37 AM, Reuti <reuti at staff.uni-marburg.de>  
> wrote:
>> On 28.10.2008 at 12:25, Mag Gam wrote:
>>
>>> Reuti:
>>>
>>> Thanks again! I will try this.
>>>
>>> If I decided to go with creating a new queue, I was planning to clone
>>> my current queue and assign it a smaller amount of slots. Do I need to
>>> do anything else to activate it?
>>
>> If you copy all entries except the qname it should work instantly, as a
>> hostlist is already attached. But don't put too many slots on each host,
>> meaning more slots than installed cores across all queues per node. To
>> limit this you have to a) define the number of slots in complex_values
>> for each exec_host or b) define an RQS to limit the slots to the number
>> of cores - use one method of your choice.
>>
>> -- Reuti
>>
>>
>>> On Mon, Oct 27, 2008 at 6:40 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>>
>>>> Hi Mag,
>>>>
>>>> On 26.10.2008 at 23:26, Mag Gam wrote:
>>>>
>>>>>> -) If other jobs should also run there: implement virtual_free or
>>>>>> h_vmem to be consumable and request the proper amount like I
>>>>>> mentioned in my first reply. When the memory is used up, no other
>>>>>> jobs will be scheduled there. All jobs must request either
>>>>>> virtual_free or h_vmem, so you will have to define a sensible
>>>>>> default for it in the complex configuration (qconf -mc).
>>>>>
>>>>> I am going with this option.
>>>>>
>>>>> I am submitting jobs like this:
>>>>> qsub -l h_vmem=40g script.sh
>>>>
>>>> did you read the first sentence carefully: "...implement virtual_free
>>>> or h_vmem to be consumable ..."? Otherwise the limit is just a limit,
>>>> and you can run too many jobs per machine.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> The problem is during the array (-t 10-50): I can see (qstat -f) 5
>>>>> to 10 jobs running on the same box. This is naturally going to cause
>>>>> the job to fail. It seems the memory limits and 1 job per host are
>>>>> not working. This job is multithreaded - it uses 8 CPUs :-)
>>>>>
>>>>> So, basically I want this: run only ONE (1) instance of this program
>>>>> on a server. Once that job is completed, do the next job. I don't
>>>>> want to run more than 1 instance of this job (if that's possible to
>>>>> do with array jobs).
>>>>>
>>>>> Any ideas?
>>>>>
>>>>> TIA
>>>>>
>>>>>
>>>>>
>>>>> On Sun, Oct 26, 2008 at 5:15 PM, Reuti <reuti at staff.uni-marburg.de>
>>>>> wrote:
>>>>>>
>>>>>> On 26.10.2008 at 20:57, Mag Gam wrote:
>>>>>>
>>>>>>> Reuti:
>>>>>>>
>>>>>>> You are right! I did have a memory limit. I removed it and his
>>>>>>> application works! Thank you very much.
>>>>>>>
>>>>>>> Since these are intensive processes, we want to run only 1 process
>>>>>>> per host. To be safe, we can even wait for the process to complete
>>>>>>> and then submit a subtask. Is it possible to do that?
>>>>>>
>>>>>> There are two options:
>>>>>>
>>>>>> -) If you have just this type of job, you could define the queue as
>>>>>> having only one slot per machine (entry "slots" in the queue
>>>>>> definition). This way all can be submitted, and they start only one
>>>>>> after another on each machine.
>>>>>>
>>>>>> -) If other jobs should also run there: implement virtual_free or
>>>>>> h_vmem to be consumable and request the proper amount like I
>>>>>> mentioned in my first reply. When the memory is used up, no other
>>>>>> jobs will be scheduled there. All jobs must request either
>>>>>> virtual_free or h_vmem, so you will have to define a sensible
>>>>>> default for it in the complex configuration (qconf -mc).
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> I am asking this because I am getting random out of memory  
>>>>>>> messages.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Oct 26, 2008 at 1:44 PM, Reuti <reuti at staff.uni-marburg.de>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> On 26.10.2008 at 18:08, Mag Gam wrote:
>>>>>>>>
>>>>>>>>> I am certain I don't have any quotas regarding this.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> qconf -srqs
>>>>>>>>> {
>>>>>>>>>  name         cpu_limit
>>>>>>>>>  description  NONE
>>>>>>>>>  enabled      TRUE
>>>>>>>>>  limit        users mathprof to slots=8
>>>>>>>>> }
>>>>>>>>
>>>>>>>> Not the resource quotas, the queue configuration (qconf -sq myq).
>>>>>>>> But it seems that there are some limits defined, as stack and
>>>>>>>> virtual memory are set to 15G.
>>>>>>>>
>>>>>>>> Only the soft limits are in effect; that means: what is an
>>>>>>>> interactive "ulimit -aS" showing in addition?
>>>>>>>>
>>>>>>>> The user is only allowed to change the limit in effect (i.e. the
>>>>>>>> soft limit) between the hard limit and zero. He can also lower the
>>>>>>>> hard limit. But once it's lowered, it can't be raised again (unless
>>>>>>>> root is executing these commands).
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here is the output for the job:
>>>>>>>>>
>>>>>>>>> core file size          (blocks, -c) unlimited
>>>>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>> pending signals                 (-i) 530431
>>>>>>>>> max locked memory       (kbytes, -l) 32
>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>> open files                      (-n) 1024
>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>> stack size              (kbytes, -s) unlimited
>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>> max user processes              (-u) 530431
>>>>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>
>>>>>>>>> core file size          (blocks, -c) 0
>>>>>>>>> data seg size           (kbytes, -d) 15625000
>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>> pending signals                 (-i) 530431
>>>>>>>>> max locked memory       (kbytes, -l) 32
>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>> open files                      (-n) 1024
>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>> stack size              (kbytes, -s) 15625000
>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>> max user processes              (-u) 530431
>>>>>>>>> virtual memory          (kbytes, -v) 15625000
>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> See anything else?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, Oct 26, 2008 at 12:37 PM, Reuti <reuti at staff.uni-marburg.de>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> On 26.10.2008 at 16:16, Mag Gam wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Reuti as usual!
>>>>>>>>>>>
>>>>>>>>>>> I have come to this problem now. My Java application is giving
>>>>>>>>>>> me this error:
>>>>>>>>>>>
>>>>>>>>>>> Error occurred during initialization of VM
>>>>>>>>>>> Could not reserve enough space for object heap
>>>>>>>>>>>
>>>>>>>>>>> All of the servers have plenty of free memory, so there is no
>>>>>>>>>>> memory contention.
>>>>>>>>>>>
>>>>>>>>>>> I am submitting the job as qsub script.sh (without any -l  
>>>>>>>>>>> options)
>>>>>>>>>>>
>>>>>>>>>>> However, if I run it via ssh I get the correct results. I am
>>>>>>>>>>> not sure why I am getting this error.
>>>>>>>>>>>
>>>>>>>>>>> I tried to look at this, and it seems you gave some replies
>>>>>>>>>>> here, but it's still not helpful :-(
>>>>>>>>>>>
>>>>>>>>>>> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/
>>>>>>>>>>
>>>>>>>>>> Mag,
>>>>>>>>>>
>>>>>>>>>> this can really be related. Can you please post your queue
>>>>>>>>>> configuration - did you define any limits there?
>>>>>>>>>>
>>>>>>>>>> Another hint would be to submit a job that lists the limits
>>>>>>>>>> from inside the job, i.e.:
>>>>>>>>>>
>>>>>>>>>> #!/bin/sh
>>>>>>>>>> ulimit -aH
>>>>>>>>>> echo
>>>>>>>>>> ulimit -aS
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Any ideas?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:57 AM, Reuti
>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 26.10.2008 at 14:10, Mag Gam wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello Reuti:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Would it help if I started at 10 instead of 1?
>>>>>>>>>>>>
>>>>>>>>>>>> sure, in this case you would just need the files *.10 to *.19
>>>>>>>>>>>> when you want to avoid the computation of canonical names for
>>>>>>>>>>>> *.01 to *.10.
>>>>>>>>>>>>
>>>>>>>>>>>> qsub -t 10-19 ...
>>>>>>>>>>>>
>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>>>>>>>>>>> sleep 60
>>>>>>>>>>>>> exit 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> and start it with:
>>>>>>>>>>>>> qsub -t 10 script.sh
>>>>>>>>>>>>>
>>>>>>>>>>>>> Works.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti
>>>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 25.10.2008 at 16:20, Mag Gam wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Reuti:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As usual, thank you! This is very helpful, but perhaps I
>>>>>>>>>>>>>>> should back up a little.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "qsub -l virtual_free=40g" does that reserve space or  
>>>>>>>>>>>>>>> does it
>>>>>>>>>>>>>>> wait
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> that space?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As long as there are only SGE's jobs: both.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, what if a user (non-GRID) is using the servers? I
>>>>>>>>>>>>>>> assume SGE will not account for that, or will it?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is always unpredictable. Can you force your interactive
>>>>>>>>>>>>>> users to go through SGE by requesting an interactive job? Then
>>>>>>>>>>>>>> you would need h_vmem instead of virtual_free to enforce the
>>>>>>>>>>>>>> limits for both types of jobs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My intention is this:
>>>>>>>>>>>>>>> I have a file with 1000000 records
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I split it into 10 blocks
>>>>>>>>>>>>>>> 100000.a
>>>>>>>>>>>>>>> 100000.b
>>>>>>>>>>>>>>> 100000.c
>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>> 100000.j
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> when you have split them already, you will need to rename
>>>>>>>>>>>>>> them to 100000.1 ... 100000.10
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I also have a wrapper script like this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #!/bin/ksh
>>>>>>>>>>>>>>> #wrapper script -- wrapper.sh <filename>
>>>>>>>>>>>>>>> #$ -cwd
>>>>>>>>>>>>>>> #$ -V
>>>>>>>>>>>>>>> #$ -N fluid
>>>>>>>>>>>>>>> #$ -S /bin/ksh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> file=$1
>>>>>>>>>>>>>>> cat $file | java -Xmx40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I invoke the script like this:
>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please try first a simple job, to see how array jobs are
>>>>>>>>>>>>>> handled:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to produce out.$SGE_TASK_ID"
>>>>>>>>>>>>>> sleep 60
>>>>>>>>>>>>>> exit 0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and start it with:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> qsub -t 10 script.sh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have tried to use the -t option for an array job, but it
>>>>>>>>>>>>>>> was not working for some reason.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any thoughts about this method?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> TIA
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti
>>>>>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Mag,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 25.10.2008 at 02:40, Mag Gam wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello All.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We have a professor who is notorious for bringing down our
>>>>>>>>>>>>>>>>> engineering GRID servers (64 servers) due to his direct
>>>>>>>>>>>>>>>>> numerical simulations. He basically runs a Java program
>>>>>>>>>>>>>>>>> with -Xmx40000m (40 GB). This preallocates 40 GB of memory
>>>>>>>>>>>>>>>>> and then crashes the box because there
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> this looks more like you have to set up SGE to manage the
>>>>>>>>>>>>>>>> memory, request the necessary amount of memory for the job,
>>>>>>>>>>>>>>>> and submit it with "qsub -l virtual_free=40g ..."
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> are other processes running on the box. Each box has 128G
>>>>>>>>>>>>>>>>> of physical memory. He runs the application like this:
>>>>>>>>>>>>>>>>> cat series | java -Xmx40000m fluid0 > out.dat
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> the "series" file has over 10 million records.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I was thinking of something like this: split the 10
>>>>>>>>>>>>>>>>> million records into 10 files (each file has 1 million
>>>>>>>>>>>>>>>>> records), submit 10 array jobs, and then output to
>>>>>>>>>>>>>>>>> out.dat. But the order for 'out.dat' matters! I would like
>>>>>>>>>>>>>>>>> to run these 10 jobs independently, but how can I maintain
>>>>>>>>>>>>>>>>> order? Or is there a better way to do this?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Letting him submit his current job as-is would not be
>>>>>>>>>>>>>>>>> wise...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> You mean: one array job with 10 tasks - right? So "qsub -t
>>>>>>>>>>>>>>>> 1-10 my_job".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In each jobscript you can use (adjust for the usual +/- 1
>>>>>>>>>>>>>>>> problem at the beginning and end):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> hence output only the necessary lines of the input file and
>>>>>>>>>>>>>>>> create a unique output file for each task of an array job.
>>>>>>>>>>>>>>>> Also, for the output files, maybe it's not necessary to
>>>>>>>>>>>>>>>> concatenate them into one file, as you can sometimes use a
>>>>>>>>>>>>>>>> construct like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> cat out*.dat | my_pgm
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for further processing. With more than 9 tasks this would
>>>>>>>>>>>>>>>> lead to the wrong order 1, 10, 2, 3, ... and you need a
>>>>>>>>>>>>>>>> variant of the above command:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for having leading zeros for the index in the name  
>>>>>>>>>>>>>>>> of the
>>>>>>>>>>>>>>>> output
>>>>>>>>>>>>>>>> file.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -- Reuti

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list