[GE users] intensive job

Mag Gam magawake at gmail.com
Wed Oct 29 11:46:14 GMT 2008



> You mean 64 cores per machine? Slots is the number of queue instances per node, not in total. If you want to limit it in total across multiple machines, you will need an RQS.

64 total cores.


I am going to try this when I get to school today. Will keep you updated.




On Wed, Oct 29, 2008 at 7:33 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi Mag,
>
> Am 29.10.2008 um 04:17 schrieb Mag Gam:
>
>> Reuti.
>>
>> Thank you again for all of your help and persistence. Without you, I
>> would have had a lot of trouble!
>>
>> Clearly, this is a memory problem we are going through. I was wondering
>> if there is a particular documented case study for this. I would like
>> to read it and implement it for our lab.
>
> They are briefly mentioned in the Administration Guide on page 76 ff:
> http://dlc.sun.com/pdf/817-5677/817-5677.pdf
>
> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=10553
> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>
> h_vmem or virtual_free, at your choice. Nowadays I would define the amount of
> h_vmem or virtual_free to be equal to the physically installed memory.
>
>
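As a note to myself, a rough sketch of this consumable setup as I understand
it (the host name and the 128G are just placeholders for our nodes):

# make h_vmem consumable with a sensible default in the complex configuration,
# i.e. in "qconf -mc" change its line to something like:
#   h_vmem  h_vmem  MEMORY  <=  YES  YES  2G  0
qconf -mc

# publish the per-host capacity on each execution host:
qconf -me node01        # add: complex_values h_vmem=128G

# and every job requests its share explicitly:
qsub -l h_vmem=40G script.sh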
>> After doing some research, I was thinking of something like this -- since
>> the problem is memory related:
>> 64 Servers/CPUs  (64 Slots)
>
> You mean 64 cores per machine? Slots is the number of queue instances per
> node, not in total. If you want to limit it in total across multiple
> machines, you will need an RQS.
>
>> 1) Create 3 queues: large, medium, small
>>    large queue will have a memory limit of 32G
>>    medium queue will have a memory limit of 16G
>>    small queue will have a memory limit of 4G
>> 2) Slot allocation: allocate 4 slots for the large queue, 10 slots for the
>> medium queue, and 50 slots for the small queue.
>
> This you can only use in addition, as the memory setup mentioned above must
> be implemented anyway. It's just to control the mix of jobs in the cluster.
> If you like to do so, it's fine. Be aware that a job requesting 4G can run
> in all queues. The idea in SGE is that SGE will select an appropriate queue
> according to the resource requests. It's different from PBS, where jobs are
> often submitted "into a queue".
>
> When you have more than one queue per machine, you have to limit the total
> number of active instances by slots. You can define the slots (equal to the
> number of cores) either in the exechost definition by setting "complex_values
> slots=64", or by creating an RQS for it: "limit hosts {*} to slots=64" (if
> each machine has 64 cores).
>
>
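In case it helps later, the RQS variant would presumably look roughly like
this (added with "qconf -arqs"; the name is arbitrary and 64 is just our
per-node core count):

{
   name         slots_per_host
   description  limit slots to the number of cores per host
   enabled      TRUE
   limit        hosts {*} to slots=64
}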
>> 3) Have equal share
>>
>> (http://gridengine.info/2006/01/17/easy-setup-of-equal-user-fairshare-policy)
>
> Fine.
>
>
>> 4) If a professor is running a large job on a server, that server
>> should not have more slots open. This is because if smaller/medium
>> jobs get executed, the server could crash.
>
> Then he should also request the total amount of memory in the node.
> Unfortunately there is only an RFE to get exclusive access to a node, but
> it's not implemented for now.
>
> 5) To avoid starvation of jobs requesting much memory, it's advisable to
> also enable resource reservation with a sensible setting like
> "max_reservation 25" and to submit jobs with "-R y".
>
> -- Reuti
>
>
>> Does this sound reasonable or am I missing something?
>>
>>
>>
>>
>>
>>
>>
>> When a user submits a job, she should know which queue to submit the
>> job to. For example, if she knows it's going to be a memory-intensive job,
>> she will run it in queue large. If she runs it in queue medium or
>> small, it would fail.
>>
>>
>>
>>
>> On Tue, Oct 28, 2008 at 8:37 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
>>>
>>> Am 28.10.2008 um 12:25 schrieb Mag Gam:
>>>
>>>> Reuti:
>>>>
>>>> Thanks again! I will try this.
>>>>
>>>> If I decide to go with creating a new queue, I plan to clone
>>>> my current queue and assign it a smaller number of slots. Do I need to do
>>>> anything else to activate it?
>>>
>>> If you copy all entries except the qname it should work instantly, as a
>>> hostlist is already attached. But don't put more slots on each host than
>>> there are cores installed, counted across all queues per node. To limit
>>> this you have to either a) define the number of slots in complex_values
>>> for each exec_host, or b) define an RQS to limit the slots to the number
>>> of cores - use whichever method you prefer.
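A minimal sketch of the cloning step, assuming the existing queue is called
all.q and the copy should be a one-slot-per-host queue named small.q:

qconf -sq all.q > small.q     # dump the existing queue configuration
# edit small.q: change "qname" to small.q and set "slots" to e.g. 1
qconf -Aq small.q             # register the edited copy as a new queue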
>>>
>>> -- Reuti
>>>
>>>
>>>> On Mon, Oct 27, 2008 at 6:40 AM, Reuti <reuti at staff.uni-marburg.de>
>>>> wrote:
>>>>>
>>>>> Hi Mag,
>>>>>
>>>>> Am 26.10.2008 um 23:26 schrieb Mag Gam:
>>>>>
>>>>>>> -) If also other jobs should run there: implement virtual_free or
>>>>>>> h_vmem
>>>>>>> to be consumable and request the proper amount like I mentioned in my
>>>>>>> first
>>>>>>> reply. When the memory is used up, no other jobs will be scheduled
>>>>>>> thereto.
>>>>>>> All jobs must request either virtual_free or h_vmem, so you will have
>>>>>>> to
>>>>>>> define a sensible default for it in the complex configuration (qconf
>>>>>>> -mc)
>>>>>>
>>>>>> I am going with this option.
>>>>>>
>>>>>> I am submitting jobs like this:
>>>>>> qsub -l h_vmem=40g script.sh
>>>>>
>>>>> Did you read the first sentence carefully: "...implement virtual_free or
>>>>> h_vmem to be consumable ..."? Otherwise the limit is just a limit, and
>>>>> you can run too many jobs per machine.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> The problem is during the array (-t 10-50), I can see (qstat -f) 5 to
>>>>>> 10 jobs running on the same box. This is naturally going to cause the
>>>>>> job to fail. It seems the memory limits and 1 job per host are not
>>>>>> working. This job is multithreaded - it uses 8 CPUs :-)
>>>>>>
>>>>>> So, basically I want this: run only ONE (1) instance of this program
>>>>>> on a server. Once that job is completed, do the next job. I don't want
>>>>>> to run more than 1 instance of this job (if that's possible to do with
>>>>>> array jobs).
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> TIA
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sun, Oct 26, 2008 at 5:15 PM, Reuti <reuti at staff.uni-marburg.de>
>>>>>> wrote:
>>>>>>>
>>>>>>> Am 26.10.2008 um 20:57 schrieb Mag Gam:
>>>>>>>
>>>>>>>> Reuti:
>>>>>>>>
>>>>>>>> You are right! I did have a memory limit. I removed it and his
>>>>>>>> application works! Thank you very much.
>>>>>>>>
>>>>>>>> Since these are intensive processes, we want to run only 1 process
>>>>>>>> per
>>>>>>>> host. To be safe, we can even wait for the process to complete and
>>>>>>>> then submit a subtask. Is it possible to do that?
>>>>>>>
>>>>>>> There are two options:
>>>>>>>
>>>>>>> -) If you have just this type of job, you could define the queue
>>>>>>> having only one slot per machine (entry "slots" in the queue
>>>>>>> definition). This way all can be submitted, and they will start only
>>>>>>> one after another on each machine.
>>>>>>>
>>>>>>> -) If also other jobs should run there: implement virtual_free or
>>>>>>> h_vmem
>>>>>>> to
>>>>>>> be consumable and request the proper amount like I mentioned in my
>>>>>>> first
>>>>>>> reply. When the memory is used up, no other jobs will be scheduled
>>>>>>> thereto.
>>>>>>> All jobs must request either virtual_free or h_vmem, so you will have
>>>>>>> to
>>>>>>> define a sensible default for it in the complex configuration (qconf
>>>>>>> -mc).
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> I am asking this because I am getting random out of memory messages.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Oct 26, 2008 at 1:44 PM, Reuti <reuti at staff.uni-marburg.de>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Am 26.10.2008 um 18:08 schrieb Mag Gam:
>>>>>>>>>
>>>>>>>>>> I am certain I don't have any quotas regarding this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> qconf -srqs
>>>>>>>>>> {
>>>>>>>>>>  name         cpu_limit
>>>>>>>>>>  description  NONE
>>>>>>>>>>  enabled      TRUE
>>>>>>>>>>  limit        users mathprof to slots=8
>>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Not the resource quotas, the queue configuration (qconf -sq myq).
>>>>>>>>> But it seems that there are some limits defined, as stack and
>>>>>>>>> virtual memory are set to 15G.
>>>>>>>>>
>>>>>>>>> Only the soft limits are in effect - so what does an interactive
>>>>>>>>> "ulimit -aS" show in addition?
>>>>>>>>>
>>>>>>>>> The user is only allowed to change the limit in effect (i.e. the
>>>>>>>>> soft limit) between the hard limit and zero. He can also lower the
>>>>>>>>> hard limit. But once it's lowered, it can't be raised again (unless
>>>>>>>>> root is executing these commands).
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is the output for the job:
>>>>>>>>>>
>>>>>>>>>> core file size          (blocks, -c) unlimited
>>>>>>>>>> data seg size           (kbytes, -d) unlimited
>>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>>> pending signals                 (-i) 530431
>>>>>>>>>> max locked memory       (kbytes, -l) 32
>>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>>> open files                      (-n) 1024
>>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>>> stack size              (kbytes, -s) unlimited
>>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>>> max user processes              (-u) 530431
>>>>>>>>>> virtual memory          (kbytes, -v) unlimited
>>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>>
>>>>>>>>>> core file size          (blocks, -c) 0
>>>>>>>>>> data seg size           (kbytes, -d) 15625000
>>>>>>>>>> scheduling priority             (-e) 0
>>>>>>>>>> file size               (blocks, -f) unlimited
>>>>>>>>>> pending signals                 (-i) 530431
>>>>>>>>>> max locked memory       (kbytes, -l) 32
>>>>>>>>>> max memory size         (kbytes, -m) unlimited
>>>>>>>>>> open files                      (-n) 1024
>>>>>>>>>> pipe size            (512 bytes, -p) 8
>>>>>>>>>> POSIX message queues     (bytes, -q) 819200
>>>>>>>>>> real-time priority              (-r) 0
>>>>>>>>>> stack size              (kbytes, -s) 15625000
>>>>>>>>>> cpu time               (seconds, -t) unlimited
>>>>>>>>>> max user processes              (-u) 530431
>>>>>>>>>> virtual memory          (kbytes, -v) 15625000
>>>>>>>>>> file locks                      (-x) unlimited
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> See anything else?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Oct 26, 2008 at 12:37 PM, Reuti
>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Am 26.10.2008 um 16:16 schrieb Mag Gam:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks Reuti as usual!
>>>>>>>>>>>>
>>>>>>>>>>>> I have come to this problem now. My Java application is giving me
>>>>>>>>>>>> this error:
>>>>>>>>>>>>
>>>>>>>>>>>> Error occurred during initialization of VM
>>>>>>>>>>>> Could not reserve enough space for object heap
>>>>>>>>>>>>
>>>>>>>>>>>> All of the servers have plenty of free memory, so there is no
>>>>>>>>>>>> memory contention.
>>>>>>>>>>>>
>>>>>>>>>>>> I am submitting the job as qsub script.sh (without any -l
>>>>>>>>>>>> options)
>>>>>>>>>>>>
>>>>>>>>>>>> However, if I run it via ssh I get the correct results. I am not
>>>>>>>>>>>> sure
>>>>>>>>>>>> why I am getting this error.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried to look at this, and it seems you gave some replies here,
>>>>>>>>>>>> but they still didn't help :-(
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> http://fossplanet.com/clustering.gridengine.users/message-1123088-strange-consequence-changing-n1ge/
>>>>>>>>>>>
>>>>>>>>>>> Mag,
>>>>>>>>>>>
>>>>>>>>>>> This can really be related. Can you please post your queue
>>>>>>>>>>> configuration - did you define any limits there?
>>>>>>>>>>>
>>>>>>>>>>> Another hint would be to submit a job that lists the limits in
>>>>>>>>>>> effect inside the job, i.e.:
>>>>>>>>>>>
>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>> ulimit -aH
>>>>>>>>>>> echo
>>>>>>>>>>> ulimit -aS
>>>>>>>>>>>
>>>>>>>>>>> -- Reuti
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Oct 26, 2008 at 9:57 AM, Reuti
>>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 26.10.2008 um 14:10 schrieb Mag Gam:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello Reuti:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would it help if I started at 10 instead of 1?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sure, in this case you would just need the files *.10 to *.19
>>>>>>>>>>>>> when you want to avoid the computation of canonical names for
>>>>>>>>>>>>> *.01 to *.10.
>>>>>>>>>>>>>
>>>>>>>>>>>>> qsub -t 10-19 ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to
>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>> out.$SGE_TASK_ID"
>>>>>>>>>>>>>> sleep 60
>>>>>>>>>>>>>> exit 0
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> and start it with:
>>>>>>>>>>>>>> qsub -t 10 script.sh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Works.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Sat, Oct 25, 2008 at 1:30 PM, Reuti
>>>>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 25.10.2008 um 16:20 schrieb Mag Gam:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Reuti:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As usual, thank you! This is very helpful, but perhaps I
>>>>>>>>>>>>>>>> should back up a little.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> "qsub -l virtual_free=40g" does that reserve space or does
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>> wait
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> that space?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As long as there are only SGE's jobs: both.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Also, what if a user (non-grid) is using the servers? I
>>>>>>>>>>>>>>>> assume SGE will not account for that, or will it?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is always unpredictable. Can you force your interactive
>>>>>>>>>>>>>>> users to go through SGE by requesting an interactive job? Then
>>>>>>>>>>>>>>> you would need h_vmem instead of virtual_free to enforce the
>>>>>>>>>>>>>>> limits for both types of jobs.
>>>>>>>>>>>>>>>
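If we go that route, I suppose an interactive session would then be requested
along these lines (the memory value is just an example):

qlogin -l h_vmem=4G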
>>>>>>>>>>>>>>>> My intention is this:
>>>>>>>>>>>>>>>> I have 1000000 file
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I split it into 10 blocks
>>>>>>>>>>>>>>>> 100000.a
>>>>>>>>>>>>>>>> 100000.b
>>>>>>>>>>>>>>>> 100000.c
>>>>>>>>>>>>>>>> ....
>>>>>>>>>>>>>>>> 100000.j
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When you have split them already, you will need to rename them
>>>>>>>>>>>>>>> to 100000.1 ... 100000.10
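For the renaming, something like this ought to do it (just a sketch, assuming
the ten chunks are really named 100000.a ... 100000.j as above):

#!/bin/sh
# rename 100000.a ... 100000.j to 100000.1 ... 100000.10
i=1
for f in 100000.[a-j]; do
    mv "$f" "100000.$i"
    i=$((i+1))
done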
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I also have a wrapper script like this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> #!/bin/ksh
>>>>>>>>>>>>>>>> #wrapper script -- wrapper.sh <filename>
>>>>>>>>>>>>>>>> #$ -cwd
>>>>>>>>>>>>>>>> #$ -V
>>>>>>>>>>>>>>>> #$ -N fluid
>>>>>>>>>>>>>>>> #$ -S /bin/ksh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> file=$1
>>>>>>>>>>>>>>>> cat $file | java -Xmx 40000m fluid0 > out.$SGE_TASK_ID.dat
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I invoke the script like this:
>>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.a
>>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.b
>>>>>>>>>>>>>>>> ...
>>>>>>>>>>>>>>>> qsub -l virtual_free=40g ./wrapper.sh 10000.j
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Please try first a simple job, to see how array jobs are
>>>>>>>>>>>>>>> handled:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>>>>>> echo "I'm $SGE_TASK_ID and will read 10000.$SGE_TASK_ID to
>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>> out.$SGE_TASK_ID"
>>>>>>>>>>>>>>> sleep 60
>>>>>>>>>>>>>>> exit 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> and start it with:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> qsub -t 10 script.sh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have tried to use the -t option for an array job, but it
>>>>>>>>>>>>>>>> was not working for some reason.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Any thoughts about this method?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> TIA
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Oct 25, 2008 at 7:14 AM, Reuti
>>>>>>>>>>>>>>>> <reuti at staff.uni-marburg.de>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Mag,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Am 25.10.2008 um 02:40 schrieb Mag Gam:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello All.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> We have a professor who is notorious for bringing down our
>>>>>>>>>>>>>>>>>> engineering GRID (64 servers) due to his direct numerical
>>>>>>>>>>>>>>>>>> simulations. He basically runs a Java program with
>>>>>>>>>>>>>>>>>> -Xmx 40000m (40 gigs). This preallocates 40 gigs of memory
>>>>>>>>>>>>>>>>>> and then crashes the box because there
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This looks more like you have to set up SGE to manage the
>>>>>>>>>>>>>>>>> memory: request the necessary amount of memory for the job
>>>>>>>>>>>>>>>>> and submit it with "qsub -l virtual_free=40g ..."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=15079
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> are other processes running on the box. Each box has 128G
>>>>>>>>>>>>>>>>>> of physical memory. He runs the application like this:
>>>>>>>>>>>>>>>>>> cat series | java -Xmx 40000m fluid0 > out.dat
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the "series" file has over 10 million records.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I was thinking of something like this: split the 10 million
>>>>>>>>>>>>>>>>>> records into 10 files (each file has 1 million records),
>>>>>>>>>>>>>>>>>> submit 10 array jobs, and then output to out.dat. But the
>>>>>>>>>>>>>>>>>> order for 'out.dat' matters! I would like to run these 10
>>>>>>>>>>>>>>>>>> jobs independently, but how can I maintain order? Or is
>>>>>>>>>>>>>>>>>> there a better way to do this?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Letting him submit his current job as it is would not be
>>>>>>>>>>>>>>>>>> wise...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You mean: one array job with 10 tasks - right? So
>>>>>>>>>>>>>>>>> "qsub -t 1-10 my_job".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In each jobscript you can use (adjust for the usual +/- 1
>>>>>>>>>>>>>>>>> problem at the beginning and end):
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx 40000m fluid0 > out${SGE_TASK_ID}.dat
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hence you output only the necessary lines of the input file
>>>>>>>>>>>>>>>>> and create a unique output file for each task of an array
>>>>>>>>>>>>>>>>> job. Also for the output file, maybe it's not necessary to
>>>>>>>>>>>>>>>>> concatenate them into one file, as you can sometimes use a
>>>>>>>>>>>>>>>>> construct like:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> cat out*.dat | my_pgm
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for further processing. With more than 9 tasks this would
>>>>>>>>>>>>>>>>> lead to the wrong order 1, 10, 2, 3, ..., and you need a
>>>>>>>>>>>>>>>>> variant of the above command:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> sed -n -e $[(SGE_TASK_ID-1)*1000000],$[SGE_TASK_ID*1000000]p | java -Xmx 40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for having leading zeros for the index in the name of the
>>>>>>>>>>>>>>>>> output file.
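Putting Reuti's pieces together, a complete per-task script might look roughly
like this (fluid0 and the input file name "series" are from the original
command; the "+ 1" avoids the overlap at the block boundaries, and I believe
-Xmx wants its value attached, i.e. -Xmx40000m):

#!/bin/sh
# each array task reads its own 1,000,000-line slice of "series"
start=$(( (SGE_TASK_ID - 1) * 1000000 + 1 ))
end=$((   SGE_TASK_ID * 1000000 ))
sed -n "${start},${end}p" series | \
    java -Xmx40000m fluid0 > out$(printf "%02d" $SGE_TASK_ID).dat

submitted as something like: qsub -t 1-10 -l h_vmem=40g -R y this_script.sh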
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list