[GE users] Specify the number of nodes when you submit job array

aali ahmaksod at gmail.com
Tue Oct 19 22:31:30 BST 2010


Hi Reuti, I just found a mistake in my previous email; here it is corrected:

Imagine my grid has 4 nodes, each with 8 cores, and I would like to submit a job array to each NODE, where each job array has 10 jobs.
So if nodeX has only 3 free cores and nodeY has only 5 free cores, I want one job array to go to nodeX, start 3 tasks and leave the other 7 waiting on that same machine, and another job array to go to nodeY, start 5 tasks and leave the other 5 waiting on that same machine.

Basically, I am doing this because I want to use the local disk for my jobs, as they do a lot of I/O and I don't want to hammer the shared mounted disk; at the end of each job array I will update the shared disk with the results.
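
For context, here is a rough sketch of what each task's job script does; the paths and the analysis command are placeholders rather than my real setup, and $TMPDIR is the node-local scratch directory SGE creates for every task:

#!/bin/bash
#$ -S /bin/bash
# Work inside the node-local scratch directory provided by SGE.
cd $TMPDIR
# Stage this task's input from the shared disk (placeholder path).
cp /shared/project/input.$SGE_TASK_ID .
# All the heavy I/O happens on the local disk (placeholder command).
./run_analysis input.$SGE_TASK_ID > result.$SGE_TASK_ID
# Only the final result goes back to the shared disk.
cp result.$SGE_TASK_ID /shared/project/results/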

I tried to use a PE for this, and it is doing something, but not what I am looking for.

qsub -pe threaded 1 -P TEST -p -1 -l node=1 -N job1 -j y -o logFile -cwd -t 1-10:1
qsub -pe threaded 1 -P TEST -p -1 -l node=1 -N job2 -j y -o logFile -cwd -t 21-30:1
qsub -pe threaded 1 -P TEST -p -1 -l node=1 -N job3 -j y -o logFile -cwd -t 31-40:1

What I get is that all the job arrays go in series to a single machine: job1 goes to nodeX only (which is exactly what I want), but job2 doesn't go to nodeY; it waits until job1 finishes. I also tried changing the slot count after the PE name threaded, but with no difference.

Here is the configuration of my PE threaded:
qconf -sp threaded
pe_name            threaded
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    $pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
accounting_summary FALSE
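
If I understand allocation_rule $pe_slots correctly, it only forces all slots of a single task onto one host and does not group the different tasks of an array together; e.g. a request like the following (just an illustration, not what I actually want to run) would give each task a whole node, but the ten tasks would still be scheduled independently:

qsub -pe threaded 8 -P TEST -N job1 -j y -o logFile -cwd -t 1-10

Please correct me if that reading is wrong.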


What am I missing here, experts?

Cheers,
Ahmed


On 19 October 2010 18:13, reuti <reuti at staff.uni-marburg.de> wrote:
Hi,

On 19.10.2010, at 15:16, aali wrote:

> Is it possible to specify the number of nodes when you submit a job array?
>
> To make it simple, I want this job array to run on a single machine, so I am trying to submit these 10 jobs with the following command:
>
> qsub -P TEST -p -1 -l node=1 -N jobname -j y -o logFile -cwd -t 1-10:1
>
> But this doesn't work, so is it possible to control the number of nodes when you submit a job array?

Well, if you bind it hard to one node, then it could be submitted this way. You are then of course limited to this particular node (i.e. "-l h=node004").

There is nothing in SGE which allows you to specify the number of nodes. But with recent versions of SGE you can use "-tc <int>" to limit the number of concurrently running tasks of your array job and avoid flooding the complete cluster this way.
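
For example, something along these lines (the host name and the task limit are only placeholders, adjust them to your nodes):

# Pin the whole array to one host and let at most 3 tasks run there at a time:
qsub -l h=node004 -tc 3 -N job1 -j y -o logFile -cwd -t 1-10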

-- Reuti


> Cheers,
> Ahmed
