[GE users] submitting parallel jobs

reuti reuti at staff.uni-marburg.de
Mon Nov 2 12:18:38 GMT 2009


On 30.10.2009 at 17:09, cgull wrote:

> Hi
> I am new to SGE and am slightly struggling with how queues should  
> be working.
> I have installed 6.2u3 and started configuring this.
> I have been able to set up SGE so that I have a master which has 3
> queues set up: bira cevert hill
> These three queues each have their own hostgroup (of four
> machines each), so @hill is made up of (hill hill2 hill3
> hill4), @cevert (cevert cevert2 cevert3 cevert4), and @bira (bira
> bira2 bira3 bira4).

so you have 12 machines in total.

> I have also set up a parallel environment (orte) so that I can run
> across the 4 machines.
> I am able to submit a parallel job with "qsub -q hill script.txt";
> this appears to pick up the machines that I want correctly.
> I am also able to submit using the hostgroup "qsub -q '*@hill'
> script.txt"; this also appears to pick up the machines I want.

You mean *@@hill I guess.

> I also can submit to bira and hill ok in similar methods.
> What I would like to do is either submit the job so that a job goes  
> to ALL the queues and if one is available run on it.
> If none are available wait until a queue becomes available.
> I thought I may be able to do this by using "qsub -soft -q  
> '*@cevert' '*@bira' '*@hill'  script"

This would supply three arguments to the -q option.
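
A minimal sketch of why this fails, using a hypothetical helper `show_args` (not part of SGE) that only prints the words the shell hands to a command. With spaces, three separate words arrive, so qsub takes the first as the -q value and treats the next one as the script file, which matches the "error opening *@bira" message quoted further down. With commas, a single word arrives.

```shell
# Hypothetical helper: print each argument it receives, one per line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Space-separated: the shell passes three separate words.
show_args '*@cevert' '*@bira' '*@hill'

# Comma-separated: the shell passes one word, i.e. a single queue list.
show_args 'all.q,extra.q,any.q'
```

The first call prints three bracketed lines, the second prints one.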

When you come from Torque, this might be unfamiliar: there you submit  
into a queue. SGE is different: you specify resource requests and SGE  
will select an appropriate queue for the job.

Therefore you need only one queue, and can submit with `qsub -q
'*@@bira' script.txt` (quote the wildcard so the shell does not expand
it). If you don't care which of the three hostgroups the job runs on,
but it should stay within a single hostgroup, you will need three PEs.
Once a PE is selected, only slots from this PE will be collected for a
job. On the other hand, if a PE is attached to different hostgroups or
queues, you might get a mixture of slots: just whatever fits the
resource request.

Using three PEs like orte_bira, orte_cevert and orte_hill, you need a
pe_list entry in the queue definition of:

pe_list NONE,[@bira=orte_bira],[@cevert=orte_cevert],[@hill=orte_hill]

and submit with: qsub -pe 'orte*' 4 ...

which forces SGE to select a PE, and then collect slots from the  
queues it is attached to.
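
One way the three-PE setup above might be prepared is sketched below. It assumes the existing PE is named orte (as in the original question) and that one cluster queue, called all.q here purely as a placeholder, spans all three hostgroups; the /tmp file names are hypothetical too. The sketch only prints the commands rather than running them, so each can be reviewed before use.

```shell
#!/bin/sh
# Dry-run sketch: print the steps that would clone the "orte" PE once per
# hostgroup and the pe_list line to set on the queue. Nothing is executed.
QUEUE=all.q                      # assumption: placeholder cluster queue name
HOSTGROUPS="bira cevert hill"    # hostgroups from the thread

print_pe_setup() {
    pe_list="NONE"
    for hg in $HOSTGROUPS; do
        # Dump the existing PE, rename it, and register the copy.
        printf '%s\n' "qconf -sp orte | sed 's/^pe_name .*/pe_name orte_$hg/' > /tmp/orte_$hg.pe"
        printf '%s\n' "qconf -Ap /tmp/orte_$hg.pe"
        pe_list="$pe_list,[@$hg=orte_$hg]"
    done
    # The queue is then edited by hand and the line below is set in it.
    printf '%s\n' "qconf -mq $QUEUE"
    printf '%s\n' "pe_list $pe_list"
}

print_pe_setup
```

The last line printed is the pe_list entry to put into the queue configuration.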

(BTW: specifying more than one queue uses the notation: qsub -q
all.q,extra.q,any.q; you don't need the * for the hosts.)

-- Reuti

> But qsub does not appear to like two being defined and gives the error
> "Unable to read script file because of error: error opening *@bira:
> No such file or directory"
> I wonder how I should be configuring this so that I am able to
> achieve this?
> Any help would be most appreciated.
> If you need any further explanation or more information please do  
> not hesitate to ask.
> Thanks for your time in advance,
> Best regards,
> Matt McNally

