[GE users] pe_slots issues

Reuti reuti at staff.uni-marburg.de
Thu Nov 1 21:43:01 GMT 2007


On 01.11.2007 at 21:16, John Coldrick wrote:

>
> 	I've got a PE set up:
> ***
> 	pe_name           m9_1
> 	slots             999
> 	user_lists        NONE
> 	xuser_lists       NONE
> 	start_proc_args   /bin/true
> 	stop_proc_args    /bin/true
> 	allocation_rule   $pe_slots

You might want to try $round_robin instead. All possible values are
listed in man sge_pe.

> 	control_slaves    FALSE
> 	job_is_first_task FALSE
> 	urgency_slots     min
> ***
> 	When I submit a task requiring more than 3 slots, via:
>
> 	qsub -pe m9_1 4 task.sh
>
> 	They won't run, I get
> 	
> 	Cannot run in PE "m9_1" because it only offers 0 slots.
>
> 	If I change '4' to 3 or less, it works.  Anything more than 4, same
> problem.  If I remove the request for the PE completely and leave all other
> params the same, it runs (although obviously not properly allocating
> multiple slots from SGE).
>
> 	I have 3 systems here that have 8 cores/slots, here's the qconf -se
> output of one of them:
> ***
> hostname              xxxx.axyzfx.com
> load_scaling          NONE
> complex_values        mem_free=8
> load_values           load_avg=0.180000,load_short=0.410000, \
>                       load_medium=0.180000,load_long=0.140000,arch=lx24-amd64, \
>                       num_proc=8,mem_free=7774.554688M,swap_free=2047.175781M, \
>                       virtual_free=9821.730469M,mem_total=7991.695312M, \
>                       swap_total=2055.148438M,virtual_total=10046.843750M, \
>                       mem_used=217.140625M,swap_used=7.972656M, \
>                       virtual_used=225.113281M,cpu=0.000000, \
>                       np_load_avg=0.022500,np_load_short=0.051250, \
>                       np_load_medium=0.022500,np_load_long=0.017500
> processors            8
> user_lists            NONE
> xuser_lists           NONE
> projects              NONE
> xprojects             NONE
> usage_scaling         NONE
> report_variables      NONE
> ***
>
> so you can see it's not a load issue.
>
> 	I have the PE assigned in *two* places (to be safe - I'm completely
> unclear what the difference means) - in both the "Customize" and the
> "Modify" portions of the queue (all.q).

You can check with "qconf -sq all.q"; the output should contain the
line "pe_list m9_1".

"Customize" only customizes how things are displayed in qmon. "Modify"
changes the queue settings, and that is the one you need.
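
If you want to script the pe_list check, something along these lines
could work. It is only a rough sketch: the function name queue_has_pe
is made up for illustration, and it assumes the pe_list entry fits on a
single line without per-host overrides or backslash continuations.

```shell
#!/bin/sh
# queue_has_pe PE_NAME
# Reads a queue configuration (the text printed by "qconf -sq QUEUE")
# on stdin and succeeds if PE_NAME is listed in pe_list.
# Assumption: pe_list is a single line with no per-host overrides.
queue_has_pe() {
    awk -v pe="$1" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) {
                # Entries may be separated by spaces or by commas.
                n = split($i, names, ",")
                for (j = 1; j <= n; j++)
                    if (names[j] == pe) found = 1
            }
        }
        END { exit found ? 0 : 1 }
    '
}
```

Usage would be "qconf -sq all.q | queue_has_pe m9_1 && echo attached".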

> 	This used to work when I was running SGE 6.0u10 - the problem seemed
> to be introduced when I upgraded to 6.1u2 (actually a fresh install).  One
> thing, which may or may not be a clue, is that there are *only* 3 systems
> on the grid that have more than 2 slots.  Coincidence that 3 slots is the
> maximum I can run with this PE?

No. With $pe_slots all slots must come from one node, and maybe there
is already something else running there, so you can't get more. Did you
also define 8 as the slot count in the queue configuration for these
three nodes?

With parallel jobs it is often advisable to request a reservation with
"-R y" in qsub and to set a sensible value for "max_reservation" in the
scheduler configuration.
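
A quick way to see whether reservations can take effect at all is to
look at max_reservation in the output of "qconf -ssconf". The helper
below is a sketch with a made-up name; it assumes the value appears as
"max_reservation N" on its own line, and that a value of 0 means no
reservations are made (which is how I read man sched_conf).

```shell
#!/bin/sh
# reservation_enabled
# Reads the scheduler configuration (the text printed by
# "qconf -ssconf") on stdin and succeeds if max_reservation is > 0.
# Assumption: 0, or a missing entry, means reservation is off.
reservation_enabled() {
    awk '
        $1 == "max_reservation" { value = $2 }
        END { exit (value + 0 > 0) ? 0 : 1 }
    '
}
```

Usage would be "qconf -ssconf | reservation_enabled || echo off". The
value itself is changed with "qconf -msconf", and the job is then
submitted as "qsub -R y -pe m9_1 4 task.sh".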

-- Reuti

