[GE users] pe_slots issues

Reuti reuti at staff.uni-marburg.de
Fri Nov 2 14:52:54 GMT 2007

Am 02.11.2007 um 14:32 schrieb John Coldrick:

> On Thursday 01 November 2007 17:43, Reuti wrote:
>> Am 01.11.2007 um 21:16 schrieb John Coldrick:
>>> 	I've got a PE set up:
>>> ***
>>> 	pe_name           m9_1
>>> 	slots             999
>>> 	user_lists        NONE
>>> 	xuser_lists       NONE
>>> 	start_proc_args   /bin/true
>>> 	stop_proc_args    /bin/true
>>> 	allocation_rule   $pe_slots
>> You might want to try $round_robin. All possible values are listed
>> in "man sge_pe".
> 	Thanks for getting back...my apps can't run cross-system; they all
> must run on a single system, which is why I'm using $pe_slots.  I
> assume that's the one I should be using, right?  If I do use
> $round_robin, it splits the 4 slots up over multiple systems, as it
> should.

Okay, I see.
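
For reference, sge_pe(5) lists the possible allocation_rule values:
$pe_slots (all slots from one host), $fill_up, $round_robin, or a
fixed integer of slots per host. A quick sketch for inspecting and
editing the PE (assuming the PE name m9_1 from your config):

***
	qconf -sp m9_1     # show the current PE definition
	qconf -mp m9_1     # edit it in $EDITOR, e.g. the allocation_rule
***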

>> No. With $pe_slots all slots must come from one node, and maybe
>> there is already something else running, so you can't get more. Did
>> you also define 8 for the slot count in the queue configuration for
>> these three nodes?
> 	Correct.  I've got complete control of the grid, so nothing else
> is running when I'm testing, and all the systems have all their
> slots open and available.

So only one queue on these 8-core machines?
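
To rule out a per-host slot ceiling, it may help to dump the queue
definition and check the slots line (a sketch, assuming a single
cluster queue named all.q - substitute your queue name):

***
	qconf -sq all.q    # show the full queue configuration
	# slots may be a plain number or a per-host list, e.g.:
	# slots 8,[node1=8],[node2=8],[node3=8]
***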

>> With parallel jobs it's often advisable to request a reservation
>> with "-R y" in qsub and to set a sensible value for
>> "max_reservation" in the scheduler configuration.
> 	That makes no difference - I've tried reserving along with
> variations of 0-8 for the max reservation, and the behaviour is the
> same.

"max_reservation" is the maximum number of jobs for which the
scheduler will try to reserve slots, not a slot count; 20 may be a
good value.
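
As a sketch of the two knobs involved (the job script name is
hypothetical):

***
	qsub -pe m9_1 4 -R y myjob.sh   # request 4 slots plus a reservation
	qconf -msconf                   # edit the scheduler configuration
	                                # and set e.g.: max_reservation 20
***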

> 	Is there anywhere else where a maximum ceiling of three (or 'n')
> slots could exist?  SGE 6.0 worked fine with this; I like to keep
> current if I can, though.  :)  I can't help but think there's a new
> default somewhere that didn't exist in 6.0 that I'm getting caught
> up on.  Just to check - getting that message that the PE only offers
> '0' slots - isn't that indicative of something being very wrong?  If
> I qalter the existing job to 3 slots, off it goes, it runs using the
> PE.  It seems fundamentally wrong to me that this message shows up
> at all given that, unless it's more generic than I'm assuming and
> it's a catchall for numerous variables failing, such as load or mem
> (which, btw, is fine, provable by requesting 3 slots running fine).

Did you define more slots for these three machines in the queue
definition?
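
When a pending parallel job reports that the PE offers only 0 slots,
the scheduler's reasoning is usually visible per job (a sketch;
replace <jobid> with the id of the pending job):

***
	qstat -j <jobid>   # the "scheduling info" section explains why
	                   # the job stays pending
	qstat -g c         # cluster queue summary of used/available slots
***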

-- Reuti

To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net
