[GE users] pe_slots issues
reuti at staff.uni-marburg.de
Fri Nov 2 14:52:54 GMT 2007
On 02.11.2007, at 14:32, John Coldrick wrote:
> On Thursday 01 November 2007 17:43, Reuti wrote:
>> On 01.11.2007, at 21:16, John Coldrick wrote:
>>> I've got a PE set up:
>>> pe_name m9_1
>>> slots 999
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args /bin/true
>>> stop_proc_args /bin/true
>>> allocation_rule $pe_slots
>> You might want to try $round_robin. All possible values are listed
>> in man sge_pe.
> Thanks for getting back...my apps can't run cross-system; they all
> must run on a single system, which is why I'm using $pe_slots. I
> assume that's the one I should be using, right? If I do use
> $round_robin, it splits the 4 slots up over multiple systems, as it
> should.
Okay, I see.
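Right - with $pe_slots the whole request has to fit on one host. For reference, a submission against this PE would look like the sketch below (the job script name is just a placeholder):

```shell
# Request 4 slots from PE m9_1; with allocation_rule $pe_slots all 4
# slots must come from a single execution host (job.sh is a placeholder):
qsub -pe m9_1 4 job.sh
```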
>> No. With $pe_slots all slots must come from one node, and maybe
>> there is already something else running, so you can't get more. You
>> also defined 8 as the slot count in the queue configuration for
>> these three machines?
> Correct. I've got complete control of the grid, so nothing else is
> running when I'm testing, and all the systems have all their slots
> open and
So only one queue on these 8-core machines?
>> With parallel jobs it is often advisable to request a reservation
>> with "-R y" in qsub and to set a sensible value for
>> "max_reservation" in the scheduler configuration.
> That makes no difference - I've tried reserving along with
> variations of 0-8 for the max reservation, and the behaviour is the
> same.
max_reservation is the number of jobs that may hold a reservation, not a slot count. A value of 20 may be good.
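As a sketch (job script name is a placeholder; PE name taken from your setup):

```shell
# Submit with a reservation so smaller jobs can't starve the 4-slot
# parallel job (job.sh is a placeholder script):
qsub -pe m9_1 4 -R y job.sh

# max_reservation limits how many *jobs* may hold a reservation at
# once - it is not a per-job slot limit. Check the current value:
qconf -ssconf | grep max_reservation

# and raise it (e.g. to 20) in the editor opened by:
qconf -msconf
```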
> Is there anywhere else where a maximum ceiling of three (or 'n')
> slots could exist? SGE 6.0 worked fine with this; I like to keep
> current if I can, though. :) I can't help but think there's a new
> default somewhere that didn't exist in 6.0 that I'm getting caught
> up on. Just to check - getting that message that the PE only offers
> '0' slots - isn't that indicative of something being very wrong? If
> I qalter the existing job to 3 slots, off it goes and it runs using
> the PE. It seems fundamentally wrong to me that this message shows
> up at all given that, unless it's more generic than I'm assuming
> and it's a catchall for numerous variables failing, such as load or
> mem (which, btw, is fine, provable by requesting 3 slots and
> running fine).
In the queue definition, did you define more slots for these three machines?
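You can check with something like the following (the queue name here is just an example):

```shell
# Show the slot definition of the queue; it can be a plain number or
# a per-host list like "8,[node1=8],..." (all.q is a placeholder):
qconf -sq all.q | grep '^slots'
```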
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net