[GE users] pe_slots issues
Reuti
reuti at staff.uni-marburg.de
Thu Nov 1 21:43:01 GMT 2007
On 01.11.2007 at 21:16, John Coldrick wrote:
>
> I've got a PE set up:
> ***
> pe_name m9_1
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $pe_slots
You might want to try $round_robin. All possible values are listed in
man sge_pe.
> control_slaves FALSE
> job_is_first_task FALSE
> urgency_slots min
> ***
> When I submit a task requiring more than 3 slots, via:
>
> qsub -pe m9_1 4 task.sh
>
> They won't run, I get
>
> Cannot run in PE "m9_1" because it only offers 0 slots.
>
> If I change '4' to 3 or less, it works. Anything more than 4, same
> problem. If I remove the request for the PE completely and leave all
> other params the same, it runs (although obviously not properly
> allocating multiple slots from SGE).
>
> I have 3 systems here that have 8 cores/slots, here's the qconf -se
> output of one of them:
> ***
> hostname xxxx.axyzfx.com
> load_scaling NONE
> complex_values mem_free=8
> load_values load_avg=0.180000,load_short=0.410000, \
> load_medium=0.180000,load_long=0.140000,arch=lx24-amd64, \
> num_proc=8,mem_free=7774.554688M,swap_free=2047.175781M, \
> virtual_free=9821.730469M,mem_total=7991.695312M, \
> swap_total=2055.148438M,virtual_total=10046.843750M, \
> mem_used=217.140625M,swap_used=7.972656M, \
> virtual_used=225.113281M,cpu=0.000000, \
> np_load_avg=0.022500,np_load_short=0.051250, \
> np_load_medium=0.022500,np_load_long=0.017500
> processors 8
> user_lists NONE
> xuser_lists NONE
> projects NONE
> xprojects NONE
> usage_scaling NONE
> report_variables NONE
> ***
>
> so you can see it's not a load issue.
>
> I have the PE assigned in *two* places (to be safe - I'm completely
> unclear what the difference means) - in both the "Customize" and the
> "Modify" portions of the queue (all.q).
You can check with "qconf -sq all.q". The "pe_list" line in its output
should include m9_1. "Customize" only customizes the display in qmon.
"Modify" changes the queue settings - this is the one you need.
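
A minimal sketch of that check as a shell helper. The function name queue_has_pe is made up for illustration, and it assumes the pe_list entry fits on a single line with space-separated PE names (no backslash continuation, no host-specific "[host=...]" overrides):

```shell
# Hypothetical helper: read a queue configuration on stdin and succeed
# only if the given PE name appears on its pe_list line.
# Assumes pe_list is one line of space-separated names.
queue_has_pe() {
    pe="$1"
    awk -v pe="$pe" '
        $1 == "pe_list" {
            for (i = 2; i <= NF; i++) if ($i == pe) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}

# On a live cluster (needs SGE's qconf in PATH):
#   qconf -sq all.q | queue_has_pe m9_1 && echo "m9_1 is attached to all.q"
```

If the helper fails, the PE is not attached to the queue as far as this simple parse can tell, and the "Modify" dialog (or "qconf -mq all.q") is the place to add it.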
> This used to work when I was running SGE 6.0u10 - the problem seemed
> to be introduced when I upgraded to 6.1u2 (actually a fresh install).
> One thing, which may or may not be a clue, is that there are *only* 3
> systems on the grid that have more than 2 slots. Coincidence that 3
> slots is the maximum I can run with this PE?
No. With $pe_slots all slots must come from a single node, and maybe
something else is already running there, so you can't get more. Did you
also define 8 as the slot count in the queue configuration for these
three nodes?
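
To illustrate the single-node constraint, here is a rough sketch. It does not query SGE; it assumes you have written the per-host slot usage by hand (for example after looking at "qstat -f") as lines of "<host> <used_slots> <total_slots>". The function name and the input format are invented for this example:

```shell
# Hypothetical helper: read "<host> <used_slots> <total_slots>" lines on
# stdin and print every host that has at least $1 free slots, i.e. the
# hosts that could take a $pe_slots job of that size on their own.
hosts_fitting_pe_slots() {
    want="$1"
    awk -v want="$want" '($3 - $2) >= want { print $1 }'
}

# Example with made-up numbers:
#   printf 'node1 5 8\nnode2 0 8\nnode3 2 2\n' | hosts_fitting_pe_slots 4
```

An empty result means no single host can satisfy the request under $pe_slots, even if the free slots summed over all hosts would be enough.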
With parallel jobs it is often advisable to request a reservation with
"-R y" in qsub and to set a sensible value for "max_reservation" in the
scheduler configuration.
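
A small sketch of how the two settings could be inspected and used together. The helper name max_reservation is made up here, and it assumes the scheduler configuration prints the parameter as "max_reservation <number>" on one line:

```shell
# Hypothetical helper: print the max_reservation value found in a
# scheduler configuration read from stdin.
max_reservation() {
    awk '$1 == "max_reservation" { print $2 }'
}

# On a live cluster (needs SGE's qconf and qsub in PATH):
#   qconf -ssconf | max_reservation    # show the current value
#   qconf -msconf                      # edit the scheduler configuration
#   qsub -R y -pe m9_1 4 task.sh       # submit with reservation requested
```

As far as I know a value of 0 means no reservations are made at all, so "-R y" alone has no effect until max_reservation is raised.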
-- Reuti