[GE users] Parallel jobs not running : 0 slots available

leonardz leonardz at sickkids.ca
Fri Aug 7 15:44:35 BST 2009


On our system we run one parallel environment per node to ensure parallel jobs do not schedule across multiple nodes.
The production environment is SGE 6.0u8.

Jobs are not running in this queue, although the scalar all.q runs fine. When I schedule a parallel job on, e.g., cn-r5-10, I request
ompi8p-5-10 5
to ask for 5 cores on that specific host. I see:

qstat -j 7294180 | grep -- 5-10
parallel environment:  ompi8p-5-10 range: 5
                            cannot run in PE "ompi8p-5-10" because it only offers 0 slots

But I have checked the PE, queue and exec host definitions, and they all say 8 slots. Is there somewhere else I need to check to see whether slots > 0 for a parallel job?


So for example:

on host cn-r5-10
qconf -se cn-r5-10
hostname              cn-r5-10
load_scaling          NONE
complex_values        slots=8, .....

the PE is ompi8p-5-10
qconf -sp ompi8p-5-10
pe_name           ompi8p-5-10
slots             8
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $fill_up
.....
The queue ompi8p-5-10.q has:
qconf -sq ompi8p-5-10.q
qname                 ompi8p-5-10.q
hostlist              cn-r5-10
seq_no                25
.....
pe_list               ompi8p-5-10
rerun                 FALSE
slots                 8
.....
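
The three lookups above can be collected into one pass so the slot counts sit side by side. What follows is only a minimal sketch, assuming the output layout shown in this message; slots_of and check_pe_slots are hypothetical helper names, not part of Grid Engine, and only the qconf -se, -sp and -sq options already used above are called.

```shell
#!/bin/sh
# Print the slots value found in qconf-style output read from stdin.
# Assumes the layouts shown in this message:
#   "slots   8"                    (qconf -sp, qconf -sq)
#   "complex_values  slots=8,..."  (qconf -se)
slots_of() {
    awk '
        $1 == "slots" { print $2; exit }
        $1 == "complex_values" {
            if (match($0, /slots=[0-9]+/)) {
                # Skip the 6 characters of "slots=" and keep the digits.
                print substr($0, RSTART + 6, RLENGTH - 6)
                exit
            }
        }
    '
}

# Hypothetical wrapper: report slots for one exec host, PE and queue.
check_pe_slots() {
    host=$1
    pe=$2
    queue=$3
    echo "exec host $host: $(qconf -se "$host" | slots_of)"
    echo "PE $pe: $(qconf -sp "$pe" | slots_of)"
    echo "queue $queue: $(qconf -sq "$queue" | slots_of)"
}

# Only query the cluster when qconf is on PATH and three names were given.
if command -v qconf >/dev/null 2>&1 && [ "$#" -eq 3 ]; then
    check_pe_slots "$@"
fi
```

Saved under a name of your choosing and run with cn-r5-10 ompi8p-5-10 ompi8p-5-10.q as arguments, this would print one line per definition. It only restates the configured values; it does not explain why the scheduler reports 0 slots.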

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211378
