[GE users] cannot run in PE when requesting consumable

mhanby mhanby at uab.edu
Mon Mar 22 18:14:14 GMT 2010


Howdy, (GE 6.2u5) I have defined a consumable license resource, but when I request more than 11 of it, the job will not start and reports the "cannot run in PE" error:

parallel environment:  matlab range: 32
scheduling info:            queue instance " compute-0-52.local" dropped because it is full
                            ............
                            cannot run in PE "matlab" because it only offers 4 slots

Here is the consumable:
$ qconf -sc |grep matlab
#name               shortcut   type        relop requestable consumable default  urgency 
#----------------------------------------------------------------------------------------
matlab_dcs          ml_dcs     INT         <=    YES         YES        0        0

I added it to the global execution host:
$ qconf -se global|grep matlab
complex_values        matlab_dcs=128

I can verify that no matlab_dcs are consumed at the moment:
$ qstat -s r -r |grep matlab_dcs
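An empty grep result is easy to misread, so a numeric total can be handier. The following is a minimal sketch that adds up the values requested for one resource in `qstat -r` style text, relying only on the "Hard Resources:" line format shown in this post. The helper name sum_requests is hypothetical, not a Grid Engine command, and the sum is simply what the jobs asked for as printed, not necessarily what qmaster has booked.

```shell
# Sum the values requested for one resource in `qstat -r` style text on stdin.
# sum_requests is a hypothetical helper name, not part of Grid Engine.
sum_requests() {
  # $1 is the resource name, e.g. matlab_dcs
  awk -v res="$1" '
    {
      for (i = 1; i <= NF; i++) {
        # pick out fields such as "matlab_dcs=10"
        if (index($i, res "=") == 1) {
          split($i, kv, "=")
          total += kv[2]
        }
      }
    }
    END { print total + 0 }   # prints 0 when nothing matched
  '
}

# Usage on a live cluster (not executed here):
#   qstat -s r -r | sum_requests matlab_dcs
```

With no running job requesting matlab_dcs, this should print 0 rather than nothing.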

Then submit the job
$ qsub -pe matlab 32 -l matlab_dcs=32 myjob.qsub
$ qstat -r -u $USER

134523 0.55108 Job24      mikeh     qw    03/19/2010 13:22:32                     32
       Full jobname:     Job24
       Requested PE:     matlab 32
       Hard Resources:   matlab_dcs=32 (0.000000)
                         h_rt=4020 (0.000000)
                         s_rt=3900 (0.000000)
                         mem_free=1G (0.000000)
       Soft Resources:   

And look at qstat -j:
$ qstat -j 134523|tail -n 1
                            cannot run in PE "matlab" because it only offers 4 slots


I have the matlab PE set up with 128 slots, and I have verified the same behavior with this consumable under other PEs (openmpi, for example). I also verified that this job is the only one in the queue requesting that PE. Each time I get the same error.

If I request "matlab_dcs=10" or "matlab_dcs=11" the job starts just fine, but with 12 or more the job won't start:
$ qsub -pe matlab 10 -l matlab_dcs=10 myjob.qsub
$ qstat -u $USER -r
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
 224664 0.53903 j_openmpi_ mikeh        r     03/22/2010 12:59:05 all.q@compute-0-46.local       10        
       Full jobname:     Job24
       Master Queue:     all.q@compute-0-46.local
       Requested PE:     matlab 10
       Granted PE:       matlab 10
       Hard Resources:   h_rt=3900 (0.000000)
                         matlab_dcs=10 (0.000000)
                         s_rt=3840 (0.000000)
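One thing that may be worth checking is how the request scales with the PE size. The sketch below is a quick arithmetic check only. It assumes that a consumable requested with -l is charged once for every slot granted to a parallel job, which nothing in the output above confirms, and it uses the total of 128 from "qconf -se global". The function name fits is hypothetical.

```shell
# ASSUMPTION: a consumable requested with -l is charged once per PE slot,
# so the total charged would be slots * per-slot request.
total=128   # matlab_dcs=128 on the global host

# fits SLOTS REQUEST: succeeds if SLOTS * REQUEST does not exceed $total
fits() {
  [ $(( $1 * $2 )) -le "$total" ]
}

# The submissions in this post use the same number for slots and matlab_dcs.
for n in 10 11 12 32; do
  if fits "$n" "$n"; then
    echo "$n slots x matlab_dcs=$n = $(( n * n )): fits within $total"
  else
    echo "$n slots x matlab_dcs=$n = $(( n * n )): exceeds $total"
  fi
done
```

Under that assumption the break between 11 and 12 would line up with what is seen here (121 versus 144 against 128), and a 32-slot job would need -l matlab_dcs=1 to account for 32 in total.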

Am I missing something when defining the consumable?

Is there any way to tell what qmaster thinks is the current number of checked out consumables other than doing "qstat -s r -r |grep matlab_dcs"? 
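A possible starting point, offered as a sketch rather than a confirmed answer: qhost and qstat both accept -F with a resource name, and the sketch assumes the resource is then reported as a "matlab_dcs=<value>" entry (that output format is an assumption, it is not shown anywhere in this post). The helper name remaining_consumable is hypothetical.

```shell
# remaining_consumable is a hypothetical helper, not a Grid Engine command.
# It reads text on stdin and prints the first value found for resource $1.
# ASSUMPTION: the input contains the resource as "<name>=<number>".
remaining_consumable() {
  sed -n "s/.*$1=\([0-9.]*\).*/\1/p" | head -n 1
}

# Usage on a live cluster (not executed here):
#   qhost -F matlab_dcs | remaining_consumable matlab_dcs
#   qstat -F matlab_dcs | remaining_consumable matlab_dcs
```

If the value printed is lower than the configured 128 while no jobs are running, that would point at stale bookings. If it stays at 128, the limit is more likely being hit at scheduling time.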

=================================
Mike Hanby
mhanby at uab.edu
Information Systems Specialist II
IT HPCS / Research Computing

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=250561
