[GE users] parallel environment jobs not being run

Brian Wey bwey at solarflare.com
Thu Oct 2 18:51:37 BST 2008


I am getting the following message from 'qstat -j job_number' for a job
that isn't starting:

cannot run in PE "multicpu" because it only offers 0 slots

The job was submitted with 'qsub -pe multicpu 4'.

Here is the output from 'qstat -f':

analog_lnx_M32@ana01.gb.solarf BIP   5/8       2.53     lx24-amd64
analog_lnx_M32@bat.gb.solarfla BIP   0/8       -NA-     lx24-amd64    au
analog_lnx_M32@cutlass.gb.sola BIP   2/8       0.03     lx24-amd64
analog_lnx_M32@epee.gb.solarfl BIP   4/8       1.76     lx24-amd64
analog_lnx_M32@gladius.gb.sola BIP   0/8       0.00     lx24-amd64    d
analog_lnx_M32@knife.gb.solarf BIP   5/8       2.75     lx24-amd64

Here is the output from 'qconf -sp multicpu':

grid04> qconf -sp multicpu
pe_name           multicpu
slots             999
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   $pe_slots
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min

I would expect the job to run in the analog_lnx_M32@cutlass queue
instance. Although two of its slots are currently occupied, the host is
idle and has plenty of free memory, swap space, and other resources.
Nothing in the 'qstat -j' output indicates that this queue is
unavailable.
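
To make the slot arithmetic behind that expectation explicit, here is a minimal Python sketch that parses the 'qstat -f' lines quoted in this message and lists the queue instances with at least 4 free slots. It rests on several assumptions: the third column is read as used/total slots, any state flag (such as 'au' or 'd') is treated as making the instance unusable, and with allocation_rule $pe_slots all 4 slots must come from a single host. The function names free_slots and candidates are hypothetical helpers, not part of Grid Engine. Raw slot counts are only one of the things the scheduler checks, so this only confirms the counting and does not explain the "0 slots" message.

```python
import re

# Queue-instance lines from the 'qstat -f' output in this message,
# written with '@' as qstat prints them.
QSTAT_F = """\
analog_lnx_M32@ana01.gb.solarf BIP   5/8       2.53     lx24-amd64
analog_lnx_M32@bat.gb.solarfla BIP   0/8       -NA-     lx24-amd64    au
analog_lnx_M32@cutlass.gb.sola BIP   2/8       0.03     lx24-amd64
analog_lnx_M32@epee.gb.solarfl BIP   4/8       1.76     lx24-amd64
analog_lnx_M32@gladius.gb.sola BIP   0/8       0.00     lx24-amd64    d
analog_lnx_M32@knife.gb.solarf BIP   5/8       2.75     lx24-amd64
"""

# Assumed column layout: name, queue type, used/total, load, arch, [states].
LINE_RE = re.compile(
    r"^(?P<qinstance>\S+)\s+(?P<qtype>\S+)\s+(?P<used>\d+)/(?P<total>\d+)"
    r"\s+(?P<load>\S+)\s+(?P<arch>\S+)(?:\s+(?P<states>\S+))?\s*$"
)


def free_slots(qstat_text):
    """Map each usable queue instance to its number of free slots."""
    result = {}
    for line in qstat_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not a queue-instance line
        if match.group("states"):
            continue  # simplification: any state flag means "skip it"
        used = int(match.group("used"))
        total = int(match.group("total"))
        result[match.group("qinstance")] = total - used
    return result


def candidates(qstat_text, requested):
    """Queue instances that could hold `requested` slots on one host."""
    free = free_slots(qstat_text)
    return sorted(name for name, slots in free.items() if slots >= requested)


if __name__ == "__main__":
    for name in candidates(QSTAT_F, 4):
        print(name)
```

Counted this way, the cutlass instance has 6 free slots and the epee instance has 4, so by slot count alone either one could hold a 4-slot job.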

 

What am I doing wrong?

 

Thanks, Brian



