[GE users] PE only offers 0 slots

Bart Willems b-willems at northwestern.edu
Sun Jun 29 21:27:26 BST 2008



I am trying to set up tight integration between SGE 6.1 and MPICH2 1.0.7
using the daemon-based smpd startup method described by Reuti:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html

My problem is that the job stays pending with the error 'cannot run in PE
"mpich2_smpd" because it only offers 0 slots'. My test job is the mpihello
program.

I currently have 4 serial job queues (debug.q, shortserial.q, medserial.q,
longserial.q) and 1 parallel job queue (shortparallel.q). For testing
purposes, I am trying out the parallel queue on three nodes: compute-0-60,
compute-0-61, and compute-0-62. All three nodes have two quad-core CPUs.
More details are included below.

I am new to both SGE and MPICH2, so any insight into this problem would be
most appreciated!

Thanks,
Bart


My job submission file:
=======================

#!/bin/bash

#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -l h_cpu=10:00:00
#$ -pe mpich2_smpd 2
#$ -P Parallel

export PATH=/share/apps/mpich2/bin:$PATH
port=$((JOB_ID % 5000 + 20000))
mpiexec -n $NSLOTS -machinefile $PWD/machines.$JOB_ID -port $port \
    ./mpihello

exit
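
For reference, the port expression in the script above maps every job ID
into the range 20000-24999. A minimal sketch of that arithmetic follows;
the helper name port_for_job is hypothetical and not part of the script.

```shell
#!/bin/bash
# Hypothetical helper mirroring the script's expression:
#   port=$((JOB_ID % 5000 + 20000))
port_for_job() {
  echo $(( $1 % 5000 + 20000 ))
}

# Job 9446 is the pending job shown in the qstat output below.
port_for_job 9446   # 9446 % 5000 = 4446, plus 20000 = 24446
```

Job IDs that differ by a multiple of 5000 map to the same port, which
could matter if two such jobs ever ran on the same node.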


=========================================
Last few lines of output from "qstat -f":
=========================================

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
   9446 0.50208 submit_mpi bart         qw    06/29/2008 14:50:46     1


==============================================
Last few lines of output from "qstat -j 9446":
==============================================

                            (no project) does not have the correct project to run in cluster queue "medserial.q"
                            (no project) does not have the correct project to run in cluster queue "shortserial.q"
                            (no project) does not have the correct project to run in cluster queue "debug.q"
                            (no project) does not have the correct project to run in cluster queue "longserial.q"
                            cannot run in PE "mpich2_smpd" because it only offers 0 slots


=========================
Output from "qstat -g c":
=========================

CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
debug.q                           0.98      0    372    444      0     72 
longserial.q                      0.98    294      6    444    312     72 
medserial.q                       0.98    308     64    444      0     72 
shortparallel.q                   0.96      0      0     24      0     24 
shortserial.q                     0.98      0    372    444      0     72
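
As a reading aid for the table above, here is a minimal awk sketch that
lists cluster queues with no available slots. It assumes the column order
shown in the header (AVAIL is field 4, TOTAL field 5, cdsuE field 7); the
names qstat_gc and no_avail are hypothetical, and the data is copied from
the table rather than read from a live qstat call.

```shell
#!/bin/bash
# Rows copied from the "qstat -g c" output above (header and rule omitted).
qstat_gc='debug.q                           0.98      0    372    444      0     72
longserial.q                      0.98    294      6    444    312     72
medserial.q                       0.98    308     64    444      0     72
shortparallel.q                   0.96      0      0     24      0     24
shortserial.q                     0.98      0    372    444      0     72'

# Print each cluster queue whose AVAIL column is zero, together with its
# TOTAL and cdsuE slot counts.
no_avail() {
  printf '%s\n' "$1" | awk '$4 == 0 {
    printf "%s: 0 of %d slots available, %d in cdsuE\n", $1, $5, $7
  }'
}

no_avail "$qstat_gc"
```

On this data only shortparallel.q is reported, with all 24 of its slots
counted under cdsuE. As far as I understand the qstat man page, that column
counts slots on queue instances that are configuration ambiguous, disabled,
suspended, unknown, or in Error state; the per-instance state letters
should appear in the full "qstat -f" listing.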


========================================
Output from "qconf -sq shortparallel.q":
========================================

qname                 shortparallel.q
hostlist              @parallelhosts
seq_no                0
load_thresholds       np_load_avg=1.4
suspend_thresholds    NONE
nsuspend              1
suspend_interval      00:05:00
priority              0
min_cpu_interval      00:15:00
processors            UNDEFINED
qtype                 BATCH INTERACTIVE
ckpt_list             nwu
pe_list               mpich2_smpd
rerun                 FALSE
slots                 4,[@parallelhosts=8]
tmpdir                /tmp
shell                 /bin/csh
prolog                NONE
epilog                NONE
shell_start_mode      posix_compliant
starter_method        NONE
suspend_method        NONE
resume_method         NONE
terminate_method      NONE
notify                00:00:60
owner_list            NONE
user_lists            parallelusers
xuser_lists           NONE
subordinate_list      longserial.q=2, medserial.q=2, shortserial.q=2
complex_values        NONE
projects              Parallel
xprojects             NONE
calendar              NONE
initial_state         default
s_rt                  INFINITY
h_rt                  INFINITY
s_cpu                 INFINITY
h_cpu                 48:00:00
s_fsize               INFINITY
h_fsize               INFINITY
s_data                INFINITY
h_data                INFINITY
s_stack               INFINITY
h_stack               INFINITY
s_core                INFINITY
h_core                INFINITY
s_rss                 INFINITY
h_rss                 INFINITY
s_vmem                INFINITY
h_vmem                INFINITY


==========================================
Output from "qconf -shgrp @parallelhosts":
==========================================

group_name @parallelhosts
hostlist compute-0-60.local compute-0-61.local compute-0-62.local
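
As a sanity check on the configuration, the slot total can be recomputed
from the host group and the per-host override in the queue's slots line
(slots 4,[@parallelhosts=8]). This is a minimal sketch; the helper name
expected_total_slots is hypothetical.

```shell
#!/bin/bash
# Host list copied from "qconf -shgrp @parallelhosts" above.
hostlist='compute-0-60.local compute-0-61.local compute-0-62.local'
# Per-host slot count from the [@parallelhosts=8] override in the queue.
slots_per_host=8

# Multiply the number of whitespace-separated hosts by the per-host slots.
expected_total_slots() {
  local per_host=$2
  set -- $1            # split the host list on whitespace
  echo $(( $# * per_host ))
}

expected_total_slots "$hostlist" "$slots_per_host"   # 3 hosts * 8 slots
```

The result, 24, agrees with the TOTAL column for shortparallel.q in the
"qstat -g c" output, so the queue's slot count itself looks consistent
with the three dual quad-core nodes.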


======================================
Output from "qconf -su parallelusers":
======================================

name    parallelusers
type    ACL
fshare  0
oticket 0
entries bart


===================================
Output from "qconf -sprj Parallel":
===================================

name Parallel
oticket 0
fshare 0
acl parallelusers
xacl NONE


====================================
Output from "qconf -sp mpich2_smpd":
====================================

pe_name           mpich2_smpd
slots             9999
user_lists        parallelusers
xuser_lists       NONE
start_proc_args   /opt/gridengine/mpich2_smpd/startmpich2.sh -catch_rsh \
                  $pe_hostfile /share/apps/mpich2
stop_proc_args    /opt/gridengine/mpich2_smpd/stopmpich2.sh -catch_rsh \
                  /share/apps/mpich2
allocation_rule   $round_robin
control_slaves    TRUE
job_is_first_task FALSE
urgency_slots     min




