[GE users] Selecting an IO host and cluster nodes

Marc Langlois marc at keyseismic.com
Tue May 31 00:43:16 BST 2005

I'm trying to configure SGE 6 to allow a user to start a parallel job on
a selected file server (IO) node, then allocate compute nodes from a
cluster, but only allow a single process to ever run on each cluster
node. The IO nodes are running Solaris 8, the cluster nodes Fedora Core

This was discussed in a thread titled "Pick nodes from one queue plus 1
node from another" on 04-apr-2005. The goal is to reproduce the PBS node
allocation that can be specified with:

  "-l nodes=ionode:1+clusternode:3".

I followed the suggestions by Craig Tierny in that thread:

With 2 IO nodes named io[1-2], and 3 cluster nodes named cls[1-3]:
- for each IO node, create a host group (named @ioN) that includes the
one IO node and all 3 cluster nodes.
- create a PE named "peioN" for each IO node with 4 slots.
- create a cluster queue named "ioN.q" for each IO node that only
includes its own host group.
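
A minimal sketch of the first two steps, for anyone who wants to reproduce
the setup. It only writes the host group and PE definitions to a scratch
directory and prints the qconf commands that would load them; nothing is
changed in the cluster. The file layouts are assumed to match what
"qconf -shgrp" and "qconf -sp" print on SGE 6, and the values of
start_proc_args, stop_proc_args, control_slaves and job_is_first_task are
placeholders, not settings taken from my configuration.

```shell
#!/bin/sh
# Sketch only: generate host group and PE definition files for io1 and io2.
# Field names are assumed to follow the SGE 6 "qconf -shgrp" / "qconf -sp"
# output; verify against your own installation before loading anything.
outdir=$(mktemp -d)
cluster_nodes="cls1 cls2 cls3"

for n in 1 2; do
    # Host group @ioN: the one IO node plus all 3 cluster nodes.
    cat > "$outdir/hgrp_io$n" <<EOF
group_name @io$n
hostlist io$n $cluster_nodes
EOF

    # PE peioN with 4 slots. allocation_rule 1 requests one slot per host.
    # The remaining values are placeholders (assumptions).
    cat > "$outdir/pe_io$n" <<EOF
pe_name           peio$n
slots             4
user_lists        NONE
xuser_lists       NONE
start_proc_args   /bin/true
stop_proc_args    /bin/true
allocation_rule   1
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
EOF

    # Dry run: print the commands instead of executing them.
    echo "qconf -Ahgrp $outdir/hgrp_io$n"
    echo "qconf -Ap $outdir/pe_io$n"
done
```

The cluster queue ioN.q would then be added (for example with
"qconf -aq"), with its hostlist set to @ioN and its pe_list set to peioN.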

To start two jobs on IO nodes io1 and io2, I'm using:

qsub -pe peio1 4 -masterq io1.q@io1 myjob.sh
qsub -pe peio2 4 -masterq io2.q@io2 myjob.sh

What I expected was that the second job would wait in the "qw" state
until the first completed, since it needs the same cluster nodes that
were allocated to the first job. Instead, both jobs start at the same
time, which is not what I want. This does make some sense when looking
at the queue instances: each of the two cluster queues has 4 instances
(one IO node plus 3 cluster nodes), for a total of 8 queue slots.
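
To make the slot arithmetic concrete, here is a small model of the queue
instances in this setup. It does not talk to SGE at all; it just
enumerates the queue@host pairs implied by the two host groups, assuming
one slot per queue instance (an assumption consistent with the total of
8 slots, not something read from the queue configuration).

```shell
#!/bin/sh
# Model of the queue instances, assuming 1 slot per queue instance.
slots_per_instance=1
total=0
cls1_instances=""

for q in io1.q io2.q; do
    io=${q%.q}                       # io1.q -> io1, io2.q -> io2
    for host in "$io" cls1 cls2 cls3; do
        echo "$q@$host slots=$slots_per_instance"
        total=$((total + slots_per_instance))
        # Record every queue instance that lives on cls1.
        if [ "$host" = cls1 ]; then
            cls1_instances="$cls1_instances $q@$host"
        fi
    done
done

echo "total slots: $total"           # 8 in this model
echo "instances on cls1:$cls1_instances"
```

In this model each cluster node carries two queue instances, one per
cluster queue, so two 4-slot jobs fit without either queue being
oversubscribed, even though they land on the same three cluster nodes.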

Am I missing something in my setup or how I'm using qsub? Should the
allocation rule in the PE have any effect? I've tried '1' and
'$fill_up', but I get the same behavior with both.


Marc Langlois
marc at keyseismic dot com
Calgary, AB, Canada
