[GE users] Wildcards in PE still broken in 6.0u3

Tim Mueller tim_mueller at hotmail.com
Fri Apr 29 01:21:39 BST 2005


Hi,

It appears that wildcards in the Parallel Environment (PE) name still have problems in 6.0u3.  I have set up a cluster of 32 dual-processor Nocona machines running Linux.  There are 4 queues of 16 processors each, and a corresponding PE for each queue.  The queues are named as follows:

mymachine.q.0
mymachine.q.1
mymachine.q.2
mymachine.q.3

And the PEs are:

mymachine.0.mpi
mymachine.1.mpi
mymachine.2.mpi
mymachine.3.mpi

All of the PEs have 16 slots.  When I submit a job with the following line:

#$ -pe *.mpi 8

the job is assigned to a seemingly random PE, but then placed in a queue that does not correspond to that PE.  I can submit up to 6 jobs this way; each one is assigned to the same PE and placed in a queue that does not correspond to it.  As a result, 48 processors (6 jobs x 8 slots) are in use under a PE that has only 16 slots.  For example, I might get:

Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
Job 7        qw
Job 8        qw

What I should get instead is:

Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
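
To make the slot accounting concrete, here is a minimal Python sketch that tallies slots per PE and flags jobs whose queue does not match their PE.  It is only an illustration, not anything Grid Engine itself provides: the function names (expected_queue, audit) are made up, and the rule that PE mymachine.N.mpi belongs to queue mymachine.q.N is assumed from the naming scheme described in this post.

```python
from collections import defaultdict

PE_SLOTS = 16  # every PE is configured with 16 slots (from the post)

# (job, PE, queue, slots) for the running jobs in the observed placement
observed = [
    (1, "mymachine.3.mpi", "mymachine.q.0", 8),
    (2, "mymachine.3.mpi", "mymachine.q.0", 8),
    (3, "mymachine.3.mpi", "mymachine.q.1", 8),
    (4, "mymachine.3.mpi", "mymachine.q.1", 8),
    (5, "mymachine.3.mpi", "mymachine.q.2", 8),
    (6, "mymachine.3.mpi", "mymachine.q.2", 8),
]

# The intended placement: two 8-slot jobs per PE, each in the matching queue
expected = [
    (job, f"mymachine.{n}.mpi", f"mymachine.q.{n}", 8)
    for job, n in zip(range(1, 9), [0, 0, 1, 1, 2, 2, 3, 3])
]


def expected_queue(pe):
    """Assumed mapping: PE 'mymachine.N.mpi' -> queue 'mymachine.q.N'."""
    host, index, _ = pe.split(".")
    return f"{host}.q.{index}"


def audit(rows, pe_slots=PE_SLOTS):
    """Return (oversubscribed PEs with slots used, jobs in the wrong queue)."""
    used = defaultdict(int)
    wrong_queue = []
    for job, pe, queue, slots in rows:
        used[pe] += slots
        if queue != expected_queue(pe):
            wrong_queue.append(job)
    over = {pe: n for pe, n in used.items() if n > pe_slots}
    return over, wrong_queue


print(audit(observed))  # ({'mymachine.3.mpi': 48}, [1, 2, 3, 4, 5, 6])
print(audit(expected))  # ({}, [])
```

The first result shows the problem in one line: a single PE carries 48 slots against a limit of 16, and every running job sits in a queue other than the one paired with its PE.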

If I then try to submit a job directly (with no wildcard) to the PE that all of the jobs were assigned to, it will not run, because the slot limit for that PE has already been far exceeded.

I should note that when I do not use wildcards, everything behaves as it should.  E.g., a job submitted to mymachine.2.mpi will be assigned to mymachine.2.mpi and mymachine.q.2, and I cannot use more than 16 slots in mymachine.2.mpi at once.

I searched the list, and although there seem to have been other problems with wildcards in the past, I have seen nothing that references this behavior.  Does anyone have an explanation / workaround?

Tim


