[GE users] Wildcards in PE still broken in 6.0u3

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Fri Apr 29 07:34:45 BST 2005



Hi Tim,

I am not quite sure I understand your setup. Could you please attach
your cqueue configuration? From the results you posted, it reads as if
the queues reference the PEs like this:

mymachine.q.0  references mymachine.3.mpi
mymachine.q.1  references mymachine.3.mpi

and so on.

Cheers,
Stephan

Tim Mueller wrote:

> Hi,
>  
> It appears that wildcards in the Parallel Environment name still have 
> problems in 6.0u3.  I have set up a Linux cluster of 32 dual-processor 
> Noconas.  There are 4 queues of 16 processors each, and a 
> corresponding PE for each queue.  The queues are named as follows:
>  
> mymachine.q.0
> mymachine.q.1
> mymachine.q.2
> mymachine.q.3
>  
> And the PEs are
>  
> mymachine.0.mpi
> mymachine.1.mpi
> mymachine.2.mpi
> mymachine.3.mpi
>  
> All of the PEs have 16 slots.  When I submit a job with the following 
> line:
>  
> #$ -pe *.mpi 8
>  
> the job will be assigned to a seemingly random PE, but then placed in 
> a queue that does not correspond to that PE.  I can submit up to 6 
> jobs this way, each of which will get assigned to the same PE and 
> placed in any queue that does not correspond to the PE.  This causes 
> 48 processors to be used for a PE with only 16 slots.  E.g., I might get:
>  
> Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
> Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
> Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
> Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
> Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
> Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
> Job 7        qw
> Job 8        qw
>  
> When I should get:
>  
> Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
> Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
> Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
> Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
> Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
> Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
> Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
> Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
>  
> If I try to then submit a job directly (with no wildcard) to the PE 
> that all of the jobs were assigned to, it will not run because I have 
> already far exceeded the slots limit for this PE.
>  
> I should note that when I do not use wildcards, everything behaves as 
> it should.  E.g., a job submitted to mymachine.2.mpi will be assigned 
> to mymachine.2.mpi and mymachine.q.2, and I cannot use more than 16 
> slots in mymachine.2.mpi at once.
>  
> I searched the list, and although there seem to have been other 
> problems with wildcards in the past, I have seen nothing that 
> references this behavior.  Does anyone have an explanation / workaround?
>  
> Tim
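
The slot arithmetic in Tim's report can be checked with a small sketch. This is only an illustration in Python and has nothing to do with how Grid Engine itself schedules jobs; the function names are made up here, and the data is copied from the two tables in the post (reading the last two rows of the expected table as Jobs 7 and 8). It assumes the naming scheme from the post, mymachine.<n>.mpi for PEs and mymachine.q.<n> for queues.

```python
from collections import Counter

SLOTS_PER_PE = 16  # from the post: every PE has 16 slots

# Each entry is (job, pe, queue, slots), taken from the tables in the post.
# Observed: jobs 1-6 all land in mymachine.3.mpi but in queues q.0 to q.2.
observed = [
    (job, "mymachine.3.mpi", f"mymachine.q.{(job - 1) // 2}", 8)
    for job in range(1, 7)
]

# Expected: two 8-slot jobs per PE, each in the queue with the same index.
expected = [
    (job, f"mymachine.{(job - 1) // 2}.mpi", f"mymachine.q.{(job - 1) // 2}", 8)
    for job in range(1, 9)
]


def slots_used_per_pe(assignments):
    """Sum the slots granted to each PE."""
    used = Counter()
    for _job, pe, _queue, slots in assignments:
        used[pe] += slots
    return used


def oversubscribed(assignments, limit=SLOTS_PER_PE):
    """Return the PEs whose granted slots exceed the limit."""
    used = slots_used_per_pe(assignments)
    return {pe: n for pe, n in used.items() if n > limit}


def mismatched(assignments):
    """Return the jobs whose queue index differs from their PE index."""
    bad = []
    for job, pe, queue, _slots in assignments:
        pe_index = pe.split(".")[1]        # mymachine.<n>.mpi
        queue_index = queue.split(".")[2]  # mymachine.q.<n>
        if pe_index != queue_index:
            bad.append(job)
    return bad


print(oversubscribed(observed))  # {'mymachine.3.mpi': 48}
print(mismatched(observed))      # [1, 2, 3, 4, 5, 6]
print(oversubscribed(expected))  # {}
print(mismatched(expected))      # []
```

The observed table puts 48 slots into a PE limited to 16 and places every job in a queue with a different index from its PE, while the expected table passes both checks.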


