[GE users] Wildcards in PE still broken in 6.0u3

Tim Mueller tim_mueller at hotmail.com
Fri Apr 29 15:51:34 BST 2005



Hi,

That's the problem.  The setup is actually

mymachine.q.0 references mymachine.0.mpi
mymachine.q.1 references mymachine.1.mpi
mymachine.q.2 references mymachine.2.mpi
mymachine.q.3 references mymachine.3.mpi

There is no reason, as far as I can tell, that a job could ever be in both 
mymachine.3.mpi and mymachine.q.1.  And oddly enough, when I use wildcards, 
the scheduler won't put a job assigned to mymachine.3.mpi into 
mymachine.q.3 until all of the other queues are full.  At that point, it's 
too late because mymachine.3.mpi is using 48 slots, when it's only allowed 
to use up to 16.

When I don't use wildcards, I get the behavior I expect:  A job submitted to 
mymachine.3.mpi gets put in mymachine.q.3, etc.
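The pairing convention above (mymachine.N.mpi belongs with mymachine.q.N) can be captured in a small wrapper so jobs are always submitted without a wildcard. This is only a sketch of the workaround; the `pe_to_queue` helper is hypothetical, though `qsub -pe` and `-q` are standard Grid Engine options:

```shell
#!/bin/sh
# Map a PE name of the form mymachine.N.mpi to its paired queue
# mymachine.q.N.  The naming convention is the one from this thread;
# the helper itself is illustrative, not part of Grid Engine.
pe_to_queue() {
    pe=$1
    n=${pe#mymachine.}   # strip leading "mymachine."
    n=${n%.mpi}          # strip trailing ".mpi", leaving N
    echo "mymachine.q.$n"
}

# Submit explicitly to the matching PE/queue pair instead of "*.mpi",
# e.g. (job.sh is a placeholder job script):
#   qsub -pe mymachine.3.mpi 8 -q "$(pe_to_queue mymachine.3.mpi)" job.sh
pe_to_queue mymachine.3.mpi
```

Submitting this way sidesteps the wildcard matching entirely, at the cost of choosing the PE yourself rather than letting the scheduler pick one.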

Tim

----- Original Message ----- 
From: "Stephan Grell - Sun Germany - SSG - Software Engineer" 
<stephan.grell at sun.com>
To: <users at gridengine.sunsource.net>
Sent: Friday, April 29, 2005 2:34 AM
Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3


> Hi Tim,
>
> I am not quite sure I understand your setup. Could you please attach your 
> cqueue configuration? From
> the results you posted, it reads as if:
> queue
> mymachine.q.0  references mymachine.3.mpi
> mymachine.q.1  references mymachine.3.mpi
>
> and so on.
>
> Cheers,
> Stephan
>
> Tim Mueller wrote:
>
>> Hi,
>>  It appears that wildcards in the Parallel Environment name still have 
>> problems in 6.0u3.  I have set up a Linux cluster of 32 dual-processor 
>> Nocona nodes.  There are 4 queues of 16 processors each, and a 
>> corresponding pe for each queue.  The queues are named as follows:
>>  mymachine.q.0
>> mymachine.q.1
>> mymachine.q.2
>> mymachine.q.3
>>  And the PEs are
>>  mymachine.0.mpi
>> mymachine.1.mpi
>> mymachine.2.mpi
>> mymachine.3.mpi
>>  All of the PEs have 16 slots.  When I submit a job with the following 
>> line:
>>  #$ -pe *.mpi 8
>>  the job will be assigned to a seemingly random PE, but then placed in a 
>> queue that does not correspond to that PE.  I can submit up to 6 jobs 
>> this way, each of which will get assigned to the same PE and placed in 
>> any queue that does not correspond to the PE.  This causes 48 processors 
>> to be used for a PE with only 16 slots.  E.g., I might get:
>>  Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
>> Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
>> Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
>> Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
>> Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
>> Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
>> Job 7        qw
>> Job 8        qw
>>  When I should get:
>>  Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
>> Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
>> Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
>> Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
>> Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
>> Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
>> Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
>> Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
>>  If I try to then submit a job directly (with no wildcard) to the PE that 
>> all of the jobs were assigned to, it will not run because I have already 
>> far exceeded the slots limit for this PE.
>>  I should note that when I do not use wildcards, everything behaves as it 
>> should.  E.g., a job submitted to mymachine.2.mpi will be assigned to 
>> mymachine.2.mpi and mymachine.q.2, and I cannot use more than 16 slots in 
>> mymachine.2.mpi at once.
>>  I searched the list, and although there seem to have been other problems 
>> with wildcards in the past, I have seen nothing that references this 
>> behavior.  Does anyone have an explanation / workaround?
>>  Tim
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> 
