[GE users] Wildcards in PE still broken in 6.0u3

Reuti reuti at staff.uni-marburg.de
Fri Apr 29 16:14:11 BST 2005



Hi Tim,

what does

qstat -r

show as the granted PEs?

- Reuti
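For anyone following along, the long qstat -r listing can be narrowed down to just the PE lines with a small filter. This is a minimal sketch. It assumes that each job's entry starts with a line whose first field is the numeric job ID, and that the granted PE appears on a later line labelled "Granted PE:" followed by the PE name and the slot count. The exact label text may differ between Grid Engine versions, and granted_pes is a hypothetical helper name, not part of Grid Engine.

```shell
# Hypothetical helper (not part of Grid Engine): reads `qstat -r` output on
# stdin and prints "job_id granted_pe slots" for each job with a granted PE.
# Assumption: a job's entry begins with a line whose first field is the
# numeric job ID, and the granted PE is on a line labelled "Granted PE:".
granted_pes() {
    awk '
        # A line whose first field is all digits starts a new job entry.
        $1 ~ /^[0-9]+$/ { job = $1 }
        # "Granted" and "PE:" are fields 1-2, so the PE name is $3, slots $4.
        /Granted PE:/   { print job, $3, $4 }
    '
}
```

Usage: qstat -r | granted_pes. Comparing each job's granted PE with the queue that job is actually running in shows directly whether the two belong together.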


Quoting Tim Mueller <tim_mueller at hotmail.com>:

> Hi,
> 
> That's the problem.  The setup is actually
> 
> mymachine.q.0 references mymachine.0.mpi
> mymachine.q.1 references mymachine.1.mpi
> mymachine.q.2 references mymachine.2.mpi
> mymachine.q.3 references mymachine.3.mpi
> 
> There is no reason, as far as I can tell, that a job could ever be in both
> mymachine.3.mpi and mymachine.q.1.  Oddly enough, when I use wildcards, the
> scheduler won't put a job assigned to mymachine.3.mpi into mymachine.q.3
> until all of the other queues are full.  At that point it's too late,
> because mymachine.3.mpi is already using 48 slots when it's only allowed to
> use up to 16.
> 
> When I don't use wildcards, I get the behavior I expect:  A job submitted to
> mymachine.3.mpi gets put in mymachine.q.3, etc.
> 
> Tim
> 
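The pairing Tim describes (queue mymachine.q.N should only ever run jobs granted PE mymachine.N.mpi) can be checked mechanically against an assignment listing like the ones in his original report. This is a sketch under two assumptions: the rows look like "Job <n> <pe> <queue> <slots> processors", and the naming scheme above holds. check_pe_queue is a hypothetical name.

```shell
# Hypothetical checker: reads rows like
#   Job 3   mymachine.3.mpi   mymachine.q.1   8 processors
# on stdin and prints every row whose PE index differs from its queue index.
# Assumes PEs are named <host>.<N>.mpi and queues <host>.q.<N>.
# Pending rows such as "Job 7  qw" are skipped because $3 is not a PE name.
check_pe_queue() {
    awk '
        $1 == "Job" && NF >= 4 && $3 ~ /\.mpi$/ {
            n = split($3, pe, "[.]")   # mymachine.3.mpi -> mymachine, 3, mpi
            m = split($4, q, "[.]")    # mymachine.q.1   -> mymachine, q, 1
            if (pe[n - 1] != q[m])
                print "mismatch: job " $2 " in " $3 " but queue " $4
        }
    '
}
```

Fed the first listing from the original report, this would flag all six running jobs, since each was granted mymachine.3.mpi but runs in mymachine.q.0, mymachine.q.1 or mymachine.q.2.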
> ----- Original Message ----- 
> From: "Stephan Grell - Sun Germany - SSG - Software Engineer" 
> <stephan.grell at sun.com>
> To: <users at gridengine.sunsource.net>
> Sent: Friday, April 29, 2005 2:34 AM
> Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3
> 
> 
> > Hi Tim,
> >
> > I am not quite sure I understand your setup. Could you please attach your
> > cqueue configuration? From the results you posted, it reads as if:
> > queue
> > mymachine.q.0  references mymachine.3.mpi
> > mymachine.q.1  references mymachine.3.mpi
> >
> > and so on.
> >
> > Cheers,
> > Stephan
> >
> > Tim Mueller wrote:
> >
> >> Hi,
> >>
> >> It appears that wildcards in the Parallel Environment (PE) name still have
> >> problems in 6.0u3.  I have set up a Linux cluster of 32 dual-processor
> >> Nocona machines.  There are 4 queues of 16 processors each, and a
> >> corresponding PE for each queue.  The queues are named as follows:
> >>
> >> mymachine.q.0
> >> mymachine.q.1
> >> mymachine.q.2
> >> mymachine.q.3
> >>
> >> And the PEs are:
> >>
> >> mymachine.0.mpi
> >> mymachine.1.mpi
> >> mymachine.2.mpi
> >> mymachine.3.mpi
> >>
> >> All of the PEs have 16 slots.  When I submit a job with the following
> >> line:
> >>
> >>   #$ -pe *.mpi 8
> >>
> >> the job will be assigned to a seemingly random PE, but then placed in a
> >> queue that does not correspond to that PE.  I can submit up to 6 jobs
> >> this way, each of which will be assigned to the same PE and placed in
> >> any queue that does not correspond to that PE.  This causes 48 processors
> >> to be used by a PE with only 16 slots.  E.g., I might get:
> >>
> >> Job 1        mymachine.3.mpi        mymachine.q.0        8 processors
> >> Job 2        mymachine.3.mpi        mymachine.q.0        8 processors
> >> Job 3        mymachine.3.mpi        mymachine.q.1        8 processors
> >> Job 4        mymachine.3.mpi        mymachine.q.1        8 processors
> >> Job 5        mymachine.3.mpi        mymachine.q.2        8 processors
> >> Job 6        mymachine.3.mpi        mymachine.q.2        8 processors
> >> Job 7        qw
> >> Job 8        qw
> >>
> >> When I should get:
> >>
> >> Job 1        mymachine.0.mpi        mymachine.q.0        8 processors
> >> Job 2        mymachine.0.mpi        mymachine.q.0        8 processors
> >> Job 3        mymachine.1.mpi        mymachine.q.1        8 processors
> >> Job 4        mymachine.1.mpi        mymachine.q.1        8 processors
> >> Job 5        mymachine.2.mpi        mymachine.q.2        8 processors
> >> Job 6        mymachine.2.mpi        mymachine.q.2        8 processors
> >> Job 7        mymachine.3.mpi        mymachine.q.3        8 processors
> >> Job 8        mymachine.3.mpi        mymachine.q.3        8 processors
> >>
> >> If I then try to submit a job directly (with no wildcard) to the PE that
> >> all of the jobs were assigned to, it will not run, because I have already
> >> far exceeded the slot limit for this PE.
> >>
> >> I should note that when I do not use wildcards, everything behaves as it
> >> should.  E.g., a job submitted to mymachine.2.mpi will be assigned to
> >> mymachine.2.mpi and mymachine.q.2, and I cannot use more than 16 slots in
> >> mymachine.2.mpi at once.
> >>
> >> I searched the list, and although there seem to have been other problems
> >> with wildcards in the past, I have seen nothing that references this
> >> behavior.  Does anyone have an explanation / workaround?
> >>
> >> Tim
> >
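The slot overrun in Tim's report can be tallied from an assignment listing as well: sum the slots per PE and compare each total with that PE's slot limit. This is a sketch under the same "Job <n> <pe> <queue> <slots> processors" row-format assumption. pe_slot_usage is a hypothetical name, and the default limit of 16 is taken from the PE setup described in this thread.

```shell
# Hypothetical accounting helper: sums the slots used per PE from rows like
#   Job 1   mymachine.3.mpi   mymachine.q.0   8 processors
# and prints "pe total status", where status is OVER LIMIT if the total
# exceeds the per-PE limit (first argument, default 16).
# Note: awk's "for (pe in used)" visits PEs in no particular order.
pe_slot_usage() {
    awk -v limit="${1:-16}" '
        $1 == "Job" && $3 ~ /\.mpi$/ { used[$3] += $5 }
        END {
            for (pe in used) {
                status = (used[pe] > limit) ? "OVER LIMIT" : "ok"
                print pe, used[pe], status
            }
        }
    '
}
```

For the first listing in the report, this would print "mymachine.3.mpi 48 OVER LIMIT", which matches the 48-of-16 slots Tim describes.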
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> > 
> 