[GE users] Wildcards in PE still broken in 6.0u3
Reuti
reuti at staff.uni-marburg.de
Fri Apr 29 16:14:11 BST 2005
Hi Tim,
what does
qstat -r
show as the granted PEs?
-- Reuti
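A minimal sketch of how the `qstat -r` output could be reduced to one line per job (job ID, master queue, granted PE), so that a PE/queue mismatch stands out. It assumes the extended output labels these fields `Master queue:` and `Granted PE:` and that each job's header line starts with its numeric job ID; the function name `granted_pe_report` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `qstat -r` output on stdin and prints
# "<job-id> <master queue> <granted PE>" for each job that has a granted PE.
# Assumption: job header lines start with a numeric job ID, and the detail
# lines are labelled "Master queue:" and "Granted PE:".
granted_pe_report() {
  awk '
    $1 ~ /^[0-9]+$/               { job = $1; queue = ""; next }  # new job header
    tolower($0) ~ /master queue:/ { queue = $3; next }            # queue instance
    /Granted PE:/                 { print job, queue, $3 }        # PE name only
  '
}

# Usage: qstat -r | granted_pe_report
```

A job whose granted PE index differs from its queue index (for example `mymachine.3.mpi` in `mymachine.q.0`) is then easy to spot.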
Quoting Tim Mueller <tim_mueller at hotmail.com>:
> Hi,
>
> That's the problem. The setup is actually
>
> mymachine.q.0 references mymachine.0.mpi
> mymachine.q.1 references mymachine.1.mpi
> mymachine.q.2 references mymachine.2.mpi
> mymachine.q.3 references mymachine.3.mpi
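The intended one-to-one pairing can be written down as a small lookup, which makes it easy to check any (PE, queue) pair the scheduler reports. This is a sketch assuming the `mymachine.N.mpi` / `mymachine.q.N` naming used in this thread; `expected_queue_for_pe` and `pe_matches_queue` are hypothetical helper names.

```shell
# Hypothetical helpers encoding the naming convention from this thread:
# PE mymachine.N.mpi is meant to be used only by cluster queue mymachine.q.N.
expected_queue_for_pe() {
  local pe=$1 idx
  idx=${pe#mymachine.}   # strip leading "mymachine."  -> "N.mpi"
  idx=${idx%.mpi}        # strip trailing ".mpi"       -> "N"
  printf 'mymachine.q.%s\n' "$idx"
}

# Succeeds if the queue (optionally a queue instance such as
# mymachine.q.3@node05) belongs to the cluster queue paired with the PE.
pe_matches_queue() {
  local pe=$1 qinstance=$2
  [ "${qinstance%%@*}" = "$(expected_queue_for_pe "$pe")" ]
}
```

For example, `pe_matches_queue mymachine.3.mpi mymachine.q.1` fails, which is exactly the combination Tim says should never occur.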
>
> There is no reason, as far as I can tell, that a job could ever be in both
> mymachine.3.mpi and mymachine.q.1. And oddly enough, when I use wildcards,
> the scheduler won't put a job assigned to mymachine.3.mpi into
> mymachine.q.3 until all of the other queues are full. At that point, it's
> too late because mymachine.3.mpi is using 48 slots, when it's only allowed
> to use up to 16.
>
> When I don't use wildcards, I get the behavior I expect: a job submitted to
> mymachine.3.mpi gets put in mymachine.q.3, etc.
>
> Tim
>
> ----- Original Message -----
> From: "Stephan Grell - Sun Germany - SSG - Software Engineer"
> <stephan.grell at sun.com>
> To: <users at gridengine.sunsource.net>
> Sent: Friday, April 29, 2005 2:34 AM
> Subject: Re: [GE users] Wildcards in PE still broken in 6.0u3
>
>
> > Hi Tim,
> >
> > I am not quite sure I understand your setup. Could you please attach your
> > cqueue configuration? From
> > the results you posted, it reads as if:
> > queue
> > mymachine.q.0 references mymachine.3.mpi
> > mymachine.q.1 references mymachine.3.mpi
> >
> > and so on.
> >
> > Cheers,
> > Stephan
> >
> > Tim Mueller wrote:
> >
> >> Hi,
> >> It appears that wildcards in the Parallel Environment name still have
> >> problems in 6.0u3. I have set up a Linux cluster of 32 dual-processor
> >> Nocona nodes. There are 4 queues of 16 processors each, and a
> >> corresponding PE for each queue. The queues are named as follows:
> >> mymachine.q.0
> >> mymachine.q.1
> >> mymachine.q.2
> >> mymachine.q.3
> >> And the PEs are:
> >> mymachine.0.mpi
> >> mymachine.1.mpi
> >> mymachine.2.mpi
> >> mymachine.3.mpi
> >> All of the PEs have 16 slots. When I submit a job with the following
> >> line:
> >> #$ -pe *.mpi 8
> >> the job will be assigned to a seemingly random PE, but then placed in a
> >> queue that does not correspond to that PE. I can submit up to 6 jobs
> >> this way, each of which will get assigned to the same PE and placed in
> >> any queue that does not correspond to the PE. This causes 48 processors
> >> to be used for a PE with only 16 slots. E.g., I might get:
> >> Job 1 mymachine.3.mpi mymachine.q.0 8 processors
> >> Job 2 mymachine.3.mpi mymachine.q.0 8 processors
> >> Job 3 mymachine.3.mpi mymachine.q.1 8 processors
> >> Job 4 mymachine.3.mpi mymachine.q.1 8 processors
> >> Job 5 mymachine.3.mpi mymachine.q.2 8 processors
> >> Job 6 mymachine.3.mpi mymachine.q.2 8 processors
> >> Job 7 qw
> >> Job 8 qw
> >> When I should get:
> >> Job 1 mymachine.0.mpi mymachine.q.0 8 processors
> >> Job 2 mymachine.0.mpi mymachine.q.0 8 processors
> >> Job 3 mymachine.1.mpi mymachine.q.1 8 processors
> >> Job 4 mymachine.1.mpi mymachine.q.1 8 processors
> >> Job 5 mymachine.2.mpi mymachine.q.2 8 processors
> >> Job 6 mymachine.2.mpi mymachine.q.2 8 processors
> >> Job 7 mymachine.3.mpi mymachine.q.3 8 processors
> >> Job 8 mymachine.3.mpi mymachine.q.3 8 processors
> >> If I then try to submit a job directly (with no wildcard) to the PE that
> >> all of the jobs were assigned to, it will not run because I have already
> >> far exceeded the slot limit for this PE.
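The arithmetic behind the complaint: six 8-slot jobs all granted `mymachine.3.mpi` add up to 48 slots against a 16-slot PE. A sketch that totals slots per PE from lines in the `Job N <pe> <queue> <slots> processors` format used in the listings above and flags any PE over its limit; the 16-slot default comes from the thread, and `pe_slot_usage` is a hypothetical name.

```shell
# Hypothetical check: reads lines like
#   Job 1 mymachine.3.mpi mymachine.q.0 8 processors
# and prints "<pe> <used>/<limit>" per PE, appending " OVER LIMIT" when a
# PE's total exceeds the limit. Pending jobs ("Job 7 qw") have too few
# fields and are skipped.
pe_slot_usage() {
  local limit=${1:-16}   # every PE in this thread has 16 slots
  awk -v limit="$limit" '
    BEGIN { lim = limit + 0 }                     # force numeric comparison
    $1 == "Job" && NF >= 5 { used[$3] += $5 }     # $3 = PE, $5 = slots
    END {
      for (pe in used) {
        flag = (used[pe] > lim) ? " OVER LIMIT" : ""
        printf "%s %d/%d%s\n", pe, used[pe], lim, flag
      }
    }
  ' | sort
}

# Usage: pe_slot_usage 16 < job_list.txt
```

Fed the first listing, this reports `mymachine.3.mpi 48/16 OVER LIMIT`; fed the expected listing, every PE comes out at `16/16`.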
> >> I should note that when I do not use wildcards, everything behaves as it
> >> should. E.g., a job submitted to mymachine.2.mpi will be assigned to
> >> mymachine.2.mpi and mymachine.q.2, and I cannot use more than 16 slots in
> >> mymachine.2.mpi at once.
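Why the wildcard case differs: `*.mpi` selects all four PE names as candidates, while an explicit name selects exactly one. A rough sketch that uses bash glob matching as a stand-in for Grid Engine's own PE-name matching (an assumption; the two may differ in detail). `matching_pes` is a hypothetical name, and in practice the PE list could come from `qconf -spl`.

```shell
# Hypothetical helper: prints which of the given PE names a -pe request
# pattern would select, approximating the scheduler's match with bash globs.
# Usage: matching_pes PATTERN PE_NAME...
matching_pes() {
  local pattern=$1 pe
  shift
  for pe in "$@"; do
    # Unquoted $pattern on the right of == is treated as a glob pattern.
    if [[ $pe == $pattern ]]; then
      printf '%s\n' "$pe"
    fi
  done
}

# Usage: matching_pes '*.mpi' $(qconf -spl)
```

With the wildcard, each of the four candidates should still be confined to its own queue; according to Tim's report, that per-PE restriction is what gets lost when the wildcard is used.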
> >> I searched the list, and although there seem to have been other problems
> >> with wildcards in the past, I have seen nothing that references this
> >> behavior. Does anyone have an explanation / workaround?
> >> Tim
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
>
>