[GE users] PE Slots Problem

Ivan Adzhubey iadzhubey at rics.bwh.harvard.edu
Tue Aug 14 17:42:24 BST 2007



Hi Brian,

Thanks for the reply. Even though my setup is different, the connection between 
requested resources and queue-instance configuration is a good starting 
point for debugging. The fact that you need to add any consumable resources 
manually to each queue instance is confusing, and I only discovered this 
recently. Perhaps I should give it another try with PE queues. Thanks 
for offering help; I'll be sure to come back as soon as I get my hands on it.

--Ivan

On Sunday 12 August 2007 03:14:13 am Brian R. Smith wrote:
> Ivan,
>
> I had created a boolean complex, t_devel, and set its default value to
> FALSE (qconf -mc), naively believing that this assignment would take effect
> for each queue instance.  So, for example, if all.q had no explicit
> definition for t_devel, it would automatically be assumed to be FALSE.
> This, as it turns out, was not the case.  The queue devel.q had
> specified t_devel=TRUE so that development jobs would be sent to a set
> of over-subscribed (8 slots per processor) nodes for low time limit
> execution.  Essentially, any job that requested t_devel would be
> considered a non-production run and would be sent to the appropriate
> hardware (based on any other resource requests that were made).
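For reference, a boolean complex like the one described would look roughly like this as a row in the `qconf -mc` table (a sketch based on the SGE 6.x complex format; the shortcut name and urgency value here are made-up examples):

```text
#name     shortcut  type  relop  requestable  consumable  default  urgency
t_devel   td        BOOL  ==     YES          NO          FALSE    0
```

Note that the `default` column only applies to consumables; as Brian found, it does not make queues that leave `t_devel` out of their complex lists behave as if it were FALSE.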
>
> For normal jobs, t_devel is not considered or specified in the job's
> submit script but it was added to the file 'sge_request' as
> t_devel=FALSE in order to specify a default value.  Because of this, each
> submitted job (except for development jobs) was requesting
> t_devel=FALSE, but each queue had t_devel as being undefined.  Since
> this didn't match up, no slots were made available to the parallel
> environment and hence my jobs could not run.
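The mismatch described above can be sketched as a toy model (this is an illustration of the matching behavior, not SGE code; the function and dictionaries are invented for the example):

```python
# Toy illustration: a queue instance can only satisfy a job's boolean
# resource request if the queue's complex list actually defines that
# resource with a matching value.
def queue_matches(job_requests, queue_complexes):
    """Return True if every requested resource is defined and equal."""
    return all(name in queue_complexes and queue_complexes[name] == value
               for name, value in job_requests.items())

job = {"t_devel": False}            # injected for every job via sge_request
undefined_queue = {}                # queue with t_devel left undefined
fixed_queue = {"t_devel": False}    # queue after adding t_devel=FALSE

print(queue_matches(job, undefined_queue))  # False: no slots offered to the PE
print(queue_matches(job, fixed_queue))      # True: job can be scheduled
```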
>
> To correct the problem, I simply added t_devel=FALSE to each one of my
> queues' complex lists so that it would correspond to the settings I
> added to sge_request.
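One way to apply that fix from the command line might look like this (a sketch; the queue name is an example, and `qconf -aattr` is the SGE 6.x list-append form):

```shell
# Add t_devel=FALSE to a queue's complex_values list
# (repeat for each queue; "all.q" is just an example name):
qconf -aattr queue complex_values t_devel=FALSE all.q

# Matching default request in $SGE_ROOT/<cell>/common/sge_request:
# -l t_devel=FALSE
```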
>
> I don't know the details of your problem or whether it's related to this.  If
> you want to give me some details on your problem and perhaps some other
> information like queue configs, parallel environment configs, message
> file output, etc., I'd be willing to take a look.
>
> Brian
>
> Ivan Adzhubey wrote:
> > Hi Brian,
> >
> > I reported exactly the same problem with our dual-CPU cluster nodes about
> > a year ago and was never able to get it resolved. I tried all possible
> > configurations and even reinstalled SGE from scratch a couple of times,
> > but nothing worked. Reuti wasn't able to help me either, even though he
> > spent quite some time on this issue. So I'd appreciate it if you could
> > elaborate on what exactly you've done to get it working. We normally do
> > not run many projects requiring MPI, but we may still need it in the
> > future.
> >
> > Thanks,
> > Ivan
> >
> > On Thursday 09 August 2007 06:23:43 pm Brian R. Smith wrote:
> >> Got it.  It was just a problem with a boolean complex value not being
> >> addressed in the queue configuration.  Everything is working fine now.
> >> Thanks for your time.
> >>
> >> -Brian
> >>
> >> Brian R. Smith wrote:
> >>> Andreas & Reuti,
> >>>
> >>> No, there is no load threshold defined for that queue and there are no
> >>> other jobs running on the host.  The load is at 0.00.  Is there any
> >>> other possible information I can provide?
> >>>
> >>> Thanks for your help.
> >>>
> >>> -Brian



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



