[GE users] sge6.2 pe bug ?

mpcsr Michael.Phillips at csr.com
Wed Dec 10 16:45:48 GMT 2008


Hi Andreas,

I found the problem. The host resource CSR_SITE was not set on a few
machines, so the scheduler was not considering them when placing the
jobs.
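
One way to spot hosts in this state is to compare the queue instances
that qselect reports for the queue alone with those it reports once the
resource request is added. The sketch below assumes a POSIX shell with
mktemp, sort and comm available. list_diff is a hypothetical helper name,
not part of Grid Engine, and the qselect calls in the comments are the
ones already used in this thread.

```shell
# Sketch only: list_diff is a hypothetical helper, not a Grid Engine command.
# It prints the lines that occur in the first newline-separated list
# but not in the second.
list_diff() {
    all_file=$(mktemp) || return 1
    have_file=$(mktemp) || { rm -f "$all_file"; return 1; }

    # comm needs sorted input; drop empty lines so an empty list stays empty.
    printf '%s\n' "$1" | sed '/^$/d' | sort -u > "$all_file"
    printf '%s\n' "$2" | sed '/^$/d' | sort -u > "$have_file"

    # -23 suppresses lines unique to the second file and lines common to both,
    # leaving only lines unique to the first file.
    comm -23 "$all_file" "$have_file"

    rm -f "$all_file" "$have_file"
}

# Intended use on a cluster (assumes qselect is on PATH; not run here):
#   all=$(qselect -q all.q)
#   have=$(qselect -q all.q -l CSR_SITE=sj)
#   list_diff "$all" "$have"    # queue instances lacking CSR_SITE=sj
```

Any queue instance printed by the last command is in all.q but does not
match the CSR_SITE=sj request, which is where I would look first.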

Thanks to you and Reuti for your help.


Mike

> -----Original Message-----
> From: Andreas.Haas at sun.com [mailto:Andreas.Haas at sun.com]
> Sent: 10 December 2008 15:39
> To: users at gridengine.sunsource.net
> Subject: RE: [GE users] sge6.2 pe bug ?
> 
> Hi Mike,
> 
> - How is CSR_SITE=sj attached to these queue instances: as a static
> queue/host attribute, or as a load value?
> 
> - What does qstat -j <jobid> report?
> 
> - What is the allocation rule of the calibre PE?
> 
> - Have you tried qselect -pe calibre -q all.q -l CSR_SITE=sj ?
> 
> - Are there any resource quotas in use?
> 
> - Any differences per host/hostgroup with the all.q configuration?
> 
> Regards,
> Andreas
> 
> On Wed, 10 Dec 2008, mpcsr wrote:
> 
> > Hi,
> >
> > I work with Colin.
> >
> > $ qselect -q all.q -l CSR_SITE=sj
> > all.q at camunxgrd19.europe.root.pri
> > all.q at camunxgrd10.europe.root.pri
> > all.q at benny.csr.com
> > all.q at brain.csr.com
> > all.q at camunxgrd09.europe.root.pri
> > all.q at camunxgrd21.europe.root.pri
> > all.q at choo-choo.csr.com
> > all.q at clyde.csr.com
> > all.q at dastardly.csr.com
> > all.q at pitstop.csr.com
> > all.q at rockets.csr.com
> > all.q at snoozy.csr.com
> > all.q at softy.csr.com
> > all.q at spook.csr.com
> > all.q at tom.csr.com
> > all.q at yakyak.csr.com
> > all.q at zippy.csr.com
> > all.q at jerry.europe.root.pri
> >
> > Submit without the queue name:
> >
> > $ qsub -pe calibre 8 -l CSR_SITE=sj /home/sgeadm/sge/examples/jobs/simple.sh
> > Your job 66272 ("simple.sh") has been submitted
> > -bash-3.00$ qstat -u mp05
> > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> >  66272 0.60500 simple.sh  mp05         r     12/10/2008 13:42:22 all.q at camunxgrd10.europe.root.     8
> >
> > Submit with the queue name:
> >
> > $ qsub -q all.q -pe calibre 8 -l CSR_SITE=sj /home/sgeadm/sge/examples/jobs/simple.sh
> > Your job 66281 ("simple.sh") has been submitted
> > -bash-3.00$ qstat -u mp05
> > job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
> > -----------------------------------------------------------------------------------------------------------------
> >  66281 0.60500 simple.sh  mp05         qw    12/10/2008 13:44:24                                    8
> > -bash-3.00$
> >
> >
> > Mike Phillips
> >
> >> -----Original Message-----
> >> From: Andreas.Haas at sun.com [mailto:Andreas.Haas at sun.com]
> >> Sent: 10 December 2008 10:13
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] sge6.2 pe bug ?
> >>
> >> On Wed, 10 Dec 2008, Colin Thomas wrote:
> >>
> >>> Hi,
> >>>
> >>> We are running SGE 6.2.
> >>>
> >>> There appears to be a buglet in the parallel code.
> >>>
> >>> qsub -pe calibre 8 -l CSR_SITE=sj /home/sgeadm/sge/examples/jobs/simple.sh           DOES WORK
> >>> qsub -q all.q -pe calibre 8 /home/sgeadm/sge/examples/jobs/simple.sh                 DOES WORK
> >>> qsub -q all.q -pe calibre 8 -l CSR_SITE=sj /home/sgeadm/sge/examples/jobs/simple.sh  DOES NOT WORK - pends
> >>>
> >>> => With a PE, one can specify a queue OR a -l request, but NOT both?
> >>
> >> Both can be specified. Is -l CSR_SITE=sj available with -q all.q?
> >>
> >> Regards,
> >> Andreas
> >>
> >>>
> >>> Has this been reported/observed before, or have I missed something?
> >>>
> >>> I look forward to your combined wisdom.
> >>>
> >>>
> >>>
> >>> Best regards
> >>>
> >>>
> >>>
> >>> Colin Thomas
> >>>
> >>> ------------------------------------------------------
> >>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92039
> >>>
> >>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> >>>
> >>
> >> http://gridengine.info/
> >>
> >> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
> >> Munich District Court: HRB 161028
> >> Managing Directors: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
> >> Chairman of the Supervisory Board: Martin Haering
> >>
> >>
> >>
> >
> >
> 

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=92090

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


