[GE users] au state

Wheeler, Dr M.D. mdw10 at leicester.ac.uk
Wed May 11 16:30:12 BST 2005



thx

----------------------------------------------
Dr. Martyn D. Wheeler
Department of Chemistry
University of Leicester
University Road
Leicester, LE1 7RH, UK.
Tel (office): +44 (0)116 252 3985
Tel (lab):    +44 (0)116 252 2115
Fax:          +44 (0)116 252 3789
Email:        martyn.wheeler at le.ac.uk
http://www.le.ac.uk/chemistry/staff/mdw10.html
 

> -----Original Message-----
> From: Chris Dagdigian [mailto:dag at sonsorol.org]
> Sent: 11 May 2005 16:29
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] au state
> 
> 
> 
> 
> When Grid Engine starts up it writes the hostname (as it understands
> it) of the qmaster server to a location:
> 
> $SGE_ROOT/<cell>/common/act_qmaster
> 
> When compute nodes start up they read the act_qmaster file to learn  
> which host they need to connect to.
> 
> There are probably a few issues here; the biggest one is that your
> compute nodes can't figure out how to get to a machine named
> "marvin.local".
> 
> This would be caused by not having DNS configured for this hostname  
> or the compute nodes not having a valid entry for marvin.local in  
> their /etc/hosts file.
> 
> You can fix this in DNS, in /etc/hosts, or by changing the FQDN of
> your qmaster host. If none of these can be changed easily, there is a
> mechanism within SGE called "host_aliases" whereby you can remap
> "marvin.local" to an IP address that you know the nodes can get at.
> 
> Good debug tools can be found in $SGE_ROOT/utilbin/ -- in that
> directory are applications that will let you see exactly what Grid
> Engine thinks about hostname resolution and lookups.
> 
> If I'm totally wrong about this being a hostname/resolution issue  
> here are some other possibilities:
> 
> (1) You have a firewall blocking port 535 (probably not the case if
> you had SGE working previously).
>
> (2) Equally possible is that sge_qmaster is not actually running on
> host marvin.local or had some sort of fatal startup problem. This can
> happen if previous SGE daemons did not exit cleanly.
> 
> -Chris
> 
> On May 11, 2005, at 11:11 AM, Wheeler, Dr M.D. wrote:
> 
> > i just rebooted my machines and now i get the error:
> > unable to contact qmaster via "marvin.local" commd using port 535  
> > (service "sge_commd")
> >
> > help please.....
> >
> > Martyn
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >> Sent: 11 May 2005 16:04
> >> To: users at gridengine.sunsource.net
> >> Subject: Re: [GE users] au state
> >>
> >>
> >> What is:
> >>
> >> qalter -w v 1932
> >>
> >> saying? - Reuti
> >>
> >> Wheeler, Dr M.D. wrote:
> >>
> >>> here's some more output
> >>>
> >>>
> >>> # qconf -sp molpro
> >>> pe_name           molpro
> >>> queue_list        compute-0-0.q compute-0-2.q
> >>> slots             999
> >>> user_lists        NONE
> >>> xuser_lists       NONE
> >>> start_proc_args   /home/software/scripts/startmolpro.sh -catch_rsh $pe_hostfile
> >>> stop_proc_args    /home/software/scripts/stopmolpro.sh
> >>> allocation_rule   $fill_up
> >>> control_slaves    TRUE
> >>> job_is_first_task FALSE
> >>>
> >>>
> >>> # qstat -r
> >>> job-ID  prior name       user         state submit/start at     queue      master  ja-task-ID
> >>> ---------------------------------------------------------------------------------------------
> >>>    1929     0 HCl_H2O_2+ victorm      r     05/11/2005 11:35:38 compute-0- MASTER
> >>>        Full jobname:     HCl_H2O_2+SP
> >>>        Master queue:     compute-0-0.q
> >>>        Requested PE:     molpro 2
> >>>        Granted PE:       molpro 2
> >>>        Hard Resources:   h_rt=500:00:00
> >>>                          virtual_free=2900M
> >>>                          h_fsize=30G
> >>>                          arch=lx24-amd64
> >>>             0 HCl_H2O_2+ victorm      r     05/11/2005 11:35:38 compute-0- SLAVE
> >>>             0 HCl_H2O_2+ victorm      r     05/11/2005 11:35:38 compute-0- SLAVE
> >>>    1931     0 formal     nantakorn    qw    05/11/2005 13:08:15
> >>>        Full jobname:     formal
> >>>        Requested PE:     molpro 2
> >>>        Hard Resources:   h_rt=500:00:00
> >>>                          virtual_free=2900M
> >>>                          h_fsize=30G
> >>>                          arch=lx24-amd64
> >>>    1932     0 p          nantakorn    qw    05/11/2005 13:38:07
> >>>        Full jobname:     p
> >>>        Requested PE:     molpro 2
> >>>        Hard Resources:   h_rt=500:00:00
> >>>                          virtual_free=2900M
> >>>                          h_fsize=30G
> >>>                          arch=lx24-amd64
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> >>>> Sent: 11 May 2005 13:57
> >>>> To: users at gridengine.sunsource.net
> >>>> Subject: Re: [GE users] au state
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> the requested PE also has compute-0-2.q in its list? How many slots
> >>>> did you request?
> >>>>
> >>>> CU - Reuti
> >>>>
> >>>>
> >>>> Wheeler, Dr M.D. wrote:
> >>>>
> >>>>
> >>>>> compute0-0, compute0-1, and compute0-2 are identical, so i figure it
> >>>>> should bypass c0-1 and move onto c0-2
> >>>>
> >>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Robert Griffiths
> >>>>>> [mailto:Robert.Griffiths at mitsubishi-sec-intl.com]
> >>>>>> Sent: 11 May 2005 13:40
> >>>>>> To: 'users at gridengine.sunsource.net'
> >>>>>> Subject: RE: [GE users] au state
> >>>>>>
> >>>>>>
> >>>>>> Well, it looks like the scheduler knows about both compute-0-0.q and
> >>>>>> compute-0-1.q, and you've filled up compute-0-0.q.
> >>>>>>
> >>>>>> You must now look at how job 1931 was submitted - does it request any
> >>>>>> resources which are available only on compute-0-0 or compute-0-1? It could
> >>>>>> be that it's requesting something which *doesn't exist* on compute-0-2 or
> >>>>>> your 32-bit node, hence it can't be scheduled.
> >>>>>>
> >>>>>> Cheers,
> >>>>>>
> >>>>>> Rob
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Wheeler, Dr M.D. [mailto:mdw10 at leicester.ac.uk]
> >>>>>> Sent: 11 May 2005 13:36
> >>>>>> To: users at gridengine.sunsource.net
> >>>>>> Subject: RE: [GE users] au state
> >>>>>>
> >>>>>>
> >>>>>> # qstat -j
> >>>>>> scheduling info:            queue "compute-0-1.q" dropped because it is temporarily not available
> >>>>>>                             queue "compute-0-0.q" dropped because it is full
> >>>>>>
> >>>>>> Jobs cannot run because resources requested are not available for parallel job
> >>>>>>       1931
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 
> >>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> 
> 

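The checks Chris describes above (read act_qmaster, see whether that
name resolves on the node, see whether the commd port answers) can be
sketched in Python. This is only a rough sketch and not part of the
original thread: the cell name "default", the use of a plain TCP
connect, and the helper names read_act_qmaster, resolve, port_open and
diagnose are all assumptions made for illustration. Port 535 is taken
from the error message quoted above.

```python
import socket
from pathlib import Path


def read_act_qmaster(sge_root, cell="default"):
    """Return the hostname stored in <sge_root>/<cell>/common/act_qmaster.

    The cell name "default" is an assumption; use your own cell name.
    """
    path = Path(sge_root) / cell / "common" / "act_qmaster"
    return path.read_text().strip()


def resolve(hostname):
    """Return the IPv4 address the local resolver gives, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def diagnose(sge_root, cell="default", port=535):
    """Run the three checks in order and report what was found."""
    host = read_act_qmaster(sge_root, cell)
    address = resolve(host)
    reachable = address is not None and port_open(address, port)
    return {"qmaster": host, "address": address, "reachable": reachable}
```

Run on a compute node, diagnose("/path/to/your/sge_root") returns a
dict. An "address" of None points at the DNS or /etc/hosts problem; an
address with "reachable" False points at a firewall or at sge_qmaster
not running on that host.
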
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



