[GE users] sge v6.0u3 update from v6.0u1

Reuti reuti at staff.uni-marburg.de
Mon Mar 21 23:04:02 GMT 2005



Hi,

did you install it in a completely new directory, or just over the old
installation? In the latter case there may be some old spooling information
left inside. Alternatively, if you want to keep the configuration files and
they are still intact, you can try to empty only the spooling directories:
jobs, job_scripts, zombies, and on the exec nodes also active_jobs. Save these
directories first (or move them aside instead of deleting them), so that you
don't delete too much by accident. I see this behavior on a node when it has
crashed and SGE then refuses to start up, although everything looks okay.
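
The move-aside step could be sketched roughly like this. It is only a sketch
under assumptions: the function name move_spool_aside and the backup directory
naming are made up for illustration, the spool path in the usage note is the one
from Viktor's shell prompt, and it assumes flat-file spooling with qmaster and
execd stopped before anything is touched.

```shell
#!/bin/sh
# Sketch: move SGE spool subdirectories aside instead of deleting them.
# move_spool_aside is a hypothetical helper, not part of SGE.
# Usage: move_spool_aside SPOOL_DIR SUBDIR...

move_spool_aside() {
    spool_dir=$1
    shift

    # Backup location inside the spool directory (naming is an assumption).
    backup_dir="$spool_dir/spool-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$backup_dir" || return 1

    for d in "$@"; do
        # Skip subdirectories that do not exist on this host.
        if [ -d "$spool_dir/$d" ]; then
            mv "$spool_dir/$d" "$backup_dir/$d" || return 1
            # Recreate an empty directory in its place. Owner and mode come
            # from the invoking user and umask, so check them afterwards.
            mkdir "$spool_dir/$d" || return 1
        fi
    done

    # Print where the old contents went.
    echo "$backup_dir"
}
```

On the master this would be called as
move_spool_aside /opt/SGE/default/spool/qmaster jobs job_scripts zombies
and on an exec node with that node's own spool directory and active_jobs added
to the list. If the daemons run as a dedicated admin user, run it as that user
or fix the ownership of the recreated directories before starting SGE again.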

Cheers - Reuti


Quoting Viktor Oudovenko <udo at physics.rutgers.edu>:

> Linux Suse 7.3 with new kernel:
> 
> rupc-cs04b:/opt/SGE/default/common # uname -a
> Linux rupc-cs04b 2.4.28 #9 SMP Wed Dec 8 14:52:03 EST 2004 i686 unknown
> 
> It is not. I just made a fresh installation of 6.0u4 and it worked, but the
> previous one, which I want to keep because all the hosts and all the settings
> are defined there, does not want to start.
> 
> You know, the key word here is crash. Something was written somewhere so that
> qmaster does not want to start. It is not a problem of busy ports; the
> problem is that the master does not start!
> Any help and ideas are welcome! I am really running out of time.
> Best,
> v
> 
> 
> > -----Original Message-----
> > From: Ovid Jacob [mailto:ovid.jacob at sun.com] 
> > Sent: Monday, March 21, 2005 17:20
> > To: users at gridengine.sunsource.net
> > Cc: Ovid.Jacob at sun.com
> > Subject: Re: [GE users] sge v6.0u3 update from v6.0u1
> > 
> > 
> > Viktor,
> > 
> > What OS are you running?
> > 
> > Check that port 536 is not used by some other process:
> > 
> > grep 536 /etc/services
> > 
> > If you get a non-empty string, try changing the ports to 
> > something like
> > 
> > sge_qmaster 836/tcp #SGE_PORT
> > sge_execd 837/tcp #SGE_PORT
> > 
> > 
> > Viktor Oudovenko wrote:
> > > Hi, guys,
> > > 
> > > Has anybody run into this problem:
> > > 
> > > rupc-cs04b:/opt/SGE/default/spool/qmaster # /etc/init.d/sgemaster start
> > >    starting sge_qmaster
> > >    starting sge_schedd
> > > error: commlib error: got read error (closing connection)
> > > error: commlib error: can't connect to service (socket error errno=111)
> > > error: getting configuration: unable to contact qmaster using port 536 on host "rupc-cs04b"
> > > can't get configuration from qmaster -- waiting ...
> > > error: can't connect to service
> > > can't get configuration from qmaster -- waiting ...
> > > error: can't connect to service
> > > can't get configuration from qmaster -- waiting ...
> > > error: can't connect to service
> > > error: can't get configuration from qmaster -- backgrounding
> > > 
> > > 
> > > After a server crash I could not start SGE 6.0u1; qmaster did not want
> > > to start. I upgraded 6.0u1 to 6.0u3 and got the messages above.
> > > 
> > > 
> > > In qmaster messages I have:
> > > 
> > > 
> > > rupc-cs04b:/opt/SGE/default/spool/qmaster # more messages
> > > 03/21/2005 15:56:47|qmaster|rupc-cs04b|E|wrong cull version, read 0x00000000, but expected actual version 0x10020000
> > > 03/21/2005 15:56:47|qmaster|rupc-cs04b|E|error in init_packbuffer: wrong cull version
> > > rupc-cs04b:/opt/SGE/default/spool/qmaster #
> > > 
> > > 
> > > Any ideas how to fix this? It is VERY urgent! Please help!
> > > Thank you, everybody, for your attention and help!
> > > 
> > > Best,
> > > viktor
> > > 
> > > 
> > > 
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users-help at gridengine.sunsource.net
> > 
> > -- 
> > 
> > 
> > take care,
> > ovid
> > 
> > ----------------------------------------------------------------
> > 	         "Your Windows system is my other computer."
> >                             Grid Engineering
> > 
> > http://namefinder.sfbay.sun.com/NameFinder?view=sunEmployees&nfquery=ovid+jacob
> >                           http://tent.sfbay:88/
> >                           http://www.mishkan.com
> >                           ovid.jacob at sun.com
> >                           x84774 (650.786.4774)
> > -----------------------------------------------------------------
> 



