[GE users] qlogin put node in Error state

reuti reuti at staff.uni-marburg.de
Tue Oct 12 09:13:30 BST 2010


On 11.10.2010 at 20:04, gg3796 wrote:

> Thanks, Reuti.
>  
> I am using builtin:
>  
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
>  
> The local configuration of the hosts doesn't contain anything related, only the following three lines:
> mailer                       /bin/mail
> xterm                        /usr/bin/xterm
> execd_spool_dir              /var/sge/6.2u3/california/spool/

Fine.
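With many exec hosts, the same check can be scripted. Below is a minimal Python sketch, assuming the plain "name value" layout printed by `qconf -sconf` (backslash-continued values are not joined). It flags any of the six interactive entries quoted above that is missing or not set to builtin. The names `parse_sconf`, `non_builtin`, `INTERACTIVE_KEYS` and the script name `check_builtin.py` are hypothetical.

```python
"""Hypothetical helper: flag qlogin/rlogin/rsh entries from `qconf -sconf`
output that are missing or not set to builtin."""
from __future__ import annotations

import sys

# The six interactive-job entries quoted in this thread.
INTERACTIVE_KEYS = (
    "qlogin_command", "qlogin_daemon",
    "rlogin_command", "rlogin_daemon",
    "rsh_command", "rsh_daemon",
)


def parse_sconf(text: str) -> dict[str, str]:
    """Split each 'name   value' line at the first run of whitespace.

    Simplification: backslash-continued values are not joined, and
    single-token lines such as '#global:' are skipped."""
    conf: dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            conf[parts[0]] = parts[1].strip()
    return conf


def non_builtin(conf: dict[str, str]) -> dict[str, str]:
    """Return the interactive entries whose value is not 'builtin'."""
    return {key: conf.get(key, "<unset>")
            for key in INTERACTIVE_KEYS
            if conf.get(key) != "builtin"}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_builtin.py <file with qconf -sconf output>
    with open(sys.argv[1], encoding="utf-8") as fh:
        problems = non_builtin(parse_sconf(fh.read()))
    print(problems or "all interactive entries are builtin")
```

For example, save the output of `qconf -sconf` to a file and pass it to the script; repeat with `qconf -sconf <hostname>` for a host's local configuration, since entries there take precedence over the global ones for that host.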


> I can ssh to the hosts without any problem. It was all working well until I upgraded all submit and execution hosts to RHEL 5.4. One thing I would like to mention is that the SGE master (qmaster) host is still running RHEL 4.x. Do you think that may be the problem?

Do you see any additional hint in the messages file of the qmaster and/or the involved nodes?
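To pull the error entries out of a long messages file, here is a minimal Python sketch. It assumes the pipe-separated layout "date time|thread|host|level|message" seen in the execd log line quoted further down, and keeps only entries whose level is E. On an exec host the file presumably lives under the execd spool directory (here likely /var/sge/6.2u3/california/spool/<hostname>/messages); that location is an assumption, so check your installation. The names `parse_line`, `errors` and `shepherd_exit` are hypothetical.

```python
"""Hypothetical helper: list error (E) entries from a Grid Engine messages
file, assuming the pipe-separated layout
    date time|thread|host|level|message
seen in the execd log line quoted in this thread."""
from __future__ import annotations

import re
import sys
from typing import Iterable, NamedTuple


class LogEntry(NamedTuple):
    timestamp: str
    thread: str
    host: str
    level: str
    message: str


# Matches e.g. "shepherd of job 4456333.1 exited with exit status = 11"
SHEPHERD_EXIT = re.compile(r"shepherd of job (\S+) exited with exit status = (\d+)")


def parse_line(line: str) -> LogEntry | None:
    """Split a line into its five fields; return None if it doesn't fit."""
    parts = line.rstrip("\n").split("|", 4)  # message may itself contain '|'
    if len(parts) != 5:
        return None
    return LogEntry(*(p.strip() for p in parts))


def errors(lines: Iterable[str]) -> list[LogEntry]:
    """Keep only the entries logged with level 'E'."""
    out = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level == "E":
            out.append(entry)
    return out


def shepherd_exit(entry: LogEntry) -> tuple[str, int] | None:
    """Extract (job id, exit status) from a shepherd exit message."""
    m = SHEPHERD_EXIT.search(entry.message)
    return (m.group(1), int(m.group(2))) if m else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 sge_errors.py <path to messages file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for e in errors(fh):
            print(e.timestamp, e.host, e.message)
```

Running it on both the qmaster's and the exec host's messages files around the time of the failed qlogin narrows the output to the entries worth reading.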

-- Reuti


> Regards,
> Babar
>  
> 
> From: reuti <reuti at staff.uni-marburg.de>
> To: users at gridengine.sunsource.net
> Sent: Mon, October 11, 2010 2:19:50 AM
> Subject: Re: [GE users] qlogin put node in Error state
> 
> Hi,
> 
> On 09.10.2010 at 05:02, gg3796 wrote:
> 
> > I am running 6.2u3. Since we upgraded our desktops and servers to RHEL 5.4, qlogin puts the exec host into the E (error) state.
> 
> What is your startup method for `qlogin` (`qconf -sconf` and/or the local configuration of each exechost)? I would assume that "telnetd" or "telnet" wasn't installed and you are not using `builtin`. NB: "telnetd" can stay disabled in /etc/xinetd.d/telnetd, as SGE will start its own instance of `telnetd`.
> 
> -- Reuti
> 
> 
> 
> > The only message I see in the exec host's spool messages file is:
> > 10/08/2010 19:49:35|  main|cluster-1|E|shepherd of job 4456333.1 exited with exit status = 11
> >  
> >  
> > The job status email has following lines in it:
> >  
> >  
> > Job 4456333 caused action: Queue "pd.q@cluster-1.xyz.com" set to ERROR
> > 
> > User = babar
> > 
> > Queue = pd.q@c8-1.xyz.com
> > 
> > Start Time = <unknown>
> > 
> > End Time = <unknown>
> > 
> > failed before job:10/08/2010 19:49:34 [511:4487]: startup of qrsh job failed:
> > 
> >  
> >  
> >  
> >  
> > Thanks,
> > 
> > Babar
> > 
> >  
> >  
> > 
> >
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286475
> 
> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
> 
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286556



