[GE users] GridEngine fails to start/run

Rayson Ho rayrayson at gmail.com
Mon Mar 13 19:18:09 GMT 2006


Is the qmaster process actually running on the master node? When you
restarted SGE, did the old one actually exit?

Any "interesting" messages in the qmaster log file?
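
A minimal sketch of those checks, in case it helps. The defaults for
SGE_ROOT and SGE_CELL are guessed from the sgemaster path quoted below,
and the qmaster spool directory is assumed to be the usual
$SGE_ROOT/$SGE_CELL/spool/qmaster; adjust both if your install differs.
The helper names daemon_pids and report_daemon are made up for this
example.

```shell
#!/bin/sh
# Assumed defaults, inferred from /opt/sge/default/common/sgemaster in this thread.
SGE_ROOT="${SGE_ROOT:-/opt/sge}"
SGE_CELL="${SGE_CELL:-default}"
# Assumption: qmaster spool directory is at its usual location.
QMASTER_SPOOL="${QMASTER_SPOOL:-$SGE_ROOT/$SGE_CELL/spool/qmaster}"

# Print the PIDs of processes whose command name (path stripped) is exactly $1.
daemon_pids() {
    ps -A -o pid= -o comm= 2>/dev/null |
        awk -v name="$1" '{ n = split($2, p, "/"); if (p[n] == name) print $1 }'
}

# Report whether a daemon is running, listing every matching PID.
# More than one PID for sge_qmaster suggests an old instance never exited.
report_daemon() {
    pids=$(daemon_pids "$1")
    if [ -n "$pids" ]; then
        echo "$1 running, pid(s):" $pids
    else
        echo "$1 not running"
    fi
}

report_daemon sge_qmaster
report_daemon sge_schedd

# Show the most recent qmaster log entries, if the file is where we assumed.
if [ -f "$QMASTER_SPOOL/messages" ]; then
    tail -n 20 "$QMASTER_SPOOL/messages"
else
    echo "no messages file at $QMASTER_SPOOL/messages"
fi
```

Run it on the master node before and after a restart; a PID that stays
the same across the restart would mean the old qmaster never went away.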

Rayson



On 3/13/06, Brady Catherman <bradyc at uidaho.edu> wrote:
> What is odd though is that there are 1226 jobs queued on Phlegathon
> (our 32 node Mac cluster) and it is having all sorts of problems.
> qstat fails, errors on startup and such. At the same time there are
> 10K jobs on our Linux cluster and it is running like a champ =)
>
> This is classic spooling with 6.0 u7.
>
>
> # qstat
> failed receiving gdi request
>
> #
>
>
> On Mar 13, 2006, at 10:56 AM, McCalla, Mac wrote:
>
> >  Hi Brady,
> >
> >       Are you sure the processes of sge_qmaster and sge_schedd have
> > actually failed?
> > If our system is loaded up (lots of jobs), I see these messages when
> > (re)starting qmaster/schedd,
> > but eventually (may take several minutes), the scheduler will register
> > with qmaster
> > and things are fine.
> >
> > BTW, is this a classic spooling or BDB install? And what version of
> > SGE?
> >
> > Mac McCalla
> > Geoscience Systems Consultant
> > Amerada Hess Corporation
> > 500 Dallas St. , Houston, Texas  77002
> > Office: 713 609-5434
> >
> >
> > -----Original Message-----
> > From: Brady Catherman [mailto:bradyc at uidaho.edu]
> > Sent: Monday, March 13, 2006 12:41 PM
> > To: users at gridengine.sunsource.net
> > Subject: [GE users] GridEngine fails to start/run
> >
> > On our Mac OS 10.4 system Grid Engine just started failing to
> > start up. This same exact build was working fine up until today.
> > Everything has started getting gdi failures. I have no clue what
> > would have caused Grid Engine to just start failing all of a sudden.
> > There are no errors in the qmaster/messages file so I have no clue
> > where to start troubleshooting.
> >
> > # /opt/sge/default/common/sgemaster start
> >     starting sge_qmaster
> >     starting sge_schedd
> > daemonize error: timeout while waiting for daemonize state
> > error: getting configuration: failed receiving gdi request
> >
> > # qstat
> > failed receiving gdi request
> >
> >
> > This is Grid Engine 6.0u7 on Mac OS 10.4.5
> >
> > Anybody have any ideas where to start with this one?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
>
>



