[GE users] GridEngine fails to start/run

Brady Catherman bradyc at uidaho.edu
Mon Mar 13 20:50:26 GMT 2006


Okay, I think I fixed it. It appears that some of the files in
spool/qmaster/qinstances/ had become corrupted: when I tried to cat
them, I got Input/Output errors. I erased my entire hostgroup
configuration and recreated it, and now it appears to schedule and
run =)
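
For anyone hitting the same thing, here is a minimal sketch of how the
damaged spool files could be located, assuming classic spooling and that
the damage shows up as a failed read (as it did with cat above). The
function name find_unreadable is made up for illustration, and the path
in the example call is an assumption pieced together from the
/opt/sge/default prefix used elsewhere in this thread; adjust it to your
own cell.

```shell
#!/bin/sh
# Print every regular file under the given directory that cannot be read
# end to end. A failing read is how the Input/Output errors showed up
# when the files were passed to cat by hand.
# NOTE: find_unreadable is a hypothetical helper, not part of Grid Engine.
find_unreadable() {
    dir=$1
    find "$dir" -type f -print | while IFS= read -r f; do
        if ! cat "$f" > /dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}

# Example call (the path is an assumption, adjust it to the local install):
# find_unreadable /opt/sge/default/spool/qmaster/qinstances
```

No output means every file could be read. The check only reads files and
changes nothing, so it does not replace recreating the affected
configuration afterwards.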

I am getting a whole new problem though. I remember seeing this and
fixing it when I first set the system up, but I just recompiled and
now it's coming back again. When I run qstat -j I get:
critical error: !!!!!!!!!! lGetUlong(): got NULL element for ULNG !!!!!!!!!!

Does anybody know the quick fix? =)



On Mar 13, 2006, at 11:18 AM, Rayson Ho wrote:

> Is the qmaster process actually running on the master node?? When you
> restart SGE, did the old one actually exit??
>
> Any "interesting" messages in the qmaster log file??
>
> Rayson
>
>
>
> On 3/13/06, Brady Catherman <bradyc at uidaho.edu> wrote:
>> What is odd though is that there are 1226 jobs queued on Phlegathon
>> (our 32 node Mac cluster) and it is having all sorts of problems.
>> qstat fails, errors on startup and such. At the same time there are
>> 10K jobs on our Linux cluster and it is running like a champ =)
>>
>> This is classic spooling with 6.0 u7.
>>
>>
>> # qstat
>> failed receiving gdi request
>>
>> #
>>
>>
>> On Mar 13, 2006, at 10:56 AM, McCalla, Mac wrote:
>>
>>>  Hi Brady,
>>>
>>>       Are you sure the processes of sge_qmaster and sge_schedd have
>>> actually failed?
>>> If our system is loaded up (lots of jobs), I see these messages when
>>> (re)starting qmaster/schedd, but eventually (may take several
>>> minutes), the scheduler will register with qmaster and things are
>>> fine.
>>>
>>> BTW Is this a classic spooling or BDB install?  and what version of
>>> SGE?
>>>
>>> Mac McCalla
>>> Geoscience Systems Consultant
>>> Amerada Hess Corporation
>>> 500 Dallas St. , Houston, Texas  77002
>>> Office: 713 609-5434
>>>
>>>
>>> -----Original Message-----
>>> From: Brady Catherman [mailto:bradyc at uidaho.edu]
>>> Sent: Monday, March 13, 2006 12:41 PM
>>> To: users at gridengine.sunsource.net
>>> Subject: [GE users] GridEngine fails to start/run
>>>
>>> On our Mac OS 10.4 system Grid Engine just started failing to
>>> start up. This same exact build was working fine up until today.
>>> Everything has started getting gdi failures. I have no clue what
>>> would have caused Grid Engine to just start failing all of a sudden.
>>> There are no errors in the qmaster/messages file so I have no clue
>>> where to start troubleshooting.
>>>
>>> # /opt/sge/default/common/sgemaster start
>>>     starting sge_qmaster
>>>     starting sge_schedd
>>> daemonize error: timeout while waiting for daemonize state
>>> error: getting configuration: failed receiving gdi request
>>>
>>> # qstat
>>> failed receiving gdi request
>>>
>>>
>>> This is Grid Engine 6.0u7 on Mac OS 10.4.5
>>>
>>> Anybody have any ideas where to start with this one?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>
>>
>
