[GE users] sge_schedd process getting shutdown automatically

manju a manju.kudu at gmail.com
Thu Sep 18 07:46:58 BST 2008


I didn't find anything in /var/log/messages, but I can tell you one thing:
the master server was rebooted twice due to a power problem a month ago.
There is a gap since then, though, and we are monitoring it. I will restart
SGE as you suggested; let's see what happens.
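
In case it helps with tracking when these shutdowns occur, here is a minimal
Python sketch that picks the "controlled shutdown" entries out of the schedd
messages file. It assumes every entry follows the pipe-separated layout of the
line quoted further down in this thread (timestamp|component|host|level|message)
and that the timestamp is month/day/year. The function name `find_shutdowns` is
only an illustrative choice, not anything shipped with SGE.

```python
from datetime import datetime


def find_shutdowns(lines):
    """Return (timestamp, host, message) for each schedd 'controlled shutdown' entry.

    Assumes lines look like: MM/DD/YYYY HH:MM:SS|component|host|level|message
    Lines that do not match this layout are skipped.
    """
    hits = []
    for line in lines:
        parts = line.rstrip("\n").split("|", 4)
        if len(parts) != 5:
            continue  # not a pipe-separated entry in the assumed layout
        stamp, component, host, _level, message = parts
        if component != "schedd" or "controlled shutdown" not in message:
            continue
        try:
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        except ValueError:
            continue  # timestamp is not in the assumed month/day/year form
        hits.append((when, host, message))
    return hits


# Demo on the single entry quoted in this thread.
sample = ["09/04/2008 19:07:39|schedd|masterserver|I|controlled shutdown 6.1u2\n"]
for when, host, message in find_shutdowns(sample):
    print(when.isoformat(sep=" "), host, message)
```

Passing an open file object for the schedd messages file instead of `sample`
would list every such entry, and the timestamps can then be compared against
the reboots of the master server.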

On Tue, Sep 16, 2008 at 4:06 PM, Reuti <reuti at staff.uni-marburg.de> wrote:

> On 16.09.2008 at 07:12, manju a wrote:
>
>> I don't think so; it has already happened twice in two weeks! It's very
>> strange. I can see these messages in the following path:
>> /$SGE_ROOT/farm1/spool/qmaster/schedd/message...
>>
>
> ...and then you just restarted the scheduler process or the complete SGE?
> Shutting down the qmaster/scheduler on the master machine will not affect
> running jobs. Maybe a complete shutdown and restart removes any strange
> configuration in the setup when you perform it.
>
> Does the master machine have plenty of memory (you can check with `free` and
> `vmstat`)? Is there anything in the operating system's /var/log/messages
> file, e.g. the OOM killer at work? Although I'm not sure that this would
> produce a graceful shutdown with this proper message (I would rather expect
> a SIGKILL and no further output from the scheduler).
>
> -- Reuti
>
>
>
>> thanks
>> manjunath A
>>
>>
>> On Mon, Sep 15, 2008 at 8:50 PM, Reuti <reuti at staff.uni-marburg.de>
>> wrote:
>> Hi,
>>
>> On 14.09.2008 at 10:51, manju a wrote:
>>
>>
>> We are seeing a strange problem in our SGE 6.1u2 farm: the sge_schedd
>> process gets killed automatically after some days, and we have no idea
>> what's going on.
>>
>> The sge_qmaster process keeps running; only the sge_schedd process is
>> being killed.
>>
>> Here is what the messages file says:
>>
>> 09/04/2008 19:07:39|schedd|masterserver|I|controlled shutdown 6.1u2
>>
>> This looks like a proper shutdown, i.e. not a crash. So: where is it coming
>> from? Did someone issue "qconf -ks" by accident?
>>
>> -- Reuti
>>
>>
>> But we never stopped any process, and we are not sure why it is getting
>> shut down. Please help us with this!
>>
>> thanks
>> manjunath A
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>>
>>
>
>
>


