[GE users] sge6.2u3 - scheduler dying intermittently

rpatterson patterso at mail.nih.gov
Tue Aug 11 12:59:51 BST 2009


Yes - we have $SGE_ROOT on a netapp. I've been worried about this all
along, but up until just recently, it has not been a problem. I think
I'm going to start digging into netapp stats today to see if there are
any clues there.
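
Alongside the filer-side stats, a small client-side probe run on the qmaster host can show whether $SGE_ROOT ever becomes unreachable. The following is only a rough sketch under stated assumptions: the names probe and monitor, the 10 second interval and the 5 second timeout are all made up for illustration and are not part of Grid Engine. It stats the directory from a helper thread so that a hung NFS mount shows up as a timeout instead of blocking the script.

```python
import os
import threading
import time


def probe(path, timeout=5.0):
    """Stat `path` once; return (ok, seconds, error_text).

    The stat runs in a daemon thread because os.stat() can block for a
    long time on a hard-mounted NFS directory whose server is unreachable.
    """
    result = {}

    def worker():
        start = time.monotonic()
        try:
            os.stat(path)
            result["error"] = None
        except OSError as exc:
            result["error"] = str(exc)
        result["elapsed"] = time.monotonic() - start

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        # The stat has not returned yet: treat it as an outage.
        return (False, timeout, "timed out")
    return (result["error"] is None, result["elapsed"], result["error"])


def monitor(path, interval=10.0, timeout=5.0, rounds=None):
    """Probe `path` every `interval` seconds and print only the failures.

    `rounds=None` means run until interrupted (hypothetical defaults).
    """
    done = 0
    while rounds is None or done < rounds:
        ok, elapsed, error = probe(path, timeout)
        if not ok:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {path} unavailable after {elapsed:.1f}s: {error}")
        done += 1
        time.sleep(interval)
```

Calling monitor(os.environ["SGE_ROOT"]) and redirecting the output to a file on local disk (not on the NFS mount) gives timestamps that can be compared with the times at which qsub/qstat stopped connecting.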

Thanks Harvey!




-----Original Message-----
From: bomb20 [mailto:Harvey.Richardson at zeenty.com] 
Sent: Tuesday, August 11, 2009 4:27 AM
To: users at gridengine.sunsource.net
Subject: Re: [GE users] sge6.2u3 - scheduler dying intermittently

rpatterson wrote:
> Recently, I have been having trouble with the scheduler thread dying on
> our master. I assume that this is what's happening because the
> sge_qmaster process is still running, and running jobs continue on
> without a problem, but client requests (qsub/qstat) can no longer make a
> connection, and no new jobs are dispatched. Recently, this has been
> happening about once a week.

Do you have $SGE_ROOT on NFS by any chance, and is that reliable?
The reason I ask is that I have seen similar things when the directory
goes away for a short time due to server or network issues.

Harvey

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211794

To unsubscribe from this discussion, e-mail:
[users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=211816