[GE users] node(s) temporarily unavailable

Bill Knebel billk at metrumrg.com
Wed Mar 15 13:40:55 GMT 2006


Can you point me in the direction of where to find those parameters?

Bill

McCalla, Mac wrote:

>Hi Bill,
>
>You might check the configuration parameters to make sure that
>load_report_time hasn't been set higher than max_unheard for some reason.
>
>Mac McCalla 
>
>-----Original Message-----
>From: Bill Knebel [mailto:billk at metrumrg.com] 
>Sent: Tuesday, March 14, 2006 3:41 PM
>To: users at gridengine.sunsource.net
>Subject: [GE users] node(s) temporarily unavailable
>
>I get the following error in the qmaster "messages" file upon submitting
>jobs when the cluster has been idle for a period of time.
>
>qmaster|headnode|E|got max. unheard timeout for target "execd" on host 
>"node15", can't delivering job "25434"
>
>The same message is repeated for all nodes.  Eventually, the jobs move 
>from the queue onto the nodes, but it does take some time. A "qstat -f" 
>shortly after the jobs are submitted results in many nodes being listed 
>with a load average of NA and a state of "au". Eventually, all of the 
>nodes come back and are available without any restart of SGE.
>
>Any suggestions as to why this problem is occurring?
>
>Bill
>
>  
>

-- 
Bill Knebel, PharmD, Ph.D.
Principal Scientist
Metrum Research Group
2 Tunxis Road
Suite 112
Tariffville, CT 06081
email: billk at metrumrg.com
tel: (860) 930-1370
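
For anyone following the thread, the check Mac suggests can be sketched roughly as below. This is only a minimal sketch under a few assumptions: that the cluster configuration is available as plain text with one "name value" pair per line and intervals written as HH:MM:SS (the layout that "qconf -sconf" normally prints), and that the helper names and the sample values are hypothetical, made up purely for illustration.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS interval string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def read_intervals(conf_text, names=("load_report_time", "max_unheard")):
    """Pick the named interval parameters out of configuration text.

    Assumes one "name value" pair per line, separated by whitespace.
    """
    found = {}
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in names:
            found[fields[0]] = to_seconds(fields[1])
    return found


def load_report_exceeds_max_unheard(conf_text):
    """Return True if load_report_time is set higher than max_unheard."""
    values = read_intervals(conf_text)
    return values["load_report_time"] > values["max_unheard"]


# Hypothetical sample text; the values are illustrative only and are not
# taken from the cluster discussed in this thread.
SAMPLE = """\
load_report_time             00:00:40
max_unheard                  00:05:00
"""

if __name__ == "__main__":
    print(read_intervals(SAMPLE))
    print(load_report_exceeds_max_unheard(SAMPLE))
```

On a real installation the text would presumably come from running "qconf -sconf" on the master host rather than from the sample string; if the function returns True, the execution daemons report load less often than the qmaster is willing to wait before treating them as unheard.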
