[GE users] node(s) temporarily unavailable

McCalla, Mac macmccalla at hess.com
Tue Mar 14 22:03:38 GMT 2006


Hi Bill,

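You might check the cluster configuration parameters to make sure that
load_report_time hasn't been set higher than max_unheard for some reason.
If the execds report load less often than max_unheard, qmaster can mark
otherwise healthy hosts as unknown between reports.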
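
As a rough way to compare the two settings, here is a minimal sketch that
parses the text printed by "qconf -sconf" and checks that load_report_time
is below max_unheard. The helper names (to_seconds, check_intervals) and
the sample values are illustrative assumptions, not taken from Bill's
cluster, and the sketch assumes both values use the HH:MM:SS form.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS time value to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def check_intervals(conf_text):
    """Return (load_report_time, max_unheard, ok) from name/value config lines.

    ok is True when load_report_time is strictly lower than max_unheard.
    """
    values = {}
    for line in conf_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in ("load_report_time", "max_unheard"):
            values[parts[0]] = to_seconds(parts[1])
    load, unheard = values["load_report_time"], values["max_unheard"]
    return load, unheard, load < unheard


# Illustrative sample only; paste the real output of "qconf -sconf" here.
sample = """\
load_report_time             00:00:40
max_unheard                  00:05:00
"""

print(check_intervals(sample))
```

If the last element comes back False, the load report interval is not
below the unheard timeout, and that would be the first thing to correct
(for example with "qconf -mconf").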

Mac McCalla 

-----Original Message-----
From: Bill Knebel [mailto:billk at metrumrg.com] 
Sent: Tuesday, March 14, 2006 3:41 PM
To: users at gridengine.sunsource.net
Subject: [GE users] node(s) temporarily unavailable

I get the following error in the qmaster "messages" file upon submitting
jobs when the cluster has been idle for a period of time.

qmaster|headnode|E|got max. unheard timeout for target "execd" on host 
"node15", can't delivering job "25434"

The same message is repeated for all nodes.  Eventually, the jobs move 
from the queue onto the nodes, but it does take some time. A "qstat -f" 
shortly after the jobs are submitted results in many nodes being listed 
with a load average of NA and a state of "au". Eventually, all of the 
nodes come back and are available without any restart of SGE.

Any suggestions as to why this problem is occurring?

Bill

-- 
Bill Knebel, PharmD, Ph.D.
Principal Scientist
Metrum Research Group
2 Tunxis Road
Suite 112
Tariffville, CT 06081
email: billk at metrumrg.com
tel: (860) 930-1370

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net

