[GE users] SGE jobs in "qw" state

Mark_Johnson at URSCorp.com
Mon May 22 19:50:33 BST 2006


I have built a Rocks 4.1 cluster and am trying to resolve a problem with
SGE.

I can submit jobs to the queue, but once submitted they just sit there in
the "qw" state.  I have received good help from the Rocks community, but am
still unable to get the jobs to start.  Below are a few lines from
/opt/gridengine/default/spool/qmaster/messages.  It looks like the qmaster
cannot contact the "execd" on some of the nodes and times out?

Any thoughts or ideas are appreciated.

P.S. Please dumb it down for me, as I have a Windows handicap.

Mark,

05/22/2006 10:39:06|qmaster|medusa|I|execd on compute-0-179.local registered
05/22/2006 10:39:06|qmaster|medusa|I|execd on compute-0-178.local registered
05/22/2006 10:39:07|qmaster|medusa|I|execd on compute-0-180.local registered
05/22/2006 10:40:11|qmaster|medusa|E|got max. unheard timeout for target "execd" on host "compute-0-157.local", can't delivering job "42"
05/22/2006 10:40:11|qmaster|medusa|W|rescheduling job 42.1
05/22/2006 10:40:11|qmaster|medusa|E|failed delivering job 42.1
05/22/2006 10:40:26|qmaster|medusa|E|got max. unheard timeout for target "execd" on host "compute-0-156.local", can't delivering job "42"
05/22/2006 10:40:26|qmaster|medusa|W|rescheduling job 42.1
05/22/2006 10:40:26|qmaster|medusa|E|failed delivering job 42.1
05/22/2006 10:40:32|qmaster|medusa|I|urs1 has deleted job 42
[urs1 at medusa qmaster]$
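
To see at a glance which nodes the qmaster is giving up on, the timeout errors in the log excerpt above could be tallied per host. The following is only a rough sketch: the function name `unreachable_hosts` and the `SAMPLE` text are made up for illustration, and the pattern assumes the error wording is exactly as it appears in the lines quoted above.

```python
import re
from collections import Counter

# Matches the qmaster error quoted in this post. \s+ also matches newlines,
# so the pattern still works when a mail client has wrapped the log line.
TIMEOUT_RE = re.compile(
    r'got max\. unheard timeout for target\s+"execd"\s+on host\s+"([^"]+)"'
)


def unreachable_hosts(log_text):
    """Return a Counter mapping host name -> number of timeout errors."""
    return Counter(TIMEOUT_RE.findall(log_text))


# Illustrative input: two of the error lines from the excerpt above.
SAMPLE = '''\
05/22/2006 10:40:11|qmaster|medusa|E|got max. unheard timeout for target
"execd" on host "compute-0-157.local", can't delivering job "42"
05/22/2006 10:40:26|qmaster|medusa|E|got max. unheard timeout for target
"execd" on host "compute-0-156.local", can't delivering job "42"
'''

for host, count in sorted(unreachable_hosts(SAMPLE).items()):
    print(host, count)
```

To run it against the real log, the contents of the qmaster messages file would be read into a string and passed to `unreachable_hosts` in place of `SAMPLE`.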