[GE users] unavailable nodes and loadleveling

Jiann-Ming Su sujiannming at gmail.com
Thu Apr 28 19:31:26 BST 2005


I've been thrown into the SGE fire.  I'm responsible for maintaining
an already running SGE cluster.  One of the problems I'm seeing is
that jobs are not being distributed to all nodes.  After a bit of
searching, it seems that not all of my nodes are available, even
though they are physically up.  When I run "qstat -j", I get the
following for each node that doesn't seem to be available:

  queue instance "all.q@node16.mydomain.bogus" dropped because it is temporarily not available

How do I verify a node's participation in the grid, and where are
the config files located?  Qmon seems to be the preferred config tool,
but I'm more comfortable editing text files.  Thanks for any tips.
-- 
Jiann-Ming Su
"I have to decide between two equally frightening options. 
 If I wanted to do that, I'd vote." --Duckman
