[GE users] reimaged nodes unused

FL lengyel at gmail.com
Tue Nov 13 15:09:02 GMT 2007


Hi,
I've had to reimage four nodes of an SGE cluster to repair disks.
The nodes are now online, and SGE sees them, but no jobs
are being allocated to them. One of them is in an error (E) state
(the node isn't indicated below, but it's one of the reimaged nodes):


m248 qmaster # qstat -g c
CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
all.q                             0.87      0    101    104      3      0
p3.q                              0.80     25      4     30      0      1
p4.q                              1.63     19      0     19     12      0
x86_64.q                          0.90     19      0     19      0      0

The error state was cleared with qmod, but the reimaged nodes are still
not accepting any jobs.
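
For reference, here is a rough sketch of how the cluster queue carrying the E state can be picked out of a summary like the one above. It assumes the last column of `qstat -g c` is the cdsuE count, as in the header shown; the function name queues_with_errors is made up for illustration, and the sample input is just the table from this message.

```shell
#!/bin/sh
# Print the names of cluster queues whose last column (cdsuE) is nonzero.
# Reads `qstat -g c` style output on stdin and skips the two header lines.
# queues_with_errors is a hypothetical helper name, not part of SGE.
queues_with_errors() {
    awk 'NR > 2 && NF >= 7 && ($NF + 0) > 0 { print $1 }'
}

# Sample input: the summary table from this message.
sample='CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
-------------------------------------------------------------------------------
all.q                             0.87      0    101    104      3      0
p3.q                              0.80     25      4     30      0      1
p4.q                              1.63     19      0     19     12      0
x86_64.q                          0.90     19      0     19      0      0'

printf '%s\n' "$sample" | queues_with_errors

# On the qmaster the same filter would be fed live output:
#   qstat -g c | queues_with_errors
# The reason for an E state can then be listed per queue instance with:
#   qstat -f -explain E
```

With the table above, this reports p3.q, which matches the single entry in the cdsuE column.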

I'm wondering how to get the reimaged nodes to accept jobs
again. The only change since the last re-imaging
is that I'm using a basic fair share policy (this is probably unrelated).

F
