[GE users] subnodes disappeared

jsadino jsadino.queens at gmail.com
Thu Nov 19 03:52:47 GMT 2009

Hello Everyone,
I have been using Grid Engine 6.0u8 here at work for about a year, but haven't dug into it very deeply yet.  I have 16 processors spread across 5 subnodes, with 5 queues spread across the subnodes for jobs (see picture).  One of my subnodes, compute-0-4.local, was stuck in an "au" state.  When I click on "Clear Error", it says that it is already in a non-error state.

By going to Queue Control -> Queue Instances -> Load and comparing this screen for the 0-4 node with the same screen for another node that was working, I noticed that 0-4 was missing 15 entries (from load_avg to np_load_long) and that most of them were under the Fixed Attributes column (see picture).

So then I went to one of the 0-4 queue instances and clicked Customize+ -> Resource Filter, double-clicked on load_avg, and changed it to 1.  Then all of my 0-4 subnodes disappeared from my queue instances!  Then I did something else to something else (sorry, I forget exactly what), and then my 0-2, 0-3, and 0-7 queues disappeared too!  But when I go to Cluster Queues, it still says that I have 16 total with 14 available (see picture; I am currently running three jobs), so they are still there somewhere, I just can't see them.  Can anybody help me get my queues back, and then clear the "au" error states?

I checked gridengine/default/spool/qmaster/messages and didn't see anything obviously related to the 0-4 node.
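For reference, the two letters in "au" can be read separately. Below is a minimal sketch, assuming the queue-instance state letters as I understand them from the qstat(1) man page; the names STATE_LETTERS, decode_state and is_error_state are made up for illustration and are not part of Grid Engine. It may explain why "Clear Error" reports a non-error state: neither letter is the error state "E".

```python
# Queue-instance state letters, as listed in the qstat(1) man page.
# The wording of each meaning is a paraphrase, not the official text.
STATE_LETTERS = {
    "a": "load alarm (a load threshold is exceeded)",
    "A": "suspend alarm (a suspend threshold is exceeded)",
    "u": "unknown (sge_execd on that host cannot be contacted)",
    "s": "suspended",
    "S": "suspended by subordination",
    "C": "suspended by calendar",
    "d": "disabled",
    "D": "disabled by calendar",
    "E": "error",
    "c": "configuration ambiguous",
    "o": "orphaned",
}


def decode_state(state):
    """Return one human-readable meaning per letter of a state string."""
    return [
        STATE_LETTERS.get(letter, "unrecognized letter " + repr(letter))
        for letter in state
    ]


def is_error_state(state):
    """Only the letter 'E' marks a queue instance as being in error."""
    return "E" in state


if __name__ == "__main__":
    for meaning in decode_state("au"):
        print(meaning)
    print("error state:", is_error_state("au"))
```

If that reading is right, "au" would point at the execution daemon on compute-0-4.local not reporting in, which would also be consistent with the missing load values, but that is a guess on my part.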

Sorry for the randomness.  Any help would be much appreciated before my boss finds out :)

Jeff Sadino



    [ Attachments (images, not reproduced here): 1.png, 2.png, 3.png, 4.png ]
