[GE users] subnodes disappeared

reuti reuti at staff.uni-marburg.de
Thu Nov 19 10:03:17 GMT 2009


Hi,

On 19.11.2009 at 04:52, jsadino wrote:

> I have been using Grid Engine 6.0u8 here at work about a year, but  
> haven't dug in real deep to it yet.  I have 16 processors spread  
> across 5 subnodes with 5 queues spread across the subnodes for jobs  
> (see picture).  One of my subnodes, compute-0-4.local was stuck in  
> an "au" state.  When I click on "Clear Error", it says that it is  
> already in a non-error state.  By going to Queue Control -> Queue  
> Instances -> Load and comparing this screen for the 0-4 node to  
> this screen for another node that was working, I noticed that 0-4  
> was missing 15 entries (from load_avg to np_load_long) and that  
> most of them were under the Fixed Attributes column (see picture).   
> So then I went to one of the 0-4 queue instances and clicked  
> Customize+ ->

When there is a "+" after Customize, it means that a filter is active  
(it only filters the displayed list). In the window where you define  
the resources, press "clear" and then "save". Everything should reappear.  
The queue instances were never gone from SGE.


> Resource Filter, doubleclicked on load_avg, and changed it to 1.   
> Then, all of my 0-4 subnodes disappeared from my queue instances!   
> Then I did something else to something else (sorry, forget  
> exactly), and then my 0-2, 0-3, and 0-7 queues disappeared too!   
> But when I go to Cluster Queues, it still says that I have 16 total  
> with 14 available (see picture.  Currently running three jobs), so  
> they are still there somewhere, I just can't see them.  Can anybody  
> help me get my queues back, and then clear the "au" error states?

State "au" is not an error that you can clear (clearing applies only to  
the queue state "E"). With "au", SGE can no longer contact the nodes:

- is the execd running on the nodes?
- any firewall preventing communication to the nodes?
- are the nodes reachable over the network with ssh or the like?
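
The checks above can be partly scripted. The following is only a minimal sketch in Python, not something SGE ships: the helper names port_reachable and check_nodes are made up for illustration, and the port is an assumption. 6445 is the IANA-registered sge_execd port, but your site may use a different one (look at the sge_execd entry in /etc/services or at SGE_EXECD_PORT in your settings file).

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_nodes(nodes, port, timeout=3.0):
    """Map each node name to True/False depending on whether the port answers."""
    return {node: port_reachable(node, port, timeout) for node in nodes}


# Example use from the qmaster host (node name and port are assumptions):
#   for node, ok in check_nodes(["compute-0-4.local"], 6445).items():
#       print(node, "reachable" if ok else "NOT reachable")
```

A successful connection only tells you that something is listening on that port and that no firewall blocks it; it does not prove that the listener is a healthy execd. A failure can mean the daemon is down, the port is filtered, or the host name does not resolve, so you still need to log in to the node to find out which one it is.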

-- Reuti

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=227880
