[GE users] How to change "S" and "aS" state on queue instances

Daniel Templeton Dan.Templeton at Sun.COM
Sat Jul 14 00:02:53 BST 2007


Hugo,

The 'S' state means that the queue instance is suspended by 
subordination.  Depending on how you configured the subordinate list, 
when, for example, an instance of your special queue is full on a given 
host, the other three queues will be suspended on that host.  It sounds 
like you may need to rethink your queue configurations.
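
To make the rule concrete, here is a toy model of how subordination 
plays out on a single host.  It is only a sketch, not Grid Engine code: 
the queue names (special.q, short.q, medium.q, long.q), the slot counts 
and the function name are made up for illustration, and it assumes the 
documented behaviour that a subordinate without an explicit threshold 
is suspended once the superordinate queue instance is full.

```python
# Toy model of subordinate suspension on one host (not Grid Engine code).
# Queue names, slot counts and the data layout are hypothetical.

def suspended_queues(subordinate_lists, slots_total, slots_used):
    """Return the set of queues that would be suspended on one host.

    subordinate_lists: {parent: {child: threshold or None}}
        None means "suspend the child when the parent instance is full".
    slots_total, slots_used: {queue: int} for this host.
    """
    suspended = set()
    for parent, children in subordinate_lists.items():
        used = slots_used.get(parent, 0)
        for child, threshold in children.items():
            # Without a threshold, the trigger is "all parent slots in use".
            limit = slots_total[parent] if threshold is None else threshold
            if used >= limit:
                suspended.add(child)
    return suspended


subordinate_lists = {
    "special.q": {"short.q": None, "medium.q": None, "long.q": None},
}
slots_total = {"special.q": 2, "short.q": 1, "medium.q": 1, "long.q": 1}

# One special job on the host: nothing is suspended yet.
print(sorted(suspended_queues(subordinate_lists, slots_total, {"special.q": 1})))
# Both special slots in use: the other three queues go into 'S' on this host.
print(sorted(suspended_queues(subordinate_lists, slots_total, {"special.q": 2})))
```

To compare this against the real setup, the subordinate_list attribute 
printed by "qconf -sq <queue>" shows what each queue actually suspends.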

Daniel

Hugo R. Hernandez-Mora wrote:
> Hello there,
> we have a cluster of ~300 nodes with two slots each, running Solaris 
> 10 u06/6 and SGE 6.06.   We have configured the qmaster with a shadow 
> server to migrate services if it fails.    Also, we have configured 
> four different queues depending on the kind of job to be submitted:
>    special queue: using 100% of the resources of the cluster,
>    short queue: using all the nodes but only one slot per node.   For 
> jobs of less than 2 CPU hours,
>    medium queue: using ~30% of the resources (only one slot per 
> node).  For jobs of less than 12 CPU hours,
>    long queue: using ~10% of the resources (only one slot per node).   
> For jobs of unlimited time.
>
> The queues have a subordination order in terms of resources: the 
> special queue has the other three queues as subordinates, and so on.
>
> For the past two weeks, we have been experiencing a problem with the 
> cluster.  Most of the nodes are going into the "aS" state.  We have 
> checked whether these nodes are too busy in terms of resource usage.  
> They are in good shape but marked with that state, which prevents jobs 
> from running on them.   Last time, the only solution was to restart 
> both qmasters, after which the system returned to a good state.   Now 
> we are experiencing the same situation, with no useful information in 
> the log files or message files about the problem.   We followed the 
> same procedure as last time and restarted the two qmasters, but now the 
> nodes are marked with the "S" state.  We have tried to force the nodes 
> to unsuspend, without any success.    No jobs can run on the marked 
> nodes, even though their resources are completely available.
>
> Can somebody help me with this problem?   I would appreciate it!
> Regards,
> - Hugo
>




