[GE users] How to change "S" and "aS" state on queue instances

Hugo R. Hernandez-Mora hugo.hernandez at loni.ucla.edu
Fri Jul 13 22:56:02 BST 2007



Hello there,
we have a cluster of ~300 nodes with two slots each, running Solaris 10
u06/6 and SGE 6.06. We have configured the qmaster with a shadow
server so that services migrate if it fails. We have also configured
four different queues, depending on the kind of job to be submitted:
   
    special queue: uses 100% of the resources of the cluster.
    short queue: uses all the nodes but only one slot per node. For
        jobs of less than 2 CPU hours.
    medium queue: uses ~30% of the resources (only one slot per node).
        For jobs of less than 12 CPU hours.
    long queue: uses ~10% of the resources (only one slot per node).
        For jobs of unlimited time.

The queues are subordinated to one another in terms of resources: the
special queue has the other three queues as subordinates, and so on.

For the past two weeks we have been experiencing a problem with the
cluster. Most of the nodes are going into the "aS" state. We have
checked whether these nodes are too busy in terms of resource usage.
They are in good shape but are flagged with that state, which prevents
jobs from running on them. The only solution last time was to restart
both qmasters, after which the system returned to a good state. Now we
are experiencing the same situation again, with no useful information
in the log files or messages files about the problem. We followed the
same procedure as last time and restarted the two qmasters, but now the
nodes are marked with the "S" state. We have tried to force the nodes
to unsuspend, without any success. No jobs can run on the marked nodes,
even though their resources are completely available.
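
To make the symptom easier to report, here is a minimal sketch of how
the affected queue instances could be listed from saved `qstat -f`
output. It assumes the usual column layout (queuename, qtype, slots,
load_avg, arch, states); the sample text, host names, and the helper
name `suspended_instances` are hypothetical and not taken from our
cluster. Per the qstat(1) man page, "a" in the states column denotes a
load alarm and "S" denotes suspension through subordination.

```python
import re

# One queue-instance row of `qstat -f`, for example:
#   short@node001   BIP   0/1   0.03   sol-sparc64   S
# The slots column may be used/tot or resv/used/tot depending on version.
ROW = re.compile(
    r"^(\S+@\S+)\s+(\S+)\s+((?:\d+/)?\d+/\d+)\s+(\S+)\s+(\S+)(?:\s+(\S+))?\s*$"
)

def suspended_instances(qstat_f_output):
    """Return (queue_instance, states) pairs whose states contain 'S'."""
    hits = []
    for line in qstat_f_output.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, dashed separator, or indented job line
        states = m.group(6) or ""  # the states column is empty when healthy
        if "S" in states:
            hits.append((m.group(1), states))
    return hits

# Hypothetical sample, shaped like `qstat -f` output.
SAMPLE = """\
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
special@node001                BIP   0/2       0.03     sol-sparc64
----------------------------------------------------------------------------
short@node001                  BIP   0/1       0.03     sol-sparc64   S
----------------------------------------------------------------------------
medium@node002                 BIP   0/1       0.01     sol-sparc64   aS
"""

result = suspended_instances(SAMPLE)
for name, states in result:
    print(name, states)
```

On a live system the text would come from the real `qstat -f` output
rather than from `SAMPLE`.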

Can somebody help me with this problem? I would appreciate it!
Regards,
- Hugo

-- 
Hugo R. Hernandez-Mora
System Administrator
Laboratory of Neuro Imaging, UCLA
635 Charles E. Young Drive South, Suite 225
Los Angeles, CA 90095-7332
Tel: 310.267.5076
Fax: 310.206.5518
hugo.hernandez at loni.ucla.edu
--

"If your efforts were met with indifference, do not be discouraged,
for the sun puts on a marvelous show every morning
while most people are still asleep"
