[GE users] Fair share scheduling problem

Jean-Paul Minet minet at cism.ucl.ac.be
Tue Jan 3 09:56:20 GMT 2006


Stephan,

I am using 6.0u6 on Solaris 10 (Sun Fire V440).  The worker nodes are
dual-Opteron machines (SUSE 9.0).  The cluster has just been handed over by Sun,
who did the default SGE install.  I am now trying to set up a simple fair-share
policy (all users having equal shares over a period of time).

Note that host sorting for scheduling is based on "slots" (also defined as 
consumable resources for exec hosts) to fill up hosts as much as possible with 
sequential jobs (so as to leave empty nodes for MPI/OpenMP jobs).
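
To illustrate the fill-up intent described above, here is a minimal Python
sketch. It is not SGE's scheduler code; it only assumes that preferring the host
with the fewest free slots is what packs sequential jobs onto already-busy
nodes. The host names and both helper functions are hypothetical.

```python
def pick_host(free_slots):
    """Return the host with the fewest free slots that can still take one job.

    free_slots: dict mapping host name -> number of free slots (made-up data).
    Returns None when every host is full.
    """
    candidates = [(slots, host) for host, slots in free_slots.items() if slots > 0]
    if not candidates:
        return None
    # Smallest free-slot count first, so partially used hosts fill up
    # before an empty host is touched (ties are broken by host name).
    return min(candidates)[1]


def place_sequential_jobs(free_slots, n_jobs):
    """Place n_jobs single-slot jobs and return the list of chosen hosts."""
    free = dict(free_slots)  # work on a copy, leave the caller's dict alone
    placement = []
    for _ in range(n_jobs):
        host = pick_host(free)
        if host is None:
            break  # no free slot left anywhere
        free[host] -= 1
        placement.append(host)
    return placement
```

With free slots of 2, 1 and 2 on three hypothetical nodes, three sequential jobs
would land on the partially used node first and then fill one further node,
leaving the third one empty for a parallel job.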

We don't use projects, just a flat "default" user tree, so there are no regular
updates or modifications to users or projects.

See my previous mail (with sched config, share tree config and qstat) for more info.

What other information would be helpful?
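
One further data point that might be useful is how often the version-mismatch
message quoted below appears for each user. A minimal Python sketch, assuming
each message sits on a single line of the qmaster messages file in the same
format as the quoted example; the helper name `count_version_mismatches` is
hypothetical.

```python
import re
from collections import Counter

# Matches the error text quoted from the qmaster messages file.
# Group 1: version carried by the order, group 2: current version,
# group 3: user/project name.
PATTERN = re.compile(
    r'orders user/project version \((\d+)\) is not\s+uptodate \((\d+)\) '
    r'for user/project "([^"]+)"'
)


def count_version_mismatches(lines):
    """Count version-mismatch messages per user/project name."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts
```

The function takes any iterable of lines, so an open file object for the
messages file can be passed in directly.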

Thanks,

Jean-Paul

Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> Hi Paul,
> 
> Which version are you using? We had some bugs in that area. And to answer
> your question: yes, they are related. The question is why the user/project
> updates are failing.
> 
> Do you reconfigure your users/projects on a regular basis?
> 
> Could you give us some insight into your configuration?
> 
> Kind Regards,
> Stephan
> 
> Jean-Paul Minet wrote On 01/02/06 17:38,:
> 
> 
>>Hello all,
>>
>>I have set up SGE to use fair share scheduling, with a fair share tree composed 
>>of a "default" leaf and a "test" node with two users.  Several users (falling 
>>under the test node or the default leaf) have running jobs.  For all of them, 
>>"Actual Resource Share", "Targeted Resource Share" and "Combined Usage" remain 
>>at 0.  Also, "qstat -ext" shows 0 as stckt for all jobs.
>>
>>Note that I am getting messages like
>>
>>01/02/2006 12:03:58|qmaster|lmsp|E|orders user/project version (63119) is not uptodate (63120) for user/project "bricteux"
>>
>>in the qmaster message file.  Would this be linked to the problem?  How could I 
>>move forward?
>>
>>Any help will be appreciated.
>>
>>Rgds
>>
>>Jean-Paul
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>>

-- 
Jean-Paul Minet
Gestionnaire CISM - Institut de Calcul Intensif et de Stockage de Masse
Université Catholique de Louvain
Tel: (32) (0)10.47.35.67 - Fax: (32) (0)10.47.34.52
