[GE users] SGE6 does not backfill

Reuti reuti at staff.uni-marburg.de
Sun Apr 10 18:27:38 BST 2005



Quoting Juha Jäykkä <juhaj at iki.fi>:

> > Do the jobs still running have h_rt set, and the newly submitted serial
> > ones as well?
> 
> The ones in the queue do, but I did not check all of the new serial jobs.
> What I did was submit one with -l h_rt=00:01:00 just for testing. It does
> not get backfilled.
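For reference, the test request -l h_rt=00:01:00 asks for a hard run-time limit of one minute. A minimal Python sketch of that conversion, assuming only the hh:mm:ss form and a plain count of seconds (h_rt_to_seconds is a hypothetical helper, not part of Grid Engine):

```python
def h_rt_to_seconds(value: str) -> int:
    """Convert an h_rt value such as "00:01:00" (hh:mm:ss) or "60" to seconds.

    Simplified assumption: only the colon-separated hh:mm:ss form and a
    bare integer number of seconds are handled.
    """
    parts = value.split(":")
    if len(parts) == 1:
        return int(parts[0])  # plain seconds
    if len(parts) != 3:
        raise ValueError(f"unsupported h_rt format: {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


print(h_rt_to_seconds("00:01:00"))  # 60
```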
> 
> > I wonder why all of your queues are dropped. Does qstat -f show that all
> > are empty besides the three running ones?
> 
> They are empty. There is one "quirk" in our setup: all.q@topaasi.local has
> slots=0, but I cannot see how that could affect it.
> 
> Should a queue be dropped only when something is actually using its CPUs,
> or should the resource reservation cause it to be dropped, too?

For me, a queue is only dropped if something is running in all of its
slots, not because of a reservation. - Reuti
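The rule above can be written as a one-line test. A minimal Python model of it, not Grid Engine's scheduler code; the QueueInstance type, is_full, and the slot count of 2 for the compute node are illustrative assumptions. Under this rule an instance with slots=0 counts as full even when idle, which fits all.q@topaasi.local being dropped, but not the empty compute nodes.

```python
from dataclasses import dataclass


@dataclass
class QueueInstance:
    name: str
    slots: int       # configured slot count for this queue instance
    used_slots: int  # slots occupied by running jobs (reservations not counted)


def is_full(qi: QueueInstance) -> bool:
    # Simplified rule from the discussion: a queue instance is full when
    # running jobs occupy all of its slots; reservations are ignored.
    return qi.used_slots >= qi.slots


queues = [
    QueueInstance("all.q@compute-0-0.local", slots=2, used_slots=0),  # hypothetical slot count
    QueueInstance("all.q@topaasi.local", slots=0, used_slots=0),
]
for qi in queues:
    print(qi.name, "full" if is_full(qi) else "has free slots")
# all.q@compute-0-0.local has free slots
# all.q@topaasi.local full
```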

> As you can see, the top job in the queue wants all the resources of the
> cluster and thus reserves all the queues. But that is not the cause: I
> changed the priorities of the jobs and put a smaller job on top, with the
> same behaviour.
> 
> Am I missing some configuration parameter? An urgency value for h_rt,
> perhaps?
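The expectation here can be checked against the usual conservative backfilling rule: a lower-priority job may start on idle slots if it fits in what is free now and its h_rt guarantees it ends before the top job's reservation begins. A minimal Python sketch of that textbook rule, not Grid Engine's scheduler code; reservation_start and can_backfill are hypothetical helpers, and the slot counts and remaining run times of the three running jobs are made up, since the thread gives only the total of 24 CPUs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunningJob:
    slots: int
    end_time: int  # seconds from now until its h_rt expires


def reservation_start(total_slots: int, running: list[RunningJob], needed: int) -> int:
    """Earliest time (seconds from now) at which `needed` slots are free,
    assuming every running job lasts exactly until its h_rt limit."""
    free = total_slots - sum(j.slots for j in running)
    if free >= needed:
        return 0
    for job in sorted(running, key=lambda j: j.end_time):
        free += job.slots  # slots released when this job hits its h_rt
        if free >= needed:
            return job.end_time
    raise ValueError("request exceeds the cluster size")


def can_backfill(total_slots: int, running: list[RunningJob], needed_by_top: int,
                 cand_slots: int, cand_runtime: int) -> bool:
    """Conservative test: the candidate must fit in the slots free now and
    must finish (per its h_rt) before the top job's reservation starts."""
    free_now = total_slots - sum(j.slots for j in running)
    start = reservation_start(total_slots, running, needed_by_top)
    return cand_slots <= free_now and cand_runtime <= start


# Hypothetical snapshot: three running jobs on a 24-slot cluster,
# top job wants all 24 slots, candidate is a 1-slot job with h_rt of 60 s.
running = [
    RunningJob(slots=4, end_time=5 * 3600),
    RunningJob(slots=4, end_time=8 * 3600),
    RunningJob(slots=4, end_time=10 * 3600),
]
print(reservation_start(24, running, 24))           # 36000
print(can_backfill(24, running, 24, 1, 60))         # True
print(can_backfill(24, running, 24, 1, 11 * 3600))  # False
```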
> 
> > > Sun Apr 10 18:34:53 2005|-------------START-SCHEDULER-RUN-------------
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@topaasi.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queues dropped because they are full: all.q@topaasi.local
> > > Sun Apr 10 18:34:53 2005|Job 182 cannot run because available slots combined under PE "lam" are not in range of job
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-6.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-10.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-8.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-3.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-9.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-2.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-0.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-4.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-5.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-7.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-11.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-1.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queue instance "all.q@topaasi.local" dropped because it is full
> > > Sun Apr 10 18:34:53 2005|queues dropped because they are full: all.q@compute-0-6.local all.q@compute-0-10.local all.q@compute-0-8.local all.q@compute-0-3.local all.q@compute-0-9.local all.q@compute-0-2.local all.q@compute-0-0.local all.q@compute-0-4.local all.q@compute-0-5.local all.q@compute-0-7.local
> > > Sun Apr 10 18:34:53 2005|queues dropped because they are full: all.q@compute-0-11.local all.q@compute-0-1.local all.q@topaasi.local
> > > Sun Apr 10 18:34:53 2005|--------------STOP-SCHEDULER-RUN-------------
> > > 
> > > BTW, the queues have changed a little since my last mail, but the
> > > situation at the moment is that three jobs are running and job 182
> > > requests 24 (=all) CPUs. The running jobs still have many hours left
> > > before their h_rt runs out.
> > > 
> > > In fact, I have not seen SGE6 backfill anything yet; I have only seen
> > > the highest-priority job being dispatched. What worries me is that in
> > > the snippet from /sgeroot/cell/common/scheduler I sent in the first
> > > mail, reservations are only made for the first job and the rest of the
> > > jobs are apparently not even considered!
> > > 
> > > -- 
> > > 		 -----------------------------------------------
> > > 		| Juha Jäykkä, juolja at utu.fi			|
> > > 		| home: http://www.utu.fi/~juolja/		|
> > > 		 -----------------------------------------------
> > > 
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > > For additional commands, e-mail: users-help at gridengine.sunsource.net

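To see at a glance which queue instances a scheduler run drops, and how often, the run-log lines quoted above can be tallied. A minimal Python sketch, assuming each entry sits on a single `date|message` line (the email quoting wrapped some of them) and that queue names use the `queue@host` form, which the mailing-list archive renders as `queue at host`; `dropped_queue_instances` is a hypothetical helper and the sample is an excerpt of the quoted log.

```python
import re
from collections import Counter

# Matches run-log lines of the form quoted in the thread:
#   <date>|queue instance "<queue>@<host>" dropped because it is full
DROPPED = re.compile(r'\|queue instance "([^"]+)" dropped because it is full')


def dropped_queue_instances(log_text: str) -> Counter:
    """Count how often each queue instance is reported as dropped for being full."""
    return Counter(DROPPED.findall(log_text))


sample = """\
Sun Apr 10 18:34:53 2005|-------------START-SCHEDULER-RUN-------------
Sun Apr 10 18:34:53 2005|queue instance "all.q@topaasi.local" dropped because it is full
Sun Apr 10 18:34:53 2005|Job 182 cannot run because available slots combined under PE "lam" are not in range of job
Sun Apr 10 18:34:53 2005|queue instance "all.q@compute-0-6.local" dropped because it is full
Sun Apr 10 18:34:53 2005|queue instance "all.q@topaasi.local" dropped because it is full
Sun Apr 10 18:34:53 2005|--------------STOP-SCHEDULER-RUN-------------
"""

for name, count in sorted(dropped_queue_instances(sample).items()):
    print(name, count)
# all.q@compute-0-6.local 1
# all.q@topaasi.local 2
```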

