[GE users] SGE6 does not backfill

Juha Jäykkä juhaj at iki.fi
Sun Apr 10 18:00:47 BST 2005


> The jobs still running have a h_rt set, and the new submitted serial
> ones also?

The ones in the queue do, but I did not check all the new serial jobs. As
a test, I submitted one with -l h_rt=00:01:00. It does not backfill.
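
For reference, a value like the one above can be turned into seconds as
follows. This is only a sketch in Python: the helper name h_rt_to_seconds
is made up for illustration, and it assumes the value is either plain
seconds or the three-field hours:minutes:seconds form used above.

```python
def h_rt_to_seconds(value: str) -> int:
    """Convert an h_rt value such as '00:01:00' or '3600' to seconds.

    Hypothetical helper: only plain seconds and the three-field
    hours:minutes:seconds form are handled; anything else is rejected.
    """
    parts = value.split(":")
    if len(parts) == 1:
        return int(parts[0])
    if len(parts) != 3:
        raise ValueError(f"unexpected time format: {value!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    print(h_rt_to_seconds("00:01:00"))  # the one-minute test job
```

So the test job asks for a hard runtime limit of one minute.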

> I wonder, why all of your queues are dropped. qstat -f shows that all
> are empty  besides the three running ones?

They are empty. There is one "quirk" in our setup: all.q at topaasi.local
has slots=0, but I cannot see how that could affect this.
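
To tally which queue instances are reported as dropped in scheduler
messages like the ones quoted further down, something like this could be
used. It is a sketch only: the function name is made up, and it assumes
the mail quoting ("> > ") has been stripped and wrapped lines have been
joined again, so that each message sits on one line.

```python
import re

# Matches the "dropped because it is full" message seen in the quoted log.
DROPPED = re.compile(r'queue instance "([^"]+)" dropped because it is full')


def dropped_queue_instances(lines):
    """Return the distinct queue instances reported as full, in first-seen order."""
    seen = []
    for line in lines:
        match = DROPPED.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Sample lines copied from the quoted log, unwrapped by hand.
sample = [
    'Sun Apr 10 18:34:53 2005|queue instance "all.q at topaasi.local" dropped because it is full',
    'Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-6.local" dropped because it is full',
    'Sun Apr 10 18:34:53 2005|queue instance "all.q at topaasi.local" dropped because it is full',
]

if __name__ == "__main__":
    for name in dropped_queue_instances(sample):
        print(name)
```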

Should a queue be dropped only when a job is actually using its CPUs, or
should a resource reservation cause it to be dropped, too? As you can
see, the top job in the queue wants all the resources of the cluster and
thus reserves all the queues. But that does not seem to matter: I changed
the priorities of the jobs and put a smaller job on top, and the
behaviour was the same.
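
To spell out what I expect backfilling to mean here, this is a toy model
in Python. It is not how the SGE scheduler is implemented, just the
textbook idea: a small job may start now if it fits into the free slots
and does not delay the start of the reserving job. All names and the
slot and runtime numbers are made up for illustration; only the 24 slots
and the three running jobs come from the situation described.

```python
from dataclasses import dataclass


@dataclass
class Running:
    slots: int      # slots the running job occupies
    remaining: int  # seconds until its h_rt expires


def reservation_start(total_slots, running, reserved_slots):
    """Earliest time (seconds from now) at which the reserving job fits,
    assuming every running job runs until its h_rt expires.
    Returns (start_time, free_slots_at_that_time)."""
    free = total_slots - sum(r.slots for r in running)
    start = 0
    for r in sorted(running, key=lambda r: r.remaining):
        if free >= reserved_slots:
            break
        free += r.slots
        start = r.remaining
    return start, free


def can_backfill(total_slots, running, reserved_slots, job_slots, job_h_rt):
    """True if the candidate job fits now and does not delay the reservation."""
    free_now = total_slots - sum(r.slots for r in running)
    if job_slots > free_now:
        return False
    start, free_at_start = reservation_start(total_slots, running, reserved_slots)
    if job_h_rt <= start:
        return True  # finishes before the reserving job could start anyway
    # Otherwise it is still running then and must leave enough slots free.
    return free_at_start - job_slots >= reserved_slots


# Hypothetical numbers: three running jobs of 4 slots each, hours left.
running = [Running(4, 10000), Running(4, 14000), Running(4, 18000)]

if __name__ == "__main__":
    print(can_backfill(24, running, 24, job_slots=1, job_h_rt=60))
    print(can_backfill(24, running, 24, job_slots=1, job_h_rt=86400))
```

In this model the one-minute serial job is allowed to start, while a
serial job asking for a full day is not, which is the behaviour I was
hoping to see.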

Am I missing some config parameter or something? Urgency value for h_rt,
perhaps?

> > Sun Apr 10 18:34:53 2005|-------------START-SCHEDULER-RUN-------------
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at topaasi.local" dropped
> > because it is full
> > Sun Apr 10 18:34:53 2005|queues dropped because they are full:
> > all.q at topaasi.local
> > Sun Apr 10 18:34:53 2005|Job 182 cannot run because available slots
> > combined under PE "lam" are not in range of job
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-6.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-10.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-8.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-3.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-9.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-2.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-0.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-4.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-5.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-7.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-11.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at compute-0-1.local"
> > dropped because it is full
> > Sun Apr 10 18:34:53 2005|queue instance "all.q at topaasi.local" dropped
> > because it is full
> > Sun Apr 10 18:34:53 2005|queues dropped because they are full:
> > all.q at compute-0-6.local all.q at compute-0-10.local
> > all.q at compute-0-8.local all.q at compute-0-3.local
> > all.q at compute-0-9.local all.q at compute-0-2.local
> > all.q at compute-0-0.local all.q at compute-0-4.local
> > all.q at compute-0-5.local all.q at compute-0-7.local
> > Sun Apr 10 18:34:53 2005|queues dropped because they are full:
> > all.q at compute-0-11.local all.q at compute-0-1.local all.q at topaasi.local
> > Sun Apr 10 18:34:53 2005|--------------STOP-SCHEDULER-RUN-------------
> > 
> > BTW, the queues have changed a little since my last mail, but the
> > situation at the moment is that there are three jobs running and job
> > 182 requests 24 (=all) CPUs. The jobs still have many hours until
> > their h_rt runs out.
> > 
> > In fact, I have never seen SGE6 backfill anything yet... I have only
> > seen the highest priority job being dispatched. What worries me here
> > is that in the snippet from /sgeroot/cell/common/scheduler I sent in
> > the first mail, reservations are only done for the first job and the
> > rest of the jobs are apparently not even considered!
> > 
> > -- 
> > 		 -----------------------------------------------
> > 		| Juha Jäykkä, juolja at utu.fi			|
> > 		| home: http://www.utu.fi/~juolja/		|
> > 		 -----------------------------------------------
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net


-- 
		 -----------------------------------------------
		| Juha Jäykkä, juolja at utu.fi			|
		| home: http://www.utu.fi/~juolja/		|
		 -----------------------------------------------
