[GE users] SGE6 does not backfill

Juha Jäykkä juhaj at iki.fi
Wed Apr 13 12:57:24 BST 2005


> The reports we got were on 6.0u3 and u4, and it was always Linux
> on amd64 machines.

We have Linux 2.4. Can I change the file descriptor limit? Does SGE
inherit it from the shell that starts it, or something similar? I hope it
does not use some hard-coded, unchangeable value.
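
As general background: on Linux the descriptor limit (RLIMIT_NOFILE) is a
per-process resource limit that children inherit across fork and exec. A
daemon therefore normally starts with the limit of the shell or init script
that launched it, unless it changes the limit itself. Whether the SGE daemons
adjust it is not established in this thread. The following is a minimal Python
sketch of the inheritance only, not SGE-specific; the helper name
child_nofile_limit is hypothetical.

```python
import resource
import subprocess
import sys


def child_nofile_limit(soft_limit=None):
    """Start a child process and return the (soft, hard) RLIMIT_NOFILE it sees.

    If soft_limit is given, the soft limit is changed after fork and before
    exec, which mimics a launching shell that ran `ulimit -n` first.
    """
    def set_limit():
        # Runs in the child only, so the parent's own limit is untouched.
        if soft_limit is not None:
            _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
            resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    result = subprocess.run(
        [sys.executable, "-c",
         "import resource; print(*resource.getrlimit(resource.RLIMIT_NOFILE))"],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    soft, hard = result.stdout.split()
    return int(soft), int(hard)


if __name__ == "__main__":
    print("parent limits:   ", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("child (inherited):", child_nofile_limit())
    print("child (lowered):  ", child_nofile_limit(64))
```

If the inherited limit turns out to matter here, raising it with `ulimit -n`
in the script that starts the daemons would be the obvious first experiment.
That is an assumption, not something confirmed for SGE.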

> >If the job goes to unknown state, could I not use reschedule_unknown to
> >reschedule it after it fails?
> Hm, good idea. We should try it.

Tried this; it did not help.

> qping -dump master_host $SGE_QMASTER_PORT qmaster 1
> 
> This way you will see all communications between the qmaster and the
> clients (scheduler, execd, ...)

Ok, I did this. I attached the qping log, but looking at it I could not
make heads or tails of it. I hope you can. During the qping run I
submitted 10 large (whole cluster) parallel jobs, of which 4 failed.

This is a rather frustrating bug. It does not undermine the whole
package, but makes it quite unreliable.

Is there any other debugging info I can provide? Perhaps running some of
the daemons with debug switches?

-- 
		 -----------------------------------------------
		| Juha Jäykkä, juolja at utu.fi			|
		| home: http://www.utu.fi/~juolja/		|
		 -----------------------------------------------



