[GE users] reschedule_unknown_list

Sean Dilda agrajag at dragaera.net
Thu Apr 1 17:23:35 BST 2004


While running strace on sge_qmaster, I noticed that it was updating the
exec_hosts/ files for some of the nodes rather often.  It was updating
several of them every second, even though no new jobs or real status
changes were going on during these periods.  It seems there's a group of
66 nodes whose status gets updated constantly (several times a second).

Looking in the files, I notice they all have a reschedule_unknown_list
line similar to this:

reschedule_unknown_list    131 1=8,132 1=8,141 1=8,148 1=8,158 1=8

The other nodes that don't get constantly updated have a value of NONE
for this.
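
For anyone wanting to check their own spool for the same pattern, here is a
minimal sketch of how the affected files could be listed.  It is not part of
SGE; the function name is made up for illustration, and it assumes each file
in the exec_hosts/ directory is plain text with one "keyword value" pair per
line, as in the example above.

```python
import os


def hosts_with_reschedule_entries(exec_hosts_dir):
    """Return {file name: value} for files whose reschedule_unknown_list
    line holds anything other than NONE.

    Hypothetical helper: assumes one "keyword value" pair per line.
    """
    flagged = {}
    for name in sorted(os.listdir(exec_hosts_dir)):
        path = os.path.join(exec_hosts_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as handle:
            for line in handle:
                fields = line.split(None, 1)
                if len(fields) == 2 and fields[0] == "reschedule_unknown_list":
                    value = fields[1].strip()
                    if value != "NONE":
                        flagged[name] = value
                    break  # only the first matching line is considered
    return flagged
```

It would be called with the path to the qmaster's exec_hosts/ spool directory,
which depends on the local installation.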

Looking at the code that produces this file, it seems that the 131, 132,
etc. are supposed to be job numbers.  However, that doesn't make much
sense, since the lowest job number on the system right now is 2062.
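
The comparison can be sketched as follows.  This is only an illustration:
the function names are hypothetical, the leading number of each
comma-separated entry is treated as the supposed job number, and the
trailing "1=8" part is kept as an opaque string because its meaning is not
established here.

```python
def parse_reschedule_unknown_list(line):
    """Split a reschedule_unknown_list line into (number, rest) pairs.

    The "rest" part (e.g. "1=8") is left uninterpreted on purpose.
    """
    parts = line.split(None, 1)
    if not parts or parts[0] != "reschedule_unknown_list":
        raise ValueError("not a reschedule_unknown_list line")
    value = parts[1].strip() if len(parts) > 1 else "NONE"
    if value == "NONE":
        return []
    entries = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a stray trailing comma
        fields = item.split(None, 1)
        rest = fields[1] if len(fields) > 1 else ""
        entries.append((int(fields[0]), rest))
    return entries


def numbers_below(entries, lowest_active_job):
    """Return the listed numbers that are lower than the lowest active job."""
    return [number for number, _ in entries if number < lowest_active_job]


sample = "reschedule_unknown_list    131 1=8,132 1=8,141 1=8,148 1=8,158 1=8"
entries = parse_reschedule_unknown_list(sample)
print(numbers_below(entries, 2062))  # prints [131, 132, 141, 148, 158]
```

Every number in the sample line is below 2062, which is the mismatch
described above.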

What are these numbers on the reschedule_unknown_list?  If they are job
numbers, how do I make SGE forget about them?  Or at least, how do I get
qmaster to stop updating these files several times a second?

Thanks,


Sean

