[GE users] Job ID Roll over (was: fatal error, run database recovery)

Andy Schwierskott andy.schwierskott at sun.com
Tue Mar 15 10:07:41 GMT 2005


> The value (MAX_SEQNUM) seems to be configurable at
> compile time, I will play with it a bit and see if
> increasing it would break anything or not :)
> It used to be 999,999 but was increased 10 times to
> 9,999,999, and in theory the code supports 2^32 but it
> was limited to the current value for performance
> reasons:
> http://gridengine.sunsource.net/servlets/ReadMsg?listName=dev&msgId=1059

With the need for efficient array job spooling (just spool the changed array
task and not the complete array job, as it was done before) we had to change
the spooling format for classic spooling.

The compiled-in MAX_SEQNUM has nothing to do with efficiency. Technically it
can be changed without any problems - you should just make sure that the
qstat output and other output containing job IDs (including qmon) is not
getting garbled or cut off for job IDs going beyond 10 million.
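
To illustrate the display concern, here is a minimal sketch. It is not Grid
Engine code: the 7-character column width, the wrap back to 1, and the helper
names next_job_id, fits_column and render_fixed are all assumptions made for
the example. Only the 9,999,999 limit comes from this thread.

```python
MAX_SEQNUM = 9_999_999  # current limit as reported in this thread


def next_job_id(current, max_seqnum=MAX_SEQNUM):
    """Return the next job ID; wrapping back to 1 is an assumption."""
    return 1 if current >= max_seqnum else current + 1


def fits_column(job_id, width=7):
    """True if job_id fits in a fixed-width column (width is hypothetical)."""
    return len(str(job_id)) <= width


def render_fixed(job_id, width=7):
    """Mimic a tool that right-aligns and then truncates to `width` chars."""
    return f"{job_id:>{width}}"[:width]


if __name__ == "__main__":
    for jid in (42, MAX_SEQNUM, MAX_SEQNUM + 1):
        print(repr(render_fixed(jid)), "fits" if fits_column(jid) else "CUT OFF")
```

With a 7-character field, an ID of 10,000,000 loses its last digit, which is
the kind of output that would need checking in qstat and qmon before raising
the limit.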


> Many large sites are now using BDB spooling in order
> to keep up with the I/O requirements; to them 10M is a very
> small number. Maybe for BDB sites we need a larger
> value for MAX_SEQNUM.
> -Ron
> --- "Beadles, Jeff" <jeff_beadles at mentorg.com> wrote:
>> Naw, why bother.  I just grabbed a copy of the
>> accounting file.
>> We roll over job numbers every few months anyway.
>> It seems to roll over at 9,999,999.
>> I've got users that submit several thousand jobs per
>> day here, so it's a little different scale. :-)
>>   -Jeff
>> ________________________________
>> From: Rayson Ho [mailto:raysonho at eseenet.com]
>> Sent: Mon 3/14/2005 2:02 PM
>> To: users at gridengine.sunsource.net
>> Subject: RE: [GE users] fatal error, run database
>> recovery
>>> 3 million+ per month, that are typically bursty.
>>> (big flood of work, trickles down to nothing for a
>>> bit, and then another flood.)
>> Ar, I see... our cluster is small and usually runs a
>> few thousand jobs per
>> month :)
>> BTW, did you set the
>> $SGE_ROOT/default/spool/qmaster/jobseqnum to the
>> last
>> job ID before the reinstall, so that accounting
>> would be correct??
>> Rayson
