[GE users] Job ID Roll over (was: fatal error, run database recovery)

Ron Chen ron_chen_123 at yahoo.com
Tue Mar 15 12:04:13 GMT 2005


--- Andy Schwierskott wrote: 
> With the need for efficient array job spooling (just spool the changed array
> task and not the complete array job as it was done before) we had to change
> the spooling format for classic spooling.

But isn't a very long jobid a bad thing for classic
spooling? Since we need to hash the jobid into the
filesystem directory structure, doesn't a longer jobid
require more levels of nesting?
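To make the nesting question concrete, here is a minimal sketch of one way a jobid could be mapped to nested spool directories. The 10-digit zero padding, the 2/4/4 split, and the name spool_path are assumptions for illustration only; I have not checked them against the actual classic spooling code.

```python
def spool_path(job_id: int, digits: int = 10, chunks: tuple = (2, 4, 4)) -> str:
    """Map a job id to a nested directory path by slicing its zero-padded digits.

    The padded width and chunk sizes are illustrative assumptions, not the
    layout Grid Engine is confirmed to use.
    """
    if job_id < 0:
        raise ValueError("job id must be non-negative")
    if sum(chunks) != digits:
        raise ValueError("chunk widths must add up to the padded width")
    padded = str(job_id).zfill(digits)
    if len(padded) > digits:
        raise ValueError("job id does not fit in the padded width")
    parts = []
    pos = 0
    for width in chunks:
        # Each slice of the padded id becomes one directory level.
        parts.append(padded[pos:pos + width])
        pos += width
    return "/".join(["jobs"] + parts)


print(spool_path(12345))
print(spool_path(9999999))
```

Under a scheme like this the nesting depth is set by the padded width, so it only grows once the largest allowed jobid no longer fits in that width.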

qstat and qmon should be fine. The code checks whether
the id is greater than 99999999. Originally I thought
it should check for MAX_SEQNUM instead, but on second
thought the current way is correct: we can set
MAX_SEQNUM to a higher number without needing to
modify the output formatting code.
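A small sketch of why a fixed display threshold is decoupled from MAX_SEQNUM. The names MAX_DISPLAY_ID, fits_id_column and format_id are hypothetical, and the 8-character column is an assumption derived from the 99999999 check; this is not the actual qstat code.

```python
MAX_DISPLAY_ID = 99999999   # fixed threshold mentioned above (8 digits)
MAX_SEQNUM = 9999999        # rollover value reported in this thread


def fits_id_column(job_id: int, limit: int = MAX_DISPLAY_ID) -> bool:
    """Return True if the id still fits the assumed 8-character id column."""
    return 0 <= job_id <= limit


def format_id(job_id: int, width: int = 8) -> str:
    """Right-align ids that fit; print wider ids unpadded instead of cutting them."""
    text = str(job_id)
    return text.rjust(width) if fits_id_column(job_id) else text


# Raising MAX_SEQNUM up to the display limit needs no formatting change.
print(fits_id_column(MAX_SEQNUM))
print(fits_id_column(MAX_SEQNUM * 10 + 9))
print(repr(format_id(42)))
```

Because the check compares against the column capacity rather than MAX_SEQNUM, MAX_SEQNUM could be raised as far as 99999999 in this sketch before the output layout is affected.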

 -Ron


> 
> The compiled-in MAX_SEQNUM has nothing to do with efficiency. Technically it
> can be changed without any problems - you just should make sure that the
> qstat output and other output containing job IDs (including qmon) is not
> getting garbled or cut off for job IDs going beyond 10 million.
> 
> Andy
> 
> 
> > Many large sites are now using BDB spooling in order
> > to keep up with the I/O requirements; to them 10M is a very
> > small number. Maybe for BDB sites we need a larger
> > value for MAX_SEQNUM.
> >
> > -Ron
> >
> >
> > --- "Beadles, Jeff" <jeff_beadles at mentorg.com> wrote:
> >> Naw, why bother.  I just grabbed a copy of the
> >> accounting file.
> >>
> >> We roll over job numbers every few months anyway.
> >> It seems to roll over at 9,999,999.
> >>
> >> I've got users that submit several thousand jobs per
> >> day here, so it's a little different scale. :-)
> >>
> >>   -Jeff
> >>
> >>
> >> ________________________________
> >>
> >> From: Rayson Ho [mailto:raysonho at eseenet.com]
> >> Sent: Mon 3/14/2005 2:02 PM
> >> To: users at gridengine.sunsource.net
> >> Subject: RE: [GE users] fatal error, run database
> >> recovery
> >>
> >>
> >>
> >>> 3 million+ per month, that are typically bursty. (big flood of work,
> >>> trickles down to nothing for a bit, and then another flood.)
> >>
> >> Ar, I see... our cluster is small and usually runs a
> >> few thousand jobs per month :)
> >>
> >> BTW, did you set
> >> $SGE_ROOT/default/spool/qmaster/jobseqnum to the last
> >> job ID before the reinstall, so that accounting
> >> would be correct?
> >>
> >> Rayson
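
For reference, the rollover figures quoted above can be checked with a little arithmetic. This is only a sketch: the wrap back to 1 after MAX_SEQNUM and the helper names next_job_id and months_until_rollover are assumptions, while 9,999,999 and 3 million jobs per month are the numbers reported in the thread.

```python
MAX_SEQNUM = 9999999  # rollover point reported in the thread


def next_job_id(current: int, max_seqnum: int = MAX_SEQNUM) -> int:
    """Return the following job id, wrapping to 1 after max_seqnum (assumed behaviour)."""
    return 1 if current >= max_seqnum else current + 1


def months_until_rollover(jobs_per_month: float, max_seqnum: int = MAX_SEQNUM) -> float:
    """Estimate how many months a steady submission rate takes to use up the id range."""
    if jobs_per_month <= 0:
        raise ValueError("jobs_per_month must be positive")
    return max_seqnum / jobs_per_month


print(next_job_id(9999999))
print(round(months_until_rollover(3_000_000), 1))
```

At roughly 3 million jobs per month the id range lasts a little over three months, which matches the "every few months" rollover described above.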


		

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



