[GE users] qstat/qacct

yacc143 andreas at kostyrka.org
Thu Feb 12 18:15:37 GMT 2009


My problem is not post-mortem analysis; it's more a GUI frontend for the
jobs being submitted. For this I need (more or less) real-time
information about the job status.

-) I need to know that the job is pending. (Currently I rely on my own
logging inside the job itself and just assume that the job will
eventually run.)
-) I'm strongly interested in when the job starts and ends.
-) I'm strongly interested in the exit status of my job.

The only way I found, and even that was imperfect, was running
qstat and qacct with -j JOBNAME and parsing the output. That worked
fine in the beginning, but as always, things grow. At around 1000 jobs,
running these commands periodically started to fail (running them and
processing the output simply took too long), so I implemented the
workaround of logging the start and exit of jobs. It works fine,
although it sounds like something that should be part of SGE proper,
and it fails on the first point: if a job fails before it starts to
run, bad things happen.
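
A minimal sketch of such a start/exit log, extended with a "pending" record written at submission time so that jobs which never start remain visible. It assumes a local SQLite file stands in for whatever central database is actually used; the table layout and the function names (mark_submitted, mark_started, mark_finished, stuck_pending) are hypothetical and not part of SGE.

```python
import sqlite3
import time

# Hypothetical schema: one row per job, updated as the job moves along.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id      TEXT PRIMARY KEY,
    state       TEXT NOT NULL,      -- 'pending', 'running' or 'done'
    submitted   REAL,
    started     REAL,
    ended       REAL,
    exit_status INTEGER
)
"""


def open_db(path):
    """Open (and if needed create) the tracking database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def mark_submitted(conn, job_id, now=None):
    """Call right after qsub returns, so that pending jobs are recorded."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO jobs (job_id, state, submitted) "
            "VALUES (?, 'pending', ?)",
            (job_id, now),
        )


def mark_started(conn, job_id, now=None):
    """Call from the job wrapper (or a prolog) when the job begins."""
    now = time.time() if now is None else now
    with conn:
        # Create the row if the submission record is missing.
        conn.execute(
            "INSERT OR IGNORE INTO jobs (job_id, state) VALUES (?, 'pending')",
            (job_id,),
        )
        conn.execute(
            "UPDATE jobs SET state = 'running', started = ? WHERE job_id = ?",
            (now, job_id),
        )


def mark_finished(conn, job_id, exit_status, now=None):
    """Call from the job wrapper (or an epilog) when the job ends."""
    now = time.time() if now is None else now
    with conn:
        conn.execute(
            "UPDATE jobs SET state = 'done', ended = ?, exit_status = ? "
            "WHERE job_id = ?",
            (now, exit_status, job_id),
        )


def stuck_pending(conn, older_than, now=None):
    """Return ids of jobs submitted more than older_than seconds ago
    that have not reported a start yet."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT job_id FROM jobs WHERE state = 'pending' AND submitted < ? "
        "ORDER BY job_id",
        (now - older_than,),
    )
    return [row[0] for row in rows]
```

The GUI then reads one table instead of running qstat/qacct per job, and stuck_pending() gives a list of jobs worth checking by hand. SQLite is used here only to keep the sketch self-contained; with many execution hosts writing concurrently, a database server is probably the safer choice.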

Anyway, your answer suggests that my idea of logging the start/exit
myself is a sound decision.
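
For the exit status of finished jobs, the accounting file mentioned in the reply can also be read in one pass instead of calling qacct once per job. The sketch below assumes the colon-separated layout described in the SGE accounting(5) man page, in which the first thirteen fields run from qname to exit_status; the function names are hypothetical, and the layout should be checked against the installed version.

```python
# Leading fields of an accounting record, in file order (assumed from
# accounting(5)); later fields such as resource usage are ignored here.
ACCT_FIELDS = (
    "qname", "hostname", "group", "owner", "job_name", "job_number",
    "account", "priority", "submission_time", "start_time", "end_time",
    "failed", "exit_status",
)

INT_FIELDS = (
    "job_number", "submission_time", "start_time", "end_time",
    "failed", "exit_status",
)


def parse_accounting_line(line):
    """Return a dict for one accounting record, or None for comments,
    blank lines and records that do not parse."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    parts = line.split(":")
    if len(parts) < len(ACCT_FIELDS):
        return None
    record = dict(zip(ACCT_FIELDS, parts))
    try:
        for key in INT_FIELDS:
            record[key] = int(record[key])
    except ValueError:
        return None
    return record


def exit_statuses(lines):
    """Map job_number -> (failed, exit_status).

    Simplification: if a job number occurs more than once (for example
    array job tasks), the last record in the file wins.
    """
    result = {}
    for line in lines:
        record = parse_accounting_line(line)
        if record is not None:
            result[record["job_number"]] = (
                record["failed"], record["exit_status"],
            )
    return result
```

Passing an open file object to exit_statuses() works, since it only iterates over lines. On a default installation the file is usually found at $SGE_ROOT/$SGE_CELL/common/accounting, but that path is an assumption about the local setup.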

Thanks,

Andreas

On Thu, 12 Feb 2009 11:44:00 -0500,
craffi <dag at sonsorol.org> wrote:

> You can directly process the accounting file yourself for bulk  
> analysis of completed jobs or slurp the file into a simple SQL  
> database. The SGE ARCo system can do this on a larger scale. I've  
> personally written perl scripts in the past that took the accounting  
> file and stuffed it into a simple mysql database.
> 
> One of the best "full SGE life cycle" implementations I've ever seen  
> did this:
> 
> - Global prolog scripts perform an SQL insert on a central database  
> for all newly dispatched jobs, capturing significant info about the  
> job environment
> - Global epilog script also does an SQL update to log exit code and  
> resource consumption data
> 
> Using the prolog/epilog hooks let this group build a custom system
> that tracked the full life cycle of each job. More importantly
> though, enough data was captured that the group could resubmit and
> repeat any job if needed in *exactly* the same way it was
> run/submitted previously.
> 
> -Chris
> 
> 
> 
> On Feb 12, 2009, at 11:16 AM, yacc143 wrote:
> 
> > I wondered if there is some way to query the status of submitted
> > jobs?
> >
> > I've been doing a qstat -j JOBID (that would yield the job if it's
> > running or pending or failed to start), and a qacct -j JOBNAME to
> > figure out the exit status of the jobs.
> >
> > The above has proven to be too slow (it works fine for hundreds of
> > jobs but breaks miserably when scaled to 20000 qsubs :( ).
> >
> > Now I'm using the trick of making the submitted job log its start
> > time and end time/exit status, but that's quite dirty.
> >
> > So what's the correct way to track a job from submission till "exit
> > status", for a potentially quite large collection of jobs?
> >
> > TIA,
> >
> > Andreas
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=104142
> 
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=104200
