[GE users] qstat/qacct

craffi dag at sonsorol.org
Thu Feb 12 16:44:00 GMT 2009

You can directly process the accounting file yourself for bulk  
analysis of completed jobs or slurp the file into a simple SQL  
database. The SGE ARCo system can do this on a larger scale. I've  
personally written perl scripts in the past that took the accounting  
file and stuffed it into a simple mysql database.
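As a rough sketch of that approach, the snippet below parses the colon-delimited accounting file into SQLite (swapped in here for MySQL to keep the example self-contained). The field positions are assumptions based on a typical SGE accounting layout — they vary between SGE versions, so check accounting(5) on your own install before trusting the indices.

```python
import sqlite3

# Assumed field positions in the colon-delimited SGE accounting file.
# Verify against accounting(5) on your system; layouts differ by version.
FIELDS = {
    "qname": 0, "hostname": 1, "owner": 3, "job_name": 4,
    "job_number": 5, "submission_time": 8, "start_time": 9,
    "end_time": 10, "failed": 11, "exit_status": 12,
    "ru_wallclock": 13,
}

def load_accounting(path, db):
    """Slurp an SGE accounting file into a SQLite table for bulk queries."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "  job_number INTEGER, job_name TEXT, owner TEXT,"
        "  qname TEXT, hostname TEXT, submission_time INTEGER,"
        "  start_time INTEGER, end_time INTEGER,"
        "  failed INTEGER, exit_status INTEGER, ru_wallclock REAL)"
    )
    with open(path) as fh:
        for line in fh:
            if line.startswith("#"):   # skip the comment header lines
                continue
            cols = line.rstrip("\n").split(":")
            db.execute(
                "INSERT INTO jobs VALUES (?,?,?,?,?,?,?,?,?,?,?)",
                (int(cols[FIELDS["job_number"]]),
                 cols[FIELDS["job_name"]],
                 cols[FIELDS["owner"]],
                 cols[FIELDS["qname"]],
                 cols[FIELDS["hostname"]],
                 int(cols[FIELDS["submission_time"]]),
                 int(cols[FIELDS["start_time"]]),
                 int(cols[FIELDS["end_time"]]),
                 int(cols[FIELDS["failed"]]),
                 int(cols[FIELDS["exit_status"]]),
                 float(cols[FIELDS["ru_wallclock"]])),
            )
    db.commit()
```

Once loaded, questions like "which jobs exited nonzero yesterday" become one SELECT instead of 20000 qacct invocations.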

One of the best "full SGE life cycle" implementations I've ever seen  
did this:

- A global prolog script performs an SQL insert on a central database  
for all newly dispatched jobs, capturing significant info about the  
job environment
- A global epilog script does an SQL update to log exit code and  
resource consumption data

Using the prolog/epilog hooks let this group build a custom system  
that tracked the full life cycle of each job. More importantly,  
enough data was captured that the group could resubmit and repeat  
any job if needed in *exactly* the same way it was run/submitted  
previously.
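The prolog/epilog pair above could be sketched roughly like this. Everything here is an assumption for illustration: the table name, column set, the use of SQLite in place of a central MySQL/Postgres server, and the EXIT_STATUS variable (SGE does not hand the epilog the exit status directly, so this assumes a job wrapper exported it).

```python
#!/usr/bin/env python
# Hypothetical global prolog/epilog script logging job life cycle to a
# database.  Invoked by SGE as: lifecycle.py prolog  (or epilog).
import os
import sqlite3
import sys
import time

DB_PATH = "/var/spool/sge/lifecycle.db"   # hypothetical central store

def record(db, phase):
    db.execute(
        "CREATE TABLE IF NOT EXISTS lifecycle ("
        "  job_id TEXT, job_name TEXT, owner TEXT, workdir TEXT,"
        "  started INTEGER, finished INTEGER, exit_status INTEGER)"
    )
    env = os.environ
    if phase == "prolog":
        # Dispatch: capture enough of the environment to rerun the job.
        db.execute(
            "INSERT INTO lifecycle (job_id, job_name, owner, workdir,"
            " started) VALUES (?,?,?,?,?)",
            (env.get("JOB_ID"), env.get("JOB_NAME"),
             env.get("SGE_O_LOGNAME"), env.get("SGE_O_WORKDIR"),
             int(time.time())),
        )
    else:
        # Completion: close out the row with end time and exit status.
        # EXIT_STATUS is assumed to be exported by a job wrapper; SGE
        # itself does not pass the exit status to the epilog.
        db.execute(
            "UPDATE lifecycle SET finished = ?, exit_status = ?"
            " WHERE job_id = ?",
            (int(time.time()), int(env.get("EXIT_STATUS", "-1")),
             env.get("JOB_ID")),
        )
    db.commit()

if __name__ == "__main__" and len(sys.argv) > 1:
    with sqlite3.connect(DB_PATH) as conn:
        record(conn, sys.argv[1])
```

With rows written at dispatch and completion, tracking 20000 jobs is a single query against the database rather than 20000 qstat/qacct calls.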


On Feb 12, 2009, at 11:16 AM, yacc143 wrote:

> I wondered if there is some way to query the status of submitted jobs?
> I've been doing a qstat -j JOBID (that would yield the job if it's
> running or pending or failed to start), and a qacct -j JOBNAME to  
> figure
> out the exit status of the jobs.
> The above has proven to be too slow (it works fine for hundreds of  
> jobs
> but breaks miserably when scaled to 20000 qsubs :( ).
> Now I'm using a trick in making the submitted job log its start time
> and end time/exit status, but that's quite dirty.
> So what's the correct way to track a job from submission till "exit
> status", for a potentially quite large collection of jobs?
> TIA,
> Andreas
