[GE users] qstat/qacct
dag at sonsorol.org
Thu Feb 12 16:44:00 GMT 2009
You can directly process the accounting file yourself for bulk
analysis of completed jobs, or slurp the file into a simple SQL
database. The SGE ARCo system can do this on a larger scale. I've
personally written perl scripts in the past that took the accounting
file and stuffed it into a simple mysql database.
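As a rough sketch of that approach (field positions taken from the accounting(5) man page; the table layout and function names here are my own invention, not the original perl scripts), something like this could slurp the accounting file into SQLite:

```python
import sqlite3

# Sketch: load SGE accounting records into SQLite for bulk analysis.
# Each finished job is one colon-delimited line; per accounting(5),
# job_number is the 6th field and exit_status the 13th (1-based).

def parse_record(line):
    f = line.rstrip("\n").split(":")
    return (int(f[5]),          # job_number
            f[3],               # owner
            f[4],               # job_name
            int(f[9]),          # start_time (epoch seconds)
            int(f[10]),         # end_time (epoch seconds)
            int(f[11]),         # failed
            int(f[12]))         # exit_status

def load_accounting(lines, conn):
    conn.execute("""CREATE TABLE IF NOT EXISTS jobs (
        job_number INTEGER, owner TEXT, job_name TEXT,
        start_time INTEGER, end_time INTEGER,
        failed INTEGER, exit_status INTEGER)""")
    for line in lines:
        if line.startswith("#"):   # accounting file begins with comment headers
            continue
        conn.execute("INSERT INTO jobs VALUES (?,?,?,?,?,?,?)",
                     parse_record(line))
    conn.commit()
```

Once loaded, answering "which jobs failed?" across 20000 qsubs is a single query
(SELECT job_number FROM jobs WHERE exit_status != 0) instead of 20000 qacct calls.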
One of the best "full SGE life cycle" implementations I've ever seen
worked like this:
- Global prolog scripts perform an SQL insert on a central database
for all newly dispatched jobs, capturing significant info about the
job
- Global epilog scripts also do an SQL update to log the exit code and
resource consumption data
Using the prolog/epilog hooks let this group build a custom system
that tracked the full life cycle of each job. More importantly,
enough data was captured that the group could resubmit and
repeat any job if needed in *exactly* the same way it was run.
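A minimal sketch of those hooks, assuming a shared SQLite file stands in for the central database (a real deployment would more likely use MySQL or PostgreSQL) and relying on variables SGE sets in the prolog/epilog environment (JOB_ID, JOB_NAME, QUEUE, USER):

```python
import sqlite3

# Sketch of the prolog/epilog hooks described above: the prolog inserts
# a row when a job is dispatched, the epilog closes it out. The exit
# status is not reliably visible to an SGE epilog, so it would be joined
# in later from qacct or the accounting file. DB_PATH and the table
# layout are assumptions, not details from the original setup.
DB_PATH = "/shared/job_lifecycle.db"   # hypothetical central database

def log_dispatch(conn, env, now):
    """Called from the global prolog with os.environ and time.time()."""
    conn.execute("""CREATE TABLE IF NOT EXISTS job_log (
        job_id INTEGER PRIMARY KEY, owner TEXT, job_name TEXT,
        queue TEXT, start_time INTEGER, end_time INTEGER)""")
    conn.execute("INSERT INTO job_log (job_id, owner, job_name, queue, start_time) "
                 "VALUES (?,?,?,?,?)",
                 (int(env["JOB_ID"]), env.get("USER", ""),
                  env.get("JOB_NAME", ""), env.get("QUEUE", ""), int(now)))
    conn.commit()

def log_finish(conn, env, now):
    """Called from the global epilog."""
    conn.execute("UPDATE job_log SET end_time=? WHERE job_id=?",
                 (int(now), int(env["JOB_ID"])))
    conn.commit()
```

Capture enough in the prolog (owner, queue, working directory, submit command if you also record it) and the table itself becomes the recipe for resubmitting any job exactly as it first ran.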
On Feb 12, 2009, at 11:16 AM, yacc143 wrote:
> I wondered if there is some way to query the status of submitted jobs?
> I've been doing a qstat -j JOBID (that would yield the job if it's
> running, pending, or failed to start), and a qacct -j JOBNAME to find
> out the exit status of the jobs.
> The above has proven to be too slow (it works fine for hundreds of
> jobs, but breaks miserably when scaled to 20000 qsubs :( ).
> Now I'm using a trick of making the submitted job log its start time
> and end time/exit status, but that's quite dirty.
> So what's the correct way to track a job from submission till "exit
> status", for a potentially quite large collection of jobs?
To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].