[GE users] Determining the failure states of completed jobs in SGE 5.3

John Hearns john.hearns at streamline-computing.com
Thu Jun 7 11:03:18 BST 2007



Dennis Williams wrote:
> Hello,
>  
> My team are in the process of building an application that submits jobs to node clusters running SGE 5.3. One of the requirements is to monitor the status of a job (throughout its lifecycle) that has been submitted to the SGE.
>  
> Using the "qstat" command it is possible to determine if a job is currently waiting in a queue or running on a node, but once the job has completed I would like to be able to determine if the job has completed successfully or with errors. I understand that once a job has completed two files are written on the compute node containing the stdout and stderr, but our application will not have access to these nodes as they are on private networks.
>  
> So my question is:
>  
> 1) Does SGE 5.3 provide commands (or techniques) that would enable clients to determine if a job has completed with or without errors?


A job can email its status.
qsub -m beas -M user@host
(b = begin, e = end, a = abort, s = suspend)
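
As a minimal sketch of the mail option above, the qsub call can be wrapped in a small shell function. The function name submit_with_mail and the QSUB override variable are hypothetical, introduced only so the command line can be dry-run on a machine without SGE installed; the -m and -M flags are the ones shown above.

```shell
#!/bin/sh
# Sketch: submit a job with mail sent at begin, end, abort, and suspend.
# submit_with_mail and QSUB are illustrative names, not part of SGE.
submit_with_mail() {
    mail_to=$1    # address that receives the status mails
    shift         # remaining arguments: job script and its options
    # QSUB defaults to the real qsub; set QSUB=echo to print the arguments instead.
    "${QSUB:-qsub}" -m beas -M "$mail_to" "$@"
}
```

On a submit host this would be called as `submit_with_mail user@host myjob.sh`; with `QSUB=echo` set it only prints the arguments that would be passed to qsub, which is a cheap way to check the command line before submitting anything.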

Or look into DRMAA.

-- 
      John Hearns
      Senior HPC Engineer
      Streamline Computing,
      The Innovation Centre, Warwick Technology Park,
      Gallows Hill, Warwick CV34 6UW
      Office: 01926 623130 Mobile: 07841 231235
