[GE users] Jobs in "hold" state disappear. Debugging help?

gutnik gutnik at gmail.com
Wed Mar 24 21:11:47 GMT 2010


On Wed, Mar 24, 2010 at 1:24 PM, reuti <reuti at staff.uni-marburg.de> wrote:
> On 24.03.2010 at 21:18, gutnik wrote:
>
>> I'm using a cad tool (cadence) that is somewhat integrated with SGE.
>> In one situation, it launches a number of simulations, and one
>> "cleanup" job. I see the simulation jobs submitted, and I see the
>> cleanup job submitted with a hold that depends on the simulation
>> jobs. Great.
>>
>> What I see is that the simulations finish, and about 30 seconds
>> later, the cleanup job is removed from the queue without ever being
>> run. So, I have two questions:
>
> Removed from `qstat` or within Cadence?

I believe the job is removed by SGE. I don't think Cadence removes it; if there
were a log I could see, I could confirm that.
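
One way to check this would be to reproduce the same pattern outside Cadence and watch whether SGE alone drops the held job. A minimal sketch, assuming `qsub` is on the PATH and binary submission (`-b y`) is allowed; the function name, job names, and the `sleep`/`echo` commands are placeholders, not what Cadence actually submits:

```shell
# Sketch: submit N dummy "simulation" jobs plus one cleanup job that is
# held until all of them have finished. All names here are hypothetical.
submit_with_cleanup() {
    n_sims="$1"
    hold_list=""
    i=1
    while [ "$i" -le "$n_sims" ]; do
        # -terse prints only the job ID; -b y submits a command, not a script
        jid=$(qsub -terse -b y -N "sim_$i" sleep 30) || return 1
        # Build a comma-separated list of job IDs for -hold_jid
        hold_list="${hold_list:+$hold_list,}$jid"
        i=$((i + 1))
    done
    # The cleanup job waits in hold until every listed job has left the system
    qsub -terse -b y -N cleanup -hold_jid "$hold_list" echo cleanup
}
```

Calling `submit_with_cleanup 3` prints the cleanup job's ID, which can then be followed with `qstat` while the dummy jobs finish.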

>> 2) Why is the job being removed? One possibility (from the manual) is
>> that the simulations are exiting with code 100. Is that
>
> Then it's not removed; the job is put in error state.
>
> To investigate this you can use:
>
> $ qstat -s z

`qstat -s z` lists the job.

> $ qacct -j <job_id_of_cleaner>

That returns "error: jobid 1234 not found" (if the job ID of the cleaner was 1234).

> You can even define in a ~/.sge_request file that you get an email for
> each started/ended/aborted job (which will be attached to each job
> even when you have no chance to enter such an option in your application).

How do I do that? (Ideally, I'd like email for each job, and each change of
status and reason.)
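
For reference, a minimal sketch of what such a default request file might contain, assuming the standard `qsub` mail options `-m` (b = begin, e = end, a = abort/reschedule) and `-M` (recipient). The helper name and the address are placeholders, and this may not cover every status change:

```shell
# Sketch: append default mail options to a per-user SGE request file.
# write_sge_mail_defaults is a hypothetical helper, not part of SGE.
write_sge_mail_defaults() {
    target="$1"     # normally "$HOME/.sge_request"
    mail_to="$2"    # address that should receive the job mail
    # Append rather than overwrite, so existing defaults are kept
    cat >> "$target" <<EOF
# Mail at job begin (b), end (e), and abort/reschedule (a)
-m bea
-M $mail_to
EOF
}
```

It would be called as `write_sge_mail_defaults "$HOME/.sge_request" user@example.com`; the options then apply to every job submitted under that account, including jobs submitted by an application.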

   Vadim

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251212
