[GE users] checking job return status in epilog script

madpower prandtstetter at ads.tuwien.ac.at
Wed Jun 10 08:22:22 BST 2009


> What exactly are you trying to find out??
Well, the problem in our environment is that jobs sometimes stop
executing on the execution hosts without being killed, i.e., they are in
status "S" (sleeping). We do not know why this happens or how to prevent
it. Unfortunately, the behavior cannot be reproduced, since it is
nondeterministic.
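
For looking at such a process while it sleeps, here is a minimal sketch.
It assumes that "S" is the process state shown by ps or top on a Linux
execution host, where the same letter is the third field of
/proc/<pid>/stat. Note that "S" there means interruptible sleep, which
is also the normal state of a process waiting for I/O or a timer, so the
state alone may not explain much. The function names are made up for
illustration, and /proc/<pid>/wchan (the kernel function the process is
waiting in) is not readable or meaningful on every kernel.

```python
def parse_proc_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren].strip())
    comm = text[open_paren + 1:close_paren]
    # The state letter (R, S, D, T, Z, ...) is the first field after comm.
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def describe_process(pid):
    """Collect state and wait channel for one process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        _, comm, state = parse_proc_stat(f.read())
    try:
        with open(f"/proc/{pid}/wchan") as f:
            wchan = f.read().strip() or "-"
    except OSError:
        # Assumption: treat a missing or unreadable wchan as "no data".
        wchan = "unavailable"
    return {"pid": pid, "comm": comm, "state": state, "wchan": wchan}
```

Calling describe_process() with the PID of the sleeping job on the
execution host returns a small dictionary that can be logged at
intervals and compared between healthy and stuck jobs.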

So my hope was, when reading the original post(s), that there might be
some information in the usage file which indicates further details on
the failure (maybe mem-usage, cpu-usage, i/o, etc.).

> The job directory is created on the execution host, and when the job
> finishes, the directory is cleaned up after the job data is sent to
> qmaster.
So this means that while a job is executing there should be some
information (in some files) on the execution host. Is there a default
directory where these files are stored, or are there default names for
these files, e.g., $TASK_ID.usage?
Then I could search for these files and have a look at them. However, I
did not find any file on my execution hosts with "usage" in its name. Or
are they created only when a job finishes? As described above, our jobs
do not finish (neither cleanly nor uncleanly); they just sleep.
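
The search I have in mind could be sketched as follows. This is only an
illustration: the function name is made up, and the root directory is a
parameter because I do not know where the files live (presumably
somewhere below the execd spool directory, i.e. the execd_spool_dir
setting of the cluster configuration, but that is an assumption).

```python
import os


def find_files_by_name(root, needle="usage"):
    """Return sorted paths below root whose file name contains needle."""
    needle = needle.lower()
    matches = []
    # os.walk visits root and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive substring match on the file name only.
            if needle in name.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

Running find_files_by_name() on the spool directory of an execution host
while a job is still active would show whether such a file exists before
the job finishes.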

Thanks,
Matthias

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201388

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


