[GE users] checking job return status in epilog script

madpower prandtstetter at ads.tuwien.ac.at
Fri Jun 12 07:13:32 BST 2009


Dear John,

> here's a couple of thoughts/ideas for you -
Thank you very much. I had almost the same idea, but I was too busy with
other work at the time, so I did not test it in detail.

> first, when I was experimenting with this yesterday, I put the
> following line in my epilog script:
Anyhow, there is one big problem with this approach. As soon as a job
terminates, everything is okay in our cluster. The problem is that the
affected jobs never terminate, so the epilog never runs and I cannot
automatically copy everything from the $SGE_JOB_SPOOL_DIR.
Nevertheless, I am going to do something like this for jobs that
terminate properly, just to find out exactly which files are in this
directory and whether it is worth investigating the information written
there.
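
A minimal sketch of such an epilog step, for jobs that do terminate. It
assumes that SGE_JOB_SPOOL_DIR and JOB_ID are set in the epilog's
environment; the function name archive_spool_dir, the SPOOL_ARCHIVE
variable and the /tmp/sge_spool_archive destination are hypothetical and
only for illustration:

```shell
#!/bin/sh
# Hypothetical epilog helper: keep a copy of the job's spool directory
# so its contents can be inspected after the job has finished.

archive_spool_dir() {
    src="$1"    # directory to copy from (e.g. $SGE_JOB_SPOOL_DIR)
    dest="$2"   # directory to copy into (created if missing)
    [ -d "$src" ] || return 1
    mkdir -p "$dest" || return 1
    # "$src"/. copies the directory contents, including dot files.
    cp -R "$src"/. "$dest"/
}

# Only act when Grid Engine has provided the spool directory.
# SPOOL_ARCHIVE and its default location are assumptions.
if [ -n "${SGE_JOB_SPOOL_DIR:-}" ]; then
    archive_spool_dir "$SGE_JOB_SPOOL_DIR" \
        "${SPOOL_ARCHIVE:-/tmp/sge_spool_archive}/${JOB_ID:-unknown}"
fi
```

Since the epilog only runs after a job ends, this does not help with the
jobs that hang; it only shows what the spool directory contains for a
normal job.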

> Note that I extract the exit
> status from the "usage" file.
This is one of the problems I have. I cannot find any file named usage
(or similar) anywhere on the execution host while jobs are running. So I
fear that this file is only created when a job terminates. However, maybe
there is other useful information in the spool directory.

> PS - oh, one other thing I noticed in your post --
> you mentioned that your problem jobs are in state "S", which
> you called "sleeping" -- from the way I understand the qstat
> output, capital s ("S") means that the queue is suspended
> (as opposed to a small s ("s") which means that the job is
> suspended). Not sure if that's just the term you use for this
> or not, but I thought I'd point it out - it could be that
> your problem is that the queue is getting suspended for some
> reason....
Well, thanks for pointing this out. Maybe I was a little imprecise on this
topic. In my case the "S" comes from the Unix command "top" (or "ps
faux") on the console of the execution host. So the jobs are regularly
listed as "running" in qstat, but they are not actually running on the
execution host. That is why I hope to find some useful information in the
log files. The system logs do not show any problems.
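
For reference, in top and ps the state "S" denotes interruptible sleep.
A small sketch for reading that state code directly on a Linux execution
host, assuming /proc is mounted; the function name proc_state is
hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the one-letter state code of a process
# (R = running, S = interruptible sleep, D = uninterruptible sleep,
#  T = stopped, Z = zombie), read from /proc/<pid>/status.
# Linux-specific; returns non-zero if the process does not exist.

proc_state() {
    pid="$1"
    [ -r "/proc/$pid/status" ] || return 1
    awk '/^State:/ { print $2; exit }' "/proc/$pid/status"
}
```

Calling, for example, `proc_state 12345` with the PID of a job's process
prints the same letter that top shows in its "S" column.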

Anyway, we are now upgrading to the new kernel and Debian 5.0 with the
included SGE packages, so we hope that these problems will soon be
history.

Thanks again,
Matthias

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=201625
