[GE users] Stale finished jobs

Norbert Crettol norbert.crettol at idiap.ch
Wed Dec 5 14:08:23 GMT 2007


Reuti wrote:
> Quoting Norbert Crettol <norbert.crettol at idiap.ch>:
>
>> Reuti wrote:
>>> so the job never ran? I saw this when the user has no rights to
>>> read the spooled jobscript on the node, or when it's not created
>>> there at all, i.e. the "exec" of the fork that is to be replaced
>>> with the actual jobscript fails. Is the spool directory for the
>>> nodes also in $SGE_ROOT/default/spool/<node>/... or somewhere
>>> local on the node, e.g. /var/spool/sge?
>> The spool is $SGE_ROOT/default/spool/<node>. There is nothing in
>
> For the qmaster it might be cosmetic to put it in the default location 
> or in /var/spool/sge. But for the nodes, it will avoid network traffic 
> as the job is first transferred by SGE to the node, and then written to
> the NFS volume again. Can you change your setup to have the spool 
> directory of the nodes local? This might explain the errors from time 
> to time.
I'll try to move the spool to a local directory if possible. How
much space do I need for the exec spool?

> There is nothing in there also when a job runs?
I've discovered a setup problem: when I upgraded from
6.1u2 to 6.1u3, the upgrade script crashed (stating that
there was probably a file permission problem) and I had to
finish the upgrade "by hand". I just noticed that the
exec spool was still pointing to the old (6.1u2) directory.
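
A leftover path like this can be spotted by reading the execd_spool_dir parameter from the output of `qconf -sconf` (global configuration) or `qconf -sconf <host>` (host-local configuration) and checking whether it still lies under the shared $SGE_ROOT. The Python sketch below assumes that output is one "name value" pair per line with a trailing backslash continuing a long value; the helper names and the /opt/sge paths are hypothetical, and treating "under $SGE_ROOT" as "on NFS" is an assumption matching the setup described in this thread.

```python
import os.path


def parse_sge_conf(text):
    """Parse `qconf -sconf`-style output into a dict (hypothetical helper).

    Assumes one parameter per line, name and value separated by whitespace,
    and a trailing backslash continuing the value on the next line.
    """
    conf = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            # Value continues on the next line: drop the backslash and keep going.
            pending = line[:-1]
            continue
        pending = ""
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def spool_is_under_root(spool_dir, sge_root):
    """True if spool_dir lies inside sge_root (assumed NFS-shared here).

    Uses path components, so /opt/sge2 is not mistaken for a child of /opt/sge.
    Both arguments are expected to be absolute paths.
    """
    spool = os.path.normpath(spool_dir)
    root = os.path.normpath(sge_root)
    return os.path.commonpath([spool, root]) == root


if __name__ == "__main__":
    # Hypothetical sample in the style of `qconf -sconf` output.
    sample = "execd_spool_dir   /opt/sge/default/spool\nmailer   /bin/mail\n"
    conf = parse_sge_conf(sample)
    spool = conf["execd_spool_dir"]
    print(spool, "shared" if spool_is_under_root(spool, "/opt/sge") else "local")
```

Comparing path components rather than string prefixes avoids false matches between sibling directories such as an old and a new installation root.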

When jobs run, they are visible there. I'll check what happens
with the stale jobs and start another test session.

Thank you for the hints.

Norbert
