[GE users] Spurious errors with checkpointing

Ruppert dieter_ruppert at siemens.com
Wed Dec 5 10:55:58 GMT 2007


Hi,

we use Grid Engine 6.0u6 on Solaris 10/SPARC and occasionally get
the error "can't stat ... as stdout_path: Permission denied"
from jobs that are being suspended in subordinate queues.

The setup is the following: we have two queues per host, one for
potentially long-running jobs (t) and one for "immediate" jobs (b), with
t being subordinate to b. Thus, when a job in b starts, a job in t
is migrated to another, usually less powerful, machine by a
checkpointing environment attached to t, which sends kill -USR1 to
the job's process group.
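
For illustration only, here is a minimal Python sketch of what "kill -USR1
to the job's process group" amounts to. The function name signal_job_group
is hypothetical and is not part of Grid Engine; how the checkpointing
environment obtains the process group ID is not shown and is assumed to be
known to the caller.

```python
import os
import signal


def signal_job_group(pgid, sig=signal.SIGUSR1):
    """Send `sig` to every process in process group `pgid`.

    Roughly what `kill -USR1 -- -PGID` does in a shell.
    Hypothetical helper for illustration, not a Grid Engine interface.
    """
    os.killpg(pgid, sig)
```

The job itself has to install a SIGUSR1 handler that writes its checkpoint
and exits; without one, the default action of SIGUSR1 terminates the
process.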

All of this usually works as it should, but from time to time a job in t
goes into an error state with the above message. The stdout_path
is, of course, owned and writable by the job's user.
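
One check that might help narrow this down: stat() can fail with
"Permission denied" even when the file itself is owned and writable by
the user, if any directory above it is not searchable. The Python sketch
below walks the path from the root down and reports the first component
that cannot be stat'ed. The function name first_stat_failure is
hypothetical, and the script is assumed to be run as the job's user on the
execution host where the error occurred. It is a diagnostic aid, not an
explanation of the error.

```python
import os
from pathlib import Path


def first_stat_failure(path):
    """Return (component, exception) for the first path component that
    cannot be stat'ed, or None if every component is reachable.

    Hypothetical helper for illustration; run it as the job's user.
    """
    target = Path(path).absolute()
    # Check from the root down: /, /home, /home/user, ...
    for component in [*reversed(target.parents), target]:
        try:
            os.stat(component)
        except OSError as exc:
            return str(component), exc
    return None
```

A result of None means every component was reachable at the moment of the
check. Since the error here is intermittent, the check would have to be run
close to the time of a failure to be meaningful.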


