[GE users] sge5.3 - delay between dispatching?

Dan Gruhn Dan.Gruhn at Group-W-Inc.com
Fri Mar 11 13:36:37 GMT 2005


I have the same type of problem.  I use an NFS-compatible mutual
exclusion technique that relies on making a hard link from one file to
another.  Link creation is atomic, so only the first job to attempt the
link succeeds; all the others fail and have to retry.

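touch lockfile
# ln fails if lockfile.lock already exists, so only one job gets the lock
ln lockfile lockfile.lock >/dev/null 2>&1
result=$?
while [ $result -ne 0 ]
do
  sleep 1
  touch lockfile
  ln lockfile lockfile.lock >/dev/null 2>&1
  result=$?
done
# Do your file creation stuff here ...
rm -f lockfile lockfile.lock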
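
For the results-file case in the original question, the link-based lock
above could be wrapped up roughly as follows.  This is only a sketch:
the function names, the LOCKBASE variable, the retry limit and the
results_N.dat naming pattern are all made up for illustration and would
need adapting to whatever the jobs actually write.

```shell
#!/bin/sh
# Sketch only: acquire_lock, release_lock, next_free_name,
# reserve_results_file, LOCKBASE and the results_N.dat pattern are
# hypothetical names, not part of Grid Engine or spider.

LOCKBASE=${LOCKBASE:-lockfile}

# Try to take the lock; give up after $1 attempts (default 60).
acquire_lock() {
  tries=${1:-60}
  touch "$LOCKBASE"
  # ln fails while $LOCKBASE.lock exists, i.e. while another job holds the lock
  while ! ln "$LOCKBASE" "$LOCKBASE.lock" >/dev/null 2>&1
  do
    tries=$((tries - 1))
    if [ "$tries" -le 0 ]; then
      return 1
    fi
    sleep 1
    touch "$LOCKBASE"
  done
  return 0
}

# Give the lock back so the next waiting job can link successfully.
release_lock() {
  rm -f "$LOCKBASE" "$LOCKBASE.lock"
}

# Print the first results_N.dat name in directory $1 that does not exist yet.
next_free_name() {
  n=1
  while [ -e "$1/results_$n.dat" ]
  do
    n=$((n + 1))
  done
  echo "$1/results_$n.dat"
}

# Pick and create the next free results file while holding the lock,
# so that two jobs never choose the same name at the same time.
reserve_results_file() {
  acquire_lock 60 || return 1
  name=$(next_free_name "$1")
  : > "$name"
  release_lock
  echo "$name"
}
```

A job script would then call something like
outfile=$(reserve_results_file /path/to/shared/dir) before starting its
real work, and write its results to $outfile.  The lock is only held
while the name is chosen and the empty file is created, so jobs do not
wait on each other for long.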

On Fri, 2005-03-11 at 07:27, Justus Loerke wrote:

> Hi,
> I'm looking for a way to set a defined time delay (say 5 or 10 secs) 
> between the dispatching of subsequent jobs to execution hosts. Sorry if 
> I'm posting to the group with this, but I didn't find anything in the 
> archive or the docs.
> I'm having some problems with dispatching and the NFS system: if 2 (or 
> more) jobs are dispatched and started on different execution hosts at 
> the same time, these jobs (namely spider) will try to open a results 
> file with the same name on the NFS shared directory; this will crash all 
> jobs but one with a 'stale NFS handle' error. If a filename was in use, 
> the jobs would just use the next free name, but the real problem is that 
> several jobs try to create the same file _at the same time_. Use of 
> local (execution host) disks for the destination of the results file is 
> something we're checking right now, but it clutters up local disks, 
> distributes debug information over too many hosts and is just not 
> elegant, you know? :)
> So is there a way to reconfigure the scheduler to wait a defined time 
> interval between the  dispatching of jobs? This would solve my problem, 
> since newly dispatched jobs would try opening results files in intervals 
> of 5 or 10 secs and would then use different file names.
> Thanks, Justus.
