[GE users] using NFS

reuti reuti at staff.uni-marburg.de
Mon Sep 7 12:24:59 BST 2009


On 07.09.2009 at 12:53, veerendra_n wrote:

> We have a scenario in which every job in a queue is rescheduled
> if it takes more than 5 minutes to complete.
> When a job is rescheduled, does it resume on another execution host
> or on the same one? (We have implemented checkpointing, so the job
> can resume from its saved state.)
> We plan to run jobs on local disk space; if a job can resume on a
> different execution host, we may have to use NFS.

The job will be scheduled to any node that is free. To avoid using
NFS, you can set things up to copy the checkpoint file from the local
scratch space to a common scratch space when the job stops, and back
to the local node the next time the job starts. NFS itself is of
course also an option: writing the checkpoint file once shouldn't put
much load on the NFS server.
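
A minimal sketch of the copy-out/copy-back idea, as two helper
functions a job script could call. The directory /shared/scratch/ckpt
and the file name state.ckpt are made-up placeholders, and how the
job itself writes its checkpoint is not shown; $TMPDIR is assumed to
point at the node-local scratch directory of the job.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the cluster's layout.
SHARED_CKPT_DIR=${SHARED_CKPT_DIR:-/shared/scratch/ckpt}
LOCAL_DIR=${TMPDIR:-/tmp}
CKPT_NAME=state.ckpt

# Copy an existing checkpoint from the common scratch space to the node.
stage_in() {
    if [ -f "$SHARED_CKPT_DIR/$CKPT_NAME" ]; then
        cp "$SHARED_CKPT_DIR/$CKPT_NAME" "$LOCAL_DIR/$CKPT_NAME"
    fi
}

# Copy the local checkpoint back to the common scratch space.
# Copy to a temporary name first, then rename, so an interrupted
# copy does not leave a truncated checkpoint under the final name.
stage_out() {
    if [ -f "$LOCAL_DIR/$CKPT_NAME" ]; then
        mkdir -p "$SHARED_CKPT_DIR" &&
        cp "$LOCAL_DIR/$CKPT_NAME" "$SHARED_CKPT_DIR/$CKPT_NAME.tmp" &&
        mv "$SHARED_CKPT_DIR/$CKPT_NAME.tmp" "$SHARED_CKPT_DIR/$CKPT_NAME"
    fi
}
```

A job script would call stage_in before starting the application and
stage_out after each checkpoint is written (or from the checkpoint
environment's own commands, if those are used).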

If you want the job always to run on the same node, you can check the
variable $RESTARTED: on the first run, use qalter to change the
request for the next invocation of the job, either by requesting a
specific queue instance [-q ...@...] or a hostname [-l h=...]. For
this to work, all nodes must be submit hosts.
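
A rough sketch of the $RESTARTED check, using the hostname variant.
The function name pin_to_host and the QALTER override (there only so
the command can be swapped out for a dry run) are illustrative; the
job ID and host name are passed in and would normally come from
$JOB_ID and $HOSTNAME inside the job. As far as I know, qalter -l
replaces the job's existing resource request list, so any other -l
requests would have to be repeated in the same call.

```shell
#!/bin/sh
# Command used to alter the job; override with e.g. QALTER=echo for a dry run.
QALTER=${QALTER:-qalter}

# pin_to_host JOB_ID HOST
# On the first run only (RESTARTED is 0 or unset), request HOST for
# the next invocation of the job. On a restarted run, do nothing.
pin_to_host() {
    if [ "${RESTARTED:-0}" = "0" ]; then
        "$QALTER" -l "h=$2" "$1"
    fi
}
```

Inside the job script this would be called near the top as
pin_to_host "$JOB_ID" "$HOSTNAME".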

-- Reuti

