[GE users] Checkpointing and using local hard disk of execution host
veerendra at yashasvi.co.in
Tue Aug 25 12:54:22 BST 2009
I'm using Sun Grid Engine on Linux, on 10 systems, to run Spectre (from Cadence).
We have configured two queues, long.q and short.q. long.q is for jobs that run for up to one hour, and short.q is for jobs that run for up to five minutes.
In long.q, a job that runs for more than one hour is checkpointed and rescheduled. In short.q, a job that runs for more than five minutes is rescheduled, so that jobs in the wait state can move to the execute state.
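The time-limit policy above can be summarized as a small decision function. This is a hypothetical sketch, not the co-scheduler from the post: the queue names and limits come from the queue descriptions, while the function name `action_for` and the action strings are assumptions. Both queues checkpoint before rescheduling, since checkpoint/restart is used in both.

```python
# Hypothetical sketch of the co-scheduler's time-limit policy.
# Queue names and limits are from the post; action names are assumptions.
LIMITS_S = {
    "long.q": 60 * 60,   # one hour
    "short.q": 5 * 60,   # five minutes
}


def action_for(queue: str, elapsed_s: int) -> str:
    """Decide what to do with a job in `queue` that has run for `elapsed_s` seconds."""
    if elapsed_s <= LIMITS_S[queue]:
        return "continue"
    # Over the limit: checkpoint so the job can resume later, then reschedule it
    # so that jobs waiting in the queue can start executing.
    return "checkpoint_and_reschedule"


if __name__ == "__main__":
    for queue, elapsed in [("long.q", 1800), ("long.q", 4000), ("short.q", 301)]:
        print(queue, elapsed, action_for(queue, elapsed))
```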
We have successfully written a co-scheduler that implements these requirements. However, I have one question.
Engineers submit Spectre jobs from an NFS-mounted directory structure (e.g. /project/chip1/username-X). The Spectre input file, input.scs, is stored in this directory and is submitted to the grid with "qsub -q short.q input.scs". The input.scs file contains all the required pointers to the libraries. When the job is submitted to the grid with qsub, the output file, input.raw, is written to the same directory.
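qsub reads its options before the job script and passes anything after the script name to the script itself, so options such as the queue must come first. A minimal sketch of building the submission command from a Python wrapper, assuming such a wrapper is used; `qsub_command` is a hypothetical helper, while `-q` (target queue) and `-cwd` (run in the submission directory) are standard qsub options.

```python
import shlex


def qsub_command(script: str, queue: str) -> list[str]:
    """Build a qsub argument list (hypothetical helper)."""
    return [
        "qsub",
        "-q", queue,  # target queue, e.g. long.q or short.q
        "-cwd",       # run the job in the directory it was submitted from
        script,       # everything after the script name is passed to the script
    ]


if __name__ == "__main__":
    # Prints the command line; pass the list to subprocess.run() to submit it.
    print(shlex.join(qsub_command("input.scs", "short.q")))
```

With -cwd, the job runs in the engineer's project directory, which is why input.raw ends up next to input.scs.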
We do not have a high-end NAS, and as a result we see performance problems with it. Can we achieve the following?
* Can we use the local hard disk of the execution host to write the input.raw file?
* Since we use checkpoint/restart to reschedule jobs in both long.q and short.q, is it advisable to use the local hard disk? When the grid reschedules a job, it could resume on a different execution host, right? In that case the job would not find its data, because the data is on the local disk of the previous execution host.
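One pattern for this is to write to the execution host's local disk during the run and copy the result to shared storage at well-defined points. Grid Engine normally creates a per-job scratch directory on the execution host (exported as TMPDIR) and removes it when the job ends there, so anything left only on local disk is unavailable after a job migrates. The sketch below assumes hypothetical helpers `stage_out` and `stage_in`: it copies the partial input.raw to the NFS project directory before a checkpoint/migration and restores it on the new host at restart. In Grid Engine such steps could be called from the checkpoint environment's migration and restart commands (see checkpoint(5)); whether Spectre can actually continue from a restored raw file depends on the checkpointing mechanism in use and is not shown here.

```python
import shutil
from pathlib import Path


def stage_out(scratch: Path, shared: Path, name: str) -> Path:
    """Copy a result file from local scratch to shared (NFS) storage."""
    shared.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(scratch / name, shared / name))


def stage_in(shared: Path, scratch: Path, name: str) -> bool:
    """On restart, possibly on another host, restore a saved partial result."""
    src = shared / name
    if not src.exists():
        return False  # first run: nothing saved yet
    scratch.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, scratch / name)
    return True


if __name__ == "__main__":
    import tempfile

    # Simulate two execution hosts' local disks and the NFS project directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        host_a, nfs, host_b = root / "hostA", root / "nfs", root / "hostB"
        host_a.mkdir()
        (host_a / "input.raw").write_text("partial results")

        stage_out(host_a, nfs, "input.raw")            # before checkpoint/migration
        restored = stage_in(nfs, host_b, "input.raw")  # after restart on host B
        print(restored, (host_b / "input.raw").read_text())
```

The job still uses the NAS, but as one bulk copy per checkpoint rather than many small writes during the simulation, which may reduce the load on a slow NAS.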
Please clarify, and advise on how to achieve this.
Thanks and Regards