[GE users] Execution of task chains on the same host
paguerlais at airfrance.fr
Tue Mar 16 09:31:23 GMT 2010
One of my users needs to run tasks on large data sets. His computations are sliced into multiple sequential tasks; the output of each task is the input of the next. The intermediate results can be up to 1 TB.
For latency, performance and economic reasons, I don't want to set up shared storage for this intermediate data, but rather store it locally on the node that performed the task. This introduces a constraint on the following step, which must run on the same host as the previous one. In short, it should look like the following:
- submit (qsub) the first task
- SGE selects an execution host and runs that first task on it
- the task generates intermediate data and stores it on its actual execution host
- for each and every following task, SGE must send it to the same execution host.
I've found two solutions that satisfy that constraint:
- create as many queues as there are execution hosts, one host per queue, and submit each task chain to a specific queue. Simple, but if an execution node crashes I'll have to manage a spare spool (either automatically or manually), and executions in that queue will be halted until the fix
- submit the first job with a known name (-N <name>), use qstat or qacct to find its actual execution node, then submit each following task with an '-l hostname=<host of the first job found above>' option.
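The second approach could be scripted roughly as below. This is only a sketch: the job names, script names and the 10-second polling interval are illustrative, and it assumes the default qstat output format where the queue instance appears as "queuename@hostname" in the eighth column.

```shell
#!/bin/sh
# Illustrative sketch (names and scripts are made up, not from a real setup):
# submit the first task under a known name, wait until the scheduler has
# placed it, look up its execution host, then pin the next task to that host.

qsub -N chain_step1 step1.sh

# Poll qstat for running jobs until our job shows up; the queue instance
# column has the form "queuename@hostname".
while true; do
    qi=$(qstat -s r | awk '$3 == "chain_step1" {print $8; exit}')
    [ -n "$qi" ] && break
    sleep 10
done

# The execution host is the part after the '@' in the queue instance.
exec_host=${qi#*@}

# Submit the next step pinned to that host; -hold_jid makes it wait for
# the first step to finish before starting.
qsub -N chain_step2 -hold_jid chain_step1 -l hostname="$exec_host" step2.sh
```

The -hold_jid option handles the sequential dependency, so the whole chain can be submitted up front once the host is known; only the first job needs the qstat lookup.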
- is SGE able to natively manage such job chains with this kind of placement constraint?
- can you think of another way to achieve this (easier, simpler, more elegant, ...)?