[GE users] Illumina/Solexa pipeline on large SGE 6.2 systems?

Chris Dagdigian dag at sonsorol.org
Tue Sep 2 18:20:22 BST 2008

Hi folks,

Illumina makes next-generation DNA sequencing instruments that generate
so much data that a cluster is usually a good idea for the analysis
pipeline steps, including image analysis, base calling and sequence
assembly.

The Illumina analysis pipeline is kinda clever in that it has scripts
that generate massive Makefiles that control all the tasks associated
with the analysis run. To run the pipeline on a local server you just
kick off the prep script, change into the directory holding the
generated Makefile and type "make".

Pretty cool.

Of course the nicest thing about a workflow based on Unix make is that
you can "parallelize" your workflow by replacing Unix make with SGE
qmake and presto, you have a "cluster aware" analysis pipeline with
very little effort.
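
For anyone who hasn't tried the swap, here is a rough Python sketch of
what it amounts to. It only assembles a qmake command line; the helper
name build_qmake_command and the job count of 8 are made up for
illustration. It assumes the usual qmake conventions: qsub-style
options such as -cwd and -v come first, and everything after "--" is
handed to the underlying GNU make. Check the qmake man page for your
SGE version before relying on it.

```python
import shlex


def build_qmake_command(jobs, target=None, export_vars=("PATH",)):
    """Build an argument list that runs `make` through SGE qmake.

    Hypothetical helper, not part of the Illumina pipeline or SGE.
    jobs        -- number of parallel make jobs (passed as `-j N`)
    target      -- optional make target; None means the default target
    export_vars -- environment variables to export to the remote tasks
    """
    cmd = ["qmake", "-cwd"]          # run tasks in the current directory
    for name in export_vars:
        cmd += ["-v", name]          # export each variable, qsub-style
    cmd.append("--")                 # what follows goes to GNU make
    cmd += ["-j", str(jobs)]         # make's own parallelism switch
    if target:
        cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe anywhere.
    print(shlex.join(build_qmake_command(8)))
```

In practice you would run the resulting command from the directory
that holds the generated Makefile, the same place you would otherwise
type "make".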

Many people are doing this -- Illumina pipelines running via SGE qmake  
are pretty popular.

A common problem that Illumina users run into is that the sheer size
and scope of their SGE qmake activities can overwhelm the SGE
qmaster. This is where you see people asking on forums about
shepherd-related errors, running out of filehandles, qmaster
timeouts and the like.

Given that a big feature of SGE 6.2 is a new, totally internal
implementation of interactive job support, I'm wondering if there is
anyone on this list who has found that running under 6.2 makes
pipeline runs smoother and less prone to resource- or
bottleneck-related SGE errors.

I'm wondering if my personal set of Illumina/SGE best practices should
begin with "step 1, install or upgrade to SGE 6.2". Does anyone have
real-world experience with 6.2 and goat_pipeline.py to share?

