[GE users] Illumina/Solexa pipeline on large SGE 6.2 systems?

Sean Davis sdavis2 at mail.nih.gov
Tue Sep 2 19:04:33 BST 2008



On Tue, Sep 2, 2008 at 1:20 PM, Chris Dagdigian <dag at sonsorol.org> wrote:
>
> Hi folks,
>
> Illumina makes next-gen DNA Sequencing instruments that generate tons of
> data, enough data that a cluster is usually a good idea for the analysis
> pipeline steps including image analysis, base calling and sequence assembly.
>
> The Illumina analysis pipeline is kinda clever in that it has scripts that
> generate massive Makefiles that control all the tasks associated with the
> analysis run. To run the pipeline on a local server you just kick off the
> prep script, navigate to a directory and type "make".
>
> Pretty cool.
>
> Of course the nicest thing about a workflow based on unix make is that you
> can "parallelize" your workflow by replacing unix make with SGE qmake and
> presto, you have a "cluster aware" analysis pipeline with very little
> effort.
>
> Many people are doing this -- Illumina pipelines running via SGE qmake are
> pretty popular.
>
> A common problem that Illumina users run into is that the sheer size and
> scope of their SGE qmake activities can overwhelm the SGE qmaster. This is
> where you see people asking on forums about shepherd-related errors, running
> out of filehandles, timeout errors with qmaster and such.
>
> Given that a big feature of SGE 6.2 is a new, fully internal implementation
> of interactive job support, I'm wondering if anyone on this list has found
> that running under 6.2 has made pipeline runs smoother and less prone to
> resource- or bottleneck-related SGE errors.
>
> I'm wondering if my personal set of Illumina/SGE best practices should begin
> with "step 1, install or upgrade to SGE 6.2". Anyone have any real world
> experiences to share with 6.2 and goat_pipeline.py?

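The idea in the quoted text, that swapping make for qmake parallelizes the pipeline, rests on make's dependency graph: any target whose prerequisites are already built can run at the same time as the others. The Python sketch below groups Makefile-style targets into "waves" of targets that could run concurrently. It is an illustration under stated assumptions, not the real Illumina Makefile: the target names (`tileN.intensities`, `tileN.basecalls`, `lane1.merged`) and the function `parallel_waves` are hypothetical.

```python
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group Makefile-style targets into waves that could run concurrently.

    deps maps each target to its prerequisites. A target is runnable once
    all of its prerequisites have been built, which is the condition a
    parallel make uses to decide what it may start next.
    """
    # Every name mentioned anywhere is a node; prerequisites that have no
    # entry of their own are treated as leaves with no dependencies.
    nodes: Set[str] = set(deps)
    for prereqs in deps.values():
        nodes.update(prereqs)
    remaining = {n: set(deps.get(n, [])) for n in nodes}
    built: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # Targets whose prerequisites are all already built.
        ready = sorted(n for n, pre in remaining.items() if pre <= built)
        if not ready:
            raise ValueError("dependency cycle among: " + ", ".join(sorted(remaining)))
        waves.append(ready)
        built.update(ready)
        for n in ready:
            del remaining[n]
    return waves


if __name__ == "__main__":
    # Hypothetical three-tile run: per-tile image analysis, then per-tile
    # base calling, then one merge step for the lane.
    tiles = ["tile1", "tile2", "tile3"]
    deps: Dict[str, List[str]] = {}
    for t in tiles:
        deps[f"{t}.intensities"] = []
        deps[f"{t}.basecalls"] = [f"{t}.intensities"]
    deps["lane1.merged"] = [f"{t}.basecalls" for t in tiles]
    deps["all"] = ["lane1.merged"]
    for i, wave in enumerate(parallel_waves(deps), 1):
        print(f"wave {i}: {len(wave)} target(s): {' '.join(wave)}")
```

Real make does not proceed in lockstep waves; it starts a target as soon as that target's own prerequisites finish. The waves only show how much concurrency the graph exposes. With hundreds of tiles, the first waves become very wide, which suggests why a large qmake run can have so many tasks in flight at once.
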
We have used it under 6.1 and 6.2.  I wouldn't say that I have seen a
big difference, but until recently we had only 24 nodes to work with,
which may not be enough to stress SGE.  We have noticed a significant
I/O and network bottleneck, but we are using pretty simple hardware
(Gigabit Ethernet and SATA drives in small RAID-6 configurations, with
no dedicated SAN or high-performance NAS).  I know of groups running it
on 120-node clusters under SGE, and they have not reported problems
with that setup to us, even though they were probably running a fairly
dated SGE release.

Sean
