[GE users] Illumina/Solexa pipeline on large SGE 6.2 systems?

Sean Davis sdavis2 at mail.nih.gov
Fri Sep 5 20:55:57 BST 2008


On Fri, Sep 5, 2008 at 3:49 PM, Jesse Becker <beckerjes at mail.nih.gov> wrote:
> Chris Dagdigian wrote:
>
>> The Illumina analysis pipeline is kinda clever in that it has scripts
>> that generate massive Makefiles that control all the tasks associated with
>> the analysis run. To run the pipeline on a local server you just kick off
>> the prep script, navigate to a directory and type "make".
>>
>> Pretty cool.
>
> Yeah, it is a pretty interesting use of makefiles.  The dependency tree is
> really impressive to see (there are a few scripts out there that parse
> Makefiles and generate graphviz files to help visualize them).
>
>
>> Of course the nicest thing about a workflow based on unix make is that
>> you can "parallelize" your workflow by replacing unix make with SGE qmake
>> and presto, you have a "cluster aware" analysis pipeline with very little
>> effort.
>
> Unfortunately, it wasn't quite that simple when we originally tried to do
> this.  We're running SGE 6.0u8.  The Makefiles themselves are valid, but we
> ran headlong into the two "Known Problems" mentioned in the qmake man page.
>
> The biggest issue has to do with multiple commands in a single rule.  There
> were a few instances of the problem described in the manpage that qmake
> can't handle.  The man page has an example, but briefly, the problem is that
> a target such as:
>
>  foo:
>        cd bar
>        cc -o foo foo.c
>
> won't work with qmake.  It has to be rewritten as something like:
>
>  foo:
>        cd bar; \
>        cc -o foo foo.c
>
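
To illustrate why joining the commands matters: make runs each recipe line in its own shell, so a "cd" on one line is forgotten by the next. The sketch below only mimics that with plain sh (it does not call make or qmake), and the scratch directory and "bar" subdirectory are made up for the demonstration.

```shell
#!/bin/sh
# Illustration only: mimic "one shell per recipe line" with plain sh.
# The scratch directory and its "bar" subdirectory are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/bar"
cd "$workdir" || exit 1

# Two recipe lines, two shells: the cd is lost before the second line runs.
sh -c 'cd bar'
separate=$(sh -c 'pwd')

# One logical line ("cd bar; \" style): both commands share a single shell.
joined=$(sh -c 'cd bar; pwd')

echo "separate lines ran in: $(basename "$separate")"
echo "joined line ran in:    $(basename "$joined")"

# Clean up the scratch directory.
cd / && rm -rf "$workdir"
```

Only the joined form ends up inside "bar", which is why the rewritten rule is the safe one.
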
> Rather than modifying the makefiles after they were generated, we left
> them as-is and created a PE to allocate N slots per host (where N is the
> number of cores).  We then call 'make' directly against each lane using
> something like this:
>
>   qsub -cwd -pe make-dedicated 4 -b y  make -j 4
>
> While not ideal, we do get 8-way parallelism on a per-lane basis, and 4-way
> parallelism within each lane.  This has worked well enough for the past 10
> months.  However, we are starting the process of re-evaluating this method,
> along with an upgrade to 6.2.
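
For anyone wanting to try the per-lane approach, a rough dry-run sketch of the submission loop follows. It only prints the qsub commands rather than submitting them. The lane target names (s_1 through s_8) and the idea of passing the lane as a make target are assumptions for illustration and are not taken from the thread; the qsub options are the ones shown in the quoted command.

```shell
#!/bin/sh
# Dry-run sketch: print one qsub command per lane instead of submitting it.
# Assumptions (not from the thread): lanes are built as make targets named
# s_1 .. s_8, and SLOTS matches what the "make-dedicated" PE grants per host.
SLOTS=4
LANES="1 2 3 4 5 6 7 8"

build_cmd() {
    # $1 = lane number; the target name s_<lane> is hypothetical
    echo "qsub -cwd -pe make-dedicated $SLOTS -b y make -j $SLOTS s_$1"
}

for lane in $LANES; do
    build_cmd "$lane"
done
```

Piping the output to sh (or replacing echo with the command itself) would submit the jobs for real, one per lane, each running make with SLOTS parallel jobs.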

I should have mentioned that when we have used SGE for the pipeline, we
have generally not been using qmake but distmake, which is SGE-aware.

Sean

> Also, as Sean Davis mentioned elsewhere, the pipeline is very IO intensive,
> and you can have issues with the NFS servers.  We had some issues with
> attribute (e.g. timestamp) caching that caused some dependency problems.
>
>> Given that a big feature of SGE 6.2 is a new totally-internal
>> implementation of interactive job support I'm wondering if there is anyone
>> on this list who has found that running under 6.2 has made the pipeline
>> runs smoother and less prone to resource or bottleneck related SGE errors.
>
> We actually wound up writing our own "pipeline" around the Illumina
> pipeline.  It's more extensive, in that we use it for moving data off the
> sequencers, checking that the raw data (images) are valid, and actually
> running the Illumina pipeline.  After each cycle, we also run a "partial"
> analysis on a subset of tiles, just to see how things are doing.  But as I
> said, this is using 6.0u8, not 6.2.
>
> I'll let you know how the upgrade goes.
>
>
> --
> Jesse Becker
> NHGRI Linux support (Digicon Contractor)
>
