[GE users] Anyone running Schrodinger chem/modeling apps under SGE6?

Andreas Haas Andreas.Haas at Sun.COM
Fri Feb 25 11:49:32 GMT 2005


Hi Chris,

On Thu, 24 Feb 2005, Chris Dagdigian wrote:

>
> One of the more interesting SGE projects I did last year involved
> getting Schrodinger products (glide, impact, etc.) happily integrated
> with SGE 5.3EE including flexlm license management and epilog scripts
> that catch and resubmit jobs that failed with errors that were machine
> correctable.
>
> Interestingly enough the same techniques used on the SGE 5.3 system are
> not working all that well on a new cluster running SGE 6.0u3 - in the
> past the Schrodinger job dispatch code easily and trivially exec'ed the
> submit script that did all the heavy work of setting ENV vars, calling
> qsub and passing back the jobID to Schrodinger so that their workflow
> tools knew how to kill or abort running jobs. Pretty basic.

Controlling workflows through the CLI works, but it is somewhat clumsy.
With the DRMAA API one gets the job IDs of submitted jobs through a
reasonable interface, and drmaa_wait() allows retrieving the exit status
of finished jobs. Possibly the DRMAA Perl binding would fit the needs of
Schrodinger workflow control.
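
A rough illustration of that submit-and-wait flow, written in Python
rather than Perl. It assumes the third-party drmaa-python package and an
already initialized drmaa.Session; the helper name run_and_wait is
hypothetical, and nothing here is taken from the Schrodinger code.

```python
def run_and_wait(session, command, args=(), timeout=-1):
    """Submit one job through a DRMAA session and block until it finishes.

    `session` is assumed to be an initialized drmaa.Session (drmaa-python).
    timeout=-1 corresponds to drmaa.Session.TIMEOUT_WAIT_FOREVER.
    Returns (job_id, exit_status); exit_status is None when the job did
    not exit normally (for example, it was killed by a signal).
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command
        jt.args = list(args)
        job_id = session.runJob(jt)           # the job ID is returned directly
        info = session.wait(job_id, timeout)  # wraps drmaa_wait()
    finally:
        session.deleteJobTemplate(jt)         # always release the template
    exit_status = info.exitStatus if info.hasExited else None
    return job_id, exit_status
```

With drmaa-python installed, this would be called inside
"with drmaa.Session() as s:" as run_and_wait(s, "/path/to/job-script.sh"),
where the script path is a placeholder.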

>
> Now on a SGE 6.0u3 cluster, that process seems to hang - the shell
> script is invoked and the qsub command it calls hangs forever until
> reaching a timeout limit. The only way we can reliably get single,
> non-parallel jobs to run is to "fool" Schrodinger into thinking the job
> is "remote" in which case it uses passwordless SSH to exec the master
> submit script. In this situation everything works perfectly.

Daniel already replied to this. It might be a bug in qsub that we're not
yet aware of. What is the exact command line the submission script uses
when running qsub? Can you reproduce the hang by running qsub
independently of the Schrodinger environment?
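
One way to run such a standalone test is sketched below in Python. It
runs a submit command with a timeout and extracts the job ID from its
output. Several things are assumptions: the helper names submit and
parse_job_id are hypothetical, the success message is assumed to look
like 'Your job 4711 ("script") has been submitted', and detaching stdin
is only a guess at what might differ between the two environments, not a
confirmed cause of the hang.

```python
import re
import subprocess

# Assumed shape of qsub's success message, e.g.
#   Your job 4711 ("master-job-script.sh") has been submitted
JOB_ID_RE = re.compile(r"Your job(?:-array)? (\d+)")

def parse_job_id(qsub_stdout):
    """Return the job ID found in qsub's output, or None if absent."""
    match = JOB_ID_RE.search(qsub_stdout)
    return match.group(1) if match else None

def submit(argv, timeout_s=60):
    """Run a submit command on its own and return the parsed job ID.

    stdin is connected to /dev/null so the child cannot wait on an
    inherited pipe. Raises subprocess.TimeoutExpired if the command
    runs longer than timeout_s, and CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout_s,
        check=True,
    )
    return parse_job_id(result.stdout)
```

For example, submit(["qsub", "./master-job-script.sh"]) would either
return the job ID as a string or raise TimeoutExpired, which makes a hang
easy to tell apart from a parsing problem.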

>
> Rather than detail the Schrodinger jmonitor and jdispatch functions for
> an uninterested list :) I thought I'd ask 2 questions:
>
> (1) This suite has some minimal workflow tools which require it to be
> aware of running jobs. This requires passing the SGE job ID back to the
> job dispatch process. This is done in their code by opening a pipe to
> the "qsub ./master-job-script.sh" command so that the STDOUT can be
> parsed in order to grab the jobID value. -- Has anything changed with
> qsub in 6.0 vs 5.3 regarding its use and behavior of STDIN and STDOUT?
> Any new arguments need to be passed to qsub?

We haven't changed the semantics of the existing 5.3 qsub options. The
new options are -R, -js, -i, and -b. A diff of the 5.3 and 6.0 qsub -help
output shows the details in full. Please find it attached.
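
For anyone who wants to repeat that comparison on their own
installations, here is a small Python sketch. It assumes the output of
"qsub -help" from each version has been saved as text beforehand; the
function name added_lines is hypothetical.

```python
import difflib

def added_lines(old_help, new_help):
    """Return the lines of new_help that are new or changed versus old_help."""
    delta = difflib.ndiff(old_help.splitlines(), new_help.splitlines())
    # ndiff prefixes lines found only in the second input with "+ "
    return [line[2:] for line in delta if line.startswith("+ ")]
```

Swapping the two arguments lists the lines that were removed instead.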

> (2) Are there any current Schrodinger users running SGE 6.0 who are
> willing to talk off-list about their integration and experiences?
>
>
> I'll summarize anything I learn and if I can get permission I'll post
> details on Schrodinger/SGE integration specifics once we get things
> working as smoothly with 6.0 as it did with 5.3EE

That would be great! Having it under "Special Applications" in the
HOWTO would allow the community to improve it over time.

Regards,
Andreas


    [ Attachment: 5360.diff, text/plain, ~2.1 KB ]
