[GE users] [OT] Running R under SGE and OpenMPI

Sean Davis sdavis2 at mail.nih.gov
Thu Oct 9 19:58:03 BST 2008


On Thu, Oct 9, 2008 at 1:53 PM, Rayson Ho <rayrayson at gmail.com> wrote:
> Can you pass in "--mca pls_gridengine_verbose 1" to mpirun??
>
> According to the OpenMPI FAQ, this should add the -verbose flag to
> qrsh, and will help us debug the problem.

The job script looks like:
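#!/bin/bash
# SGE directives: run the job under bash, in the submission directory
#$ -S /bin/bash
#$ -cwd
# Start one "hostname" per granted slot; the MCA parameter asks OpenMPI to pass -verbose to qrsh
mpirun --mca pls_gridengine_verbose 1 -np $NSLOTS hostname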
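
To see which hosts and how many slots SGE actually granted before mpirun runs, the job script could log the contents of $PE_HOSTFILE, which SGE sets for jobs submitted to a parallel environment (e.g. qsub -pe orte 2). A minimal sketch, assuming the usual hostfile layout of one "host slots queue processor-range" line per host; summarize_pe_hostfile is a hypothetical helper, not part of SGE or OpenMPI:

```bash
#!/bin/bash
# Hypothetical helper: print each granted host with its slot count, then the total.
# Assumes the PE hostfile layout "host slots queue processor-range", one host per line.
summarize_pe_hostfile() {
    local hostfile="${1:-$PE_HOSTFILE}"
    if [[ -z "$hostfile" || ! -r "$hostfile" ]]; then
        echo "summarize_pe_hostfile: no readable PE hostfile" >&2
        return 1
    fi
    # Print "host slots" for every line that has at least two fields, summing slots.
    awk 'NF >= 2 { print $1, $2; total += $2 }
         END     { print "total", total + 0 }' "$hostfile"
}
```

Calling summarize_pe_hostfile just before the mpirun line would write the granted hosts to the job's output file, making it easier to match them against the host names in the error stream.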

And the output on the error stream:
> more junksub.sh.e3574
[shakespeare:05720] mca: base: component_find: unable to open ras tm: file not found (ignored)
[shakespeare:05720] mca: base: component_find: unable to open pls tm: file not found (ignored)
Starting server daemon at host "shakespeare.nci.nih.gov"
Starting server daemon at host "octopus.nci.nih.gov"
Server daemon successfully started with task id "1.shakespeare"
[shakespeare:05733] mca: base: component_find: unable to open ras tm: file not found (ignored)
[shakespeare:05733] mca: base: component_find: unable to open pls tm: file not found (ignored)
error: executing task of job 3576 failed: failed sending task to execd@octopus.nci.nih.gov: can't find connection
[shakespeare:05720] ERROR: A daemon on node octopus.nci.nih.gov failed to start as expected.
[shakespeare:05720] ERROR: There may be more information available from
[shakespeare:05720] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[shakespeare:05720] ERROR: If the problem persists, please restart the
[shakespeare:05720] ERROR: Grid Engine PE job
[shakespeare:05720] ERROR: The daemon exited unexpectedly with status 1.

However, the output streams are completely empty.

And if I log into shakespeare and run qrsh -q all.q@octopus, I get a slot
immediately, so there is no direct problem connecting between the two hosts.
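
An interactive qrsh into a queue instance is not quite the path OpenMPI takes: with tight integration, its gridengine launcher starts the remote daemons from inside the running job via qrsh -inherit, which depends on control_slaves TRUE in the PE (as in the orte PE quoted below). A rough sketch of exercising that path directly, assuming it is placed inside the same PE job script; check_inherit_hosts and the QRSH override variable are hypothetical names used only for illustration:

```bash
#!/bin/bash
# Hypothetical helper: run "hostname" on every granted host through qrsh -inherit,
# the SGE mechanism a tightly integrated mpirun uses to start its remote daemons.
# QRSH is a hypothetical override (default: qrsh) so the loop can be dry-run.
check_inherit_hosts() {
    local hostfile="${1:-$PE_HOSTFILE}"
    local qrsh_cmd="${QRSH:-qrsh}"
    local host slots rest failed=0
    if [[ -z "$hostfile" || ! -r "$hostfile" ]]; then
        echo "check_inherit_hosts: no readable PE hostfile" >&2
        return 2
    fi
    while read -r host slots rest; do
        [[ -z "$host" ]] && continue
        # Redirect stdin so qrsh cannot swallow the rest of the hostfile.
        if "$qrsh_cmd" -inherit "$host" hostname </dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done < "$hostfile"
    return "$failed"
}
```

If octopus shows up as FAIL here while an interactive qrsh to all.q@octopus still succeeds, that would suggest the problem lies in the qrsh -inherit / execd step rather than in general connectivity.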

Again, thanks.
Sean


> On 10/9/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Thu, Oct 9, 2008 at 12:42 PM, Reuti <reuti at staff.uni-marburg.de> wrote:
>> > Sean,
>> >
>> > On 09.10.2008 at 17:47, Sean Davis wrote:
>> >
>> >> On Thu, Oct 9, 2008 at 11:31 AM, Rayson Ho <rayrayson at gmail.com> wrote:
>> >>>
>> >>> On 10/9/08, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> >>>>
>> >>>> I can confirm that there is no --with-sge in the configure --help
>> >>>> output, even in 1.2.7.
>> >>>
>> >>> Yes, that's only in 1.3.
>> >>>
>> >>> BTW, how is your PE defined?? The error message tells you that the
>> >>> remote execd is not starting the remote tasks:
>> >>>
>> >>>  error: got no connection within 60 seconds
>> >>>  [octopus:21290] ERROR: A daemon on node shakespeare.nci.nih.gov failed to start as expected.
>> >
>> > Is there a firewall installed on the machines?
>>
>> No.  And shakespeare is the qmaster of an otherwise working cluster.
>>
>> Sean
>>
>> >> Thanks, Rayson.
>> >>
>> >>> qconf -sp orte
>> >>
>> >> pe_name            orte
>> >> slots              999
>> >> user_lists         NONE
>> >> xuser_lists        NONE
>> >> start_proc_args    /bin/true
>> >> stop_proc_args     /bin/true
>> >> allocation_rule    $round_robin
>> >> control_slaves     TRUE
>> >> job_is_first_task  FALSE
>> >> urgency_slots      min
>> >> accounting_summary FALSE
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> >> For additional commands, e-mail: users-help at gridengine.sunsource.net
>> >>
