[GE users] again on PVM and integration with sge

davide cittaro daweonline at gmail.com
Mon Apr 10 12:33:07 BST 2006


Hi Reuti,

On 4/10/06, Reuti <reuti at staff.uni-marburg.de> wrote:
> you mean you followed:
> http://gridengine.sunsource.net/howto/pvm-integration/pvm-integration.html?
> Did you also use the new pvm scripts from this Howto page? What is
> your definition of the corresponding PE?

I followed the instructions on that page, yes.
I downloaded these PVM scripts:
http://gridengine.sunsource.net/howto/pvm-integration/pvm60.tgz
as indicated at the bottom of the page.
The PE definition:

$ qconf -sp pvm
pe_name           pvm
slots             54
user_lists        AC
xuser_lists       NONE
start_proc_args   /opt/n1ge6/pvm/startpvm.sh $pe_hostfile $host  \
                  /usr/share/pvm3
stop_proc_args    /opt/n1ge6/pvm/stoppvm.sh $pe_hostfile $host
allocation_rule   1
control_slaves    FALSE
job_is_first_task TRUE
urgency_slots     min
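
With allocation_rule 1 each host in the pe_hostfile should carry exactly
one slot, so a quick way to see what startpvm.sh is actually handed is to
count hosts and slots in that file. Below is a minimal sketch, assuming
the usual pe_hostfile layout (hostname, slots, queue, processor range, one
host per line). The function name summarize_pe_hostfile is hypothetical
and not part of pvm60.tgz.

```shell
#!/bin/sh
# Hypothetical helper (not part of pvm60.tgz): summarize a GE pe_hostfile.
# Assumes the usual layout, one line per host:
#   <hostname> <slots> <queue> <processor range>
summarize_pe_hostfile() {
    # Count one host per line and add up the second column (slots).
    awk '{ hosts++; slots += $2 }
         END { printf "hosts=%d slots=%d\n", hosts, slots }' "$1"
}
```

Inside a job script it could be called as
summarize_pe_hostfile "$PE_HOSTFILE"; if the job asked for 4 slots, both
numbers would be expected to be 4 under this PE definition.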


>
> I'm confused, why your output contains node3 and node7, as the script
> will only run on the headnode of your parallel job.

Er... the headnode is not an execution host, just the submit (and master)
host. It seems that GE then takes node3 as the PVM master node or, at
least, tries to launch PVM from that node. node1-9 are not submit
hosts.

$ qconf -sel
node1.sge.ifom-ieo-campus.it
node2.sge.ifom-ieo-campus.it
node3.sge.ifom-ieo-campus.it
node4.sge.ifom-ieo-campus.it
node5.sge.ifom-ieo-campus.it
node6.sge.ifom-ieo-campus.it
node7.sge.ifom-ieo-campus.it
node8.sge.ifom-ieo-campus.it
node9.sge.ifom-ieo-campus.it

$ qconf -sh
node1.sge.ifom-ieo-campus.it
node2.sge.ifom-ieo-campus.it
node3.sge.ifom-ieo-campus.it
node4.sge.ifom-ieo-campus.it
node5.sge.ifom-ieo-campus.it
node6.sge.ifom-ieo-campus.it
node7.sge.ifom-ieo-campus.it
node8.sge.ifom-ieo-campus.it
node9.sge.ifom-ieo-campus.it
special.sge.ifom-ieo-campus.it

$ qconf -ss
special.sge.ifom-ieo-campus.it
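
The three listings above are the execution hosts (-sel), the
administrative hosts (-sh) and the submit hosts (-ss). To check the role
of a single host without reading the lists by eye, something like the
following could be used. It is only a sketch: host_in_list is a
hypothetical name, and it does nothing more than an exact whole-line match
on whatever list is piped into it.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the host named in $1 appears as a whole
# line in the list read from stdin (e.g. the output of a qconf list option).
host_in_list() {
    # -F: fixed string, -x: match the whole line, -q: exit status only
    grep -Fxq -- "$1"
}
```

For example:
qconf -sel | host_in_list node3.sge.ifom-ieo-campus.it && echo "exec host"
qconf -ss | host_in_list node3.sge.ifom-ieo-campus.it || echo "not a submit host"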

d

>
> -- Reuti
>
>
> Am 10.04.2006 um 12:41 schrieb davide cittaro:
>
> > Hi all, I still have a little problem with PVM (pvm 3.4) on Linux.
> > I have installed PVM on Gentoo; everything is in /usr/share/pvm3.
> > I can use pvm alone, without GE, just by executing
> > /usr/share/pvm3/lib/pvm hostfile, with a hostfile I created...
> > I followed the instructions given on the gridengine site to integrate
> > pvm with GE, but:
> >
> > dcittaro at special ~/pvmtest $ cat tester_loose.sh.po39603
> > /opt/n1ge6/omix/spool/node3/active_jobs/39603.1/pe_hostfile
> > node3.sge.ifom-ieo-campus.it /usr/share/pvm3
> > /tmp/pvmtmp015264.0
> > startpvm.sh: startup failed - invoking cleanup script
> > /opt/n1ge6/omix/spool/node3/active_jobs/39603.1/pe_hostfile
> > node3.sge.ifom-ieo-campus.it
> > /opt/n1ge6/omix/spool/node3/active_jobs/39603.1/pe_hostfile
> > node3.sge.ifom-ieo-campus.it
> > /opt/n1ge6/omix/spool/node7/active_jobs/39603.1/pe_hostfile
> > node7.sge.ifom-ieo-campus.it /usr/share/pvm3
> > /tmp/pvmtmp005727.0
> > startpvm.sh: startup failed - invoking cleanup script
> > /opt/n1ge6/omix/spool/node7/active_jobs/39603.1/pe_hostfile
> > node7.sge.ifom-ieo-campus.it
> > /opt/n1ge6/omix/spool/node7/active_jobs/39603.1/pe_hostfile
> > node7.sge.ifom-ieo-campus.it
> >
> > and
> >
> > dcittaro at special ~/pvmtest $ cat tester_loose.sh.pe39603
> > startpvm: Couldn't get all of the 4 requested hosts
> > rm: cannot remove `/tmp/39603.1.bofh.q/hostfile': No such file or
> > directory
> > libpvm [pid15289] /tmp/pvmd.2486: No such file or directory
> > libpvm [pid15289] /tmp/pvmd.2486: No such file or directory
> > libpvm [pid15289]: pvm_halt(): Can't contact local daemon
> > startpvm: Couldn't get all of the 4 requested hosts
> > rm: cannot remove `/tmp/39603.1.bofh.q/hostfile': No such file or
> > directory
> > libpvm [pid5752] /tmp/pvmd.2486: No such file or directory
> > libpvm [pid5752] /tmp/pvmd.2486: No such file or directory
> > libpvm [pid5752]: pvm_halt(): Can't contact local daemon
> >
> > I've seen that this happened to somebody on NetBSD, but there was not
> > much help...
> > How do you think I can fix this? What else should I check?
> >
> > Thanks
> >
> > d
> > --
> > dawe
> > http://dawe.ilbello.com
> > ---
> > "Prediction is very difficult, especially if it's about the future." -
> > Niels Bohr
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
>


--
dawe
http://dawe.ilbello.com
---
"Prediction is very difficult, especially if it's about the future." -
Niels Bohr

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



