No subject


Wed Jan 12 20:38:46 GMT 2011


environment isn't being set up properly (PVM_ROOT, etc.), but as I stated, I
put those settings in my dot files and it didn't help.
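The ps output further down shows the PE start script running under `/bin/sh -f`, and a Bourne shell never reads `.cshrc`. So it may be worth checking what a non-interactive sh actually sees, rather than relying on the dot files. Below is a minimal sketch. It assumes PVM's usual layout, where the daemon launcher sits at `$PVM_ROOT/lib/pvmd`. The function name `check_pvm_env` is hypothetical and is not part of PVM or Grid Engine.

```shell
#!/bin/sh
# check_pvm_env (hypothetical helper): report whether the PVM variables a
# job needs are visible in the current non-interactive shell.
# Assumption: the pvmd launcher script lives at $PVM_ROOT/lib/pvmd.
check_pvm_env() {
    rc=0
    if [ -z "${PVM_ROOT:-}" ]; then
        echo "PVM_ROOT is not set"
        rc=1
    elif [ ! -x "$PVM_ROOT/lib/pvmd" ]; then
        echo "no executable pvmd under $PVM_ROOT/lib"
        rc=1
    else
        echo "PVM_ROOT=$PVM_ROOT"
    fi
    if [ -z "${PVM_ARCH:-}" ]; then
        echo "PVM_ARCH is not set"
        rc=1
    else
        echo "PVM_ARCH=$PVM_ARCH"
    fi
    return $rc
}

# Report on the environment this shell inherited.
check_pvm_env || echo "PVM environment incomplete" >&2
```

You can run it from the job script, or through `rsh nodeNN` as in the ps check, to see what a non-interactive shell inherits. If the variables turn out to be missing there, one option is to pass them explicitly with `qsub -v PVM_ROOT=...,PVM_ARCH=...` rather than through `.cshrc`.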


On 6/15/06, Bernard Li <bli at bcgsc.ca> wrote:
>
>  Have you checked /tmp/pvm* for pvmd log messages?
>
> Cheers,
>
> Bernard
>
>  ------------------------------
> *From:* Greg A [mailto:clusterman at gmail.com]
> *Sent:* Thursday, June 15, 2006 12:48
> *To:* users at gridengine.sunsource.net
> *Subject:* [GE users] pvm tight integration help
>
> We are having some difficulty getting PVM tight integration to work and we
> are hoping someone can help.
>
> Our test grid has a parallel queue set up with a couple of different PVM
> environments defined: one to test loose integration and one to test tight
> integration.  We followed the recipe that Reuti wrote, but for some reason
> our qrsh hangs and the jobs don't start on the slave nodes.  Instead, the
> job is transferred to another node, and so on until all nodes error out.
>
> We are using the tester_tight script along with the hello code downloaded
> from the site.  We've also tried our own PVM scripts and code but haven't
> had any success there either.
>
> Here is the "ps" output I captured on the master node after submitting a
> PVM job.
>
> # qsub -pe pvm-sf 4 tester_tight.sh
> # rsh node01 ps -e f -o pid,ppid,pgrp,command --cols=100
>  2212     1  2212 [sge_execd]
> 12064  2212 12064  \_ [sge_shepherd]
> 12065 12064 12065      \_ /bin/sh -f /sge_root/pvm/startpvm.sh -catch_rsh
> 12075 12065 12065          \_ /sge_root/pvm/bin/lx24-x86/start_pvm -h 4 -n node05
> 12076 12075 12065              \_ [qrsh <defunct>]
>
> Our grid is running Red Hat 9 and our native PVM installation is version
> 3.4.4.  I thought this might be an issue because Reuti's recipe calls for
> version 3.4.5, so I went ahead and installed that in my home directory.  I
> then repointed the PVM environment to that installation, and it still failed
> and got stuck at the same place with the same "ps" output.  I've also
> updated my .cshrc with the proper PVM_ROOT and PVM_ARCH, thinking that the
> /etc/profile.d/pvm.csh installed by PVM 3.4.4 was causing the issue.  That
> didn't help, and I still get stuck at the qrsh <defunct> spot.
>
> I'm seeing very little information in the messages files, but here is an
> example of the repeated messages:
>
> 06/15/2006 11:36:39|execd|node05|W|reaping job "6506" ptf complains: Job does not exist
> 06/15/2006 12:26:25|execd|node05|E|shepherd of job 6507.1 exited with exit status = 10
>
> We'd really appreciate any help!
>
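The execd excerpt above uses Grid Engine's pipe-separated layout: timestamp|daemon|host|level|message. When a messages file is long, keeping only the warning (`W`) and error (`E`) entries makes the repeated shepherd failures easier to match against job IDs. Below is a minimal sketch that assumes every entry sits on one line in that five-field form. The function name `sge_problems` is hypothetical. The file's location depends on the execd spool setup, so the path is passed in as an argument.

```shell
#!/bin/sh
# sge_problems (hypothetical helper): print only W (warning) and E (error)
# entries from a Grid Engine messages file.
# Assumes one entry per line: date time|daemon|host|level|message
# Usage: sge_problems MESSAGES_FILE [JOB_ID]
# The optional JOB_ID is a plain substring match against the whole line.
sge_problems() {
    file=$1
    job=${2:-}
    awk -F'|' -v job="$job" '
        # field 4 is the severity letter; print the line when it is W or E
        # and, if a job ID was given, when the line mentions it
        ($4 == "W" || $4 == "E") && (job == "" || index($0, job) > 0)
    ' "$file"
}
```

For example, `sge_problems path/to/messages 6507` would print only the shepherd line for job 6507. Because the job filter is a substring match, `6507` would also match a job `65070`. Tighten the pattern if that matters.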


