No subject


Wed Jan 12 20:38:46 GMT 2011


"/proc/kcore is like an "alias" for the memory in your computer."



On 3/22/06, Joe Landman <landman at scalableinformatics.com> wrote:
> Ok, awake again ...
>
> Kim Leng Goh wrote:
> > Step 3 didn't work.
> >
> > $ qstat -f
> > denied: host "network-0-0.local" is neither submit nor admin host
>
> Hmmm.... this suggests that it's the head node that has problems and not
> the compute node.
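
One way to check that suspicion from the head node is to look at the submit and admin host lists directly. A minimal sketch, assuming qconf is in PATH on the head node; host_listed is a helper name made up here, and it only filters whatever list is piped into it:

```shell
#!/bin/sh
# host_listed HOST: succeed if HOST appears as a whole line on stdin
# (case-insensitive, fixed-string match). Hypothetical helper, not part of SGE.
host_listed() {
    grep -qixF -- "$1"
}

# Intended use on the head node (needs SGE's qconf; not run here):
#   qconf -ss | host_listed network-0-0.local || echo "not a submit host"
#   qconf -sh | host_listed network-0-0.local || echo "not an admin host"
```

If the host is missing from both lists, qconf -as and qconf -ah are the usual way to add it, but check first why the head node is reporting that name at all.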
>
> Sledgehammer again: copy this to a file, make it executable, and then
> run it.
>
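> #!/usr/bin/perl
> use strict;
> use warnings;
> # List regular files under /opt/gridengine that mention network-0-0.
> my @files = `find /opt/gridengine -type f`;
> foreach my $file (@files)
>   {
>    chomp($file);
>    # grep -q prints nothing; exit status 0 means the pattern was found
>    printf "%s\n", $file if (system("grep", "-qi", "network-0-0", $file) == 0);
>   }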
>
>
> This will run through all of your files in /opt/gridengine (could be a
> large number), and print out the names of any of them which have
> network-0-0 within them.  Somewhat like a recursive grep, only reporting
> whether each file matches rather than printing the matching lines.
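
A shorter route to the same list, assuming a grep that supports -r (GNU and BSD grep do; strict POSIX grep does not). The function name find_host_refs is made up for this sketch:

```shell
#!/bin/sh
# find_host_refs PATTERN DIR: print the names of files under DIR that
# contain PATTERN.
#   -r  recurse into subdirectories
#   -l  print only the names of matching files
#   -i  ignore case
#   -F  treat PATTERN as a fixed string, not a regular expression
find_host_refs() {
    grep -rliF -- "$1" "$2"
}

# Example (not run here): find_host_refs network-0-0 /opt/gridengine
```

Like the Perl script, this only tells you which files mention the name; you still have to open each one to see in what context.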
>
>
>
> > Only thing I have not tried is restarting rcsge on the head node.
> > Since adding back the compute-0-7 node is less a priority than
> > ensuring the integrity of the jobs on the other nodes, I hope to find
> > another solution.
>
> Are the other jobs going to terminate soon?  SGE 5 didn't like being
> reset with jobs running as I remember.
>
> I might suggest that it is time to upgrade to the newer SGE.  You can
> install it "alongside" the 5.3 by hand.  Put it in /opt/sge6.0u7 or
> something like that.  Use a different set of ports in /etc/services.
> Add in compute-0-7 by installing the newer sge there as well, and
> turning it on.  As other nodes finish their jobs, switch them.
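
Before picking the new ports it is worth confirming they are not already claimed in /etc/services. A rough sketch; port_in_use is a made-up helper, and 6444/6445 are only the ports I have usually seen used for sge_qmaster and sge_execd with SGE 6, so treat them as an assumption and use whatever is free at your site:

```shell
#!/bin/sh
# port_in_use PORT [SERVICES_FILE]: succeed if PORT/tcp is already assigned
# in the services file (default /etc/services). Comment lines are skipped.
port_in_use() {
    awk -v p="$1/tcp" '
        $1 !~ /^#/ && $2 == p { found = 1 }
        END { exit !found }
    ' "${2:-/etc/services}"
}

# Example (not run here):
#   for port in 6444 6445; do
#       port_in_use "$port" && echo "$port/tcp is already taken"
#   done
```

Whatever ports you settle on have to match on the head node and on every compute node you move over to the new install.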
>
> I wish I had a better answer.
>
> Joe
>
> >
> > On 3/22/06, Kim Leng Goh <kimleng.goh at gmail.com> wrote:
> >
> >>Wow, it's amazing that a normal user (not root) can execute
> >>/boot/kickstart/cluster-kickstart successfully:
> >>
> >>On 3/22/06, Joe Landman <landman at scalableinformatics.com> wrote:
> >>[...]
> >>
> >>>3) if the above doesn't work, then you need to go to the tactical
> >>>battlefield weaponry and re-install that compute node.  Remove all
> >>>traces of compute-0-7 from the sge directories on the head node using
> >>>the qconf tools or the qmon tool, and restart sge on the head node.
> >>>
> >>>Note:  before you do the following, THIS WILL DESTROY DATA AND
> >>>CONFIGURATION FILES ON THE COMPUTE NODE.   Don't do this unless you have
> >>>no other choice.  Rocks will automatically set most everything up for
> >>>you, but if you have made changes, you are going to need to re-replicate
> >>>those changes.
> >>>
> >>>Only if you are really sure you want to do this.  Remember, data and
> >>>other bits will be forever lost from compute-0-7 if you follow these steps.
> >>>
> >>>Once you are really, really sure you want to do this, log onto the
> >>>compute-0-7 (DO NOT TYPE THIS ON THE HEAD NODE!!!!!!!) and type
> >>>
> >>>        /boot/kickstart/cluster-kickstart
> >>>
> >>>and then step away.
> >>>
> >>>The node will be rebuilt.  SGE will be re-installed.  Everything should
> >>>work.
> >>
> >>
> >>$ /boot/kickstart/cluster-kickstart
> >>Shutting down kernel logger:                               [  OK  ]
> >>Shutting down system logger:                               [  OK  ]
> >>
> >>Broadcast message from root (pts/0) (Wed Mar 22 14:42:28 2006):
> >>
> >>The system is going down for reboot NOW!
> >>
> >>
> >>
> >>>Unless the Rocks database is toasted.  Or the distribution has been
> >>>damaged.  But you have bigger worries if this is the case.
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web  : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 734 786 8452
> cell : +1 734 612 4615
>
