[GE users] How to clear internal hostname cache?

Joe Landman landman at scalableinformatics.com
Wed Mar 22 14:20:50 GMT 2006

Ok, awake again ...

Kim Leng Goh wrote:
> Step 3 didn't work.
> 
> $ qstat -f
> denied: host "network-0-0.local" is neither submit nor admin host

Hmm... this suggests that it's the head node that has the problem, not
the compute node.

Sledgehammer again: copy this to a file, make it executable, and then 
run it.

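#!/usr/bin/perl
use strict;
use warnings;

# List every regular file under /opt/gridengine (chomp strips the newlines)
chomp(my @files = `find /opt/gridengine -type f`);
foreach my $file (@files)
  {
   # grep -q: exit status only; -i: ignore case; -s: no errors on unreadable files
   print "$file\n" if system('grep', '-qis', 'network-0-0', $file) == 0;
  }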


This will run through all of the files in /opt/gridengine (which could be
a large number) and print the name of each one that contains network-0-0.
It is essentially a recursive grep that only reports whether a file
matches, not the matching lines themselves.
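For comparison, here is a sketch of the same search done inside a single Perl process with the core File::Find module, rather than starting one grep per file. It assumes only core Perl (5.10 or later for the // operator); the files_containing helper name and the optional command-line arguments are additions for illustration, with defaults matching the script above.

```perl
#!/usr/bin/perl
# Sketch: print every regular file under a directory tree whose contents
# contain a given string, matched literally and case-insensitively.
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns the sorted list of matching file paths.
sub files_containing {
    my ($root, $needle) = @_;
    my $re = qr/\Q$needle\E/i;    # \Q...\E: treat the string literally
    my @hits;
    find({
        no_chdir => 1,            # keep $File::Find::name usable as a full path
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path && -r _;    # regular, readable files only
            open(my $fh, '<:raw', $path) or return;
            while (my $line = <$fh>) {
                if ($line =~ $re) {            # first match is enough
                    push @hits, $path;
                    last;
                }
            }
            close($fh);
        },
    }, $root);
    return sort @hits;
}

# Run as a script: defaults match the script above
unless (caller) {
    my $root   = shift(@ARGV) // '/opt/gridengine';
    my $needle = shift(@ARGV) // 'network-0-0';
    print "$_\n" for files_containing($root, $needle);
}
```

Reading each file in-process avoids forking thousands of grep processes on a large tree, and stopping at the first match keeps the "binary indicator" behavior of the original.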

	

> Only thing I have not tried is restarting rcsge on the head node.
> Since adding back the compute-0-7 node is less a priority than
> ensuring the integrity of the jobs on the other nodes, I hope to find
> another solution.

Are the other jobs going to finish soon?  As I remember, SGE 5 did not
like being restarted while jobs were running.

I would suggest that it is time to upgrade to the newer SGE.  You can
install it by hand "alongside" 5.3: put it in /opt/sge6.0u7 or something
like that, and give it a different set of ports in /etc/services so the
two installations do not collide.  Add compute-0-7 by installing the
newer SGE there as well and starting it.  As the other nodes finish their
jobs, switch them over.
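Before choosing ports for the second installation, it may help to check which ones are already claimed. A minimal sketch, assuming the services database is the usual /etc/services: it uses Perl's built-in getservbyname and getservbyport to show the ports currently registered for sge_qmaster and sge_execd (the service names SGE looks up) and whether some candidate ports are free. The candidate numbers 5536 and 5537 are hypothetical placeholders, not recommended values, and the helper names are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: inspect the services database before picking ports for a
# second SGE installation. Candidate ports below are hypothetical.
use strict;
use warnings;

# Hypothetical helper: TCP port registered for a service name, or undef.
sub registered_port {
    my ($service) = @_;
    my @ent = getservbyname($service, 'tcp');   # (name, aliases, port, proto)
    return @ent ? $ent[2] : undef;
}

# Hypothetical helper: service name already using a TCP port, or undef
# if the services database has no entry for it.
sub port_owner {
    my ($port) = @_;
    my $name = getservbyport($port, 'tcp');     # scalar context: name only
    return $name;
}

unless (caller) {
    for my $svc (qw(sge_qmaster sge_execd)) {
        my $p = registered_port($svc);
        printf "%-12s %s\n", $svc, defined $p ? "tcp/$p" : 'not registered';
    }
    # Hypothetical candidates; pass your own port numbers as arguments
    for my $port (@ARGV ? @ARGV : (5536, 5537)) {
        my $owner = port_owner($port);
        printf "port %-5d %s\n", $port, defined $owner ? "taken by $owner" : 'free';
    }
}
```

A port reported as free here only means /etc/services has no entry for it; it says nothing about whether some running process is already listening on it.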

I wish I had a better answer.

Joe

> 
> On 3/22/06, Kim Leng Goh <kimleng.goh at gmail.com> wrote:
> 
>>Wow, it's amazing that a normal user (not root) can execute
>>/boot/kickstart/cluster-kickstart successfully:
>>
>>On 3/22/06, Joe Landman <landman at scalableinformatics.com> wrote:
>>[...]
>>
>>>3) if the above doesn't work, then you need to go to the tactical
>>>battlefield weaponry and re-install that compute node.  Remove all
>>>traces of compute-0-7 from the sge directories on the head node using
>>>the qconf tools or the qmon tool, and restart sge on the head node.
>>>
>>>Note:  before you do the following, THIS WILL DESTROY DATA AND
>>>CONFIGURATION FILES ON THE COMPUTE NODE.   Don't do this unless you have
>>>no other choice.  Rocks will automatically set most everything up for
>>>you, but if you have made changes, you are going to need to re-replicate
>>>those changes.
>>>
>>>Only if you are really sure you want to do this.  Remember, data and
>>>other bits will be forever lost from compute-0-7 if you follow these steps.
>>>
>>>Once you are really, really sure you want to do this, log onto the
>>>compute-0-7 (DO NOT TYPE THIS ON THE HEAD NODE!!!!!!!) and type
>>>
>>>        /boot/kickstart/cluster-kickstart
>>>
>>>and then step away.
>>>
>>>The node will be rebuilt.  SGE will be re-installed.  Everything should
>>>work.
>>
>>
>>$ /boot/kickstart/cluster-kickstart
>>Shutting down kernel logger:                               [  OK  ]
>>Shutting down system logger:                               [  OK  ]
>>
>>Broadcast message from root (pts/0) (Wed Mar 22 14:42:28 2006):
>>
>>The system is going down for reboot NOW!
>>
>>
>>
>>>Unless the Rocks database is toasted.  Or the distribution has been
>>>damaged.  But you have bigger worries if this is the case.
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
