[GE users] How to clear internal hostname cache?

Joe Landman landman at scalableinformatics.com
Wed Mar 22 14:20:50 GMT 2006


Ok, awake again ...

Kim Leng Goh wrote:
> Step 3 didn't work.
> $ qstat -f
> denied: host "network-0-0.local" is neither submit nor admin host

Hmmm... this suggests that it's the head node that has the problem, not 
the compute node.

Sledgehammer again: copy this to a file, make it executable, and then 
run it.

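#!/usr/bin/perl
# Print every file under /opt/gridengine that mentions network-0-0.
@files = `find /opt/gridengine -type f`;
foreach $file (@files) {
    chomp($file);    # drop the newline find leaves on each name
    chomp($grep = `grep -i network-0-0 "$file"`);
    printf "%s\n", $file if ($grep ne "");
}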

This will walk through every file under /opt/gridengine (possibly a 
large number) and print the name of each one that contains network-0-0, 
ignoring case.  It is essentially a recursive grep that only reports 
whether a file matches, not the matching lines themselves.
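For comparison, grep can do the same search in a single pass.  This is a 
minimal sketch, assuming GNU grep (or another grep with -r); 
find_mentions is a hypothetical helper name, not an SGE command:

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR ($1) whose contents mention
# PATTERN ($2), the same names the Perl loop prints.
#   -r  recurse into DIR
#   -l  print only the names of matching files
#   -i  ignore case, as the Perl version does
find_mentions() {
    grep -rli -- "$2" "$1"
}

# Usage: find_mentions /opt/gridengine network-0-0
```

Because it runs one grep process instead of one per file, it should be 
noticeably faster on a large /opt/gridengine tree.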


> Only thing I have not tried is restarting rcsge on the head node.
> Since adding back the compute-0-7 node is less a priority than
> ensuring the integrity of the jobs on the other nodes, I hope to find
> another solution.

Are the other jobs going to finish soon?  As I remember, SGE 5 didn't 
like being restarted with jobs running.

I might suggest that it is time to upgrade to the newer SGE.  You can 
install it by hand "alongside" 5.3: put it in /opt/sge6.0u7 or something 
like that, and give it its own set of ports in /etc/services so the two 
installations don't collide.  Add compute-0-7 by installing the newer 
SGE there as well and turning it on.  As the other nodes finish their 
jobs, switch them over.
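A minimal sketch of adding the new entries without clobbering the 5.3 
ones, assuming the 6.0 install looks up the service names sge_qmaster 
and sge_execd, and using the IANA-registered ports 6444 and 6445.  
port_taken and add_service are hypothetical helpers, not part of SGE.  
They refuse to add an entry whose port is already owned by another 
service, such as the existing 5.3 install:

```shell
#!/bin/sh
# Hypothetical helper: succeed if PORT ($2) is already assigned as
# PORT/tcp in the services file ($1). Field 2 of a services line is
# "port/protocol"; lines starting with '#' are skipped.
port_taken() {
    awk -v want="$2/tcp" '
        $1 !~ /^#/ && $2 == want { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Hypothetical helper: append "NAME PORT/tcp" to the services file,
# but only if no other service already uses that port.
add_service() {
    services_file=$1
    name=$2
    port=$3
    if port_taken "$services_file" "$port"; then
        echo "port $port/tcp already in use in $services_file" >&2
        return 1
    fi
    printf '%s\t%s/tcp\n' "$name" "$port" >> "$services_file"
}
```

As root, something like "add_service /etc/services sge_qmaster 6444" 
and "add_service /etc/services sge_execd 6445" would run before the 6.0 
installer.  The service names are an assumption; use whatever names 
the installer actually asks for.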

I wish I had a better answer.


> On 3/22/06, Kim Leng Goh <kimleng.goh at gmail.com> wrote:
>>Wow, it's amazing that a normal user (not root) can execute
>>/boot/kickstart/cluster-kickstart successfully:
>>On 3/22/06, Joe Landman <landman at scalableinformatics.com> wrote:
>>>3) if the above doesn't work, then you need to go to the tactical
>>>battlefield weaponry and re-install that compute node.  Remove all
>>>traces of compute-0-7 from the sge directories on the head node using
>>>the qconf tools or the qmon tool, and restart sge on the head node.
>>>Note:  before you do the following, THIS WILL DESTROY DATA AND
>>>CONFIGURATION FILES ON THE COMPUTE NODE.   Don't do this unless you have
>>>no other choice.  Rocks will automatically set most everything up for
>>>you, but if you have made changes, you are going to need to re-replicate
>>>those changes.
>>>Only if you are really sure you want to do this.  Remember, data and
>>>other bits will be forever lost from compute-0-7 if you follow these steps.
>>>Once you are really, really sure you want to do this, log onto the
>>>compute-0-7 (DO NOT TYPE THIS ON THE HEAD NODE!!!!!!!) and type
>>>        /boot/kickstart/cluster-kickstart
>>>and then step away.
>>>The node will be rebuilt.  SGE will be re-installed.  Everything should
>>$ /boot/kickstart/cluster-kickstart
>>Shutting down kernel logger:                               [  OK  ]
>>Shutting down system logger:                               [  OK  ]
>>Broadcast message from root (pts/0) (Wed Mar 22 14:42:28 2006):
>>The system is going down for reboot NOW!
>>>Unless the Rocks database is toasted.  Or the distribution has been
>>>damaged.  But you have bigger worries if this is the case.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615

