[GE users] sge master dying

Iwona Sakrejda isakrejda at lbl.gov
Fri Jun 22 00:46:53 BST 2007



Hi,

Please see my replies below.


Andreas.Haas at Sun.COM wrote:
>  ... actually to be entirely sure you pass each hostname to
>
>    $SGE_ROOT/utilbin/<arch>/gethostbyname
>
> as to check whether C library call gethostbyname() can also cope with 
> them. If that works fine, we can be sure it is not so trivial ...
I checked all the hosts in all the hostgroups that way and they resolve 
fine. Here is how I did it:
 cat @athlon03|grep hostlist|sed 's/ /\n/'g|grep pc|awk '{print "/common/sge/6.0u4/utilbin/lx24-x86/gethostbyname "$1}'|sh
I repeated this group by group, which I hope eliminates possible typos (all 
my host names start with "pc").

For each host I get output like this:

Host Address(es): 128.55.37.73
Hostname: pc2834.nersc.gov
Aliases:  pc2834
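
The same check can be written as a small script that reports only the hosts that fail. This is a minimal sketch, not part of the procedure above: the function names hostgroup_hosts and check_hosts are made up for illustration, and it assumes the resolver utility exits with a non-zero status when a name does not resolve, which is not verified here.

```shell
#!/bin/sh
# Hypothetical helpers, named here for illustration only.

# Print one hostname per line from the "hostlist" entry of a hostgroup
# file or of saved "qconf -shgrp" output. Lines continued with a
# trailing backslash are followed.
hostgroup_hosts() {
    awk '
        /^hostlist/ { in_list = 1; $1 = "" }
        in_list {
            cont = ($NF == "\\")
            for (i = 1; i <= NF; i++)
                if ($i != "" && $i != "\\") print $i
            in_list = cont
        }
    ' "$1"
}

# Run a resolver command on every host in the file and report failures.
# $1: resolver, e.g. /common/sge/6.0u4/utilbin/lx24-x86/gethostbyname
# $2: hostgroup file
# ASSUMPTION: the resolver exits non-zero when a name does not resolve.
check_hosts() {
    resolver=$1
    file=$2
    rc=0
    for h in $(hostgroup_hosts "$file"); do
        if ! "$resolver" "$h" >/dev/null 2>&1; then
            echo "FAILED: $h"
            rc=1
        fi
    done
    return $rc
}
```

It would be called as, for example, check_hosts /common/sge/6.0u4/utilbin/lx24-x86/gethostbyname @debug, and prints nothing when every host resolves.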

Some of my hostgroups (not the ones that I attempted to modify) are 
empty and that's ok, right?
>
> Actually I'm curious to see the host group before the 'qconf -mhgrp' 
> change
> and the new host group configuration. 
I tried two different hostgroups on 4 or so occasions and the master 
crashed every time.
I did not experiment further because that upsets users.

The hostgroup I was trying to modify looks as follows:

[root at pc2533 hostgroups]# cat @debug
# Version: SGE 6.0u4
#
# DO NOT MODIFY THIS FILE MANUALLY!
#
group_name  @debug
hostlist    pc2632.nersc.gov pc2104.nersc.gov pc0920.nersc.gov pc0922.nersc.gov pc0928.nersc.gov

and qconf -shgrp @debug shows it as:

pc2609 74% qconf -shgrp @debug
group_name @debug
hostlist pc2632.nersc.gov pc2104.nersc.gov pc0920.nersc.gov pc0922.nersc.gov \
         pc0928.nersc.gov

I was trying to append a space followed by pc2302.nersc.gov to the hostlist line above.

Incidentally, when I made a typo (a "," instead of " " as the separator), I got
an error message about it and the master survived without any 
problems.
> I guess you are using that host group in a cluster queue configuration.
Yes.
>
> Regards,
> Andreas
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
