[GE users] Install problem setting up GE 6.1 u2 on Fedora 8

Daniel Templeton Dan.Templeton at sun.com
Tue Nov 27 02:03:53 GMT 2007


Try just running the sge_qmaster process with dl 1 set.  It should tell 
you exactly what's wrong.
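
If the debug output confirms a port conflict (the strace quoted below shows bind() on port 48620 failing with EADDRINUSE), something like the following rough sketch can list what is sitting on that port. It assumes Linux and the /proc/net/tcp table layout (IPv4 only); port_hex and port_sockets are made-up helper names for illustration, not part of Grid Engine.

```shell
#!/bin/sh
# Sketch: report sockets bound to a given local TCP port by reading the
# Linux socket table. Assumes the /proc/net/tcp layout (IPv4 only).

# Convert a decimal port to the 4-digit uppercase hex used in /proc/net/tcp.
port_hex() {
    printf '%04X\n' "$1"
}

# Print the local address and state of every socket on port $1.
# $2 optionally names a table file (defaults to /proc/net/tcp).
# Returns 0 if at least one socket was found, 1 otherwise.
port_sockets() {
    hex=$(port_hex "$1")
    awk -v suffix=":$hex" '
        NR > 1 && substr($2, length($2) - 4) == suffix {
            print $2, $4    # local_address and state (0A means LISTEN)
            found = 1
        }
        END { exit !found }
    ' "${2:-/proc/net/tcp}"
}
```

Running "port_sockets 48620" before starting sge_qmaster should print nothing and return non-zero if the port is free. Where they are installed, "netstat -tlnp" or "lsof -i :48620" (run as root) will also name the process that owns the socket.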

Daniel

Bruce Rothermal wrote:
> Found it: bin/sge_qmaster
>
> The strace shows it trying to bind to address 0.0.0.0:
>
> uname({sys="Linux", node="strat", ...}) = 0
> getrlimit(RLIMIT_NOFILE, {rlim_cur=8*1024, rlim_max=8*1024}) = 0
> socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 4
> setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(4, {sa_family=AF_INET, sin_port=htons(48620), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
> shutdown(4, 2 /* send and receive */)   = -1 ENOTCONN (Transport endpoint is not connected)
> close(4)                                = 0
> write(2, "    81   9758 46912496256640 ", 29    81   9758 46912496256640 ) = 29
> write(2, "    ../daemons/qmaster/sge_qmast"..., 96    ../daemons/qmaster/sge_qmaster_main.c 328 abort qmaster startup due to communication errors
> ) = 96
> open("/tmp/sge_messages", O_WRONLY|O_CREAT|O_APPEND, 0666) = 4
> write(4, "11/26/2007 18:10:40|qmaster|stra"..., 86) = 86
> close(4)                                = 0
> exit_group(1)                           = ?
>
> I'll have to look in the source to find out which function it is using 
> to get the address. It's probably a configuration problem. I've worked 
> mostly with Solaris for the past 11 years; Fedora is new to me.
>
> Bruce
>
> Daniel Templeton wrote:
>> Nope.  You should source the dl.[c]sh file and then run "dl 1".  Or, 
>> instead, you could keep your SGE_DEBUG_LEVEL setting and additionally 
>> set the SGE_ND env var to 1.  The dl script is easier to deal with, 
>> though.
>>
>> Daniel
>>
>> Bruce Rothermal wrote:
>>> Thanks Daniel
>>>
>>> I set the env var SGE_DEBUG_LEVEL="2 0 0 0 0 0 0 0"; export 
>>> SGE_DEBUG_LEVEL
>>>
>>> Then I ran install_qmaster.
>>>
>>> Everything runs the same with no debug output to the terminal. Is it 
>>> supposed to go to some log file or the terminal?
>>>
>>> Bruce
>>>
>>>
>>> Daniel Templeton wrote:
>>>> When I have that problem, it's normally because the qmaster's port 
>>>> is taken by some other application.  You can see what's going on by 
>>>> setting debug level 1.  See:
>>>>
>>>> http://blogs.sun.com/templedf/entry/using_debugging_output
>>>>
>>>> for details on setting debug levels.
>>>>
>>>> Daniel
>>>>
>>>> Bruce Rothermal wrote:
>>>>> I'm running through the install process and I seem to be hanging 
>>>>> at the point of:
>>>>>
>>>>>> Grid Engine qmaster and scheduler startup
>>>>>> -----------------------------------------
>>>>>>
>>>>>> Starting qmaster and scheduler daemon. Please wait ...
>>>>>>    starting sge_qmaster
>>>>>
>>>>> I am using all default parameters in the install script, except 
>>>>> that I am installing as root with sge_qmaster 48620/tcp and 
>>>>> sge_execd 48621/tcp. Does anybody have suggestions for how to figure 
>>>>> out what is hung here? I think it is at the point where qmaster is 
>>>>> being started. I've changed the script default/common/sgemaster 
>>>>> right after it is created to print out debug info, and it shows I'm 
>>>>> looping at this point in the script:
>>>>>> + masterhost=strat
>>>>>> ++ expr 9 + 1
>>>>>> + loop=10
>>>>>> + '[' false = false -a 10 -ne 30 ']'
>>>>>> + /sge/bin/lx24-amd64/qping -info strat 48620 qmaster 1
>>>>>> + '[' 1 = 0 ']'
>>>>>> + sleep 2
>>>>>> ++ cat /sge/default/common/act_qmaster
>>>>>> + masterhost=strat
>>>>>> ++ expr 10 + 1
>>>>>> + loop=11
>>>>>> + '[' false = false -a 11 -ne 30 ']'
>>>>>> + /sge/bin/lx24-amd64/qping -info strat 48620 qmaster 1
>>>>>> + '[' 1 = 0 ']'
>>>>>> + sleep 2
>>>>>> ++ cat /sge/default/common/act_qmaster
>>>>>> + masterhost=strat
>>>>>> ++ expr 11 + 1
>>>>>> + loop=12
>>>>> I've attached a strace file of the /sge/bin/lx24-amd64/qping -info 
>>>>> strat 48620 qmaster 1 which is looping and it shows it is timing out.
>>>>>
>>>>> Anyone know which process it is trying to ping so I can trace it, 
>>>>> or any other ideas?
>>>>>
>>>>> Thanks for any help
>>>>>
>>>>> Bruce Rothermal
>>>>>
>>>>> ------------------------------------------------------------------------ 
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net