[GE users] qmake: "waiting for child failed: timeout"

Jan Behrend jbehrend at mpifr-bonn.mpg.de
Tue Dec 20 13:50:21 GMT 2005



Hi Joachim,

I found the problem: it actually is the starter_method.  Do you have
any suggestions on how to change the starter_method so that it does not
truncate the command line?  I cannot use the -v or -V parameters for the
environment setup, since I submit from a different architecture than the
one I compile on, so the SGE binary path (lx24-amd64 or lx24-x86) must
be set during the login procedure.  This environment setup problem has
been addressed several times on this list; the best solution, I thought,
was the starter_method.
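
My guess is that the unquoted $* in the script is what breaks the
command line: the shell splits it again on whitespace, so the single
string handed to "/bin/sh -c" falls apart into separate words.  Here is
a minimal sketch of the difference.  The function names (show_args,
starter_unquoted, starter_quoted) are made up for illustration and are
not part of SGE:

```shell
#!/bin/bash
# Illustration only: hypothetical helper names, no SGE involved.

# Print every argument received on its own line, wrapped in brackets.
show_args() {
    local arg
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Mimics the current starter method (exec $*): the unquoted $* is
# split again on whitespace, so multi-word arguments are torn apart.
starter_unquoted() {
    show_args $*
}

# Candidate replacement (exec "$@"): every argument is passed through
# exactly as it was received.
starter_quoted() {
    show_args "$@"
}

# Same call as qmake issues for a compile step, with a shortened command.
starter_unquoted /bin/sh -c 'if true; then echo ok; fi'
starter_quoted   /bin/sh -c 'if true; then echo ok; fi'
```

The first call prints the "if ... fi" string as several separate
words, the second keeps it as one argument.  If this really is the
cause, changing the last line of the starter method from exec $* to
exec "$@" might be enough, but I have not verified that under qmake.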

Cheers, Jan

 Joachim Gabler wrote:
> Hi Jan,
> 
> the commandline generated by qmake looks ok, but the commandline
> arguments seem to get truncated somewhere:
> It executes rm, but the options to rm seem to be missing.
> The same happened in your previous example - it executed /bin/sh -c,
> but without further options.
> 
> I do not know the beowulf integration - there seems to be some mechanism
> (starter_method?), that outputs the
> "Starting server daemon ..." lines.
> 
> Maybe this script truncates the commandline?
> 
> Do simple qrsh jobs run correctly (on the same host), e.g.
> qrsh -l h=atom403.beowulf.iri.mpifr-bonn.mpg.de rm -rf /tmp/blablabla
> 
>   Joachim
> 
> Jan Behrend schrieb:
> 
>> Hi Joachim,
>>
>> thanks for the quick answer.  See my comments below ...
>>
>> Joachim Gabler wrote:
>>  
>>
>>> Jan,
>>>
>>> Jan Behrend wrote:
>>>
>>>   
>>>
>>>> Hello,
>>>>
>>>> I am using sge 6.0u7 and have the default PE like this:
>>>> $ qconf -sp make
>>>> pe_name           make
>>>> slots             999
>>>> user_lists        NONE
>>>> xuser_lists       NONE
>>>> start_proc_args   NONE
>>>> stop_proc_args    NONE
>>>> allocation_rule   $round_robin
>>>> control_slaves    TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots     min
>>>>
>>>> When I test the PE with the following qmake command on the standard
>>>> gnupg sources, everything looks fine until I get the error shown below.
>>>> The environment (PATH, etc.) is set up correctly (I hope).
>>>>
>>>> gnupg-1.4.1$ qmake -verbose -cwd -l arch=lx24-x86 -pe make 2 -- 2>&1
>>>>
>>>>
>>>>     
>>>
>>> If you start qmake with the -verbose option, you will see messages
>>> like
>>> waiting for child failed: timeout
>>> This just means that qmake waited for the termination of a child (with a
>>> certain timeout), and the child didn't finish before the timeout.
>>> This is normal behaviour, just a verbose message.
>>>
>>>   
>>>
>>>> gcc: -c: line 2: syntax error: unexpected end of file
>>>>
>>>>
>>>>     
>>>
>>> This is a build error.
>>> Is this reproducible?
>>>   
>>
>> Yes, it is.
>>
>> When I do a clean target for example, I get the following:
>>
>> atom403.beowulf.iri.mpifr-bonn.mpg.de
>> starting job:
>> args[  0] = qrsh
>> args[  1] = -noshell
>> args[  2] = -verbose
>> args[  3] = -inherit
>> args[  4] = -nostdin
>> args[  5] = -cwd
>> args[  6] = -v
>> args[  7] = LS_COLORS,QRSH_COMMAND,MFLAGS,MAKEFLAGS,MAKELEVEL
>> args[  8] = atom403.beowulf.iri.mpifr-bonn.mpg.de
>> args[  9] = rm
>> args[ 10] = -rf
>> args[ 11] = gpg.aux
>> args[ 12] = gpg.cp
>> args[ 13] = gpg.cps
>> args[ 14] = gpg.fn
>> args[ 15] = gpg.fns
>> args[ 16] = gpg.ky
>> args[ 17] = gpg.kys
>> args[ 18] = gpg.log
>> args[ 19] = gpg.pg
>> args[ 20] = gpg.pgs
>> args[ 21] = gpg.tmp
>> args[ 22] = gpg.toc
>> args[ 23] = gpg.tp
>> args[ 24] = gpg.tps
>> args[ 25] = gpg.vr
>> args[ 26] = gpg.vrs
>> args[ 27] = gpg.dvi
>> args[ 28] = gpg.pdf
>> args[ 29] = gpg.ps
>> args[ 30] = gpg.html
>> args[ 31] = gpgv.aux
>> args[ 32] = gpgv.cp
>> args[ 33] = gpgv.cps
>> args[ 34] = gpgv.fn
>> args[ 35] = gpgv.fns
>> args[ 36] = gpgv.ky
>> args[ 37] = gpgv.kys
>> args[ 38] = gpgv.log
>> args[ 39] = gpgv.pg
>> args[ 40] = gpgv.pgs
>> args[ 41] = gpgv.tmp
>> args[ 42] = gpgv.toc
>> args[ 43] = gpgv.tp
>> args[ 44] = gpgv.tps
>> args[ 45] = gpgv.vr
>> gmake requesting status of dead child processes
>> waiting for child failed: timeout
>> gmake requesting status of dead child processes
>> args[ 46] = gpgv.vrs
>> args[ 47] = gpgv.dvi
>> args[ 48] = gpgv.pdf
>> args[ 49] = gpgv.ps
>> args[ 50] = gpgv.html
>> Starting server daemon at host
>> "atom405.beowulf.iri.mpifr-bonn.mpg.de"Starting server daemon at host
>> "atom403.beowulf.iri.mpifr-bonn.mpg.de"
>>
>> Server daemon successfully started with task id "1.iripc46"
>> Establishing /opt/sge-root/utilbin/lx24-x86/rsh session to host
>> atom405.beowulf.iri.mpifr-bonn.mpg.de ...
>> Server daemon successfully started with task id "2.irisrv2"
>> Establishing /opt/sge-root/utilbin/lx24-x86/rsh session to host
>> atom403.beowulf.iri.mpifr-bonn.mpg.de ...
>> /opt/sge-root/utilbin/lx24-x86/rsh exited with exit code 0
>> reading exit code from shepherd ... 1
>> qmake[1]: [clean-generic] Error 1 (ignored)
>> waiting for child failed: timeout
>> /opt/sge-root/utilbin/lx24-x86/rsh exited with exit code 0
>> reading exit code from shepherd ... 0
>> waiting for child failed: timeout
>> Starting server daemon at host "atom405.beowulf.iri.mpifr-bonn.mpg.de"
>> Server daemon successfully started with task id "2.iripc46"
>> Establishing /opt/sge-root/utilbin/lx24-x86/rsh session to host
>> atom405.beowulf.iri.mpifr-bonn.mpg.de ...
>> rm: too few arguments
>> Try `rm --help' for more information.
>> /opt/sge-root/utilbin/lx24-x86/rsh exited with exit code 0
>> reading exit code from shepherd ... 1
>> qmake[1]: *** [mostlyclean-quot] Error 1
>> qmake: *** [clean-recursive] Error 1
>>
>>  
>>
>>> It would be interesting to see the qmake output before the error
>>> message, where it dumps the commandline used to build the target
>>> fileutil.o.
>>>   
>>
>>
>> Here it is:
>>
>> atom403.beowulf.iri.mpifr-bonn.mpg.de
>> starting job:
>> args[  0] = qrsh
>> args[  1] = -noshell
>> args[  2] = -verbose
>> args[  3] = -inherit
>> args[  4] = -cwd
>> args[  5] = -v
>> args[  6] = LS_COLORS,QRSH_COMMAND,MFLAGS,MAKEFLAGS,MAKELEVEL
>> args[  7] = -v
>> args[  8] = SHELL,LS_COLORS,QRSH_COMMAND,MFLAGS,MAKEFLAGS,MAKELEVEL
>> args[  9] = atom403.beowulf.iri.mpifr-bonn.mpg.de
>> args[ 10] = /bin/sh
>> args[ 11] = -c
>> args[ 12] = if gcc -DHAVE_CONFIG_H -I.
>> -I/home/iriuser/test/gnupg-1.4.1/util -I.. -I..
>> -I/home/iriuser/test/gnupg-1.4.1/include
>> -I/home/iriuser/test/gnupg-1.4.1/intl    -g -O2 -Wall -MT logger.o
>> -MD -MP -MF ".deps/logger.Tpo" -c -o logger.o logger.c;  then mv -f
>> ".deps/logger.Tpo" ".deps/logger.Po"; else rm -f ".deps/logger.Tpo";
>> exit 1; fi
>> gmake requesting status of dead child processes
>> waiting for child failed: timeout
>> enabling next task to be executed as Grid Engine parallel task
>> waiting for child failed: timeout
>> local configuration atom403.beowulf.iri.mpifr-bonn.mpg.de not defined -
>> using global configuration
>> Starting server daemon at host "atom403.beowulf.iri.mpifr-bonn.mpg.de"
>> local configuration atom403.beowulf.iri.mpifr-bonn.mpg.de not defined -
>> using global configuration
>> Starting server daemon at host "atom405.beowulf.iri.mpifr-bonn.mpg.de"
>> Server daemon successfully started with task id "1.iripc46"
>> Establishing /opt/sge-root/utilbin/lx24-x86/rsh session to host
>> atom405.beowulf.iri.mpifr-bonn.mpg.de ...
>> Server daemon successfully started with task id "1.irisrv2"
>> Establishing /opt/sge-root/utilbin/lx24-x86/rsh session to host
>> atom403.beowulf.iri.mpifr-bonn.mpg.de ...
>> gcc: -c: line 2: syntax error: unexpected end of file
>> /opt/sge-root/utilbin/lx24-x86/rsh exited with exit code 0
>> reading exit code from shepherd ... 2
>> qmake[2]: *** [fileutil.o] Error 2
>> qmake[2]: *** Waiting for unfinished jobs....
>>
>>  
>>
>>> This might be a Makefile issue, e.g. a rule for the target spanning
>>> multiple lines (without a backslash at the end of the line); there are
>>> some examples of qmake problems in the qmake man page.
>>>   
>>
>> What I tried is a simple make with qrsh:
>>
>> qrsh -cwd -q workstations -l arch=lx24-x86 'make'
>>
>> This runs beautifully!
>>
>> I am using the starter method suggested by Reuti:
>> http://gridengine.sunsource.net/servlets/ReadMsg?listName=users&msgNo=8503
>>
>>
>> #!/bin/bash --login
>>
>> exec $*
>>
>> Could this be a quoting problem concerning backslashes?
>>
>> Cheers, Jan
>>
>>  
>>
>>> Best regards,
>>>
>>> Joachim
>>>
>>>   
>>>
>>>> /opt/sge-root/utilbin/lx24-x86/rsh exited with exit code 0
>>>> reading exit code from shepherd ... 2
>>>> qmake[2]: *** [fileutil.o] Error 2
>>>> qmake[2]: *** Waiting for unfinished jobs....
>>>> waiting for child failed: timeout
>>>>
>>>> When I look at the output a little further up there are more of the
>>>> "waiting for child failed: timeout" statements.
>>>>
>>>> The first one looks like this:
>>>>
>>>> local configuration beowulf.beowulf.iri.mpifr-bonn.mpg.de not defined -
>>>> using global configuration
>>>> waiting for interactive job to be scheduled ...
>>>> Your interactive job 761 has been successfully scheduled.
>>>> Establishing /opt/sge-root/utilbin/lx24-amd64/rsh session to host
>>>> atom403.beowulf.iri.mpifr-bonn.mpg.de ...
>>>> sge_argv[0] = qmake
>>>> sge_argv[1] = -inherit
>>>> sge_argv[2] = -verbose
>>>> sge_argv[3] = -cwd
>>>> sge_argv[4] = -l
>>>> sge_argv[5] = arch=lx24-x86
>>>> gmake_argv[0]  = qmake
>>>> determine qmake startmode
>>>> inserting -j option from NSLOTS environment: -j 2
>>>> sge hostfile =
>>>> /opt/sge-root/iri/spool/irisrv2/active_jobs/761.1/pe_hostfile
>>>> qmake  hostfile = /tmp/761.1.workstations/qmake_hostfile
>>>> qmake  lockfile = /tmp/761.1.workstations/qmake_lockfile
>>>> creating qmake hostfile
>>>> number of slots for qmake execution is 2
>>>> enabling next task to be executed as Grid Engine parallel task
>>>> qmake  all-recursive
>>>> export the following environment variables:
>>>> SHELL,LS_COLORS,QRSH_COMMAND,MFLAGS,MAKEFLAGS,MAKELEVEL
>>>> detected recursive make - starting on local machine
>>>> waiting for child failed: timeout
>>>> irisrv2.iri.mpifr-bonn.mpg.de
>>>> starting job:
>>>>
>>>> [...]
>>>>
>>>> Does anyone have an idea?
>>>>
>>>> Cheers Jan Behrend
>>>>
>>>>
>>>>
>>>>     
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>>   
>>
>>
>>
>>
>>  
>>
> 
> 


-- 
Jan Behrend
Max-Planck-Institut für Radioastronomie
Abteilung für Infrarot-Interferometrie  Tel:   (+49) 228 525 319
Auf dem Hügel 69                        Fax:   (+49) 228 525 411
D-53121 Bonn (Germany)                  jbehrend at mpifr-bonn.mpg.de
                                        http://www.mpifr-bonn.mpg.de

