[GE users] SGE6 does not backfill

christian reissmann Christian.Reissmann at Sun.COM
Wed Apr 13 16:34:21 BST 2005




Hi Juha,

It would be very helpful to have the qping -dump output with
"message content dump (xml/bin/cull)" enabled.

(This creates ENORMOUS output, but it shows EVERYTHING
that is transmitted to the execds.)

To get the output:

1) Become root on the qmaster machine
2) Source the $SGE_ROOT/default/common/settings.csh file
3) setenv SGE_QPING_OUTPUT_FORMAT "s:12" (this enables the content dump)
4) qping -dump gridware $SGE_QMASTER_PORT qmaster 1 > output.txt
(be aware that output.txt will become a large file)
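
For Bourne-type shells, steps 2 to 4 could be sketched roughly as below.
This is only an illustration, not a tested procedure: it assumes that a
settings.sh file sits next to the csh settings file, and QMASTER_HOST,
OUTPUT_FILE and qping_dump_command are hypothetical names introduced
here ("gridware" is the qmaster host used in step 4).

```shell
#!/bin/sh
# Sketch of steps 2-4 for sh/bash; the steps in the mail use csh (setenv).
# QMASTER_HOST and OUTPUT_FILE are hypothetical helper variables.
QMASTER_HOST="${QMASTER_HOST:-gridware}"
OUTPUT_FILE="${OUTPUT_FILE:-output.txt}"

# Step 2: load the cluster environment, if the (assumed) settings.sh exists.
if [ -n "${SGE_ROOT:-}" ] && [ -f "$SGE_ROOT/default/common/settings.sh" ]; then
    . "$SGE_ROOT/default/common/settings.sh"
fi

# Step 3: enable the message content dump.
SGE_QPING_OUTPUT_FORMAT="s:12"
export SGE_QPING_OUTPUT_FORMAT

# Print the command line that step 4 runs (hypothetical helper).
qping_dump_command() {
    printf 'qping -dump %s %s qmaster 1\n' \
        "$QMASTER_HOST" "${SGE_QMASTER_PORT:-PORT_UNSET}"
}

# Step 4: run the dump only when qping and the port are actually available.
if command -v qping >/dev/null 2>&1 && [ -n "${SGE_QMASTER_PORT:-}" ]; then
    qping -dump "$QMASTER_HOST" "$SGE_QMASTER_PORT" qmaster 1 > "$OUTPUT_FILE"
else
    echo "qping or SGE_QMASTER_PORT missing; would run: $(qping_dump_command)" >&2
fi
```

The dump keeps running until it is interrupted, so stop it with Ctrl-C
once the failing jobs have been reproduced.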

Best Regards,

Christian



Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> 
> Juha Jäykkä wrote:
> 
> 
>>About the file descriptors: I increased ulimit -n from 1024 to 8192, but
>>it did not help at all.
>>
>> 
>>
>>
>>>I did not get the qping.log.
>>>   
>>>
>>
>>Strange. Perhaps users at ... drops attachments that are too large? I have
>>attached it to this message again and also cc'd you directly; perhaps that
>>helps. Next, I'll put it on the web...
>>
>>You asked, in another email, for the hosts on which the jobs failed. I'm
>>sorry, but I cannot tell you that. I did not monitor that closely, since I
>>thought qping's log would include job IDs. I know which IDs failed, so
>>that would have been sufficient.
>>
>>I tried a few things (the ones mentioned in these emails) and decided
>>to generate a new qping.log while I'm at it. This time, I recorded the
>>failed hosts, too. They are
>>Job 362, host compute-0-8.local
>>Job 365, host compute-0-5.local
>>Job 368, host compute-0-4.local.
>>
>> 
>>
>>
>>>Running the execds in debug mode might reveal important information.
>>>   
>>>
>>
>>All of them, that is, those on the nodes, too? I'll try that. What is the
>>format of SGE_DEBUG_LEVEL, since just defining SGE_DEBUG_LEVEL=<number>
>>causes it to report "illegal debug level format"?
>>
> 
> source util/dl.csh
> dl 1
> bin/<arch>/sge_execd
> 
> Should do it.
> 
> Stephan
> 
> 
>> 
>>
>>------------------------------------------------------------------------
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>For additional commands, e-mail: users-help at gridengine.sunsource.net
>> 
>>
> 
> 
> 
> 

-- 
Christian Reissmann    Tel: +49 (0)941 3075 112  mailto:crei at sun.com
Software Engineer      Fax: +49 (0)941 3075 222
http://www.sun.com/gridengine
Sun Microsystems GmbH, Dr.-Leo-Ritter-Str. 7,
D-93049 Regensburg,    Tel: +49 (0)941 3075 0





