[GE users] TMPDIR

Colin Thomas Colin.Thomas at csr.com
Fri Sep 14 12:03:08 BST 2007


Hi,

Many thanks for spending the time duplicating this.

As I mentioned, we are on 6.03u.

Thanks for filing this issue

Best regards

/colin

-----Original Message-----
From: Reuti [mailto:reuti at staff.uni-marburg.de] 
Sent: 14 September 2007 11:57
To: users at gridengine.sunsource.net
Subject: Re: [GE users] TMPDIR

Hi,

On 14.09.2007 at 11:51, Colin Thomas wrote:

> I had also tried
>
> execd_params 	KEEP_ACTIVE=TRUE
>
> without any luck :-(
>
> qconf -mconf bender
> ct05 at bender.csr.com modified "bender.csr.com" in configuration list
> qconf -mconf global
> ct05 at bender.csr.com modified "global" in configuration list
> qsh -q CADTest@bender

although I see in the messages file:

09/14/2007 12:37:06|execd|node01|I|using "KEEP_ACTIVE=TRUE" for execd_params
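
To run the same check on another node, a minimal sketch along these lines prints the latest execd_params entry from an execd messages file. The function name is hypothetical, and the path to the messages file has to be passed in because the spool location differs between installations; the only thing taken from this thread is that the entry contains the string "execd_params".

```shell
#!/bin/sh
# Hypothetical helper: print the most recent log entry mentioning
# "execd_params" from an execd messages file.
# $1 = path to the messages file (depends on the installation).
# Returns non-zero if the file has no such entry.
last_execd_params_entry() {
    # tail keeps only the newest match; the final "grep ." turns
    # empty output into a non-zero exit status.
    grep 'execd_params' "$1" | tail -n 1 | grep .
}
```

If nothing is printed, the loglevel setting may need to be log_info before the entry shows up at all.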

I can only confirm this bug to be present also in 6.0. The  
housekeeping is also wrong:

node01:~ # /etc/init.d/sgeexecd stop
configuration node01 not defined
    Shutting down Grid Engine execution daemon
    Shutting down Grid Engine shepherd of job 396.1
    Shutting down Grid Engine shepherd of job 397.1
    Shutting down Grid Engine shepherd of job 398.1

But there are no shepherds running at all... KEEP_ACTIVE=TRUE
obviously triggered something, but not the right thing. Will these
entries 396-398 ever disappear now? They persist even after stopping
and starting the execd more than once.

I'll file an issue - thx.

-- Reuti
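
The attribute name went through several spellings in this thread (exec_parameters, execd_param, execd_parameter, execd_parameters) before settling on execd_params. A minimal sketch of a check for this, assuming the configuration is available as plain text with one "name value" pair per line as in the listings quoted below; the function name and the heuristic for "looks like a misspelling" are assumptions made for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read a configuration listing on stdin and report
# whether the exact attribute name "execd_params" is present.
# Names that merely resemble it are flagged as suspect.
# Exit status: 0 if execd_params was found, 1 otherwise.
check_execd_params() {
    awk '
        # Exact match on the first field: this is the correct spelling.
        $1 == "execd_params" { found = 1; print "ok: " $0; next }
        # Anything else starting with "exec" and containing "param"
        # is probably a misspelling of execd_params.
        $1 ~ /^exec/ && $1 ~ /param/ { print "suspect attribute name: " $1 }
        END { exit found ? 0 : 1 }
    '
}
```

It could be fed from the local configuration of a host, for example `qconf -sconf bender | check_execd_params`.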


> and still the directory is erased ..
>
> hummm...
>
> Colin Thomas
>
> -----Original Message-----
> From: Reuti [mailto:reuti at staff.uni-marburg.de]
> Sent: 14 September 2007 10:17
> To: users at gridengine.sunsource.net
> Subject: Re: [GE users] TMPDIR
>
> On 14.09.2007 at 10:42, Colin Thomas wrote:
>
>> Hi Reuti,
>>
>> Thanks for picking up the missing "d" (I need new glasses).
>>
>> I have tried in the
>> qconf -mconf bender
>>
>> execd_param 	KEEP_ACTIVE=TRUE
>
> execd_params (now I missed the s)
>
> (just copy from the global configuration). The bug I mentioned
> refers to the effect that a new or changed local configuration is
> only used if the global configuration is changed afterwards - even
> just removing a blank in any line is enough, which changes the
> format but leaves the content as it was.
>
> -- Reuti
>
>
>> and
>> execd_parameter 	KEEP_ACTIVE=TRUE
>> and
>> execd_parameters 	KEEP_ACTIVE=TRUE
>>
>> and none of them stops the directory from being erased.
>>
>> Note I am getting
>>
>> qconf -mconf bender
>> ct05 at bender.csr.com modified "bender.csr.com" in configuration list
>>
>> If I look at qmon -> cluster configuration for machine bender then
>> the values are shown.
>>
>> I then run
>>
>> qsh -q CADTest@bender
>>
>> and when the window closes, the TMPDIR is wiped.
>>
>> Best regards
>>
>> Colin Thomas
>>
>> -----Original Message-----
>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>> Sent: 14 September 2007 08:36
>> To: users at gridengine.sunsource.net
>> Subject: Re: [GE users] TMPDIR
>>
>> On 14.09.2007 at 09:02, Colin Thomas wrote:
>>
>>> I have just tried your suggestion re
>>>
>>> qconf -mconf global
>>>
>>> and removing blank lines.
>>
>> So a message 'blabla at anywhere modified "global" in configuration
>> list' appeared. Maybe you also have to adjust in the configuration:
>>
>> loglevel                     log_info
>>
>> to get the desired information of the loaded configuration in the
>> messages file.
>>
>>> I reran the
>>>
>>> qsub -q CADTEST@ThisMachine
>>>
>>> (where ThisMachine has exec_parameters KEEP_ACTIVE=TRUE)
>>
>> Typo? It's execd_param (missing d).
>>
>> -- Reuti
>>
>>>
>>> And still the directory is deleted.
>>>
>>> I have checked the messages file for ThisMachine, and nothing has
>>> been added.
>>>
>>> Looks like a bug - where do I submit it as a bug ?
>>>
>>> Best regards
>>>
>>> Colin Thomas
>>>
>>> -----Original Message-----
>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>> Sent: 13 September 2007 16:46
>>> To: users at gridengine.sunsource.net
>>> Subject: Re: [GE users] TMPDIR
>>>
>>> On 13.09.2007 at 17:15, Colin Thomas wrote:
>>>
>>>> Thanks for the suggestion.
>>>>
>>>> I have KEEP_ACTIVE=ALL in the mconf for a particular machine
>>>>
>>>> i.e.
>>>>
>>>> mailer                       /bin/mailx
>>>> xterm                        /usr/openwin/bin/xterm
>>>> qlogin_daemon                /usr/sbin/in.telnetd
>>>> rlogin_daemon                /usr/sbin/in.rlogind
>>>> exec_parameters              KEEP_ACTIVE=TRUE
>>>>
>>>> I then run qsh on ThisMachine
>>>>
>>>> qsub -q CADTEST@ThisMachine
>>>>
>>>> The temp directory is created, but when the qsh is exited, the temp
>>>> directory is still being erased.
>>>>
>>>> Have I missed something ?
>>>
>>> Mmh - seems the new local configuration isn't read in. Can you just
>>> edit the global configuration by removing a blank in any line (to
>>> change the file)? Then it should be read in (you can check this in
>>> /var/spool/sge/<your_node>/messages or your location for this file).
>>>
>>> Is this worth filing an issue?
>>>
>>> -- Reuti
>>>
>>>
>>>> Best regards
>>>>
>>>> Colin Thomas
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>> Sent: 13 September 2007 14:50
>>>> To: users at gridengine.sunsource.net
>>>> Subject: Re: [GE users] TMPDIR
>>>>
>>>> Hi,
>>>>
>>>> On 13.09.2007 at 15:03, Colin Thomas wrote:
>>>>
>>>>> Many thanks for your reply.
>>>>>
>>>>> Is KEEP_ACTIVE an execd parameter? If so I can see how (via qmon)
>>>>> it can be changed in the cluster configuration.
>>>>>
>>>>> So I "modify" a single machine, add KEEP_ACTIVE=1 to the
>>>>> execd_parameters, click okay, but it then says that the "host
>>>>> already exists" even though I was modifying it :-(
>>>>>
>>>>> What is an alternative way of altering a machine's execd parameter
>>>>> (not through qmon)?
>>>>
>>>> I would use the value TRUE instead. But the real reason is this:
>>>>
>>>> http://gridengine.sunsource.net/issues/show_bug.cgi?id=2080
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> Best regards
>>>>>
>>>>> Colin Thomas
>>>>>
>>>>> -----Original Message-----
>>>>> From: Reuti [mailto:reuti at staff.uni-marburg.de]
>>>>> Sent: 13 September 2007 13:37
>>>>> To: users at gridengine.sunsource.net
>>>>> Subject: Re: [GE users] TMPDIR
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 13.09.2007 at 14:03, Colin Thomas wrote:
>>>>>
>>>>>> I have a question about TMPDIR.
>>>>>>
>>>>>> When a job is started the TMPDIR is set to
>>>>>>
>>>>>> <temp dir from queue config>/<gridId>.<queue Name>
>>>>>>
>>>>>> We have a qsh job, which creates an xterm.
>>>>>
>>>>> How? Is it jumping out of the process tree - like when started
>>>>> with & ? You can try to set ENABLE_ADDGRP_KILL in the SGE
>>>>> configuration for the execd_params so that the xterm is killed
>>>>> when the main job completes.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>> PS: There is the option KEEP_ACTIVE to avoid the removal, but this
>>>>> way you would end up a) with many orphaned directories on the nodes
>>>>> over time, and b) violating SGE's policy, as the xterm might still
>>>>> generate load on the system but SGE will already put another job
>>>>> on it.
>>>>>
>>>>> ...
>>>>>
>>>>>> The xterm is inheriting the TMPDIR that the qsh set up. Then the
>>>>>> qsh job completes (with the xterm still running), and the TMPDIR
>>>>>> is removed. The xterm then has no TMPDIR to write to. This did
>>>>>> start a conversation saying that the qsh job should have known
>>>>>> that the xterm child process was still active, and so should not
>>>>>> have cleaned up anyway..
>>>>>>
>>>>>> I have looked for (and failed to find) a switch to disable the
>>>>>> TMPDIR from being deleted: is there such a thing?
>>>>>>
>>>>>> I did note some talk in messages concerning the notify setting
>>>>>> from the queue configuration (which is set to 60sec by default)
>>>>>> but changing this does not seem to postpone the TMPDIR from being
>>>>>> cleaned.
>>>>>
>>>>> PPS: Depends on the handling of the signal by the main job. If it
>>>>> can't handle it, it will simply quit and tell SGE as a result that
>>>>> it finished already. If you ignore the generated signal in the
>>>>> jobscript, then during the grace period before finally being
>>>>> killed, the TMPDIR should still exist.
>>>>>
>>>>>> I also tried a prolog script to try and set it to a value that I
>>>>>> wanted, but this route also did not work.
>>>>>>
>>>>>> Any thoughts ? (we are 6.03u)
>>>>>>
>>>>>> Best regards
>>>>>>
>>>>>> Colin Thomas
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> .
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>
