[GE users] qconf -purge syntax

templedf daniel.templeton at oracle.com
Mon Oct 18 15:11:03 BST 2010


What version are you using?  I just came across a bug, fixed in 6.2u3, 
that might be the cause of what you're seeing.  See Issue 2408.
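
A quick way to check, in case it helps: with a stock install, the first 
line of qconf's help output carries the version string, e.g.

    $ qconf -help | head -1
    GE 6.2u5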

Daniel

On 10/13/10 09:17 AM, hjmangalam wrote:
> As per your advice, Dan, the offending subdirs were purged down to
> the spool/node level, i.e.:
>
> $ cd <SGE_ROOT>/spool
> $ tree a64-102   # (a64-102 is similarly affected)
> a64-102
> |-- active_jobs
> |-- execd.pid
> |-- job_scripts
> |-- jobs
> `-- messages
>
> (so no more job info under the node)
>
> and the qmaster and shadow master were restarted, with no change.
>
> I then mv'ed the top-level node dir out of the way, restarted qmaster,
> with no change - SGE is still seeing these ghosts.
>
> $ qhost | grep ' - ' | grep -v global
> a64-101                 lx24-amd64      2     -    7.7G   ..
> a64-102                 lx24-amd64      2     -    7.7G   ..
> a64-103                 lx24-amd64      2     -   15.6G   ..
>
> and I still can't delete them:
>   $ qconf -de a64-101
> Host object "a64-101" is still referenced in cluster queue "long-adc".
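>
> Presumably that queue reference has to be removed before -de will
> work.  Something along these lines might do it (untested; the queue
> name is taken from the error above, and the -purge form is per the
> qconf man page):
>
>    # drop the host from the cluster queue's hostlist
>    $ qconf -dattr queue hostlist a64-101 long-adc
>    # and/or purge any a64-101-specific settings from that queue
>    $ qconf -purge queue "*" long-adc@a64-101
>    # then retry the delete
>    $ qconf -de a64-101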
>
> A remaining approach is to just go into the
> <SGE_ROOT>/spool/qmaster/exec_hosts dir and manually rm the
> appropriate host files, but that seems extreme (though at this point
> I'd do it, unless it has cascading implications).
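>
> If it comes to that, a sketch of the manual route (assuming classic
> spooling and that a short qmaster outage is acceptable; $SGE_CELL is
> whatever your cell is named):
>
>    # stop the qmaster so it can't rewrite the spool underneath us
>    $SGE_ROOT/$SGE_CELL/common/sgemaster stop
>    # move, rather than rm, the stale records, in case they're needed back
>    mv $SGE_ROOT/spool/qmaster/exec_hosts/a64-101 /tmp/
>    mv $SGE_ROOT/spool/qmaster/exec_hosts/a64-102 /tmp/
>    # bring the qmaster back up
>    $SGE_ROOT/$SGE_CELL/common/sgemaster start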
>
> ..?
>
> Thanks very much for your continued help
> hjm
>
> On Tuesday 12 October 2010 16:26:59 templedf wrote:
>> If those jobs aren't actually active anymore, it should be OK to
>> delete them and restart.  That said, those directories look like
>> execd spool directories, in which case you don't need to restart
>> the master to see the effect.
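>>
>> If a restart is needed at all, it would be the node's execd rather
>> than the master; with a standard install, the rc script on the node
>> itself should do it:
>>
>>    $SGE_ROOT/$SGE_CELL/common/sgeexecd stop
>>    $SGE_ROOT/$SGE_CELL/common/sgeexecd start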
>>
>> Daniel
>>
>> On 10/12/10 3:47 PM, hjmangalam wrote:
>>> Thanks - below is the dir tree which does show active jobs.  Is
>>> it safe to simply delete those entries manually and then restart
>>> the qmaster?
>>>
>>> hjm
>>>
>>> hmangala:/sge62/bduc_nacs/spool
>>> 625 $ tree a64-101/
>>> a64-101/
>>> |-- active_jobs
>>> |   |-- 516060.1
>>> |   |   |-- addgrpid
>>> |   |   |-- config
>>> |   |   |-- environment
>>> |   |   |-- error
>>> |   |   |-- exit_status
>>> |   |   |-- job_pid
>>> |   |   |-- pe_hostfile
>>> |   |   |-- pid
>>> |   |   `-- trace
>>> |   `-- 516347.1
>>> |       |-- addgrpid
>>> |       |-- config
>>> |       |-- environment
>>> |       |-- error
>>> |       |-- exit_status
>>> |       |-- job_pid
>>> |       |-- pe_hostfile
>>> |       |-- pid
>>> |       `-- trace
>>> |-- execd.pid
>>> |-- job_scripts
>>> |   |-- 516060
>>> |   `-- 516347
>>> |-- jobs
>>> |   `-- 00
>>> |       `-- 0051
>>> |           |-- 6060.1
>>> |           `-- 6347.1
>>> `-- messages
>>>
>>> 7 directories, 24 files
>>>
>>> On Tuesday 12 October 2010 15:14:16 Daniel Templeton wrote:
>>>> With classic spooling, you should be able to prowl around in the
>>>> spool directory and find the bad reference.  grep -r comes to
>>>> mind.  If a reboot of the qmaster didn't clear the issue, then
>>>> it must be spooled somewhere in there.
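>>>>
>>>> For instance, with one of the ghost hostnames as the pattern:
>>>>
>>>>    grep -rl a64-101 $SGE_ROOT/spool/qmaster/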
>>>>
>>>> Daniel
>>>>
>>>> On 10/12/10 3:02 PM, hjmangalam wrote:
>>>>> I believe I'm using classic spooling:
>>>>>
>>>>> bootstrap:6:spooling_method         classic
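>>>>>
>>>>> That's grep -n output; assuming the usual cell layout, it would
>>>>> have come from something like:
>>>>>
>>>>>    grep -n spooling_method $SGE_ROOT/$SGE_CELL/common/bootstrap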
>>>>>
>>>>> hjm
>>>>>
>>>>> On Tuesday 12 October 2010 14:19:19 Daniel Templeton wrote:
>>>>>> Are you using classic or BDB spooling?
>>>>>>
>>>>>> Daniel
>>
