[GE users] qconf -purge syntax

hjmangalam harry.mangalam at uci.edu
Wed Oct 13 17:17:52 BST 2010


As per Dan's advice, the offending subdirs were purged down to the 
spool/node level, i.e.:

$ cd <SGE_ROOT>/spool
$ tree a64-102   # (a64-102 is similarly affected)
a64-102
|-- active_jobs
|-- execd.pid
|-- job_scripts
|-- jobs
`-- messages

(so no more job info under the node)

qmaster and shadowmaster were then restarted, with no change.

I then mv'ed the top-level node dir out of the way, restarted qmaster, 
with no change - SGE is still seeing these ghosts.

$ qhost | grep ' - ' | grep -v global
a64-101                 lx24-amd64      2     -    7.7G   ..
a64-102                 lx24-amd64      2     -    7.7G   ..
a64-103                 lx24-amd64      2     -   15.6G   ..

and I still can't delete them:
$ qconf -de a64-101
Host object "a64-101" is still referenced in cluster queue "long-adc".
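
Since the complaint is about the cluster queue rather than anything in 
the execd spool, the reference probably has to be removed from 
"long-adc" before qconf -de will go through. A minimal sketch of that 
sequence, assuming a64-101 is listed directly in the queue's hostlist 
(not via a host group) and may carry per-host overrides. The names run 
and detach_host are made-up helpers, and by default the commands are 
only printed, not executed:

```shell
#!/bin/sh
# Hypothetical helpers (not part of SGE): print the qconf calls by default,
# execute them only when DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# detach_host HOST QUEUE
# Assumption: HOST is referenced directly by QUEUE, not through a host group.
detach_host() {
    host=$1
    queue=$2
    # Drop any per-host attribute overrides on the queue instance.
    run qconf -purge queue '*' "${queue}@${host}"
    # Remove the host from the cluster queue's hostlist.
    run qconf -dattr queue hostlist "$host" "$queue"
    # With the reference gone, try deleting the exec host.
    run qconf -de "$host"
}
```

Calling "detach_host a64-101 long-adc" prints the three commands for 
review; with DRY_RUN=0 in the environment they are actually run. If the 
host comes in through a host group, it would have to be removed from 
that group instead.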

A remaining approach is to just go into the 
<SGE_ROOT>/spool/qmaster/exec_hosts dir and manually rm the 
appropriate hosts file, but that seems extreme (though at this point I'd 
do it, unless this has cascading implications).
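
Before rm'ing anything by hand, Dan's grep -r idea can at least show 
which spool files still mention the host. A rough sketch, assuming 
classic spooling (plain-text spool files) and a grep that supports -r 
and -w. The name find_host_refs is just a made-up helper:

```shell
#!/bin/sh
# Hypothetical helper: list files under a spool directory that mention a host.
# find_host_refs HOST SPOOL_DIR
find_host_refs() {
    host=$1
    spool_dir=$2
    # -r recurse, -l print file names only, -F fixed string (no regex),
    # -w whole-word match so a64-101 does not also match a64-1010.
    grep -rlFw -- "$host" "$spool_dir" 2>/dev/null | sort
}
```

For example, "find_host_refs a64-101 <SGE_ROOT>/spool/qmaster" lists 
every file under the qmaster spool that still names a64-101, which 
should show whether removing the exec_hosts file alone would leave 
other references behind.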

..?

Thanks very much for your continued help
hjm

On Tuesday 12 October 2010 16:26:59 templedf wrote:
> If those jobs aren't actually active anymore, it should be OK to
> delete them and restart.  That said, those directories look like
> execd spool directories, in which case you don't need to restart
> the master to see the effect.
> 
> Daniel
> 
> On 10/12/10 3:47 PM, hjmangalam wrote:
> > Thanks - below is the dir tree which does show active jobs.  Is
> > it safe to simply delete those entries manually and then restart
> > the qmaster?
> > 
> > hjm
> > 
> > hmangala:/sge62/bduc_nacs/spool
> > 625 $ tree a64-101/
> > a64-101/
> > |-- active_jobs
> > |   |-- 516060.1
> > |   |   |-- addgrpid
> > |   |   |-- config
> > |   |   |-- environment
> > |   |   |-- error
> > |   |   |-- exit_status
> > |   |   |-- job_pid
> > |   |   |-- pe_hostfile
> > |   |   |-- pid
> > |   |   `-- trace
> > |   `-- 516347.1
> > |       |-- addgrpid
> > |       |-- config
> > |       |-- environment
> > |       |-- error
> > |       |-- exit_status
> > |       |-- job_pid
> > |       |-- pe_hostfile
> > |       |-- pid
> > |       `-- trace
> > |-- execd.pid
> > |-- job_scripts
> > |   |-- 516060
> > |   `-- 516347
> > |-- jobs
> > |   `-- 00
> > |       `-- 0051
> > |           |-- 6060.1
> > |           `-- 6347.1
> > `-- messages
> > 
> > 7 directories, 24 files
> > 
> > On Tuesday 12 October 2010 15:14:16 Daniel Templeton wrote:
> >> With classic spooling, you should be able to prowl around in the
> >> spool directory and find the bad reference.  grep -r comes to
> >> mind.  If a reboot of the qmaster didn't clear the issue, then
> >> it must be spooled somewhere in there.
> >> 
> >> Daniel
> >> 
> >> On 10/12/10 3:02 PM, hjmangalam wrote:
> >>> I believe I'm using classic spooling:
> >>> 
> >>> bootstrap:6:spooling_method         classic
> >>> 
> >>> hjm
> >>> 
> >>> On Tuesday 12 October 2010 14:19:19 Daniel Templeton wrote:
> >>>> Are you using classic or BDB spooling?
> >>>> 
> >>>> Daniel
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286741
> 
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].

-- 
Harry Mangalam - Research Computing, NACS, Rm 225 MSTB, UC Irvine
[ZOT 2225] / 92697  949 824-0084(o), 949 285-4487(c)
MSTB=Bldg 415 (G-5 on <http://today.uci.edu/pdf/UCI_09_map_campus.pdf>)
--
Non-sarcastic use of 'seamless' in any context having to do with 
computers immediately disqualifies the speaker as an expert.

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=286906
