[GE users] Strange SGE behavior

dom marco.donauer at sun.com
Thu May 14 16:40:17 BST 2009


Removing just the 0013 entry should be enough.
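
For anyone following along, here is a minimal sketch of how the spool entry for a given job ID could be located under classic spooling. It assumes the layout visible in the listing quoted below, where job 13 sits at jobs/00/0000/0013, i.e. the ID zero-padded to ten digits and split 2/4/4. The helper name job_spool_path and the SPOOL and JOB_ID variables are made up for illustration, and stopping qmaster before touching the spool is a precaution that the thread itself does not spell out.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGE): map a decimal job ID to its
# classic-spooling sub-path, assuming the 2/4/4 digit split inferred
# from the listing in this thread (job 13 -> 00/0000/0013).
job_spool_path() {
    padded=$(printf '%010d' "$1")
    first=$(printf '%s' "$padded" | cut -c1-2)
    second=$(printf '%s' "$padded" | cut -c3-6)
    third=$(printf '%s' "$padded" | cut -c7-10)
    printf '%s/%s/%s\n' "$first" "$second" "$third"
}

# Spool directory as it appears in the thread; adjust for your own cell.
SPOOL=${SPOOL:-/usr/local/ge6.2u2_1/default/spool/qmaster}
JOB_ID=${JOB_ID:-13}
TARGET="$SPOOL/jobs/$(job_spool_path "$JOB_ID")"

# Only print the candidate path; nothing is deleted here.
echo "Would remove: $TARGET"

# Assumed precaution: with qmaster stopped, move the entry aside
# instead of deleting it, so it can be restored if needed:
#   mv "$TARGET" "/tmp/job_${JOB_ID}.broken"
```

Moving the entry aside rather than running rm -rf keeps a copy in case the wrong entry was picked.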

Marco

allantran wrote:
> Hah... this is the job ID (13):
> root at master:/usr/local/ge6.2u2_1/default/spool/qmaster/jobs/00/0000[1018]>ls
> 0013
>
> Should I rm -rf the whole /00 directory,
> or just 0013?
>
> I did have a problem with hosts when I first installed, but I resolved
> it and all jobs ran fine, except that whenever qmaster restarts, job 13
> comes back and stays in "t" state.
> Thanks
>
>
> On Thu, May 14, 2009 at 9:17 AM, dom <marco.donauer at sun.com> wrote:
>
>     You can have a look in the qmaster spooling directory for a
>     jobs directory.
>     This directory contains the spooled jobs. Look for the job ID of
>     the job which cannot be deleted.
>     Remove that directory and it should work again.
>     How did this problem appear? Did you have any problems with your
>     hosts or network?
>
>     Marco
>
>     allantran wrote:
>     > Thanks for your response, Marco.
>     > I'm using classic spooling. Is there a way to remove that old job
>     > object? Everything else seems to be working fine, so I hesitate to
>     > reinstall the qmaster.
>     > Any input would be appreciated.
>     > Allan
>     >
>     > On Wed, May 13, 2009 at 10:46 PM, dom <marco.donauer at sun.com> wrote:
>     >
>     >     Hi,
>     >
>     >     What kind of spooling do you use, and what is your SGE version?
>     >     It looks like an old job object is spooled which is somehow
>     >     broken, and the qmaster is not able to remove it.
>     >
>     >     Marco
>     >
>     >
>     >     allantran wrote:
>     >     > I noticed that it is not the reboot itself: every time sgemaster is
>     >     > restarted, the old job comes back into the queue and stays in
>     >     > "t" state. Does anyone know how to remove it permanently so it won't
>     >     > come back? No matter how many times I qdel it, it only stays away
>     >     > until the machine reboots or sgemaster is restarted.
>     >     > Thanks
>     >     >
>     >     >
>     >     > On Tue, May 12, 2009 at 3:09 PM, Allan Tran
>     >     > <tran.v.allan at gmail.com> wrote:
>     >     >
>     >     >     Hi group,
>     >     >     I installed a new SGE on a new cluster and everything seems to be
>     >     >     working. However, every time I reboot the master node (which has
>     >     >     qmaster and sgeexecd running), an old job comes back into the queue
>     >     >     in "t" state. This causes all jobs submitted after that to stay in
>     >     >     "qw" state, unable to run.
>     >     >     Does anyone know why the old job is put back in the queue? I even
>     >     >     deleted this job twice before, but it never seems to go away
>     >     >     after a reboot.
>     >     >     Thanks for the help
>     >     >
>     >     >
>     >
>     >     ------------------------------------------------------
>     >     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=195332
>     >
>     >     To unsubscribe from this discussion, e-mail:
>     >     [users-unsubscribe at gridengine.sunsource.net].
>     >
>     >
>
>     ------------------------------------------------------
>     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=195564
>
>     To unsubscribe from this discussion, e-mail:
>     [users-unsubscribe at gridengine.sunsource.net].
>
>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=195571

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


