[GE users] Strange SGE behavior

allantran tran.v.allan at gmail.com
Thu May 14 16:22:53 BST 2009

Hah... this is the job ID (13):
root at master:/usr/local/ge6.2u2_1/default/spool/qmaster/jobs/00/0000[1018]>ls

Should I rm -rf the whole 00 directory, or just 0013?

I did have a problem with the hosts when I first installed, but I resolved it and all jobs ran fine. The exception is that whenever qmaster is restarted, job 13 comes back and stays in the "t" state.

On Thu, May 14, 2009 at 9:17 AM, dom <marco.donauer at sun.com> wrote:
You can have a look in the qmaster spooling directory and look for a
jobs directory.
This directory contains the spooled jobs. Look for the job ID of
the job that cannot be deleted.
Remove that job's directory and it should work again.
How did this problem appear? Did you have any problems with your hosts
or network?
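
A minimal sketch of the removal step described above, assuming classic spooling and the 2/4/4-digit layout seen in the path earlier in this thread (jobs/00/0000/0013 for job 13). The function name `quarantine_job` and its three arguments are hypothetical, not part of SGE. It moves the spooled entry to a backup location instead of deleting it, so it can be put back if needed. It is probably safest to run it while qmaster is stopped; that is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): move one spooled job entry out of a
# classic-spooling jobs directory rather than deleting it outright.
# Usage: quarantine_job <qmaster_spool_dir> <job_id> <backup_dir>
quarantine_job() {
    spool_dir=$1
    job_id=$2
    backup_dir=$3

    # Assumed layout: the job ID is zero-padded to 10 digits and split 2/4/4,
    # so job 13 is spooled at jobs/00/0000/0013.
    padded=$(printf '%010d' "$job_id") || return 2
    d1=$(printf '%s' "$padded" | cut -c1-2)
    d2=$(printf '%s' "$padded" | cut -c3-6)
    d3=$(printf '%s' "$padded" | cut -c7-10)
    entry="$spool_dir/jobs/$d1/$d2/$d3"

    # Refuse to do anything if the entry is not where we expect it.
    if [ ! -e "$entry" ]; then
        echo "no spooled entry at $entry" >&2
        return 1
    fi

    # Move the entry (file or directory) aside so it can be restored later.
    mkdir -p "$backup_dir" && mv "$entry" "$backup_dir/job_$job_id"
}
```

With the path from this thread, a call might look like `quarantine_job /usr/local/ge6.2u2_1/default/spool/qmaster 13 /root/sge_job_backup`, where the backup directory is an arbitrary example.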


allantran wrote:
> Thanks for your response, Marco.
> I'm using classic spooling. Is there a way to remove that old job
> object? Everything else seems to be working fine, so I hesitate to reinstall
> the qmaster.
> Any input would be appreciated.
> Allan
> On Wed, May 13, 2009 at 10:46 PM, dom <marco.donauer at sun.com> wrote:
>     Hi,
>     what kind of spooling do you use, and what is your SGE version?
>     It looks like an old job object is spooled that is somehow
>     broken, and the qmaster is not able to remove it.
>     Marco
>     allantran wrote:
>     > I notice that it's not only rebooting: every time sgemaster is restarted,
>     > the old job comes back into the queue and stays in the "t" state. Does anyone
>     > know how to remove it permanently so it won't come back? No matter how
>     > many times I qdel it, it only stays away until the machine reboots or
>     > sgemaster is restarted.
>     > Thanks
>     >
>     >
>     > On Tue, May 12, 2009 at 3:09 PM, Allan Tran <tran.v.allan at gmail.com> wrote:
>     >
>     >     Hi group,
>     >     I installed a new SGE on a new cluster and everything seems to be
>     >     working. However, every time I reboot the master node (which has qmaster
>     >     and sge_execd running), an old job reappears in the queue
>     >     in the "t" state. This causes all jobs submitted after that to stay in the
>     >     "qw" state, unable to run.
>     >     Does anyone know why the old job is put back in the queue? I even deleted
>     >     this job twice before, but it never seems to go away after a reboot.
>     >     Thanks for the help
>     >
>     >
>     ------------------------------------------------------
>     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=195332
>     To unsubscribe from this discussion, e-mail:
>     [users-unsubscribe at gridengine.sunsource.net].

