[GE users] Using qdel leaves queues in error status

Filipe Brandenburger filipe.brandenburger at idilia.com
Mon May 26 14:46:56 BST 2008


Andrew Preece wrote:
> On 23/05/08 7:28 AM, "Filipe Brandenburger"
> <filipe.brandenburger at idilia.com> wrote:
>> I'm having a problem (happened twice this week) when users submit a very
>> large number of jobs and then use qdel to kill all of them. Twice this
>> problem left me with queues in (E) error state.
>>
>> The problem appears to happen when the kill signal is delivered before
>> the job has started. sge_shepherd quits with a message that says that
>> the "exit_status" file did not exist, then it returns code 7 (problem
>> before prolog), and this leaves the queue in error state.
>
> Filipe, 
> I had the same issue with one of my users.
> We ended up working around it by putting a hold on the jobs for that user by
> running qalter -h u <jid>, then deleting the jobs.
> 
> -Andrew. 

Hi, Andrew. Thanks for the tip. I tried it briefly and it seems to work
fine. I guess I will implement it as a workaround, at least for now.

I wonder which version of SGE you are using. Did you have this problem
with 6.1 as well? I would like to know whether 6.1 might include a fix
for this problem.

Did you ever try to find a more definitive solution for this problem?
One way I can see to do it is to patch qdel to run qalter before actually
deleting the jobs, but I would like to know if you had any other
ideas that could be more effective.
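The same idea can be sketched without patching qdel at all, as a small wrapper that puts Andrew's user hold (`qalter -h u`) on every job and only then calls `qdel`. This is a minimal sketch, not anything shipped with SGE. The function names (`build_commands`, `run_subprocess`, `hold_then_delete`) are hypothetical. It also assumes that both `qalter` and `qdel` accept several job IDs as separate arguments, so check the man pages for your SGE version before relying on it.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical wrapper (not part of SGE): hold the jobs first, then delete
# them, so pending jobs are not dispatched while the delete is in progress.

Runner = Callable[[List[str]], int]


def build_commands(job_ids: Sequence[str]) -> List[List[str]]:
    """Return the qalter and qdel command lines for the given job IDs."""
    if not job_ids:
        raise ValueError("no job IDs given")
    ids = list(job_ids)
    return [
        ["qalter", "-h", "u", *ids],  # user hold on every job
        ["qdel", *ids],               # then delete the held jobs
    ]


def run_subprocess(cmd: List[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.call(cmd)


def hold_then_delete(job_ids: Sequence[str],
                     runner: Runner = run_subprocess) -> int:
    """Hold all jobs, then delete them; skip qdel if the hold step fails."""
    hold_cmd, del_cmd = build_commands(job_ids)
    status = runner(hold_cmd)
    if status != 0:
        # Assumption: an unheld delete is what risks the (E) state, so stop.
        return status
    return runner(del_cmd)
```

On a submit host this would be called as `hold_then_delete(["1234", "1235"])`. The sketch deliberately stops without deleting anything if `qalter` returns a non-zero status, on the reasoning that deleting unheld jobs is what triggered the error state. Whether `qalter` reports an error for jobs that are already running may differ between versions. The `runner` parameter lets the command sequence be checked without a live cluster.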

Thanks!
Filipe
