[GE users] Using qdel leaves queues in error status

Filipe Brandenburger filipe.brandenburger at idilia.com
Tue May 27 14:15:29 BST 2008


Hi Andreas,

Thank you very much for your answer. I will consider moving the local
queues (and, while I'm at it, the binaries as well) to local disk.

Andreas.Haas at Sun.COM wrote:
> Another possibility is to upgrade to 6.1u4. That way you would
>    752      6288953   scalability issue with qdel and very large array jobs

That's great!

I found the bug report for this issue here:
However, I couldn't find the actual patch for bug #6288953. Could you
please point me to it?

I was wondering whether this patch is simple and non-intrusive enough
that I could apply it to 6.0. The grid is very busy right now, and it
will probably be quite a while before I am able to upgrade it to 6.1.

There is another aspect of this problem that I would like to
understand. It happened twice: the first time the processes received
the HUP signal, and the second time they received the KILL signal:

> 05/20/2008 09:29:44|qmaster|sgemaster|W|job 7972300.1 failed on host s14.mydomain.com assumedly after job because: job 7972300.1 died through signal HUP (1)

> 05/21/2008 18:42:10|qmaster|sgemaster|W|job 8011707.1 failed on host j05.mydomain.com assumedly after job because: job 8011707.1 died through signal KILL (9)

I would like to understand what caused this difference in behaviour,
since I don't really like the idea of processes (especially lots of
them) being killed with SIGKILL. Is there something in qdel that
triggers the KILL signal, such as the -f option?
