[GE users] Deleting jobs without qdel

heywood heywood at cshl.edu
Wed Apr 28 16:00:20 BST 2010


Thanks, Ivan. Looks like the overload is too much for even that...

[root@bhmnode2 qmaster]# qmod -d \*
failed receiving gdi request response for mid=1 (got syncron message receive timeout error).
error: commlib error: got read error (closing "bhmnode2/qmaster/1")


Todd



On 4/28/10 10:52 AM, "iadzhubey" <iadzhubey at rics.bwh.harvard.edu> wrote:

> Hi Todd
> 
> On Wednesday 28 April 2010 10:38:27 am heywood wrote:
>> Is there any shortcut to deleting jobs in the system without qdel? We had a
>> user "accidentally" submit 500K very short running jobs. SGE goes
>> unresponsive, i.e. all commands hang, even qdel. Qping shows the messages
>> in the read buffer constantly growing. I have even tried shutting down the
>> qmaster and restarting it.
> 
> Been there, done that. Except our users often submit arrays in the range of 10
> million tasks easily. If something goes wrong it may take quite an effort to
> get rid of them. My strategy is to first of all immediately disable all queues
> on the system. You can do this with the 'qmod -d \*' command, which does not
> involve scanning queue contents and thus executes fairly fast even on a
> heavily oversubscribed system. You can then proceed with deleting rogue jobs
> still sitting in the queue.
> 
> Best,
> Ivan
> 
> ------------------------------------------------------
> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255300
> 
> To unsubscribe from this discussion, e-mail:
> [users-unsubscribe at gridengine.sunsource.net].
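
For anyone finding this thread later, the two steps Ivan describes (disable
all queues, then delete the rogue jobs) might be sketched roughly as below.
This is only an illustration, not a tested procedure: the function names and
the DRY_RUN switch are made up for the example, and it assumes the rogue jobs
all belong to one user so that 'qdel -u' can select them. As the output
earlier in this message shows, even 'qmod -d' can time out when the qmaster
is badly overloaded, so the sketch stops if that first step fails.

```shell
#!/bin/sh
# Sketch only: disable every queue, then delete one user's jobs.
# run, purge_user_jobs and DRY_RUN are hypothetical names for this example.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

purge_user_jobs() {
    rogue_user=$1
    # Step 1: disable all queues so nothing new gets dispatched.
    # Give up here if the qmaster does not answer.
    run qmod -d '*' || return 1
    # Step 2: delete the jobs submitted by the given user.
    run qdel -u "$rogue_user"
}
```

Calling 'purge_user_jobs someuser' with DRY_RUN=1 set only prints the two
commands, which is a cheap way to check what would be run. Once the jobs are
gone, the queues can be re-enabled with 'qmod -e' in the same way.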

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=255301

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


