[GE users] Temporarily removing jobs from the queues

mad margaret_Doll at brown.edu
Tue Jul 7 18:41:32 BST 2009


On Jul 7, 2009, at 12:28 PM, dangruhn wrote:

> mad wrote:
>> On Jul 7, 2009, at 10:18 AM, dangruhn wrote:
>>
>>
>>> mad wrote:
>>>
>>>> On Jul 7, 2009, at 8:57 AM, dangruhn wrote:
>>>>
>>>>
>>>>
>>>>> Margaret,
>>>>>
>>>>> mad wrote:
>>>>>
>>>>>
>>>>>> I need to free up some slots on our system.  One user has submitted
>>>>>> two jobs which are taking up all the resources.  I would like to
>>>>>> "suspend" one of her jobs to allow use of the cluster by other users.
>>>>>>
>>>>>>
>>>>>> I have tried suspend and hold through qmon.  However, the slots are
>>>>>> still occupied.
>>>>>>
>>>>>> qstat -g c
>>>>>> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
>>>>>> -------------------------------------------------------------------------------
>>>>>> all.q                             0.98     72      0     72      0      0
>>>>>>
>>>>>>
>>>>>> and I cannot qlogin
>>>>>>
>>>>>> qlogin
>>>>>> Your job 13522 ("QLOGIN") has been submitted
>>>>>> waiting for interactive job to be scheduled ...timeout (4 s) expired while waiting on socket fd 4
>>>>>>
>>>>>>
>>>>>> Your "qlogin" request could not be scheduled, try again later.
>>>>>>
>>>>>> I do not want to kill the job.  How can I free up some of the slots?
>>>>>>
>>>>>>
>>>>>>
>>>>> One possibility is to either suspend or hold (I can't remember which
>>>>> one is the best) and then restart the job.
>>>>>
>>>>>
>>>> Are you using "reschedule" to restart the job?  Resume just takes the
>>>> hold or suspend status off the job.  I am still looking at qmon.
>>>> Restarting is better than killing especially if the user is not
>>>> currently available.
>>>>
>>>>
>>> Yes, reschedule from the qmon/Job Control/Running Jobs tab.
>>>
>>
>> I am getting the message that the jobs are not rerunnable whether or not
>> I have put a hold, suspend or no setting on the running job.
>>
> Take a look at your queue definitions.  In the General Configuration tab
> there is a "Rerun Jobs" check box that needs to be checked.

Thanks.  That did the trick.
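
For the archives: the same steps can probably be done from the command
line instead of qmon.  The following is a minimal, untested sketch, not
what was actually run here.  It assumes manager privileges, and that
"qconf -mattr queue rerun TRUE" corresponds to the "Rerun Jobs" check box.
The run, requeue_job and release_job helpers are made-up names for
illustration; the job ID and queue name are the ones from the qstat
output quoted in this thread.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# requeue_job JOB_ID QUEUE: push a running job back to pending and keep it there.
requeue_job() {
    job_id=$1
    queue=$2
    # Assumed command-line counterpart of ticking "Rerun Jobs" for the queue.
    run qconf -mattr queue rerun TRUE "$queue"
    # Hold first, so the job is not eligible to start again once it is requeued.
    run qhold "$job_id"
    # Reschedule the running job; it starts over from scratch when it next runs.
    run qmod -rj "$job_id"
}

# release_job JOB_ID: lift the hold once the slots can be given back.
release_job() {
    run qrls "$1"
}

# Job 13512 and all.q are the values shown in the qstat output in this thread.
requeue_job 13512 all.q
```

With DRY_RUN left at 1 the script only prints the commands it would run,
which makes it easy to check them before setting DRY_RUN=0.  Calling
release_job 13512 later makes the job eligible for scheduling again.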


>>
>>>>> This will put the job back in
>>>>> pending but it won't be eligible for execution until the suspend/hold is
>>>>> released.
>>>>>
>>>>> The down side is that this job will be starting over from scratch.  Is
>>>>> this okay or is that what you meant by saying you don't want to kill the
>>>>> job?
>>>>>
>>>>>
>>>>
>>>>
>>>>>> Also how do I hold the user's jobs waiting on the queue so that I can
>>>>>> release them in a manner that keeps some of the slots open for other
>>>>>> users?
>>>>>>
>>>>>> ----------------------------------------------------------------------------
>>>>>> all.q at compute-0-8.local        BIP   4/4       4.00     lx26-amd64
>>>>>> 13512 0.25000 user1_SOLVER user1        s     07/06/2009 21:08:09     4
>>>>>> ----------------------------------------------------------------------------
>>>>>>
>>>>>> Although this job is "suspended", it is still running on compute-0-8
>>>>>> and taking up four CPUs.
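
As far as I understand it, suspending a job only stops its processes; the
job stays on its node and keeps its slots, which would explain the 4/4
above.  For the pending jobs, one approach that should work is to hold
everything the user has submitted and then release jobs a few at a time.
A minimal, untested sketch, assuming manager privileges; the helper names
are made up for illustration, user1 is the owner shown in the qstat
output, and any job IDs passed to release_some are placeholders.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# hold_jobs OWNER: place a hold on the jobs submitted by OWNER.
# A hold on an already running job has no effect until that job is restarted.
hold_jobs() {
    run qhold -u "$1"
}

# list_pending OWNER: show OWNER's pending jobs, to pick job IDs to release.
list_pending() {
    run qstat -u "$1" -s p
}

# release_some JOB_ID...: release only the listed jobs, leaving the rest held.
release_some() {
    for job_id in "$@"; do
        run qrls "$job_id"
    done
}

hold_jobs user1
list_pending user1
```

Releasing only as many jobs as there are slots to spare, for example
release_some followed by one or two job IDs from the list, leaves the
remaining slots open for other users.
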
>>>>>>
>>>>>> ------------------------------------------------------
>>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206003
>>>>>>
>>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>>
>>>>>>
>>>>>>
>>>>> -- 
>>>>> Dan Gruhn
>>>>> Group W Inc.
>>>>> 8315 Lee Hwy, Suite 303
>>>>> Fairfax, VA, 22031
>>>>> PH: (703) 752-5831
>>>>> FX: (703) 752-5851
>>>>>

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206047

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


