[GE users] Temporarily removing jobs from the queues

dangruhn Dan.Gruhn at groupw.com
Tue Jul 7 17:28:43 BST 2009


mad wrote:
> On Jul 7, 2009, at 10:18 AM, dangruhn wrote:
>
>   
>> mad wrote:
>>     
>>> On Jul 7, 2009, at 8:57 AM, dangruhn wrote:
>>>
>>>
>>>       
>>>> Margaret,
>>>>
>>>> mad wrote:
>>>>
>>>>         
>>>>> I need to free up some slots on our system.  One user has submitted
>>>>> two jobs which are taking up all the resources.   I would like to
>>>>> "suspend" one of her jobs to allow use of the cluster by other  
>>>>> users.
>>>>>
>>>>>
>>>>> I have tried suspend and hold through qmon.  However, the slots are
>>>>> still occupied.
>>>>>
>>>>> qstat -g c
>>>>> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
>>>>> -------------------------------------------------------------------------------
>>>>> all.q                             0.98     72      0     72      0      0
>>>>>
>>>>>
>>>>> and I cannot qlogin
>>>>>
>>>>> qlogin
>>>>> Your job 13522 ("QLOGIN") has been submitted
>>>>> waiting for interactive job to be scheduled ...timeout (4 s) expired while waiting on socket fd 4
>>>>>
>>>>>
>>>>> Your "qlogin" request could not be scheduled, try again later.
>>>>>
>>>>> I do not want to kill the job.  How can I free up some of the  
>>>>> slots?
>>>>>
>>>>>
>>>>>           
>>>> One possibility is to either suspend or hold (I can't remember which
>>>> one is better) and then restart the job.
>>>>
>>>>         
>>> Are you using "reschedule" to restart the job?  Resume just takes the
>>> hold or suspend status off the job.  I am still looking at qmon.
>>> Restarting is better than killing, especially if the user is not
>>> currently available.
>>>
>>>       
>> Yes, reschedule from the qmon/Job Control/Running Jobs tab.
>>     
>
> I am getting the message that the jobs are not rerunnable, whether I have
> put a hold, a suspend, or nothing at all on the running job.
>   
Take a look at your queue definitions.  In qmon's queue configuration
dialog, the General Configuration tab has a "Rerun Jobs" check box that
needs to be checked so that jobs in that queue are treated as rerunnable.
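If you prefer the command line, the check box should correspond to the "rerun" attribute of the queue configuration (queue_conf(5)), which qconf can set directly. This is a minimal sketch, assuming the queue is all.q as in the qstat -g c output above; the run helper and the APPLY switch are my own hypothetical conveniences, so by default the script only prints the command it would execute.

```shell
#!/bin/sh
# Hypothetical helper: print each command and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

# Set the queue's "rerun" attribute to TRUE (the qmon "Rerun Jobs" box).
# The queue name defaults to all.q, as in the qstat -g c output above.
enable_rerun() {
  run qconf -mattr queue rerun TRUE "${1:-all.q}"
}

enable_rerun all.q
```

Running the script with APPLY=1 set in the environment would actually make the change. Individual jobs can also be made rerunnable at submission time with qsub -r y.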
>   
>>>> This will put the job back in pending, but it won't be eligible for
>>>> execution until the suspend/hold is released.
>>>>
>>>> The downside is that this job will start over from scratch.  Is this
>>>> okay, or is that what you meant by saying you don't want to kill the
>>>> job?
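The same hold-then-reschedule sequence can be sketched from the command line with qhold, qmod -rj and qrls. This assumes job 13512 from the listing in this thread and a queue (or job) that is already rerunnable; the run helper and the APPLY switch are hypothetical, so nothing is executed unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

# Park a running job: the user hold keeps it from being dispatched again,
# and the reschedule stops it and puts it back in the pending list, which
# frees its slots. The job must be rerunnable for qmod -rj to accept it.
park_job() {
  run qhold "$1"
  run qmod -rj "$1"
}

# Release the hold so the job can be scheduled again (from scratch).
unpark_job() {
  run qrls "$1"
}

park_job 13512
```

Later, calling unpark_job 13512 (again with APPLY=1) lifts the hold; the job then waits for free slots like any other pending job and restarts from the beginning.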
>>>>
>>>>         
>>>
>>>       
>>>>> Also, how do I hold the user's jobs waiting in the queue so that I can
>>>>> release them in a manner that keeps some of the slots open for other
>>>>> users?
>>>>>
>>>>> ----------------------------------------------------------------------------
>>>>> all.q@compute-0-8.local        BIP   4/4       4.00     lx26-amd64
>>>>>  13512 0.25000 user1_SOLVER user1        s     07/06/2009 21:08:09     4
>>>>> ----------------------------------------------------------------------------
>>>>>
>>>>> Although this job is "suspended", it is still running on compute-0-8
>>>>> and taking up four CPUs.
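For the pending jobs, one approach, sketched here under assumptions, is to put a user hold on each of her waiting jobs with qhold and then release them one at a time with qrls, so that only as many of her jobs compete for slots as you choose to release. The job IDs 13513 and 13514 are hypothetical (qstat -u user1 -s p lists her pending jobs), and run and APPLY are hypothetical helpers that only print the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical helper: print each command and execute it only when APPLY=1.
run() {
  printf '+ %s\n' "$*"
  if [ "${APPLY:-0}" = 1 ]; then
    "$@"
  fi
}

# Put a user hold on every job ID given as an argument.
hold_jobs() {
  for job_id in "$@"; do
    run qhold "$job_id"
  done
}

# Release exactly one held job; call again when more slots are free.
release_job() {
  run qrls "$1"
}

hold_jobs 13513 13514   # hypothetical pending job IDs
# Later, once other users' jobs have been dispatched:
release_job 13513
```

Releasing one job does not reserve slots for anyone else; it only limits how many of her held jobs the scheduler can consider at once.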
>>>>>
>>>>> ------------------------------------------------------
>>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206003
>>>>>
>>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>>
>>>>>
>>>>>           
>>>> -- 
>>>> Dan Gruhn
>>>> Group W Inc.
>>>> 8315 Lee Hwy, Suite 303
>>>> Fairfax, VA, 22031
>>>> PH: (703) 752-5831
>>>> FX: (703) 752-5851

-- 
Dan Gruhn
Group W Inc.
8315 Lee Hwy, Suite 303
Fairfax, VA, 22031
PH: (703) 752-5831
FX: (703) 752-5851

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206040

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



More information about the gridengine-users mailing list