[GE users] Temporarily removing jobs from the queues

mad margaret_Doll at brown.edu
Tue Jul 7 15:32:20 BST 2009


On Jul 7, 2009, at 10:18 AM, dangruhn wrote:

> mad wrote:
>> On Jul 7, 2009, at 8:57 AM, dangruhn wrote:
>>
>>
>>> Margaret,
>>>
>>> mad wrote:
>>>
>>>> I need to free up some slots on our system.  One user has submitted
>>>> two jobs which are taking up all the resources.  I would like to
>>>> "suspend" one of her jobs to allow use of the cluster by other users.
>>>>
>>>>
>>>> I have tried suspend and hold through qmon.  However, the slots are
>>>> still occupied.
>>>>
>>>> qstat -g c
>>>> CLUSTER QUEUE                   CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
>>>> -------------------------------------------------------------------------------
>>>> all.q                             0.98     72      0     72      0      0
>>>>
>>>>
>>>> and I cannot qlogin
>>>>
>>>> qlogin
>>>> Your job 13522 ("QLOGIN") has been submitted
>>>> waiting for interactive job to be scheduled ...timeout (4 s) expired while waiting on socket fd 4
>>>>
>>>>
>>>> Your "qlogin" request could not be scheduled, try again later.
>>>>
>>>> I do not want to kill the job.  How can I free up some of the slots?
>>>>
>>>>
>>> One possibility is to either suspend or hold (I can't remember which
>>> one is the best) and then restart the job.
>>>
>>
>> Are you using "reschedule" to restart the job?  Resume just takes the
>> hold or suspend status off the job.  I am still looking at qmon.
>> Restarting is better than killing, especially if the user is not
>> currently available.
>>
> Yes, reschedule from the qmon/Job Control/Running Jobs tab.

I am getting the message that the jobs are not rerunnable, whether I have
put a hold on the running job, suspended it, or left it unchanged.
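
For reference, the qmon actions discussed in this thread have command-line
counterparts. The following is only a sketch, assuming a Grid Engine 6.x
installation and manager or operator rights: the wrapper function names and
the DRY_RUN switch are made up for illustration, and the job ID and user
name are the ones from the qstat output quoted in this thread. As far as I
understand it, a reschedule request is refused for jobs that were not
submitted as rerunnable (for example with "qsub -r y"), which would match
the message above.

```shell
#!/bin/sh
# Sketch only: by default this prints the Grid Engine commands instead of
# running them. Set DRY_RUN=0 to execute them for real.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Suspend / resume a running job. A suspended job is stopped on the node
# but, as the qstat output in this thread shows, still occupies its slots.
suspend_job() { run qmod -sj "$1"; }
resume_job()  { run qmod -usj "$1"; }

# Ask the scheduler to put a running job back into the pending list.
# Expected to fail for jobs that are not flagged as rerunnable.
reschedule_job() { run qmod -rj "$1"; }

# Hold / release all jobs of one user, so that pending jobs stay queued
# until they are released again.
hold_user_jobs()    { run qhold -u "$1"; }
release_user_jobs() { run qrls -u "$1"; }

# Example calls (printed only, because DRY_RUN defaults to 1).
suspend_job 13512
hold_user_jobs user1
```

Releasing held jobs a few at a time, with qrls on individual job IDs rather
than on the whole user, would be one way to keep some slots open for other
users.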

>>
>>> This will put the job back in pending, but it won't be eligible for
>>> execution until the suspend/hold is released.
>>>
>>> The downside is that this job will be starting over from scratch.
>>> Is this okay, or is that what you meant by saying you don't want to
>>> kill the job?
>>>
>>
>>
>>
>>>> Also, how do I hold the user's jobs waiting in the queue so that I
>>>> can release them in a manner that keeps some of the slots open for
>>>> other users?
>>>>
>>>> ----------------------------------------------------------------------------
>>>> all.q@compute-0-8.local        BIP   4/4       4.00     lx26-amd64
>>>>  13512 0.25000 user1_SOLVER user1        s     07/06/2009 21:08:09     4
>>>> ----------------------------------------------------------------------------
>>>>
>>>> Although this job is "suspended", it is still running on compute-0-8
>>>> and taking up four CPUs.
>>>>
>>>> ------------------------------------------------------
>>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206003
>>>>
>>>> To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].
>>>>
>>>>
>>> -- 
>>> Dan Gruhn
>>> Group W Inc.
>>> 8315 Lee Hwy, Suite 303
>>> Fairfax, VA, 22031
>>> PH: (703) 752-5831
>>> FX: (703) 752-5851

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=206024

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].


