[GE users] Reservation of resources for the duration of checkpointing so that no other job can use those resources.
reuti at staff.uni-marburg.de
Tue Aug 17 16:47:46 BST 2010
On 17.08.2010 at 16:57, sgerns wrote:
> I have a scenario here, which I will try to explain in steps.
> 1. I have some jobs running in the cluster, scheduled according to the priority of the queues (a few queues have higher priority than others, and so on).
> 2. I also have a VIP.queue, which has the highest priority of all the queues. So of course jobs submitted through the VIP queue will move up in the pending list, as shown below.
> 100 0.50500 job_name1 usr1 r  normal.q@exehost  32
> 101 0.50500 job_name2 usr1 r  normal.q@exehost  16
> 102 0.50500 job_name2 usr1 r  normal.q@exehost  32
> 103 0.50500 job_name3 usr2 r  normal.q@exehost  128
> 104 0.50973 job_name4 usr3 qw VIP.q@exehost.    64
> 105 0.50973 job_name5 usr3 qw normal.q@exehost  32
> 106 0.51514 job_name6 usr4 qw normal.q@exehost  16
> 3. These VIP jobs are very important and I want them to run ASAP, so I will checkpoint the lower-priority jobs that are currently running.
How are you checkpointing these jobs, i.e. which checkpointing environment did you set up in SGE?
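To answer that question, the configured checkpointing environments can be inspected with `qconf`. A minimal sketch, assuming `qconf` is on the PATH of an admin or submit host; the helper name `show_ckpt_env` is only an illustration, not part of SGE:

```shell
# Hypothetical helper: inspect the checkpointing environments configured in SGE.
# With no argument it lists the names of all environments (qconf -sckptl);
# with one argument it prints the definition of that environment (qconf -sckpt).
show_ckpt_env() {
    if [ "$#" -eq 0 ]; then
        qconf -sckptl
    else
        qconf -sckpt "$1"
    fi
}
```

Calling `show_ckpt_env` without arguments prints the list; `show_ckpt_env NAME` prints one definition, where NAME is whatever the list returned.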
> 4. Suppose I have selected job IDs 100 and 102 for checkpointing and sent the checkpointing signal to those jobs, and it takes 2 hours to checkpoint them.
> I do not want any other job to run on these resources during those 2 hours,
> because there is a possibility that small jobs could get the resources and start running.
You mean that you, e.g., suspend these jobs, which are then checkpointed as a result? As SGE thinks the queue is free again, it might be used by other jobs. Why not suspend normal.q@exehostXY?
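Suspending at the queue-instance level could look roughly like this. It is only a sketch: the function names are made up for illustration, exehostXY is a placeholder, and note that suspending a queue instance also suspends the jobs running in it, which may or may not be what your checkpointing setup expects.

```shell
# Hypothetical helpers wrapping qmod's queue-level suspend/unsuspend switches.

# Suspend each given queue instance (qmod -sq).
suspend_queue_instances() {
    for qi in "$@"; do
        qmod -sq "$qi" || return 1
    done
}

# Unsuspend the queue instances again (qmod -usq).
unsuspend_queue_instances() {
    for qi in "$@"; do
        qmod -usq "$qi" || return 1
    done
}
```

For example, `suspend_queue_instances normal.q@exehostXY` before the checkpoint and `unsuspend_queue_instances normal.q@exehostXY` once it is done.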
PS: If you are doing it all by hand without a checkpointing environment, the queue instances normal.q@exehostXY just need to be disabled with `qmod -d normal.q@exehostXY`, AFAICS.
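A rough sketch of the manual approach, assuming you know which queue instances jobs 100 and 102 occupy. The function names are hypothetical and exehostXY stands for the real hosts. A disabled queue instance receives no new jobs, while jobs already running in it are left alone, so the checkpoint can finish undisturbed:

```shell
# Hypothetical helpers: keep new jobs off queue instances during a manual checkpoint.

# Disable each given queue instance (qmod -d) so no new jobs are dispatched to it.
hold_queue_instances() {
    for qi in "$@"; do
        qmod -d "$qi" || return 1
    done
}

# Re-enable the queue instances (qmod -e) after the checkpoint has finished.
release_queue_instances() {
    for qi in "$@"; do
        qmod -e "$qi" || return 1
    done
}
```

Run `hold_queue_instances normal.q@exehostXY` before sending the checkpoint signal, and `release_queue_instances normal.q@exehostXY` once the checkpoint is complete and the VIP job should be allowed to start.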
> Kindly help me: how can I reserve the resources for the duration of the checkpointing so that no other unimportant job is able to start?