[GE users] SGE and Preemption

Charu Chaubal Charu.Chaubal at Sun.COM
Thu Oct 28 00:12:05 BST 2004


Hi,


Brian Smith wrote:
> Alright list, I'm gonna shoot you another question.  I have done my
> homework on this one, been in touch with Sun, done all the googling that
> google can handle and pored through pages and pages of documentation.
> 
> Here is my situation: I have two research groups that work off a single
> cluster.  We will denote the two groups as '1' and '2'.  '2' is not
> running any jobs at the moment and '1's queue is getting full.  A user
> from '1' decided to run his job in 'all.q' so that he can use some of
> '2's resources.  The '1' user's job is so large that it uses the entire
> '2' queue.  Now, a '2' user logs in and wants to run a job on his queue
> but there are no available resources.
> 
> I would like '2's job to preempt the large '1' job that is running in
> '2's space.
> 
> So far, I have configured 3 queues: all.q, group1.q and group2.q.  all.q
> is subordinate to group1.q and group2.q.  At this point, I would like
> the job running in 'all.q' to be stopped and rescheduled.  

I assume you understand that, unless your apps are set up with some kind of
checkpointing, they will be killed and restarted.

Although it might sound contradictory to what I wrote above, the way to set this
up is with the Checkpointing Environment.  Basically, you use a "zero-order"
checkpoint, i.e., restart from the beginning.

The following HOWTO talks about doing this:
http://gridengine.sunsource.net/project/gridengine/howto/reloc.html

Note that it assumes that you are relocating from a desktop workstation, but the
principle is the same.  Instead of the migration being triggered by a suspend
threshold, it is triggered by a subordinate suspend.  It doesn't matter how the
suspend is triggered, though.  The suspension is overridden by migration, i.e.,
stop and reschedule.

Please also note that the setup of checkpointing environments has changed in
GE 6.  Hopefully you'll get the gist of it from this HOWTO, which refers to GE 5.3.
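For GE 6, a rough sketch of such a "restart from the beginning" checkpointing
environment is below, written as a small Python script that prints a definition
in the one-"name value"-per-line layout that qconf uses.  The environment name
"restart_ckpt" is hypothetical, and every value (interface, "none" commands,
ckpt_dir, when=x) is an assumption to verify against checkpoint(5) and the
HOWTO before loading it.

```python
# Hypothetical zero-order checkpointing environment for GE 6.
# Field names follow checkpoint(5) as I recall it; all values are assumptions.
CKPT_ENV = {
    "ckpt_name": "restart_ckpt",        # hypothetical name
    "interface": "application-level",   # assumption: check checkpoint(5)
    "ckpt_command": "none",             # no real checkpoint is written
    "migr_command": "none",
    "restart_command": "none",
    "clean_command": "none",
    "ckpt_dir": "/tmp",                 # assumption
    "signal": "none",
    "when": "x",                        # assumed: migrate as soon as the job is suspended
}


def render_ckpt_env(env: dict) -> str:
    """Render one 'name value' line per field, padded into two columns."""
    width = max(len(key) for key in env) + 1
    return "".join(f"{key:<{width}}{value}\n" for key, value in env.items())


if __name__ == "__main__":
    print(render_ckpt_env(CKPT_ENV), end="")
```

If the values check out, you would save the output to a file, load it with
"qconf -Ackpt <file>", add restart_ckpt to the ckpt_list of all.q (in GE 6 the
queue names the checkpointing environment, rather than the other way around as
in 5.3), and have the group1 users submit with "qsub -ckpt restart_ckpt".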

Hope this helps.

	Charu


> I have created user groups that belong to group1.q and all.q and a group
> that belongs to group2.q and all.q.
> 
> Now, if I have a user in group1 run a full job (it uses all resources)
> in all.q and a user in group2 runs a job, group2's job has to wait until
> the group1 job finishes.  
> 
> How can I get this to work?  Let me know if you need further info.
> 
> 
> Brian Smith
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################






More information about the gridengine-users mailing list