[GE users] Scheduling question

Charu Chaubal Charu.Chaubal at Sun.COM
Wed Dec 7 21:54:26 GMT 2005


Hi Goncalo,

If I can re-summarize the scenario, it is desired that, as long as group
A has any jobs pending, it should have exclusive access to systems it
owns.  If group A doesn't have any jobs pending, then it's OK for group
B jobs to start running on it.  However, if group A submits new jobs,
then the group A host must drain all group B jobs and start running only
group A jobs.  Is this correct?

One way to do this would be as follows.

-- create two custom resources, "groupAjobs" and "groupBjobs".

-- create two GE "department" lists, "groupA" and "groupB".  Make sure
that all users belong to one or the other department.

-- create two hostgroups, "hostgroupA" and "hostgroupB".  Make sure that
all hosts belong to either one or the other group, depending on who is
the owner.
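
As a rough sketch of the hostgroup step, the helper below writes a definition file and loads it with "qconf -Ahgrp". The function names and host names are made up for illustration. It also assumes the two-line "group_name" / "hostlist" layout from hostgroup(5), and that hostgroup names start with "@" (as I believe they must in GE 6), so check both against your installation first.

```shell
#!/bin/sh
# Print a hostgroup definition in the group_name/hostlist layout
# (assumed from hostgroup(5); verify against your GE version).
# Usage: hostgroup_definition @name host1 [host2 ...]
hostgroup_definition() {
    name=$1
    shift
    echo "group_name $name"
    echo "hostlist $*"
}

# Write the definition to a temporary file and register it with qconf -Ahgrp.
add_hostgroup() {
    tmpfile=`mktemp` || return 1
    hostgroup_definition "$@" > "$tmpfile"
    qconf -Ahgrp "$tmpfile"
    status=$?
    rm -f "$tmpfile"
    return $status
}

# Example (host names are placeholders):
#   add_hostgroup @hostgroupA nodeA1 nodeA2
#   add_hostgroup @hostgroupB nodeB1
```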

-- write a load sensor script that measures how many jobs are pending
for each department.  This would involve running a qstat periodically,
something you should be very careful not to do too often.  A very rough
example in shell script would include this:

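# Snapshot the pending jobs once; with "qstat -ext" the department is column 7.
qstat -s p -ext > /tmp/t.tmp
# Count exact department matches (grep -c would also count substring matches).
groupAjobs=`awk '$7 == "groupA" {n++} END {print n+0}' /tmp/t.tmp`
groupBjobs=`awk '$7 == "groupB" {n++} END {print n+0}' /tmp/t.tmp`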

(An example of a similar [ but not identical ] load sensor can be found
here:
http://gridengine.sunsource.net/howto/TransferQueues/clusterload.sh)
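
To show how the counting could fit into a complete load sensor, here is a minimal sketch. It assumes the usual load sensor protocol: wait for a line on stdin, exit on "quit", otherwise print "begin", one "host:resource:value" line per value, and "end". It reports the values under the host name "global" on the assumption that they are cluster-wide. The function names are made up, and the department is assumed to be column 7 as in the snippet above.

```shell
#!/bin/sh
# Count pending jobs for one department from "qstat -s p -ext" output on stdin.
# Assumes the department name is the 7th column.
count_pending() {
    awk -v dept="$1" '$7 == dept { n++ } END { print n + 0 }'
}

# Emit one load report in the begin / host:name:value / end format.
# "global" is assumed to mark the values as cluster-wide rather than per-host.
report_load() {
    echo "begin"
    echo "global:groupAjobs:$1"
    echo "global:groupBjobs:$2"
    echo "end"
}

# Main loop: wait for a line on stdin, stop on "quit" or EOF, else report.
sensor_loop() {
    while read request; do
        [ "$request" = "quit" ] && return 0
        pending=`qstat -s p -ext`
        a=`echo "$pending" | count_pending groupA`
        b=`echo "$pending" | count_pending groupB`
        report_load "$a" "$b"
    done
}

# When installed as a load sensor, finish the script with:
#   sensor_loop
```

Because qstat runs only once per request, the load report interval decides how often the qmaster is queried, which keeps the "not too often" caveat under your control.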

-- set the slot limit on a per-host basis:
qconf -mattr exechost complex_values slots=<N> `qconf -sel`
where <N> is the maximum number of jobs you want to allow simultaneously on
the host.
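
If you would rather not rely on qconf accepting the whole host list in one call, a loop that issues one "qconf -mattr" per host is a cautious alternative. This is only a sketch; the function name set_host_slots is made up, and it stops at the first host for which qconf fails.

```shell
#!/bin/sh
# Set complex_values slots=<N> on every execution host, one host per qconf call.
# Usage: set_host_slots N
set_host_slots() {
    slots=$1
    for host in `qconf -sel`; do
        qconf -mattr exechost complex_values "slots=$slots" "$host" || return 1
    done
}

# Example: allow at most 4 simultaneous jobs per host.
#   set_host_slots 4
```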

-- Create two cluster-wide queues, called "owner" and "guest".  Make it
so that each queue has a slots value equal to <N>.

-- Using user_lists and hostgroups, set the access for the "owner" queue
to groupA for hostgroupA, and groupB for hostgroupB, e.g., in the queue
definition for "owner":

user_lists	[hostgroupA=groupA],[hostgroupB=groupB]
xuser_lists	[hostgroupA=groupB],[hostgroupB=groupA]

and for "guest", do the exact opposite:

user_lists	[hostgroupA=groupB],[hostgroupB=groupA]
xuser_lists	[hostgroupA=groupA],[hostgroupB=groupB]

-- FINALLY, set the load threshold for the "guest" queue:
load_thresholds		[hostgroupA=groupAjobs=0],[hostgroupB=groupBjobs=0]

Hopefully, this means that:

-- the per-host slot limit will prevent a host from being overloaded
-- if only groupB has jobs pending, they can run in either the "owner"
queue on hostgroupB, or the "guest" queue on hostgroupA.
-- if groupA has any jobs pending, the load threshold will disable the
"guest" queue on all hosts owned by groupA, while still allowing the
groupB jobs already running there to finish.  From that point on,
hostgroupA will only run the "owner" queue, which will only run groupA
jobs.
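
The intended outcome can be written down as a tiny decision function, which may be handy as a checklist when testing. This is only a model of the behaviour described above, not of how the GE scheduler works internally, and the function name is made up.

```shell
#!/bin/sh
# Which queue on a host may start a new job for a given group?
#   $1 = group submitting the job
#   $2 = group owning the host
#   $3 = number of pending jobs of the owning group
queue_for_job() {
    if [ "$1" = "$2" ]; then
        echo owner      # owners always use the "owner" queue on their hosts
    elif [ "$3" -eq 0 ]; then
        echo guest      # no owner jobs pending: the "guest" queue is open
    else
        echo none       # owner jobs pending: "guest" is over its threshold
    fi
}

# Examples:
#   queue_for_job groupA groupA 3   # owner job on its own host
#   queue_for_job groupB groupA 0   # guest job while the owner is idle
#   queue_for_job groupB groupA 2   # guest job while owner jobs are pending
```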

Please note that this involves a lot of configuration, so be sure to
test everything out first, including especially the load sensor.

Hope that helps.  If you try this, let us know how it goes!

	Charu



Goncalo Borges wrote On 12/07/05 11:14,:
> Dear All,
> 
> Maybe you can help me and advise me on the following situation:
> 
> Imagine that I have two execution hosts, each one owned by a 
> different group. If the machines are not loaded, any user from 
> either group can run jobs on both machines. 
> 
> Now imagine that at a given time, a user from group A submits 4 jobs:
> 	- one will run on machine owned by group A;
> 	- one will run on machine owned by group B;
> 	- The other two will be on hold.
> 
> Some time later, a user from group B submits a job. I would like to know 
> if the following situation is possible with SGE, without any further input
> (priority, deadtime, etc...) from the user or administrator:
> 
> 	- If the machine from group A is the first one to become
> 	  available, one of the two pending jobs from user A must
> 	  start running on that machine.
> 
> 	- If the machine from group B is the first one to become
> 	  available, the job from user B must start running on that
> 	  machine, although the two pending jobs from user A have
> 	  been waiting longer.
>  
> 
> This situation is somehow a pre-requisite in my home institute.
> Different groups contribute to a global computing center. A given 
> group doesn't mind lending some machines to another group if they 
> are not being used. However, from the moment they submit 
> jobs, they want to have full priority on the machines they own.
> 
> Thanks in advance for your replies,	
> Cheers
> 	Goncalo 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 

-- 
####################################################################
# Charu V. Chaubal              # Phone: (650) 786-7672 (x87672)   #
# Grid Computing Technologist   # Fax:   (650) 786-4591            #
# Sun Microsystems, Inc.        # Email: charu.chaubal at sun.com     #
####################################################################

