[GE users] Using cycles from a 2nd SGE cluster

izen joe at utdallas.edu
Mon Jan 11 07:16:31 GMT 2010

    Don and I compared notes this morning and we have a bunch of 
questions, but first I'd appreciate it if you could confirm that I've 
correctly understood the big picture from your email.

Physically, we have two independent clusters, each with its own 
master node (fester and cosmo), serving two user communities at our 
university: High Energy Physics (HEP) and Cosmology (C). The fester 
cluster has 19 x dual quad core batch worker nodes. The cosmo cluster 
has 13 x dual quad core batch worker nodes. Naturally each community 
wants their own batch jobs to have priority on their own hardware, 
however C has consented to share cosmo's batch workers if they are 
idling.  Since HEP jobs potentially can run for hours, it is 
important that some of cosmo's worker nodes be permanently restricted 
to C jobs.

If we run a Service Domain Manager, I think we stop thinking of 
the 19+13 nodes as members of independent clusters and instead treat 
them as resources that can be assigned to either the HEP or the C 
cluster according to the SLOs.  If fester and cosmo are idling, then:


suggests we define a PermanentRequestSLO with low urgency so that the 
worker resources are returned to their "home" clusters.

At present fester's file systems are NFS-mounted by cosmo's workers, 
but the reverse isn't true. Consequently, HEP jobs can execute on the 
cosmo nodes but most C jobs would fail on fester nodes.

cosmo nodes (up to some maximum number, say 10) can be assigned to 
the HEP cluster
several cosmo nodes (say 3) are always assigned to the C cluster
fester nodes are always assigned to the HEP cluster
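
To make the three assignment rules above concrete, here is a minimal Python sketch of the policy as I understand it. It is only a toy model, not SDM configuration: the function name, the constants, and the "C demand is served first, HEP borrows what is idle" ordering are my own assumptions, and the numbers 10 and 3 are the example values from the list above.

```python
from dataclasses import dataclass

FESTER_NODES = 19          # always assigned to HEP
COSMO_NODES = 13
COSMO_RESERVED_FOR_C = 3   # example value: always assigned to C
COSMO_MAX_LOAN = 10        # example value: most cosmo nodes HEP may borrow


@dataclass
class Assignment:
    hep_nodes: int       # nodes currently serving the HEP cluster
    c_nodes: int         # nodes currently serving the C cluster
    cosmo_on_loan: int   # cosmo nodes lent to HEP


def assign_nodes(hep_nodes_wanted: int, c_nodes_wanted: int) -> Assignment:
    """Hypothetical policy: C is served first on cosmo, HEP borrows idle nodes."""
    # C can never use more than the cosmo nodes, and never holds fewer
    # than its permanently reserved nodes, even when it is idle.
    c_demand = min(max(c_nodes_wanted, 0), COSMO_NODES)
    c_in_use = max(c_demand, COSMO_RESERVED_FOR_C)
    idle_cosmo = COSMO_NODES - c_in_use

    # HEP only needs cosmo nodes once fester is full.
    hep_overflow = max(hep_nodes_wanted - FESTER_NODES, 0)
    loan = min(hep_overflow, idle_cosmo, COSMO_MAX_LOAN)

    return Assignment(
        hep_nodes=FESTER_NODES + loan,
        c_nodes=COSMO_NODES - loan,   # nodes not on loan stay at home
        cosmo_on_loan=loan,
    )


if __name__ == "__main__":
    print(assign_nodes(hep_nodes_wanted=40, c_nodes_wanted=0))
    print(assign_nodes(hep_nodes_wanted=25, c_nodes_wanted=9))
```

The sketch ignores time: it does not model a long-running HEP job holding a borrowed cosmo node after C demand returns, which is exactly why the reserved C nodes matter.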

I anticipate that hard drives in the fester worker nodes will shortly 
be loaded with data that will occasionally be used by the fester 
master node, so the fester nodes should not be turned off. I'd have 
to check with the owners of cosmo, but it might be acceptable to 
power-down / sleep the cosmo worker nodes.

Do I have the SGE jargon and the big SGE picture right?


Where does the SDM run? Can it run on one of the master nodes, fester or 
cosmo? Can it run on a dedicated 3rd machine?  Sometimes, the owners 
of fester or cosmo take their machine offline for maintenance, but we 
wouldn't want to hobble the other user community.

How are users handled in SGE?  In general, the HEP physicists don't 
have accounts on cosmo (and C physicists don't have accounts on 
fester). Would this become moot, because when a cosmo node is 
assigned to the HEP cluster, it inherits the user 
list/password/approved certificate list of the HEP cluster?

Do fester and cosmo have to run identical versions of Linux in order 
to contribute workers to the same HEP cluster managed by an SDM? 
Naturally the executing jobs would need to run in both OS versions, 
but does the SDM impose its own requirement?

Our usage example is not so complicated - it must be fairly common. 
For starters, the urgency of a job need not take into account the 
wait time or a deadline. Is there a sample of SLOs for an SDM that we 
can study and use as a starting point for our own?  If not, it would 
be a big help if you could show us a skeleton of what we need.

Thanks very much!

At 4:14 PM +0100 1/8/10, Richard Hierlmeier wrote:
>Don wrote:
>>I haven't used subordinate queues before in sge. In PBSpro,
>>I used routing queues between two master-nodes. I suppose
>>you could think of all queues as subordinate to the route
>>queue. Although spare-cycle jobs should be subordinate to
>>the owner's job mix, so this is useful to think about at
>>a later time.
>>Next, he's explaining exactly what we've already done and
>>talked about. In this scenario all slaves would only
>>run SDM and sgeexecd would be removed at install time.
>sgeexecd is installed on a node if SDM moves the host into the 
>cluster. The execd is uninstalled if SDM removes the host from the 
>cluster. The installation of the execd is only a matter of seconds.
>However, the cluster has a problem if the host is part of an advance 
>reservation. The reservation will fail. SDM does not give any 
>guarantees that the host will come back in time.
>>Each master node would run independently.
>This is the big advantage of the SDM solution. The configurations of 
>the two clusters are completely independent.
>>I don't see
>>any way to prioritize the queues
>You cannot prioritize queues with SDM; SDM does not know about queues. 
>But you can give certain kinds of jobs a higher urgency. For each 
>job category the corresponding SDM service will have a 
>MaxPendingJobsSLO. The SLO will only produce a resource request if 
>there are pending jobs belonging to this job category. The 
>resource request with the highest urgency will win. The cluster will 
>get more resources (= hosts).
>>or tie the resources
>>of one cluster to its own master, beyond just naming
>>in sdm.
>In SDM you can have static resources. SDM will never move such a 
>static host away from the cluster.
>>SLO is a complex mess to deal with, with very little
>>guidance on how to do much with it beyond basics.
>If you need any help with the SLO setup, you are welcome to ask.
>>I'm not clear about how the nodes are powered off and
>>on by sdm - and what associated hw,firmware is involved
>>in this - could be that's only a case for blade,chassis,
>>or only the latest hardware, ie idrac.
>SDM 1.0u5 comes with a power saving solution that turns off hosts via 
>IPMI. However this is not hard coded. SDM has a well defined 
>scripting interface for power saving. If you can power off your host 
>remotely from the command line power saving with SDM will be 
>>The install/uninstall of sgeexecd is rather useless work
>I agree, SDM automates this task for you.
>>- I don't understand
>>why a static pool cannot be designated spare .
>>acluster sgemaster
>>bcluster sgemaster
>>sparepool aslave1..20 bslave1..13
>I do not really understand what a static pool is. Is it a SDM 
>spare_pool or a subordinate queue?
>>all OS and sw would need to be consistent enough for
>>any job from each owner mix. And the user-acct needs
>>to be the same for the shared job, usatlas1. I suppose if a user
>>was defined in one pool but not the other,
>>the job would fail with an error. Not sure if sge can
>>deal with two independent uid/gid sets.
>SDM supports different kinds of resources. Categorize your jobs and 
>set up an SLO for each job category. The SLO will request the needed 
>>On Fri, 8 Jan 2010, Richard Hierlmeier wrote:
>>>izen wrote:
>>>>My sys admin has been trying to configure two independent Linux 
>>>>clusters with static SGE pools, such that when the first cluster 
>>>>batch queue fills, additional jobs will fall over to a low 
>>>>priority queue in the second cluster. Each cluster has its own 
>>>>master node, and it would be a political non-starter to change 
>>>>that. So far, my admin has not succeeded. Is his configuration 
>>>>with static pools workable?
>>>I think you are talking about subordinate queues. No, it is not 
>>>doable with this feature. Sounds more like a use case for the 
>>>Service Domain Manager (SDM) module of SGE.
>>>>If so, we would welcome some guidance in configuring our SGE 
>>>>deployment to do this.
>>>SDM implements resource sharing between two or more SGE clusters. 
>>>For each SGE cluster a SLO (Service Level Objective) can be 
>>>defined. This SLO will request new hosts whenever there are jobs 
>>>in the pending queue. SDM takes hosts out of spare_pool and 
>>>installs the execd of the cluster on it. Once workload goes down 
>>>the hosts are removed from the SGE cluster and put back to the 
>>>You can implement power saving (hosts in spare_pool can be powered 
>>>off) with SDM. In addition you can get hosts from a cloud service 
>>>like EC2.
>>>For a good introduction please have a look at
>>>>We are beginning to wonder whether this is undoable with static 
>>>>pools, and need to switch to a dynamic pool.
>>>>Input would be most welcome.  Thanks!  -Joe
>>>>To unsubscribe from this discussion, e-mail: 
>>>>[users-unsubscribe at gridengine.sunsource.net].
>>>- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
>>>Richard Hierlmeier           Phone: ++49 (0)941 3075-223
>>>Software Engineering         Fax:   ++49 (0)941 3075-222
>>>Sun Microsystems GmbH
>>>Dr.-Leo-Ritter-Str. 7         mailto: richard.hierlmeier at sun.com
>>>D-93049 Regensburg           http://www.sun.com/grid
>>>Sitz der Gesellschaft:
>>>Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
>>>Amtsgericht Muenchen: HRB 161028
>>>Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
>>>Vorsitzender des Aufsichtsrates: Martin Haering

