[GE users] Using cycles from a 2nd SGE cluster
richard.hierlmeier at sun.com
Fri Jan 8 15:14:45 GMT 2010
> I haven't used subordinate queues before in SGE. In PBSPro,
> I used routing queues between two master nodes. I suppose
> you could think of all queues as subordinate to the route
> queue. Although spare-cycle jobs should be subordinate to
> the owner's job mix - so this is useful to think about at
> a later time.
> Next - he's explaining exactly what we've already done and
> talked about. In this scenario all slaves would only
> run SDM, and sgeexecd would be removed at install time.
sgeexecd is installed on a node when SDM moves the host into the cluster, and
uninstalled when SDM removes the host from the cluster. Installing the execd is
only a matter of seconds.
However, the cluster has a problem if the host is part of an advance
reservation: the reservation will fail, because SDM gives no guarantee that
the host will come back in time.
> Each master node would run independently.
This is the big advantage of the SDM solution: the configurations of the two
clusters are completely independent.
> I don't see
> any way to prioritize the queues
You cannot prioritize queues with SDM; SDM does not know about queues. But you
can give certain kinds of jobs a higher urgency. For each job category the
corresponding SDM service has a MaxPendingJobsSLO. The SLO only produces a
resource request if there are pending jobs belonging to its job category. The
resource request with the highest urgency wins, and that cluster gets more
resources (hosts).
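As a rough sketch of such an SLO (element and attribute names are written from memory of the SDM documentation and should be verified against your release), a MaxPendingJobsSLO entry in a service configuration might look like:

```xml
<!-- Hypothetical MaxPendingJobsSLO sketch; verify element and attribute
     names against your SDM release's documentation. -->
<common:slo xsi:type="common:MaxPendingJobsSLOConfig"
            name="pendingJobs"
            urgency="60"
            max="5">
  <!-- Request only hosts matching this resource filter -->
  <common:request>operatingSystemName = "Linux"</common:request>
</common:slo>
```

Whenever pending jobs match, the SLO raises a resource request at urgency 60; a competing SLO with a higher urgency would win the host instead.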
> or tie the resources
> of one cluster to it's own master -beyond just naming
> in sdm.
In SDM you can have static resources. SDM will never move such a static host
away from the cluster.
> SLO is complex mess to deal with - very little
> guidance on how to do much with it beyond basics.
If you need any help with the SLO setup, you are welcome to ask.
> I'm not clear about how the nodes are powered off and
> on by sdm - and what associated hw,firmware is involved
> in this - could be that's only a case for blade,chassis,
> or only the latest hardware, ie idrac.
SDM 1.0u5 comes with a power-saving solution that turns off hosts via IPMI.
However, this is not hard-coded: SDM has a well-defined scripting interface for
power saving. If you can power off your hosts remotely from the command line,
power saving with SDM is possible.
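To illustrate the "power off from the command line" requirement, a site script could wrap ipmitool. This is only a hedged sketch: the function arguments, the `<host>-bmc` naming convention, and the credentials are assumptions, and the exact interface SDM expects from such a script is defined in the SDM documentation.

```shell
# Hypothetical power-control hook for SDM's power-saving scripting
# interface. Argument convention and BMC naming are assumptions.
power_ctl() {
    host="$1"    # execution host whose BMC we address
    action="$2"  # "on" or "off"

    case "$action" in
        on|off) : ;;
        *) echo "usage: power_ctl <host> on|off" >&2; return 2 ;;
    esac

    cmd="ipmitool -I lanplus -H ${host}-bmc -U admin -P secret chassis power ${action}"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print instead of executing, so the wrapper can be tested offline.
        echo "$cmd"
    else
        $cmd
    fi
}
```

A blade chassis or iDRAC is not required; anything reachable over IPMI LAN (or any other remote power command you substitute for ipmitool) would do.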
> The install/uninstall of sgeexecd is rather useless work
I agree; SDM automates this task for you.
> - I don't understand
> why a static pool cannot be designated spare .
> acluster sgemaster
> bcluster sgemaster
> sparepool aslave1..20 bslave1..13
I do not really understand what a static pool is. Is it an SDM spare_pool or a
> all OS and sw would need to be consistent enough for
> any job from each owner mix. And the user accounts need
> to be the same for the shared job, usatlas1. I suppose if a user
> was defined in one pool - but not the other,
> the job would fail, error. Not sure if sge can
> deal with two independent uid/gid spaces.
SDM supports different kinds of resources. Categorize your jobs and set up an
SLO for each job category; the SLO will request the needed hosts.
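For instance, two job categories could map to two SLOs with different urgencies, so the owner's jobs always outbid spare-cycle jobs. Again a hedged sketch only; the names and the jobFilter expressions are assumptions to be checked against your SDM documentation:

```xml
<!-- Hypothetical: the owner category carries a higher urgency than the
     spare-cycle category, so its resource requests win. -->
<common:slos>
  <common:slo xsi:type="common:MaxPendingJobsSLOConfig"
              name="ownerJobs" urgency="80" max="10">
    <common:jobFilter>owner = "usatlas1"</common:jobFilter>
  </common:slo>
  <common:slo xsi:type="common:MaxPendingJobsSLOConfig"
              name="spareCycleJobs" urgency="10" max="10">
    <common:jobFilter>owner = "guest"</common:jobFilter>
  </common:slo>
</common:slos>
```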
> On Fri, 8 Jan 2010, Richard Hierlmeier wrote:
>> izen wrote:
>>> My sys admin has been trying to configure two independent, linux
>>> clusters with static SGE pools, such that when the first cluster
>>> batch queue fills, additional jobs will fall over to a low priority
>>> queue in the second cluster. Each cluster has its own master node,
>>> and it would be a political non-starter to change that. So far, my
>>> admin has not succeeded. Is his configuration with static pools
>> I think you are talking about subordinate queues. No, it is not doable
>> with this feature. It sounds more like a use case for the Service Domain
>> Manager (SDM) module of SGE.
>>> If so, we would welcome some guidance in configuring our SGE
>>> deployment to do this.
>> SDM implements resource sharing between two or more SGE clusters. For
>> each SGE cluster a SLO (Service Level Objective) can be defined. This
>> SLO will request new hosts whenever there are jobs in the pending
>> queue. SDM takes hosts out of spare_pool and installs the execd of the
>> cluster on it. Once workload goes down the hosts are removed from the
>> SGE cluster and put back to the spare_pool.
>> You can implement power saving (hosts in spare_pool can be powered
>> off) with SDM. In addition you can get hosts from a cloud service like
>> For a good introduction please have a look at
>>> We are beginning to wonder whether this is undoable with static
>>> pools, and need to switch to a dynamic pool.
>>> Input would be most welcome. Thanks! -Joe
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier Phone: ++49 (0)941 3075-223
Software Engineering Fax: ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7 mailto: richard.hierlmeier at sun.com
D-93049 Regensburg http://www.sun.com/grid
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].