[GE users] Using cycles from a 2nd SGE cluster

rhierlmeier richard.hierlmeier at sun.com
Fri Jan 8 15:14:45 GMT 2010


Don wrote:
> I haven't used subordinate queues before in SGE. In PBS Pro,
> I used routing queues between two master nodes. I suppose
> you could think of all queues as subordinate to the route
> queue. Although spare-cycle jobs should be subordinate to
> the owner's job mix, so this is useful to think about at
> a later time.
> 
> Next, he's explaining exactly what we've already done and
> talked about. In this scenario, all slaves would only
> run SDM, and sgeexecd would be removed at install time.

sgeexecd is installed on a node if SDM moves the host into the cluster. The 
execd is uninstalled if SDM removes the host from the cluster. The installation 
of the execd takes only a few seconds.

However, the cluster has a problem if the host is part of an advance 
reservation. The reservation will fail, because SDM does not give any guarantee 
that the host will come back in time.

> Each master node would run independently. 

This is the big advantage of the SDM solution. The configurations of the two 
clusters are completely independent.

> I don't see
> any way to prioritize the queues 

You cannot prioritize queues with SDM; SDM does not know about queues. But you 
can give certain kinds of jobs a higher urgency. For each job category, the 
corresponding SDM service will have a MaxPendingJobsSLO. The SLO will only 
produce a resource request if there are pending jobs belonging to this job 
category. The resource request with the highest urgency will win, and the 
cluster that issued it will get more resources (= hosts).
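
To illustrate the arbitration described above, here is a minimal toy model in 
Python. It is not SDM code and does not use any SDM API: the class name, the 
fields, the max_pending threshold, and the service names acluster/bcluster are 
assumptions made up for this sketch. It only shows the rule "an SLO asks for a 
host only when its job category has pending jobs, and the request with the 
highest urgency wins".

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MaxPendingJobsSlo:
    """Toy stand-in for an SDM MaxPendingJobsSLO (hypothetical, not the SDM API)."""
    service: str          # SDM service, i.e. the SGE cluster that owns this SLO
    job_category: str     # job category the SLO watches
    urgency: int          # higher urgency wins the arbitration
    max_pending: int = 0  # assumed threshold: pending jobs tolerated before requesting

    def request(self, pending: int) -> Optional[Tuple[int, str]]:
        """Return (urgency, service) if a host is needed, else None."""
        if pending > self.max_pending:
            return (self.urgency, self.service)
        return None


def pick_winner(slos: List[MaxPendingJobsSlo],
                pending_by_category: Dict[str, int]) -> Optional[str]:
    """Return the service whose resource request has the highest urgency."""
    requests = []
    for slo in slos:
        pending = pending_by_category.get(slo.job_category, 0)
        req = slo.request(pending)
        if req is not None:
            requests.append(req)
    if not requests:
        return None  # nothing pending anywhere: hosts stay in the spare_pool
    # Compare on urgency only; ties go to the first request in the list.
    return max(requests, key=lambda r: r[0])[1]


if __name__ == "__main__":
    slos = [
        MaxPendingJobsSlo("acluster", "owner_jobs", urgency=80),
        MaxPendingJobsSlo("bcluster", "spare_cycle_jobs", urgency=20),
    ]
    print(pick_winner(slos, {"owner_jobs": 5, "spare_cycle_jobs": 9}))
    print(pick_winner(slos, {"owner_jobs": 0, "spare_cycle_jobs": 9}))
```

In this sketch the owner jobs always beat the spare-cycle jobs when both 
categories have pending work, and the spare-cycle category only receives a host 
when no owner jobs are waiting.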

> or tie the resources
> of one cluster to its own master, beyond just naming
> in SDM. 

In SDM you can have static resources. SDM will never move such a static host 
away from the cluster.

> SLO is a complex mess to deal with; there is very little
> guidance on how to do much with it beyond the basics.

If you need any help with the SLO setup, you are welcome to ask.

> I'm not clear about how the nodes are powered off and
> on by SDM, and what associated hardware and firmware is involved
> in this. It could be that this is only a case for blade, chassis,
> or only the latest hardware, i.e. iDRAC. 

SDM 1.0u5 comes with a power saving solution that turns off hosts via IPMI. 
However, this is not hard coded. SDM has a well-defined scripting interface for 
power saving. If you can power off your hosts remotely from the command line, 
power saving with SDM will be possible.
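
As an example of "powering off a host remotely from the command line", the 
following Python sketch wraps the ipmitool CLI. How SDM calls its power saving 
scripts (arguments, exit codes) is not described in this mail, so this only 
covers the remote power step itself. The function names, the BMC host name, the 
user, and the password file path are assumptions for illustration.

```python
import subprocess
from typing import List

VALID_ACTIONS = ("on", "off", "status")


def ipmi_power_command(bmc_host: str, user: str, password_file: str,
                       action: str = "off") -> List[str]:
    """Build an ipmitool command line for a chassis power action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool",
        "-I", "lanplus",      # IPMI v2.0 over LAN
        "-H", bmc_host,       # address of the host's management controller
        "-U", user,
        "-f", password_file,  # read the password from a file, not from argv
        "chassis", "power", action,
    ]


def run_power_action(bmc_host: str, user: str, password_file: str,
                     action: str = "off") -> None:
    """Run the command; raises CalledProcessError if ipmitool fails."""
    cmd = ipmi_power_command(bmc_host, user, password_file, action)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical values; print the command instead of running it.
    print(" ".join(ipmi_power_command("aslave1-bmc", "admin", "/etc/ipmi.pw")))
```

If a command like this works for your hardware from the shell, the same step 
should be usable from a power saving script.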

> The install/uninstall of sgeexecd is rather useless work  

I agree, SDM automates this task for you.

> - I don't understand
> why a static pool cannot be designated as spare.
> 
> acluster sgemaster
> bcluster sgemaster
> 
> sparepool aslave1..20 bslave1..13

I do not really understand what a static pool is. Is it an SDM spare_pool or a 
subordinate queue?

> 
> All OS and software would need to be consistent enough for
> any job from each owner mix. And the user account needs
> to be the same for the shared job, usatlas1. I suppose if a user
> was defined in one pool but not the other,
> the job would fail with an error. Not sure if SGE can
> deal with two independent uid/gid sets.
> 

SDM supports different kinds of resources. Categorize your jobs and set up an 
SLO for each job category. The SLO will request the needed hosts.


Richard

> -/Don
> 
> On Fri, 8 Jan 2010, Richard Hierlmeier wrote:
> 
>>
>> Hi,
>>
>> izen wrote:
>>> My sys admin has been trying to configure two independent Linux 
>>> clusters with static SGE pools, such that when the first cluster 
>>> batch queue fills, additional jobs will fall over to a low priority 
>>> queue in the second cluster. Each cluster has its own master node, 
>>> and it would be a political non-starter to change that. So far, my 
>>> admin has not succeeded. Is his configuration with static pools 
>>> workable?
>>
>> I think you are talking about subordinate queues. No, it is not doable 
>> with this feature. It sounds more like a use case for the Service Domain 
>> Manager (SDM) module of SGE.
>>
>>> If so, we would welcome some guidance in configuring our SGE 
>>> deployment to do this.
>>
>> SDM implements resource sharing between two or more SGE clusters. For 
>> each SGE cluster an SLO (Service Level Objective) can be defined. This 
>> SLO will request new hosts whenever there are jobs in the pending 
>> queue. SDM takes hosts out of the spare_pool and installs the execd of 
>> the cluster on them. Once the workload goes down, the hosts are removed 
>> from the SGE cluster and put back into the spare_pool.
>>
>> You can implement power saving (hosts in spare_pool can be powered 
>> off) with SDM. In addition you can get hosts from a cloud service like 
>> EC2.
>>
>> For a good introduction please have a look at
>>
>> http://www.youtube.com/watch?v=kFrwOdAVxJI
>>
>>
>> Richard
>>
>>>
>>> We are beginning to wonder whether this is not doable with static 
>>> pools, and whether we need to switch to a dynamic pool.
>>>
>>> Input would be most welcome.  Thanks!  -Joe
>>>
>>> ------------------------------------------------------
>>> http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=237301 
>>>
>>>
>>> To unsubscribe from this discussion, e-mail: 
>>> [users-unsubscribe at gridengine.sunsource.net].
>>
>>
>> -- 
>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
>> - - -
>> Richard Hierlmeier           Phone: ++49 (0)941 3075-223
>> Software Engineering         Fax:   ++49 (0)941 3075-222
>> Sun Microsystems GmbH
>> Dr.-Leo-Ritter-Str. 7         mailto: richard.hierlmeier at sun.com
>> D-93049 Regensburg           http://www.sun.com/grid
>>
>> Sitz der Gesellschaft:
>> Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
>> Amtsgericht Muenchen: HRB 161028
>> Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
>> Vorsitzender des Aufsichtsrates: Martin Haering
>>



------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=237391