[GE users] Using gridengine to administer the cluster

Daniel Templeton Dan.Templeton at Sun.COM
Tue Aug 29 14:50:11 BST 2006



Joe,

SGE doesn't forbid root from submitting jobs.  Also, an easier way to 
submit once to each host would be to set up a consumable with a value of 
1 on each host.  All you'd have to do then is submit n jobs, each 
requesting the consumable, where n is the number of hosts.  An even 
easier trick would be to set up a queue specifically for the purpose, 
with one slot per host.  (In both cases, though, the catch would be 
making sure that all jobs got scheduled before any job finished.  
Otherwise, two jobs might go sequentially to one host.)  I do agree, 
though, that SGE job submission is a little too decoupled to be useful 
for general administrative purposes.
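A minimal shell sketch of the consumable approach follows. The complex name `once_per_host` and the job script name are hypothetical, and the sketch assumes the complex has already been defined with `qconf -mc` as a requestable, consumable INT attribute. The consumable only stops two of these jobs from running on one host at the same time, so the sequential-scheduling caveat above still applies.

```shell
#!/bin/sh
# Sketch of the "one consumable unit per host" approach.
# Assumption: a complex named once_per_host (hypothetical) was added with
# `qconf -mc` using a line such as:
#   once_per_host  oph  INT  <=  YES  YES  0  0

# Give every execution host exactly one unit of the consumable.
# `qconf -sel` prints the list of execution hosts.
give_one_unit_per_host() {
    for host in $(qconf -sel); do
        # -aattr appends once_per_host=1 to the host's complex_values list
        qconf -aattr exechost complex_values once_per_host=1 "$host"
    done
}

# Submit n copies of a job script, where n is the number of execution
# hosts; each copy requests the host's only unit of the consumable.
submit_once_per_host() {
    script=$1
    n=$(qconf -sel | wc -l | tr -d ' ')
    i=0
    while [ "$i" -lt "$n" ]; do
        qsub -l once_per_host=1 "$script"
        i=$((i + 1))
    done
}
```

Run `give_one_unit_per_host` once as an SGE manager, then `submit_once_per_host update_node.sh` (a hypothetical job script) whenever every node should run the task.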

Daniel

Joe Landman wrote:
> Hi Michael:
>
>   You would need to make sure the user had sudo capability on each node.
>  Last I remember, SGE doesn't allow the root user to submit jobs.
>
>   You could generate the queue list with a simple qselect
>
> 	[landman at minicc ~]$ qselect
> 	all.q at compute-0-2.local
> 	all.q at minicc.local
> 	all.q at compute-0-0.local
> 	all.q at compute-0-3.local
> 	all.q at compute-0-1.local
>
> Then very likely, you could do a simple for loop over the hosts using
> qsub -q queue
>
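> 	#!/bin/tcsh
> 	# -b y tells qsub the command is a binary, not a job script;
> 	# $argv holds the arguments passed to this script.
> 	foreach q (`qselect`)
> 	 qsub -b y -q $q sudo -u root $argv
> 	end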
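For comparison, here is a POSIX sh sketch of the same loop. It pins each submission to one queue instance with `-q`, passes `-b y` so that qsub treats the command as a binary rather than a job script, and reports any submission that qsub rejects. The function name is hypothetical, and the sketch assumes the submitting user can run sudo non-interactively on every node.

```shell
#!/bin/sh
# Submit "$@" once to every queue instance reported by qselect
# (each line looks like all.q@compute-0-2.local).
# Assumption: sudo works without a password prompt on every node.
submit_to_every_queue() {
    failed=0
    for q in $(qselect); do
        # -b y: the command is a binary, not a job script
        if ! qsub -b y -q "$q" sudo "$@"; then
            echo "submission to $q failed" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: `submit_to_every_queue /etc/init.d/lmsensors start`. A zero return status only means qsub accepted every job, not that the jobs have run.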
>
> Now here is why you might not want to do this.
>
> 1) If a queue goes down due to a machine crash, or somehow gets flushed
> ahead of time, this could leave machine(s) in an odd state relative to
> the rest.
>
> 2) You have to set up sudo across your cluster.  This is unfortunately
> not easy.
>
> 3) There is no guarantee that the administrative commands will execute
> immediately.
>
> You might want to look at pdsh for your cluster.  It is a good tool
> designed specifically to enable administration of large collections of
> machines from a command line.  To run /etc/init.d/lmsensors across a
> cluster, you would type
>
> 	pdsh /etc/init.d/lmsensors start
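pdsh also needs to know which hosts to target, for example through `-w` with a comma-separated host list, or through the `WCOLL` environment variable naming a file of hosts. A small sketch that derives such a list from the queue instances SGE already knows about (the function name is hypothetical):

```shell
#!/bin/sh
# Turn qselect output (e.g. all.q@compute-0-2.local) into a
# comma-separated, de-duplicated host list suitable for pdsh -w.
sge_hosts_for_pdsh() {
    qselect | sed 's/^[^@]*@//' | sort -u | paste -sd, -
}
```

Usage: `pdsh -w "$(sge_hosts_for_pdsh)" /etc/init.d/lmsensors start | dshbak -c`, where `dshbak -c` groups identical output from several nodes into one block.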
>
> Joe
>
>
> Michael James wrote:
>   
>> Is there a way of scheduling a job
>>  so it gets run on each cluster node once?
>>
>> If I could do that I could use gridengine to administer
>>  the cluster, nodes could pull their own updates, etc.
>>
>> Just a thought...
>> michaelj
>>
>>     
>
>   
