[GE users] SGE and nodes, need some recommendations please

Reuti reuti at staff.uni-marburg.de
Fri Aug 4 18:47:59 BST 2006


On 04.08.2006 at 19:40, Christian Fernandez wrote:

> Hi, thanks for the response, but I am still a bit confused. This is
> what we have done; maybe you can tell me what I am doing wrong.
> 1. We created the master host; it runs OK.
> 2. We did a local HD install of an exec host, tested it, ran test
>    jobs, and all was fine.
> 3. I created a file system based on the host above and made a PXE
>    boot image. The system boots, sgeexecd starts and creates the
>    spool/nameofnode directory with files in it. Now when I go to the
>    master host and type:
> qconf -sh
> node74 <--- the one working
> node75 <--- node pxe booting
> masterhost <-- our master hosts.

a) Does it show up in `qhost`?

b) Is all.q using the @allhosts hostgroup?

Then you may have to add it there:

qconf -aattr hostgroup hostlist node75 @allhosts
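Steps a) and b) can be wrapped in a small reusable shell function. This is a minimal sketch, assuming that all.q's hostlist refers to @allhosts (check with `qconf -sq all.q`) and that a whole-word `grep` over the output of `qconf -shgrp @allhosts` is a good enough membership test. The function names `in_allhosts` and `ensure_in_allhosts` are hypothetical.

```shell
#!/bin/sh
# Sketch: make sure an exec host is a member of the @allhosts hostgroup.
# Assumption: all.q's hostlist is "@allhosts", so membership there is what
# makes the host's queue instance (all.q@<host>) appear.

# Succeeds if $1 appears as a whole word in the @allhosts definition.
in_allhosts() {
  qconf -shgrp @allhosts | grep -qw -- "$1"
}

# Adds $1 to @allhosts only if it is not already listed there.
ensure_in_allhosts() {
  host=$1
  if in_allhosts "$host"; then
    echo "$host already in @allhosts"
  else
    qconf -aattr hostgroup hostlist "$host" @allhosts
  fi
}
```

For example, after checking a) with `qhost`, run `ensure_in_allhosts node75`; node75 should then show up as `all.q@node75` in `qstat -f`.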

-- Reuti

> but when I do:
> qstat -f
> I can only see node74 like:
> queuename                      qtype used/tot. load_avg arch          states
> ----------------------------------------------------------------------------
> all.q@node74                   BIP   0/4       -NA-     lx24-x86      au
>
>
> Nothing about our new node75.
> My question is: what do I need to do so that node75 listens on the
> queue like node74 and I can submit jobs to it? BTW, my master host is
> also the submit host, for testing.
>
>
> Thanks.
>
>
>
> Reuti wrote:
>> Hi,
>>
>> On 03.08.2006 at 19:02, Christian Fernandez wrote:
>>
>>> I installed SGE and 4 nodes; on the nodes I did the execution host
>>> install as the manual says, and it is working great. But these test
>>> nodes are installed on the HD, and now we need to move this to 80
>>> nodes that will boot from a PXE image mounting only /usr. My question
>>> is: can I just do one node install on this image, and when each node
>>> boots from it, will it be able to work? That is, does it need
>>> node-specific information that I can't have in one image? We are
>>> currently running OpenPBS and this seems not to be an issue there,
>>> but I'm wondering about SGE, and what the best way to do this would
>>> be. I saw the possibility of having 80 different directories mounting
>>> /opt from the master node, but we really are trying to avoid that
>>> because it is too much NFS and too many directories to maintain,
>>> knowing we are going to double the nodes in 1 year.
>>
>> This is the way I have installed all my clusters up to now: install
>> execd once (this might even be on the head node, and be deactivated
>> later), which in the first place sets up the correct paths and
>> variables in $SGE_ROOT/default/common/sgeexecd. Then you just need to
>> add all nodes as admin hosts with a loop on the command line, run the
>> sgeexecd script during startup of the nodes, and they will show up on
>> their own as exec hosts.
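The admin-host loop could look like the following minimal sketch, run on the qmaster. It assumes the nodes are named `node<N>` (as node74/node75 in this thread suggest) and numbered contiguously; `node_names` and `add_admin_hosts` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: register a contiguous range of nodes as administrative hosts.
# Assumption: hosts are named node<N>, e.g. node1 ... node80.

# Prints node<first> ... node<last>, one name per line.
node_names() {
  i=$1
  while [ "$i" -le "$2" ]; do
    echo "node$i"
    i=$((i + 1))
  done
}

# Adds every node in the range as an admin host via "qconf -ah".
add_admin_hosts() {
  for h in $(node_names "$1" "$2"); do
    qconf -ah "$h"
  done
}
```

For example `add_admin_hosts 1 80`. The shared PXE image then only needs its boot sequence to call `$SGE_ROOT/default/common/sgeexecd start`, with SGE_ROOT set to your installation path; each node registers itself as an exec host when execd comes up.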
>>
>> During startup of a node you will see "No local configuration found,
>> using global", but this is most likely what you want anyway. You could
>> even remove the local configuration of the head node if the global one
>> is sufficient.
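Removing a host's local configuration can be sketched as below, assuming `qconf -sconfl` prints one host name per line. `drop_local_conf` is a hypothetical helper, and `masterhost` is only the example name used in this thread.

```shell
#!/bin/sh
# Sketch: delete a host's local configuration so it uses the global one.
# Assumption: "qconf -sconfl" lists one host name per line.

drop_local_conf() {
  host=$1
  if qconf -sconfl | grep -qx -- "$host"; then
    qconf -dconf "$host"
  else
    echo "no local configuration for $host"
  fi
}
```

For example `drop_local_conf masterhost`, then confirm with `qconf -sconfl` that the host is no longer listed.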
>>
>> The only thing to think about is reducing network traffic: use a local
>> spool directory like /var/spool/sge, owned by sgeadmin (or whichever
>> account administers SGE). For the qmaster you would need to specify it
>> as /var/spool/sge/qmaster; the node directories are created
>> automatically. You can find some info here:
>>
>> http://gridengine.sunsource.net/howto/nfsreduce.html
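If the nodes boot a shared PXE image, the local spool directory may have to be (re)created at every boot, before sgeexecd starts. A minimal sketch, assuming /var/spool/sge and the sgeadmin account from the text as defaults; `prepare_spool` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: create the node-local spool base directory at boot.
# Defaults are the example path and admin account from the text; the
# per-node subdirectory below it is created by execd itself.

prepare_spool() {
  dir=${1:-/var/spool/sge}
  owner=${2:-sgeadmin}
  mkdir -p "$dir" || return 1
  chown "$owner" "$dir"
}
```

Call `prepare_spool` early in the boot sequence, before `sgeexecd start`. The spool location execd actually uses is the `execd_spool_dir` parameter of the global configuration, which can be edited with `qconf -mconf`.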
>>
>> Cheers - Reuti
>>
>> PS: Do the nodes run on their local HDs, or are they diskless?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>
>


More information about the gridengine-users mailing list