[GE users] SGE and nodes, need some recommendations

Christian Fernandez cfernandez at voicesignal.com
Fri Aug 4 21:50:49 BST 2006



Hey, thanks a lot, that worked great, and I learned one more thing today :-)
Now the only detail we are missing is that only 1 processor is seen instead
of 4. The install I did with the install script did find all 4, so I am
thinking the install script may autodetect this and send it to SGE,
but when I did it manually it defaulted to 1. How can I tell SGE that
the node has 4 CPUs?
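(The processor count is normally reported by the execd itself; what usually matters for scheduling is the queue's slot count. A sketch of checking and setting 4 slots for node75 with standard qconf usage -- these commands are not from this thread, and the queue/host names are taken from the examples below:)

```shell
qhost                                          # check how many CPUs the execd reports per host
qconf -sq all.q | grep slots                   # see the queue's current slot setting
qconf -aattr queue slots "[node75=4]" all.q    # add a per-host override: 4 slots on node75
```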

Thanks again.
Have a nice weekend.



Reuti wrote:
> On 04.08.2006 at 19:40, Christian Fernandez wrote:
>
>> Hi, thanks for the response, but I am still a bit confused. This is what
>> we have done; maybe you can tell me what I am doing wrong.
>> 1. We created the master host; it runs OK.
>> 2. We did a local-HD install of an exec host, tested it, ran test
>> jobs, and everything was fine.
>> 3. I created a file system based on the host above and made a PXE boot
>> image. The system boots, sgeexecd starts and creates the
>> spool/nameofnode directory with files in it. Now when I go to the master
>> host and type:
>> qconf -sh
>> node74 <--- the one working
>> node75 <--- the node PXE booting
>> masterhost <--- our master host
>
> a) Does it show up in `qhost`?
>
> b) Is all.q using the @allhosts hostgroup?
>
> Then maybe you have to add it there:
>
> qconf -aattr hostgroup hostlist node75 @allhosts
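(As a quick check after running the command above -- a sketch using standard qconf/qstat invocations, not from this thread:)

```shell
qconf -shgrp @allhosts    # confirm node75 now appears in the hostgroup's hostlist
qstat -f                  # an all.q instance on node75 should now be listed
```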
>
> -- Reuti
>
>> But when I do:
>> qstat -f
>> I can only see node74:
>> queuename                      qtype used/tot. load_avg arch          states
>> ----------------------------------------------------------------------------
>> all.q@node74                   BIP   0/4       -NA-     lx24-x86      au
>>
>> Nothing shows up about our new node75.
>> My question is: what do I need to do so that node75 listens
>> on the queue like node74 and I can submit jobs to it? BTW, my master host
>> is also the submit host, for testing.
>>
>>
>> Thanks.
>>
>>
>>
>> Reuti wrote:
>>> Hi,
>>>
>>> On 03.08.2006 at 19:02, Christian Fernandez wrote:
>>>
>>>> I installed SGE on 4 nodes; on the nodes I did the execution host
>>>> install as the manual says, and it is working great. But these test
>>>> nodes are installed on the local HD just for testing; now we need to
>>>> move this to 80 nodes that will boot from a PXE image mounting only
>>>> /usr. My question is: can I do a single node install on this image,
>>>> and will each node that boots from it be able to work? Meaning, does
>>>> each node need specific information that I can't have on one shared
>>>> image? We are actually running OpenPBS and this does not seem to be
>>>> an issue there, but I am wondering about SGE, and what would be the
>>>> best way to do this... I saw the possibility of having 80 different
>>>> directories mounting /opt from the master node, but we are really
>>>> trying to avoid that, because it is too much NFS and too many
>>>> directories to maintain, knowing we are going to double the number
>>>> of nodes in one year.
>>>
>>> This is the way I have installed all my clusters up to now: install
>>> execd once (this might even be on the head node, and be deactivated
>>> later), which will in the first place set up the correct paths and
>>> variables in $SGE_ROOT/default/common/sgeexecd. Then you just need to
>>> add all the nodes as admin hosts with a loop on the command line, run
>>> the sgeexecd script during startup of the nodes, and they will
>>> register themselves as exec hosts.
>>>
>>> During startup of a node you will see "No local configuration found,
>>> using global", but this is most likely what you want anyway. You could
>>> even remove the local configuration of the head node, if the global
>>> one is sufficient.
>>>
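(A sketch of the command-line loop described above; the node names node1..node80 are an assumption -- adjust to your naming scheme. Run on the qmaster as an SGE admin user:)

```shell
# Add every compute node as an admin host, so its execd can register itself
# when the sgeexecd startup script runs on the node.
for i in $(seq 1 80); do
    qconf -ah "node$i"
done
```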
>>> The only thing to think about, to reduce network traffic: use a local
>>> spool directory like /var/spool/sge (for the qmaster you would need to
>>> specify it as /var/spool/sge/qmaster; the nodes' directories are
>>> created automatically), owned by sgeadmin (or your SGE admin user).
>>> You can find some info here:
>>>
>>> http://gridengine.sunsource.net/howto/nfsreduce.html
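(A sketch of switching the execd spool directory to a local path as described above; /var/spool/sge is the example path from the text, and the exact value is site-specific:)

```shell
# Show the current global execd spool directory
qconf -sconf | grep execd_spool_dir
# Edit the global configuration interactively and set, e.g.:
#   execd_spool_dir  /var/spool/sge
# (the directory must exist on each node and be owned by the SGE admin user)
qconf -mconf global
```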
>>>
>>> Cheers - Reuti
>>>
>>> PS: Do you run the nodes on their local HDs, or are they diskless?
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>
>>
>>
>







More information about the gridengine-users mailing list