[GE users] sge and nodes, please need some recommendation

Christian Fernandez cfernandez at voicesignal.com
Fri Aug 4 22:27:08 BST 2006



To add to this, here is some output that may help you assist me.
qhost
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node74                  lx24-x86        4  0.00    2.0G   23.5M  964.8M     0.0
node75                  lx24-x86        4  0.00    2.0G   23.3M     0.0     0.0


qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
node74                  lx24-x86        4  0.00    2.0G   23.2M  964.8M     0.0
   all.q                BIP   0/4
node75                  lx24-x86        4  0.00    2.0G   23.3M     0.0     0.0
   all.q                BIP   0/1


What concerns us is that node75 shows 0/1 instead of 0/4.
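
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the used/tot. figure comes from the `slots` attribute of all.q, and a host added by hand gets the queue default instead of a per-host entry. The sketch below only prints the qconf call that would add such an entry. The helper name `slots_cmd` is made up for this example; the queue, host and count are taken from the qhost output above.

```shell
#!/bin/sh
# Sketch: build (but do not run) the qconf call that adds a host-specific
# slot count to a cluster queue. slots_cmd is a hypothetical helper name.
slots_cmd() {
    # $1 = cluster queue, $2 = execution host, $3 = slot count
    printf 'qconf -aattr queue slots "[%s=%s]" %s\n' "$2" "$3" "$1"
}

# Print the command for node75 so it can be reviewed before use.
slots_cmd all.q node75 4
```

If the printed line looks right, it can be run on the master host as an SGE manager. `qconf -sq all.q` shows the current slots line, which is worth checking before and after; if node75 already has its own entry there, modifying it with `-mattr` may be the better fit.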





Christian Fernandez wrote:
> Hey, thanks a lot, that worked great, and it is one more thing I learned today :-)
> Now the only detail we are missing is that it only sees 1 processor instead
> of 4. The install I did with the install script did find 4, so I am
> thinking that the install script may autodetect this and send it to sge,
> but when I did it manually it defaulted to 1. How can I tell sge that
> that node has 4 CPUs?
>
> Thanks again.
> have a nice weekend.
>
>
>
> Reuti wrote:
>   
>> Am 04.08.2006 um 19:40 schrieb Christian Fernandez:
>>
>>     
>>> Hi, thanks for the response, but I am still a bit confused. This is what
>>> we have done, and maybe you can tell me what I am doing wrong.
>>> 1. We created the master host; it runs OK.
>>> 2. We created a local HD install of an exec host; we tested it and ran
>>> test jobs, and all is fine.
>>> 3. I created a file system based on the host above and made a PXE boot
>>> image. The system boots, sgeexecd starts and creates the
>>> spool/nameofnode directory with files in it. Now when I go to the master
>>> host and type:
>>> qconf -sh
>>> node74 <--- the one working
>>> node75 <--- node pxe booting
>>> masterhost <-- our master host
>>>       
>> a) Does it show up in `qhost`?
>>
>> b) Is all.q using the @allhosts hostgroup?
>>
>> Then maybe you have to add it there:
>>
>> qconf -aattr hostgroup hostlist node75 @allhosts
>>
>> -- Reuti
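
Reuti's two checks above can be sketched as a small script. This is only an illustration: the helper name `host_in_group` is made up, and the `group_name` / `hostlist` layout of the `qconf -shgrp` output is assumed. The qconf calls run only if qconf is actually installed on the machine.

```shell
#!/bin/sh
# Sketch: test whether a host already appears in a hostgroup definition.
# host_in_group is a hypothetical helper; it reads the definition on stdin.
host_in_group() {
    # Split on blanks, tabs and continuation backslashes, then look for
    # an exact match of the host name given as $1.
    tr -s ' \t\\' '\n\n\n' | grep -qxF -- "$1"
}

# Only talk to the qmaster when qconf is available.
if command -v qconf >/dev/null 2>&1; then
    if qconf -shgrp @allhosts | host_in_group node75; then
        echo "node75 is already in @allhosts"
    else
        # The command suggested in this thread.
        qconf -aattr hostgroup hostlist node75 @allhosts
    fi
fi
```

The match is on the exact name, so if the hostgroup lists fully qualified names, the same form has to be passed to the helper.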
>>
>>     
>>> But when I do:
>>> qstat -f
>>> I can only see node74, like this:
>>> queuename                      qtype used/tot. load_avg arch          states
>>> ----------------------------------------------------------------------------
>>> all.q@node74                   BIP   0/4       -NA-     lx24-x86      au
>>>
>>> There is nothing about our new node75.
>>> My question is: what do I need to do for node75 to be part of the
>>> queue like node74, so I can submit jobs to it? BTW, my master host
>>> is also the submit host for testing.
>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>> Reuti wrote:
>>>       
>>>> Hi,
>>>>
>>>> Am 03.08.2006 um 19:02 schrieb Christian Fernandez:
>>>>
>>>>         
>>>>> I installed sge and 4 nodes; on the nodes I did the execution host
>>>>> install as the manual says, and it is working great. But these test
>>>>> nodes are installed on the HD for testing; now we need to move this to
>>>>> 80 nodes that will boot from a PXE image mounting only /usr. My
>>>>> question is: can I just do one node install on this image, so that each
>>>>> node that boots from it will be able to work? Meaning, does it need
>>>>> specific information for each node that I can't have on one image? We
>>>>> are currently running openPBS and this seems not to be an issue, but
>>>>> I'm wondering about SGE and what would be the best way to do this. I
>>>>> saw the possibility of having 80 different directories mounting /opt
>>>>> from the master node, but we really are trying to avoid that because it
>>>>> is too much NFS and too many directories to maintain, knowing we are
>>>>> going to double the nodes in 1 year.
>>>>>           
>>>> this is the way I installed all my clusters up to now: install execd
>>>> one time (which might even be on the head-node, and later be
>>>> deactivated) which will in the first place set up the correct paths
>>>> and variables in $SGE_ROOT/default/common/sgeexecd. So you just need
>>>> to add all nodes then with a loop on the command line as admin hosts,
>>>> run the sgeexecd script during startup of the nodes, and they will
>>>> show up on their own as exec hosts.
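
The "loop on the command line" mentioned above could look roughly like this. It is a sketch under assumptions: the node<N> naming is inferred from node74/node75 elsewhere in this thread, and `admin_host_cmds` is a made-up helper name. It prints the `qconf -ah` calls instead of running them.

```shell
#!/bin/sh
# Sketch: print one "qconf -ah" call per node in a numeric range.
# admin_host_cmds is a hypothetical helper; node<N> naming is assumed.
admin_host_cmds() {
    # $1 = first node number, $2 = last node number
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "qconf -ah node$i"
        i=$((i + 1))
    done
}

# Show the commands for the two nodes seen in this thread.
admin_host_cmds 74 75
```

Once the list looks right, the same function with the full range can be piped into `sh` on the master host to register every node as an admin host in one go.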
>>>>
>>>> During startup of a node you will see "No local configuration found,
>>>> using global", but this is what you want most likely anyway. You could
>>>> even remove the local configuration of the head-node, if the global
>>>> one is sufficient anyway.
>>>>
>>>> Only thing to think about to reduce network traffic: use a local spool
>>>> directory like /var/spool/sge (for the qmaster you would need to
>>>> specify it as /var/spool/sge/qmaster, the nodes directories are
>>>> created automatically) owned by sgeadmin (or your admin of SGE). You
>>>> find some info here:
>>>>
>>>> http://gridengine.sunsource.net/howto/nfsreduce.html
>>>>
>>>> Cheers - Reuti
>>>>
>>>> PS: Do the nodes run on their local HDs, or are they diskless?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>
>>>>         






