[GE users] GE in heterogenic environment

Jacek Strzelczyk szczelba at op.pl
Wed May 28 19:05:14 BST 2008


Hello everyone,

I'm new here. I decided to join this group because I had some adventures
with GE while working on my thesis. I successfully installed SGE 6.1
with MPICH2 in a heterogeneous environment: a network of Linux and
Windows machines, with everything started at boot so that no manual
startup is needed. My test network consisted of three Fedora Core 3
machines and one Windows 2000 machine. I haven't seen any document
describing this kind of installation, so I decided to write one. There
are some tricks along the way, and I hope my experience will be helpful.
You can find my document in the attachment. Tell me if it is useful
to anyone (maybe I can publish it somewhere on the web?). All comments
are welcome.

Regards,
Jacek Strzelczyk


    [ Part 2: "Attached Text" ]

Installing SGE 6.1 in a heterogeneous (Linux Fedora Core release 3 and Windows 2000) environment with MPICH2.

Author: Jacek Strzelczyk <jacek.strzelczyk at gmail.com>

1. Pre-install requirements
	1.1 NIS
	1.2 NFS
		1.2.1 on Linux (NFS server and clients)
		1.2.2 on Windows (NFS clients)
2. SGE installation
	2.1 SGE Linux master host
	2.2 SGE Linux exec hosts
	2.3 SGE Windows exec hosts
3. MPICH2
	3.1 on Linux
	3.2 on Windows
	3.3 Add MPICH2 as parallel environment to SGE
4. Configure Interix
5. Post-install check
6. Test


1. Pre-install requirements

	1.1 NIS

	NIS is a service that distributes information that must be known throughout the network to all machines on it. It is very helpful for keeping user accounts consistent across all nodes in the grid. The full NIS HOWTO can be found here: http://www.linux-nis.org/nis-howto/HOWTO/. For the purposes of this HOWTO one user account is needed. Name it 'sgeadmin' and add it to the NIS database. Set its $HOME to "/usr/SGE".
	/etc/hosts can also be added to NIS.

	1.2 NFS

	Having a common filesystem from which to install and run SGE is a simple and flexible solution. It can be achieved in many ways; I'll focus on NFS. The full NFS HOWTO can be found here: http://nfs.sourceforge.net/nfs-howto/. The easiest way is to install the NFS server on the machine intended to be the SGE master host. The remaining hosts will be NFS clients.

	1.2.1 NFS on Linux

	NFS server:
	To prepare and share the SGE directory, run as root:
	# mkdir /usr/SGE
	# echo "/usr/SGE  M1(rw,no_root_squash,async) M2(rw,no_root_squash,async) M3(rw,no_root_squash,async)" >> /etc/exports
	Here M1, M2 and M3 are the names of the client hosts (they need to be in /etc/hosts).
	Then restart NFS.
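	With more than a few clients the exports line gets long and easy to mistype. The following is a minimal sketch, not part of the original setup, that builds the line from a list of host names; the function name make_exports_line is made up for this example, and the option string is simply the one used above.

```shell
#!/bin/sh
# Print one /etc/exports line that shares a directory with the given hosts.
# The options (rw,no_root_squash,async) mirror the ones used in this HOWTO.
make_exports_line() {
    dir=$1
    shift
    line=$dir
    for host in "$@"; do
        line="$line $host(rw,no_root_squash,async)"
    done
    printf '%s\n' "$line"
}

# Preview the line for three clients (M1..M3 are placeholder host names).
make_exports_line /usr/SGE M1 M2 M3
```

	Once the preview looks right, the output can be appended to /etc/exports as root.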

	NFS client:
	# mkdir /usr/SGE
	# chown sgeadmin /usr/SGE
	# mount -t nfs masternode:/usr/SGE /usr/SGE
	This mount should also be added to /etc/fstab with the suid option, so that it is restored at boot.
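	For the fstab entry mentioned above, here is a small sketch that prints a candidate line. The helper name fstab_nfs_line is invented for this example, and the option list "rw,suid" is an assumption; adjust it to your site before adding the line to /etc/fstab.

```shell
#!/bin/sh
# Print an /etc/fstab line that mounts server:dir on the same local path
# over NFS. "rw,suid" is an assumed option set, not taken from a tested setup.
fstab_nfs_line() {
    server=$1
    dir=$2
    printf '%s:%s %s nfs rw,suid 0 0\n' "$server" "$dir" "$dir"
}

# Preview the entry for the master node used in this HOWTO.
fstab_nfs_line masternode /usr/SGE
```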

	1.2.2 NFS on Windows

	To mount the network drive in Windows, log in as Administrator and type:
	>net use X: \\masternode\usr\SGE
	To mount it automatically at each system boot, use AutoExNT (http://support.microsoft.com/kb/243486):
	a) Using a text editor (such as Notepad), create a batch file named Autoexnt.bat and include the commands you want to run at startup in this file. Here that would be:
	@net use X: \\masternode\usr\SGE
	b) Copy the Autoexnt.bat file you just created, in addition to the Autoexnt.exe, Servmess.dll, and Instexnt.exe files located in the Resource Kit CD-ROM (or here: http://www.dynawell.com/reskit/microsoft/win2000/autoexnt.zip) to the C:\WINNT\System32 folder on your computer. 
	c) At a command prompt, type instexnt install, and then press ENTER.
	You should then receive the following message: 
	CreateService AutoExNT SUCCESS with InterActive Flag turned OFF 

	This creates the AutoExNT service in Windows, which automatically mounts /usr/SGE as drive X: at boot time. To make sure this happens only after all network connections are up, add a dependency in the Windows registry. Open the registry editor regedt32 (not regedit!). Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AutoExNT and add a multi-string (REG_MULTI_SZ) value named "DependOnService" with the value "LanmanWorkstation".


2. SGE installation

	2.1 SGE Linux master host

	First, create the /usr/SGE/.rhosts file listing all hosts in your SGE installation, and chmod it to 600. Check that rsh works between the Linux machines by executing:
	sgeadmin $ rsh otherlinuxhost date
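	Checking every host by hand gets tedious. The sketch below loops over a host list (one host name at the start of each line, as in .rhosts) and runs date on each host. The function name check_rsh_hosts and the RSH variable are made up for this example; RSH only exists so that the remote command can be swapped out.

```shell
#!/bin/sh
# Remote shell command to use; defaults to rsh as in this HOWTO.
RSH=${RSH:-rsh}

# Run "date" on every host listed in the given file and report the result.
# Only the first field of each line is used; blank lines and lines
# starting with '#' are skipped.
check_rsh_hosts() {
    hostfile=$1
    failed=0
    while read -r host rest; do
        case $host in ''|'#'*) continue ;; esac
        # stdin comes from /dev/null so the remote shell cannot eat the host list
        if $RSH "$host" date </dev/null >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=$((failed + 1))
        fi
    done < "$hostfile"
    [ "$failed" -eq 0 ]
}
```

	Run it as sgeadmin, for example: check_rsh_hosts /usr/SGE/.rhosts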

	Then add three lines to /etc/services:
	sge_execd	535/tcp
	sge_commd	536/tcp
	sge_qmaster	537/tcp
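	Since /etc/services has to be edited on every host, it helps to make the change repeatable. This is a minimal sketch of an append that does nothing if the service name is already defined; add_service is a made-up helper name, and the demo writes to a scratch file rather than the real /etc/services.

```shell
#!/bin/sh
# Append "name<TAB>port/tcp" to a services file unless the name is
# already defined there.
add_service() {
    file=$1
    name=$2
    port=$3
    if grep -q "^$name[[:space:]]" "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s/tcp\n' "$name" "$port" >> "$file"
}

# Demo against a scratch file; the ports are the ones listed above.
demo=$(mktemp)
add_service "$demo" sge_execd 535
add_service "$demo" sge_commd 536
add_service "$demo" sge_qmaster 537
add_service "$demo" sge_execd 535   # already present, so nothing is added
cat "$demo"
rm -f "$demo"
```

	To apply it for real, run the same calls as root with /etc/services as the first argument.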

	Download SGE.
	Common files:
	http://gridengine.sunsource.net/download/SGE61/sge-6.1-common.tar.gz
	Linux files:
	http://gridengine.sunsource.net/download/SGE61/sge-6.1-bin-lx24-x86.tar.gz
	Unpack them (each archive is extracted separately, because tar treats a second file name as a member to extract, not as another archive):
	# su - sgeadmin
	$ mv sge-6.1-common.tar.gz /usr/SGE/
	$ mv sge-6.1-bin-lx24-x86.tar.gz /usr/SGE/
	$ cd /usr/SGE/
	$ tar -xzvf sge-6.1-common.tar.gz
	$ tar -xzvf sge-6.1-bin-lx24-x86.tar.gz
	Before starting the installation procedure, the file util/arch needs to be edited. Change line 248 to:
	3*|4*|5*
	and then:
	$ su -
	# cd /usr/SGE
	# ./install_qmaster
	The full installation procedure is described in the SGE docs: http://docs.sun.com/app/docs/doc/817-6118/emrar?q=N1GE&a=view.

	2.2 SGE Linux exec hosts

	Described in the SGE docs: http://docs.sun.com/app/docs/doc/817-6118/emrar?q=N1GE&a=view.

	2.3 SGE Windows exec hosts

	* Create user 'sgeadmin' locally. 
	* Download Services For Unix (SFU) from http://www.microsoft.com/downloads/details.aspx?FamilyID=896c9688-601b-44f1-81a4-02878ff11778&displaylang=en
	* Turn off DEP by adding "/noexecute=alwaysoff" to C:\boot.ini
	* Run SFU installation procedure and add Interix SDK and Interix GNU SDK to default installation.
	* Check that the User Name Mapping service is running once the installation is complete.
	* Go to Start menu -> Programs -> Windows Services for Unix -> Configuration -> User Name Mapping, choose NIS and Show User Maps. Then map the Unix user sgeadmin to the Windows user of the same name.
	* Mount X: drive as /usr/SGE in Interix:
	%ls -l /dev/fs    # should show also X
	%ln -s /dev/fs/X /usr/SGE
	* Run telnet and rsh from Interix: log in to Windows as Administrator, permanently turn off the Windows telnet and rsh daemons, uncomment the rsh and telnet lines in /etc/inetd.conf in Interix, and restart inetd:
	%ps -ef | grep inetd
	%kill -1 <PIDofINETD>
	Check the Windows firewall and open ports 23 (telnet) and 514 (shell). Use nmap to verify that both ports are reachable.
	* Add all grid machines to /etc/hosts in Interix
	* Install bootstrap installer
	http://www.interopsystems.com/tools/pkg_install.htm
	* Add the line "64.235.106.194 ftp.interopsystems.com" to C:\WINNT\system32\drivers\etc\hosts so that ftp can reach this site (otherwise name resolution fails)
	* Install bash:
	%pkg_update -L bash
	* Create a $HOME/.rhosts file in Interix listing all hosts in your SGE installation
	* Download the Windows-specific SGE files (sge61u4_addarchs_targz.zip) from Sun's web page (available after logging in).
	* Unpack them and copy them to the proper directories
	* Run the installation:
	%/usr/SGE/install_execd

3. MPICH2

	3.1 MPICH2 on Linux

	* Download mpich2-1.0.7.tar.gz archive
	* Configure and install:
	$ mkdir /usr/SGE/mpich2
	$ ./configure --prefix=/usr/SGE/mpich2 --with-pm=smpd --with-pmi=smpd
	$ make
	$ make install
	$ cd $HOME
	$ echo "phrase=behappy" > .smpd
	* Add /usr/SGE/mpich2/bin to $PATH
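	* Check whether the smpd daemon is running; if not, start it with smpd -s
	* To compile MPI programs:
	$ gcc mpi-test.c -o mpi-test -I/usr/SGE/mpich2/include -L/usr/SGE/mpich2/lib -lmpich
	* Create the credentials file (user name on the first line, password on the second; printf is used because echo does not expand \n in every shell):
	$ printf 'sgeadmin\nsgeadmin\n' > /usr/SGE/credentials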
	$ chmod 600 /usr/SGE/credentials

	3.2 MPICH2 on Windows

	* Download and install Visual C++ 2005 SP1
	* Download and install MPICH2 for Windows
	* Check that the MPICH2 Process Manager service is running
	* To compile programs on Windows, install Dev-Cpp (or another development environment)
	* Compile the source code in Dev-Cpp with the MPICH2 libraries and headers: -I"C:\Program Files\MPICH2\include" -L"C:\Program Files\MPICH2\lib" -lmpi
	* Copy the compiled program into C:\WINNT\system32 (or another directory in the Windows PATH)

	3.3 Add MPICH2 as parallel environment to SGE

	Use qmon to add MPICH2 manually as a parallel environment in SGE. A description of the PE can be found here: http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
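	The mpirun command used in the test section reads its hosts from $TMPDIR/machines, so something in the PE has to create that file. SGE passes the granted hosts to the PE start procedure (start_proc_args) in a hostfile with one "host slots queue processors" line per host. The integration HOWTO linked above ships its own start script; the following is only a sketch of the conversion step, assuming a machine file with one host name per slot. The function name pe_hostfile_to_machines is made up for this example.

```shell
#!/bin/sh
# Convert an SGE PE hostfile ("host slots queue processors" per line)
# into a machine file that lists each host once per granted slot.
pe_hostfile_to_machines() {
    while read -r host slots rest; do
        [ -n "$host" ] || continue
        i=0
        while [ "$i" -lt "$slots" ]; do
            printf '%s\n' "$host"
            i=$((i + 1))
        done
    done < "$1"
}
```

	Inside a PE start script this would typically be called as: pe_hostfile_to_machines "$PE_HOSTFILE" > "$TMPDIR/machines"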

4. Configure Interix

	A little configuration is needed so that Interix starts only after the network drive with SGE (drive X:) has been mounted on the Windows machine. SGE can then be started automatically by an Interix startup script. To do that, add a dependency in the Windows registry:
	* Open regedt32.
	* Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Interix and add a multi-string (REG_MULTI_SZ) value named "DependOnService" with the value "AutoExNT".
	* Copy and adapt one of the Interix startup scripts from /etc/init.d to start SGE.
	* Add symbolic links to the SGE start script in /etc/rc2.d.
	After the Windows machine restarts, all network connections should be up, the network drive X: (with SGE) should be mounted, and the Interix startup script should start the SGE exec daemon, all automatically and without any user logging in.


5. Post-install check

	You should now have:
	* A Linux master host with: NIS and NFS servers, and the SGE master and SGE exec daemons running. Check with ps aux | grep sge: there should be three processes, sge_qmaster, sge_commd and sge_execd. The MPICH2 daemon, smpd, should be running as well.
	* Linux execution hosts with: /usr/SGE mounted from the master host, and the SGE exec and smpd daemons running.
	* Windows execution hosts with: /usr/SGE mounted as network drive X:, the smpd daemon running (from the Windows version of MPICH2), and Interix with the SGE exec daemon.
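	The checklist above can be turned into a quick script. This sketch reads a process listing on standard input and prints the names of the expected daemons that do not appear in it; missing_daemons is a made-up helper name, and the daemon names are the ones listed above.

```shell
#!/bin/sh
# Read a process listing on stdin and print the expected daemon names
# (given as arguments) that do not appear in it. Succeeds only if
# nothing is missing.
missing_daemons() {
    listing=$(cat)
    missing=""
    for d in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qF -- "$d"; then
            missing="$missing $d"
        fi
    done
    printf '%s\n' "${missing# }"
    [ -z "$missing" ]
}
```

	On the master host it could be used as: ps aux | missing_daemons sge_qmaster sge_commd sge_execd smpd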

6. Test

	To test the installation, create a simple MPI program (or use an example from mpich2/examples) and compile it on both platforms:
	* on Linux:  gcc mpi-test.c -o mpi-test -I/usr/SGE/mpich2/include -L/usr/SGE/mpich2/lib -lmpich
	* on Windows: use Dev-Cpp with the arguments -I"C:\Program Files\MPICH2\include" -L"C:\Program Files\MPICH2\lib" -lmpi
	Then copy the resulting binaries to a directory in $PATH on both Windows (e.g. C:\WINNT\system32) and Linux (e.g. /usr/bin/).
	Create a job script (examples are in /usr/SGE/examples) that executes mpirun:
	mpirun -n $NSLOTS -machinefile $TMPDIR/machines -pwdfile /usr/SGE/credentials mpi-test
	It should give you proper results (I hope...)!
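	As an illustration of such a job script, the sketch below writes a minimal one to a file. The helper write_mpi_job and the PE name "mpich2" are assumptions made for this example; use whatever name you gave the parallel environment in section 3.3.

```shell
#!/bin/sh
# Write a minimal SGE job script that requests a parallel environment
# and starts the MPI program with the mpirun line from this HOWTO.
# Arguments: PE name, number of slots, program name, output file.
write_mpi_job() {
    pe_name=$1
    slots=$2
    program=$3
    out=$4
    cat > "$out" <<EOF
#!/bin/sh
#\$ -N mpi-test
#\$ -pe $pe_name $slots
mpirun -n \$NSLOTS -machinefile \$TMPDIR/machines -pwdfile /usr/SGE/credentials $program
EOF
    chmod +x "$out"
}
```

	For example, write_mpi_job mpich2 4 mpi-test mpi-test.sh creates mpi-test.sh, which can then be submitted with qsub mpi-test.sh.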


