[GE users] migrating service to the shadow master host

Hugo R. Hernandez-Mora hugo.hernandez at loni.ucla.edu
Wed Jul 25 22:11:13 BST 2007



Here are the messages I got after starting sgemaster on the master and shadow hosts and then migrating from the master to the shadow host:



	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|read job database with 0 entries in 0 seconds
	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|qmaster hard descriptor limit is set to 8192
	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|qmaster soft descriptor limit is set to 8192
	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|qmaster will use max. 8172 file descriptors for communication
	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|qmaster will accept max. 99 dynamic event clients
	07/25/2007 14:04:47|qmaster|cerebro-rmn1|I|starting up GE 6.1 (lx24-amd64)
	07/25/2007 14:05:13|qmaster|cerebro-rmn1|E|commlib error: got read error (closing "cerebro-rmn2.data/qconf/2")
	07/25/2007 14:05:28|qmaster|cerebro-rmn1|I|controlled shutdown 6.1
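
For a longer messages file, the entries of one severity can be pulled out by splitting on the pipe character, since each line above has the form date|component|host|level|text. This is only a sketch: `filter_qmaster_log` is a made-up helper name, and the level letter defaults to E simply because that is the level of the commlib line above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Grid Engine): print the lines of a
# qmaster messages file whose 4th pipe-separated field equals a level letter.
#   $1 = path to the messages file
#   $2 = level letter to keep (defaults to E)
filter_qmaster_log() {
    awk -F'|' -v lvl="${2:-E}" '$4 == lvl' "$1"
}
```

It would be called as `filter_qmaster_log /path/to/messages` or `filter_qmaster_log /path/to/messages I`; the path is whatever messages file the qmaster writes on your installation.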
	

- Hugo

Rayson Ho wrote: 

	If qmaster is shut down cleanly, the lock file should be deleted, and 
	shadow master can only start a new qmaster when a lock file is absent. 
	
	BTW, did you get any message in the qmaster log file?? 
	
	Rayson 
	
	
	
	On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote: 
	

		OK, now I have the lock file in the spool directory, and the services shut 
		down when migrating from the shadow server, but neither sge_qmaster nor 
		sge_schedd is running on the shadow host or the master host. 
		What I did was disable the iptables rules that were blocking ports 6444 and 
		6445.  Now I think I have a different problem.  Any ideas? 
		- Hugo 
		
		master host: 
		rmn1.data> ps -efl | grep sge 
		1 S sgeadmin  4511     1  0  77   0 - 14117 -      13:02 ?        00:00:00 /usr/sge/bin/lx24-amd64/sge_shadowd 
		
		shadow host: 
		rmn2.data> ps -efl | grep sge 
		1 S sgeadmin  4718     1  0  77   0 - 14111 -      13:03 ?        00:00:00 /usr/sge/bin/lx24-amd64/sge_shadowd 
		
		rmn2.data> cat $SGE_ROOT/cell/common/act_qmaster 
		cerebro-rmn1.data 
		rmn2.data> cat $SGE_ROOT/cell/common/shadow_masters 
		cerebro-rmn2.data 
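
The two `cat` checks above can be wrapped into one function that prints the active qmaster and reports whether a given host appears in shadow_masters. This is a sketch under assumptions: `check_shadow_setup` is a hypothetical name, the directory argument is the `common` directory shown above, and the host name must match a line of the file exactly (no trailing spaces).

```shell
#!/bin/sh
# Hypothetical helper: summarise the failover files in a cell's common dir.
#   $1 = common directory containing act_qmaster and shadow_masters
#   $2 = host name to look for in shadow_masters
# Returns 0 if the host is listed, 1 if it is not, 2 if act_qmaster is unreadable.
check_shadow_setup() {
    common_dir=$1
    host=$2
    master=$(cat "$common_dir/act_qmaster") || return 2
    echo "active qmaster: $master"
    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$host" "$common_dir/shadow_masters"; then
        echo "$host is listed in shadow_masters"
    else
        echo "$host is NOT listed in shadow_masters"
        return 1
    fi
}
```

With the files shown above, `check_shadow_setup $SGE_ROOT/cell/common cerebro-rmn2.data` should report the host as listed, and the same call with cerebro-rmn1.data should report it as not listed.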
		<hdezmora at cerebro-rm 
		
		
		Hugo R. Hernandez-Mora wrote: 
		Nope.  It should be in $SGE_ROOT/cell/spool/qmaster, right?  Anyway, there is 
		no lock file anywhere under $SGE_ROOT. 
		- Hugo 
		
		Rayson Ho wrote: 
		The qmaster lock file should be created when the qmaster starts up. If 
		you restart the qmaster, can you see the lock file?? 
		
		Rayson 
		
		
		
		On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote: 
		Rayson, 
		thanks for your suggestion.  I tried it, and it is now clear that the problem 
		is that the lock file is not being created: 
		
		       lock_file_read_retries=10 
		       lock_file_read_count=0 
		       lock_file_found=0 
		       while [ $lock_file_read_count -lt $lock_file_read_retries ]; do 
		          if [ -f $qmaster_spool_dir/lock ]; then 
		             lock_file_found=1 
		             break 
		          fi 
		          sleep 3 
		          lock_file_read_count=`expr $lock_file_read_count + 1` 
		       done 
		
		       if [ $lock_file_found -eq 0 ]; then 
		       #  old qmaster did not write lock file 
		          echo "   old qmaster did not write lock file. Cannot migrate qmaster." 
		          echo "   Please verify that qmaster on host $actual_qmaster_host is down" 
		          echo "   and make sure that the lock file in qmaster spool directory is" 
		          echo "   read-able." 
		          exit 1 
		       fi 
		
		Something is preventing the lock file from being created in the qmaster 
		spool directory, but what? :-( 
		- Hugo 
		
		
		Rayson Ho wrote: 
		sgemaster is a Bourne shell script. You can do a little debugging 
		yourself if you add some debug "echo"s in the script -- search for 
		"old qmaster did not write lock file". 
		
		Rayson 
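
The debug "echo" idea can be tried outside the init script first. The following is a rough standalone version of the polling loop quoted in this thread, with debug output added; it is a sketch, not the sgemaster code itself. The function name `wait_for_lock` and the directory listing on each failed attempt are additions for illustration, and the defaults of 10 tries and 3 seconds are copied from the quoted loop.

```shell
#!/bin/sh
# Hypothetical debug helper: poll for <spool_dir>/lock and report each attempt.
#   $1 = qmaster spool directory
#   $2 = number of attempts (default 10, as in the quoted loop)
#   $3 = seconds to sleep between attempts (default 3)
# Returns 0 as soon as the lock file is seen, 1 if it never appears.
wait_for_lock() {
    spool_dir=$1
    retries=${2:-10}
    interval=${3:-3}
    count=0
    while [ "$count" -lt "$retries" ]; do
        if [ -f "$spool_dir/lock" ]; then
            echo "debug: found $spool_dir/lock after $count failed attempts" >&2
            return 0
        fi
        # Show what the directory looks like from this host on each miss.
        echo "debug: attempt $count: no lock file in $spool_dir" >&2
        ls -la "$spool_dir" >&2
        sleep "$interval"
        count=$((count + 1))
    done
    return 1
}
```

Running it on the target host while the old qmaster is being shut down shows whether the file ever becomes visible there, and the listing shows the ownership and modes the host actually sees.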
		
		
		
		On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote: 
		John, 
		thanks for your suggestion.  I tried it, with the same results.  A lock file 
		is supposed to be created in the spool directory, right? 
		Both the master and shadow hosts can read and write the NFS filesystem 
		that holds /usr/sge, but the master cannot write the lock file when 
		migrating the service.  Could it be permissions?  The sgemaster script is 
		owned by root (it must be) at /usr/init.d/sgemaster, and the NFS partition 
		is owned by sgeadmin. 
		- Hugo 
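
One way to test the permissions question directly is to try creating a file in the spool directory as the user in question, on each host. A minimal sketch, assuming a probe file name that does not clash with anything Grid Engine uses; `can_create_in` and `.write_probe.<pid>` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the current user can create a file in $1.
# It creates an empty probe file and removes it again on success.
can_create_in() {
    dir=$1
    probe="$dir/.write_probe.$$"
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}
```

For example, `can_create_in "$SGE_ROOT/cell/spool/qmaster" && echo writable`, run once as root and once as sgeadmin on both the master and the shadow host, would show whether any of the four combinations is refused by the NFS server.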
		
		John Hearns wrote: 
		> Hugo R. Hernandez-Mora wrote: 
		>> 
		>> 
		>> 
		>> having the same result.  On the server side, we have 
		>> 
		>>     /usr/sge 192.168.4.0/255.255.252.0(rw,async,no_root_squash) 
		> 
		> Maybe the "async" mount option? 
		> 
		> 
		> --------------------------------------------------------------------- 
		> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net 
		> For additional commands, e-mail: users-help at gridengine.sunsource.net 
		> 
		
		-- 
		Hugo R. Hernandez-Mora 
		System Administrator 
		Laboratory of Neuro Imaging, UCLA 
		635 Charles E. Young Drive South, Suite 225 
		Los Angeles, CA 90095-7332 
		Tel: 310.267.5076 
		Fax: 310.206.5518 
		hugo.hernandez at loni.ucla.edu 
		-- 
		
		"If your efforts were met with indifference, do not be discouraged, 
		for the sun puts on a marvelous show every morning 
		while most people are still asleep" 
		



