[GE users] migrating service to the shadow master host

Rayson Ho rayrayson at gmail.com
Wed Jul 25 21:57:03 BST 2007


If qmaster is shut down cleanly, the lock file should be deleted, and
the shadow master can only start a new qmaster when the lock file is absent.
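
To watch for the lock file yourself, something like the following might
help. It is only a rough sketch modelled on the retry loop from sgemaster
quoted further down in this thread; the function name wait_for_lock and its
arguments are made up for illustration and are not part of Grid Engine.

```shell
#!/bin/sh
# Hypothetical helper (not part of sgemaster): poll a qmaster spool
# directory for the file named "lock".
#   $1 = qmaster spool directory
#   $2 = number of tries (default 10)
#   $3 = seconds to sleep between tries (default 3)
wait_for_lock() {
    spool_dir=$1
    retries=${2:-10}
    delay=${3:-3}
    count=0
    while [ "$count" -lt "$retries" ]; do
        if [ -f "$spool_dir/lock" ]; then
            echo "lock file found: $spool_dir/lock"
            return 0
        fi
        # Debug output goes to stderr so stdout stays clean.
        echo "try $count: no lock file in $spool_dir" >&2
        sleep "$delay"
        count=$((count + 1))
    done
    echo "no lock file after $retries tries" >&2
    return 1
}
```

You would call it with your own spool directory, for example
wait_for_lock "$SGE_ROOT/cell/spool/qmaster" 10 3, assuming that is
where your qmaster spools.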

BTW, did you get any messages in the qmaster log file?
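
If it helps, here is a small sketch for pulling the most recent entries out
of that log. It assumes the log is the file named "messages" inside the
qmaster spool directory, which you should verify on your installation; the
function name tail_qmaster_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the last lines of the qmaster log.
# Assumption: the log is the file "messages" in the qmaster spool directory.
#   $1 = qmaster spool directory
#   $2 = number of lines to show (default 20)
tail_qmaster_log() {
    log="$1/messages"
    lines=${2:-20}
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    tail -n "$lines" "$log"
}
```

Run it right after a failed migration, for example
tail_qmaster_log "$SGE_ROOT/cell/spool/qmaster" 50, so the relevant entries
are still near the end of the file.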

Rayson



On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote:
> OK, now I have the lock file in the spool directory, and the services
> shut down when migrating from the shadow server, but neither sge_qmaster nor
> sge_schedd is running on the shadow host or the master host afterwards.
> What I did was disable the iptables rules that were blocking ports 6444 and
> 6445.  Now I think I have a different problem.  Any ideas?
> - Hugo
>
> master host:
> rmn1.data> ps -efl | grep sge
> 1 S sgeadmin  4511     1  0  77   0 - 14117 -      13:02 ?        00:00:00 /usr/sge/bin/lx24-amd64/sge_shadowd
>
> shadow host:
> rmn2.data> ps -efl | grep sge
> 1 S sgeadmin  4718     1  0  77   0 - 14111 -      13:03 ?        00:00:00 /usr/sge/bin/lx24-amd64/sge_shadowd
>
> rmn2.data> cat $SGE_ROOT/cell/common/act_qmaster
> cerebro-rmn1.data
> rmn2.data> cat $SGE_ROOT/cell/common/shadow_masters
> cerebro-rmn2.data
> <hdezmora at cerebro-rm
>
>
> Hugo R. Hernandez-Mora wrote:
> Nope.   It should be in $SGE_ROOT/cell/spool/qmaster, right?  Anyway, there
> is no lock file anywhere under the $SGE_ROOT directory.
> - Hugo
>
> Rayson Ho wrote:
> The qmaster lock file should be created when the qmaster starts up. If
> you restart the qmaster, can you see the lock file??
>
> Rayson
>
>
>
> On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote:
> Rayson,
> thanks for your suggestion.  I did that, and it is now clear that the
> problem is that the lock file is not being created:
>
>        lock_file_read_retries=10
>        lock_file_read_count=0
>        lock_file_found=0
>        while [ $lock_file_read_count -lt $lock_file_read_retries ]; do
>           if [ -f $qmaster_spool_dir/lock ]; then
>              lock_file_found=1
>              break
>           fi
>           sleep 3
>           lock_file_read_count=`expr $lock_file_read_count + 1`
>        done
>
>        if [ $lock_file_found -eq 0 ]; then
>        #  old qmaster did not write lock file
>           echo "   old qmaster did not write lock file. Cannot migrate qmaster."
>           echo "   Please verify that qmaster on host $actual_qmaster_host is down"
>           echo "   and make sure that the lock file in qmaster spool directory is"
>           echo "   read-able."
>           exit 1
>        fi
>
> There is something preventing the creation of the lock file on the qmaster
> spool directory, but what???? :-(
> - Hugo
>
>
> Rayson Ho wrote:
> sgemaster is a Bourne shell script. You can do a little debugging
> yourself if you add some debug "echo"s in the script -- search for
> "old qmaster did not write lock file".
>
> Rayson
>
>
>
> On 7/25/07, Hugo R. Hernandez-Mora <hugo.hernandez at loni.ucla.edu> wrote:
> John,
> thanks for your suggestion.   I did it, with the same results.   A lock
> file is supposed to be created in the spool directory, right?
> Both the master and the shadow host can read and write the NFS filesystem
> that holds /usr/sge, but the master can't write the lock file when
> migrating the service.  Could it be permissions?   The sgemaster script
> is owned by root (it must be) at /usr/init.d/sgemaster, and the
> NFS partition is owned by sgeadmin.
> - Hugo
>
> John Hearns wrote:
> > Hugo R. Hernandez-Mora wrote:
> >>
> >>
> >>
> >> having the same result.  On the server side, we have
> >>
> >>     */usr/sge 192.168.4.0/255.255.252.0(rw,async,no_root_squash)*
> >
> > Maybe the "async" mount option?
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
>
> --
> Hugo R. Hernandez-Mora
> System Administrator
> Laboratory of Neuro Imaging, UCLA
> 635 Charles E. Young Drive South, Suite 225
> Los Angeles, CA 90095-7332
> Tel: 310.267.5076
> Fax: 310.206.5518
> hugo.hernandez at loni.ucla.edu
> --
>
> "If your efforts were met with indifference, do not lose heart,
> for the sun puts on a marvelous show every morning
> while most people are still asleep."

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



