[GE users] VPN startup problem when using the SDM cloud adapter

rhierlmeier richard.hierlmeier at sun.com
Wed Apr 14 08:32:49 BST 2010


Date: Wed, 14 Apr 2010 08:32:02 +0100
From: Richard Hierlmeier <Richard.Hierlmeier at Sun.COM>
To: users <users at gridengine.sunsource.net>
Subject: Re: [GE users] VPN startup problem when using the SDM cloud adapter

Hi Joris,

On 04/13/10 19:38, jorisroovers wrote:
> ...
> More concretely, would reinstalling SGE using the variables instead of
> the services approach potentially give any results; or should the 2
> methods be able to co-exist?
>

Normally it should work with both methods. However, on OpenSolaris the
/etc/services file does not contain the definitions for Grid Engine. The AMI is
based on OpenSolaris 2009.06.

The execd installation does not use the port definitions from the auto
installation configuration file. It uses only the settings file
($SGE_ROOT/$SGE_CELL/common/settings.sh).
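
To see which of the two methods a given settings file selects, something like
the following sketch may help. The helper name check_sge_port_setting is
hypothetical, and the sketch assumes that the settings file either assigns the
port variable at the start of a line or unsets it; the exact layout of the
file may differ between releases.

```shell
#!/bin/sh
# Sketch only: check_sge_port_setting is a hypothetical helper, not part of
# SGE or SDM. It assumes the settings file either contains a line starting
# with "<VAR>=" or a line of the form "unset <VAR>".
check_sge_port_setting() {
    settings_file="$1"
    var_name="${2:-SGE_QMASTER_PORT}"
    if grep -q "^[[:space:]]*${var_name}=" "$settings_file"; then
        # The port comes from the variable defined in the settings file.
        echo "set"
    elif grep -q "^[[:space:]]*unset[[:space:]].*${var_name}" "$settings_file"; then
        # The variable is unset, so the port is looked up as a service.
        echo "unset"
    else
        echo "absent"
    fi
}
```

For example, check_sge_port_setting "$SGE_ROOT/$SGE_CELL/common/settings.sh"
on the cloud host would print "unset" if the installation relies on
/etc/services.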

As a workaround you can

   o install qmaster using the SGE_QMASTER_PORT and SGE_EXECD_PORT variables
   o or patch the util/templates/copy_sge_root_to_cloud.sh script (in the
     SDM distribution) so that it defines the Grid Engine ports in
     /etc/services on the cloud host.
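
The second option could look roughly like the following sketch. The function
add_sge_services is a hypothetical helper, not part of the SDM distribution.
The port numbers 6444 and 6445 are the ones from the install_execd_cloud.conf
quoted further down in this thread, and tcp is assumed as the protocol.

```shell
#!/bin/sh
# Sketch only: add_sge_services is a hypothetical helper, not shipped with
# SDM or SGE. It appends the Grid Engine service entries to a services file
# unless a line for that service name already exists.
add_sge_services() {
    services_file="${1:-/etc/services}"
    qmaster_port="${2:-6444}"   # assumed default, taken from the thread
    execd_port="${3:-6445}"     # assumed default, taken from the thread
    if ! grep -q '^sge_qmaster[[:space:]]' "$services_file"; then
        printf 'sge_qmaster\t%s/tcp\n' "$qmaster_port" >> "$services_file"
    fi
    if ! grep -q '^sge_execd[[:space:]]' "$services_file"; then
        printf 'sge_execd\t%s/tcp\n' "$execd_port" >> "$services_file"
    fi
}
```

Called without arguments it would target /etc/services on the host where it
runs, which requires root privileges. Running it twice leaves the file
unchanged the second time.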


Richard





> Thanks,
>
> Joris
>
>
> On Tue, Apr 13, 2010 at 16:40, rhierlmeier <richard.hierlmeier at sun.com> wrote:
>
>     Hi Joris,
>
>     On 04/13/10 15:45, jorisroovers wrote:
>
>         Ok,
>
>         I got some valuable information out of this (thanks for your
>         quick reply).
>         The second root_xx directory (=output of the xx script) contains
>         the following error
>
>         Cannot contact qmaster. The command failed:
>
>           ./bin/sol-x86/qconf -sh
>
>         The error message was:
>
>           error: could not get environment variable SGE_QMASTER_PORT or service "sge_qmaster"
>
>         Setting the SGE_QMASTER_PORT variable does not change
>         anything about this, the error stays there. However, if I add
>         sge_qmaster to /etc/services and do the same for sge_execd the
>         installation works.
>
>         This means that somehow, the automatic installation procedure
>         doesn't do this.
>         I checked the install_execd_cloud.conf file and it contains the
>         correct entries for the ports:
>
>         SGE_QMASTER_PORT="6444"
>         SGE_EXECD_PORT="6445"
>
>         How can this happen? Is this the result of some faulty
>         configuration, or something else ?
>
>
>     That's really strange. I don't know how the execd install script
>     evaluates the SGE_QMASTER_PORT. Normally I would say that it is
>     taken from the auto configuration file. However it is also possible
>     that it is taken from $SGE_ROOT/$SGE_CELL/common/settings.sh.
>
>     The cloud-adapter synchronizes the files in
>     $SGE_ROOT/$SGE_CELL/common on the cloud host with the files from
>     qmaster. Do you have the correct SGE_QMASTER_PORT in
>     $SGE_ROOT/$SGE_CELL/common/settings.sh on the cloud host?
>
>     Did you uncomment the "set -x" line in inst_sge? In the debug output
>     you can see what value SGE_QMASTER_PORT has.
>
>     Richard
>
>
>
>
>         Thanks,
>
>         Joris
>
>         On Tue, Apr 13, 2010 at 13:28, rhierlmeier <richard.hierlmeier at sun.com> wrote:
>
>            Hi Joris,
>
>            Welcome back.
>
>            You can debug the complete execd installation process of SDM
>            if you set the keepFiles attribute of the SDM executor on the
>            cloud host:
>
>            1. Modify the configuration of the executor:
>
>            % sdmadm mc -c executor
>            <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
>            <executor:executor ...
>              keepFiles="true"/>
>
>            2. Make the configuration active on the cloud host:
>
>            % sdmadm uc -c executor -h <cloud-host>
>
>            3. Reset the resource (it should be in error state)
>
>            % sdmadm rsr -r <cloud-host>
>
>            The reset_resource command will reinstall the execd on the
>            cloud host.
>
>            4. Wait until the resource goes into error state again.
>
>            5. Log in to the cloud host and look into the directory
>              <local_spool_dir>/tmp/executor.
>
>            You can find the local spool directory on the cloud host with
>
>            # sdmadm -s <system_name> sbc -all
>            ts31040system SYSTEM arges 31040
>               spool=/var/sdm/<system_name>   <-- the local spool directory
>                dist=/opt/sdm
>
>            In <local_spool_dir>/tmp/executor you will find a directory for
>            each executed command. The directory names have the format
>            <user_name>_<sequence_nr>.
>            The directory with the highest sequence number will contain the
>            protocol of the last (un)install command (including stderr and
>            stdout output). The username is always root.
>
>            If the stderr and stdout output still contains no useful
>            information, please enable the debugging of the inst_sge script.
>            Uncomment the line with
>
>            # set -x
>
>            in $SGE_ROOT/inst_sge on the cloud host. Repeat the installation:
>
>            # cd <local_spool_dir>/tmp/executor/root_10
>            # ./install_execd.sh
>
>
>            Richard
>
>
>            On 04/13/10 12:15, jorisroovers wrote:
>
>                Hello everyone,
>
>                I have been out of the country for some time, which is the
>                reason this reply is coming so late.
>                However, in the meantime I have reinstalled SDM (and SGE) to
>                make sure that I had a clean install to work with.
>                After installing the GE-adapter, the cloud nodes no longer
>                shut down right after the SDM node installation.
>                I thought this would mean the end of my problems (I couldn't
>                really think of anything that could go wrong after that),
>                but it seems that I was wrong.
>
>                Although the cloud node is successfully added to the
>                geadapter service, it fails to install the SGE execution
>                daemon.
>                 "Script install_execd_cloud.sh failed with status 1"
>
>                So, I started debugging again. The different error logs
>                didn't provide useful information, so I decided to have a
>                closer look at the installation scripts again.
>                From the
>                /opt/sdm/util/templates/ge-adapter/install_execd_cloud.sh
>                script I learned that
>
>                ./inst_sge -x -noremote -auto $CONF_FILE \
>                       2> $BASEDIR/install_execd.stderr > $BASEDIR/install_execd.stdout &
>
>                is called to install the execution daemon on the cloud host and
>                that $CONF_FILE can be found under
>
>                /var/spool/sdm/sdmjoris/tmp/executor/root_1/install_execd_cloud.conf
>
>                on the cloud node.
>                By quickly copying this configuration-file on the cloud node
>                (before the uninstallation procedure triggered by the ERROR
>                during install deletes this file), I was able to inspect its
>                content.
>
>                File contents of install_execd_cloud.conf on the cloud node
>                (comments stripped):
>
>                SGE_ROOT="/opt/sge"
>                SGE_QMASTER_PORT="6444"
>                SGE_EXECD_PORT="6445"
>                SGE_ENABLE_SMF="false"
>                SGE_CLUSTER_NAME="sgejoris"
>                CELL_NAME="default"
>                PAR_EXECD_INST_COUNT="1"
>                ADMIN_HOST_LIST=""
>                SUBMIT_HOST_LIST="ip-10-245-209-208"
>                EXEC_HOST_LIST="ip-10-245-209-208"
>                EXECD_SPOOL_DIR_LOCAL=""
>                HOSTNAME_RESOLVING="false"
>                DEFAULT_DOMAIN=""
>                ADD_TO_RC="false"
>                EXEC_HOST_LIST_RM="ip-10-245-209-208"
>                REMOVE_RC="false"
>
>
>                I personally believe that everything is alright here...
>                (I have already tried setting the HOSTNAME_RESOLVING option
>                to true using sdmadm, but that didn't help => I thought the
>                problem could be DNS related again...).
>
>                So, because I don't believe the problem lies here, I tried
>                something different. I added the cloud host to the spare_pool
>                (instead of adding it directly to the geadapter), and then
>                performed the SGE execution daemon installation manually on
>                the cloud node.
>
>                When I run
>
>                ./inst_sge -x -noremote -auto $CONF_FILE \
>                       2> $BASEDIR/install_execd.stderr > $BASEDIR/install_execd.stdout &
>
>                with the correct config file, nothing happens. No successful
>                installation, no error messages, no command output.
>
>                However, when I perform the installation completely manually
>                (that is, without the -auto option and by adding sge_qmaster
>                and sge_execd as services to the cloud node), I am able to add
>                the cloud node to the grid engine and run jobs on it...
>
>                I thus think that some kind of configuration option must be
>                wrong, but I don't really know where to go from here.
>
>                Can anyone give some better directions? Is there any way to
>                get better debugging output? Could this be DNS related again,
>                or is this probably another problem?
>
>                Thanks again,
>
>                Joris
>
>                PS: Should I post a new message to the mailing list for this
>                since this problem doesn't have anything to do with the
>                VPN/DNS problems I was originally having?
>
>
>                On Mon, Mar 29, 2010 at 07:47, rhierlmeier <richard.hierlmeier at sun.com> wrote:
>
>                   Hi Joris,
>
>                   On 03/26/10 13:22, jorisroovers wrote:
>                    > Hi Torsten,
>                    >
>                    > I changed the output of the hostname command to
>                    > neo-wn01.cmi.ua.ac.be and restarted the sdm master jvm. This
>                    > solved the problem! The cloud host is now successfully started.
>                    >
>                    > However, the cloud host is shut down immediately after the
>                    > startup procedure is completed. It also isn't added to the
>                    > spare pool. I believe this is because there currently is no
>                    > load nor SLO defined on the system.
>
>
>                   By default the spare_pool has a PermanentRequestSLO with
>                   urgency 1 and the cloud service a PermanentRequestSLO with
>                   urgency 2 (considering only cloud resources). This means that
>                   if no other SLO is defined in the system, the resource will
>                   immediately be moved back to the cloud service after startup.
>
>                   Do you already have a Grid Engine service in the system?
>
>                   The Grid Engine service has by default a FixedUsageSLO with
>                   urgency 50 (it gives every resource at the service a fixed
>                   usage). If you move a cloud resource to the Grid Engine
>                   service it will stay there.
>
>
>                   Richard
>
>                    > I think I'll reinstall the sdm system, grid engine and cloud
>                    > adapter to make sure that I have a clean install to continue
>                    > with. This will probably solve this problem.
>                    >
>                    > Thanks again for all your help. Keep up the good work :-)
>                    >
>                    > Joris
>                    >
>                    > On Wed, Mar 24, 2010 at 16:47, torsten <torsten.blix at sun.com> wrote:
>                    >
>                    >     Hi Joris,
>                    >
>                    >     thanks for your answers. It looks to me like your problem
>                    >     comes from the fact that the hostname of your SDM master
>                    >     host (what the hostname binary returns) is the short version
>                    >     (neo-wn01) while when resolving this host on the SDM master
>                    >     host you get the fully qualified hostname
>                    >     (neo-wn01.cmi.ua.ac.be). This should be consistent.
>                    >
>                    >     Therefore I'd expect your system to work if you change the
>                    >     hostname to the fully qualified hostname, or influence the
>                    >     hostname resolving on the SDM master host, so that the host
>                    >     is always resolved to the short hostname (the FQDN must stay
>                    >     resolvable as well). In both cases, a restart of the SDM
>                    >     system (with sdmadm shutdown_jvm and startup_jvm) is
>                    >     necessary afterwards.
>                    >
>                    >     A reinstallation of SDM should not be necessary.
>                    >
>                    >     Cheers,
>                    >     Torsten
>                    >
>                    >     On 03/24/10 13:12, jorisroovers wrote:
>                    >      > Hi Torsten,
>                    >      >
>                    >      > To answer your questions:
>                    >      >
>                    >      > 1) Did you install the SDM system (master host) before the
>                    >      > entries to the DNS server were made (while you still had the
>                    >      > manual entries in /etc/hosts)?
>                    >      >
>                    >      > Yes I did. Can this be the cause of my problems?
>                    >      >
>                    >      > 2) Has the SDM system on the master host been running
>                    >      > without restart since then? (so no "sdmadm shutdown_jvm"
>                    >      > command)
>                    >      >
>                    >      > Yes it has. However, after receiving your previous email, I
>                    >      > rebooted the sdm master node to make sure that the master
>                    >      > uses the latest configuration.
>                    >      > (java.rmi.server.hostname still is set to
>                    >      > neo-wn01.cmi.ua.ac.be)
>                    >      >
>                    >      > 3) What does the following command (executed on your local
>                    >      > SDM master host) output? Full or short hostname for neo-wn01?
>                    >      > % grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>                    >      >
>                    >      > root at neo-wn01:~# grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>                    >      > csInfo=neo-wn01.cmi.ua.ac.be\:6442
>                    >      >
>                    >      > 4) What does the hostname binary return on your SDM master
>                    >      > host? Full or short hostname for neo-wn01?
>                    >      >
>                    >      > root at neo-wn01:~# hostname
>                    >      > neo-wn01
>                    >      >
>                    >      > I've run the hostname command before and already thought
>                    >      > that this might be related to the problem, but since I
>                    >      > didn't find any reference to the command in any of the
>                    >      > related gef_ec2_* scripts, I thought this wasn't important.
>                    >      > Do you think that the hostname command giving the
>                    >      > unqualified name may be related to the problems I'm having?
>                    >      >
>                    >      > Hopefully, this information can help. If not, I'll do a
>                    >      > reinstall tonight or tomorrow morning.
>                    >      >
>                    >      > Thanks again for all your help.
>                    >      >
>                    >      > Cheers,
>                    >      > Joris
>                    >      >
>                    >      >
>                    >      > On Tue, Mar 23, 2010 at 14:33, torsten <torsten.blix at sun.com> wrote:
>                    >      >
>                    >      >     Hi Joris,
>                    >      >
>                    >      >     On 03/22/10 17:12, jorisroovers wrote:
>                    >      >      > Hi,
>                    >      >      >
>                    >      >      > I checked this, but it seems that the rmi-registry is
>                    >      >      > set up correctly (ps -eF)
>                    >      >      >
>                    >      >      > /usr/lib/jvm/java-6-sun-1.6.0.15/jre/bin/java
>                    >      >      > -Djava.security.manager=java.rmi.RMISecurityManager
>                    >      >      > [lot of other arguments]
>                    >      >      > -Djava.rmi.server.hostname=neo-wn01.cmi.ua.ac.be
>                    >      >
>                    >      >     ok, so this isn't the culprit ...
>                    >      >
>                    >      >      > Your reply got me thinking though. The cluster I'm using
>                    >      >      > is newly installed, and it has only been added to the
>                    >      >      > DNS-server last week.
>                    >      >      > Before the nodes of the cluster were added to the
>                    >      >      > DNS-server, I needed to add entries to /etc/hosts
>                    >      >      > manually if I wanted the hostnames to be resolved.
>                    >      >      > Therefore, I added some entries of other nodes to the
>                    >      >      > /etc/hosts file of neo-wn01 (including neo-wn01 itself).
>                    >      >      > I have now removed those, to be sure that no new
>                    >      >      > problems arise from the /etc/hosts file.
>                    >      >
>                    >      >     Good point! This host name resolving reconfiguration
>                    >      >     might be the cause of your problem. I have a few questions:
>                    >      >
>                    >      >     1) Did you install the SDM system (master host) before the
>                    >      >     entries to the DNS server were made (while you still had
>                    >      >     the manual entries in /etc/hosts)?
>                    >      >
>                    >      >     2) Has the SDM system on the master host been running
>                    >      >     without restart since then? (so no "sdmadm shutdown_jvm"
>                    >      >     command)
>                    >      >
>                    >      >     3) What does the following command (executed on your local
>                    >      >     SDM master host) output? Full or short hostname for neo-wn01?
>                    >      >     % grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>                    >      >
>                    >      >     4) What does the hostname binary return on your SDM master
>                    >      >     host? Full or short hostname for neo-wn01?
>                    >      >
>                    >      >     It might help to configure your master host to always
>                    >      >     resolve itself to the short hostname (neo-wn01) and
>                    >      >     reinstall SDM (or install a 2nd SDM system with a
>                    >      >     different system name).
>                    >      >
>                    >      >
>                    >      >      > Currently, the only entry in /etc/hosts is
>                    >      >      >
>                    >      >      > 127.0.0.1 localhost
>                    >      >      >
>                    >      >      > I've retried the cloud installation process, but the
>                    >      >      > same error occurred.
>                    >      >      > However, I also found an interesting error that I
>                    >      >      > overlooked before.
>                    >      >      > When doing the sdm installation manually on the cloud
>                    >      >      > host (not having edited the /etc/hosts file) I get the
>                    >      >      > following error
>                    >      >      >
>                    >      >      > root at domU-12-31-39-03-CC-61:/opt/sdm/bin# ./sdmadm -p system -ppw -s
>                    >      >      > sdmjoris install_managed_host -au root -l /root/spool -cs_url
>                    >      >      > neo-wn01.cmi.ua.ac.be:6442
>                    >      >      > A configuration for system "sdmjoris" has been added.
>                    >      >      > username [root] >
>                    >      >      > password >
>                    >      >      > WARNING: Host neo-wn01 is not resolvable
>                    >      >      > username [root] >
>                    >      >      > password >
>                    >      >      > During installation of system sdmjoris, an error
>                    >      >      > occurred. The system will be removed from preferences.
>                    >      >      > Error: Cannot connect to JVM cs_vm at neo-wn01_cmi_ua_ac_be:
>                    >      >      > Exception creating connection to: neo-wn01.cmi.ua.ac.be;
>                    >      >      > nested exception is:
>                    >      >      >         java.io.IOException: found no SSL context for system neo-wn01:6442
>                    >      >      >
>                    >      >      >
>                    >      >      > Which would suggest that there is a certificate problem.
>                    >      >
>                    >      >     The error message suggests that, but this is not the case. This is very
>                    >      >     probably related to hostname resolution on the SDM master host.
>                    >      >
>                    >      >     Cheers,
>                    >      >     Torsten
>                    >      >
>                    >      >      > Any other suggestions?
>                    >      >      > Thanks again,
>                    >      >      >
>                    >      >      > Joris
>                    >      >      >
>                    >      >      >
>                    >      >      >
>                    >      >      >
>                    >      >      > On Mon, Mar 22, 2010 at 14:43, torsten <torsten.blix at sun.com> wrote:
>                    >      >      >
>                    >      >      >     Hi Joris,
>                    >      >      >
>                    >      >      >     On 03/22/10 13:45, jorisroovers wrote:
>                    >      >      >      > Hi Torsten,
>                    >      >      >      >
>                    >      >      >      > Thanks for your help !
>                    >      >      >      > Sorry for the late reply. I've been busy last week.
>                    >      >      >      > However, I have been able to solve the problem. It was indeed the
>                    >      >      >      > ssh-tunnel that was not set up correctly.
>                    >      >      >      > The actual problem was that the /etc/hosts file on the SDM master host
>                    >      >      >      > didn't contain a localhost entry anymore (apparently, I accidentally
>                    >      >      >      > deleted that entry when editing the file). This caused the ssh tunnel
>                    >      >      >      > setup to fail. This is solved now.
>                    >      >      >
>                    >      >      >     Good to hear!
>                    >      >      >
>                    >      >      >      > However, I'm now having another issue.
>                    >      >      >      > The installation now fails when installing the SDM managed host.
>                    >      >      >      > ec2        res#32 domU-12-31-39-0B-1D-31 ERROR host       2 Step
>                    >      >      >      > 'Installing and starting up SDM' failed (see ...)
>                    >      >      >      >
>                    >      >      >      > I believe this has something to do with the /etc/hosts file on the
>                    >      >      >      > cloud host.
>                    >      >      >      > When I run install_managed_host on the cloud host
>                    >      >      >      >
>                    >      >      >      > sdmadm -p system -ppw -s sdmtest install_managed_host -au root -l
>                    >      >      >      > /root/spool -cs_url neo-wn01.cmi.ua.ac.be:6442
>                    >      >      >      >
>                    >      >      >      > (I use the password installation method for simplicity; I've already
>                    >      >      >      > verified that the certificates needed for password-less installation
>                    >      >      >      > are present on the cloud host)
>                    >      >      >      > I get the following output:
>                    >      >      >      >
>                    >      >      >      > A configuration for system "sdmtest" has been added.
>                    >      >      >      > username [root] >
>                    >      >      >      > password >
>                    >      >      >      > WARNING: Host neo-wn01 is not resolvable
>                    >      >      >
>                    >      >      >     This looks like a problem with host names resolving differently on
>                    >      >      >     your SDM master host and on the cloud host.
>                    >      >      >
>                    >      >      >     A little background:
>                    >      >      >     The cs_url you specified on the command line above is used to contact
>                    >      >      >     an RMI registry on the SDM master host. This registry hands back a URL
>                    >      >      >     to which the real RMI connection should be made. From the warning you
>                    >      >      >     got, it looks like this 2nd URL handed back by the RMI registry
>                    >      >      >     contains the short hostname for your SDM master host.
>                    >      >      >
>                    >      >      >     To confirm this suspicion, it would be good if you could check, on the
>                    >      >      >     SDM master host, the parameters that were used for starting up your SDM
>                    >      >      >     JVMs. Look (e.g. by using ps or pargs on Solaris) for a command line
>                    >      >      >     switch -Djava.rmi.server.hostname=<master_host_name> in the (rather
>                    >      >      >     longish) command line that was used to start the SDM JVM process.
>                    >      >      >
>                    >      >      >     If my suspicion is correct, then this should show the short name of
>                    >      >      >     your master host (neo-wn01) instead of the FQDN neo-wn01.cmi.ua.ac.be
>                    >      >      >
>                    >      >      >     Could you verify this, please?
>                    >      >      >
>                    >      >      >     Cheers,
>                    >      >      >     Torsten
>                    >      >      >
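The check Torsten asks for above can be sketched in shell. This is only an illustration: the function name and the sample command line are hypothetical, and the real SDM JVM command line is much longer. The function takes a full command line as one string and prints the value of the -Djava.rmi.server.hostname switch, if present.

```shell
# Print the value of -Djava.rmi.server.hostname found in a command line
# passed as a single string; prints nothing if the switch is absent.
rmi_hostname_from_cmdline() {
    printf '%s\n' "$1" | tr ' ' '\n' |
        sed -n 's/^-Djava\.rmi\.server\.hostname=//p'
}

# Hypothetical command line, for illustration only (not real SDM output).
sample='java -Djava.rmi.server.hostname=neo-wn01 -cp sdm.jar Main'
rmi_hostname_from_cmdline "$sample"
```

On the master host the string would come from pargs or ps for the JVM process; note that ps may truncate long command lines on Solaris, which is one reason to prefer pargs there. A short name in the result would support the suspicion described above.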
>                    >      >      >      > The installation procedure then asks again for the username and
>                    >      >      >      > password, twice, before exiting.
>                    >      >      >      > The /etc/hosts file on the cloud node currently contains
>                    >      >      >      >
>                    >      >      >      > # SDM master host
>                    >      >      >      > 10.8.0.1        neo-wn01.cmi.ua.ac.be
>                    >      >      >      >
>                    >      >      >      > When I replaced this entry with the following (adding the unqualified
>                    >      >      >      > name), the installer no longer gives the warning and the installation
>                    >      >      >      > seems to go well.
>                    >      >      >      >
>                    >      >      >      > # SDM master host
>                    >      >      >      > 10.8.0.1        neo-wn01.cmi.ua.ac.be neo-wn01
>                    >      >      >      >
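The difference between the two /etc/hosts entries above can be checked mechanically. The following is a minimal sketch, not part of SDM: the function name is hypothetical, and it only inspects a hosts-format file, ignoring DNS or any other name service. It succeeds when the given name appears as a hostname or alias on any line.

```shell
# Succeeds if the hosts-format file $1 lists name $2 as a hostname or alias.
hosts_file_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments before looking at the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Sample data mirroring the second entry quoted above, in a temporary file.
tmp_hosts=$(mktemp)
printf '# SDM master host\n10.8.0.1 neo-wn01.cmi.ua.ac.be neo-wn01\n' > "$tmp_hosts"

if hosts_file_has_name "$tmp_hosts" neo-wn01; then
    echo "short name is mapped"
else
    echo "short name is missing"
fi
rm -f "$tmp_hosts"
```

Run against the first entry (FQDN only), the same check for neo-wn01 would fail, which matches the "not resolvable" warning from the installer.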
>                    >      >      >      > Now, my question is: Is it normal that the installer needs this second
>                    >      >      >      > alias in the /etc/hosts file? Can I modify anything in my SDM
>                    >      >      >      > installation so that this is no longer necessary?
>                    >      >      >      > I know that the /etc/hosts file is edited by the startup-vpn.sh script
>                    >      >      >      > that is remotely triggered by the gef_ec2_startup_vpn_connection.sh
>                    >      >      >      > script.
>                    >      >      >      >
>                    >      >      >      > execute_ssh_script $RES_dnsName "startup-vpn.sh $sdm_master_host
>                    >      >      >      > $SDM_MASTER_VPN_IP $remote_config_file $SDM_MASTER_VPN_IP" 1
>                    >      >      >      >
>                    >      >      >      > Trying to edit $sdm_master_host in the line above has been unsuccessful
>                    >      >      >      > so far. Apparently, if this variable contains spaces (like when setting
>                    >      >      >      > $sdm_master_host="neo-wn01.cmi.ua.ac.be neo-wn01") only the first part
>                    >      >      >      > is added to the /etc/hosts file.
>                    >      >      >      >
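The behaviour described above is consistent with ordinary shell word splitting. The sketch below is an assumption about the mechanism, not the actual startup-vpn.sh: show_args is a hypothetical stand-in that simply reports its first two positional parameters. When a value containing a space is expanded unquoted, the alias becomes a separate argument, so a script that reads the hostname from $1 only sees the first part.

```shell
# Hypothetical stand-in for the remote script: reports what arrives as $1 and $2.
show_args() {
    printf 'host=%s ip=%s\n' "$1" "$2"
}

sdm_master_host="neo-wn01.cmi.ua.ac.be neo-wn01"
SDM_MASTER_VPN_IP="10.8.0.1"

# Unquoted expansion: the value is split at the space, so the alias lands in $2.
unquoted=$(show_args $sdm_master_host $SDM_MASTER_VPN_IP)

# Quoted expansion: the whole value stays together in $1.
quoted=$(show_args "$sdm_master_host" "$SDM_MASTER_VPN_IP")

printf '%s\n%s\n' "$unquoted" "$quoted"
```

In the real call the command string is presumably parsed a second time by the remote shell, so any quoting added around $sdm_master_host would have to survive that second parse as well.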
>                    >      >      >      > Of course, solving this problem by editing the
>                    >      >      >      > gef_ec2_startup_vpn_connection.sh script would only be half a
>                    >      >      >      > solution...
>                    >      >      >      >
>                    >      >      >      > Any ideas or help would be very useful.
>                    >      >      >      > Thanks a lot,
>                    >      >      >      >
>                    >      >      >      > Joris
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      >
>                    >      >      >      > On Mon, Mar 15, 2010 at 14:52, torsten <torsten.blix at sun.com> wrote:
>                    >      >      >      >
>                    >      >      >      >     Hi Joris,
>                    >      >      >      >
>                    >      >      >      >     any progress in the meantime?
>                    >      >      >      >
>                    >      >      >      >     Maybe my comments below can help.
>                    >      >      >      >
>                    >      >      >      >     On 03/10/10 13:42, jorisroovers wrote:
>                    >      >      >      >      > Hello everyone,
>                    >      >      >      >      >
>                    >      >      >      >      > I'm trying to set up an SDM installation with managed nodes on
>                    >      >      >      >      > Amazon EC2 using the SDM Cloud Adapter.
>                    >      >      >      >      > The installation of the adapter itself was successful, but now I'm
>                    >      >      >      >      > having some problems when starting cloud hosts.
>                    >      >      >      >      >
>                    >      >      >      >      > To start a cloud host, I use the commands as described in the wiki:
>                    >      >      >      >      >
>                    >      >      >      >      > sdmadm add_resource -s ec2 (filled in unbound_name and amiId in the
>                    >      >      >      >      > editor. I'm using the sample AMI)
>                    >      >      >      >      > sdmadm move_resource -r cloud1 -s spare_pool
>                    >      >      >      >      >
>                    >      >      >      >      > When watching the 'sdmadm show_resource' output I can see that the
>                    >      >      >      >      > instance is successfully started (I confirmed this by using the
>                    >      >      >      >      > online Amazon EC2 Management Console).
>                    >      >      >      >      > However, during the UNASSIGNING phase, a problem occurs while the
>                    >      >      >      >      > VPN is started on the cloud host.
>                    >      >      >      >      >
>                    >      >      >      >      > output of 'sdmadm show_resource':
>                    >      >      >      >      > ec2        res#16 cloud1                          ERROR host U     2 Step
>                    >      >      >      >      > 'Starting up VPN connection' failed (see
>                    >      >      >      >      > 'Starting_up_virtual_resource-2010-03-10_11:21:43-res#16.log')
>                    >      >      >      >      >
>                    >      >      >      >      > I already did some research on the cause of this problem (by
>                    >      >      >      >      > increasing log output, removing the undo-steps so that the cloud
>                    >      >      >      >      > node is not shut down when the problem occurs, and examining the
>                    >      >      >      >      > executed scripts).
>                    >      >      >      >      > I found out that the problem lies with the execution of the
>                    >      >      >      >      > /opt/sdm/util/cloud/ec2/ami_scripts/startup-vpn.sh script on the
>                    >      >      >      >      > cloud node.
>                    >      >      >      >      > More specifically, the 'wait_for_ping $VPN_SERVER_VPN_IP "VPN
>                    >      >      >      >      > server"' part fails.
>                    >      >      >      >      > I suspect this is caused by the './openvpn --config
>                    >      >      >      >      > "$VPN_CONFIG_FILE" --daemon' that is executed before the
>                    >      >      >      >      > wait_for_ping command.
>                    >      >      >      >      >
>                    >      >      >      >      > I tried to run the 'openvpn' command manually on the cloud host and
>                    >      >      >      >      > got the following output:
>                    >      >      >      >      >
>                    >      >      >      >      > Wed Mar 10 10:55:16 2010 TCP: connect to 127.0.0.1:1194
>                    >      >      >
>
>
>


--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7        mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028
Geschäftsführer: Jürgen Kunz


