No subject


Wed Jan 12 20:38:46 GMT 2011


./inst_sge -x -noremote -auto $CONF_FILE \
        2> $BASEDIR/install_execd.stderr > $BASEDIR/install_execd.stdout &

is called to install the execution daemon on the cloud host, and that $CONF_FILE can be found under /var/spool/sdm/sdmjoris/tmp/executor/root_1/install_execd_cloud.conf on the cloud node.
By quickly copying this configuration file on the cloud node (before the uninstallation procedure triggered by the ERROR during installation deletes it), I was able to inspect its contents.
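
Catching a short-lived file like this by hand is a race. A minimal sketch of a polling loop that copies the file as soon as it appears is given here; the function name grab_when_present and the retry count are illustrative assumptions, not part of SDM or Grid Engine.

```shell
# Hypothetical helper: poll for a file and copy it as soon as it exists.
# Usage: grab_when_present SOURCE DESTINATION [TRIES]
grab_when_present() {
    src=$1
    dst=$2
    tries=${3:-60}              # number of one-second polls (assumed default)
    while [ "$tries" -gt 0 ]; do
        if [ -f "$src" ]; then
            # -p keeps mode and timestamps of the original file
            cp -p "$src" "$dst" && return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1                    # the file never showed up (or the copy failed)
}
```

It could be started on the cloud node just before the resource is added, for example as grab_when_present /var/spool/sdm/sdmjoris/tmp/executor/root_1/install_execd_cloud.conf /root/install_execd_cloud.conf.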

File contents of install_execd_cloud.conf on the cloud node (comments stripped):

SGE_ROOT="/opt/sge"
SGE_QMASTER_PORT="6444"
SGE_EXECD_PORT="6445"
SGE_ENABLE_SMF="false"
SGE_CLUSTER_NAME="sgejoris"
CELL_NAME="default"
PAR_EXECD_INST_COUNT="1"
ADMIN_HOST_LIST=""
SUBMIT_HOST_LIST="ip-10-245-209-208"
EXEC_HOST_LIST="ip-10-245-209-208"
EXECD_SPOOL_DIR_LOCAL=""
HOSTNAME_RESOLVING="false"
DEFAULT_DOMAIN=""
ADD_TO_RC="false"
EXEC_HOST_LIST_RM="ip-10-245-209-208"
REMOVE_RC="false"
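
The listing above has its comments stripped. One way to produce such a view is sketched here; strip_comments is an illustrative name, and the sketch assumes that comment lines start with '#' (possibly after whitespace).

```shell
# Hypothetical helper: print a config file without comment-only and blank lines.
# Usage: strip_comments FILE...
strip_comments() {
    # Drop lines whose first non-blank character is '#', and empty lines.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$@"
}
```

For example: strip_comments install_execd_cloud.conf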


I personally believe that everything is all right here. (I have already tried setting the HOSTNAME_RESOLVING option to true using sdmadm, but that didn't help; I thought the problem could be DNS-related again.)

So, because I don't believe the problem lies here, I tried something different. I added the cloud host to the spare_pool (instead of adding it directly to the geadapter), and then performed the SGE execution daemon installation manually on the cloud node.

When I run

./inst_sge -x -noremote -auto $CONF_FILE \
        2> $BASEDIR/install_execd.stderr > $BASEDIR/install_execd.stdout &

with the correct config file, nothing happens: no successful installation, no error messages, no command output.
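
Note that the command line above sends both output streams to files and puts the job in the background, so nothing is expected on the terminal itself. A minimal sketch of running a command in the foreground while keeping both streams and the exit status is given here; run_logged and the log file names are illustrative assumptions, not part of inst_sge.

```shell
# Hypothetical helper: run a command in the foreground, keep stdout and
# stderr in separate files, and report the exit status.
# Usage: run_logged LOGDIR COMMAND [ARGS...]
run_logged() {
    logdir=$1
    shift
    mkdir -p "$logdir" || return 1
    # No trailing '&': wait for the command so its status can be inspected.
    "$@" > "$logdir/stdout.log" 2> "$logdir/stderr.log"
    status=$?
    echo "exit status: $status"
    return "$status"
}
```

If inst_sge is a plain shell script on your installation, something like run_logged /tmp/execd_debug sh -x ./inst_sge -x -noremote -auto "$CONF_FILE" would additionally leave a command trace in stderr.log.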

However, when I perform the installation completely manually (that is, without the -auto option and by adding sgeqmaster and sgeexecd as services to the cloud node), I am able to add the cloud node to the grid engine and run jobs on it.

I thus think that some kind of configuration option must be wrong, but I don't really know where to go from here.

Can anyone give some better directions? Is there any way to get better debugging output? Could this be DNS-related again, or is this probably another problem?

Thanks again,

Joris

PS: Should I post a new message to the mailing list for this, since this problem doesn't have anything to do with the VPN/DNS problems I was originally having?


On Mon, Mar 29, 2010 at 07:47, rhierlmeier <richard.hierlmeier at sun.com> wrote:
Hi Joris,

On 03/26/10 13:22, jorisroovers wrote:
> Hi Torsten,
>
> I changed the output of the hostname command to neo-wn01.cmi.ua.ac.be
> and restarted the sdm master jvm. This
> solved the problem! The cloud host is now successfully started.
>
> However, the cloud host is shut down immediately after the startup procedure
> is completed. It also isn't added to the spare pool. I believe this is
> because there currently is no load nor SLO defined on the system.


By default the spare_pool has a PermanentRequestSLO with urgency 1 and
the cloud service a PermanentRequestSLO with urgency 2 (considering only cloud
resources). This means that if no other SLO is defined in the system, the resource
will immediately be moved back to the cloud service after startup.

Do you already have a Grid Engine service in the system?

The Grid Engine service has by default a FixedUsageSLO with urgency 50 (it gives every
resource at the service a fixed usage). If you move a cloud resource to the
Grid Engine service it will stay there.


Richard

> I think I'll reinstall the sdm system, grid engine and cloud adapter to
> make sure that I have a clean install to continue with. This will
> probably solve this problem.

>
> Thanks again for all your help. Keep up the good work :-)
>
> Joris
>
> On Wed, Mar 24, 2010 at 16:47, torsten <torsten.blix at sun.com> wrote:
>
>     Hi Joris,
>
>     thanks for your answers. It looks to me like your problem comes from the
>     fact that the hostname of your SDM master host (what the hostname binary
>     returns) is the short version (neo-wn01) while when resolving this host
>     on the SDM master host you get the fully qualified hostname
>     (neo-wn01.cmi.ua.ac.be). This should
>     be consistent.
>
>     Therefore I'd expect your system to work if you change the hostname to
>     the fully qualified hostname, or influence the hostname resolving on the
>     SDM master host, so that the host is always resolved to the short
>     hostname (the FQDN must stay resolvable as well). In both cases, a
>     restart of the SDM system (with sdmadm shutdown_jvm and startup_jvm) is
>     necessary afterwards.
>
>     A reinstallation of SDM should not be necessary.
>
>     Cheers,
>     Torsten
>
>     On 03/24/10 13:12, jorisroovers wrote:
>      > Hi Torsten,
>      >
>      > To answer your questions:
>      >
>      > 1) Did you install the SDM system (master host) before the entries to
>      > the DNS server were made (while you still had the manual entries in
>      > /etc/hosts)?
>      >
>      > Yes I did. Can this be the cause of my problems ?
>      >
>      > 2) Has the SDM system on the master host been running without restart
>      > since then? (so no "sdmadm shutdown_jvm" command)
>      >
>      > Yes it has. However, after receiving your previous email, I rebooted the
>      > sdm master node to make sure that the master uses the latest configuration.
>      > (java.rmi.server.hostname still is set to neo-wn01.cmi.ua.ac.be)
>      >
>      > 3) What does the following command (executed on your local SDM master
>      > host) output? Full or short hostname for neo-wn01?
>      > % grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>      >
>      > root@neo-wn01:~# grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>      > csInfo=neo-wn01.cmi.ua.ac.be\:6442
>      >
>      > 4) What does the hostname binary return on your SDM master host? Full or
>      > short hostname for neo-wn01?
>      >
>      > root@neo-wn01:~# hostname
>      > neo-wn01
>      >
>      > I've run the hostname command before and already thought that this might
>      > be related to the problem, but since I didn't find any reference to the
>      > command in any of the related gef_ec2_* scripts, I thought this wasn't
>      > important. Do you think that the hostname command giving the unqualified
>      > name may be related to the problems I'm having?
>      >
>      > Hopefully, this information can help. If not, I'll do a reinstall
>      > tonight or tomorrow morning.
>      >
>      > Thanks again for all your help.
>      >
>      > Cheers,
>      > Joris
>      >
>      >
>      > On Tue, Mar 23, 2010 at 14:33, torsten <torsten.blix at sun.com> wrote:
>      >
>      >     Hi Joris,
>      >
>      >     On 03/22/10 17:12, jorisroovers wrote:
>      >      > Hi,
>      >      >
>      >      > I checked this, but it seems that the rmi-registry is set up correctly
>      >      > (ps -eF)
>      >      >
>      >      > /usr/lib/jvm/java-6-sun-1.6.0.15/jre/bin/java
>      >      > -Djava.security.manager=java.rmi.RMISecurityManager [lot of other
>      >      > arguments] -Djava.rmi.server.hostname=neo-wn01.cmi.ua.ac.be
>      >
>      >     ok, so this isn't the culprit ...
>      >
>      >      > Your reply got me thinking though. The cluster I'm using is newly
>      >      > installed, and it has only been added to the DNS-server last week.
>      >      > Before the nodes of the cluster were added to the DNS-server, I needed
>      >      > to add entries to /etc/hosts manually if I wanted the hostnames to be
>      >      > resolved. Therefore, I added some entries of other nodes to the
>      >      > /etc/hosts file of neo-wn01 (including neo-wn01 itself).
>      >      > I have now removed those, to be sure that no new problems arise from the
>      >      > /etc/hosts file.
>      >
>      >     Good point! This host name resolving reconfiguration might be the cause
>      >     of your problem. I have a few questions:
>      >
>      >     1) Did you install the SDM system (master host) before the entries to
>      >     the DNS server were made (while you still had the manual entries in
>      >     /etc/hosts)?
>      >
>      >     2) Has the SDM system on the master host been running without restart
>      >     since then? (so no "sdmadm shutdown_jvm" command)
>      >
>      >     3) What does the following command (executed on your local SDM master
>      >     host) output? Full or short hostname for neo-wn01?
>      >     % grep csInfo /etc/sdm/bootstrap/sdmjoris/prefs.properties
>      >
>      >     4) What does the hostname binary return on your SDM master host? Full or
>      >     short hostname for neo-wn01?
>      >
>      >     It might help to configure your master host to always resolve itself to
>      >     the short hostname (neo-wn01) and reinstall SDM (or install a 2nd SDM
>      >     system with a different system name).
>      >
>      >
>      >      > Currently, the only entry in /etc/hosts is
>      >      >
>      >      > 127.0.0.1 localhost
>      >      >
>      >      > I've retried the cloud installation process, but the same error occurred.
>      >      > However, I also found an interesting error that I overlooked before.
>      >      > When doing the SDM installation manually on the cloud host (not having
>      >      > edited the /etc/hosts file) I get the following error
>      >      >
>      >      > root@domU-12-31-39-03-CC-61:/opt/sdm/bin# ./sdmadm -p system -ppw -s
>      >      > sdmjoris install_managed_host -au root -l /root/spool -cs_url
>      >      > neo-wn01.cmi.ua.ac.be:6442
>      >      > A configuration for system "sdmjoris" has been added.
>      >      > username [root] >
>      >      > password >
>      >      > WARNING: Host neo-wn01 is not resolvable
>      >      > username [root] >
>      >      > password >
>      >      > During installation of system sdmjoris, an error occurred. The system
>      >      > will be removed from preferences.
>      >      > Error: Cannot connect to JVM cs_vm at neo-wn01_cmi_ua_ac_be: Exception
>      >      > creating connection to: neo-wn01.cmi.ua.ac.be; nested exception is:
>      >      >         java.io.IOException: found no SSL context for system neo-wn01:6442
>      >      >
>      >      >
>      >      > Which would suggest that there is a certificate problem.
>      >
>      >     The error message suggests that, but this is not the case. This is very
>      >     probably related to hostname resolving on the SDM master host.
>      >
>      >     Cheers,
>      >     Torsten
>      >
>      >      > Any other suggestions?
>      >      > Thanks again,
>      >      >
>      >      > Joris
>      >      >
>      >      >
>      >      >
>      >      >
>      >      > On Mon, Mar 22, 2010 at 14:43, torsten <torsten.blix at sun.com> wrote:
>      >      >
>      >      >     Hi Joris,
>      >      >
>      >      >     On 03/22/10 13:45, jorisroovers wrote:
>      >      >      > Hi Torsten,
>      >      >      >
>      >      >      > Thanks for your help !
>      >      >      > Sorry for the late reply. I've been busy last week.
>      >      >      > However, I have been able to solve the problem. It was indeed the
>      >      >      > ssh-tunnel that was not set up correctly.
>      >      >      > The actual problem was that the /etc/hosts file on the sdm master host
>      >      >      > didn't contain a localhost entry anymore (apparently,
>      >      >      > I accidentally deleted that entry when editing the file). This caused
>      >      >      > the ssh tunnel setup to fail. This is solved now.
>      >      >
>      >      >     Good to hear!
>      >      >
>      >      >      > However, I'm now having another issue.
>      >      >      > The installation now fails when installing the SDM managed host.
>      >      >      > ec2        res#32 domU-12-31-39-0B-1D-31 ERROR host       2 Step
>      >      >      > 'Installing and starting up SDM' failed (see ...)
>      >      >      >
>      >      >      > I believe this has something to do with the /etc/hosts file on the cloud
>      >      >      > host.
>      >      >      > When I run the install_managed_host on the cloud host
>      >      >      >
>      >      >      > sdmadm -p system -ppw -s sdmtest install_managed_host -au root -l
>      >      >      > /root/spool -cs_url neo-wn01.cmi.ua.ac.be:6442
>      >      >      >
>      >      >      > (I use the password installation method for simplicity; I've already
>      >      >      > verified that the right certificates that are needed for password-less
>      >      >      > installation are present on the cloud host)
>      >      >      > I get the following output:
>      >      >      >
>      >      >      > A configuration for system "sdmtest" has been added.
>      >      >      > username [root] >
>      >      >      > password >
>      >      >      > WARNING: Host neo-wn01 is not resolvable
>      >      >
>      >      >     This looks like a problem with host names resolving differently on your
>      >      >     SDM master host and on the cloud host.
>      >      >
>      >      >     A little background:
>      >      >     The cs_url you specified on the command line above is used to contact an
>      >      >     RMI registry on the SDM master host. This registry hands back a URL to
>      >      >     which the real RMI connection should be made. From the warning you got
>      >      >     it looks like this 2nd URL handed back by the RMI registry contains
>      >      >     the short hostname for your SDM master host.
>      >      >
>      >      >     To confirm this suspicion, it would be good if you could check on the
>      >      >     SDM master host the parameters that were used for starting up your SDM
>      >      >     JVMs. Look (e.g. by using ps or pargs on Solaris) for a command line
>      >      >     switch -Djava.rmi.server.hostname=<master_host_name> in the (rather
>      >      >     longish) command line that was used to start the SDM JVM process.
>      >      >
>      >      >     If my suspicion is correct, then this should show the short name of your
>      >      >     master host (neo-wn01) instead of the FQDN neo-wn01.cmi.ua.ac.be
>      >      >
>      >      >     Could you verify this, please?
>      >      >
>      >      >     Cheers,
>      >      >     Torsten
>      >      >
>      >      >      > The installation procedure then asks again for the username and password
>      >      >      > twice before exiting.
>      >      >      > The /etc/hosts file on the cloud node currently contains
>      >      >      >
>      >      >      > # SDM master host
>      >      >      > 10.8.0.1        neo-wn01.cmi.ua.ac.be
>      >      >      >
>      >      >      > When I replaced this entry with the following (adding the unqualified
>      >      >      > name), the installer no longer gives the warning and the installation
>      >      >      > seems to go well.
>      >      >      >
>      >      >      > # SDM master host
>      >      >      > 10.8.0.1        neo-wn01.cmi.ua.ac.be neo-wn01
>      >      >      >
>      >      >      > Now, my question is: Is it normal that the installer needs this second
>      >      >      > alias in the /etc/hosts file? Can I modify anything in my sdm
>      >      >      > installation so that this is no longer necessary?
>      >      >      > I know that the /etc/hosts file is edited by the startup-vpn.sh script
>      >      >      > that is remotely triggered by the gef_ec2_startup_vpn_connection.sh script.
>      >      >      >
>      >      >      > execute_ssh_script $RES_dnsName "startup-vpn.sh $sdm_master_host
>      >      >      > $SDM_MASTER_VPN_IP $remote_config_file $SDM_MASTER_VPN_IP" 1
>      >      >      >
>      >      >      > Trying to edit $sdm_master_host in the line above has been unsuccessful
>      >      >      > so far. Apparently, if this variable contains spaces (like when setting
>      >      >      > $sdm_master_host="neo-wn01.cmi.ua.ac.be neo-wn01") only the first part
>      >      >      > is added to the /etc/hosts file.
>      >      >      >
>      >      >      > Of course, solving this problem by editing
>      >      >      > the gef_ec2_startup_vpn_connection.sh script would only be half a
>      >      >      > solution...
>      >      >      >
>      >      >      > Any ideas or help would be very useful.
>      >      >      > Thanks a lot,
>      >      >      >
>      >      >      > Joris
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      >
>      >      >      > On Mon, Mar 15, 2010 at 14:52, torsten <torsten.blix at sun.com> wrote:
>      >      >      >
>      >      >      >     Hi Joris,
>      >      >      >
>      >      >      >     any progress in the meantime?
>      >      >      >
>      >      >      >     Maybe my comments below can help.
>      >      >      >
>      >      >      >     On 03/10/10 13:42, jorisroovers wrote:
>      >      >      >      > Hello everyone,
>      >      >      >      >
>      >      >      >      > I'm trying to set up an SDM installation with managed nodes on Amazon EC2
>      >      >      >      > using the SDM Cloud Adapter.
>      >      >      >      > The installation of the adapter itself was successful, but now I'm
>      >      >      >      > having some problems when starting cloud hosts.
>      >      >      >      >
>      >      >      >      > To start a cloud host, I use the commands as described in the wiki:
>      >      >      >      >
>      >      >      >      > sdmadm add_resource -s ec2 (filled in unbound_name and amiId in the
>      >      >      >      > editor. I'm using the sample AMI)
>      >      >      >      > sdmadm move_resource -r cloud1 -s spare_pool
>      >      >      >      >
>      >      >      >      > When watching the 'sdmadm show_resource' output I can see that the
>      >      >      >      > instance is successfully started (I confirmed this by using the online
>      >      >      >      > Amazon EC2 Management Console).
>      >      >      >      > However, during the UNASSIGNING phase, a problem occurs while the VPN is
>      >      >      >      > started on the cloud host.
>      >      >      >      >
>      >      >      >      > output of 'sdmadm show_resource':
>      >      >      >      > ec2        res#16 cloud1                ERROR host U     2 Step
>      >      >      >      > 'Starting up VPN connection' failed (see
>      >      >      >      > 'Starting_up_virtual_resource-2010-03-10_11:21:43-res#16.log')
>      >      >      >      >
>      >      >      >      > I already did some research on the cause of this problem (by increasing
>      >      >      >      > log output, removing the undo-steps so that the cloud node is not
>      >      >      >      > shut down when the problem occurs and examining the executed scripts).
>      >      >      >      > I found out that the problem lies with the execution of the
>      >      >      >      > /opt/sdm/util/cloud/ec2/ami_scripts/startup-vpn.sh script on the cloud node.
>      >      >      >      > More specifically, the 'wait_for_ping $VPN_SERVER_VPN_IP "VPN server"'
>      >      >      >      > part fails.
>      >      >      >      > I suspect this is caused by the './openvpn --config "$VPN_CONFIG_FILE"
>      >      >      >      > --daemon' that is executed before the wait_for_ping command.
>      >      >      >      >
>      >      >      >      > I tried to run the 'openvpn' command manually on the cloud host and got
>      >      >      >      > the following output:
>      >      >      >      >
>      >      >      >      > Wed Mar 10 10:55:16 2010 TCP: connect to 127.0.0.1:1194 failed, will try
>      >      >      >      > again in 5 seconds: Connection refused (errno=146)
>      >      >      >      >
>      >      >      >      > This probably is the root of the problem.
>      >      >      >     [snip]
>      >      >      >
>      >      >      >     Good debugging so far, valuable information!
>      >      >      >
>      >      >      >     The installation step that fails for you ('Starting up VPN connection')
>      >      >      >     does two things (see
>      >      >      >     <sdm_dist_dir>/util/cloud/ec2/gef_ec2_startup_vpn_connection.sh):
>      >      >      >     1) create an ssh tunnel to the started up cloud host from local port
>      >      >      >     1194 to remote port 1194
>      >      >      >     2) execute a script
>      >      >      >     <sdm_dist_dir>/util/cloud/ec2/ami_scripts/startup-vpn.sh that then
>      >      >      >     starts up the openvpn client on the cloud host (the part that fails for
>      >      >      >     you). This openvpn client is configured to connect to port 1194 on the
>      >      >      >     local host (which is the cloud host), a connection which should be
>      >      >      >     forwarded by the ssh tunnel set up in step 1 to the VPN master running
>      >      >      >     on your (local) SDM master machine.
>      >      >      >
>      >      >      >     If I shoot down the ssh tunnel in my test system (after a complete and
>      >      >      >     successful cloud host startup) and try to restart the openvpn client on
>      >      >      >     the cloud host, I get exactly your error message: Connection refused
>      >      >      >     (errno=146).
>      >      >      >
>      >      >      >     So I'm suspecting that step 1 of the script, the ssh tunnel startup,
>      >      >      >     somehow fails for you.
>      >      >      >
>      >      >      >     Could you check on your local SDM master whether this ssh tunnel process
>      >      >      >     exists after the startup process fails (and NO undo is done)? Something
>      >      >      >     like "ps -ef | grep ssh" should show it. ssh should be called with
>      >      >      >     arguments like "-R 1194:localhost:1194 -N <public_cloudhost_name>"
>      >      >      >
>      >      >      >     If this ssh process is running, you should be able to telnet from the
>      >      >      >     cloud host to port localhost:1194 ("telnet localhost 1194") and get an
>      >      >      >     answer from the VPN master process running on the SDM master host.
>      >      >      >
>      >      >      >     A further thing to check would be the syslog on the SDM master host. The
>      >      >      >     VPN master is logging into the syslog any kind of problems it
>      >      >      >     encounters.
>      >      >      >
>      >      >      >     I hope this helps!
>      >      >      >
>      >      >      >     Cheers,
>      >      >      >     Torsten
>      >      >      >
>      >      >      >
>      >      >      >     ------------------------------------------------------
>      >      >      >     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=248721
>      >      >      >
>      >      >      >     To unsubscribe from this discussion, e-mail:
>      >      >      >     [users-unsubscribe at gridengine.sunsource.net].
>      >      >      >
>      >      >      >
>      >      >
>      >      >     ------------------------------------------------------
>      >      >     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=250497
>      >
>      >     ------------------------------------------------------
>      >     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=250807
>      >
>
>     ------------------------------------------------------
>     http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251126
>


--
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Richard Hierlmeier           Phone: ++49 (0)941 3075-223
Software Engineering         Fax:   ++49 (0)941 3075-222
Sun Microsystems GmbH
Dr.-Leo-Ritter-Str. 7        mailto: richard.hierlmeier at sun.com
D-93049 Regensburg           http://www.sun.com/grid

Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028
Geschäftsführer: Thomas Schröder

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=251666

To unsubscribe from this discussion, e-mail: [users-unsubscribe at gridengine.sunsource.net].



