[GE users] VPN startup problem when using the SDM cloud adapter

torsten torsten.blix at sun.com
Mon Mar 15 13:52:36 GMT 2010

Hi Joris,

any progress in the meantime?

Maybe my comments below can help.

On 03/10/10 13:42, jorisroovers wrote:
> Hello everyone,
> I'm trying to set up an SDM installation with managed nodes on Amazon EC2 
> using the SDM Cloud Adapter.
> The installation of the adapter itself was successful, but now I'm 
> having some problems when starting cloud hosts.
> To start a cloud host, I use the commands as described in the wiki:
> sdmadm add_resource -s ec2 (filling in unbound_name and amiId in the 
> editor; I'm using the sample AMI)
> sdmadm move_resource -r cloud1 -s spare_pool
> When watching the 'sdmadm show_resource' output I can see that the 
> instance is successfully started (I confirmed this by using the online 
> Amazon EC2 Management Console).
> However, during the UNASSIGNING phase, a problem occurs while the VPN is 
> started on the cloud host.
> output of 'sdmadm show_resource':
> ec2        res#16 cloud1                ERROR    host U     2     Step 
> 'Starting up VPN connection' failed (see 
> 'Starting_up_virtual_resource-2010-03-10_11:21:43-res#16.log')
> I already did some research on the cause of this problem (by increasing 
> log output, removing the undo-steps so that the cloud node is not 
> shutdown when the problem occurs and examining the executed scripts).
> I found out that the problem lies with the execution of the 
> /opt/sdm/util/cloud/ec2/ami_scripts/startup-vpn.sh script on the cloud node.
> More specifically, the 'wait_for_ping $VPN_SERVER_VPN_IP "VPN server"' 
> part fails.
> I suspect this is caused by the './openvpn --config "$VPN_CONFIG_FILE" 
> --daemon' that is executed before the wait_for_ping command.
> I tried to run the 'openvpn' command manually on the cloud host and got 
> the following output:
> Wed Mar 10 10:55:16 2010 TCP: connect to 
> <> failed, will try again in 5 seconds: Connection 
> refused (errno=146)
> This probably is the root of the problem.

Good debugging so far, valuable information!

The installation step that fails for you ('Starting up VPN connection') 
does two things (see …):
1) create an ssh tunnel to the started up cloud host from local port 
1194 to remote port 1194
2) execute a script 
<sdm_dist_dir>/util/cloud/ec2/ami_scripts/startup-vpn.sh that then 
starts up the openvpn client on the cloud host (the part that fails for 
you). This openvpn client is configured to connect to port 1194 on the 
local host (which is the cloud host), a connection which should be 
forwarded by the ssh tunnel set up in step 1 to the VPN master running 
on your (local) SDM master machine.
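To make the order of these two actions concrete, here is a hypothetical dry-run sketch that only prints the commands. The helper names tunnel_cmd and vpn_cmd are made up, and it assumes startup-vpn.sh is invoked over ssh; the real SDM step may run it differently.

```bash
#!/bin/bash
# Hypothetical dry-run sketch of the two actions behind
# 'Starting up VPN connection'. It only prints commands, it runs nothing.
# Helper names and the way step 2 is invoked are assumptions, not SDM code.

VPN_PORT=1194

# Step 1, on the SDM master: reverse tunnel, so that connections to
# localhost:1194 on the cloud host reach the VPN master on this machine.
tunnel_cmd() {
    # $1 = public name of the cloud host
    echo "ssh -R ${VPN_PORT}:localhost:${VPN_PORT} -N $1"
}

# Step 2, on the cloud host: run startup-vpn.sh, which starts the openvpn
# client against localhost:1194 (assumed here to be invoked over ssh).
vpn_cmd() {
    # $1 = public name of the cloud host, $2 = path of startup-vpn.sh there
    echo "ssh $1 $2"
}
```

For example, `tunnel_cmd cloud1.example.com` prints `ssh -R 1194:localhost:1194 -N cloud1.example.com`. The point is that the tunnel from step 1 must already be up when step 2 starts openvpn, otherwise the client gets "Connection refused".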

If I shut down the ssh tunnel on my test system (after a complete and 
successful cloud host startup) and try to restart the openvpn client on 
the cloud host, I get exactly your error message: "Connection refused".

So I suspect that step 1, the ssh tunnel startup, somehow fails for 
you.

Could you check on your local SDM master whether this ssh tunnel process 
exists after the startup process fails (and NO undo is done)? Something 
like "ps -ef | grep ssh" should show it. The ssh process should have 
been started with arguments like "-R 1194:localhost:1194 -N <public_cloudhost_name>".
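A slightly more targeted version of that check, as a sketch: it assumes the ssh arguments show up in ps output in the form quoted above (adjust the pattern if the options appear in a different order). filter_vpn_tunnel and find_vpn_tunnel are just illustrative names.

```bash
#!/bin/bash
# Run on the SDM master after the failed startup (with the undo steps disabled).

# Keep only 'ps -ef' lines for an ssh command carrying the 1194 reverse forward.
# The [s] bracket stops grep from matching its own command line.
filter_vpn_tunnel() {
    grep '[s]sh .*-R 1194:localhost:1194'
}

find_vpn_tunnel() {
    ps -ef | filter_vpn_tunnel
}

if find_vpn_tunnel; then
    echo "ssh tunnel for the VPN is running"
else
    echo "no ssh tunnel for port 1194 found: step 1 probably failed" >&2
fi
```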

If this ssh process is running, you should be able to telnet from the 
cloud host to port localhost:1194 ("telnet localhost 1194") and get an 
answer from the VPN master process running on the SDM master host.
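If telnet is not installed on the AMI, a non-interactive probe can be done from bash instead. This is a sketch assuming bash (with its /dev/tcp redirection support) and GNU `timeout` are available on the cloud host; port_open is a made-up helper name.

```bash
#!/bin/bash
# Run on the cloud host. Non-interactive alternative to "telnet localhost 1194".

port_open() {
    # $1 = host, $2 = port. Succeeds if a TCP connection can be opened;
    # bash's /dev/tcp redirection opens the socket, timeout bounds the wait.
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if port_open localhost 1194; then
    echo "localhost:1194 accepts connections (tunnel seems to forward)"
else
    echo "localhost:1194 refused or timed out (matches the openvpn error)" >&2
fi
```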

A further thing to check is the syslog on the SDM master host. The VPN 
master logs any problems it encounters to the syslog.
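A sketch for pulling the relevant lines out of the syslog. It assumes the VPN master's messages contain the string "openvpn" and that the syslog lives in one of the usual places (/var/adm/messages on Solaris, /var/log/messages or /var/log/syslog on Linux); vpn_syslog is a hypothetical helper.

```bash
#!/bin/bash
# Print the last 20 syslog lines mentioning openvpn from the first
# readable file among the candidates given as arguments.
# Assumption: VPN master messages are tagged with "openvpn".
vpn_syslog() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            grep -i 'openvpn' "$f" | tail -n 20
            return 0
        fi
    done
    echo "no readable syslog file among: $*" >&2
    return 1
}
```

Usage on the SDM master: `vpn_syslog /var/adm/messages /var/log/messages /var/log/syslog` (reading these files usually requires root).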

I hope this helps!


