[GE users] VPN startup problem when using the SDM cloud adapter
joris.roovers at gmail.com
Thu Mar 11 09:14:47 GMT 2010
I'm trying to set up an SDM installation with managed nodes on Amazon EC2 using the SDM Cloud Adapter.
The installation of the adapter itself was successful, but now I'm having problems when starting cloud hosts.
To start a cloud host, I use the commands as described in the wiki:
sdmadm add_resource -s ec2 (filled in unbound_name and amiId in the editor. I'm using the sample AMI)
sdmadm move_resource -r cloud1 -s spare_pool
When watching the 'sdmadm show_resource' output, I can see that the instance is started successfully (I confirmed this in the online Amazon EC2 Management Console).
However, during the UNASSIGNING phase, a problem occurs while the VPN is started on the cloud host.
Output of 'sdmadm show_resource':
ec2 res#16 cloud1 ERROR host U 2 Step 'Starting up VPN connection' failed (see 'Starting_up_virtual_resource-2010-03-10_11:21:43-res#16.log')
I already did some research into the cause of this problem: I increased the log output, removed the undo steps so that the cloud node is not shut down when the problem occurs, and examined the executed scripts.
I found out that the problem lies with the execution of the /opt/sdm/util/cloud/ec2/ami_scripts/startup-vpn.sh script on the cloud node.
More specifically, the 'wait_for_ping $VPN_SERVER_VPN_IP "VPN server"' part fails.
I suspect this is caused by the './openvpn --config "$VPN_CONFIG_FILE" --daemon' that is executed before the wait_for_ping command.
I tried to run the 'openvpn' command manually on the cloud host and got the following output:
Wed Mar 10 10:55:16 2010 TCP: connect to 127.0.0.1:1194 failed, will try again in 5 seconds: Connection refused (errno=146)
This is probably the root of the problem.
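Since the client is trying to reach 127.0.0.1:1194, the endpoint named in the client config is worth a look. Below is a minimal sketch for printing it, assuming the config uses the standard OpenVPN 'remote', 'proto' and 'port' directives and that grep supports -E. The function name show_vpn_endpoint is my own and is not part of SDM.

```shell
#!/bin/sh
# Hypothetical helper (not part of SDM): print the remote/proto/port
# directives of an OpenVPN client config file.
show_vpn_endpoint() {
    config_file="$1"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file" >&2
        return 1
    fi
    # Match the directives at line start, allowing leading whitespace;
    # commented-out lines do not match.
    grep -E '^[[:space:]]*(remote|proto|port)[[:space:]]' "$config_file"
}

# Usage on the cloud host (the variable is the one used by startup-vpn.sh):
#   show_vpn_endpoint "$VPN_CONFIG_FILE"
```

If the output shows a 'remote' line pointing at 127.0.0.1, the client presumably expects something local to forward that port to the VPN server, but that is a guess on my part.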
Now, I'm not really sure where to go from here. Is this a problem on the cloud host, or is the VPN server on the SDM master host not set up correctly? (I'm sure it is running, though; I confirmed that with 'ps -e | grep openvpn'.)
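A running process does not prove that anything is accepting connections, so a direct connection test on the port might narrow this down. The sketch below assumes a bash that was built with /dev/tcp support; the function name check_tcp_port is hypothetical and not part of SDM.

```shell
#!/bin/bash
# Hypothetical helper (not part of SDM): returns 0 if a TCP connection
# to host:port can be opened, non-zero otherwise.
check_tcp_port() {
    local host="$1" port="$2"
    # bash-only /dev/tcp redirection; the subshell closes the
    # descriptor again as soon as it exits.
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

if check_tcp_port 127.0.0.1 1194; then
    echo "127.0.0.1:1194 is accepting connections"
else
    echo "nothing is accepting connections on 127.0.0.1:1194"
fi
```

Running this on both the cloud host and the SDM master host should show on which side the connection is being refused.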
Can anyone give me some help?