Opened 51 years ago
Last modified 10 years ago
#906 new defect
IZ603: Corrupted user mode install because of path expansion for automounted directories
Reported by: | afisch | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | hedeby | Version: | 1.0u1 |
Severity: | | Keywords: | Sun cli |
Cc: | | | |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=603]
Issue #: 603
Platform: Sun
Reporter: afisch (afisch)
Component: hedeby
OS: All
Subcomponent: cli
Version: 1.0u1
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: 1.0u5next
Assigned to: adoerr (adoerr)
QA Contact: adoerr
URL:
Summary: Corrupted user mode install because of path expansion for automounted directories
Status whiteboard:
Attachments:
Issue 603 blocks:
Opened: Tue Nov 11 05:53:00 -0700 2008
------------------------

Description:
The problem can only occur if the file system of the machine in use expands the path of automounted directories. The following can be observed on such machines: if the home directory of user foo is automounted as /home/foo, the expanded path may look like /private/var/automount/home/foo. Although the home directory is accessible as /home/foo, a pwd command will report /private/var/automount/home/foo.

If this behavior is not consistently present on all machines used, it can lead to the following error scenario.

The SDM master host is installed in user mode on a machine with the expanded path problem (hostA):

    hostA% cd /home/foo/sdm_root/bin
    hostA% pwd
    /private/var/automount/home/foo/sdm_root/bin
    hostA% sdmadm -suserModeSdm -p user install_master_host -ca_admin_mail "a" -ca_state "a" -ca_country "aa" -ca_location "a" -ca_org_unit "a" -ca_org "a" -au some_user -cs_port 31226 -l /tmp/sdmaf2 -sge_root /sge_dir

Install and startup should work without problems.

Now a managed host is installed on a different machine (hostB) that does not suffer from the problem:

    hostB% cd /home/foo/sdm_root/bin
    hostB% pwd
    /home/foo/sdm_root/bin
    hostB% /sdmadm -suserModeSdm -p user -keystore keystore.file -cacert cacert.pem install_managed_host -au some_user -l /tmp/sdmaf2 -cs_url hostA:31226
    class com.sun.grid.grm.cli.SdmAdm not found in classpath
    Using file:/home/foo/sdm_root/lib/sdm-common.jar
    A configuration for system "sdmaf2" has been added.
    Error: error setting up ssl: No security module found in URLClassLoader{ file:/home/foo/sdm_root/lib/sdm-common.jar } AppClassLoader{ file:/home/foo/sdm_root/lib/sdm-starter.jar } ExtClassLoader{ file:/ ... /jre/lib/ext/dnsns.jar file:/ ... /jre/lib/ext/sunpkcs11.jar file:/ ... /jre/lib/ext/sunjce_provider.jar file:/ ... /jre/lib/ext/localedata.jar } ]]]

This path expansion problem was observed on a Mac OS 10.4 machine. Interestingly, the expansion could be observed on the command line when a tc shell was used, but not with a bash shell.

Evaluation:
The issue is rated as a P3 defect, as it is a rare case and a work around exists. However, the user can not conclude from the error message where the problem is rooted.

Suggested Fix / Work Around:
Suggested Fix: The SDM system should not have problems with the path expansion. If this is not solvable, the error message should at least clearly state what the problem is.

Work Around: If the user is aware of the problem at the time the master host is installed, the -dist option can be used to provide the dist dir for the install_master_host command explicitly. If the master host has already been installed without the -dist option, the bootstrap information can be fixed manually: the file /home/[user]/.sdm/bootstrap/[sdm_system]/prefs.properties has to be edited, and the path for the property dist has to be replaced with the unexpanded variant: dist=[dist_path]
The current value of the dist property can be checked with the -all switch of the sbc command.

Analysis:
The problem does not happen with a system install, as a system install usually has local bootstrap directories. Thus an expanded dist path is only visible on the machine where the expanded dist path is valid.
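For reference, the manual work around (replacing the value of dist in prefs.properties) could also be scripted. The following is only a minimal sketch: the class PrefsDistFix and its methods are hypothetical and not part of SDM, and it assumes that prefs.properties is a plain Java properties file and that the dist dir in the example is /home/foo/sdm_root. It works on the file content as a string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper for illustration only; not part of SDM. */
class PrefsDistFix {

    /** Parses the text of a properties file (assumed format of prefs.properties). */
    public static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the properties text with the "dist" entry set to the unexpanded path. */
    public static String replaceDist(String text, String unexpandedDist) {
        Properties props = parse(text);
        props.setProperty("dist", unexpandedDist);
        StringWriter out = new StringWriter();
        try {
            // store() writes a timestamp comment and does not keep the
            // original comments or key order of the input.
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Example value only; the real dist path depends on the installation.
        String before = "dist=/private/var/automount/home/foo/sdm_root\n";
        String after = replaceDist(before, "/home/foo/sdm_root");
        System.out.println(parse(after).getProperty("dist"));
    }
}
```

Since Properties.store() drops comments and reorders keys, editing the single dist line by hand as described in the work around remains the less invasive option.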
The problem would disappear after a managed host is installed, because a host-specific bootstrap dir is then created (/home/[user]/.sdm/bootstrap/[sdm_system]/[host]/prefs.properties) that overrides the corrupted one in the bootstrap root (/home/[user]/.sdm/bootstrap/[sdm_system]/prefs.properties). However, as the install can not be performed, the host-specific bootstrap can not be created.

Location where SDM automatically determines the dist dir that is saved in the bootstrap config: If a new SDM instance is added, the auto discovery of the dist path is requested by AddSystemCommand objects. The auto discovery of the dist dir happens in PathUtil.getDistLibURL(). The path is extracted from a URL object for a ResourceBundle file. After the dir is determined, it is saved with: PerferencesUtil.setDistDir(env.getDistDir()).

The problem can be handled in three ways: we find a Java functionality that allows us to get the unexpanded path variant (no idea so far), we make the -dist option mandatory (API change), or we leave it unfixed and just report a reasonable error. The error message should clearly state where the problem is rooted and ask the user to manually adjust the dist dir in the prefs.properties file. As the problem is not apparent during the master host install (the expanded dist path is valid there), this has to happen later, on any managed host machine where the expanded path can not be resolved.

The error message: It comes from Modules.getSecurityModule()#98. It states that it can not find the security module. This is correct. However, no module is loaded at all, so the command should not get this far. The problem is rooted in a "pre-command stage" in MainWrapper. Here the classloader switch takes place. This trick allows any SDM instance to have its own binaries (located in the dist dir).
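To illustrate the kind of classloader switch meant here: a new thread can be given its own classloader, independent of the one used by the main thread. The sketch below is not the MainWrapper code (which is not quoted in this issue); the class ClassLoaderSwitchSketch is hypothetical, and using Thread.setContextClassLoader() is an assumption about how such a switch can be done.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical illustration; the real MainWrapper may work differently. */
class ClassLoaderSwitchSketch {

    /**
     * Runs the task in a new thread whose context classloader only sees
     * the given classpath (plus the JDK classes of the parent loader).
     * Returns the classloader that was handed to the thread.
     */
    public static ClassLoader runWithClassLoader(URL[] classpath, Runnable task) {
        // Parent is the loader above the application classloader, so classes
        // from the starter's own classpath are not visible to the new loader.
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
        URLClassLoader loader = new URLClassLoader(classpath, parent);

        Thread worker = new Thread(task, "SystemRunThread");
        worker.setContextClassLoader(loader);
        worker.start();
        try {
            worker.join(); // the main thread waits for the worker to end
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return loader;
    }

    public static void main(String[] args) {
        // An empty classpath keeps the example independent of any jar files.
        runWithClassLoader(new URL[0], () ->
            System.out.println("worker loader: "
                + Thread.currentThread().getContextClassLoader()));
    }
}
```

In a real setup the URL array would point to the jars of the dist dir. If that dir is wrong, the loader is still created but simply finds no classes, which matches the late failure seen in this issue.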
After the switch the command is executed with a modified classloader that uses the classes from the dist dir. The MainWrapper exploits the fact that new threads can have their own classloaders and can then load classes from locations other than the ones defined for the classloader of the main thread. The only condition is that the main thread has not loaded these classes so far, as they are cached.

The MainWrapper works as follows:
1) The MainWrapper is started with only sdm-starter.jar to avoid class loading conflicts. The jar mainly consists of the MainWrapper code.
2) MainWrapper.run() is executed by the main() thread, which uses the default initial classloader. In run() the Java version is checked, then a separate thread (SystemFinderThread) is started with a second, independent classloader that only sees the sdm-common.jar of the local dist dir (where the command was started from the command line). The reason for the single jar is that the thread simply does not need anything else. This thread determines the system that is addressed with the command. If no explicit system was addressed, it determines the local dist dir that was used to start the command (see SystemFinder.initFromPrefs()). The main thread waits for this SystemFinderThread to end.
3) Then another thread (SystemRunThread) is started with a third classloader that uses the classpath determined by the SystemFinder. This thread executes the command that has been provided by the command line arguments. If the third classloader fails, the second classloader (the one that only sees sdm-common.jar) is used instead.

The case that the third classloader could not be initialized (point 3) is the reason for the error message in the description section. The problem then is that the second classloader only knows the common jar. Thus the system will fail to execute the command correctly if it needs additional jars (eg.
sdm-common.jar)

To fix the error message, a set of changes has to be made:

The problem that the dist dir is invalid should be addressed. This can already be detected in the SystemFinder by checking if the directory exists. If not, it should print/log a warning: "The dist dir [corrupted dir] is invalid, the local dist dir [local dist dir] is used instead." It should then switch to the local one, as it does if no system name was provided (SystemFinder.initFromPrefs()).

If the third classloader can not be initialized correctly (for example if the dist dir is empty), the system should *NOT* switch to the second classloader. It should exit instead and print an error message that clearly states that the system failed to use the classpath provided by the SystemFinder, and it should print the system name used and the location where it found this invalid path (dist dir). Currently it only prints "class com.sun.grid.grm.cli.SdmAdm not found in classpath" and continues.

This fix alone leads to a new problem: the command show_bootstrap_config (sbc) will then also fail if it is executed for a corrupted install on the managed host. This is unwanted behavior, as the command is helpful for diagnosing the problem correctly. To prevent this situation, the SystemFinder class has to be changed. SystemFinder.initFromPrefs() should check which command will be executed and use the local classpath if it is the sbc command. The method already parses the command line arguments for the global options to determine the system name. Similarly, it should check whether the sbc command is called, by using the default routine to determine the command.

The classes MainWrapper and SystemFinder should be commented in a way that makes the bootstrap process easier to understand. Parts of the explanations for this issue could be reused.
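The two checks proposed above could look roughly like the following. This is a sketch only: the class DistDirCheck and its methods are hypothetical and do not exist in SystemFinder or MainWrapper, and the assumption that the jars live in a lib subdirectory of the dist dir is derived from the paths in the description.

```java
import java.io.File;

/** Hypothetical illustration of the proposed checks; not actual SDM code. */
class DistDirCheck {

    /**
     * Returns the configured dist dir if it exists, otherwise prints the
     * warning proposed in this issue and falls back to the local dist dir.
     */
    public static File resolveDistDir(File configured, File local) {
        if (configured != null && configured.isDirectory()) {
            return configured;
        }
        System.err.println("The dist dir " + configured
            + " is invalid, the local dist dir " + local + " is used instead.");
        return local;
    }

    /**
     * Fails with a descriptive error instead of silently falling back when
     * the dist dir exists but does not contain the expected jar.
     * Assumes the layout [dist dir]/lib/[jar name].
     */
    public static void requireJar(File distDir, String systemName, String jarName) {
        File jar = new File(new File(distDir, "lib"), jarName);
        if (!jar.isFile()) {
            throw new IllegalStateException("System " + systemName
                + ": the classpath from dist dir " + distDir
                + " can not be used, " + jar + " was not found."
                + " Please check the dist property in prefs.properties.");
        }
    }

    public static void main(String[] args) {
        // Example paths taken from the description of this issue.
        File expanded = new File("/private/var/automount/home/foo/sdm_root");
        File local = new File("/home/foo/sdm_root");
        System.out.println("Using dist dir: " + resolveDistDir(expanded, local));
    }
}
```

The first method covers the non-existent dist dir case (warn and fall back), the second the existent but invalid dir case (exit with a clear error), matching the two corrupted-path test modes.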
Additionally, we should add a hint to the hedeby installation manual to use the -dist option if there are problems with the path expansion.

How to test:
There should be three TS tests:

1) Normal mode:
Install a master host in user mode on one host and start it. ==> Should work without fix.
Install managed host for the same system on a different host and start it. ==> Should work without fix.
Execute sbc on managed host with the installed system as system name. ==> Should work without fix.

2) Non existent dist path mode:
Install a master host in user mode on one host and start it. ==> Should work without fix.
Change the dist dir in the file /home/[user]/.sdm/bootstrap/[sdm_system]/prefs.properties to an invalid dir.
Install managed host for the same system on a different host and start it. ==> without fix: the error of the issue ==> with fix: "Warning that default dist is used instead of invalid one"
Execute sbc on managed host with the installed system as system name. ==> Should work in the same way with and without fix.

3) Corrupted dist path mode:
Install a master host in user mode on one host and start it. ==> Should work even without fix.
Change the dist dir in the file /home/[user]/.sdm/bootstrap/[sdm_system]/prefs.properties to an existent but invalid dir (eg. /tmp/).
Install managed host for the same system on a different host and start it. ==> without fix: similar error as the one of the issue ==> with fix: error that states that the dir is corrupted.
Execute sbc on managed host with the installed system as system name. ==> Should work in the same way with and without fix.

ETC 6 PD {
3PD to fix, comment the code and update the installation guide.
3PD to write the TS test (it's complicated as the install command for an additional system has to be integrated.)
}

------- Additional comments from rhierlmeier Wed Nov 25 07:21:09 -0700 2009 -------
Milestone changed