Opened 7 years ago

Last modified 6 years ago

#1461 new defect

inst_sge -auto does not work on SUSE 12.2

Reported by: hrathh Owned by:
Priority: normal Milestone:
Component: sge Version: 8.1.3
Severity: critical Keywords:
Cc: hrathh@…

Description

I installed the 8.1.3 RPMs by extracting them and tried to use

./install_execd -auto MyPool.conf -noremote

on SUSE 12.2. It failed without showing any error.

1)
After some debugging I found out that "$ECHO"
is set to "/bin/echo -e" for some reason.

I fixed this by adding "eval" before the $ECHO calls in
./util/install_modules/inst_common.sh
eval $ECHO [...]

2)
Another thing I needed to fix was that "gethostname" returns spaces at the beginning. I added a "sed 's/ //g'" to CheckForLocalHostResolving.

CheckForLocalHostResolving()
{

output=`$SGE_UTILBIN/gethostname | grep "^Host" | cut -f2 -d: | sed 's/ //g'`

[...]

3)
Additionally "sgeexecd" causes a hang on shutdown for more than 60 seconds.
This is due to an error in the header.
There should be a "Required-Stop: $remote_fs" in the header (not only in Required-Start).

util/rctemplates/sgeexecd_template:# Required-Start: $network $remote_fs
util/rctemplates/sgeexecd_template:# Required-Stop: $remote_fs
util/rctemplates/sgemaster_template:# Required-Start: $network $remote_fs
util/rctemplates/sgemaster_template:# Required-Stop: $remote_fs

4)
On SUSE 12.2 "libcrypto.so.10" does not exist and is called "libcrypto.so.1.0.0"
Maybe this could be fixed, too? I worked around by doing:
ln -fs /lib/libcrypto.so.1.0.0 /lib/libcrypto.so.10
ln -fs /lib/libssl.so.1.0.0 /lib/libssl.so.10
ln -fs /lib64/libcrypto.so.1.0.0 /lib64/libcrypto.so.10
ln -fs /lib64/libssl.so.1.0.0 /lib64/libssl.so.10

It would be very nice if these could be fixed. Thanks a lot!

Change History (20)

comment:1 Changed 7 years ago by hrathh

  • Severity changed from major to critical
  • Type changed from enhancement to defect

comment:2 Changed 7 years ago by hrathh

  • Cc hrathh@… added

Another minor thing that could be added to
inst_sge, install_execd and install_qmaster:

cd "`dirname -- "$_"`"


comment:3 follow-up: Changed 7 years ago by dlove

Sorry for the late reply -- I wasn't getting bug notifications.

I installed the 8.1.3 RPMs by extracting them and tried to use

./install_execd -auto MyPool.conf -noremote

I wouldn't really expect that to work. I tried to port the spec file to
SuSE, but didn't understand how you're supposed to specify dependencies,
in particular, since they appear to need specific package version
numbers, so I don't see how to make it work across SuSE releases. I'd
be happy to get suitable changes to the spec file.

on SUSE 12.2. It failed without showing any error.

1)
After some debugging I found out that "$ECHO"
is set to "/bin/echo -e" for some reason.

Do you mean that the variable substitution comes out with the quotes
included? If so, could you tell me where it fails because of that? I
can't see how.

I fixed this by adding "eval" before the $ECHO calls in
./util/install_modules/inst_common.sh
eval $ECHO [...]

2)
Another thing I needed to fix was that "gethostname" returns spaces at the
beginning. I added a "sed 's/ //g'" to CheckForLocalHostResolving.

CheckForLocalHostResolving()
{

output=`$SGE_UTILBIN/gethostname | grep "^Host" | cut -f2 -d: | sed 's/ //g'`
[...]

I don't understand why that matters, as $output is expanded in the "for"
loop subsequently. What shell is /bin/sh, in fact? There seems to be
some strange expansion happening.

3)
Additionally "sgeexecd" causes a hang on shutdown for more than 60 seconds.
This is due to an error in the header.
There should be a "Required-Stop: $remote_fs" in the header (not only in
Required-Start).

Thanks. I wonder why that hasn't come up before...

4)
On SUSE 12.2 "libcrypto.so.10" does not exist and is called
"libcrypto.so.1.0.0"
Maybe this could be fixed, too?

I'm afraid it should be rebuilt to link against the right thing on SuSE.

comment:4 Changed 7 years ago by dlove

cd "`dirname -- "$_"`"

That's not portable, but why would it be useful?

comment:5 in reply to: ↑ 3 Changed 7 years ago by hrathh

Thank you very much for your reply.

Replying to dlove:

I wouldn't really expect that to work. I tried to port the spec file to
SuSE, but didn't understand how you're supposed to specify dependencies,
in particular, since they appear to need specific package version
numbers, so I don't see how to make it work across SuSE releases. I'd
be happy to get suitable changes to the spec file.

I am sorry, I have never built a spec file for SUSE before, so I would need to do some investigation on this, too.
But as far as I know, this should be very easy with the "openSUSE build service". It allows building RPMs for any Linux, not just SUSE.
The only linkage difference seems to be the one with libcrypto.so for SUSE. So the symlinks work fine for me currently...

Replying to dlove:

Do you mean that the variable substitution comes out with the quotes
included? If so, could you tell me where it fails because of that? I
can't see how.
I don't understand why that matters, as $output is expanded in the "for"
loop subsequently. What shell is /bin/sh, in fact? There seems to be
some strange expansion happening.

No, the quotes are not in the variable.
I will report details on the two shell issues when I find some time. I was very happy that I finally managed to get it running. I assumed this would be reproducible for you.
On SUSE, /bin/sh is usually a symlink to /bin/bash (but bash behaves slightly differently when called as sh).
Maybe that is the reason.

Replying to dlove:

cd "`dirname -- "$_"`"

That's not portable, but why would it be useful?

There are surely multiple or more portable ways to accomplish the cd to the containing directory. I just picked the easiest solution I found.
Currently you always need to cd to the gridengine directory before you call the binary.
From outside of /software/GridEngine, calling this does not work:

/software/GridEngine/inst_qexecd ...

Because inst_qexecd calls inst_sge by simply doing:

./inst_sge ...

Sure, it's not important because you can simply do a cd before, but it was a bit annoying as it can easily be fixed.
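
One commonly used approach is to derive the directory from $0 with dirname. The sketch below assumes the script is executed (not sourced) and that $0 holds the path it was invoked with. The helper name script_dir_of is made up for illustration, and whether this is portable enough for every platform SGE supports is not established in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: print the directory part of a script path.
script_dir_of() {
    dirname -- "$1"
}

# In an installer wrapper this could be used as:
#   cd "`script_dir_of "$0"`" || exit 1
#   ./inst_sge "$@"

# Demonstration with the path used in the comment above.
script_dir_of /software/GridEngine/install_execd
```

Note that this does not resolve symlinks, so it only helps when the wrapper is called through its real location.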

comment:6 Changed 6 years ago by hrathh

I did another check on the shell issue with the gethostname problem.

The issue here is that the script relies heavily on the shell to auto-trim spaces at the beginning
and end of an assignment.

The script runs "cut -f2 -d:", but after the ":" there is still a space left over from "Hostname: somename".

Normally /bin/sh handles this correctly. When I run a small test script with just the snippet, it works, even in /bin/sh.
But when I run the whole install_execd, it doesn't work. Maybe you set some special shell setting somewhere?

The snippet from ./util/install_modules/inst_common.sh:

   output=`./utilbin/lx-amd64/gethostname| grep "^Host" | cut -f2 -d:`
   for cmp in $output; do
      case "$cmp" in
      localhost*|127.0*)
         ;;
      *)
         echo "$cmp"
         isIp=`echo IsIpAddress $cmp `
         ;;
      esac
   done

Behaves as (note the additional " at beginning and end):

   output="`./utilbin/lx-amd64/gethostname|grep "^Host"|cut -f2 -d:`"
   for cmp in "$output"; do
      case "$cmp" in
      localhost*|127.0*)
         ;;
      *)
         echo "$cmp"
         isIp=`echo IsIpAddress "$cmp" `
         ;;
      esac
   done

"$cmp" comes out as "<space>$cmp" (" $cmp") and thus IsIpAddress fails, because it doesn't trim spaces.

There seem to be some other issues like this in the script, but this is the only one that showed up as a bug for me.

I have used v6.2u5 on an older SUSE before and SGE worked back then.

Maybe that helps you to fix it properly. I have no more clue on this.
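
The difference between the two snippets can be reproduced in isolation if IFS does not contain a space. The sketch below assumes a newline-only IFS, which is a guess at what the installer had in effect (changeset 4544 later in this ticket removes an IFS setting). The host name is a made-up stand-in for the real gethostname output.

```shell
#!/bin/sh
# Simulated result of: gethostname | grep "^Host" | cut -f2 -d:
output=" node01.example.org"

# Default IFS (space, tab, newline): word splitting drops the leading space.
for cmp in $output; do
    default_result="[$cmp]"
done

# Assumption: IFS limited to newline, so a space is no longer a separator.
saved_IFS=$IFS
IFS='
'
for cmp in $output; do
    newline_result="[$cmp]"
done
IFS=$saved_IFS

echo "default IFS: $default_result"
echo "newline IFS: $newline_result"
```

With the default IFS the loop variable is trimmed. With the newline-only IFS it keeps the leading space, which matches the behaviour described above.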

comment:7 Changed 6 years ago by hrathh

Now more details about the $ECHO problem.

$ECHO is set in ./util/arch_variables for SUSE to:

ECHO="/bin/echo -e"

So, that seems to be correct.

However, all $ECHO calls in ./util/install_modules/inst_common.sh silently fail.

$ECHO something

I had to change them all to:

eval $ECHO something 

Sadly, I wasn't able to reproduce it in a small testscript either. The shell only behaves like this in ./util/install_modules/inst_common.sh.

Hopefully, I could fully explain the problem for you. That's all I know.
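
One mechanism that would produce exactly this symptom is an IFS value without a space: the unquoted $ECHO then expands to a single word, "/bin/echo -e", which is not a command, while eval re-parses the text and splits it again. The sketch below assumes a newline-only IFS. The ticket does not show the installer's actual IFS at this point, so treat this as an illustration, not a diagnosis.

```shell
#!/bin/sh
ECHO="/bin/echo -e"   # value reported for SUSE in this ticket

saved_IFS=$IFS
IFS='
'                     # assumption: IFS without a space character

# Unquoted expansion yields the single word "/bin/echo -e": command not found.
plain=`$ECHO something 2>/dev/null` || plain="(command not found)"

# eval re-parses its arguments, so the shell grammar splits on the blank.
evaled=`eval $ECHO something`

IFS=$saved_IFS
echo "plain: $plain"
echo "eval:  $evaled"
```

Restoring the default IFS makes the plain call work again without eval.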

comment:8 Changed 6 years ago by Dave Love <d.love@…>

In 4544/sge:

Don't set IFS when reading install template; just assume sh syntax
Refs #1461. Thanks to Nicolas Joly.

comment:9 Changed 6 years ago by hrathh

Yes, I can confirm removing the IFS solves it all. Thank you very much! This can be closed.

It would be very nice if inst_sge supported installing sgeexecd as a "systemd" style service instead of "init.d" style at some point in the future. SUSE plans to transition to it.
This shouldn't be difficult, just a small text file (sgeexecd.service).

About that "cd "`dirname -- "$_"`"" thing, here are more portable ways to accomplish this. Maybe one suits.
http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in

comment:10 follow-up: Changed 6 years ago by dlove

SGE <sge-bugs@…> writes:

Yes, I can confirm removing the IFS solves it all. Thank you very much!
This can be closed.

I still wonder why it was a problem, as I checked portability of doing
that...

It would be very nice if inst_sge supported installing sgeexecd as a
"systemd" style service instead of "init.d" style at some point in the future.
SUSE plans to transition to it.
This shouldn't be difficult, just a small text file (sgeexecd.service).

Is that actually necessary? Can't systemd run normal init scripts?
Service files may be trivial, but the existing scripts do more. In
particular, the execd one distinguishes "stop" and "softstop", and
"restart" isn't just stop+start.

About that "cd "`dirname -- "$_"`"" thing, here are more portable ways to
accomplish this. Maybe one suits.
http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in

Is there a particular problem with cd'ing to the root directory? I'm
not convinced you can do the job portably (e.g. it needs to work on
SunOS, which is the most troublesome current system I know).

comment:11 in reply to: ↑ 10 Changed 6 years ago by hrathh

Replying to dlove:

I still wonder why it was a problem, as I checked portability of doing
that...

I am not sure what that IFS="\n" was actually supposed to do in that case. But I have also had trouble using IFS myself; it always had some really strange undocumented side effects for me.


I stumbled upon another minor thing that I just remembered:

./install_execd -auto Mittelerde.conf -noremote

This does not work intuitively, and fails silently without any error.
You have to call instead:

./install_execd -auto ./Mittelerde.conf -noremote

No real issue, but still.


Is that actually necessary? Can't systemd run normal init scripts?
Service files may be trivial, but the existing scripts do more. In
particular, the execd one distinguishes "stop" and "softstop", and
"restart" isn't just stop+start.

As far as I know, systemd runs legacy scripts only if SysV is also installed. But I might be wrong here.
SUSE currently still runs both (SysV+systemd) until the transition is complete.
Here is a bit of info on how this works (/etc/init.d/functions on SUSE is actually some other file, but still):
http://clalance.blogspot.se/2011/09/services-and-systemd.html
Everything still works fine the legacy way. So no real problem, yet.


Is there a particular problem with cd'ing to the root directory? I'm
not convinced you can do the job portably (e.g. it needs to work on
SunOS, which is the most troublesome current system I know).

If one doesn't "cd" to the SGEROOT directory before calling the binary you get strange errors. That's all.
Maybe one could just "cd" to SGEROOT if supported by the OS, and leave it to the user on SunOS or something. Or always use absolute paths using the environment variable SGE_ROOT. Just a suggestion.


Thanks for your effort.

comment:12 follow-up: Changed 6 years ago by dlove

SGE <sge-bugs@…> writes:

I stumbled upon another minor thing that I just remembered:

./install_execd -auto Mittelerde.conf -noremote

This does not work intuitively, and fails silently without any
error.
You have to call instead:

./install_execd -auto ./Mittelerde.conf -noremote

I can't reproduce that, at least with the development version. (I have
fixed -auto clearing the screen so that you didn't see error messages.)

Is there a particular problem with cd'ing to the root directory? I'm
not convinced you can do the job portably (e.g. it needs to work on
SunOS, which is the most troublesome current system I know).

If one doesn't "cd" to the SGEROOT directory before calling the binary you
get strange errors. That's all.

I've added a more useful message for now.
Thanks.

comment:13 Changed 6 years ago by Dave Love <d.love@…>

In 4563/sge:

inst_sge tweaks
Warn if not in distribution directory, send -help o/p to stdout, don't
clear screen with -auto.
Refs #1461

comment:14 in reply to: ↑ 12 Changed 6 years ago by hrathh

Replying to dlove:

SGE <sge-bugs@…> writes:
I can't reproduce that, at least with the development version. (I have
fixed -auto clearing the screen so that you didn't see error messages.)

It still persists even after applying your changes here in this thread.
I tracked it down to the following:

in inst_sge:

     if [ ! -f "$2" ]; then
        AUTO="false"
        $INFOTEXT "Error: File %s does not exist!" "$FILE"
        ErrUsage
     fi

This succeeds, because the file "Mittelerde.conf" exists. So no error is shown.

in common_sge.sh

. $FILE

Fails with error:
Reading configuration from file Mittelerde.conf
./util/install_modules/inst_common.sh: line 573: .: Mittelerde.conf: file not found

. Mittelerde.conf

fails in this case. This seems to be another side effect of some shell setting. In a normal login shell it works.
But, I had to comment out "Stdout2Log" to see the error.

. ./Mittelerde.conf

works on that line.


BTW, in your patch, shouldn't

clear=:

actually be

CLEAR=:

? (if that's correct just ignore that...)

comment:15 Changed 6 years ago by hrathh

BTW, it would be more convenient if the following could just succeed if the RC script did not change at all (maybe "cmp")...

Specified cluster name >$SGE_CLUSTER_NAME=Mittelerde< resulted in the following conflict!
Detected old RC scripts.
/etc/init.d/sgeexecd.Mittelerde

Remove existing component(s) of cluster > Mittelerde < first!
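
The "cmp" idea could look roughly like the sketch below. The helper name rc_script_unchanged and the file names are hypothetical; how inst_sge actually detects the conflict is not shown in this ticket.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the installed RC script is byte-identical
# to the freshly generated one, so the conflict could be treated as harmless.
rc_script_unchanged() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Demonstration with throwaway files.
tmpdir=`mktemp -d`
printf '#!/bin/sh\n# demo rc script\n' > "$tmpdir/installed"
cp "$tmpdir/installed" "$tmpdir/generated"

if rc_script_unchanged "$tmpdir/installed" "$tmpdir/generated"; then
    same_result=unchanged
else
    same_result=changed
fi

echo "# local edit" >> "$tmpdir/generated"
if rc_script_unchanged "$tmpdir/installed" "$tmpdir/generated"; then
    diff_result=unchanged
else
    diff_result=changed
fi

echo "identical copy: $same_result"
echo "after edit:     $diff_result"
rm -rf "$tmpdir"
```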

comment:16 Changed 6 years ago by dlove

./util/install_modules/inst_common.sh: line 573: .: Mittelerde.conf: file
not found

. Mittelerde.conf

fails in this case. This seems to be another side effect of some shell
setting. In normal login shell it works.
But, I had to comment out "Stdout2Log" to see the error.

. ./Mittelerde.conf

works on that line.

Ah. It looks like a bash bug or undocumented feature that it worked
when I tried. It never worked generally as "." is documented to use
PATH. I'll fix it, thanks.

BTW, in your patch, shouldn't

clear=:

actually be

CLEAR=:

? (if that's correct just ignore that...)

Yes, thanks! I must have edited that separately from the working copy
where I really fixed it some time ago.

comment:17 follow-up: Changed 6 years ago by Dave Love <d.love@…>

In 4570/sge:

Fix inst_sge -auto: set CLEAR correctly; don't lose with relative filename
Thanks to hrathh at uni-konstanz physik.
Refs #1461

comment:18 in reply to: ↑ 17 Changed 6 years ago by hrathh

Replying to Dave Love <d.love@…>:

In 4570/sge:

Fix inst_sge -auto: set CLEAR correctly; don't lose with relative filename
Thanks to hrathh at uni-konstanz physik.
Refs #1461

Sorry, but that fix needs a fix again (at least for SUSE)...

     if [ "`dirname "$FILE"`" = . ]; then  # dirname is documented to output '.' in that case
        # else sourcing fails without . in PATH
        FILE="./$FILE"  # be careful with escaped spaces (if any...)
     fi

Then it works. Thank you.

But then again, it displays ././filename.conf for files with ./ prepended, so it's probably better to do that check just around the failing source statement...
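
Doing the check only around the source statement could be sketched as follows. The helper name source_config and the demo file are made up for illustration; the point is that a name containing a slash is never looked up via PATH by ".", so only bare names need the "./" prefix, and $FILE itself stays unchanged for messages.

```shell
#!/bin/sh
# Hypothetical helper: source a config file given by a bare name, a relative
# path, or an absolute path, without modifying the caller's variable.
source_config() {
    case "$1" in
        */*) . "$1" ;;      # contains a slash: "." does not search PATH
        *)   . "./$1" ;;    # bare name: force lookup in the current directory
    esac
}

# Demonstration with a throwaway file.
tmpdir=`mktemp -d`
echo 'CELL_NAME=demo' > "$tmpdir/demo.conf"
cd "$tmpdir" || exit 1
source_config demo.conf
echo "CELL_NAME=$CELL_NAME"
cd / && rm -rf "$tmpdir"
```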


comment:19 Changed 6 years ago by hrathh

There's another problem I just experienced with -auto, if the file
SGE_ROOT/SGE_CELL/common/settings.sh
does not exist (anymore). It seems to have been deleted by

./inst_sge -ux -auto file.conf -noremote

for some reason...
After that

./inst_sge -x -auto file.conf -noremote

fails again without displaying any error at all (more Stdout2Log calls later in the script...)
after some digging:
line 931: /software/GridEngine/Mittelerde/common/settings.sh: No such file or directory
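
A guard around that source statement would at least make the failure visible. This is a minimal sketch: the helper name source_settings is hypothetical, and the path layout SGE_ROOT/SGE_CELL/common/settings.sh is taken from the description above.

```shell
#!/bin/sh
# Hypothetical helper: source settings.sh only if it exists, otherwise report
# the missing file on stderr instead of failing silently.
source_settings() {
    settings="$1/$2/common/settings.sh"
    if [ ! -f "$settings" ]; then
        echo "Error: $settings does not exist" >&2
        return 1
    fi
    . "$settings"
}

# Demonstration with a path that is assumed not to exist.
if source_settings /nonexistent/sge-root demo-cell 2>/dev/null; then
    status=ok
else
    status=missing
fi
echo "settings: $status"
```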


Another suggestion:
./inst_sge -help
...is very unspecific about which arguments can be combined with each other. It is documented as if almost all of them can be combined with one another, but that is actually true for only a very few.
And some are really broken when combined. I always needed to study the source to find that out...


Thank you very much!

comment:20 Changed 6 years ago by Dave Love <d.love@…>

In 4572/sge:

Handle relative filename correctly in last change
Refs #1461
