Opened 2 months ago

#1648 new defect

SGE on Ubuntu 18

Reported by: ddeidda@…
Owned by:
Priority: normal
Milestone:
Component: sge
Version: 8.1.9
Severity: minor
Keywords:
Cc:

Description

Hi,
I am Davide Deidda from the AWS ParallelCluster team.
https://github.com/aws/aws-parallelcluster

We are integrating support for SGE 8.1.9 on Ubuntu 18.04 and ran into the following errors at compile time:

../libs/comm/cl_ssl_framework.c:487:19: error: storage size of ‘verify_ctx’ isn’t known

X509_STORE_CTX verify_ctx;

../sh.proc.c:153:16: error: storage size of ‘w’ isn’t known

union wait w;
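
For reference, the second error is likely the usual symptom of building against a newer glibc that no longer provides the obsolete BSD `union wait` type. Below is a minimal sketch of the portable replacement pattern: a plain `int` status with the POSIX `W*` macros. It is not the actual change to sh.proc.c, and `child_exit_code` is a hypothetical helper used only for illustration.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the SGE sources): fork a child that exits
 * with `code`, then collect its exit status the POSIX way.
 * Returns the child's exit code, or -1 on error or abnormal termination. */
int child_exit_code(int code)
{
    int status;             /* plain int instead of `union wait w;` */
    pid_t pid = fork();

    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        _exit(code);        /* child: terminate immediately */
    }
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    /* Inspect the status with the W* macros rather than union members. */
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);
    }
    return -1;              /* child was killed by a signal */
}
```

Calling `child_exit_code(7)` from a test program should return 7. The same idea applies wherever the old code read fields of `union wait`.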

We also tried checking out the latest source from https://arc.liv.ac.uk/repos/darcs/sge. In that case the code built, but we got a crash (segfault) at runtime after submitting some jobs:

kernel: sge_qmaster[9838]: segfault at 4

Steps to reproduce the problem:

1) Submit a job:

echo "sleep 20 && echo hello"|qsub

2) Wait a while, then run:

qstat

This gives the error:
error: commlib error: got select error (Connection reset by peer)
error: unable to send message to qmaster using port 6444 on host "[...]": got send error

3) Looking at /var/log/syslog we see:

sge_qmaster[14218]: segfault at 4 ip 000055dee9265d1e sp 00007fa649ffa170 error 4 in sge_qmaster[55dee9168000+28d000]

4) GDB backtrace:

Thread 13 "sge_qmaster" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f17dcff9700 (LWP 16121)]
0x00005571bfc18d1e in sge_create_orders ()
(gdb) bt
#0  0x00005571bfc18d1e in sge_create_orders ()
#1  0x00005571bfc102e9 in sge_build_sgeee_orders ()
#2  0x00005571bfc11100 in sgeee_scheduler ()
#3  0x00005571bfb61c6a in scheduler_method ()
#4  0x00005571bfb5b798 in sge_scheduler_main ()
#5  0x00007f17e84676db in start_thread (arg=0x7f17dcff9700) at pthread_create.c:463
#6  0x00007f17e819088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

We did not investigate the latest build further because it is not an official release. However, we managed to get version 8.1.9 to build and run by applying a patch to the code.

Most of the patch comes from this commit to your repository:

https://gitlab.com/loveshack/sge/commit/0b6d6e0dc5f3bb3ad8176141938d6db0935de3b9

plus some minor fixes, which we are happy to share if needed.

Do you plan to address this issue in the near future? Is there an official new release on your roadmap that will support Ubuntu 18.04 out of the box?

Best regards,
Davide
