Opened 6 years ago

Closed 6 years ago

#1467 closed defect (fixed)

[SoGE 8.1.3] Bug: builtin method qlogin/qrsh failing

Reported by: t.mainka@…
Owned by: Dave Love <d.love@…>
Priority: normal
Milestone:
Component: sge
Version: 8.1.3
Severity: minor
Keywords:
Cc:

Description

Hello,

We hit a problem on RHEL/CentOS 6 machines with qlogin/qrsh via the
builtin starter. The job is scheduled and started without complaint, but the
shell itself never starts and the job ends with a commlib error:

$ qlogin -verbose -q queue@host
Your job 998 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 998 has been successfully scheduled.
Establishing builtin session to host exechost.f.q.d.n ...
error: commlib error: got read error (closing "exechost.f.q.d.n/shepherd_ijs/2")

Tracing through the execd on the destination machine showed that the execle() call
for the shell failed with EFAULT:

write(4, "07/09/2013 08:30:44 [50449:30912]: execle(/bin/bash, -bash, NULL, env)\n", 71) = 71

execve("/bin/bash", ["-bash"], ["SHELL=/bin/bash", "HOME=/home/username",
        "TERM=xterm", "LOGNAME=username", "PATH=/bin:/usr/bin",
        0x7fffffffffff]) = -1 EFAULT

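After some digging, it looks like the environment array that the function
start_qlogin_job() generates is no longer terminated with a NULL pointer
(as it was in the SGE 6.2u5 source). That would explain the stray
0x7fffffffffff entry after PATH in the trace above: execve() keeps reading
past the last real variable until it hits an invalid address.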

The attached trivial patch fixed our problems.
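
For readers without the patch at hand, here is a minimal sketch in C of the
invariant involved. It is not the attached patch and not SGE code: the helper
names build_env(), free_env() and exec_login_shell() are hypothetical and only
illustrate that the vector handed to execle() needs one extra slot holding
NULL after the last "NAME=value" entry.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Free a NULL-terminated environment vector (hypothetical helper). */
void free_env(char **env)
{
    if (env == NULL)
        return;
    for (char **p = env; *p != NULL; p++)
        free(*p);
    free(env);
}

/*
 * Copy `count` "NAME=value" strings into a freshly allocated,
 * NULL-terminated vector. Hypothetical helper, not taken from SGE.
 * Returns NULL on allocation failure.
 */
char **build_env(const char *const *vars, size_t count)
{
    assert(vars != NULL || count == 0);

    /* count + 1 slots: the extra one holds the terminating NULL.
     * calloc() zeroes the vector, so it is terminated at every step. */
    char **env = calloc(count + 1, sizeof *env);
    if (env == NULL)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(vars[i]) + 1;
        env[i] = malloc(len);
        if (env[i] == NULL) {
            free_env(env);
            return NULL;
        }
        memcpy(env[i], vars[i], len);
    }

    env[count] = NULL; /* explicit terminator: execve() stops scanning here */
    return env;
}

/*
 * Replace the current process with a login shell using `env`.
 * Returns -1 only if execle() fails (errno is set by the call).
 */
int exec_login_shell(const char *shell, const char *argv0, char **env)
{
    execle(shell, argv0, (char *)NULL, env);
    return -1;
}
```

With a vector built this way, a call such as
exec_login_shell("/bin/bash", "-bash", env) passes execve() a list that ends
in NULL. Without the terminator the kernel walks off the end of the array,
which is consistent with the EFAULT seen in the trace.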

Regards,
Thomas Mainka

--
Thomas Mainka science+computing ag
System Administration Hagellocher Weg 73
mail: t.mainka@… 72070 Tuebingen, Germany
tel.: +49 7071 9457 472 www.science-computing.de
--
Vorstandsvorsitzender/Chairman of the board of management:
Gerd-Lothar Leonhart
Vorstand/Board of Management:
Dr. Bernd Finkbeiner, Michael Heinrichs,
Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/
Chairman of the Supervisory Board:
Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196


Attachments (1)

sge-builtin_starter.patch (471 bytes) - added by t.mainka@… 6 years ago.
Added by email2trac


Change History (2)

Changed 6 years ago by t.mainka@…

Added by email2trac

comment:1 Changed 6 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4547/sge:

Fix #1467: avoid builtin_starter crashes (from Thomas Mainka)
Null termination of my_env had got lost.
