Custom Query (431 matches)

Results (85 - 87 of 431)

Ticket Resolution Summary Owner Reporter
#1456 fixed min_uid and min_gid apply to prefix users of prolog Dave Love <d.love@…> wish
Description

Setting min_uid and min_gid prevents sge_execd from running a prolog as a user with a uid or gid below this value using the user@ prefix. Although this can be worked around (e.g. by running the prolog as a special user with uid/gid above the threshold and using sudo to switch to the desired user), it is unexpected and adds difficulty for no real security gain. The effect of this failure to run the prolog seems to be to push the job back on to the queue.

The same problem probably applies to the epilog.

#1467 fixed [SoGE 8.1.3] Bug: builtin method qlogin/qrsh failing Dave Love <d.love@…> t.mainka@…
Description

Hello,

we experienced a problem on RHEL/CentOS 6 machines with qlogin/qrsh via the builtin starter. The job seems to be scheduled and started fine, but for some reason the shell at the end won't start and the job ends with a commlib error:

$ qlogin -verbose -q queue@host
Your job 998 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 998 has been successfully scheduled.
Establishing builtin session to host exechost.f.q.d.n ...
error: commlib error: got read error (closing "exechost.f.q.d.n/shepherd_ijs/2")

Tracing through the execd on the destination machine showed that the execle() call for the shell failed with EFAULT:

write(4, "07/09/2013 08:30:44 [50449:30912]: execle(/bin/bash, -bash, NULL, env)\n", 71) = 71
execve("/bin/bash", ["-bash"], ["SHELL=/bin/bash", "HOME=/home/username", "TERM=xterm", "LOGNAME=username", "PATH=/bin:/usr/bin", 0x7fffffffffff]) = -1 EFAULT

After some digging, it looks like the environment array that the function start_qlogin_job() generates is no longer terminated with a NULL pointer (as it was in the SGE 6.2u5 source).

The attached trivial patch fixed our problems.
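The termination rule at issue can be illustrated with a small Python sketch using ctypes. This is not the attached patch and not the start_qlogin_job() code (which is C); build_envp is a hypothetical helper, shown only to make the point that a C-style environment array needs one extra slot holding NULL so that the exec family knows where the array ends.

```python
import ctypes


def build_envp(env):
    """Return a C-style char*[] for env with a trailing NULL sentinel.

    Hypothetical helper for illustration only; not part of Grid Engine.
    """
    entries = [f"{key}={value}".encode() for key, value in env.items()]
    # Allocate one slot more than there are entries. ctypes zero-initialises
    # the array, so the final slot stays NULL and terminates the list.
    array_type = ctypes.c_char_p * (len(entries) + 1)
    return array_type(*entries)


if __name__ == "__main__":
    envp = build_envp({"SHELL": "/bin/bash", "TERM": "xterm"})
    for index in range(len(envp)):
        print(index, envp[index])
```

Without the extra slot, a consumer walking the array would read past the last entry, which is consistent with the stray pointer visible at the end of the execve() trace.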

Regards, Thomas Mainka

--
Thomas Mainka
science+computing ag
System Administration
Hagellocher Weg 73
72070 Tuebingen, Germany
mail: t.mainka@…
tel.: +49 7071 9457 472
www.science-computing.de
--
Vorstandsvorsitzender/Chairman of the board of management: Gerd-Lothar Leonhart
Vorstand/Board of Management: Dr. Bernd Finkbeiner, Michael Heinrichs, Dr. Arno Steitz, Dr. Ingrid Zech
Vorsitzender des Aufsichtsrats/Chairman of the Supervisory Board: Philippe Miltin
Sitz/Registered Office: Tuebingen
Registergericht/Registration Court: Stuttgart
Registernummer/Commercial Register No.: HRB 382196

sge-builtin_starter.patch

#1475 fixed Client JSV slow down submit too much Dave Love <d.love@…> wangvisual
Description

Without client JSVs, qsub finishes within 0.03 seconds; with one JSV it needs 1.1 seconds, and with two JSVs it needs 2.2 seconds. The JSV script itself takes only 0.1 seconds to run, so the issue lies in GRD.

Actually we're using Univa Grid Engine 8.1.4, but as the code comes from the same base, SoGE might have the same issue. You can verify it by comparing the turnaround time of:
date +%X.%N ; echo ls | qsub -P bnormal -clear; date +%X.%N
and
date +%X.%N ; echo ls | qsub -P bnormal -clear -jsv jsv_script ; date +%X.%N

We've reported this to UGE and they will provide a fixed version, but I just noticed that they have not open-sourced their core since 8.0.

The relevant code is jsv_stop() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/sgeobj/sge_jsv.c and sge_peclose() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/uti/sge_stdio.c

jsv_stop first sends 'QUIT' to the JSV process and then calls sge_peclose. At this point the JSV process is about to exit, but most of the time it has not yet become a zombie, so the first waitpid() call with WNOHANG fails to reap it, and sge_peclose sleeps for 1 second and retries.

The 'sleep 1' is the root cause of the slowness.
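The effect of the retry interval can be reproduced with a small Python sketch on a Unix system. This is only an illustration of polling waitpid() with WNOHANG, not the actual sge_peclose() code (which is C); reap_with_polling, spawn_child and timed_reap are hypothetical names, and the 50 ms child lifetime and 10 ms interval are arbitrary assumptions.

```python
import os
import time


def reap_with_polling(pid, interval, timeout=5.0):
    """Poll waitpid(WNOHANG) until the child is reaped, sleeping between tries."""
    deadline = time.monotonic() + timeout
    while True:
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"child {pid} did not exit in time")
        # The child has not exited yet; this sleep is what adds the latency.
        time.sleep(interval)


def spawn_child(lifetime):
    """Fork a child that exits after `lifetime` seconds (stand-in for a JSV)."""
    pid = os.fork()
    if pid == 0:
        time.sleep(lifetime)
        os._exit(0)
    return pid


def timed_reap(interval):
    """Return how long it takes to reap a 50 ms child at the given interval."""
    pid = spawn_child(0.05)
    start = time.monotonic()
    reap_with_polling(pid, interval)
    return time.monotonic() - start


if __name__ == "__main__":
    print("1 s retry interval:   %.2f s" % timed_reap(1.0))
    print("10 ms retry interval: %.2f s" % timed_reap(0.01))
```

With a 1 second interval the caller always pays at least one full second once the first poll misses, however quickly the child exits afterwards; a shorter interval, or a blocking wait, avoids most of that delay.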

BTW, there is one workaround for this issue: if the JSV script kills itself after sending the ACCEPT or REJECT command, the turnaround time (TAT) is very short, e.g.:
jsv_accept('Job is now accepted'); kill "INT", $$;
