Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1475 closed defect (fixed)

Client JSV slows down submission too much

Reported by: wangvisual
Owned by: Dave Love <d.love@…>
Priority: normal
Milestone:
Component: sge
Version: 8.1.4
Severity: minor
Keywords: jsv qsub
Cc:

Description

Without client JSVs, qsub finishes within 0.03 seconds. With one JSV, qsub needs 1.1 seconds; with two JSVs, it needs 2.2 seconds. The JSV script itself takes only 0.1 second to run, so the issue is in GRD.

Actually we're using Univa Grid Engine 8.1.4, but since the code comes from the same base, SoGE might have the same issue. You can verify it by comparing the turnaround time for:
date +%X.%N ; echo ls | qsub -P bnormal -clear; date +%X.%N
and
date +%X.%N ; echo ls | qsub -P bnormal -clear -jsv jsv_script ; date +%X.%N

We've reported this to UGE and they will provide a fixed version, but I just noticed that they have not open-sourced their core since 8.0.

The related code is jsv_stop() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/sgeobj/sge_jsv.c and sge_peclose() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/uti/sge_stdio.c

jsv_stop first sends 'QUIT' to the JSV process and then calls sge_peclose. At that point the JSV process is about to exit, but most of the time it has not become a zombie yet, so the first waitpid() call with WNOHANG finds nothing to reap, and sge_peclose sleeps for 1 second before retrying.

The 'sleep 1' is the root cause of the slowness.
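
The effect described above can be reproduced outside Grid Engine. The following is a minimal Python sketch, not the sge_peclose code: spawn_child and reap_with_fixed_sleep are hypothetical names, and the child merely stands in for a JSV script that has just received QUIT. It assumes a POSIX system with fork().

```python
import os
import time


def spawn_child(exit_delay):
    """Fork a child that exits after exit_delay seconds.

    Hypothetical stand-in for a JSV script that is about to exit.
    """
    pid = os.fork()
    if pid == 0:
        time.sleep(exit_delay)
        os._exit(0)
    return pid


def reap_with_fixed_sleep(pid, retry_sleep=1.0, max_tries=10):
    """Poll with WNOHANG and sleep a fixed interval between polls.

    Returns (elapsed_seconds, number_of_waitpid_calls).
    """
    start = time.monotonic()
    for attempt in range(1, max_tries + 1):
        # With WNOHANG, waitpid returns (0, 0) if the child is still running.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return time.monotonic() - start, attempt
        time.sleep(retry_sleep)
    raise TimeoutError("child %d did not exit" % pid)


if __name__ == "__main__":
    child = spawn_child(0.05)  # child needs only 50 ms to exit
    elapsed, calls = reap_with_fixed_sleep(child)
    print("reaped after %.2f s and %d waitpid calls" % (elapsed, calls))
```

Although the child exits after about 50 ms, the parent only notices on the second poll, so the whole fixed sleep interval is added to the turnaround time.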

BTW, there is one workaround for this issue: if the JSV script kills itself after sending the ACCEPT or REJECT command, the turnaround time is very short, e.g.:
jsv_accept('Job is now accepted');
kill "INT", $$;

Change History (3)

comment:1 Changed 6 years ago by dlove

That's annoyed me, but I didn't get round to looking at it. Thanks for
being socially minded and contributing the analysis. I'll alter the
sleep pattern.
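
One possible way to alter such a sleep pattern is to start with a very short sleep and double it up to a cap. The Python sketch below only illustrates that idea; it is an assumption and not the change committed for this ticket. spawn_child and reap_with_backoff are hypothetical names, and the 1 ms starting interval, 1 s cap and 10 s timeout are arbitrary values chosen for the example.

```python
import os
import time


def spawn_child(exit_delay):
    """Fork a child that exits after exit_delay seconds (hypothetical JSV stand-in)."""
    pid = os.fork()
    if pid == 0:
        time.sleep(exit_delay)
        os._exit(0)
    return pid


def reap_with_backoff(pid, first_sleep=0.001, max_sleep=1.0, timeout=10.0):
    """Poll with WNOHANG, doubling the sleep between polls up to max_sleep.

    Returns the elapsed time in seconds once the child has been reaped.
    """
    start = time.monotonic()
    delay = first_sleep
    while True:
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        elapsed = time.monotonic() - start
        if reaped == pid:
            return elapsed
        if elapsed >= timeout:
            raise TimeoutError("child %d did not exit" % pid)
        time.sleep(delay)
        # Back off so a slow child does not cause busy polling.
        delay = min(delay * 2, max_sleep)


if __name__ == "__main__":
    child = spawn_child(0.05)
    print("reaped after %.3f s" % reap_with_backoff(child))
```

With these values a child that exits quickly is reaped within a fraction of a second, while a child that takes longer is still polled at most once per second.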

comment:2 Changed 6 years ago by Dave Love <d.love@…>

  • Owner set to Dave Love <d.love@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 4604/sge:

Fix #1475: Alter sleep pattern in sge_peclose to avoid JSV 1s pause
Thanks to Opera Wang.

comment:3 Changed 6 years ago by Dave Love <d.love@…>

In 4613/sge:

Fix change for #1475
Thanks to Nicolas Joly. Refs #1475
