Custom Query (431 matches)

Results (25 - 27 of 431)

Ticket Resolution Summary Owner Reporter
#1475 fixed Client JSV slow down submit too much Dave Love <d.love@…> wangvisual
Description

Without client JSVs, qsub finishes within 0.03 seconds; with one JSV it needs 1.1 seconds, and with two JSVs it needs 2.2 seconds. The JSV script itself takes only 0.1 second to run, so the issue is in GRD.

Actually we're using Univa Grid Engine 8.1.4, but as the code comes from the same base, SoGE might have the same issue. You can verify it by comparing the turnaround time of:

      date +%X.%N ; echo ls | qsub -P bnormal -clear; date +%X.%N

and

      date +%X.%N ; echo ls | qsub -P bnormal -clear -jsv jsv_script ; date +%X.%N

We've reported this to UGE and they will provide a fixed version, but I just noticed that they have not open-sourced their core since 8.0.

The related code is jsv_stop() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/sgeobj/sge_jsv.c and sge_peclose() in https://arc.liv.ac.uk/trac/SGE/browser/sge/source/libs/uti/sge_stdio.c

jsv_stop() first sends 'QUIT' to the JSV process and then calls sge_peclose(). At this point the JSV process is about to exit, but most of the time it has not become a zombie yet, so the first waitpid() call with WNOHANG does not reap it, and sge_peclose() sleeps for 1 second and retries.

The 'sleep 1' is the root cause of the slowness.

BTW, there's one workaround for this issue: if the JSV script kills itself after sending the ACCEPT or REJECT command, the TAT is very short, e.g.:

      jsv_accept('Job is now accepted'); kill "INT", $$;
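
The polling problem described above can be illustrated with a minimal C sketch that retries waitpid() with a short, growing delay instead of a fixed one-second sleep. This is not the fix that was applied to sge_peclose(); `reap_with_backoff`, its timeout parameter, and the 1 ms to 64 ms delay range are assumptions made for illustration only.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not part of Grid Engine: poll for the exit of a
 * child process with a short, growing delay rather than sleep(1).
 * Returns 0 and fills *status once the child has been reaped,
 * -1 on error or when timeout_ms has elapsed. */
static int reap_with_backoff(pid_t pid, int *status, long timeout_ms)
{
    long waited_ms = 0;
    long delay_ms = 1;              /* start at 1 ms, double up to 64 ms */

    assert(pid > 0);
    for (;;) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid)
            return 0;               /* child exited and was reaped */
        if (r == -1 && errno != EINTR)
            return -1;              /* e.g. ECHILD: not a child of ours */
        if (waited_ms >= timeout_ms)
            return -1;              /* gave up; caller decides what to do */

        /* r == 0: child is still running, wait a little and retry */
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
        waited_ms += delay_ms;
        if (delay_ms < 64)
            delay_ms *= 2;
    }
}
```

With a loop of this shape, a JSV process that exits a few milliseconds after receiving 'QUIT' would be reaped after a few milliseconds rather than after a full second, while a process that never exits is still bounded by the timeout.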

#158 fixed IZ949: When qconf fails during installation, the diagnostic is incorrect uddeborg
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=949]

        Issue #:      949              Platform:     All       Reporter: uddeborg (uddeborg)
       Component:     gridengine          OS:        All
     Subcomponent:    install          Version:      6.0beta      CC:    None defined
        Status:       VERIFIED         Priority:     P3
      Resolution:     FIXED           Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    andy (andy)
      QA Contact:     dom
          URL:
       * Summary:     When qconf fails during installation, the diagnostic is incorrect
   Status whiteboard:
      Attachments:

     Issue 949 blocks:
   Votes for issue 949:


   Opened: Fri Apr 2 07:27:00 -0700 2004 
------------------------


When I tried to install the beta on a host, the
procedure failed during the "Checking hostname
resolving" phase.  After failing, it prints "The
error message was:" and then the usage message
from qconf.

Trying to trace this a little, I came across this
code in CheckHostNameResolving() in inst_execd.sh

      $SGE_BIN/qconf -sh > /dev/null 2>&1
      if [ $? = 1 ]; then
         errmsg=`$SGE_BIN/qconf 2>&1`
      else
         errmsg=`$SGE_BIN/qconf -sh 2>&1 |  grep denied:`
      fi

Here qconf is first run with the "-sh" flag.  Then,
when it fails, it is run again in order to
capture the error message.  But if the exit code
was 1, it is run without the -sh flag, which seems
to be the bug.  Run this way it does give the
usage message, but that is not the error message it
got (and discarded) in the first attempt.

   ------- Additional comments from andy Tue Apr 6 00:58:46 -0700 2004 -------
Fixed.

Call "qconf -sh" in case of error.

   ------- Additional comments from uddeborg Wed Apr 28 04:27:21 -0700 2004 -------
I've now verified the fix in beta 2.  (Am I, as a reporter, supposed
to do this step?  Should I be the one closing too/instead?)

#160 fixed IZ960: Buffer sent to getgrgid_r is too small uddeborg
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=960]

        Issue #:      960              Platform:     Sun       Reporter: uddeborg (uddeborg)
       Component:     gridengine          OS:        Solaris
     Subcomponent:    kernel           Version:      6.0beta      CC:    None defined
        Status:       VERIFIED         Priority:     P3
      Resolution:     FIXED           Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    adoerr (adoerr)
      QA Contact:     andreas
          URL:
       * Summary:     Buffer sent to getgrgid_r is too small
   Status whiteboard:
      Attachments:

     Issue 960 blocks:
   Votes for issue 960:


   Opened: Wed Apr 7 05:18:00 -0700 2004 
------------------------


My attempt to install on Solaris failed.  "qconf
-sh" returned the error message:

error: getgrgid(13) failed: No such file or directory

I tried to track this down.  It appears to be
because the buffer sent to getgrgid_r is too
small.  In the function sge_gid2group() in
source/libs/uti/sge_uidgid.c there is a call of
getgrgid_r with a buffer with a size of 2048.
This call fails when I run it on our 64 bit
Solaris machines.

According to the Solaris manual for getgrgid_r,
the maximum size which could be needed can be
found with the call sysconf(_SC_GETGR_R_SIZE_MAX).
I tried this on a couple of platforms I have
available here, and got these figures:

Sparc, Solaris 8, 32 bit app: 7296
Sparc, Solaris 8, 64 bit app: 10496
PowerPC, AIX 5.2, 32 and 64 bit app: 20023
PARisc, HP-UX 11, 32 and 64 bit app: 2048
AMD64, Red Hat EL 3, 32 and 64 bit app: 1024
IA32, Red Hat EL 3, 32 bit app: 1024

It varies quite a lot, and 2048 obviously is too
small in several cases.  We have some groups with
rather many members, a bit over 100, which
probably affects this.  But not so many members
that an application should break.

Preferably, I'd suggest allocating a buffer with a
size taken from the return value of sysconf().
Otherwise, I would suggest to at least increase
the static size by an order of magnitude.
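
The suggestion above can be sketched in C as follows. This is only an illustration under stated assumptions, not the change that went into sge_uidgid.c: `gid_to_name` is a hypothetical helper, the 1024-byte fallback and the 1 MiB cap are arbitrary, and the retry on ERANGE is added because sysconf() may return -1 or a value that is only a hint.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the Grid Engine code: copy the name of group
 * `gid` into dst. Returns 0 on success, ENOENT if there is no such group,
 * ERANGE if dst is too small, or another errno value on failure. */
static int gid_to_name(gid_t gid, char *dst, size_t dst_len)
{
    assert(dst != NULL && dst_len > 0);

    /* Ask the system for a suggested buffer size; it may return -1. */
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
    size_t len = hint > 0 ? (size_t)hint : 1024;
    char *buf = NULL;
    struct group grp, *res = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;
        rc = getgrgid_r(gid, &grp, buf, len, &res);
        if (rc != ERANGE || len >= ((size_t)1 << 20))
            break;                  /* done, or buffer cap reached */
        len *= 2;                   /* buffer too small: grow and retry */
    }

    if (rc == 0 && res == NULL)
        rc = ENOENT;                /* lookup worked, but no such group */
    if (rc == 0) {
        if (strlen(grp.gr_name) >= dst_len)
            rc = ERANGE;
        else
            strcpy(dst, grp.gr_name);
    }
    free(buf);
    return rc;
}
```

A caller would pass a destination buffer of its own, for example `char name[256]; gid_to_name(13, name, sizeof name);`, and check the return value instead of assuming that a fixed 2048-byte buffer is always large enough.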

   ------- Additional comments from andreas Tue May 4 03:17:28 -0700 2004 -------
There is more than one function where this needs to be changed.

   ------- Additional comments from adoerr Tue May 11 05:57:48 -0700 2004 -------
Reassign

   ------- Additional comments from adoerr Sat May 22 07:33:41 -0700 2004 -------
Fixed.

   ------- Additional comments from uddeborg Thu May 27 02:38:42 -0700 2004 -------
I've rebuilt locally with source/libs/uti/sge_uidgid.c taken from
HEAD, and it seems to work fine now.