Custom Query (431 matches)

Results (61 - 63 of 431)

Ticket Resolution Summary Owner Reporter
#276 fixed IZ1803: Binary jobs are problematic for starter and epilog scripts roland
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1803]

        Issue #:           1803
        Component:         gridengine
        Subcomponent:      execution
        Version:           6.0u3
        Platform:          All
        OS:                All
        Reporter:          roland (roland)
        CC:                None defined
        Status:            NEW
        Priority:          P3
        Resolution:
        Issue type:        DEFECT
        Target milestone:  ---
        Assigned to:       pollinger (pollinger)
        QA Contact:        pollinger
        URL:
      * Summary:           Binary jobs are problematic for starter and epilog scripts
        Status whiteboard:
        Attachments:

        Issue 1803 blocks:
        Votes for issue 1803:


   Opened: Mon Sep 19 02:14:00 -0700 2005 
------------------------


I have a customer with a Sun Grid Engine 6.x installation to whom
we provide a special starter method script to select some resources,
set environment variables, and start the job.

In general, this ksh starter method starts the job using
  $SGE_STARTER_SHELL_PATH "$@"
This normally works, but does NOT work in general when a
"binary" job has been submitted using qsub -b y.  In this binary
case, $SGE_STARTER_SHELL_PATH is "/bin/csh", $1 is "-c", and
$2 is all of the user's arguments joined into one string.
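
For reference, a minimal sketch of what such a starter method looks like (an
illustration only, not the customer's actual script; the site-specific setup
line is hypothetical, while $SGE_STARTER_SHELL_PATH and "$@" are what SGE
hands to the starter):

    #!/bin/ksh
    # starter_method sketch: select resources / set environment here,
    # then hand the job off exactly as SGE passed it to this script.
    export MY_SITE_VAR=value              # illustrative site-specific setup
    exec $SGE_STARTER_SHELL_PATH "$@"     # start the job itself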

Problem #1 is that those are not all of the arguments the script gets.
If the user typed
        qsub -b y /my/path/to/myprogram arg1 arg2 arg3
then $2 will be "/my/path/to/myprogram arg1 arg2 arg3", but
$3 will be "arg1", $4 will be "arg2", and $5 will be "arg3".  THIS IS A BUG!

In fact, if arg1 were "-none", then /bin/csh parses the -c and
the $2, but then *also* parses the -none in $3 and will NOT
EXECUTE the user's program in $2 (because csh's -n option means
"do not execute")!  For example, try
      qsub -b y /bin/echo -n " arg2 " " arg3" "arg4 "
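
The duplication is easy to see with a small argument-dumping script used as
the submitted binary, similar in spirit to the /home/stanton/scripts/args
helper mentioned below (a sketch; the output format is illustrative):

    #!/bin/ksh
    # args: print each positional parameter on its own line, quoted, so
    # duplicated or re-split arguments stand out in the job output.
    print -r -- "\$0 is '$0'"
    i=0
    for a in "$@"; do
        i=$((i+1))
        print -r -- "\$$i is '$a'"
    done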

Lesser problem #2 is only evident if there are spaces (or
shell metacharacters) in the arguments.  If the user typed
        qsub -b y /my/path/to/myprogram " arg1 " " arg2" "arg3 "
then $2 is "/my/path/to/myprogram  arg1   arg2 arg3 ", but
when /bin/csh reinterprets this string, the effect of the user's
quotation marks (the spaces that belong with the arguments) is
lost, and the actual program will see the arguments "arg1" "arg2" "arg3"
(assuming that problem #1 is solved).
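
The quoting loss can be demonstrated outside of SGE with a few lines of plain
shell (a sketch, not the actual execd code path): flattening the argument list
into one string and letting csh re-parse it discards the word boundaries,
whereas passing "$@" preserves them.

    #!/bin/sh
    set -- /my/path/to/myprogram " arg1 " " arg2" "arg3 "
    # flattened into a single string: csh re-splits on whitespace,
    # so the program sees arg1, arg2 and arg3
    /bin/csh -c "$*"
    # word boundaries preserved: the program sees " arg1 ", " arg2"
    # and "arg3 " exactly as the user quoted them
    "$@"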
dean.stanton@sun.com 2005-04-14 03:14:18 GMT

Problem #3 is that the epilog appears to be invoked using the user's
$SHELL with arguments -c "{path to epilog} {job's program-args}"
and then an additional copy of the job's program-args.  I issued
        qsub -b y /home/stanton/scripts/args " arg2 " " arg3" "arg4 "
and my epilog was invoked with
        $0 is '/path/to/my/debug_epilog'
        $* is (arg2 arg3 arg4)
There is no reason that the epilog should need the target program's
arguments.  If the epilog wants those arguments, it surely wants the
name of the target program, as well; $* is a poor way to provide that
optional info to the epilog.  And these arguments have been reparsed,
so the spacing has been lost.

What's worse, the user's shell is invoked (as in #1) like
    /bin/csh -c "/path/to/my/debug_epilog  arg2   arg3 arg4 "
followed by a repeat of the individual arguments
        " arg2 " " arg3" "arg4 "
These are not intended as arguments to /bin/csh nor to epilog but
to the job's target program.

In particular, if the first argument starts with -n, then as described
in #1 above, the epilog is NOT ACTUALLY INVOKED by csh!  Of course,
the intention is that the epilog run for every job.  This is a more
serious bug, as users should not be able to keep the epilog from
running.

And when the user's shell is tcsh, it is even more fussy about its arguments.
When csh sees an unknown argument, such as -w, it seems to ignore it:
        /bin/csh -c "echo -w arg2" -w arg2
        -w arg2
But when tcsh sees an unknown argument, it complains:
        /bin/tcsh -c "echo -w arg2" -w arg2
        Unknown option: `-w'
        Usage: tcsh [ -bcdefilmnqstvVxX ] [ argument ... ].
and exits with an error status (1).  These notes were in the E-mail sent to
the SGE administrative E-mail address:
[26889:1783]: execvp(/bin/tcsh, "-tcsh" "-c" "/gridware/sge/debug_epilog -none -time " "-none" "-time")
[53:1829]: wait3 returned 1873 (status: 256; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 1)
[53:1829]: epilog exited with exit status 1
[53:1829]: reaped "epilog" with pid 1873
[53:1829]: epilog exited not due to signal
[53:1829]: epilog exited with status 1

Issue #4 is that if $SHELL is /bin/csh (and perhaps /bin/tcsh
as well), the shell should arguably be invoked with the -f (fast)
flag, which skips sourcing of the user's ~/.cshrc file.
The reason this can matter is that if the user's ~/.cshrc file
prints anything, that output is included in the job's output
(after the target program is invoked, when the epilog invocation
is attempted).  tty and stty commands will often fail with a
message like "not a tty".  Again, using /usr/bin/env might be a
better way to invoke the epilog than /bin/csh (or the user's $SHELL).
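
To illustrate the difference (a sketch; the epilog path is the debug one used
above, and the exact command line SGE builds may differ):

    /bin/csh    -c "/path/to/my/debug_epilog"   # sources ~/.cshrc first; anything
                                                # it prints lands in the job output
    /bin/csh -f -c "/path/to/my/debug_epilog"   # -f (fast) skips ~/.cshrc
    /usr/bin/env /path/to/my/debug_epilog       # no shell startup files at all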

   ------- Additional comments from roland Tue Dec 6 08:34:01 -0700 2005 -------
*** Issue 1337 has been marked as a duplicate of this issue. ***
#294 fixed IZ1882: mutually subordinating queues suspend each other simultaneously bjfcoomer
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1882]

        Issue #:           1882
        Component:         gridengine
        Subcomponent:      scheduling
        Version:           6.2u5
        Platform:          All
        OS:                All
        Reporter:          bjfcoomer (bjfcoomer)
        CC:                None defined
        Status:            REOPENED
        Priority:          P3
        Resolution:
        Issue type:        DEFECT
        Target milestone:  6.0u7
        Assigned to:       sgrell (sgrell)
        QA Contact:        andreas
        URL:
      * Summary:           mutually subordinating queues suspend each other simultaneously
        Status whiteboard:
        Attachments:

        Issue 1882 blocks:
        Votes for issue 1882:


   Opened: Fri Nov 11 04:04:00 -0700 2005 
------------------------


The full issue is reproduced in the output below. The basic problem is that jobs
get scheduled simultaneously to queues which are mutually subordinate to each
other, so they suspend each other.
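
For reference, this is the kind of configuration that triggers it (a sketch of
an assumed setup, since the report does not include the queue definitions):
each queue names the other in its subordinate_list, e.g.

    # qconf -sq parallel.q | grep subordinate_list
    subordinate_list      serial.q=1
    # qconf -sq serial.q | grep subordinate_list
    subordinate_list      parallel.q=1

so as soon as one slot is occupied in either queue on a host, the other queue
instance on that host is suspended.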


(1) A parallel job is running, and one is queued, and serial jobs are queued:

sccomp@test:~/EXAMPLE/serial> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
    525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
----------------------------------------------------------------------------
parallel.q@comp00.grid.cluster P     1/1       0.03     lx24-amd64
    525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
----------------------------------------------------------------------------
parallel.q@comp01.grid.cluster P     1/1       0.03     lx24-amd64
    525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
----------------------------------------------------------------------------
parallel.q@comp02.grid.cluster P     1/1       0.07     lx24-amd64
    525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
----------------------------------------------------------------------------
parallel.q@comp03.grid.cluster P     1/1       0.03     lx24-amd64
    525 500.51000 PMB-MPI1.s sccomp       r     11/03/2005 18:44:27     1
----------------------------------------------------------------------------
serial.q@comp00.grid.cluster   BI    0/2       0.03     lx24-amd64    S
----------------------------------------------------------------------------
serial.q@comp01.grid.cluster   BI    0/2       0.03     lx24-amd64    S
----------------------------------------------------------------------------
serial.q@comp02.grid.cluster   BI    0/2       0.07     lx24-amd64    S
----------------------------------------------------------------------------
serial.q@comp03.grid.cluster   BI    0/2       0.03     lx24-amd64    S

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS -
############################################################################
    526 1000.51000 PMB-MPI1.s sccomp      qw    11/03/2005 18:44:28     5
    527 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:45     1
    528 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:45     1
    529 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:46     1
    530 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:46     1
    531 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:47     1
    532 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:47     1
    533 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:48     1
    534 0.51000 hello.sh   sccomp         qw    11/03/2005 18:44:48     1

(2) I qdel the running parallel job and then do qstat -f:

sccomp@test:~/EXAMPLE/serial> qstat -f
queuename                      qtype used/tot. load_avg arch          states
----------------------------------------------------------------------------
master.q@test.grid.cluster     P     1/8       0.00     lx24-amd64
    526 1000.51000 PMB-MPI1.s sccomp      t     11/03/2005 18:45:27     1
----------------------------------------------------------------------------
parallel.q@comp00.grid.cluster P     1/1       0.28     lx24-amd64    S
    526 1000.51000 PMB-MPI1.s sccomp      St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
parallel.q@comp01.grid.cluster P     1/1       0.28     lx24-amd64    S
    526 1000.51000 PMB-MPI1.s sccomp      St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
parallel.q@comp02.grid.cluster P     1/1       0.31     lx24-amd64    S
    526 1000.51000 PMB-MPI1.s sccomp      St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
parallel.q@comp03.grid.cluster P     1/1       0.28     lx24-amd64    S
    526 1000.51000 PMB-MPI1.s sccomp      St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
serial.q@comp00.grid.cluster   BI    2/2       0.28     lx24-amd64    S
    527 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
    533 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
serial.q@comp01.grid.cluster   BI    2/2       0.28     lx24-amd64    S
    529 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
    531 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
serial.q@comp02.grid.cluster   BI    2/2       0.31     lx24-amd64    S
    530 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
    534 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
----------------------------------------------------------------------------
serial.q@comp03.grid.cluster   BI    2/2       0.28     lx24-amd64    S
    528 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1
    532 0.51000 hello.sh   sccomp         St    11/03/2005 18:45:27     1

And here is the log from the scheduler monitor:
::::::::
525:1:RUNNING:1131043467:600:P:score:slots:5.000000
525:1:RUNNING:1131043467:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
525:1:RUNNING:1131043467:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
525:1:RUNNING:1131043467:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
525:1:RUNNING:1131043467:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
525:1:RUNNING:1131043467:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
::::::::
526:1:STARTING:1131043527:600:P:score:slots:5.000000
526:1:STARTING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
526:1:STARTING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
526:1:STARTING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
526:1:STARTING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
526:1:STARTING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
527:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
528:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
529:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
530:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
531:1:STARTING:1131043527:600:Q:serial.q@comp01.grid.cluster.ac.uk:slots:1.000000
532:1:STARTING:1131043527:600:Q:serial.q@comp03.grid.cluster.ac.uk:slots:1.000000
533:1:STARTING:1131043527:600:Q:serial.q@comp00.grid.cluster.ac.uk:slots:1.000000
534:1:STARTING:1131043527:600:Q:serial.q@comp02.grid.cluster.ac.uk:slots:1.000000
::::::::
526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
::::::::
526:1:SUSPENDED:1131043527:600:P:score:slots:5.000000
526:1:SUSPENDED:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
::::::::
526:1:RUNNING:1131043527:600:P:score:slots:5.000000
526:1:RUNNING:1131043527:600:Q:master.q@test.grid.cluster.ac.uk:slots:1.000000
526:1:RUNNING:1131043527:600:Q:parallel.q@comp00.grid.cluster.ac.uk:slots:1.000000
526:1:RUNNING:1131043527:600:Q:parallel.q@comp02.grid.cluster.ac.uk:slots:1.000000
526:1:RUNNING:1131043527:600:Q:parallel.q@comp03.grid.cluster.ac.uk:slots:1.000000
526:1:RUNNING:1131043527:600:Q:parallel.q@comp01.grid.cluster.ac.uk:slots:1.000000
::::::::

   ------- Additional comments from sgrell Wed Nov 23 01:33:18 -0700 2005 -------
Started working on this issue.

Stephan

   ------- Additional comments from sgrell Wed Nov 23 09:03:03 -0700 2005 -------
Fixed in maintrunk and for u7.

Stephan

   ------- Additional comments from reuti Mon Aug 9 16:41:41 -0700 2010 -------
A parallel job can suspend itself when it gets slots in the subordinate and the superordinate queue at the same time:

reuti@pc15370:~> qsub -pe openmpi 8 -l h=pc15370 test_mpich.sh
Your job 1868 ("test_mpich.sh") has been submitted
reuti@pc15370:~> qstat -g t
job-ID  prior   name       user         state submit/start at     queue                          master ja-task-ID
------------------------------------------------------------------------------------------------------------------
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
                                                                  all.q@pc15370 SLAVE
   1868 0.75500 test_mpich reuti        S     08/10/2010 01:31:11 extra.q@pc15370 MASTER
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE
                                                                  extra.q@pc15370 SLAVE

extra.q is entered as a subordinated queue in all.q (classic subordination). There are some other issues which are similar, so I'm not sure
whether this is the most appropriate one, or 437 / 2397.
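
For reference, "classic subordination" here means a queue configuration along
these lines (a sketch of an assumed setup, not taken from the report):

    # qconf -sq all.q | grep subordinate_list
    subordinate_list      extra.q

With no threshold given, extra.q on a host is suspended as soon as all slots of
all.q on that host are occupied, which is how the job above ends up suspending
its own MASTER task in extra.q.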
#295 fixed IZ1887: complex man page describes regex incorrectly ovid
Description

[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1887]

        Issue #:           1887
        Component:         gridengine
        Subcomponent:      man
        Version:           6.0u6
        Platform:          PC
        OS:                All
        Reporter:          ovid (ovid)
        CC:                None defined
        Status:            NEW
        Priority:          P4
        Resolution:
        Issue type:        DEFECT
        Target milestone:  ---
        Assigned to:       andreas (andreas)
        QA Contact:        andreas
        URL:
      * Summary:           complex man page describes regex incorrectly
        Status whiteboard:
        Attachments:

        Issue 1887 blocks:
        Votes for issue 1887:


   Opened: Fri Nov 11 16:09:00 -0700 2005 
------------------------


The man page for complex(5) says under RESTRING:

           - "[xx]": specifies an array or a range of allowed
                     characters for one character at a specific
                     position

That is incorrect. It should say "[x-y]". The behaviour specified
in the man page is not supported.
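
For illustration, the range form the reporter describes would appear in a
request such as (a hypothetical example, not taken from the man page):

    qsub -l arch="lx2[4-6]-amd64" job.sh

i.e. a single character position restricted to the range 4-6, whereas the
"[xx]" form documented in the man page is, according to this report, not
supported.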

   ------- Additional comments from pollinger Mon Mar 23 04:02:04 -0700 2009 -------
Changed Subcomponent to man

   ------- Additional comments from pollinger Mon Mar 23 04:02:27 -0700 2009 -------
Changed Subcomponent to man

   ------- Additional comments from pollinger Mon Mar 23 04:03:10 -0700 2009 -------
Changing Subcomponent didn't work, 3rd try...