Opened 14 years ago

Closed 7 years ago

#276 closed defect (fixed)

IZ1803: Binary jobs are problematic for starter and epilog scripts

Reported by: roland Owned by:
Priority: normal Milestone:
Component: sge Version: 6.0u3
Severity: minor Keywords: execution


[Imported from gridengine issuezilla]

        Issue #:      1803             Platform:     All      Reporter: roland (roland)
       Component:     gridengine          OS:        All
     Subcomponent:    execution        Version:      6.0u3       CC:    None defined
        Status:       NEW              Priority:     P3
      Resolution:                     Issue type:    DEFECT
                                   Target milestone: ---
      Assigned to:    pollinger (pollinger)
      QA Contact:     pollinger
       * Summary:     Binary jobs are problematic for starter and epilog scripts
   Status whiteboard:

     Issue 1803 blocks:
   Votes for issue 1803:

   Opened: Mon Sep 19 02:14:00 -0700 2005 

I have a customer with a Sun Grid Engine 6.x installation to whom
we provide a special starter method script to select some resources,
set environment variables, and start the job.

In general, this ksh starter method starts the job using
This normally works, but will NOT WORK in general when a
"binary" job was submitted using qsub -b y.  In this binary
case, $SGE_STARTER_SHELL_PATH="/bin/csh" and $1=="-c" and
$2 is all user arguments in one string.

Problem #1 is that these are not all the arguments the script gets.
If the user typed
        qsub -b y /my/path/to/myprogram arg1 arg2 arg3
$2 will be "/my/path/to/myprogram arg1 arg2 arg3" but
$3 will be "arg1" and $4=arg2 and $5=arg3.  THIS IS A BUG!

In fact, if arg1 were "-none" then /bin/csh parses the -c and
the $2, but then *also* parses the -none in $3 and will NOT
EXECUTE (because csh's -n option means do not execute) the
user's program in $2!  For example, try
      qsub -b y /bin/echo -n " arg2 " " arg3" "arg4 "
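
As a possible workaround inside a starter method, the script can detect the
binary-job calling convention described above and use only $2, discarding
the repeated arguments in $3 onward.  The following is a minimal sketch in
POSIX sh, not the fix that went into Grid Engine; the function name
binary_job_command is hypothetical, and the argument layout is assumed to be
exactly the one reported here ($1 is "-c", $2 is the joined command line).

```shell
#!/bin/sh
# Hypothetical helper for a starter method (name is an assumption).
# Under "qsub -b y", as described in this report, the starter method sees:
#   $1 = "-c"
#   $2 = the whole command line joined into one string
#   $3... = the individual arguments repeated
# Only $2 is wanted; passing $3... on to csh lets it misread them as options.
binary_job_command() {
    if [ "$#" -ge 2 ] && [ "$1" = "-c" ]; then
        printf '%s\n' "$2"
        return 0
    fi
    # Not the binary-job layout; the caller should handle it as a script job.
    return 1
}

# Simulated argument vector for: qsub -b y /bin/echo -n " arg2 " " arg3" "arg4 "
binary_job_command -c "/bin/echo -n  arg2   arg3 arg4 " -n " arg2 " " arg3" "arg4 "
```

A starter method could then run the returned string with a single
"-c" argument and nothing after it, so that a job argument beginning with
-n never reaches the shell's own option parser.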

Lesser problem #2 is only evident if there are spaces (or
shell metacharacters) in the arguments.  If the user typed
        qsub -b y /my/path/to/myprogram " arg1 " " arg2" "arg3 "
then $2 is "/my/path/to/myprogram  arg1   arg2 arg3 " but
when /bin/csh reinterprets this string, the effect of the user
quotation marks (the spaces that should be with the args) is
lost, and the actual program will see arguments "arg1" "arg2" "arg3"
(assuming that problem #1 is solved). 2005-04-14 03:14:18 GMT
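
The loss of quoting in problem #2 can be reproduced without Grid Engine.
The sketch below uses plain sh word splitting as a stand-in for the reparse
that csh -c performs on the joined string; the function names show_args and
flatten_and_resplit are made up for this illustration.

```shell
#!/bin/sh
# Print each argument in brackets so leading and trailing spaces are visible.
show_args() {
    for a in "$@"; do
        printf '[%s]' "$a"
    done
    printf '\n'
}

# Join all arguments into one string, then let the shell split it again.
# This mimics what happens when the job's arguments are flattened into $2
# and reinterpreted by a shell.
flatten_and_resplit() {
    flat="$*"
    # Intentionally unquoted: word splitting is the effect being shown.
    show_args $flat
}

show_args " arg1 " " arg2" "arg3 "            # prints [ arg1 ][ arg2][arg3 ]
flatten_and_resplit " arg1 " " arg2" "arg3 "  # prints [arg1][arg2][arg3]
```

The second line of output shows the arguments the target program would
actually receive: the spaces the user quoted are gone.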

Problem #3 is that the epilog appears to be invoked using the user's
$SHELL with arguments -c "{path to epilog} {job's program-args}"
and then an additional copy of the job's program-args.  I issued
        qsub -b y /home/stanton/scripts/args " arg2 " " arg3" "arg4 "
and my epilog was invoked with
        $0 is '/path/to/my/debug_epilog'
        $* is (arg2 arg3 arg4)
There is no reason that the epilog should need the target program's
arguments.  If the epilog wants those arguments, it surely wants the
name of the target program, as well; $* is a poor way to provide that
optional info to the epilog.  And these arguments have been reparsed,
so the spacing has been lost.

What's worse, the user's shell is invoked (as in #1) like
    /bin/csh -c "/path/to/my/debug_epilog  arg2   arg3 arg4 "
followed by a repeat of the individual arguments
        " arg2 " " arg3" "arg4 "
These are not intended as arguments to /bin/csh nor to epilog but
to the job's target program.

In particular, if the first argument starts with -n, then as described
in #1 above, the epilog is NOT ACTUALLY INVOKED by csh!  Of course,
the intention is that the epilog run for every job.  This is a more
serious bug, as users should not be able to keep the epilog from running.

And when the user's shell is tcsh, it is even more fussy about its arguments.
When csh sees an unknown argument, such as -w, it seems to ignore it:
        /bin/csh -c "echo -w arg2" -w arg2
        -w arg2
But when tcsh sees an unknown argument, it complains:
        /bin/tcsh -c "echo -w arg2" -w arg2
        Unknown option: `-w'
        Usage: tcsh [ -bcdefilmnqstvVxX ] [ argument ... ].
and exits with an error status (1).  These notes were in the E-mail sent to
the SGE administrative E-mail address:
[26889:1783]: execvp(/bin/tcsh, "-tcsh" "-c" "/gridware/sge/debug_epilog -none
-time " "-none" "-time")
[53:1829]: wait3 returned 1873 (status: 256; WIFSIGNALED: 0,  WIFEXITED: 1,
[53:1829]: epilog exited with exit status 1
[53:1829]: reaped "epilog" with pid 1873
[53:1829]: epilog exited not due to signal
[53:1829]: epilog exited with status 1

Issue #4 is that if $SHELL is /bin/csh (and perhaps for tcsh,
as well), it should arguably be invoked with the -f (fast) flag
as well, which skips sourcing of the user's ~/.cshrc file.
The reason this can matter is that if the user's ~/.cshrc file
prints anything, that output is included in the job's output
(after the target program is invoked, when the epilog invocation
is attempted).  tty and stty commands will often fail with a
message like "not a tty".  Again, using /usr/bin/env might be a
better way to invoke the epilog than /bin/csh (or the user's $SHELL).
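
To illustrate the /usr/bin/env suggestion: env executes the named program
directly, so no shell reparses the argument list, no ~/.cshrc is sourced,
and an argument such as -none is passed through as data.  This is only a
sketch of the idea; run_direct is a hypothetical wrapper, and printf stands
in for the epilog so the example can be run anywhere.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command via env instead of "$SHELL -c".
# env stops option parsing at the command name, so everything after it is
# handed to the command unchanged.
run_direct() {
    /usr/bin/env "$@"
}

# printf stands in for the epilog here; each argument is echoed in brackets.
run_direct printf '[%s]' " arg2 " "-none"   # prints [ arg2 ][-none]
echo
```

With this style of invocation, the embedded spaces survive and the leading
-n is never seen by an option parser of csh or tcsh.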

   ------- Additional comments from roland Tue Dec 6 08:34:01 -0700 2005 -------
*** Issue 1337 has been marked as a duplicate of this issue. ***

Change History (1)

comment:1 Changed 7 years ago by dlove

  • Resolution set to fixed
  • Severity set to minor
  • Status changed from new to closed

fixed by RD-2005-10-25-0
