Opened 15 years ago
Closed 9 years ago
#276 closed defect (fixed)
IZ1803: Binary jobs are problematic for starter and epilog scripts
Reported by: | roland | Owned by: | |
---|---|---|---|
Priority: | normal | Milestone: | |
Component: | sge | Version: | 6.0u3 |
Severity: | minor | Keywords: | execution |
Cc: |
Description
[Imported from gridengine issuezilla http://gridengine.sunsource.net/issues/show_bug.cgi?id=1803]
Issue #: 1803
Platform: All
Reporter: roland (roland)
Component: gridengine
OS: All
Subcomponent: execution
Version: 6.0u3
CC: None defined
Status: NEW
Priority: P3
Resolution:
Issue type: DEFECT
Target milestone: ---
Assigned to: pollinger (pollinger)
QA Contact: pollinger
URL:
* Summary: Binary jobs are problematic for starter and epilog scripts
Status whiteboard:
Attachments:
Issue 1803 blocks:
Votes for issue 1803:
Opened: Mon Sep 19 02:14:00 -0700 2005
------------------------

I have a customer with a Sun Grid Engine 6.x installation to whom we provide a special starter method script to select some resources, set environment variables, and start the job. In general, this ksh starter method starts the job using

    $SGE_STARTER_SHELL_PATH "$@"

This normally works, but it does NOT WORK in general when a "binary" job was submitted using qsub -b y. In this binary case, $SGE_STARTER_SHELL_PATH is "/bin/csh", $1 is "-c", and $2 is all user arguments in one string.

Problem #1 is that these are not all the arguments the script gets. If the user typed

    qsub -b y /my/path/to/myprogram arg1 arg2 arg3

then $2 will be "/my/path/to/myprogram arg1 arg2 arg3", but $3 will be "arg1", $4 will be "arg2", and $5 will be "arg3". THIS IS A BUG! In fact, if arg1 were "-none", then /bin/csh parses the -c and the $2, but then *also* parses the -none in $3 and will NOT EXECUTE the user's program in $2 (because csh's -n option means do not execute). For example, try

    qsub -b y /bin/echo -n " arg2 " " arg3" "arg4 "

Lesser problem #2 is only evident if there are spaces (or shell metacharacters) in the arguments. If the user typed

    qsub -b y /my/path/to/myprogram " arg1 " " arg2" "arg3 "

then $2 is "/my/path/to/myprogram arg1 arg2 arg3 ", but when /bin/csh reinterprets this string, the effect of the user's quotation marks (the spaces that should stay with the args) is lost, and the actual program will see the arguments "arg1" "arg2" "arg3" (assuming that problem #1 is solved).
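Problems #1 and #2 can be illustrated with a small model. This is only a sketch based on the behaviour described in this report, not code from Grid Engine: the function names are hypothetical, `shlex` implements POSIX sh quoting rather than csh quoting, and the argument layout is assumed to be exactly "-c", the joined command string, then the user's arguments again.

```python
import shlex


def starter_args_for_binary_job(user_argv):
    """Model of "$@" as the report describes it for `qsub -b y` (assumption)."""
    # "-c", all user words joined by spaces, then the user's arguments repeated.
    return ["-c", " ".join(user_argv)] + list(user_argv[1:])


def drop_duplicated_args(starter_args):
    """Workaround sketch for problem #1: keep only "-c" and the command string."""
    if len(starter_args) >= 2 and starter_args[0] == "-c":
        return starter_args[:2]
    return list(starter_args)


def quoted_command_string(user_argv):
    """Sketch for problem #2: quote each word so re-parsing keeps its spaces."""
    return " ".join(shlex.quote(word) for word in user_argv)


user_argv = ["/my/path/to/myprogram", " arg1 ", " arg2", "arg3 "]
starter_args = starter_args_for_binary_job(user_argv)

# Problem #1: the user's arguments appear again after the command string.
duplicated = starter_args[2:]

# Problem #2: re-splitting the plain joined string loses the embedded spaces,
# while the quoted form survives a round trip through sh-style parsing.
naive_words = shlex.split(starter_args[1])
quoted_words = shlex.split(quoted_command_string(user_argv))
```

A starter method could apply the same two ideas in ksh: pass only the first two arguments on to the shell, and have the command string built with per-word quoting. Whether the second part is possible depends on where the string is assembled, which this report does not say.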
dean.stanton@sun.com 2005-04-14 03:14:18 GMT

Problem #3 is that the epilog appears to be invoked using the user's $SHELL with the arguments -c "{path to epilog} {job's program-args}" and then an additional copy of the job's program-args. I issued

    qsub -b y /home/stanton/scripts/args " arg2 " " arg3" "arg4 "

and my epilog was invoked with

    $0 is '/path/to/my/debug_epilog'
    $* is (arg2 arg3 arg4)

There is no reason that the epilog should need the target program's arguments. If the epilog wants those arguments, it surely wants the name of the target program as well; $* is a poor way to provide that optional info to the epilog. And these arguments have been reparsed, so the spacing has been lost.

What's worse, the user's shell is invoked (as in #1) like

    /bin/csh -c "/path/to/my/debug_epilog arg2 arg3 arg4 "

followed by a repeat of the individual arguments

    " arg2 " " arg3" "arg4 "

These are intended as arguments neither to /bin/csh nor to the epilog, but to the job's target program. In particular, if the first argument starts with -n, then as described in #1 above, the epilog is NOT ACTUALLY INVOKED by csh! Of course, the intention is that the epilog runs for every job. This is a more serious bug, as users should not be able to keep the epilog from running.

When the user's shell is tcsh, it is even fussier about its arguments. When csh sees an unknown argument, such as -w, it seems to ignore it:

    /bin/csh -c "echo -w arg2" -w arg2
    -w arg2

But when tcsh sees an unknown argument, it complains:

    /bin/tcsh -c "echo -w arg2" -w arg2
    Unknown option: `-w'
    Usage: tcsh [ -bcdefilmnqstvVxX ] [ argument ... ].

and exits with an error status (1).
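The epilog invocation described in problem #3 can be modelled in a few lines. This is a hedged sketch, not Grid Engine code: the function names are hypothetical, the argument layout is taken from the description above, and exact whitespace inside the command string is ignored. It contrasts the reported argument vector with one that passes only the epilog path.

```python
def reported_epilog_shell_args(epilog_path, job_args):
    """Arguments handed to the user's shell, as described in this report."""
    command_string = " ".join([epilog_path] + list(job_args))
    # The job's arguments are appended a second time after the command string.
    return ["-c", command_string] + list(job_args)


def option_like_words(shell_args):
    """Trailing words that a shell may try to parse as its own options."""
    return [word for word in shell_args[2:] if word.startswith("-")]


def direct_epilog_argv(epilog_path):
    """Alternative sketch: execute the epilog itself, with no job arguments."""
    return [epilog_path]


shell_args = reported_epilog_shell_args(
    "/gridware/sge/debug_epilog", ["-none", "-time"]
)
risky = option_like_words(shell_args)
```

With the job arguments `-none -time`, both trailing words look like options, which matches the csh and tcsh failures described above. The direct form has nothing for a shell to misinterpret.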
These notes were in the E-mail sent to the SGE administrative E-mail address:

    [26889:1783]: execvp(/bin/tcsh, "-tcsh" "-c" "/gridware/sge/debug_epilog -none -time " "-none" "-time")
    [53:1829]: wait3 returned 1873 (status: 256; WIFSIGNALED: 0, WIFEXITED: 1, WEXITSTATUS: 1)
    [53:1829]: epilog exited with exit status 1
    [53:1829]: reaped "epilog" with pid 1873
    [53:1829]: epilog exited not due to signal
    [53:1829]: epilog exited with status 1

Issue #4 is that if $SHELL is /bin/csh (and perhaps for tcsh as well), the shell should arguably be invoked with the -f (fast) flag as well, which skips sourcing the user's ~/.cshrc file. This can matter because if the user's ~/.cshrc file prints anything, that output is included in the job's output (after the target program is invoked, when the epilog invocation is attempted). tty and stty commands will often fail with a message like "not a tty".

Again, using /usr/bin/env might be a better way to invoke the epilog than /bin/csh (or the user's $SHELL).

------- Additional comments from roland Tue Dec 6 08:34:01 -0700 2005 -------
*** Issue 1337 has been marked as a duplicate of this issue. ***
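The two suggestions in issue #4 can be sketched as argument-vector builders. This is an illustration under assumptions, not the fix that was eventually applied: the function names and the `CSH_FAMILY` set are hypothetical, and the only shell behaviour relied on is that `-f` makes csh and tcsh skip ~/.cshrc.

```python
import os.path

# Shells assumed to accept -f as "do not source ~/.cshrc".
CSH_FAMILY = {"csh", "tcsh"}


def epilog_shell_argv(shell_path, command_string):
    """Build an argv that adds -f for csh-style shells before -c."""
    argv = [shell_path]
    if os.path.basename(shell_path) in CSH_FAMILY:
        argv.append("-f")
    argv += ["-c", command_string]
    return argv


def epilog_env_argv(epilog_path):
    """Alternative from the report: run the epilog through /usr/bin/env."""
    return ["/usr/bin/env", epilog_path]


csh_argv = epilog_shell_argv("/bin/csh", "/path/to/my/debug_epilog")
env_argv = epilog_env_argv("/path/to/my/debug_epilog")
```

The /usr/bin/env form avoids the user's shell entirely, so neither the startup files nor the shell's option parsing can interfere with the epilog.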
Change History (1)
comment:1 Changed 9 years ago by dlove
- Resolution set to fixed
- Severity set to minor
- Status changed from new to closed
fixed by RD-2005-10-25-0