[GE users] Suspending jobs submitted with -notify

Göran Uddeborg uddeborg at carmen.se
Fri Jan 28 16:33:48 GMT 2005


> As Dan pointed out: are you using trap "" USR1 in your script?

We investigated further, and that is a clue.  What Olle did was start
a binary, not a job script.  He used the flag "-b y".  (And he did not
use -noshell.)  So he gets this process tree

  shepherd -> sh -c a.out -> a.out

Nothing here makes the shell ignore USR1, and the default action for
USR1 is to terminate the process.  As it is, the flags "-notify" and
"-b y" are essentially incompatible.
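
A minimal C sketch of the failing case follows.  It only imitates the
process tree above and is not sge_shepherd's code.  It assumes a POSIX
system with /bin/sh and a caller that has not itself ignored USR1, and
the function name run_under_sh_and_signal is hypothetical.  It forks,
runs the command through `sh -c' without touching signal dispositions,
sends SIGUSR1 as the advance warning, and reports what terminated the
child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `cmd` as `sh -c cmd` (like the tree
 * shepherd -> sh -c a.out -> a.out), send it SIGUSR1 as the -notify
 * style warning, and return the signal that killed it, 0 if it exited
 * normally, or -1 on error.  Assumes the caller does not ignore USR1. */
int run_under_sh_and_signal(const char *cmd)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: dispositions are inherited unchanged, so USR1 keeps
         * its default action, which is to terminate the process. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    /* Parent: deliver the warning.  Whether it lands before or after
     * the exec, nothing in the child ignores it. */
    if (kill(pid, SIGUSR1) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}
```

For example, run_under_sh_and_signal("sleep 10") should return SIGUSR1
almost immediately: the shell (or the command, if the shell execs it
directly) is killed by the warning instead of being warned.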

It works again if "-noshell" is also used.  Or you could modify the
submitted command to work around the problem.  But it isn't obvious
you have to do either.  I consider this a bug.

I can see two possible modifications:

- The command string given to the shell could be prefixed with
  "exec", so that `qsub -b y apa bepa' results in `-sh -c "exec apa bepa"'.

  It would work in many cases.  But it would be somewhat sensitive to
  redirections or variable settings before the command proper, and in
  practice the command line parsing would probably be incomplete.
  Consider

    qsub -b y '<' apa X=Y '>' bepa command argument

  And then consider that there are different shells: sh, tcsh,
  python ...

- sge_shepherd could arrange for the child to ignore the USR1 and USR2
  signals before exec()ing the command.

  While more intrusive in a way, this would make some sense (for all
  submission styles, not just "-b y").  For processes started under
  "-notify", the USR1/2 signals have a specified meaning: they are an
  advance warning.  Given that meaning, it makes sense to ignore these
  signals unless you know what to do with them.  Those processes in
  the job, whether at the top or further down the tree, that do know
  what to do with them can catch the signal(s) as today.

  It would also close the time window mentioned in 1440, subproblem 2.
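
The second proposal can be sketched in C roughly as below.  Again this
is not sge_shepherd's code: the function name is hypothetical, a POSIX
system with /bin/sh is assumed, and the pipe exists only so the
demonstration does not race.  It relies on the POSIX rule that signals
set to SIG_IGN stay ignored across exec(), so the shell and the command
it runs inherit the ignored dispositions.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, set USR1 and USR2 to SIG_IGN in the child,
 * then exec `sh -c cmd`.  Once the child reports that the signals are
 * ignored, send it SIGUSR1 and SIGUSR2.  Returns the child's exit
 * status, or -1 if it was killed by a signal or on error. */
int run_ignoring_notify_signals(const char *cmd)
{
    int ready[2];
    if (pipe(ready) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    if (pid == 0) {
        close(ready[0]);
        /* Ignored dispositions survive exec(), unlike handlers. */
        signal(SIGUSR1, SIG_IGN);
        signal(SIGUSR2, SIG_IGN);
        /* Tell the parent that the signals are now ignored. */
        if (write(ready[1], "x", 1) != 1)
            _exit(126);
        close(ready[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    close(ready[1]);
    char byte;
    /* Wait until SIG_IGN is in place, so the warnings cannot arrive
     * while USR1/USR2 still have their default (terminate) action. */
    ssize_t n = read(ready[0], &byte, 1);
    close(ready[0]);
    if (n != 1) {
        waitpid(pid, NULL, 0);
        return -1;
    }
    kill(pid, SIGUSR1);
    kill(pid, SIGUSR2);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

With this, run_ignoring_notify_signals("sleep 1") should survive both
warnings and return 0.  One caveat on the design: POSIX also says that
signals ignored on entry to a non-interactive shell cannot be trapped
or reset, so a job script's `trap' on USR1 would no longer take effect
under this scheme, while compiled programs could still install their
own handlers with sigaction().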

(Hm, maybe this should be moved to the devel list?)
