[GE users] Rescheduling emails

Andy Schwierskott andy.schwierskott at sun.com
Thu Sep 13 15:20:31 BST 2007


Chris,

that would require changing the execd. The reason is that the execd does
not look into the details of job execution closely enough to check whether
the job itself was actually executed.

However, the execd does know that the shepherd exited after it had executed
the prolog. Strictly speaking, I'd even tend to agree that this is a bug:
the job never was executed, nor did it end, so there should be no job
begin/end mail. Rather, it could/should be a case for sending an email when
the "a" flag of the "-m" qsub option was given:

      -m b|e|a|s|n,...
           Available for qsub, qsh, qrsh, qlogin and qalter only.

           Defines or redefines under which circumstances mail is
           to be sent to the job owner or to the users defined
           with the -M option described below. The option arguments
           have the following meaning:

           'b'     Mail is sent at the beginning of the job.
           'e'     Mail is sent at the end of the job.
           'a'     Mail is sent when the job is aborted or
                   rescheduled.
           's'     Mail is sent when the job is suspended.
           'n'     No mail is sent.

           Currently no mail is sent when a job is suspended.
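
To make the suggested behaviour concrete, here is a minimal sketch in Python. It is only a model of the flag meanings quoted above, not Grid Engine code: the event names and the functions parse_mail_flags and should_mail are made up for illustration, and it assumes flags may be written either concatenated ("be") or comma-separated ("b,e"). Note that it models the documented meaning of 's' even though, as the man page says, no suspend mail is currently sent.

```python
# Illustrative model only; this is NOT Grid Engine source code.
# Event names and function names are hypothetical.

VALID_FLAGS = set("beasn")

# Documented meaning of each flag, per the qsub man page excerpt.
# 'n' (no mail) deliberately maps to no event at all.
EVENT_TO_FLAG = {
    "begin": "b",
    "end": "e",
    "abort": "a",
    "reschedule": "a",
    "suspend": "s",
}


def parse_mail_flags(spec):
    """Parse a -m argument such as 'be' or 'b,e,a' into a set of flags."""
    flags = {ch for ch in spec if ch != ","}
    unknown = flags - VALID_FLAGS
    if unknown:
        raise ValueError("unknown -m flag(s): %s" % sorted(unknown))
    return flags


def should_mail(spec, event):
    """Return True if mail should be sent for `event` under -m `spec`."""
    return EVENT_TO_FLAG[event] in parse_mail_flags(spec)


if __name__ == "__main__":
    # A job sent back to the queue by the prolog is a "reschedule" event,
    # so with "-m be" no mail would go out, but with "-m bea" it would.
    print(should_mail("be", "reschedule"))
    print(should_mail("bea", "reschedule"))
```

Under this model, a job that is rescheduled before its script ever starts would produce mail only for users who asked for the "a" flag, which is the behaviour argued for above.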


I'm curious to hear further comments and opinions from other end users on this.

Andy

On Thu, 13 Sep 2007, Chris MacPhee wrote:

> Hi folks,
>
> We have recently implemented a set of prolog and epilog scripts for one
> of our software queues; if the software cannot run due to
> memory/licensing/etc issues, the job is sent back to the queue (pe_start
> exit code 99).
>
> Unfortunately, it appears that SGE considers the prolog/epilog a full
> start/stop of the job and issues mail when the "-m be" flag is used.
>
> Is anyone aware of an option to SGE that will mail only on execution of
> the job itself, not the prolog/epilog?
>
> Thanks,
> - Chris
>
> ************
>
> Job 1234 (bleh.csh) Rescheduled
>  Exit Status      = -1
>  Signal           = unknown signal
>  User             = itsme
>  Queue            = production.q@mooland
>  Host             = mooland
>  Start Time       = <unknown>
>  End Time         = <unknown>
>  CPU              = NA
>  Max vmem         = NA
> failed rescheduling because:
> 09/10/2007 15:20:25 [60055:6044]: exit_status of pe_start = 99
>
>
