[GE users] shepherd of job <JOB_ID> exited with exit status = 7

Tobias Raab (SGI) Tobias.Raab at partner.bmw.de
Tue Nov 16 13:05:34 GMT 2004


Hello,

A small correction: all queues on the exec host go into error state, not just the queue that ran the job causing the problem.

> egrep QERROR messages
Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue S43a marked QERROR as result of job 213200's failure
Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue Mpp43a marked QERROR as result of job 213200's failure at host hppam43a.muc
Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue RMpp43a marked QERROR as result of job 213200's failure at host hppam43a.muc
Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue S43a marked QERROR as result of job 213200's failure at host hppam43a.muc
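
For anyone who wants to check this on their own cluster, here is a minimal Python sketch that groups the queues marked QERROR by the job that triggered the error. It assumes the pipe-separated layout seen in the lines above (timestamp|daemon|host|level|text) and the exact "marked QERROR" wording. The function names parse_message_line and queues_in_error are hypothetical helpers, not part of Grid Engine.

```python
import re
from collections import defaultdict

# Pattern inferred from the qmaster lines quoted above; the trailing
# "at host ..." part is optional because one of the lines lacks it.
QERROR_RE = re.compile(
    r"queue (?P<queue>\S+) marked QERROR as result of job (?P<job>\d+)'s failure"
    r"(?: at host (?P<host>\S+))?"
)


def parse_message_line(line):
    """Split one messages-file line into its five fields, or return None."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, daemon, host, level, text = parts
    return {"timestamp": timestamp, "daemon": daemon, "host": host,
            "level": level, "text": text}


def queues_in_error(lines):
    """Map each job number to the sorted list of queues it put into QERROR."""
    by_job = defaultdict(set)
    for line in lines:
        rec = parse_message_line(line)
        if rec is None or rec["level"] != "E":
            continue  # skip malformed lines and non-error entries
        match = QERROR_RE.search(rec["text"])
        if match:
            by_job[match.group("job")].add(match.group("queue"))
    return {job: sorted(queues) for job, queues in by_job.items()}


SAMPLE = [
    "Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue S43a marked QERROR as result of job 213200's failure",
    "Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue Mpp43a marked QERROR as result of job 213200's failure at host hppam43a.muc",
    "Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue RMpp43a marked QERROR as result of job 213200's failure at host hppam43a.muc",
    "Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|E|queue S43a marked QERROR as result of job 213200's failure at host hppam43a.muc",
]

if __name__ == "__main__":
    print(queues_in_error(SAMPLE))
```

To run it against a real messages file, pass an open file object instead of SAMPLE, since the function only iterates over lines.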

Tobias


Tobias.Raab at partner.bmw.de wrote:
> Hello,
> 
> I'm facing a problem with an exiting shepherd. From time to time I get
> an error in the messages file and the queue (qname S43a) goes into
> error state. Does anyone know what "exit status 7" stands for?
> Is low disk space in the exec spool directory a possible cause? So far I
> have not found any hints in the OS syslogs ...
> 
> Regards & thanks in advance
> Tobias
> 
> messages on exec-host:
> --- snip ---
> Mon Nov 15 17:43:15 2004|execd|hppam43a|E|shepherd of job 213200.1 exited with exit status = 7
> Mon Nov 15 17:43:15 2004|execd|hppam43a|E|"abnormal termination of shepherd for job 213200.1: no "exit_status" file"
> Mon Nov 15 17:43:15 2004|execd|hppam43a|E|cant open file active_jobs/213200.1/error: No such file or directory
> Mon Nov 15 17:43:15 2004|execd|hppam43a|E|can't open pid file "active_jobs/213200.1/pid" for job 213200.1
> --- snip ---
> 
> 
> messages on qmaster:
> --- snip ---
> Mon Nov 15 17:43:15 2004|qmaster|spsdm1c1|W|job 213200.1 failed on host hppam43a.muc general  before prolog because: shepherd exited with exit status 7
> --- snip ---
> 
> 
> corresponding accounting information for the job:
> =================================================
> qname        S43a
> hostname     hppam43a.muc
> group        BLANKED
> owner        BLANKED
> project      BLANKED
> department   defaultdepartment
> jobname      BLANKED
> jobnumber    213200
> taskid       undefined
> account      8364
> priority     0
> qsub_time    Mon Nov 15 17:43:03 2004
> start_time   -/-
> end_time     -/-
> granted_pe   none
> slots        0
> failed       7   : before prolog
> exit_status  0
> ru_wallclock 0
> ru_utime     0
> ru_stime     0
> ru_maxrss    0
> ru_ixrss     0
> ru_ismrss    0
> ru_idrss     0
> ru_isrss     0
> ru_minflt    0
> ru_majflt    0
> ru_nswap     0
> ru_inblock   0
> ru_oublock   0
> ru_msgsnd    0
> ru_msgrcv    0
> ru_nsignals  0
> ru_nvcsw     0
> ru_nivcsw    0
> cpu          0
> mem          0.000
> io           0.000
> iow          0.000
> maxvmem      0.000000
> 

-- 
____________________________________________________________

Tobias Raab, SGI                        BMW AG
IT-Management-Service                   Werk 1.5 (FIZ)
Tel: 089/382-46976                      Geb. 12.0 - 1. Stock
Fax: 089/382-42820                      Knorrstr. 147
                                        80788 Muenchen
____________________________________________________________


