[GE users] 'Eqw' state of a job

Sreenath Nampally sreenath at tigr.ORG
Wed Mar 15 19:44:38 GMT 2006


Hi All,

One of my jobs went into the 'Eqw' state because of permission problems
on its stderr and stdout files, and it stays in the queue forever.
Is there a way to configure SGE so that the job actually fails when
it encounters this kind of problem?

(The command that I used:
   "qsub -e /home/pandaadm/tmp/xx.err -o /home/pandaadm/tmp/xx.out sleeper.pl 20"
 Neither the out file nor the err file is writable in this case.)
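
For illustration, a check like the following could be run before qsub to
catch unwritable -o/-e paths early. This is only a minimal Python sketch:
the helper names can_create_or_write and unwritable are hypothetical, and
the check runs on the submit host as the submitting user, so it cannot
prove that the execution host (or the user the job runs as) will see the
same permissions.

```python
import os


def can_create_or_write(path):
    """Return True if `path` is a writable file, or could be created."""
    if os.path.exists(path):
        # Existing target: it must be a regular file we may write to.
        return os.path.isfile(path) and os.access(path, os.W_OK)
    # Missing target: the parent directory must exist and allow creation.
    parent = os.path.dirname(os.path.abspath(path))
    return os.path.isdir(parent) and os.access(parent, os.W_OK | os.X_OK)


def unwritable(paths):
    """Return the subset of `paths` that fail the check, in input order."""
    return [p for p in paths if not can_create_or_write(p)]


if __name__ == "__main__":
    # Paths taken from the qsub command above.
    for p in unwritable(["/home/pandaadm/tmp/xx.err",
                         "/home/pandaadm/tmp/xx.out"]):
        print("not writable from this host/user:", p)
```

An empty result from unwritable() only means the paths look fine from
where the check ran; it does not replace a fix on the SGE side.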

I got the following message in the email that SGE sent to the admin.
How can I trap the exit code of the prolog (from line 30)? My prolog
script writes a few log messages to a log file, but none of those are
written in this case.

1 "Job 667 caused action: Job 667 set to ERROR
2 User        = sreenath
3 Queue       = default.q
4 Host        = sreenath-lx.tigr.org
5 Start Time  = <unknown>
6 End Time    = <unknown>
7 failed opening input/output file:03/15/2006 09:50:31 [1824:2107]: error: can't open output file
8 "/home/pandaadm/tmp/xx.out": Permissi
9 Shepherd trace:
10 03/15/2006 09:50:31 [1132:2106]: shepherd called with uid = 0, euid = 1132
11 03/15/2006 09:50:31 [1132:2106]: starting up 6.0u7
12 03/15/2006 09:50:31 [1132:2106]: setpgid(2106, 2106) returned 0
13 03/15/2006 09:50:31 [1132:2107]: pid=2107 pgrp=2107 sid=2107 old pgrp=2106
14 getlogin()=<no login set>
15 03/15/2006 09:50:31 [1132:2107]: reading passwd information for user 'sgetest'
16 03/15/2006 09:50:31 [1132:2106]: forked "prolog" with pid 2107
17 03/15/2006 09:50:31 [1132:2106]: using signal delivery delay of 120 seconds
18 03/15/2006 09:50:31 [1132:2106]: child: prolog - pid: 2107
19 03/15/2006 09:50:31 [1132:2107]: setting limits
20 03/15/2006 09:50:31 [1132:2107]: setting environment
21 03/15/2006 09:50:31 [1132:2107]: Initializing error file
22 03/15/2006 09:50:31 [1132:2107]: switching to intermediate/target user
23 03/15/2006 09:50:31 [1824:2107]: closing all filedescriptors
24 03/15/2006 09:50:31 [1824:2107]: further messages are in "error" and "trace"
25 03/15/2006 09:50:31 [1824:2107]: using "/bin/bash" as shell of user "sgetest"
26 03/15/2006 09:50:31 [1824:2107]: error: can't open output file
27 "/home/pandaadm/tmp/xx.out": Permission denied
28 03/15/2006 09:50:31 [1132:2106]: wait3 returned 2107 (status: 6656; WIFSIGNALED: 0,
29 WIFEXITED: 1, WEXITSTATUS: 26)
30 03/15/2006 09:50:31 [1132:2106]: prolog exited with exit status 26
31 03/15/2006 09:50:31 [1132:2106]: reaped "prolog" with pid 2107
32 03/15/2006 09:50:31 [1132:2106]: prolog exited not due to signal
33 03/15/2006 09:50:31 [1132:2106]: prolog exited with status 26
34 03/15/2006 09:50:31 [1132:2106]: no tasker to notify
35 03/15/2006 09:50:31 [1132:2106]: exit states increased from 0 to 1
36 03/15/2006 09:50:31 [1132:2106]: failed starting prolog

37 Shepherd error:
38 03/15/2006 09:50:31 [1824:2107]: error: can't open output file
39 "/home/pandaadm/tmp/xx.out": Permission denie"
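
As a cross-check of trace lines 28 to 30, the raw wait3 status 6656 does
decode to exit status 26. A minimal Python sketch, assuming the usual
Unix wait-status encoding (exit code in the high byte); the function
name decode_wait_status is hypothetical:

```python
import os


def decode_wait_status(status):
    """Classify a raw wait status the way the shepherd trace reports it."""
    if os.WIFEXITED(status):
        # Normal exit: the exit code is stored in the high byte.
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        # Killed by a signal: report the signal number instead.
        return ("signaled", os.WTERMSIG(status))
    return ("other", status)


if __name__ == "__main__":
    raw = 6656  # "status: 6656" from trace line 28
    print(decode_wait_status(raw))
    print((raw >> 8) & 0xFF)  # the same exit code extracted by hand
```

So the 26 in line 30 is just the high byte of 6656 (26 * 256 = 6656),
consistent with WIFEXITED: 1 and WIFSIGNALED: 0 in the trace.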

Thanks for the help.
Sreenath
