[GE users] getting failed before writing exit_status:shepherd exited with exit status 19 --- Additional info

Ranga Srinivasan ranga at bizrate.com
Wed Oct 6 17:45:46 BST 2004


Hi

I am resending the email as I did not get any answer to my problem.
Some additional info:

1. /grid/gridware01/default/spool/cruncher01/job_scripts/1 is created as
-rw-r--r--    1 gridadm  gridadm       992 Oct  6 09:25 1

2. All the directories are 777 from /grid onwards.

3. If I try to run the process on the gridmaster machine itself I get the
same error messages.
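
To double-check points 1 and 2, the mode and owner of every level of the job script path can be listed in one go. This is only a minimal sketch, assuming Python is available on the execution host; the helper name path_permissions is hypothetical and is not part of Grid Engine. Running it as the job owner (gridadm here) rather than root would show what that account actually sees.

```python
import os
import pwd
import stat
import sys


def path_permissions(path):
    """Return (prefix, mode string, owner) for each level of an absolute path."""
    path = os.path.abspath(path)
    parts = path.strip(os.sep).split(os.sep)
    rows = []
    for i in range(1, len(parts) + 1):
        prefix = os.sep + os.sep.join(parts[:i])
        st = os.stat(prefix)  # follows symlinks, like a normal file access
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            # uid has no passwd entry on this host (possible with NFS)
            owner = str(st.st_uid)
        rows.append((prefix, stat.filemode(st.st_mode), owner))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_path.py /grid/gridware01/default/spool/cruncher01/job_scripts/1
    for prefix, mode, owner in path_permissions(sys.argv[1]):
        print(mode, owner, prefix)
```

A numeric owner in the output would mean the uid is unknown on that host, which is worth ruling out when the spool directory is shared over NFS.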

I am totally confused: after doing all of that, I still get the error email,
even though the script runs to completion successfully.

Questions

1. Which directory should I check to see whether it has the right permissions?
2. Is there a better way to debug this problem?
3. Are there any docs covering the error messages that come out of the
shepherd process?
4. What does "failed before writing exit_status:shepherd exited with exit
status 19" mean?


Can someone help me resolve this issue?

Thanks in advance

Ranga


-----Original Message-----
From: Ranga Srinivasan [mailto:ranga at bizrate.com]
Sent: Monday, October 04, 2004 4:02 PM
To: users at gridengine.sunsource.net
Subject: [GE users] getting failed before writing exit_status:shepherd
exited with exit status 19


Hi

After recreating the whole Grid setup to use one NFS mount shared across the
gridmaster and the execution host, I am still getting the same errors. What
does "failed before writing exit_status:shepherd exited with exit status 19"
mean?
I did chmod -R 777 on the gridware directory. So I am assuming it should be
a file permission issue.

It completes the simple.sh script, but sends the error email.

Is there something else I need to do to make it work without sending me error
messages?

Any help or pointers would be really helpful. I am totally confused as to why
I am getting an error message after ensuring that the gridmaster and the
execution host share the same NFS-mounted device.

Thanks again

Ranga

-----Original Message-----
From: root [mailto:root at cruncher01.bizrate.com]
Sent: Monday, October 04, 2004 3:54 PM
To: ranga at bizrate.com
Subject: SGE 6.0u1: Job 1 failed


Job 1 caused action: none
 User        = gridadm
 Queue       = all.q at cruncher01.bizrate.com
 Host        = cruncher01.bizrate.com
 Start Time  = 10/04/2004 15:53:18
 End Time    = 10/04/2004 15:53:38
failed before writing exit_status:shepherd exited with exit status 19
Shepherd trace:
10/04/2004 15:53:18 [10461:4462]: shepherd called with uid = 0, euid = 10461
10/04/2004 15:53:18 [10461:4462]: starting up 6.0u1
10/04/2004 15:53:18 [10461:4466]: closing all filedescriptors
10/04/2004 15:53:18 [10461:4466]: further messages are in "error" and "trace"
10/04/2004 15:53:18 [10461:4466]: using stdout as stderr
10/04/2004 15:53:18 [10461:4466]: execvp(/bin/bash, "bash" "/grid/gridware01/default/spool/cruncher01/job_scripts/1")

Shepherd pe_hostfile:
cruncher01.bizrate.com 1 all.q at cruncher01.bizrate.com UNDEFINED
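
The shepherd trace above stops at the execvp call, so the last step it records is the launch of the job script. For longer traces, a small parser makes it easier to see the last step reached. This is a rough sketch only: the line layout (timestamp, two bracketed numbers, message) is inferred from the trace in this message, the meaning of the two numbers is not assumed, and parse_trace is a hypothetical helper, not a Grid Engine tool. It tolerates lines that a mail client may have wrapped.

```python
import re

# Layout inferred from the trace in this message (an assumption):
# MM/DD/YYYY HH:MM:SS [number:number]: message
TRACE_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \[(\d+):(\d+)\]: (.*)$"
)


def parse_trace(text):
    """Parse trace text into dicts, appending wrapped lines to the prior entry."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TRACE_LINE.match(line)
        if match:
            when, first_id, second_id, message = match.groups()
            records.append({
                "time": when,
                "ids": (int(first_id), int(second_id)),
                "message": message,
            })
        elif records:
            # No timestamp: treat it as the continuation of a wrapped line.
            records[-1]["message"] += " " + line
    return records
```

Printing parse_trace(text)[-1]["message"] for a saved trace shows the final step the shepherd logged.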



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net


