[GE users] SGE 6.0u1: Job 11 failed: Getting shepherd exited with exit status 19

Ranga Srinivasan ranga at bizrate.com
Fri Oct 1 18:53:02 BST 2004


I set up a qmaster on one machine and a separate execution host on another.
Before creating the execution host, I copied all the files from the machine
that hosts the qmaster to the execution host machine.

When I submit simple.sh with qsub, the process completes, but I still get
the abort mail back.

What am I doing wrong?

Also, I was looking at the load sensor scripts and was wondering what kinds
of data people monitor. The example script in the manual reports the number
of users. Is there anything else we can monitor?
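
For reference, here is a minimal sketch of what a custom load sensor could
look like in Python. It assumes the protocol the manual describes for the
example script: the execution daemon writes a line to the sensor's stdin, and
the sensor answers with a report of "begin", one "host:name:value" line per
value, and "end", until it reads "quit". The attribute name my_load_1min and
the function names are hypothetical; a real attribute would have to match a
complex defined in the cluster configuration.

```python
import os


def format_report(host, values):
    """Build one load report: 'begin', one 'host:name:value' line each, 'end'."""
    lines = ["begin"]
    for name, value in values.items():
        lines.append(f"{host}:{name}:{value}")
    lines.append("end")
    return "\n".join(lines) + "\n"


def collect_values():
    """Hypothetical metric: the 1-minute load average of this machine (Unix only)."""
    one_min_load = os.getloadavg()[0]
    return {"my_load_1min": f"{one_min_load:.2f}"}


def run_sensor(stdin, stdout, host, collect=collect_values):
    """Answer every input line with a fresh report; stop when 'quit' is read."""
    for line in stdin:
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, collect()))
        stdout.flush()  # the daemon reads the report as soon as it is written


# To run this as a sensor, the script would end with something like:
#   import socket, sys
#   run_sensor(sys.stdin, sys.stdout, socket.gethostname())
```

Anything that can be measured from a script (free scratch space, licenses in
use, temperature) could be returned from collect_values in the same way.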

I have included the abort mail that I get.

Thanks for all your help

-----Original Message-----
Subject: SGE 6.0u1: Job 11 failed

Job 11 caused action: none
 User        = gridadm
 Queue       = all.q at cruncher03.XXX.com
 Host        = cruncher03.XXX.com
 Start Time  = 10/01/2004 10:16:54
 End Time    = 10/01/2004 10:17:15
failed before writing exit_status:shepherd exited with exit status 19
Shepherd trace:
10/01/2004 10:16:54 [10461:22685]: shepherd called with uid = 0, euid =
10/01/2004 10:16:54 [10461:22685]: starting up 6.0u1
10/01/2004 10:16:54 [10461:22691]: closing all filedescriptors
10/01/2004 10:16:54 [10461:22691]: further messages are in "error" and
10/01/2004 10:16:54 [10461:22691]: using stdout as stderr
10/01/2004 10:16:54 [10461:22691]:

Shepherd pe_hostfile:
cruncher03.XXX.com 1 all.q at cruncher03.XXX.com UNDEFINED
