[GE users] Erroneous job execution

Hairul Ikmal Mohamad Fuzi hairul.ikmal at gmail.com
Wed Apr 12 02:19:18 BST 2006


Hi everyone,

We have been running a program called MCNP (Monte Carlo N-Particle)
through SGE for quite some time. Lately, jobs submitted through SGE
have been failing. Does anyone have any idea what is actually
happening? We keep receiving the error below by email every time we
submit an MCNP job. At first we thought it was caused by an erroneous
input file, but that was not the case: I checked the same input file
with the application on another PC.

TIA.

- Ikmal

==============================
Job 155 caused action: none
 User        = seang
 Queue       = all.q@hptc.local
 Host        = hptc.local
 Start Time  = <unknown>
 End Time    = <unknown>
failed before writing exit_status:shepherd exited with exit status 19
Shepherd trace:
03/30/2006 09:49:50 [400:1214]: shepherd called with uid = 0, euid = 400
03/30/2006 09:49:50 [400:1214]: starting up 6.0u6
03/30/2006 09:49:50 [400:1214]: setpgid(1214, 1214) returned 0
03/30/2006 09:49:50 [400:1214]: no prolog script to start
03/30/2006 09:49:50 [400:1217]: pid=1217 pgrp=1217 sid=1217 old pgrp=1214 getlogin()=<no login set>
03/30/2006 09:49:50 [400:1217]: reading passwd information for user 'seang'
03/30/2006 09:49:50 [400:1217]: setosjobid: uid = 0, euid = 400
03/30/2006 09:49:50 [400:1217]: setting limits
03/30/2006 09:49:50 [400:1217]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_DATA setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_STACK setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_CORE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
03/30/2006 09:49:50 [400:1217]: setting environment
03/30/2006 09:49:50 [400:1217]: Initializing error file
03/30/2006 09:49:50 [400:1217]: now doing chown(seang) of trace and error files
03/30/2006 09:49:50 [400:1217]: switching to intermediate/target user
03/30/2006 09:49:50 [511:1217]: now running with uid=511, euid=511
03/30/2006 09:49:50 [511:1217]: closing all filedescriptors
03/30/2006 09:49:50 [511:1217]: further messages are in "error" and "trace"
03/30/2006 09:49:50 [400:1214]: forked "job" with pid 1217
03/30/2006 09:49:50 [400:1214]: child: job - pid: 1217
03/30/2006 09:49:50 [511:1217]: using stdout as stderr
03/30/2006 09:49:50 [511:1217]: now running with uid=511, euid=511
03/30/2006 09:49:50 [511:1217]: execvp(/bin/bash, "-bash" "/opt/gridengine/default/spool/hptc/job_scripts/155")

Shepherd pe_hostfile:
hptc.local 1 all.q@hptc.local <NULL>
==============================
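
For anyone who wants to go through a trace like the one above programmatically, here is a minimal Python sketch. It assumes each trace entry has the layout "MM/DD/YYYY HH:MM:SS [euid:pid]: message"; that layout is inferred from this log only (the first bracketed number changes from 400 to 511 when the shepherd switches user, and the second matches the forked pid), not taken from the SGE documentation. The names TRACE_LINE and parse_trace are hypothetical. Lines that do not start with a timestamp are treated as mail-wrapped continuations of the previous entry.

```python
import re

# Assumed layout of one trace entry, inferred from the log above:
#   MM/DD/YYYY HH:MM:SS [euid:pid]: message
TRACE_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<euid>\d+):(?P<pid>\d+)\]: (?P<message>.*)$"
)


def parse_trace(text):
    """Return one dict per trace entry, joining wrapped lines onto the previous entry."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TRACE_LINE.match(line)
        if match:
            entry = match.groupdict()
            entry["euid"] = int(entry["euid"])
            entry["pid"] = int(entry["pid"])
            entries.append(entry)
        elif entries:
            # No timestamp: assume the mailer wrapped the previous entry.
            entries[-1]["message"] += " " + line
    return entries


# A few lines copied from the trace above, including one wrapped entry.
sample = """\
03/30/2006 09:49:50 [400:1214]: starting up 6.0u6
03/30/2006 09:49:50 [400:1217]: pid=1217 pgrp=1217 sid=1217 old
pgrp=1214 getlogin()=<no login set>
03/30/2006 09:49:50 [511:1217]: now running with uid=511, euid=511
"""

entries = parse_trace(sample)
for e in entries:
    print(e["euid"], e["pid"], e["message"])
```

Grouping the entries by pid separates what the parent shepherd (1214) logged from what the forked job process (1217) logged, which makes it easier to see the last step reached before the failure.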
