[GE users] Trying to use starter_method with sge 6.0

Andreas Haas Andreas.Haas at Sun.COM
Mon Nov 8 15:43:03 GMT 2004


How do you know the starter procedure wasn't started? Are you using
echo commands that would leave traces in the job's output?
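
In case it helps, here is a minimal sketch of a starter script that leaves
such a trace. The file location and the message text are only my
assumptions for illustration. It relies on the starter being called with
the job script and its arguments as its own arguments, and it is run here
outside Grid Engine with a stand-in command:

```shell
# Write a hypothetical starter script to a temporary file.
starter=$(mktemp)
cat > "$starter" <<'EOF'
#!/bin/sh
# Leave a trace on stderr (the job's error file), then hand over to the job.
echo "starter_method: running $*" >&2
exec "$@"
EOF

# Local dry run: "echo job ran" stands in for the job script and its arguments.
starter_output=$(sh "$starter" echo "job ran" 2>&1)
printf '%s\n' "$starter_output"
```

If the first line shows up in the job's error file, the starter ran. Using
exec keeps the job in the same process, so signals and the exit status
reach the shepherd unchanged.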

To trace Grid Engine job startup you'll need the shepherd's trace
output. You might try setting KEEP_ACTIVE in execd_params (see sge_conf(5))
for the execd host where you test this:

   # qconf -sconf durin | grep execd_param
   execd_params                 KEEP_ACTIVE=true

If you use the KEEP_ACTIVE setting, the 'trace' file isn't removed
after the job finishes. In my case the trace file looks like this:

   # cat $SGE_ROOT/default/spool/durin/active_jobs/189.1/trace
   11/08/2004 16:37:46 [115088:12471]: shepherd called with uid = 0, euid = 115088
   11/08/2004 16:37:46 [115088:12471]: starting up 6.0
   11/08/2004 16:37:46 [115088:12471]: setpgid(12471, 12471) returned 0
   11/08/2004 16:37:46 [115088:12471]: no prolog script to start
   11/08/2004 16:37:46 [115088:12471]: forked "job" with pid 12472
   11/08/2004 16:37:46 [115088:12472]: pid=12472 pgrp=12472 sid=12472 old pgrp=12471 getlogin()=<no login set>
   11/08/2004 16:37:46 [115088:12472]: setosjobid: uid = 0, euid = 115088
   11/08/2004 16:37:46 [115088:12471]: child: job - pid: 12472
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_CPU setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_FSIZE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_DATA setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_STACK setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_CORE setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: RLIMIT_RSS setting: (soft 4294967295 hard 4294967295) resulting: (soft 4294967295 hard 4294967295)
   11/08/2004 16:37:46 [115088:12472]: closing all filedescriptors
   11/08/2004 16:37:46 [115088:12472]: further messages are in "error" and "trace"
   11/08/2004 16:37:46 [115088:12472]: execvp(/home/ah114088/bin/starter.sh, "-starter.sh" "/cod_home/ah114088/SGE60/default/spool/durin/job_scripts/189" "5")
   11/08/2004 16:37:51 [115088:12471]: wait3 returned 12472 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
   11/08/2004 16:37:51 [115088:12471]: job exited with exit status 0
   11/08/2004 16:37:51 [115088:12471]: reaped "job" with pid 12472
   11/08/2004 16:37:51 [115088:12471]: job exited not due to signal
   11/08/2004 16:37:51 [115088:12471]: job exited with status 0
   11/08/2004 16:37:51 [115088:12471]: now sending signal KILL to pid -12472
   11/08/2004 16:37:51 [115088:12471]: writing usage file to "usage"
   11/08/2004 16:37:51 [115088:12471]: no tasker to notify
   11/08/2004 16:37:51 [115088:12471]: no epilog script to start

The execvp(/home/ah114088/bin/starter.sh, ...) line proves the starter was used.
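
If you have many trace files to check, something like the following could
pull out the program the shepherd exec'd. This is only a sketch: the
function name starter_from_trace is made up, and it assumes the execvp
line has the same layout as in my trace. The demo runs on a temporary file
holding two lines taken from that trace:

```shell
# Hypothetical helper: print the first argument of execvp() found in a
# shepherd trace file. $1 is the path to the trace file.
starter_from_trace() {
    sed -n 's/.*execvp(\([^,]*\),.*/\1/p' "$1"
}

# Demo input: two lines taken from the trace shown in this mail.
demo_trace=$(mktemp)
cat > "$demo_trace" <<'EOF'
11/08/2004 16:37:46 [115088:12472]: closing all filedescriptors
11/08/2004 16:37:46 [115088:12472]: execvp(/home/ah114088/bin/starter.sh, "-starter.sh" "/cod_home/ah114088/SGE60/default/spool/durin/job_scripts/189" "5")
EOF

starter_from_trace "$demo_trace"
```

On a real host you would pass the path of the kept trace file, e.g. the
active_jobs/189.1/trace file from above, instead of the demo file.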

Btw, be sure to undo the KEEP_ACTIVE setting once you finish debugging!

Andreas


