[GE users] Trying to use starter_method with sge 6.0

Robert Olson olson at mcs.anl.gov
Mon Nov 8 16:59:08 GMT 2004


Argh, I keep forgetting this changes things.

I am submitting jobs via the DRMAA interface. I find that if I submit 
using qsub, the starter_method works Just Fine. So apparently something in 
the way the DRMAA submit works changes it.

--bob

   Bob Olson   -*-   olson at mcs.anl.gov   -*-   Argonne National Laboratory
     "You can't win a game of chess with an action figure!" --bob&dave

On Mon, 8 Nov 2004, Robert Olson wrote:

> ah, I see. 
> 
> Now this is interesting. With the keep-exec stuff you pointed me at, I 
> find that if I set the starter_method to a script that doesn't exist, I 
> get this in the error file:
> 
> 11/08/2004 10:14:29 [762:12301]: unable to find shell "/home/olson/bin/run_gendb_tool_foo"
> 
> however, if it's set to something valid, it doesn't appear to get used:
> 
> 11/08/2004 10:12:29 [762:12212]: execvp(/Users/fig/FIGdisk/env/mac/bin/blastall, "/Users/fig/FIGdisk/env/mac/bin/blastall" "-d" "/Users/fig/FIGdisk/gendb/databases/fasta/nr" "-i" "/dev/fd/0" "-p" "blastp" "-m" "9")
> 11/08/2004 10:12:29 [762:12206]: wait3 returned 12212 (status: 6912; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 27)
> 
> The config file in the active_jobs dir does show the starter_method, but 
> it's not being used. Full trace attached at end. I also don't get the log 
> message that is present in the starter_method script.
> 
> Is it at all significant that the starter script is a perl script? I tried 
> adding perl to the shells list; no difference.
> 
> --bob
> 
> 
> 11/08/2004 10:25:43 [762:12553]: shepherd called with uid = 762, euid = 762
> 11/08/2004 10:25:43 [762:12553]: starting up 6.0u1
> 11/08/2004 10:25:43 [762:12553]: warning: starting not as root (uid=762)
> 11/08/2004 10:25:43 [762:12553]: setpgid(12553, 12553) returned 0
> 11/08/2004 10:25:43 [762:12554]: pid=12554 pgrp=12554 sid=12554 old pgrp=12553 getlogin()=<no login set>
> 11/08/2004 10:25:43 [762:12553]: forked "prolog" with pid 12554
> 11/08/2004 10:25:43 [762:12553]: using signal delivery delay of 120 seconds
> 11/08/2004 10:25:43 [762:12553]: child: prolog - pid: 12554
> 11/08/2004 10:25:43 [762:12554]: tried to change uid/gid without being root
> 11/08/2004 10:25:43 [762:12554]: try running further with uid=762
> 11/08/2004 10:25:43 [762:12554]: closing all filedescriptors
> 11/08/2004 10:25:43 [762:12554]: further messages are in "error" and "trace"
> 11/08/2004 10:25:43 [762:12554]: using "/bin/bash" as shell of user "olson"
> 11/08/2004 10:25:43 [762:12554]: execvp(/home/olson/SGE/transfer-prolog, "/home/olson/SGE/transfer-prolog" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.fas" "/tmp/208.1.tg/1.fas")
> 11/08/2004 10:25:44 [762:12553]: wait3 returned 12554 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
> 11/08/2004 10:25:44 [762:12553]: prolog exited with exit status 0
> 11/08/2004 10:25:44 [762:12553]: reaped "prolog" with pid 12554
> 11/08/2004 10:25:44 [762:12553]: prolog exited not due to signal
> 11/08/2004 10:25:44 [762:12553]: prolog exited with status 0
> 11/08/2004 10:25:44 [762:12559]: pid=12559 pgrp=12559 sid=12559 old pgrp=12553 getlogin()=<no login set>
> 11/08/2004 10:25:44 [762:12553]: forked "job" with pid 12559
> 11/08/2004 10:25:44 [762:12559]: setosjobid: uid = 762, euid = 762
> 11/08/2004 10:25:44 [762:12553]: child: job - pid: 12559
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_DATA setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_STACK setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_CORE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> 11/08/2004 10:25:44 [762:12559]: tried to change uid/gid without being root
> 11/08/2004 10:25:44 [762:12559]: try running further with uid=762
> 11/08/2004 10:25:44 [762:12559]: closing all filedescriptors
> 11/08/2004 10:25:44 [762:12559]: further messages are in "error" and "trace"
> 11/08/2004 10:25:44 [762:12559]: execvp(/Users/fig/FIGdisk/env/mac/bin/blastpgp, "/Users/fig/FIGdisk/env/mac/bin/blastpgp" "-d" "/Users/fig/FIGdisk/gendb/databases/fasta/sprot.fas" "-i" "/dev/fd/0" "-j" "5" "-m" "9")
> 11/08/2004 10:25:44 [762:12553]: wait3 returned 12559 (status: 6912; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 27)
> 11/08/2004 10:25:44 [762:12553]: job exited with exit status 27
> 11/08/2004 10:25:44 [762:12553]: reaped "job" with pid 12559
> 11/08/2004 10:25:44 [762:12553]: job exited not due to signal
> 11/08/2004 10:25:44 [762:12553]: job exited with status 27
> 11/08/2004 10:25:44 [762:12553]: now sending signal KILL to pid -12559
> 11/08/2004 10:25:44 [762:12553]: no tasker to notify
> 11/08/2004 10:25:44 [762:12553]: failed starting job
> 11/08/2004 10:25:44 [762:12560]: pid=12560 pgrp=12560 sid=12560 old pgrp=12553 getlogin()=<no login set>
> 11/08/2004 10:25:44 [762:12553]: forked "epilog" with pid 12560
> 11/08/2004 10:25:44 [762:12553]: using signal delivery delay of 120 seconds
> 11/08/2004 10:25:44 [762:12553]: child: epilog - pid: 12560
> 11/08/2004 10:25:44 [762:12560]: tried to change uid/gid without being root
> 11/08/2004 10:25:44 [762:12560]: try running further with uid=762
> 11/08/2004 10:25:44 [762:12560]: closing all filedescriptors
> 11/08/2004 10:25:44 [762:12560]: further messages are in "error" and "trace"
> 11/08/2004 10:25:44 [762:12560]: using "/bin/bash" as shell of user "olson"
> 11/08/2004 10:25:44 [762:12560]: execvp(/home/olson/SGE/transfer-epilog, "/home/olson/SGE/transfer-epilog" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.stdout" "/tmp/208.1.tg/1.stdout" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.stderr" "/tmp/208.1.tg/1.stderr")
> 11/08/2004 10:25:46 [762:12553]: wait3 returned 12560 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
> 11/08/2004 10:25:46 [762:12553]: epilog exited with exit status 0
> 11/08/2004 10:25:46 [762:12553]: reaped "epilog" with pid 12560
> 11/08/2004 10:25:46 [762:12553]: epilog exited not due to signal
> 11/08/2004 10:25:46 [762:12553]: epilog exited with status 0
> 11/08/2004 10:25:46 [762:12553]: no tasker to notify
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> For additional commands, e-mail: users-help at gridengine.sunsource.net
> 
> 
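
A note on reading the wait3 lines in the trace above: the raw "status: 6912"
and "WEXITSTATUS: 27" are the same value seen two ways, since on the usual
Unix encoding the exit code sits in the high byte of the wait status
(6912 = 27 * 256). A minimal Python sketch, assuming a Linux or Mac OS X host
where that encoding applies; the variable names are made up for illustration:

```python
import os

# Raw wait status copied from the shepherd trace ("status: 6912").
raw_status = 6912

# Decode it with the standard wait-status helpers from the os module.
exited = os.WIFEXITED(raw_status)      # child terminated via exit()
signaled = os.WIFSIGNALED(raw_status)  # child was killed by a signal
exit_code = os.WEXITSTATUS(raw_status) # exit code, valid when exited is true

print("WIFEXITED:", exited)
print("WIFSIGNALED:", signaled)
print("WEXITSTATUS:", exit_code)
```

This only confirms that the child exited on its own with code 27 rather than
being killed by a signal; the decoding says nothing about what 27 signifies.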
