[GE users] Trying to use starter_method with sge 6.0

Andreas Haas Andreas.Haas at Sun.COM
Mon Nov 8 17:20:51 GMT 2004


Interesting.

Does a qsub -b y job use the starter method? DRMAA assumes
'drmaa_remote_command' to be available on the target machine, thus
DRMAA jobs are always started with '-b y' implicitly.
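
One way to check this is to submit the same job twice with plain qsub,
once in script mode and once with '-b y', and compare whether the
starter_method log message shows up. The sketch below only builds and
prints the two command lines; it does not submit anything. The helper
name qsub_argv and the job path are placeholders made up for
illustration, not part of Grid Engine.

```python
import shlex


def qsub_argv(command, args=(), binary=False):
    """Build a qsub argument vector; binary=True adds '-b y'."""
    argv = ["qsub"]
    if binary:
        # Binary mode: the command is expected to exist on the
        # execution host, which is what DRMAA submissions imply.
        argv += ["-b", "y"]
    return argv + [command, *args]


# Placeholder path (assumption): substitute a real job here.
job = "/path/to/job"

script_mode = qsub_argv(job)               # default submission
binary_mode = qsub_argv(job, binary=True)  # what DRMAA implies

print(shlex.join(script_mode))
print(shlex.join(binary_mode))
```

If the starter_method runs for the first form but not the second, the
difference is in how '-b y' jobs are started rather than in DRMAA itself.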

Andreas

On Mon, 8 Nov 2004, Robert Olson wrote:

> Argh, I keep forgetting this changes things.
>
> I am submitting jobs via the DRMAA interface. I find that if I submit
> using qsub, the starter_method works Just Fine. So apparently something in
> the way the DRMAA submit works changes it.
>
> --bob
>
>    Bob Olson   -*-   olson at mcs.anl.gov   -*-   Argonne National Laboratory
>      "You can't win a game of chess with an action figure!" --bob&dave
>
> On Mon, 8 Nov 2004, Robert Olson wrote:
>
> > ah, I see.
> >
> > Now this is interesting. with the keep-exec stuff you pointed me at, I
> > find that if I set the starter_method to a script that doesn't exist, I
> > get this in the error file:
> >
> > 11/08/2004 10:14:29 [762:12301]: unable to find shell
> > "/home/olson/bin/run_gendb_tool_foo"
> >
> > however, if it's set to something valid, it doesn't appear to get used:
> >
> > 11/08/2004 10:12:29 [762:12212]:
> > execvp(/Users/fig/FIGdisk/env/mac/bin/blastall,
> > "/Users/fig/FIGdisk/env/mac/bin/blastall" "-d"
> > "/Users/fig/FIGdisk/gendb/databases/fasta/nr" "-i" "/dev/fd/0" "-p"
> > "blastp" "-m" "9")
> > 11/08/2004 10:12:29 [762:12206]: wait3 returned 12212 (status: 6912;
> > WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 27)
> >
> > the config file in the active_jobs dir does show the starter_method, but
> > it's not being used. Full trace attached at end. I also don't get the log
> > message that is present in the starter_method script.
> >
> > Is it at all significant that the starter script is a perl script? I tried
> > adding perl to the shells list; no difference.
> >
> > --bob
> >
> >
> > 11/08/2004 10:25:43 [762:12553]: shepherd called with uid = 762, euid = 762
> > 11/08/2004 10:25:43 [762:12553]: starting up 6.0u1
> > 11/08/2004 10:25:43 [762:12553]: warning: starting not as root (uid=762)
> > 11/08/2004 10:25:43 [762:12553]: setpgid(12553, 12553) returned 0
> > 11/08/2004 10:25:43 [762:12554]: pid=12554 pgrp=12554 sid=12554 old pgrp=12553 getlogin()=<no login set>
> > 11/08/2004 10:25:43 [762:12553]: forked "prolog" with pid 12554
> > 11/08/2004 10:25:43 [762:12553]: using signal delivery delay of 120 seconds
> > 11/08/2004 10:25:43 [762:12553]: child: prolog - pid: 12554
> > 11/08/2004 10:25:43 [762:12554]: tried to change uid/gid without being root
> > 11/08/2004 10:25:43 [762:12554]: try running further with uid=762
> > 11/08/2004 10:25:43 [762:12554]: closing all filedescriptors
> > 11/08/2004 10:25:43 [762:12554]: further messages are in "error" and "trace"
> > 11/08/2004 10:25:43 [762:12554]: using "/bin/bash" as shell of user "olson"
> > 11/08/2004 10:25:43 [762:12554]: execvp(/home/olson/SGE/transfer-prolog, "/home/olson/SGE/transfer-prolog" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.fas" "/tmp/208.1.tg/1.fas")
> > 11/08/2004 10:25:44 [762:12553]: wait3 returned 12554 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
> > 11/08/2004 10:25:44 [762:12553]: prolog exited with exit status 0
> > 11/08/2004 10:25:44 [762:12553]: reaped "prolog" with pid 12554
> > 11/08/2004 10:25:44 [762:12553]: prolog exited not due to signal
> > 11/08/2004 10:25:44 [762:12553]: prolog exited with status 0
> > 11/08/2004 10:25:44 [762:12559]: pid=12559 pgrp=12559 sid=12559 old pgrp=12553 getlogin()=<no login set>
> > 11/08/2004 10:25:44 [762:12553]: forked "job" with pid 12559
> > 11/08/2004 10:25:44 [762:12559]: setosjobid: uid = 762, euid = 762
> > 11/08/2004 10:25:44 [762:12553]: child: job - pid: 12559
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_CPU setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_FSIZE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_DATA setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_STACK setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_CORE setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_VMEM/RLIMIT_AS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: RLIMIT_RSS setting: (soft 18446744073709551615 hard 18446744073709551615) resulting: (soft 18446744073709551615 hard 18446744073709551615)
> > 11/08/2004 10:25:44 [762:12559]: tried to change uid/gid without being root
> > 11/08/2004 10:25:44 [762:12559]: try running further with uid=762
> > 11/08/2004 10:25:44 [762:12559]: closing all filedescriptors
> > 11/08/2004 10:25:44 [762:12559]: further messages are in "error" and "trace"
> > 11/08/2004 10:25:44 [762:12559]: execvp(/Users/fig/FIGdisk/env/mac/bin/blastpgp, "/Users/fig/FIGdisk/env/mac/bin/blastpgp" "-d" "/Users/fig/FIGdisk/gendb/databases/fasta/sprot.fas" "-i" "/dev/fd/0" "-j" "5" "-m" "9")
> > 11/08/2004 10:25:44 [762:12553]: wait3 returned 12559 (status: 6912; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 27)
> > 11/08/2004 10:25:44 [762:12553]: job exited with exit status 27
> > 11/08/2004 10:25:44 [762:12553]: reaped "job" with pid 12559
> > 11/08/2004 10:25:44 [762:12553]: job exited not due to signal
> > 11/08/2004 10:25:44 [762:12553]: job exited with status 27
> > 11/08/2004 10:25:44 [762:12553]: now sending signal KILL to pid -12559
> > 11/08/2004 10:25:44 [762:12553]: no tasker to notify
> > 11/08/2004 10:25:44 [762:12553]: failed starting job
> > 11/08/2004 10:25:44 [762:12560]: pid=12560 pgrp=12560 sid=12560 old pgrp=12553 getlogin()=<no login set>
> > 11/08/2004 10:25:44 [762:12553]: forked "epilog" with pid 12560
> > 11/08/2004 10:25:44 [762:12553]: using signal delivery delay of 120 seconds
> > 11/08/2004 10:25:44 [762:12553]: child: epilog - pid: 12560
> > 11/08/2004 10:25:44 [762:12560]: tried to change uid/gid without being root
> > 11/08/2004 10:25:44 [762:12560]: try running further with uid=762
> > 11/08/2004 10:25:44 [762:12560]: closing all filedescriptors
> > 11/08/2004 10:25:44 [762:12560]: further messages are in "error" and "trace"
> > 11/08/2004 10:25:44 [762:12560]: using "/bin/bash" as shell of user "olson"
> > 11/08/2004 10:25:44 [762:12560]: execvp(/home/olson/SGE/transfer-epilog, "/home/olson/SGE/transfer-epilog" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.stdout" "/tmp/208.1.tg/1.stdout" "1" "a6" "/Users/fig/FIGdisk/FIG/Tmp/cCydH4EW/1.stderr" "/tmp/208.1.tg/1.stderr")
> > 11/08/2004 10:25:46 [762:12553]: wait3 returned 12560 (status: 0; WIFSIGNALED: 0,  WIFEXITED: 1, WEXITSTATUS: 0)
> > 11/08/2004 10:25:46 [762:12553]: epilog exited with exit status 0
> > 11/08/2004 10:25:46 [762:12553]: reaped "epilog" with pid 12560
> > 11/08/2004 10:25:46 [762:12553]: epilog exited not due to signal
> > 11/08/2004 10:25:46 [762:12553]: epilog exited with status 0
> > 11/08/2004 10:25:46 [762:12553]: no tasker to notify
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> > For additional commands, e-mail: users-help at gridengine.sunsource.net
> >
> >
>
>
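
On reading the quoted trace: the raw wait3 status of 6912 and the
reported WEXITSTATUS of 27 are the same value seen two ways. The sketch
below assumes the conventional Unix wait-status layout (low 7 bits hold
the terminating signal, the next byte holds the exit code); the function
name decode_wait_status is made up for illustration.

```python
def decode_wait_status(status):
    """Split a conventional 16-bit wait status into its fields."""
    term_signal = status & 0x7F       # 0 means a normal exit
    exit_code = (status >> 8) & 0xFF  # valid only for a normal exit
    return {
        "exited": term_signal == 0,
        # 0x7F in the low bits marks a stopped child, not a signal death
        "signaled": term_signal not in (0, 0x7F),
        "exit_status": exit_code,
    }


# Status value taken from the shepherd trace quoted above.
print(decode_wait_status(6912))
```

For 6912 this gives exited=True, signaled=False, exit_status=27, which
matches the WIFEXITED, WIFSIGNALED and WEXITSTATUS fields in the trace.
So the job itself ran and returned 27; it was not killed by a signal.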
