[GE users] debugging tight integration

pollinger harald.pollinger at sun.com
Thu Dec 17 17:07:08 GMT 2009


fx wrote:
> pollinger <harald.pollinger at sun.com> writes:
> 
>> I don't know if this was already answered, but when a process was 
>> started by SGE and is not a child of the shepherd while it's running, 
>> then it has detached itself from the shepherd.
>> What's the parent process ID of these processes?
> 
> They (`gamess.64.x' in the pstree below) are just children of init:
> 
>   init-+-acpid
>        |-agetty
>        |-console-kit-dae---61*[{console-kit-dae}]
>        |-cron
>        |-dbus-daemon
>        |-dhcpcd
>        |-4*[gamess.64.x]
>        |-gmond
>        |-hald---hald-runner-+-hald-addon-acpi
>        |                    `-hald-addon-inpu
>        |-irqbalance
>        |-klogd
>        |-master-+-pickup
>        |        `-qmgr
>        |-6*[mingetty]
>        |-nscd---8*[{nscd}]
>        |-ntpd
>        |-portmap
>        |-rpc.statd
>        |-scoutd.exe
>        |-sge_execd---sge_shepherd---rshd---qrsh_starter
>   ...
> 
> [I realize there's junk running as I haven't properly purged our
> so-called integrator's setup yet.]
> 
> I don't see anything odd in the source from a quick look, and was
> particularly interested in typical things that might cause this from
> experience, in the hope of avoiding all the work of debugging it
> systematically.

So the process chain from the sge_execd to the qrsh_starter is fine, but 
the job itself (gamess.64.x) is not a child of the qrsh_starter, but a 
child of init. And I'm missing a shell at the end of the process chain. 
Did you specify the "-shell no" option to qrsh?

It seems either the job script exited/died or gamess daemonized itself. 
But then I'm wondering why the qrsh_starter doesn't quit.

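You could replace gamess with a script like this:
#!/bin/sh
# Stand-in for gamess: stays in the foreground and never detaches.

echo "starting"
sleep 100    # long enough to inspect the process tree with pstree
echo "done"
exit 0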


and start it with exactly the same command line. If it runs fine and 
shows up as a child (or a child of a child) of the qrsh_starter, then 
gamess itself is doing something wrong.
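
To check the parentage without reading the whole pstree output, something like the following might help. It is only a sketch: it assumes a Linux /proc filesystem, and the function name `ancestors` is made up for illustration and is not part of SGE. It walks from a given PID up through its parents and prints one "pid name" pair per line.

```shell
#!/bin/sh
# Print the ancestor chain of a PID, one "pid name" pair per line,
# starting with the process itself and walking up through its parents.
# Assumption: Linux, where /proc/<pid>/status has Name: and PPid: fields.
# "ancestors" is a hypothetical helper name, not an SGE tool.
ancestors() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2; exit}' "/proc/$pid/status")
        echo "$pid $name"
        # Move up to the parent; the loop ends once the parent PID is 0.
        pid=$(awk '/^PPid:/ {print $2; exit}' "/proc/$pid/status")
    done
}
```

Run it on the execution host with the PID of the test script or of one of the gamess.64.x processes. If the printed chain goes straight to init and never lists qrsh_starter, the process has been detached (or its original parent has exited).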


Regards,
Harald




-- 
Sun Microsystems GmbH         Harald Pollinger
Dr.-Leo-Ritter-Str. 7         Sun Grid Engine Engineering
D-93049 Regensburg            Phone: +49 (0)941 3075-209  (x60209)
Germany                       Fax: +49 (0)941 3075-222  (x60222)
http://www.sun.com/gridware
mailto:harald.pollinger at sun.com
Registered office:
Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Munich District Court (Amtsgericht Muenchen): HRB 161028
Managing directors: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Chairman of the supervisory board: Martin Haering

------------------------------------------------------
http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=233952
