[GE users] jobs never die on nodes with mpich

Reuti reuti at staff.uni-marburg.de
Thu Aug 12 16:54:58 BST 2004


Bogdan,

Who sets the process group of a new command/bash?

Just an interactive test: sleep 100

  PID TT       USER     TMOUT   F WCHAN   PGRP COMMAND
30604 pts/1    reuti        - 004 wait4  30604      \_ -bash
30811 pts/1    reuti        - 000 nanosl 30811      |   \_ sleep 100

Should the process group be the same as that of the starting bash?
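
As a rough illustration of the question above, here is a minimal Python sketch (not anything taken from SGE itself; the helper name child_pgid is made up for this example). It assumes a POSIX system: a forked child inherits the process group of its parent unless someone calls setpgid(), which is what an interactive shell with job control normally does for each command it starts.

```python
import os


def child_pgid(new_group: bool) -> tuple[int, int]:
    """Fork a child; return (process group id seen by the child, child pid).

    child_pgid is a hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: optionally become the leader of a new process group,
        # the way a job-control shell does for the commands it starts.
        os.close(read_fd)
        if new_group:
            os.setpgid(0, 0)
        os.write(write_fd, str(os.getpgrp()).encode())
        os.close(write_fd)
        os._exit(0)
    # Parent: read the group id reported by the child, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data), pid


if __name__ == "__main__":
    print("parent pgrp:          ", os.getpgrp())
    print("plain fork (pgid,pid):", child_pgid(False))
    print("setpgid    (pgid,pid):", child_pgid(True))
```

With a plain fork the child reports the parent's group; with setpgid(0, 0) the child's group id equals its own pid, as in the sleep 100 line of the listing above.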

On the slave node, the job-specific rshd runs as root. Should it set a 
process group id for the qrsh_starter and the first command started by 
qrsh_starter?

How will any of them know that a new bash was created (forked)? Hence my 
idea of avoiding the bash-in-the-middle. But I must admit that I have also 
seen chains of processes on the nodes and was confused that they all had the 
same process group id, until I discovered that these were threads and not 
forks (e.g. Linda creates threads on the slaves).

Should the process group id be inherited from the starting process, i.e. 
qrsh_starter?

BTW: The Myrinet MPICH seems to have a different startup than ch_p4. The 
Myrinet starting perl script calls qrsh n times, once for each of the n 
processes. ch_p4 starts the master process without rsh and uses qrsh only 
(n-1) times. For the PE definition, job_is_first_task can be set 
accordingly.

Cheers - Reuti
