[GE users] Suspend/Resume with MPICH-GM

Reuti reuti at staff.uni-marburg.de
Wed Mar 29 10:25:25 BST 2006



Hi,

On 29.03.2006 at 11:04, Andrew Beresford wrote:

> Hello,
>
> I'm having a problem with jobs running in our MPICH-GM PE.
>
> When I issue a qmod -sj <blah> to grid engine nothing seems to happen.
> This only seems to affect our MPICH PE, the jobs running under OpenMP
> seem to be fine.
>

Suspending a parallel job isn't implemented in SGE for the slave tasks,
and it might lead to timing problems anyway. - Reuti

> Here's an example of the pstree of the processes running on the workers:
>
> ├─scsi_eh_0
> ├─sge_execd───sge_shepherd───rshd───qrsh_starter───bash───fluent-run-mep0───fluent───fluent_gmpi.6.2
>
>
> If I try to stop the job running fluent_gmpi.6.2 by using qmod -sj,
> nothing happens.
>
> If I try to send a SIGSTOP to the bash process under qrsh_starter, again
> nothing happens.
>
> It only suspends if I send a SIGSTOP to the "fluent_gmpi.6.2".
>
> I'm unsure how SGE suspends processes. Does it just send a SIGSTOP to
> the single process at the top, or does it traverse the process tree and
> send SIGSTOP to all processes underneath qrsh_starter?
>
> Is there anything I can do to fix this?
>
> Cheers,
>
> Andrew
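
For illustration only, the "traverse the process tree" idea from the question can be sketched in Python as below. This is not how SGE suspends jobs internally; it is a hypothetical manual workaround for a single Linux node, and the helper names (read_parent_map, descendants, signal_tree) are made up for this example. It reads parent PIDs from /proc and sends a signal to a starting process and everything underneath it.

```python
import os
import signal


def read_parent_map(proc_root="/proc"):
    """Return {pid: ppid} for every process visible under proc_root (Linux only)."""
    parents = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                stat = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name sits in parentheses and may contain spaces,
        # so split after the last ')': the fields are then state, ppid, ...
        fields = stat[stat.rfind(")") + 1:].split()
        parents[int(entry)] = int(fields[1])
    return parents


def descendants(root_pid, parents):
    """Return all PIDs below root_pid, given a {pid: ppid} map."""
    children = {}
    for pid, ppid in parents.items():
        children.setdefault(ppid, []).append(pid)
    found, stack = [], [root_pid]
    while stack:
        pid = stack.pop()
        for child in children.get(pid, []):
            found.append(child)
            stack.append(child)
    return found


def signal_tree(root_pid, sig, parents=None, kill=os.kill):
    """Send sig to root_pid and every descendant; return the PIDs targeted."""
    if parents is None:
        parents = read_parent_map()
    targets = [root_pid] + descendants(root_pid, parents)
    for pid in targets:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            pass  # the process went away between the scan and the signal
    return targets
```

Assuming the PID of qrsh_starter on a worker is known, signal_tree(pid, signal.SIGSTOP) would stop that subtree and signal_tree(pid, signal.SIGCONT) would resume it. This has to be repeated on every node of the job, and the timing problems mentioned above still apply: MPI processes stopped on one node while their peers keep running may hit communication timeouts.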
