[GE users] job suspension

Reuti reuti at staff.uni-marburg.de
Thu May 22 09:15:10 BST 2008


Hi,

On 22.05.2008 at 08:24, Viktor Oudovenko wrote:

> Hello to everybody,
>
> any ideas why job suspension does not work?
>
> I have SGE 6.0u4 running on a dual Athlon server.
> The job is parallel (tight integration).
> The queue status correctly changes to "S", but the job continues to run
> (so both jobs keep running).
> Please see below:
>
>  185157 5.00400 mpi_p1       user1        r     05/22/2008 02:06:11 wparallel1@sub04n103              64
>  185081 2.01388 mpi_p2       user2        S     05/21/2008 22:13:02 wparallel1_lp@sub04n103           64
>
> So the queue "wparallel3_lp" (low priority) is defined as a
> subordinate queue of wparallel3.
>
> It seems to me that when I created the "_lp" queue and tested job
> suspension under my account, it worked on the x86 architecture and did
> not work on Opterons, but now it does not work even on x86 machines.
>
> I found information on the net that 6.0u4 has a bug where jobs are not
> suspended after an sgemaster restart, but I have not restarted the
> master, only the compute nodes.

SGE sends SIGSTOP to the job's complete process group, so please
check whether the job's processes are in the correct group:

ps -e f -o pid,ppid,pgrp,command

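(Note the f without a leading dash: it selects the process-tree view,
whereas -f would select the full-format listing.) - Reuti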
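
On a busy node it can help to filter that listing by process group. The
following is only a minimal sketch and not part of SGE: the helper names
in_pgrp and outside_pgrp are made up for illustration, and 12345 and
mpi_p2 are placeholders for the job's real process group ID and program
name as read from the ps output.

```shell
# Both helpers read a "ps ... -o pid,ppid,pgrp,command" listing on stdin
# and always keep the header line (the first line).

# Keep every process whose PGRP column (3rd field) equals $1.
in_pgrp() {
    awk -v pg="$1" 'NR == 1 || $3 == pg'
}

# Keep every process matching pattern $2 whose PGRP differs from $1.
# These are the processes a signal sent to group $1 would not reach.
outside_pgrp() {
    awk -v pg="$1" -v pat="$2" 'NR == 1 || ($3 != pg && $0 ~ pat)'
}

# Example usage (12345 and mpi_p2 are placeholders):
#   ps -e f -o pid,ppid,pgrp,command | in_pgrp 12345
#   ps -e f -o pid,ppid,pgrp,command | outside_pgrp 12345 mpi_p2
```

If outside_pgrp lists any of the job's processes, those would keep
running while the job shows "S". As a manual cross-check, one could send
the signal to the group by hand with "kill -s STOP -- -12345" (the
negative number addresses the whole process group) and resume it with
"kill -s CONT -- -12345".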

> If you have any questions, please let me know.
> Any ideas are welcome.
>
> best,
> vic
> p.s.
> CLUSTER QUEUE    CQLOAD   USED  AVAIL  TOTAL aoACDS  cdsuE
> ----------------------------------------------------------
> wparallel1         1.82     64      0     64      0      0
> wparallel1_lp      1.82     64      0     64     64      0
>
>

