[GE users] job not suspending?

Reuti reuti at staff.uni-marburg.de
Thu Sep 4 14:48:11 BST 2008


Hi,

On 04.09.2008 at 15:35, Davide Cittaro wrote:

> Hi all, I've noticed that a job I have has been stopped because it  
> overloaded a queue:
>
> low_priority.q@itanium1.bioinf BIP   6/6       3.53     lx26-ia64     aA
> 	gl:disk_array1=0.040000
> 	gl:disk_array2=28.870000
>  162434 0.05000 srsbuild   root         T     09/04/2008 15:00:12  SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
> ----------------------------------------------------------------------------
> low_priority.q@itanium2.bioinf BIP   6/6       2.92     lx26-ia64     aA
> 	gl:disk_array1=0.040000
> 	gl:disk_array2=28.870000
>  162434 0.05000 srsbuild   root         T     09/04/2008 15:00:12  MASTER
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>                                                                    SLAVE
>
>
> Nevertheless, on each node I can clearly see that the processes are in
> R status; they are marked as stopped only on the MASTER (in this case
> itanium2)...
> This is a qmake job:

This is by design in SGE. To suspend a parallel job you have to use a
custom suspend_method. It is called on the node where the job script
runs and must then ssh to each node to put the processes there to
sleep. The same goes for waking them up again with a resume_method.
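
As a rough illustration of such a method, here is a minimal sketch and not a tested recipe. It assumes things that are not stated in this thread: that the PE hostfile path is handed to the script (for example via $PE_HOSTFILE, if that variable is visible to the method on your installation), that passwordless ssh works between the nodes, and that the job's processes can be matched by exact name with pkill. The names pe_hosts, signal_job_on_hosts, DRY_RUN and my_worker are hypothetical.

```shell
#!/bin/sh
# Sketch of a helper for a custom suspend_method / resume_method.
# ASSUMPTIONS (not from the thread): the PE hostfile has the hostname
# in its first column, passwordless ssh works between the nodes, and
# the job's processes can be matched by exact name with pkill -x.

# Print each host listed in the PE hostfile exactly once.
pe_hosts() {
    awk 'NF > 0 && !seen[$1]++ { print $1 }' "$1"
}

# Send signal $1 (STOP or CONT) to processes named $3 on every host
# listed in hostfile $2. With DRY_RUN=1 the ssh commands are only
# printed, which is useful for checking before touching a live job.
signal_job_on_hosts() {
    sig=$1
    hostfile=$2
    procname=$3
    for host in $(pe_hosts "$hostfile"); do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $host pkill -$sig -x $procname"
        else
            # -n keeps ssh from reading the script's stdin
            ssh -n "$host" pkill "-$sig" -x "$procname"
        fi
    done
}
```

A suspend script would then end with something like `signal_job_on_hosts STOP "$PE_HOSTFILE" my_worker`, and the matching resume script with `CONT` instead of `STOP`. Matching by process name is crude: it also hits unrelated processes of the same name on those nodes, so a real setup would want a narrower selection.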

-- Reuti

> qmake -N srsbuild -cwd -V -l arch=lx26-ia64 -l low=TRUE -pe make 1-20 -- -f $SRSETC/srsupdmakefile all
>
> How can I fix this?
> d
>
> /*
> Davide Cittaro
>
> Cogentech - Consortium for Genomic Technologies
> via adamello, 16
> 20139 Milano
> Italy
>
> tel.: +39(02)574303007
> e-mail: davide.cittaro at ifom-ieo-campus.it
> */
>
>
>

