[GE users] question on custom 'suspend method' ...

TRAN Chanh chanh.tran at dassault-aviation.fr
Mon May 30 18:01:09 BST 2005


Reuti wrote:

>Okay, can you try a procedure:
>
>#!/bin/sh
>while [ 1 ]; do :; done
>
>for the suspend_method. A "ps -e f" on the node should show something like:
>
> 7747 ?        S      3:34 /usr/sge/bin/lx26-x86/sge_execd
> 1738 ?        S      0:00  \_ sge_shepherd-3038 -bg
> 1739 ?        Ss     0:00      \_ /bin/sh /var/spool/sge/x/job_scripts/3038
> 1741 ?        R      6:11      |   \_ /home/reuti/ever
> 1756 ?        R      0:06      \_ /bin/sh /home/reuti/abcdef
>
>CU - Reuti
>
I tried your procedure and got the same output from 'ps -ef' before and after
the 'suspend' ...
BTW, I tried the same thing with SGE 6.0u3 and it works!
So I'm going to give up on this with SGE 5.3p6 and upgrade to SGE 6.0u3/4 ASAP.

Thanks for your help anyway,
Chanh
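
For anyone repeating the before/after 'ps' comparison, here is a minimal sketch of reading a process state around a stop signal. It is a stand-alone illustration, not anything SGE ships: the helper name proc_state is made up for this example, and a plain 'sleep' stands in for the job script. On Linux a stopped process should report a state starting with 'T'.

```shell
#!/bin/sh
# proc_state PID: print the first letter of the process state as
# reported by ps (R = running, S = sleeping, T = stopped).
# Hypothetical helper name, chosen for this example only.
proc_state() {
    ps -o stat= -p "$1" | tr -d ' ' | cut -c1
}

# A plain sleep stands in for the job script.
sleep 60 &
pid=$!

before=$(proc_state "$pid")
kill -s STOP "$pid"        # stop the stand-in process
sleep 1                    # give the signal a moment to take effect
after=$(proc_state "$pid")

# Clean up: resume the process, then terminate it.
kill -s CONT "$pid"
kill "$pid"

echo "before=$before after=$after"
```

If the state letter does not change between the two readings, the stop signal never reached the process.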

>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>
>>Reuti wrote:
>>
>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>
>>>>Reuti wrote:
>>>>
>>>>>Is there any error message in the messages file (of the qmaster or the
>>>>>node)?
>>>>
>>>>Sorry to ask, but where can I check these messages?
>>>>
>>>By default these are located in $SGE_ROOT/default/spool/qmaster and
>>>$SGE_ROOT/default/spool/<name_of_node>, unless you have defined a custom
>>>spool directory.
>>>- Reuti
>>>
>>I checked the 'messages' file on both the qmaster and the node and found
>>nothing related to my script ...
>>
>>>>>Did you also try your script interactively on the execution nodes? Maybe
>>>>>/nfs is mounted there without "exec"? - Reuti
>>>>
>>>>I tried running my script on the execution nodes, and it works.
>>>>
>>>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>>>
>>>>>>Reuti wrote:
>>>>>>
>>>>>>>One thing I just saw: changes to the queue are only picked up before
>>>>>>>the job starts to run on the node. Changing the suspend method in the
>>>>>>>queue definition while the job is already running will not invoke it.
>>>>>>>- Reuti
>>>>>>>
>>>>>>I also tried 'qmod -s job_id', with the same result.
>>>>>>The way I proceeded is:
>>>>>>
>>>>>>1. set up the method in the queue
>>>>>>2. 'qsub' a job to the queue
>>>>>>3. test with 'qmod -s queue' and 'qmod -s job_id'
>>>>>>
>>>>>>All I observed in both cases is that my job's state changed from
>>>>>>'running' to 'suspended' ...
>>>>>>
>>>>>>>TRAN Chanh wrote:
>>>>>>>
>>>>>>>>Reuti wrote:
>>>>>>>>
>>>>>>>>>Mmh, for me it's working (the default and also custom procedures).
>>>>>>>>>What do you observe in detail? E.g., with a running job, issue a
>>>>>>>>>'qmod -s ...' and log in to the node. Then 'ps -e f' should
>>>>>>>>>list the status of the job as 'T' for stopped (on Linux).
>>>>>>>>>
>>>>>>>>>With a custom procedure, can you try to echo something to a file
>>>>>>>>>in your home directory? This way we can check whether the
>>>>>>>>>procedure is invoked at all.
>>>>>>>>>
>>>>>>>>>What platform are you on? - Reuti
>>>>>>>>>
>>>>>>>>My 'qmaster' is on 'AIX 5.2' and my execution platforms are
>>>>>>>>on 'Linux RedHat Enterprise 3.0'.
>>>>>>>>I just double-checked my test case, which is:
>>>>>>>>
>>>>>>>>- a queue named 'queue.q' in which I defined a suspend method called
>>>>>>>>'/nfs/suspend.sh'
>>>>>>>>- /nfs/suspend.sh :
>>>>>>>>#!/bin/ksh
>>>>>>>>output=/nfs/test.out
>>>>>>>>date >| $output
>>>>>>>>echo suspend  >> $output
>>>>>>>>
>>>>>>>>- /nfs/suspend.sh is set to 777
>>>>>>>>- /nfs/test.out is set to 777
>>>>>>>>- I ran this script by hand to make sure it works, checking the traces
>>>>>>>>produced by 'date' and 'echo' in /nfs/test.out
>>>>>>>>- With 'qmod -s queue.q' executed from my 'qmaster' platform, I saw
>>>>>>>>no change in /nfs/test.out ...
>>>>>>>>
>>>>>>>>Chanh
>>>>>>>>
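
A trace script like the one in this test case can be made a little more informative. The sketch below is only an illustration under stated assumptions: the names TRACE_FILE and log_suspend are made up, and the node-local /tmp path is my choice, picked to take NFS mount options out of the picture. It appends one line per call and records the user and host, so the trace shows under which account and on which machine the method ran, if it ran at all.

```shell
#!/bin/sh
# Sketch of a suspend method that only leaves a trace of each call.
# TRACE_FILE and log_suspend are hypothetical names for this example;
# the default path is an assumption (node-local, not on NFS).
TRACE_FILE=${TRACE_FILE:-/tmp/suspend_trace.log}

log_suspend() {
    # Append instead of overwriting, so repeated calls stay visible.
    printf '%s suspend called by %s on %s with args: %s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$(uname -n)" "$*" \
        >> "$TRACE_FILE"
}

# Record this invocation together with whatever arguments were passed,
# e.g. the PID if the queue entry ends in $job_pid.
log_suspend "$@"
```

An empty or missing trace file after a 'qmod -s' then points at the method not being invoked, rather than at a problem inside the script.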
>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>
>>>>>>>>>>Reuti,
>>>>>>>>>>
>>>>>>>>>>Sorry for not having said that I did try 'qmod -s' to trigger the
>>>>>>>>>>'suspend method' but saw no effect ... That's why I posted ...
>>>>>>>>>>
>>>>>>>>>>Reuti wrote:
>>>>>>>>>>
>>>>>>>>>>>You can use it in 5.3p6, but the command is always 'qmod -s ...'.
>>>>>>>>>>>It's that syntax which is deprecated in 6.0. - Reuti
>>>>>>>>>>>
>>>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>>>
>>>>>>>>>>>>If I understand you correctly, this means I can't have this
>>>>>>>>>>>>behavior under SGE 5.3p6 ...
>>>>>>>>>>>>
>>>>>>>>>>>>Thanks a lot anyway,
>>>>>>>>>>>>Cheers
>>>>>>>>>>>>
>>>>>>>>>>>>Reuti wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>Yes, for 6.0 there are the new options 'qmod -sj <job_id>' and
>>>>>>>>>>>>>'qmod -sq <queue_name>'. Any suspend thresholds or subordinations
>>>>>>>>>>>>>that are set might also invoke the suspend method. - Reuti
>>>>>>>>>>>>>
>>>>>>>>>>>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Reuti wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Chanh,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>the methods will be invoked when a job e.g. has to be suspended.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Reuti,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>What leads a job to the state "has to be suspended"? Can this be
>>>>>>>>>>>>>>triggered by 'qmod -s job_id', 'qmod -s queue_name' or something
>>>>>>>>>>>>>>alike?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>The default action is to send a SIGSTOP to the whole process
>>>>>>>>>>>>>>>group in this case. If you define a procedure of your own, you
>>>>>>>>>>>>>>>can use some special variables, which will give you e.g. the
>>>>>>>>>>>>>>>PID, and do any cleanup or other things that are necessary (see
>>>>>>>>>>>>>>>man queue_conf):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>suspend_method /usr/sge/mysuspend $job_pid
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>and the script:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>#!/bin/sh
>>>>>>>>>>>>>>>kill -stop -- -$1
>>>>>>>>>>>>>>>exit 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Should behave like the default built-in if you suspend a job. 
>>>>>>>>>>>>>>>- Reuti
>>>>>>>>>>>>>>>
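
A slightly more defensive variant of the script above might look like the following. It is a sketch under assumptions, not the built-in behaviour: the function name stop_group is made up, and it assumes, as in the queue entry above, that the job's PID arrives as the first argument. It uses the POSIX spelling 'kill -s STOP' and refuses arguments that are empty, non-numeric, 0 or 1, because "-1" as a target would address every process the caller may signal.

```shell
#!/bin/sh
# Sketch of a custom suspend method in the spirit of the script above.
# stop_group is a hypothetical name used only in this example.
stop_group() {
    pgid=$1
    # Accept only a plain number greater than 1. This keeps a missing or
    # unexpanded argument from turning into a much broader kill target.
    case $pgid in
        ''|*[!0-9]*|0|1)
            echo "usage: stop_group <pid>" >&2
            return 2
            ;;
    esac
    # A negative number addresses the whole process group; "--" keeps
    # kill from reading "-1234" as an option.
    kill -s STOP -- "-$pgid"
}

# When run as a script with an argument, act on it.
if [ $# -gt 0 ]; then
    stop_group "$1"
fi
```

Any cleanup that has to happen before the job is stopped would go ahead of the kill line.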
>>>>>>>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>I have a problem understanding how the custom 'suspend / resume /
>>>>>>>>>>>>>>>>terminate' methods in a queue configuration work.
>>>>>>>>>>>>>>>>How are these methods related to the 'suspend/resume' action on a
>>>>>>>>>>>>>>>>queue via 'qmon'?
>>>>>>>>>>>>>>>>More precisely, what I'm trying to do is to have these methods
>>>>>>>>>>>>>>>>triggered via 'suspend / resume' from 'qmon' ...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Could someone please give me some insight on this matter?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Thanks in advance,
>>>>>>>>>>>>>>>>Chanh
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>---------------------------------------------------------------------
>>>>>>>>>>>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>>>>>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net




More information about the gridengine-users mailing list