[GE users] question on custom 'suspend method' ...

TRAN Chanh chanh.tran at dassault-aviation.fr
Fri May 27 13:50:19 BST 2005


Reuti wrote:

>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>
>>Reuti wrote:
>>
>>>Is there any error message in the messages file (of the qmaster or the
>>>node)?
>>
>>Sorry to ask, but where can I check these messages?
>>
>
>By default these are located in $SGE_ROOT/default/spool/qmaster and
>$SGE_ROOT/default/spool/<name_of_node>, unless you have defined a custom
>spool directory.
>- Reuti
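A minimal sketch for searching both files for a keyword such as "suspend". It assumes the default spool layout described above, with the cell name taken from SGE_CELL (falling back to "default"). The function name sge_grep_messages is hypothetical. Spool directories are often local to the qmaster and to each execution host, so run it on the host where the file in question actually lives.

```shell
#!/bin/sh
# Hypothetical helper: show lines matching a pattern (case-insensitive)
# in the qmaster and execution-host 'messages' files.
# Assumes the default spool layout $SGE_ROOT/$SGE_CELL/spool/...
sge_grep_messages() {
  pattern=$1
  # execd spool directories are named after the host; pass the name
  # explicitly if 'uname -n' returns a different form (e.g. FQDN)
  node=${2:-$(uname -n)}
  root=${SGE_ROOT:?SGE_ROOT is not set}   # abort if SGE_ROOT is unset
  cell=${SGE_CELL:-default}
  for f in "$root/$cell/spool/qmaster/messages" \
           "$root/$cell/spool/$node/messages"; do
    if [ -r "$f" ]; then
      # Prefix each matching line with the file it came from
      grep -i -- "$pattern" "$f" | sed "s|^|$f: |"
    else
      echo "not readable here: $f" >&2
    fi
  done
}
```

For example, `sge_grep_messages suspend <name_of_node>` lists every line mentioning "suspend" in both files that are readable on the current host.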
>

I checked the 'messages' files on the qmaster and on the node and found
nothing related to my script.

>>>Did you also try your script interactively on the execution nodes? Maybe
>>>/nfs is mounted there without "exec"? - Reuti
>>>
>>I tried running my script on the execution nodes; it works.
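One way to check the "noexec" suspicion on a Linux execution host is to look at the mount options of /nfs in /proc/mounts. The helper below is a hypothetical sketch; /proc/mounts is Linux-specific, so on other platforms the output of 'mount' would have to be parsed instead.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the filesystem mounted
# exactly at $1 carries the "noexec" option.
# $2 optionally names the mount table to read (default: /proc/mounts).
is_noexec() {
  mnt=$1
  table=${2:-/proc/mounts}
  # Mount table fields: device mountpoint fstype options dump pass
  awk -v m="$mnt" '$2 == m { print $4 }' "$table" | tr ',' '\n' | grep -qx noexec
}
```

Note that running the script as 'ksh /nfs/suspend.sh' succeeds even on a noexec mount, because the file is then only read by ksh; calling it by its full path (/nfs/suspend.sh) is the closer test of how a configured method gets started.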
>>
>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>
>>>>Reuti wrote:
>>>>
>>>>>One thing I just noticed: changes to the queue only take effect for a
>>>>>job if they are made before the job starts to run on the node. Changing
>>>>>the suspend method in the queue definition while the job is already
>>>>>running will not cause it to be invoked. - Reuti
>>>>>
>>>>I also tried 'qmod -s job_id', with the same result.
>>>>This is how I proceeded:
>>>>
>>>>1. set up the method in the queue
>>>>2. submitted a job to the queue with 'qsub'
>>>>3. tested with 'qmod -s queue' and 'qmod -s job_id'
>>>>
>>>>In both cases, all I observed was my job's state changing from
>>>>'running' to 'suspended'.
>>>>
>>>>>TRAN Chanh wrote:
>>>>>
>>>>>>Reuti wrote:
>>>>>>
>>>>>>>Mmh, for me it's working (both the default and custom procedures).
>>>>>>>What exactly do you observe? E.g., with a running job, issue a
>>>>>>>'qmod -s ...' and log in to the node. Then 'ps -e f' should list the
>>>>>>>status of the job as 'T' for stopped (on Linux).
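Instead of scanning the 'ps -e f' output by eye, the state letter of a single process can be read on Linux from /proc/<pid>/status. A minimal sketch (the function name proc_state is hypothetical); the PIDs of the job's processes can be taken from the subtree below sge_shepherd in the 'ps -e f' listing.

```shell
#!/bin/sh
# Hypothetical helper: print the state letter of process $1, matching the
# STAT column of ps (R running, S sleeping, T stopped, ...).
# Linux-specific: it reads the "State:" line of /proc/<pid>/status.
proc_state() {
  awk '/^State:/ { print $2 }' "/proc/$1/status"
}
```

After a 'qmod -s ...', calling proc_state on each of the job's PIDs should print T if the default SIGSTOP (or an equivalent custom method) took effect.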
>>>>>>>
>>>>>>>With a custom procedure, can you try to echo something to a file in
>>>>>>>your home directory? This way we can check whether the procedure is
>>>>>>>invoked at all.
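A sketch of such a debugging procedure: it does not suspend anything, it only appends a timestamp, its arguments and the calling user and host to a log file. The function name log_method_call and the variable SUSPEND_LOG are hypothetical; the default log location in the home directory follows the suggestion above, with /tmp as a fallback in case HOME is not set in the method's environment.

```shell
#!/bin/sh
# Hypothetical debugging helper for a custom suspend_method: it only
# records that it was called. SUSPEND_LOG is an assumed variable; by
# default the log goes to $HOME, or to /tmp if HOME is not set.
log_method_call() {
  log=${SUSPEND_LOG:-${HOME:-/tmp}/suspend_method.log}
  {
    date
    echo "called with: $*"
    echo "user: $(id -un)  host: $(uname -n)"
  } >> "$log"
}

# In the method script itself, end with:
#   log_method_call "$@"
#   exit 0
```

Passing pseudo-variables such as $job_id or $job_pid on the queue's suspend_method line makes them show up in the log, which also confirms what the execution daemon substitutes.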
>>>>>>>
>>>>>>>What platform are you on? - Reuti
>>>>>>>
>>>>>>I have my qmaster on 'AIX 5.2' and my execution platforms are on
>>>>>>'Linux RedHat Enterprise 3.0'.
>>>>>>I just double-checked my test case once more, which is:
>>>>>>
>>>>>>- a queue named 'queue.q' in which I defined a suspend method called
>>>>>>'/nfs/suspend.sh'
>>>>>>- /nfs/suspend.sh:
>>>>>>#!/bin/ksh
>>>>>>output=/nfs/test.out
>>>>>>date >| $output
>>>>>>echo suspend  >> $output
>>>>>>
>>>>>>- /nfs/suspend.sh is set to 777
>>>>>>- /nfs/test.out is set to 777
>>>>>>- I ran this script by hand to make sure it works, and saw the output
>>>>>>of 'date' and 'echo' in /nfs/test.out
>>>>>>- After running 'qmod -s queue.q' on my qmaster host, I saw no change
>>>>>>in /nfs/test.out.
>>>>>>
>>>>>>Chanh
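When a method writes to a shared file, as /nfs/suspend.sh does, a reference file created just before 'qmod -s' makes it easy to tell whether the file was touched afterwards. A minimal sketch with a hypothetical function name; it relies on the 'test -nt' operator, which bash, ksh and dash support. NFS attribute caching may delay a visible change by a few seconds.

```shell
#!/bin/sh
# Hypothetical helper: succeed if file $2 exists and was modified after
# the reference file $1 (fail if $2 does not exist at all).
changed_since() {
  [ -e "$2" ] && [ "$2" -nt "$1" ]
}

# Example, on a host that sees /nfs (the 10 second wait is arbitrary):
#   ref=$(mktemp)
#   qmod -s queue.q
#   sleep 10
#   changed_since "$ref" /nfs/test.out && echo "suspend method ran"
```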
>>>>>>
>>>>>>>TRAN Chanh wrote:
>>>>>>>
>>>>>>>>Reuti,
>>>>>>>>
>>>>>>>>Sorry for not mentioning that I did try 'qmod -s' to trigger the
>>>>>>>>'suspend method' but saw no effect. That's why I posted.
>>>>>>>>
>>>>>>>>Reuti wrote:
>>>>>>>>
>>>>>>>>>You can use it in 5.3p6, but there the command is always 'qmod -s ...'.
>>>>>>>>>It's this syntax which is deprecated in 6.0. - Reuti
>>>>>>>>>
>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>
>>>>>>>>>>If I understand you correctly, this means I can't have this behavior
>>>>>>>>>>under SGE 5.3p6...
>>>>>>>>>>
>>>>>>>>>>Thanks a lot anyway,
>>>>>>>>>>Cheers
>>>>>>>>>>
>>>>>>>>>>Reuti wrote:
>>>>>>>>>>
>>>>>>>>>>>Yes, 6.0 has the new options 'qmod -sj <job_id>' and
>>>>>>>>>>>'qmod -sq <queue_name>'. In addition, any configured suspend
>>>>>>>>>>>thresholds or queue subordination may invoke the suspend method. - Reuti
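The version difference can be summarized in a small helper that prints, rather than runs, the suspend command for a job or a queue. It is a hypothetical sketch that only encodes the syntax described in this thread: a single 'qmod -s' in 5.3, and the separate 'qmod -sj' and 'qmod -sq' options in 6.0.

```shell
#!/bin/sh
# Hypothetical helper: print the qmod command that suspends a job or queue.
# Usage: qmod_suspend_cmd MAJOR_VERSION job|queue TARGET
qmod_suspend_cmd() {
  major=$1
  kind=$2
  target=$3
  case $kind in
    job|queue) ;;
    *) echo "kind must be 'job' or 'queue', not '$kind'" >&2; return 1 ;;
  esac
  if [ "$major" -ge 6 ]; then
    # 6.0: explicit options for jobs and queues
    if [ "$kind" = job ]; then opt=-sj; else opt=-sq; fi
  else
    # 5.3: one 'qmod -s' for both job IDs and queue names
    opt=-s
  fi
  echo "qmod $opt $target"
}
```

For example, `qmod_suspend_cmd 5 job 4711` prints `qmod -s 4711`, while `qmod_suspend_cmd 6 job 4711` prints `qmod -sj 4711`.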
>>>>>>>>>>>
>>>>>>>>>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>>>>>>>>>
>>>>>>>>>>>>Reuti wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>Chanh,
>>>>>>>>>>>>>
>>>>>>>>>>>>>the methods will be invoked when, e.g., a job has to be suspended.
>>>>>>>>>>>>
>>>>>>>>>>>>Reuti,
>>>>>>>>>>>>
>>>>>>>>>>>>What puts a job into the "has to be suspended" state? Can this be
>>>>>>>>>>>>triggered by 'qmod -s job_id', 'qmod -s queue_name' or something
>>>>>>>>>>>>similar?
>>>>>>>>>>>>
>>>>>>>>>>>>>The default action in this case is to send a SIGSTOP to the whole
>>>>>>>>>>>>>process group. If you define your own procedure, you can use some
>>>>>>>>>>>>>special variables, which give you e.g. the PID, and do any cleanup or
>>>>>>>>>>>>>other things that are necessary (see man queue_conf):
>>>>>>>>>>>>>
>>>>>>>>>>>>>suspend_method /usr/sge/mysuspend $job_pid
>>>>>>>>>>>>>
>>>>>>>>>>>>>and the script:
>>>>>>>>>>>>>
>>>>>>>>>>>>>#!/bin/sh
>>>>>>>>>>>>>kill -stop -- -$1
>>>>>>>>>>>>>exit 0
>>>>>>>>>>>>>
>>>>>>>>>>>>>This should behave like the built-in default when you suspend a job.
>>>>>>>>>>>>>- Reuti
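Along the same lines, a sketch of matching suspend and resume procedures, assuming the queue passes the job's PID through $job_pid as in the suspend_method line above. Like the script above, it signals the whole process group by negating the ID; the resume side sends SIGCONT. The function name signal_job_group and the KILL override (a dry-run hook) are hypothetical.

```shell
#!/bin/sh
# Sketch: send signal $1 to the process group whose ID is $2.
# Assumes, like the script above, that the job's PID ($job_pid) is also
# its process-group ID. KILL may name another command for a dry run
# (hypothetical hook); it defaults to the shell's kill.
signal_job_group() {
  sig=$1
  pgid=$2
  # Refuse empty or non-numeric IDs
  case $pgid in
    ''|*[!0-9]*) echo "usage: signal_job_group SIGNAL PGID" >&2; return 2 ;;
  esac
  # 'kill -- -0' would hit our own group and 'kill -- -1' every process
  # we may signal, so never pass 0 or 1
  if [ "$pgid" -le 1 ]; then
    echo "refusing to signal process group $pgid" >&2
    return 2
  fi
  ${KILL:-kill} -"$sig" -- "-$pgid"
}

# /usr/sge/mysuspend would then end with:  signal_job_group STOP "$1"; exit 0
# a resume method, e.g. "resume_method /usr/sge/myresume $job_pid", with:
#                                          signal_job_group CONT "$1"; exit 0
```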
>>>>>>>>>>>>>
>>>>>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>I have problems understanding how the custom 'suspend / resume / terminate'
>>>>>>>>>>>>>>methods in a queue configuration work.
>>>>>>>>>>>>>>How are these methods related to the 'suspend/resume' action on a
>>>>>>>>>>>>>>queue via 'qmon'?
>>>>>>>>>>>>>>More precisely, what I'm trying to do is to have these methods
>>>>>>>>>>>>>>triggered via 'suspend / resume' from 'qmon'.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Could someone please give me some insight into this matter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Thanks in advance,
>>>>>>>>>>>>>>Chanh
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>---------------------------------------------------------------------
>>>>>>>>>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>>>>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net