[GE users] question on custom 'suspend method' ...

TRAN Chanh chanh.tran at dassault-aviation.fr
Thu May 26 16:54:31 BST 2005


Reuti wrote:

>Is there any error message in the messages file (of the qmaster or the node)? 
>  
>
Sorry to ask, but where can I check these messages?
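
For reference, a minimal sketch of where these files usually live, assuming the default spool layout under $SGE_ROOT/$SGE_CELL. The fallback root "/usr/sge", the cell "default" and the host name "node01" are placeholders, not values from this installation. If the execution daemons spool to a local directory, the path on the node will differ ('qconf -sconf' shows 'execd_spool_dir').

```shell
#!/bin/sh
# Print the messages files to look at, assuming the default spool layout.
# SGE_ROOT, SGE_CELL and the host name passed in are placeholders.
sge_messages_files() {
    root="${SGE_ROOT:-/usr/sge}"
    cell="${SGE_CELL:-default}"
    host="$1"
    # Log written by the qmaster
    echo "$root/$cell/spool/qmaster/messages"
    # Log written by the execution daemon of the given node
    echo "$root/$cell/spool/$host/messages"
}

sge_messages_files node01
```

Running 'tail' on both files right after a 'qmod -s ...' should show whether the node tried to start the suspend method at all.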

>Did you also try your script interactively on the execution nodes? Maybe /nfs is 
>mounted there without "exec"? - Reuti
>  
>
I ran my script on the execution nodes and it works.
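
To rule out the mount option as well, a small check like the following could be run on an execution node. It is only a sketch: it assumes a Linux node where /proc/mounts is readable, and the function name 'mount_is_noexec' is made up for this example.

```shell
#!/bin/sh
# Report whether a mount point is listed with the "noexec" option.
# Expects Linux-style mount table lines: device mountpoint fstype options ...
# The optional second argument is the table to read (default: /proc/mounts).
mount_is_noexec() {
    awk -v mp="$1" '
        # If a mount point appears more than once, the last entry wins.
        $2 == mp {
            found = 0
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "noexec") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

if mount_is_noexec /nfs; then
    echo "/nfs is mounted noexec: scripts there cannot be executed directly"
else
    echo "/nfs is not listed as noexec"
fi
```

Since the script already runs by hand on the node, this should report that /nfs is not mounted noexec.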

>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>
>  
>
>>Reuti wrote:
>>
>>    
>>
>>>One thing I just saw: changes to the queue will only be accepted 
>>>before the job starts to run on the node. Changing the suspend 
>>>method in the queue definition while the job is already running 
>>>will not invoke it. - Reuti
>>>
>>>      
>>>
>>I also tried 'qmod -s job_id', with the same result.
>>The way I proceeded is:
>>
>>1. set up the method in the queue
>>2. 'qsub' a job to the queue
>>3. test with 'qmod -s queue' and 'qmod -s job_id'
>>
>>All I observed in both cases is that my job's state changed from 
>>'running' to 'suspended'.
>>
>>    
>>
>>>TRAN Chanh wrote:
>>>
>>>      
>>>
>>>>Reuti wrote:
>>>>
>>>>        
>>>>
>>>>>Hmm, for me it's working (the default and also custom procedures). 
>>>>>What do you observe in detail? E.g., with a running job, issue a 
>>>>>'qmod -s ...' and log in to the node. Then 'ps -e f' should 
>>>>>list the status of the job as 'T' for stopped (on Linux).
>>>>>
>>>>>With a custom procedure, can you try to echo something to a file 
>>>>>in your home directory? This way we can check whether the 
>>>>>procedure is invoked at all.
>>>>>
>>>>>What platform are you on? - Reuti
>>>>>
>>>>>          
>>>>>
>>>>My 'qmaster' is on 'AIX 5.2' and my execution platforms are 
>>>>on 'Linux Red Hat Enterprise 3.0'.
>>>>I just double-checked my test case, which is:
>>>>
>>>>- a queue named 'queue.q', in which I defined a suspend method called 
>>>>'/nfs/suspend.sh'
>>>>- /nfs/suspend.sh:
>>>>#!/bin/ksh
>>>># Trace file on the shared /nfs file system
>>>>output=/nfs/test.out
>>>># '>|' overwrites the file even if noclobber is set
>>>>date >| "$output"
>>>>echo suspend >> "$output"
>>>>
>>>>- /nfs/suspend.sh is set to 777
>>>>- /nfs/test.out is set to 777
>>>>- I ran this script by hand to make sure it works, and saw the traces 
>>>>produced by 'date' and 'echo' in /nfs/test.out
>>>>- With 'qmod -s queue.q' executed from my 'qmaster' platform, I 
>>>>saw no change in /nfs/test.out ...
>>>>
>>>>Chanh
>>>>
>>>>        
>>>>
>>>>>TRAN Chanh wrote:
>>>>>
>>>>>          
>>>>>
>>>>>>Reuti,
>>>>>>
>>>>>>Sorry for not having said that I did try 'qmod -s' to trigger the 
>>>>>>'suspend method' but saw no effect. That's why I posted.
>>>>>>
>>>>>>Reuti wrote:
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>You can use it in 5.3p6, but there the command is always 'qmod -s ...'. 
>>>>>>>It's that syntax which is deprecated in 6.0. - Reuti
>>>>>>>
>>>>>>>TRAN Chanh wrote:
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>If I understand you correctly, this means I can't have this behavior 
>>>>>>>>under SGE 5.3p6 ....
>>>>>>>>
>>>>>>>>Thanks a lot anyway,
>>>>>>>>Cheers
>>>>>>>>
>>>>>>>>Reuti wrote:
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>Yes, for 6.0 there are the new options 'qmod -sj <job_id>' and
>>>>>>>>>'qmod -sq <queue_name>'. Any configured suspend thresholds or 
>>>>>>>>>subordinations might also invoke the suspend method. - Reuti
>>>>>>>>>
>>>>>>>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
>>>>>>>>>
>>>>>>>>> 
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>Reuti wrote:
>>>>>>>>>>
>>>>>>>>>> 
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>Chanh,
>>>>>>>>>>>
>>>>>>>>>>>the methods will be invoked when a job has to be, e.g., suspended.
>>>>>>>>>>>    
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Reuti,
>>>>>>>>>>
>>>>>>>>>>What puts a job into the state "has to be suspended"? Can this be 
>>>>>>>>>>triggered by 'qmod -s job_id', 'qmod -s queue_name' or something 
>>>>>>>>>>similar?
>>>>>>>>>>
>>>>>>>>>> 
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>The default action in this case is to send a SIGSTOP to the whole 
>>>>>>>>>>>process group. If you define a procedure of your own, you 
>>>>>>>>>>>can use some special variables, which will give you e.g. the 
>>>>>>>>>>>PID, and do any cleanup or other things that are necessary (see 
>>>>>>>>>>>man queue_conf):
>>>>>>>>>>>
>>>>>>>>>>>suspend_method /usr/sge/mysuspend $job_pid
>>>>>>>>>>>
>>>>>>>>>>>and the script:
>>>>>>>>>>>
>>>>>>>>>>>#!/bin/sh
>>>>>>>>>>># $1 is the job PID; the negative value addresses its process group
>>>>>>>>>>>kill -STOP -- "-$1"
>>>>>>>>>>>exit 0
>>>>>>>>>>>
>>>>>>>>>>>This should behave like the built-in default when you suspend a job. 
>>>>>>>>>>>- Reuti
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>TRAN Chanh wrote:
>>>>>>>>>>>
>>>>>>>>>>> 
>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>>>Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>>I have a problem understanding how the custom 'suspend / resume 
>>>>>>>>>>>>/ terminate' methods in a queue configuration work.
>>>>>>>>>>>>How are these methods related to the 'suspend/resume' action on a 
>>>>>>>>>>>>queue via 'qmon'?
>>>>>>>>>>>>More precisely, what I'm trying to do is to have these methods 
>>>>>>>>>>>>triggered via 'suspend / resume' from 'qmon' ...
>>>>>>>>>>>>
>>>>>>>>>>>>Could someone please give me some insight on this matter?
>>>>>>>>>>>>
>>>>>>>>>>>>Thanks in advance,
>>>>>>>>>>>>Chanh
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>--------------------------------------------------------------------- 
>>    
>>
>>>>>>>>>>>>To unsubscribe, e-mail: 
>>>>>>>>>>>>users-unsubscribe at gridengine.sunsource.net
>>>>>>>>>>>>For additional commands, e-mail: 
>>>>>>>>>>>>users-help at gridengine.sunsource.net
>>>>>>>>>>>>      
>>>>>>>>>>>>                        
>>>>>>>>>>>>
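
Reuti's suggestion earlier in the thread (write something to a file from the custom procedure to see whether it is invoked at all) could be sketched as below. This is a hypothetical debugging variant, not a tested production method. It assumes the queue entry passes the PID, as in 'suspend_method /nfs/suspend.sh $job_pid'. The log location, the SUSPEND_LOG variable and the function name are made up for this example.

```shell
#!/bin/sh
# Hypothetical debugging suspend method. Assumed queue setting:
#   suspend_method /nfs/suspend.sh $job_pid
# The log path is an assumption; use one the job owner can write to.
LOG="${SUSPEND_LOG:-${HOME:-/tmp}/suspend_method.log}"

suspend_with_trace() {
    pid="$1"
    # Record that the method was called, on which host and as which user.
    echo "$(date) suspend called on $(uname -n) as $(id -un) for pid=${pid:-none}" >> "$LOG"
    # Without a PID there is nothing to stop; fail after logging.
    [ -n "$pid" ] || return 1
    # Same action as the built-in default: SIGSTOP to the whole process group.
    kill -STOP -- "-$pid"
}

# Only act when an argument was passed, i.e. when started as a suspend method.
if [ "$#" -gt 0 ]; then
    suspend_with_trace "$1"
fi
```

If the log file stays empty after 'qmod -s ...', the method was not started on the node. If it contains a line but the job keeps running, the problem is in the signalling step instead.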

