[GE users] question on custom 'suspend method' ...

Reuti reuti at staff.uni-marburg.de
Fri May 27 21:41:50 BST 2005


Okay, can you try a procedure:

#!/bin/sh
while [ 1 ]; do :; done

for the suspend_method. After suspending the job, a "ps -e f" on the node
should show something like this (the procedure runs below the shepherd,
here as /home/reuti/abcdef):

 7747 ?        S      3:34 /usr/sge/bin/lx26-x86/sge_execd
 1738 ?        S      0:00  \_ sge_shepherd-3038 -bg
 1739 ?        Ss     0:00      \_ /bin/sh /var/spool/sge/x/job_scripts/3038
 1741 ?        R      6:11      |   \_ /home/reuti/ever
 1756 ?        R      0:06      \_ /bin/sh /home/reuti/abcdef
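
A gentler variant of the loop above, as a sketch only: it also records each
invocation in a log file, so it covers both the "is it invoked at all" check
and the "does it show up in ps" check. The file names, the PROBE_LOG and
PROBE_LINGER variables and the 300 second default are assumptions made for
this example; nothing here is defined by SGE.

```shell
#!/bin/sh
# Write the probe script to a scratch location (hypothetical path); copy it
# to a place the execution node can run it from and name it as suspend_method.
probe="${TMPDIR:-/tmp}/suspend_probe.sh"
cat > "$probe" <<'EOF'
#!/bin/sh
# Log the invocation, then linger so the process stays visible below
# sge_shepherd in "ps -e f" instead of exiting immediately.
log="${PROBE_LOG:-/tmp/suspend_probe.log}"
echo "$(date) invoked as: $0 $*" >> "$log"
sleep "${PROBE_LINGER:-300}"
EOF
chmod 755 "$probe"
echo "probe written to $probe"
```

If the log file stays empty after a 'qmod -s ...', the method was most likely
never started for that job.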

CU - Reuti


Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:

> Reuti wrote:
> 
> >Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
> >
> >>Reuti wrote:
> >>
> >>>Is there any error message in the messages file (of the qmaster or the
> >>>node)?
> >>
> >>Sorry to ask, but where can I check these messages?
> >
> >By default these are located in $SGE_ROOT/default/spool/qmaster and
> >$SGE_ROOT/default/spool/<name_of_node>, unless you have defined a custom
> >spool directory. - Reuti
> 
> I checked the 'messages' file on both the qmaster and the node and found
> nothing related to my script ...
> 
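For reference, a small sketch of how both files could be searched in one go.
It assumes the default layout described above (cell name "default", no custom
spool directory); the function names are invented for this example.

```shell
# sge_messages_files NODE: print the default locations of the two
# 'messages' files (qmaster and the given execution node).
sge_messages_files() {
    root="${SGE_ROOT:?SGE_ROOT is not set}"
    printf '%s\n' \
        "$root/default/spool/qmaster/messages" \
        "$root/default/spool/$1/messages"
}

# grep_sge_messages KEYWORD NODE: search both files for KEYWORD,
# skipping any file that is not readable from this host.
grep_sge_messages() {
    sge_messages_files "$2" | while IFS= read -r file; do
        # /dev/null as a second operand makes grep print the file name.
        [ -r "$file" ] && grep -- "$1" "$file" /dev/null
    done
    return 0
}
```

For example, "grep_sge_messages suspend.sh <name_of_node>" prints every line
in either file that mentions the script.
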
> >>>Did you also try your script interactively on the execution nodes? Maybe
> >>>/nfs is mounted there without "exec"? - Reuti
> >>
> >>I tried running my script on the execution nodes; it works.
> >>
> >>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
> >>>
> >>>>Reuti wrote:
> >>>>
> >>>>>One thing I just saw: the changes to the queue will only be accepted 
> >>>>>before the job starts to run on the node. Changing the queue 
> >>>>>definition of the suspend method while the job is already running 
> >>>>>will not invoke it. - Reuti
> >>>>
> >>>>I also tried 'qmod -s job_id', with the same result.
> >>>>The way I proceeded is:
> >>>>
> >>>>1. set up the method in the queue
> >>>>2. 'qsub' a job to the queue
> >>>>3. test with 'qmod -s queue' & 'qmod -s job_id'
> >>>>
> >>>>All I observed in both cases is that my job's state changed from 
> >>>>'running' to 'suspended' ....
> >>>>
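The order of these steps matters, because a changed suspend method is only
picked up by jobs that start after the change. A rough sketch of the
sequence, using only 'qconf -sq', 'qsub -q' and 'qmod -s'; the run wrapper,
the DRY_RUN variable and the suspend_test function are made up for this
example and the queue and script names are placeholders.

```shell
# run CMD...: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# suspend_test QUEUE JOB_SCRIPT
suspend_test() {
    queue="$1"
    job_script="$2"
    # 1. Show the queue and check its suspend_method line BEFORE submitting.
    run qconf -sq "$queue"
    # 2. Submit the job only after the queue definition is final.
    run qsub -q "$queue" "$job_script"
    # 3. Suspend the queue; in real use, wait until the job is running first.
    run qmod -s "$queue"
}
```

With DRY_RUN=1 set, "suspend_test queue.q job.sh" only prints the three
commands, which is a cheap way to review them before touching the cluster.
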
> >>>>>TRAN Chanh wrote:
> >>>>>
> >>>>>>Reuti wrote:
> >>>>>>
> >>>>>>>Mmh, for me it's working (the default and also custom procedures). 
> >>>>>>>What exactly do you observe? E.g., with a running job, issue a
> >>>>>>>'qmod -s ...' and log in to the node. Then 'ps -e f' should 
> >>>>>>>list the status of the job as 'T' for stopped (on Linux).
> >>>>>>>
> >>>>>>>With a custom procedure, can you try to echo something to a file 
> >>>>>>>in your home directory? This way we can check whether the 
> >>>>>>>procedure is invoked at all.
> >>>>>>>
> >>>>>>>What platform are you on? - Reuti
> >>>>>>>
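The 'T' check can also be done for a single PID instead of reading the whole
'ps -e f' listing. A minimal sketch, assuming a Linux (procps-style) ps that
knows the "stat" column; the two function names are invented here.

```shell
# proc_state PID: print the state letters of one process, e.g. "T" or "Ss".
proc_state() {
    ps -o stat= -p "$1" | tr -d ' '
}

# is_stopped PID: succeed if the state starts with "T" (stopped).
is_stopped() {
    case "$(proc_state "$1")" in
        T*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

On the node, "is_stopped <pid> && echo stopped" with the PID of the job
script tells whether the suspend really reached the process.
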
> >>>>>>My 'qmaster' is on 'AIX 5.2' & my execution platforms are 
> >>>>>>on 'Linux RedHat Enterprise 3.0'.
> >>>>>>I just double-checked my test case, which is:
> >>>>>>
> >>>>>>- queue named 'queue.q' in which I defined a suspend method called 
> >>>>>>'/nfs/suspend.sh'
> >>>>>>- /nfs/suspend.sh :
> >>>>>>#!/bin/ksh
> >>>>>>output=/nfs/test.out
> >>>>>>date >| $output
> >>>>>>echo suspend  >> $output
> >>>>>>
> >>>>>>- /nfs/suspend.sh is set to 777
> >>>>>>- /nfs/test.out is set to 777
> >>>>>>- I ran this script by hand to make sure it works, and saw the traces 
> >>>>>>produced by 'date' & 'echo' in /nfs/test.out
> >>>>>>- With 'qmod -s queue.q' executed from my 'qmaster' platform, I 
> >>>>>>saw no change in /nfs/test.out ...
> >>>>>>
> >>>>>>Chanh
> >>>>>>
> >>>>>>>TRAN Chanh wrote:
> >>>>>>>
> >>>>>>>>Reuti,
> >>>>>>>>
> >>>>>>>>Sorry for not having said that I did try 'qmod -s' to trigger the 
> >>>>>>>>'suspend method' but saw no effect ... That's why I posted ...
> >>>>>>>>
> >>>>>>>>Reuti wrote:
> >>>>>>>>
> >>>>>>>>>You can use it in 5.3p6, but the command is always 'qmod -s ...'. 
> >>>>>>>>>It's that syntax which is deprecated for 6.0. - Reuti
> >>>>>>>>>
> >>>>>>>>>TRAN Chanh wrote:
> >>>>>>>>>
> >>>>>>>>>>If I get you right, this means I can't have this behavior under SGE
> >>>>>>>>>>5.3p6 ....
> >>>>>>>>>>
> >>>>>>>>>>Thanks a lot anyway,
> >>>>>>>>>>Cheers
> >>>>>>>>>>
> >>>>>>>>>>Reuti wrote:
> >>>>>>>>>>
> >>>>>>>>>>>Yes, for 6.0 there are the new options 'qmod -sj <job_id>' and
> >>>>>>>>>>>'qmod -sq <queue_name>'. Any configured suspend thresholds or 
> >>>>>>>>>>>subordinations might also invoke the suspend method. - Reuti
> >>>>>>>>>>>
> >>>>>>>>>>>Quoting TRAN Chanh <chanh.tran at dassault-aviation.fr>:
> >>>>>>>>>>>
> >>>>>>>>>>>>Reuti wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>>Chanh,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>the methods will be invoked when a job e.g. has to be
> >>>>>>>>>>>>>suspended.
> >>>>>>>>>>>>
> >>>>>>>>>>>>Reuti,
> >>>>>>>>>>>>
> >>>>>>>>>>>>What leads a job to the state "has to be suspended"? Can this be 
> >>>>>>>>>>>>triggered by 'qmod -s job_id' or 'qmod -s queue_name' or something
> >>>>>>>>>>>>alike?
> >>>>>>>>>>>>
> >>>>>>>>>>>>>The default action is to send a SIGSTOP to the whole process 
> >>>>>>>>>>>>>group in this case. If you define a procedure of your own, you
> >>>>>>>>>>>>>can use some special variables, which will give you e.g. the 
> >>>>>>>>>>>>>PID, and do any cleanup or other things that are necessary (see
> >>>>>>>>>>>>>man queue_conf):
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>suspend_method /usr/sge/mysuspend $job_pid
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>and the script:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>#!/bin/sh
> >>>>>>>>>>>>>kill -stop -- -$1
> >>>>>>>>>>>>>exit 0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>Should behave like the default built-in if you suspend a job. 
> >>>>>>>>>>>>>- Reuti
> >>>>>>>>>>>>>
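A slightly more defensive take on the script above, as a sketch and not as
what SGE itself does: it checks the argument, leaves a log line, and uses the
"kill -s STOP" spelling of the same signal. The function name and the
optional log file argument are assumptions added for illustration.

```shell
# stop_job_group PID [LOGFILE]
# Log the request, then send SIGSTOP to the process group led by PID
# (the value passed in for $job_pid in the queue configuration).
stop_job_group() {
    pid="$1"
    log="${2:-/dev/null}"
    case "$pid" in
        ''|*[!0-9]*)
            echo "usage: stop_job_group PID [LOGFILE]" >&2
            return 2 ;;
    esac
    echo "$(date) stopping process group $pid" >> "$log"
    # A negative PID addresses the whole process group; "--" keeps kill
    # from treating it as an option.
    kill -s STOP -- "-$pid"
}
```

Inside a suspend method script this would be called as
'stop_job_group "$1" /tmp/mysuspend.log', with the log path being just an
example.
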
> >>>>>>>>>>>>>TRAN Chanh wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>>Hi all,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>I have a problem understanding how the custom 'suspend / resume 
> >>>>>>>>>>>>>>/ terminate' methods in a queue configuration work.
> >>>>>>>>>>>>>>How are these methods related to the 'suspend/resume' action on a 
> >>>>>>>>>>>>>>queue via 'qmon'?
> >>>>>>>>>>>>>>More precisely, what I'm trying to do is to have this method 
> >>>>>>>>>>>>>>triggered via 'suspend / resume' from 'qmon' ...
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Could someone please give me some insight on this matter?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Thanks in advance,
> >>>>>>>>>>>>>>Chanh
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>---------------------------------------------------------------------
> >>>>>>>>>>>>>>To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
> >>>>>>>>>>>>>>For additional commands, e-mail: users-help at gridengine.sunsource.net






