[GE users] Job suspended every 60 seconds

Reuti reuti at staff.uni-marburg.de
Tue Mar 8 21:04:18 GMT 2005


Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:

>    I wrote a script to suspend an MPI job in a queue and included its path 
> in the "suspend method" field of the queue configuration.  My problem is 
> that SGE keeps trying to suspend the job again every minute, even 
> though I set up my queue like this:
> 
> suspend_thresholds    load_avg=3.0
> nsuspend              0
> suspend_interval      INFINITY

It may not be easy to suspend an MPI job at all (which MPI implementation are 
you using?), because of possible timeouts in the communication. What exactly 
does your script do, which version of SGE are you running, and are there any 
entries in the messages files of the qmaster and/or execd? An interval of 
exactly 60 seconds matches the default notify time. How did you submit your 
job? - Reuti
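
One way to check the 60-second hunch is to compare the queue's notify value with the suspend settings quoted above. The following is only a minimal Python sketch under stated assumptions: the sample text imitates the layout of `qconf -sq <queue>` output, and the queue name, script path and notify value in it are hypothetical, not taken from the original poster's cluster. The helper names `parse_queue_conf` and `to_seconds` are made up for this illustration.

```python
import re

# Sample text shaped like `qconf -sq <queue>` output. The queue name, the
# script path and the notify value are illustrative assumptions only.
SAMPLE_QUEUE_CONF = """\
qname                 mpi.q
notify                00:00:60
suspend_method        /path/to/suspend_mpi.sh
suspend_thresholds    load_avg=3.0
nsuspend              0
suspend_interval      INFINITY
"""

# Queue attributes that matter for the suspend/notify question.
FIELDS = ("notify", "suspend_method", "suspend_thresholds",
          "nsuspend", "suspend_interval")


def parse_queue_conf(text):
    """Return the attributes listed in FIELDS as a dict of strings."""
    conf = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # attribute name, then the rest of the line
        if len(parts) == 2 and parts[0] in FIELDS:
            conf[parts[0]] = parts[1].strip()
    return conf


def to_seconds(value):
    """Convert an hh:mm:ss time value to seconds; INFINITY becomes None."""
    if value.upper() == "INFINITY":
        return None
    match = re.fullmatch(r"(\d+):(\d+):(\d+)", value)
    if match is None:
        raise ValueError("unexpected time value: %r" % value)
    hours, minutes, seconds = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + seconds


conf = parse_queue_conf(SAMPLE_QUEUE_CONF)
notify_seconds = to_seconds(conf["notify"])
print("notify interval in seconds:", notify_seconds)
print("suspend_interval in seconds:", to_seconds(conf["suspend_interval"]))
```

On a real cluster the text would come from the output of `qconf -sq` for the queue in question rather than from a hard-coded string. If the notify value works out to 60 seconds while suspend_interval is INFINITY, the repeated attempts are more likely tied to the notify mechanism than to the suspend thresholds.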
