[GE users] Pb w/ suspending job ...

Reuti reuti at staff.uni-marburg.de
Fri May 20 14:29:35 BST 2005


If -notify is set, the jobs are warned with a SIGUSR1 before the SIGSTOP 
is sent. The default action of SIGUSR1 is to terminate the process, so the 
job dies unless you trap the signal to run some custom procedure of your 
own (and this means trapping it in each program in the process group).
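
As a minimal sketch of that trapping (plain shell, not SGE-specific and not taken from the job in this thread), the following puts the default action of SIGUSR1 next to a shell that traps it. The handler text and the variable names are made up for illustration; in a real job the trap would go near the top of the job script.

```shell
#!/bin/sh
# Case 1: no trap. SIGUSR1 takes its default action and terminates the
# child shell, so the "echo" after the kill is never reached.
if sh -c 'kill -s USR1 $$; echo "not reached"'; then
    untrapped=survived
else
    untrapped=killed
fi
echo "without trap: $untrapped"

# Case 2: the child shell traps SIGUSR1, runs a (hypothetical) custom
# procedure, and then carries on with the rest of the script.
trapped=$(sh -c 'trap "echo warning-received" USR1; kill -s USR1 $$; echo survived')
echo "with trap: $trapped"
```

Note that a handler installed with trap is reset to the default action in the programs the shell starts, whereas a signal ignored with trap '' USR1 stays ignored across exec. Ignoring it in the job script is therefore one way to cover the other programs in the process group, as long as they do not install a handler of their own.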

For parallel jobs, the parallel library may run into timeouts while the 
processes are suspended. But just try it.

CU - Reuti


TRAN Chanh wrote:
> 
> 
> Reuti wrote:
> 
>> Were these just plain serial jobs? There is indeed the possibility to 
>> change the suspend/resume method, but the built-in:
> 
> 
> Currently, all the jobs are plain serial ones.
> BTW, I just discovered that I had the '-notify' option in my 'qsub' call, 
> and after removing it my 'suspend' problem is gone.
> I must say I'm happy with this, but I would still be interested in an 
> explanation of why it had this effect ...
> 
> Actually, the next step for me is to suspend multi-processor and 
> multi-node jobs, and I hope everything will work out fine.
> 
> Thanks again,
> Chanh
> 
>>
>> kill -stop -- -<pid>
>>
>> should stop the whole process group. Did you define any suspend/resume 
>> methods of your own? Are some forks/threads of your application jumping 
>> out of the process group? - Reuti
>>
> Otherwise, I don't have any specific procedure of my own ...
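
To check whether parts of a job have left the process group, one can list the members of the group before sending the stop signal. The sketch below assumes a POSIX-style ps and awk; the function names list_group, suspend_group and resume_group are made up for illustration and are not SGE commands. As written it only lists the script's own group and does not stop anything.

```shell
#!/bin/sh
# list_group PGID: print "pid pgid state command" for every process
# that currently belongs to process group PGID.
list_group() {
    ps -e -o pid= -o pgid= -o stat= -o args= | awk -v g="$1" '$2 == g'
}

# suspend_group / resume_group PGID: signal every member of the group.
# The "-" in front of the number addresses the whole group, and "--"
# keeps kill from reading the negative number as an option.
suspend_group() { kill -s STOP -- "-$1"; }
resume_group()  { kill -s CONT -- "-$1"; }

# Demonstration: look up this script's own process group and list it.
own_pgid=$(ps -o pgid= -p $$ | tr -d ' ')
list_group "$own_pgid"
```

A process of the job that does not show up in the output of list_group for the job's group would not be reached by suspend_group and would keep running while the rest is stopped.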
> 
>>
>> TRAN Chanh wrote:
>>
>>> Hi Reuti,
>>>
>>> Actually, I tried both of these:
>>> 1. via 'qmon->jobs->suspend ....'
>>> 2. qmod -s job_id
>>>
>>> Both give the same result.
>>>
>>> Chanh
>>>
>>> Reuti wrote:
>>>
>>>> Chanh,
>>>>
>>>> which SGE commands did you use in detail to suspend and unsuspend 
>>>> your jobs? - Reuti
>>>>
>>>> TRAN Chanh wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm using SGE 5.3p6 and trying to suspend my running jobs and then 
>>>>> put them 'back to work' via 'resume'.
>>>>> What happens is that, instead of being suspended as with 'kill 
>>>>> -SIGSTOP', the jobs are all aborted as with 'kill -9'.
>>>>> Is there any way to change this behavior?
>>>>>
>>>>> Thanks a lot for any help,
>>>>> Chanh
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net



