[GE users] Job suspended every 60 seconds

Reuti reuti at staff.uni-marburg.de
Tue Mar 15 12:25:20 GMT 2005


Stephan, small typo I guess - should be:

reprioritize false


But anyway, in libs/sched/sgeee.c I see:

bool update_execd = ( reprioritize_interval == 0 || (now >= (past + 
reprioritize_interval)));

Why not:

bool update_execd = ( reprioritize_interval != 0 && (now >= (past + 
reprioritize_interval)));

since for "reprioritize_interval == 0" the second expression seems 
always to be true - maybe I'm wrong here. But the man page sched_conf 
states that 0:0:0 turns it off already.

What's now the truth? - Reuti


Stephan Grell - Sun Germany - SSG - Software Engineer wrote:
> You should also set the
> 
> reprioritze false
> 
> setting in the cluster configuration (qconf -mconf)
> 
> Stephan
> 
> Reuti wrote:
> 
>> Mmh, the reprioritize_interval in the scheduler is set to 0:0:0, so 
>> that SGE is not changing the priority on its own? For me it's working 
>> and I get a nice of 19 on all slave nodes of a parallel job. Are some 
>> jobs going to the wrong queue?
>>
>> My idea was not to renice the jobs during their execution, but to 
>> start them already with 19 on all nodes. When there is only one job, 
>> it will get most of the CPU time anyway, even at nice 19.
>>
>> Cheers - Reuti
>>
>> Patrice Hamelin wrote:
>>
>>> Reuti,
>>>
>>>   I tried to renice the processes with the Priority of the queues and 
>>> it shares the processors between processes (66%-33% ratio).  I also 
>>> tried to renice the parallel job to the lower 19 priority, but it is 
>>> not propagating to the slave nodes.  Only the master node processes 
>>> have a higher scheduling priority.
>>>
>>>   I think I will stick to my suspension scripts, since I really need 
>>> the higher priority queue to get ALL the processors whenever it 
>>> needs them.  I would warn the users of the lower priority queue that 
>>> the results can be bad if suspension occurs.
>>>
>>> Thanks.
>>>
>>> Patrice Hamelin wrote:
>>>
>>>> Reuti,
>>>>
>>>>   At first I was killing all of the user's processes, but there was 
>>>> a problem with that.  The suspend script runs as the user ID 
>>>> itself, and not as sgeadmin, as I first thought.  The result was 
>>>> that the shell running the kills was killing itself, leading to 
>>>> unwanted results.
>>>>
>>>>   I tested my suspension script with two different MPI codes, one 
>>>> that is doing only communication, and another one that computes 
>>>> Jacobi integration in parallel.  The main problem in getting the 
>>>> PIDs is that I really have to target the two processes that are 
>>>> running the user's code and eating 99% or so of the CPU each. I have 
>>>> to suspend only those two processes.  I verified that the processes 
>>>> are in T state after the kill -19 command was sent to them.
>>>>
>>>>   I agree that there may be weird results doing that kind of 
>>>> operation on an MPICH job, and I will also test the queue priority 
>>>> 19 that you mentioned.  It looks promising!
>>>>
>>>> Ciao!
>>>>
>>>> Reuti wrote:
>>>>
>>>>> Hi Patrice,
>>>>>
>>>>> I really think it's not a good idea to suspend an MPICH-GM job. IMO 
>>>>> the easier solution would be to have a special cluster queue with a 
>>>>> priority of 19 for them. So any other job running on the nodes in 
>>>>> another queue with a priority of 0 will get most of the CPU time.
>>>>>
>>>>> But anyway, if I understand your script correctly: you want to 
>>>>> suspend all jobs from a user on a node by selecting him/her by the 
>>>>> $LOGNAME in top? Then the user's name may not appear in any other 
>>>>> field at all, and only one job per user per node is the limitation. 
>>>>> And: head -2 should list the first two lines, but at least I get 
>>>>> only two blank lines with it (which platform/OS are you using?).
>>>>>
>>>>> Whether you decide to use it, or set up a special cluster queue for 
>>>>> MPICH-GM: the ps command is better suited, because there you can 
>>>>> specify a user and an output format, hence the complete:
>>>>>
>>>>> top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'
>>>>>
>>>>> can be:
>>>>>
>>>>> ps --user $LOGNAME -o pid --no-headers
>>>>>
>>>>>
>>>>> The next enhancement is not to stop each process on its own, but 
>>>>> the whole process group (if you have a tight integration according 
>>>>> to the Howto for MPICH, which also has a hint for MPICH-GM) [of 
>>>>> course: it's untested]:
>>>>>
>>>>> for proc in `rsh $nodes ps --user $LOGNAME -o pgrp --no-headers | 
>>>>> uniq` ; do
>>>>>     rsh $nodes kill -19 -- -$proc
>>>>> done
>>>>>
>>>>> If there is only one job on the node, you wouldn't need the loop at 
>>>>> all. Did you verify on the nodes that your job is really suspended 
>>>>> with your script, by looking in e.g. the "ps -e f" output for the 
>>>>> STAT field, which shows T for stopped jobs?
>>>>>
>>>>>
>>>>> Cheers - Reuti
>>>>>
>>>>>
>>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>>
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>>   Thanks for your answer, and sorry for not giving enough 
>>>>>> details.  I am using GE 6.0u1 with the MPICH-GM implementation. I 
>>>>>> found nothing interesting in the qmaster messages file.  My script 
>>>>>> simply sends SIGSTOP signals to all the MPI processes on all nodes 
>>>>>> that are members of the job.  I tested it with a simple 
>>>>>> communication program, but I still have to test it in a real 
>>>>>> production environment as the next step.  You will find my script 
>>>>>> below.  The "unsuspend" script simply sends a SIGCONT signal to 
>>>>>> the processes.
>>>>>>
>>>>>>   I fooled the re-suspension by creating a file at the first 
>>>>>> suspension.
>>>>>>
>>>>>> F=/tmp/suspend_MPI_job.$LOGNAME.log
>>>>>> touch $F
>>>>>>
>>>>>> if [ -f $TMPDIR/suspended ];then
>>>>>>   echo "`date` Job already suspended; exiting" >> $F
>>>>>>   exit
>>>>>> fi
>>>>>> #
>>>>>> # For each node
>>>>>> #
>>>>>>   for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
>>>>>>   do
>>>>>> #
>>>>>> # Create a file that contains PIDs of suspended processes
>>>>>> #
>>>>>>     touch $TMPDIR/$nodes
>>>>>>     > $TMPDIR/$nodes
>>>>>> #
>>>>>> # Determine processes to suspend
>>>>>> #
>>>>>>     for proc in `rsh $nodes top -b -n1 | grep $LOGNAME | head -2 | 
>>>>>> awk '{print $1}'`
>>>>>>     do
>>>>>>       echo "`date` Suspending process $proc on $nodes" >> $F
>>>>>>       echo $proc >> $TMPDIR/$nodes
>>>>>>       rsh $nodes kill -19 $proc
>>>>>>     done
>>>>>>   done
>>>>>> touch $TMPDIR/suspended
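The "unsuspend" script itself is not shown; a possible sketch, reusing 
the marker file and the per-node PID files written by the script above 
(kill -18 is SIGCONT; everything beyond those files is an assumption):

```shell
#!/bin/sh
# Resume previously suspended processes by replaying the PID files
# that the suspend script wrote to $TMPDIR, then drop the marker.
F=/tmp/suspend_MPI_job.$LOGNAME.log

if [ ! -f "$TMPDIR/suspended" ]; then
  echo "`date` Job not suspended; nothing to do" >> $F
else
  for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
  do
    for proc in `cat $TMPDIR/$nodes`
    do
      echo "`date` Resuming process $proc on $nodes" >> $F
      rsh $nodes kill -18 $proc
    done
  done
  rm -f $TMPDIR/suspended
fi
```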
>>>>>>
>>>>>>
>>>>>> Reuti wrote:
>>>>>>
>>>>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>  I wrote a script to suspend MPI job in a queue and include that 
>>>>>>>> path in the "suspend method" field of the queue configuration.  
>>>>>>>> My problem is that SGE keeps trying to suspend the job again 
>>>>>>>> every minute, even though I set up my queue like:
>>>>>>>>
>>>>>>>> suspend_thresholds    load_avg=3.0
>>>>>>>> nsuspend              0
>>>>>>>> suspend_interval      INFINITY
>>>>>>>
>>>>>>> It may not be easy to suspend an MPI job at all (which MPI 
>>>>>>> implementation?), because of possible timeouts in the 
>>>>>>> communication. What exactly are you doing in your script, which 
>>>>>>> version of SGE, and are there any entries in the messages files 
>>>>>>> of the qmaster and/or execd? Just 60 seconds is exactly like the 
>>>>>>> default notify time - how did you submit your job? - Reuti
>>>>>>>
>>>>>>> --------------------------------------------------------------------- 
>>>>>>>
>>>>>>> To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
>>>>>>> For additional commands, e-mail: users-help at gridengine.sunsource.net
>>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Patrice Hamelin ing, M.Sc.A, CCNA
>>>>>> Systems Administrator
>>>>>> CLUMEQ Supercomputer Centre
>>>>>> McGill University
>>>>>> 688 Sherbrooke Street West, Suite 710
>>>>>> Montreal, QC, Canada H3A 2S6
>>>>>> Tel: 514-398-3344
>>>>>> Fax: 514-398-2203
>>>>>> http://www.clumeq.mcgill.ca
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
> 
> 






More information about the gridengine-users mailing list