[GE users] Job suspended every 60 seconds

Stephan Grell - Sun Germany - SSG - Software Engineer stephan.grell at sun.com
Tue Mar 15 10:56:36 GMT 2005



You should also set

reprioritize false

in the cluster configuration (qconf -mconf).
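
A quick way to double-check the current value before editing (just an illustration; the exact value syntax, 0/1 vs. true/false, depends on the sge_conf(5) version in use):

    qconf -sconf | grep reprioritize      # show the global cluster configuration entry
    qconf -mconf                          # edit it and set "reprioritize false"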

Stephan

Reuti wrote:

> Mmh, the reprioritize_interval in the scheduler is set to 0:0:0, so 
> that SGE is not changing the priority on its own? For me it's working 
> and I get a nice of 19 on all slave nodes of a parallel job. Are some 
> jobs going to the wrong queue?
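
For reference, the scheduler value mentioned here can be inspected and changed like this (a minimal sketch; 0:0:0 disables reprioritization):

    qconf -ssconf | grep reprioritize_interval   # show the scheduler configuration entry
    qconf -msconf                                 # edit the scheduler configuration if needed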
>
> My idea was not to renice the jobs during their execution, but to 
> start them with 19 on all nodes right away. When there is only one 
> job, it will get most of the CPU time anyway, even at nice 19.
>
> Cheers - Reuti
>
> Patrice Hamelin wrote:
>
>> Reuti,
>>
>>   I tried to renice the processes via the Priority setting of the 
>> queues, and it shares the processors between processes (66%-33% 
>> ratio).  I also tried to renice the parallel job to the lowest 
>> priority, nice 19, but it does not propagate to the slave nodes.  
>> Only the master node's processes get the new nice value.
>>
>>   I think I will stick to my suspension scripts, since I really need 
>> the higher priority queue to get ALL the processors whenever it 
>> needs them.  I will warn the users of the lower priority queue that 
>> the results can be bad if suspension occurs.
>>
>> Thanks.
>>
>> Patrice Hamelin wrote:
>>
>>> Reuti,
>>>
>>>   At first I was killing all of the user's processes, but there was 
>>> a problem with that.  The suspend script runs as the user ID 
>>> itself, and not as sgeadmin, as I first thought.  The result was 
>>> that the shell running the kills was killing itself, leading to 
>>> unwanted results.
>>>
>>>   I tested my suspension script with two different MPI codes, one 
>>> that does only communication, and another one that computes a 
>>> Jacobi integration in parallel.  The main problem in getting the 
>>> PIDs is that I really have to target the two processes that are 
>>> running the user's code and eating 99% or so of the CPU each; I 
>>> have to suspend only those two processes.  I verified that the 
>>> processes are in the T state after the kill -19 command was sent 
>>> to them.
>>>
>>>   I agree that there may be weird results when doing that kind of 
>>> operation on an MPICH job, and I will also test the queue priority 
>>> of 19 that you mentioned.  It looks promising!
>>>
>>> Ciao!
>>>
>>> Reuti wrote:
>>>
>>>> Hi Patrice,
>>>>
>>>> I really think it's not a good idea to suspend an MPICH-GM job. 
>>>> IMO the easier solution would be to have a special cluster queue 
>>>> with a priority of 19 for them. Then any other job running on the 
>>>> nodes in another queue with a priority of 0 will get most of the 
>>>> CPU time.
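
A minimal illustration of that suggestion ("mpich.q" is only a placeholder queue name; "priority" is the nice value from queue_conf(5)):

    qconf -sq mpich.q | grep priority     # show the queue's current nice value
    qconf -mq mpich.q                     # set "priority 19" for the MPICH-GM queue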
>>>>
>>>> But anyway, if I understand your script correctly: you want to 
>>>> suspend all jobs of a user on a node by selecting him/her via 
>>>> $LOGNAME in the top output? Then the user's name must not appear 
>>>> in any other field at all, and only one job per user per node is a 
>>>> further limitation. And: head -2 will list only the first two 
>>>> lines; at least, I get only two blank lines with it (which 
>>>> platform/OS are you using?).
>>>>
>>>> Whether you decide to use your script or set up a special cluster 
>>>> queue for MPICH-GM: the ps command is better suited, because there 
>>>> you can specify a user and an output format. Hence the complete:
>>>>
>>>> top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'
>>>>
>>>> can be:
>>>>
>>>> ps --user $LOGNAME -o pid --no-headers
>>>>
>>>>
>>>> The next enhancement is to stop not each process on its own, but 
>>>> the whole process group (if you have a tight integration according 
>>>> to the Howto for MPICH, which also has a hint for MPICH-GM) [of 
>>>> course: it's untested]:
>>>>
>>>> # sort -u instead of plain uniq, so non-adjacent duplicate pgrps are removed too
>>>> for pgrp in `rsh $nodes ps --user $LOGNAME -o pgrp --no-headers | sort -u` ; do
>>>>     rsh $nodes kill -19 -- -$pgrp    # a negative PID stops the whole process group
>>>> done
>>>>
>>>> If there is only one job on the node, you wouldn't need the loop 
>>>> at all now. Did you verify on the nodes that your job is really 
>>>> suspended by your script, by looking in e.g. the "ps -e f" output 
>>>> at the STAT field, which shows T for stopped processes?
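
A one-liner sketch for that check, reusing the ps options shown above ($node stands for one entry from the machines file):

    rsh $node ps --user $LOGNAME -o pid,stat,comm --no-headers | awk '$2 ~ /^T/'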
>>>>
>>>>
>>>> Cheers - Reuti
>>>>
>>>>
>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>
>>>>
>>>>> Reuti,
>>>>>
>>>>>   Thanks for your answer, and sorry for not giving enough 
>>>>> details.  I am using GE 6.0u1 with the MPICH-GM implementation.  
>>>>> I found nothing interesting in the qmaster messages file.  My 
>>>>> script simply sends SIGSTOP to all the MPI processes on all nodes 
>>>>> that are members of the job.  I tested it with a simple 
>>>>> communication program, but as a next step I still have to test it 
>>>>> in a real production environment.  You will find my script 
>>>>> below.  The "unsuspend" script simply sends SIGCONT to the 
>>>>> processes.
>>>>>
>>>>>   I fool the re-suspension by creating a marker file at the first 
>>>>> suspension.
>>>>>
>>>>> F=/tmp/suspend_MPI_job.$LOGNAME.log
>>>>> touch $F
>>>>>
>>>>> if [ -f $TMPDIR/suspended ];then
>>>>>   echo "`date` Job already suspended; exiting" >> $F
>>>>>   exit
>>>>> fi
>>>>> #
>>>>> # For each node
>>>>> #
>>>>>   for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
>>>>>   do
>>>>> #
>>>>> # Create a file that contains PIDs of suspended processes
>>>>> #
>>>>>     touch $TMPDIR/$nodes
>>>>>     > $TMPDIR/$nodes
>>>>> #
>>>>> # Determine processes to suspend
>>>>> #
>>>>>     for proc in `rsh $nodes top -b -n1 | grep $LOGNAME | head -2 | awk '{print $1}'`
>>>>>     do
>>>>>       echo "`date` Suspending process $proc on $nodes" >> $F
>>>>>       echo $proc >> $TMPDIR/$nodes
>>>>>       rsh $nodes kill -19 $proc
>>>>>     done
>>>>>   done
>>>>> touch $TMPDIR/suspended
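
A minimal sketch of the matching "unsuspend" method, assuming it simply replays the PIDs recorded per node, sends SIGCONT, and removes the marker file:

F=/tmp/unsuspend_MPI_job.$LOGNAME.log
touch $F

if [ ! -f $TMPDIR/suspended ]; then
  echo "`date` Job is not suspended; exiting" >> $F
  exit
fi
#
# For each node, resume every PID the suspend script recorded
#
for nodes in `cat $TMPDIR/machines | /usr/bin/uniq`
do
  for proc in `cat $TMPDIR/$nodes`
  do
    echo "`date` Resuming process $proc on $nodes" >> $F
    rsh $nodes kill -18 $proc    # 18 = SIGCONT, matching the kill -19 (SIGSTOP) above
  done
done
rm -f $TMPDIR/suspended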
>>>>>
>>>>>
>>>>> Reuti wrote:
>>>>>
>>>>>> Quoting Patrice Hamelin <phamelin at clumeq.mcgill.ca>:
>>>>>>
>>>>>>
>>>>>>
>>>>>>>  I wrote a script to suspend MPI jobs in a queue and included 
>>>>>>> its path in the "suspend_method" field of the queue 
>>>>>>> configuration.  My problem is that SGE keeps trying to suspend 
>>>>>>> the job again every minute, even though I set up my queue like 
>>>>>>> this:
>>>>>>>
>>>>>>> suspend_thresholds    load_avg=3.0
>>>>>>> nsuspend              0
>>>>>>> suspend_interval      INFINITY
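
To see what the queue actually ends up with, the suspend-related attributes and the notify value (which becomes relevant below) can be listed together ("myqueue.q" is a placeholder):

    qconf -sq myqueue.q | egrep 'suspend|notify'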
>>>>>>
>>>>>> It may not be easy to suspend an MPI job at all (which MPI 
>>>>>> implementation?), because of possible timeouts in the 
>>>>>> communication. What are you doing in your script exactly, which 
>>>>>> version of SGE is it, and are there any entries in the messages 
>>>>>> files of the qmaster and/or execd? Just 60 seconds is exactly 
>>>>>> the default notify time - how did you submit your job? - Reuti
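
For context: if I recall the defaults correctly, that notify time is the queue's "notify" attribute (default 00:00:60), and it only applies to jobs submitted with -notify, which then receive SIGUSR1/SIGUSR2 that long before a suspend/kill. An illustrative check ("myqueue.q" and job.sh are placeholders):

    qconf -sq myqueue.q | grep notify
    qsub -notify job.sh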
>>>>>>
>>>>>
>>>>> -- 
>>>>> Patrice Hamelin ing, M.Sc.A, CCNA
>>>>> Systems Administrator
>>>>> CLUMEQ Supercomputer Centre
>>>>> McGill University
>>>>> 688 Sherbrooke Street West, Suite 710
>>>>> Montreal, QC, Canada H3A 2S6
>>>>> Tel: 514-398-3344
>>>>> Fax: 514-398-2203
>>>>> http://www.clumeq.mcgill.ca
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe at gridengine.sunsource.net
For additional commands, e-mail: users-help at gridengine.sunsource.net



